BanglaNewsClassification

A Bangla news-classification experiment using a 408K-article dataset, classical ML pipelines, TF-IDF features, and ensemble models.

2025 · Python · Research project

Problem

Bangla NLP systems need strong baselines and large-scale experiments, especially for classification workflows where data imbalance and feature representation affect accuracy.

Solution

I built preprocessing and training pipelines combining TF-IDF features, SMOTE oversampling, and Random Forest and SVM classifiers, plus evaluation routines to compare model performance.
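A minimal sketch of what such a pipeline might look like in scikit-learn. The transliterated toy documents and labels below are illustrative placeholders, not samples from the actual 408K-article dataset:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in corpus; the real project trains on ~408K Bangla articles.
texts = [
    "dol jitlo khela football goal",
    "nirbachon vote sarkar rajniti",
    "cricket match run wicket khela",
    "montri sangsad ain rajniti vote",
] * 10
labels = ["sports", "politics", "sports", "politics"] * 10

# TF-IDF features feeding a Random Forest, mirroring the classical pipeline.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
clf.fit(texts, labels)
print(clf.predict(["khela goal match"])[0])
```

The same `Pipeline` object can be swapped to an SVM by replacing the final estimator, which keeps the feature extraction identical across model comparisons.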

Impact

Reached 98.27% accuracy with Random Forest.

Worked with a large 408K-row Bangla dataset.

Created a reusable classical-ML baseline for Bangla text classification.

Highlights

408K Bangla news articles

98.27% Random Forest accuracy

TF-IDF feature pipeline

SMOTE imbalance handling
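SMOTE handles class imbalance by synthesizing new minority-class samples along line segments between existing ones and their nearest neighbours. A from-scratch NumPy sketch of that interpolation idea (in practice a library implementation such as imbalanced-learn would be used):

```python
import numpy as np

def smote_oversample(X_minority, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize points by interpolating
    between a minority sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        # Distances from X[i] to all points; drop index 0 (itself).
        d = np.linalg.norm(X - X[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbours)
        # New point lies somewhere on the segment between X[i] and X[j].
        gap = rng.random()
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.vstack(synthetic)

# Three minority points in 2-D; generate four synthetic ones between them.
X_min = [[0, 0], [1, 0], [0, 1]]
X_new = smote_oversample(X_min, n_new=4, k=2, rng=0)
print(X_new.shape)
```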

Architecture

Python preprocessing scripts

Scikit-learn model training

TF-IDF vectorization

SMOTE balancing

Comparative model evaluation
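The comparative evaluation step can be sketched as scoring each candidate model against the same cross-validation folds. The synthetic feature matrix below is a stand-in for the real TF-IDF features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the TF-IDF feature matrix of the real corpus.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=10, random_state=42)

# Score every candidate with identical 5-fold CV splits for a fair comparison.
models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    "LinearSVC": LinearSVC(random_state=42),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Holding the folds and scoring metric fixed across models is what makes the reported accuracy figures directly comparable.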

Next Steps

Add transformer baselines for comparison.

Publish a cleaned methodology page with dataset notes and confusion matrices.
