BanglaNewsClassification
A Bangla news-classification experiment using a 408K-article dataset, classical ML pipelines, TF-IDF features, and ensemble models.
Problem
Bangla NLP systems need strong baselines and large-scale experiments, especially for classification workflows where data imbalance and feature representation affect accuracy.
Solution
I built preprocessing and training pipelines using TF-IDF features, SMOTE oversampling, and Random Forest and SVM classifiers, plus evaluation routines to compare model performance.
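A minimal sketch of that pipeline shape, using scikit-learn on a tiny stand-in corpus (the toy texts, labels, and hyperparameters here are illustrative assumptions, not the project's actual configuration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Toy stand-in corpus; the real project trains on 408K Bangla news articles.
texts = [
    "sports match win goal", "election vote government policy",
    "team player score league", "minister parliament law bill",
    "cricket bat run wicket", "budget tax economy finance",
]
labels = ["sports", "politics", "sports", "politics", "sports", "politics"]

pipeline = Pipeline([
    # TF-IDF featurization; n-gram range and vocabulary size are assumptions.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    # On the real, imbalanced dataset a SMOTE step (imblearn.over_sampling.SMOTE)
    # would sit between vectorizer and classifier via imblearn.pipeline.Pipeline.
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
pipeline.fit(texts, labels)
preds = pipeline.predict(["goal score match", "tax policy vote"])
print(preds)
```

The same pipeline object can be swapped to an SVM classifier for the comparison runs.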
Impact
Reached 98.27% accuracy with Random Forest.
Scaled the pipeline to a 408K-article Bangla dataset.
Created a reusable classical-ML baseline for Bangla text classification.
Highlights
408K Bangla news articles
98.27% Random Forest accuracy
TF-IDF feature pipeline
SMOTE imbalance handling
Architecture
Python preprocessing scripts
Scikit-learn model training
TF-IDF vectorization
SMOTE balancing
Comparative model evaluation
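The comparative-evaluation step can be sketched as cross-validated scoring of each classifier over the same TF-IDF features; the corpus, hyperparameters, and fold count below are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the 408K-article dataset.
texts = [
    "goal match striker league", "vote parliament bill minister",
    "wicket innings bowler run", "tax budget inflation economy",
    "coach team final trophy", "election campaign party leader",
    "stadium fans kickoff referee", "policy reform cabinet law",
    "score win tournament champion", "government court ruling verdict",
    "penalty goalkeeper defender cross", "senate debate amendment quorum",
]
labels = ["sports", "politics"] * 6

# Score each model on identical features, as the project does for
# Random Forest vs. SVM (exact hyperparameters are assumptions).
results = {}
for name, clf in [
    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("LinearSVC", LinearSVC()),
]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    results[name] = cross_val_score(pipe, texts, labels, cv=3).mean()
    print(f"{name}: {results[name]:.3f}")
```

Reporting mean cross-validated accuracy per model keeps the comparison fair, since every classifier sees the same folds and the same feature space.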
Next Steps
Add transformer baselines for comparison.
Publish a cleaned methodology page with dataset notes and confusion matrices.
Want to build something similar?
I can help turn operational chaos into a shipped product.