A spam email classifier that combines traditional machine learning with deep learning (BERT, DistilBERT) to detect spam. It includes a full pipeline from preprocessing to deployment, plus an interactive web interface.
Clone the repo and install dependencies:
git clone https://github.com/allmen/email-spam-classifier.git
cd email-spam-classifier
pip install -r requirements.txt
Download NLTK data:
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
# Train all models (ML + DL)
python train_models.py
# Train only ML (faster)
python train_models.py --no-dl
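For readers curious how a switch like `--no-dl` is typically wired up, here is a minimal sketch using Python's standard `argparse` module. It is an illustration only: the helper names `parse_args` and `select_model_families` are hypothetical, and the real `train_models.py` may handle the flag differently.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a hypothetical training script."""
    parser = argparse.ArgumentParser(description="Train spam classifiers")
    # store_true: the flag is False by default and True when passed.
    # argparse exposes "--no-dl" as the attribute "no_dl".
    parser.add_argument(
        "--no-dl",
        action="store_true",
        help="skip the deep learning models (BERT, DistilBERT)",
    )
    return parser.parse_args(argv)


def select_model_families(args):
    """Return the model families to train (names here are assumptions)."""
    families = ["ml"]
    if not args.no_dl:
        families.append("dl")
    return families
```

With this sketch, `select_model_families(parse_args(["--no-dl"]))` yields only the traditional ML family, which is why that mode runs faster.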
# Evaluate the trained models
python evaluate_models.py
# Start the Flask web app
python app.py
# Visit http://localhost:5000
- train_models.py: Train spam classifiers
- evaluate_models.py: Evaluate performance
- app.py: Flask web app
- notebooks/: Interactive notebook
- web/templates/index.html: UI page
- Traditional ML: Naive Bayes, SVM, Random Forest, Logistic Regression
- Deep Learning: BERT, DistilBERT
- Ensemble: Combines ML and DL for best results
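
How the ensemble combines its members is not spelled out here. One common approach is soft voting: average the spam probabilities from each model family, then blend the two averages. The sketch below assumes that scheme. The function names, the `dl_weight` parameter, and the 0.5 threshold are illustrative assumptions, not the project's actual code.

```python
def ensemble_spam_probability(ml_probs, dl_probs, dl_weight=0.5):
    """Blend per-family mean spam probabilities (soft voting).

    ml_probs: spam probabilities from the traditional ML models.
    dl_probs: spam probabilities from the deep learning models.
    dl_weight: share of the final score given to the DL family (assumed).
    """
    if not ml_probs or not dl_probs:
        raise ValueError("each model family needs at least one probability")
    if not 0.0 <= dl_weight <= 1.0:
        raise ValueError("dl_weight must be between 0 and 1")
    ml_mean = sum(ml_probs) / len(ml_probs)
    dl_mean = sum(dl_probs) / len(dl_probs)
    return (1.0 - dl_weight) * ml_mean + dl_weight * dl_mean


def classify(ml_probs, dl_probs, threshold=0.5, dl_weight=0.5):
    """Label a message "spam" when the blended score reaches the threshold."""
    score = ensemble_spam_probability(ml_probs, dl_probs, dl_weight)
    return "spam" if score >= threshold else "ham"
```

For example, `classify([0.9, 0.7, 0.8, 0.6], [0.95, 0.85])` blends a 0.75 ML average with a 0.90 DL average and labels the message as spam. Raising `dl_weight` would let the transformer models dominate the decision.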
Special thanks to the contributors of this project:
- Hugging Face for their pre-trained transformer models
- SpamAssassin for foundational spam detection datasets and techniques
Built with ❤️ by a passionate team for email security and AI learning.