Apfirebolt/spam_email_classifier

Spam Email Classifier

Python Scikit-learn pandas PyQt5

This application uses machine learning techniques to classify emails as spam or not spam. It leverages the Scikit-learn library for building the classification model, pandas for data manipulation, and PyQt5 for the graphical user interface.

Screenshots

The basic UI, showing the email form.

Screenshot 1

The following screenshots show emails classified as spam and as not spam.

Screenshot 2 Screenshot 3

Features

  • Train a spam classifier using a dataset of labeled emails
  • Evaluate the performance of the classifier
  • Classify new emails as spam or not spam
  • User-friendly GUI for easy interaction using PyQt5
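The train / evaluate / classify workflow listed above can be sketched with scikit-learn and pandas as follows. This is a minimal, hypothetical example and not the code in `app.py`: the inline DataFrame, its `text` and `label` columns, and the 25% hold-out split are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy dataset; the real app's data file and column names may differ.
df = pd.DataFrame({
    "text": [
        "Win a free prize now",
        "Claim your free money today",
        "Cheap loans, act now",
        "You have won a lottery prize",
        "Meeting moved to 3pm",
        "Can you review my report?",
        "Lunch tomorrow with the team",
        "Project update attached",
    ],
    "label": ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"],
})

# Train: hold out a quarter of the emails for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.25, stratify=df["label"], random_state=0
)

# Chain token counting and the classifier so raw strings can be passed directly.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Evaluate: mean accuracy on the held-out emails.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Classify a new email.
print(model.predict(["Free prize waiting, claim now"])[0])
```

Wrapping both steps in a pipeline ensures the vocabulary is learned from the training split only, and that new emails are vectorized exactly as the training data was.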

Algorithms used

from sklearn.feature_extraction.text import CountVectorizer

CountVectorizer is a class from the sklearn.feature_extraction.text module in the scikit-learn library. It is used to convert a collection of text documents to a matrix of token counts. This is a crucial step in text preprocessing for machine learning models, as it transforms text data into numerical data that can be used by algorithms. Each row in the resulting matrix represents a document, and each column represents a unique word (token) from the entire corpus. The value in each cell indicates the count of the word in the corresponding document.

from sklearn.naive_bayes import MultinomialNB

MultinomialNB

Multinomial Naive Bayes (MultinomialNB) is a variant of the Naive Bayes algorithm that is particularly suited for classification with discrete features (e.g., word counts for text classification). It assumes that the features follow a multinomial distribution. This classifier is often used for document classification problems, where the frequency of each word is used as a feature for training.

Key characteristics:

  • Suitable for discrete data.
  • Commonly used in text classification and natural language processing (NLP).
  • Assumes that the features are conditionally independent given the class.
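To make the multinomial and conditional-independence assumptions concrete, here is a from-scratch sketch of the decision rule, checked against scikit-learn on a toy count matrix. The scoring uses log P(class) plus, for each word, its count multiplied by log P(word | class). The count matrix and its four-word vocabulary are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count matrix: rows are emails, columns are token counts
# (hypothetical vocabulary: ["free", "prize", "meeting", "report"]).
X = np.array([
    [2, 1, 0, 0],   # spam
    [1, 2, 0, 0],   # spam
    [0, 0, 2, 1],   # ham
    [0, 1, 1, 2],   # ham
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham
alpha = 1.0

def fit_multinomial_nb(X, y, alpha):
    classes = np.unique(y)
    # Prior: fraction of training documents in each class.
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Per-class token totals, smoothed, normalised into a multinomial distribution.
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def joint_log_likelihood(X, log_prior, log_likelihood):
    # Conditional independence: per-word log-probabilities are simply added,
    # each weighted by how often the word occurs in the document.
    return X @ log_likelihood.T + log_prior

classes, log_prior, log_likelihood = fit_multinomial_nb(X, y, alpha)
X_new = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
scores = joint_log_likelihood(X_new, log_prior, log_likelihood)
manual_pred = classes[np.argmax(scores, axis=1)]

clf = MultinomialNB(alpha=alpha).fit(X, y)
print(manual_pred)         # → [1 0]
print(clf.predict(X_new))  # → [1 0]
```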

Parameters:

  • alpha: Additive (Laplace/Lidstone) smoothing parameter (default 1.0; 0 for no smoothing).
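A short sketch of what `alpha` does. Each word's probability within a class is estimated as (count of the word in the class + alpha) / (total word count in the class + alpha × vocabulary size). As a result, a word that never appears in a class still gets a small, nonzero probability instead of zeroing out the whole product. The counts and the three-word vocabulary below are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Token counts for a hypothetical 3-word vocabulary ["free", "prize", "meeting"].
X = np.array([[3, 1, 0],    # spam: "meeting" never appears
              [0, 0, 4]])   # ham
y = np.array(["spam", "ham"])

for alpha in (1.0, 0.1):
    clf = MultinomialNB(alpha=alpha).fit(X, y)
    spam_row = list(clf.classes_).index("spam")
    # Smoothed estimate: (N_ci + alpha) / (N_c + alpha * n_features)
    expected = (X[0] + alpha) / (X[0].sum() + alpha * X.shape[1])
    assert np.allclose(np.exp(clf.feature_log_prob_[spam_row]), expected)
    print(alpha, np.round(expected, 3))
```

Smaller `alpha` values trust the observed counts more, so the unseen word "meeting" receives a smaller share of the spam-class probability mass.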

Methods:

  • fit(X, y): Fit the model according to the given training data.
  • predict(X): Perform classification on an array of test vectors X.
  • predict_proba(X): Return probability estimates for the test vector X.
  • score(X, y): Return the mean accuracy on the given test data and labels.
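The four methods above can be exercised directly on pre-computed count vectors, as in this hypothetical example. The feature matrix and labels are invented for illustration. In the app, the counts would come from `CountVectorizer`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features (columns: "free", "prize", "meeting", "report").
X_train = np.array([[2, 1, 0, 0],
                    [1, 2, 0, 0],
                    [0, 0, 2, 1],
                    [0, 1, 1, 2]])
y_train = np.array(["spam", "spam", "ham", "ham"])

clf = MultinomialNB()        # alpha defaults to 1.0
clf.fit(X_train, y_train)    # fit(X, y)

X_test = np.array([[3, 0, 0, 0],
                   [0, 0, 1, 3]])
y_test = np.array(["spam", "ham"])

print(clf.predict(X_test))          # predict(X): one label per row
proba = clf.predict_proba(X_test)   # predict_proba(X): columns follow clf.classes_
print(clf.classes_, proba.round(3))
print(clf.score(X_test, y_test))    # score(X, y): mean accuracy
```

Note that `predict_proba` orders its columns by `clf.classes_`, which scikit-learn sorts (here `['ham' 'spam']`). The spam probability is therefore the second column, not necessarily the first.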

Installation

To install the required dependencies, run:

pip install -r requirements.txt

Usage

To start the application, run:

python app.py

License

This project is licensed under the MIT License.

About

An email classifier using CountVectorizer and a Multinomial Naive Bayes model, with a PyQt5 GUI.
