An end-to-end machine learning pipeline that predicts customer churn from a Kaggle telecom dataset. The project covers data loading, cleaning, preprocessing, feature encoding, and model training with scikit-learn, including hyperparameter tuning with RandomizedSearchCV. Evaluation uses classification metrics and the ROC AUC score.
- Source: Customer Churn Analysis Dataset
- The dataset contains customer information such as contract type, tenure, payment method, and service usage.
- The target variable is `Churn` (Yes/No), indicating whether a customer left the company.
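As a rough illustration of the loading and cleaning steps, the sketch below builds a tiny stand-in table instead of reading the Kaggle CSV. The column names (`tenure`, `Contract`, `PaymentMethod`, `TotalCharges`) and the hypothetical `clean` helper are assumptions made for this example; only the `Churn` target with Yes/No values comes from the dataset description above.

```python
import pandas as pd

# Tiny stand-in for the Kaggle CSV. Column names other than "Churn"
# are assumptions; adjust them to match the real file.
raw = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "PaymentMethod": ["Electronic check", "Mailed check",
                      "Electronic check", "Bank transfer"],
    "TotalCharges": ["29.85", " ", "108.15", "1840.75"],  # note the blank entry
    "Churn": ["No", "No", "Yes", "No"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper: fix types, drop missing rows, encode target."""
    df = df.copy()
    # Text that cannot be parsed as a number becomes NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Drop rows where the numeric conversion produced a missing value.
    df = df.dropna(subset=["TotalCharges"])
    # Map the Yes/No target to 1/0 for scikit-learn.
    df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned.dtypes)
```

With the real data, `raw` would come from `pd.read_csv` on the downloaded file, and dropping rows is only one option; imputing missing values may be preferable if many rows are affected.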
- Load and explore real-world-like telecom data
- Clean missing values and convert data types
- Encode categorical variables using one-hot encoding
- Scale numeric features with StandardScaler
- Use a Random Forest classifier
- Tune hyperparameters with RandomizedSearchCV
- Evaluate performance using a classification report and ROC AUC
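The steps listed above can be wired together roughly as follows. This is a minimal sketch, not the project's actual notebook: it uses synthetic data in place of the Kaggle file, and the column names, search space, and split sizes are all assumptions chosen to keep the example small and fast.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are assumptions, not the real schema.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "tenure": rng.integers(1, 72, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "PaymentMethod": rng.choice(["Electronic check", "Mailed check"], size=n),
})
# Synthetic target: short-tenure, month-to-month customers churn more often.
risky = (X["Contract"] == "Month-to-month") & (X["tenure"] < 24)
y = (rng.uniform(size=n) < np.where(risky, 0.65, 0.15)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

numeric_cols = ["tenure", "MonthlyCharges"]
categorical_cols = ["Contract", "PaymentMethod"]

# Scale numeric features and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=0)),
])

# Illustrative search space; the "model__" prefix targets the pipeline step.
param_distributions = {
    "model__n_estimators": [50, 100, 200],
    "model__max_depth": [None, 5, 10],
    "model__min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=5, cv=3,
    scoring="roc_auc", random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the best pipeline on the held-out split.
pred = search.predict(X_test)
proba = search.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(classification_report(y_test, pred))
print("ROC AUC:", round(auc, 3))
print("Best params:", search.best_params_)
```

Keeping preprocessing inside the `Pipeline` means the scaler and encoder are refit on each cross-validation fold, so no information from the validation folds leaks into training during the search.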
- Python 3
- pandas
- scikit-learn
- numpy
- Jupyter / Google Colab