A complete end-to-end machine learning regression project using the California Housing Dataset, focusing on predicting median house values based on multiple features. This project was developed as a task for the Machine Learning Internship at Arch Technologies.
This project demonstrates:
- Data preprocessing & visualization
- Building regression models:
  - Linear Regression
  - Decision Tree Regressor
  - Random Forest Regressor
- Cross-validation evaluation
- Hyperparameter tuning using:
  - GridSearchCV
  - RandomizedSearchCV
- Final model testing and RMSE evaluation
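
The tuning and final-testing steps listed above could look roughly like the sketch below. It is an illustration, not the project's actual code: the parameter grid, the `randint` ranges, the 5-fold setting, and the helper names `grid_search_forest`, `randomized_search_forest`, and `final_rmse` are all assumptions chosen to keep the example small.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def grid_search_forest(X_train, y_train, cv=5):
    """Exhaustively try every combination in a small, hypothetical grid."""
    param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}  # assumed values
    search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid,
        scoring="neg_mean_squared_error",  # scikit-learn maximizes, so MSE is negated
        cv=cv,
    )
    return search.fit(X_train, y_train)


def randomized_search_forest(X_train, y_train, n_iter=5, cv=5):
    """Sample n_iter random combinations from hypothetical integer ranges."""
    param_distributions = {
        "n_estimators": randint(10, 50),  # integers 10..49 (assumed range)
        "max_features": randint(1, 5),    # integers 1..4 (assumed range)
    }
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=42),
        param_distributions,
        n_iter=n_iter,
        scoring="neg_mean_squared_error",
        cv=cv,
        random_state=42,
    )
    return search.fit(X_train, y_train)


def final_rmse(search, X_test, y_test):
    """RMSE of the best estimator found by a search, on held-out test data."""
    predictions = search.best_estimator_.predict(X_test)
    return float(np.sqrt(mean_squared_error(y_test, predictions)))
```

With a train/test split already made (for example via `train_test_split`), `final_rmse(grid_search_forest(X_train, y_train), X_test, y_test)` would give the test-set RMSE. The test set should be touched only once, after tuning is finished.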
- Source: Built-in California housing dataset from `sklearn.datasets`
- Features:
  - MedInc: Median income in block group
  - HouseAge: Median house age
  - AveRooms: Average number of rooms
  - AveBedrms: Average number of bedrooms
  - Population: Block group population
  - AveOccup: Average number of household members
  - Latitude & Longitude
- Target:
  - MedHouseVal: Median house value
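
As a minimal sketch of how the dataset described above can be loaded, the snippet below uses `fetch_california_housing` from `sklearn.datasets`. The helper names `load_housing_frame` and `split_features_target` and the `FEATURES`/`TARGET` constants are hypothetical, introduced only for this illustration.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing

# Column names as listed in this README.
FEATURES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]
TARGET = "MedHouseVal"


def load_housing_frame() -> pd.DataFrame:
    """Fetch the dataset as one DataFrame (downloaded and cached on first call)."""
    bunch = fetch_california_housing(as_frame=True)
    return bunch.frame  # feature columns plus the MedHouseVal target column


def split_features_target(df: pd.DataFrame):
    """Separate the feature matrix X from the target vector y."""
    return df[FEATURES], df[TARGET]


# Example usage (requires network access the first time):
# df = load_housing_frame()
# X, y = split_features_target(df)
```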
- Python 3.x
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- SciPy
- Visualized distributions of features using histograms
- Heatmap correlation matrix to identify feature relationships
- Scatter plot of `MedInc` against `MedHouseVal`
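
The three plots above might be produced along these lines. This is a sketch under assumptions: the function name `plot_eda`, the bin count, figure sizes, and color map are illustrative choices, and `df` is assumed to be a DataFrame containing the feature columns and `MedHouseVal`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_eda(df: pd.DataFrame):
    """Draw the three exploratory plots and return their figures."""
    # 1. One histogram per numeric column to inspect the distributions.
    axes = df.hist(bins=50, figsize=(12, 8))
    hist_fig = axes.ravel()[0].get_figure()

    # 2. Heatmap of pairwise Pearson correlations between all columns.
    heat_fig, heat_ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm", ax=heat_ax)

    # 3. Median income against median house value.
    scatter_fig, scatter_ax = plt.subplots(figsize=(6, 5))
    scatter_ax.scatter(df["MedInc"], df["MedHouseVal"], s=4, alpha=0.2)
    scatter_ax.set_xlabel("MedInc")
    scatter_ax.set_ylabel("MedHouseVal")

    return hist_fig, heat_fig, scatter_fig
```

Calling `plt.show()` after `plot_eda(df)` displays the figures in an interactive session; each returned figure can also be written to disk with `savefig`.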
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
- Root Mean Squared Error (RMSE) using 10-fold Cross-Validation
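
One way to compute the 10-fold cross-validated RMSE for the three models is sketched below. The helper names `cv_rmse` and `compare_models` and the `random_state` value are assumptions for illustration; the project's own code may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def cv_rmse(model, X, y, folds: int = 10) -> np.ndarray:
    """Return one RMSE value per cross-validation fold."""
    # scikit-learn scorers are "higher is better", so MSE comes back negated;
    # flip the sign before taking the square root.
    neg_mse = cross_val_score(
        model, X, y, scoring="neg_mean_squared_error", cv=folds
    )
    return np.sqrt(-neg_mse)


def compare_models(X, y, folds: int = 10, random_state: int = 42) -> dict:
    """Mean cross-validated RMSE for each of the three regressors."""
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(random_state=random_state),
    }
    return {
        name: float(cv_rmse(model, X, y, folds).mean())
        for name, model in models.items()
    }
```

Because each fold is scored on data the model did not see during fitting, the mean RMSE is a fairer basis for comparing models than the training error, which a decision tree can drive to nearly zero by overfitting.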
Abdul Rafay
BS Software Engineering | 🎯 AI & ML Enthusiast
LinkedIn
This repository is licensed under the MIT License.
If you found this helpful:
- ⭐ Star the repo
- 🍴 Fork it and contribute
- 📢 Share on LinkedIn and tag me!
Accurate detection. 🎯 Precise segmentation. Built with passion.