Skip to content

An end-to-end machine learning project that predicts automotive parts shortages for a major automotive manufacturing and distribution company. This project showcases data preprocessing, exploratory analysis, feature engineering, and robust predictive modeling (including XGBoost) to drive actionable insights under a strict NDA.

Notifications You must be signed in to change notification settings

toofanCodes/predictingStockouts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Repository Name: AutoParts-ML-Insights


ML Analysis for Automotive Manufacturing & Parts Distribution

This repository contains a Jupyter Notebook that demonstrates an end-to-end machine learning analysis for a leading multi-billion dollar automotive manufacturing and parts distribution company. Due to confidentiality agreements, specific client details have been anonymized.

Table of Contents

Overview

This project provides a comprehensive machine learning workflow applied to automotive manufacturing and parts distribution data. The analysis includes data cleaning, exploratory data analysis (EDA), feature engineering, model training, and evaluation. The insights generated are aimed at driving informed decision-making in operations and business strategies.

Project Details

  • Domain: Automotive Manufacturing & Parts Distribution
  • Client: A multi-billion dollar company (name confidential under NDA)
  • Scope: The analysis covers data preprocessing, in-depth exploratory data analysis, model building using advanced ML algorithms, and thorough performance evaluation.
  • Objective: To uncover actionable insights and predictive patterns to optimize operations and mitigate parts shortages.

Iterative Approach & Lessons Learned

The project followed a highly iterative and experimental approach:

  • KNN Imputation:

    • Goal: To fill missing values.
    • Outcome: Although conceptually sound, this approach was computationally intensive and ultimately dropped.
  • Decision Tree Analysis:

    • Goal: To understand relationships between key entities (PDCs and Desks) and evaluate baseline performance.
    • Outcome: Revealed differences in performance across subsets and low recall in some categories.
  • Class Weighting in Decision Trees:

    • Goal: To improve recall by penalizing misclassifications.
    • Outcome: Did not yield the desired improvements during training.
  • Clustering and Segmented Modeling:

    • Goal: To identify distinct subsets and apply tailored models (Decision Trees, Random Forests) for enhanced accuracy and precision.
    • Outcome: Although promising, this approach did not produce significantly better results.
  • XGBoost Implementation:

    • Goal: To build a robust model with superior predictive performance.
    • Outcome: Delivered improved results. A comprehensive hold-out analysis was performed across various scenarios (single/multiple Desks and PDCs, skewed subsets) to ensure metrics were robust and consistent.

These experiments not only refined the model selection process but also deepened the understanding of data nuances within a high-stakes, industrial context.

Features

  • Data Preprocessing:
    Comprehensive cleaning and transformation of raw data, including handling missing values and checking for imbalances.

  • Exploratory Data Analysis (EDA):
    In-depth statistical analysis and visualizations to uncover data distributions, trends, and correlations.

  • Feature Engineering:
    Development and selection of impactful features to capture underlying data patterns.

  • Model Training & Evaluation:
    Iterative experimentation with multiple models—ranging from Decision Trees to XGBoost—along with rigorous hold-out testing.

  • Robustness Checks:
    Detailed hold-out analysis across various scenarios ensuring consistency in model performance.

  • Visualization:
    Use of plots and graphs to effectively communicate insights and model evaluations.

Getting Started

Prerequisites

  • Python 3.7+
  • Jupyter Notebook or JupyterLab

Installation

  1. Clone the repository:

    git clone https://github.com/toofanCodes/AutoParts-ML-Insights.git
    cd AutoParts-ML-Insights
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate   # On Windows: venv\Scripts\activate
  3. Install the required packages:

    pip install -r requirements.txt

    Note: If a requirements.txt file is not provided, install the necessary libraries manually (e.g., pandas, numpy, scikit-learn, matplotlib, seaborn).

Notebook Structure

  • Data Loading & Preprocessing:
    Steps to import data, clean it, and handle missing values.

  • Exploratory Data Analysis (EDA):
    Visual and statistical exploration of data to understand distributions and relationships.

  • Feature Engineering:
    Techniques to generate and select features crucial for the analysis.

  • Model Development:
    Iterative development of models—starting with Decision Trees, progressing through class weighting and clustering, and ultimately implementing XGBoost.

  • Model Evaluation:
    Robust evaluation of model performance through multiple hold-out tests and analysis across different data subsets.

Environment and Dependencies

The analysis was conducted using Python with the following libraries:

  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn
  • Jupyter Notebook

Ensure your environment has these libraries to replicate the analysis.

Confidentiality Notice

This project was developed under a Non-Disclosure Agreement (NDA) for a high-profile client in the automotive sector. As a result, specific details such as the client’s name and certain sensitive data have been redacted or anonymized. All analysis presented here is for demonstration purposes only.

License

This project is distributed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or further information, please contact:

About

An end-to-end machine learning project that predicts automotive parts shortages for a major automotive manufacturing and distribution company. This project showcases data preprocessing, exploratory analysis, feature engineering, and robust predictive modeling (including XGBoost) to drive actionable insights under a strict NDA.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published