krishnamami/Distributed_ML_Sagemaker_Pipelines

Distributed_ML_Sagemaker_Pipelines

An end-to-end machine learning pipeline built on AWS SageMaker Pipelines, designed to support parallel model development and batch scoring on distributed, containerized infrastructure.

Overview

This project demonstrates the use of SageMaker Pipelines to operationalize a machine learning workflow that includes:

  • Feature engineering
  • Model training with XGBoost
  • Model evaluation against an MSE threshold
  • Conditional model registration
  • Offline batch scoring using SageMaker Batch Transform
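The evaluation and conditional-registration steps in the list above come down to computing a mean squared error on held-out data and comparing it with a threshold. The sketch below shows that logic in plain Python, without the SageMaker SDK. The function names, sample values, and `MSE_THRESHOLD` are assumptions for illustration and are not taken from the repository.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def should_register(mse, threshold):
    """Gate used before registration: True only when MSE is strictly below the threshold."""
    return mse < threshold


# Hypothetical validation targets and predictions (illustration only).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

MSE_THRESHOLD = 6.0  # assumed value; the real threshold is a pipeline parameter

mse = mean_squared_error(y_true, y_pred)
decision = should_register(mse, MSE_THRESHOLD)
```

In the actual pipeline, this comparison would typically be expressed as a condition step that reads the MSE from the evaluation report, so the model is registered only when the check passes.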

Ideal for MLOps teams looking to streamline experimentation, ensure consistency in deployment workflows, and scale processing across compute instances.


Architecture

image

  • Parameters:

image

⚙️ Pipeline Stages

Stage             Description
Processing        Executes preprocessing.py to clean and split data
Training          Trains XGBoost model on training set
Evaluation        Evaluates model against validation set using MSE
Register Model    Saves model if MSE < threshold
Batch Transform   Scores batch data using newly trained model
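The stages above form a small dependency graph: each stage consumes the output of an earlier one. As a rough sketch, the ordering can be modeled with the standard library's `graphlib`. The dependency edges below are assumptions inferred from the stage descriptions, in particular that Batch Transform depends only on Training. They are not read from `sage_maker_pipeline.py`.

```python
from graphlib import TopologicalSorter

# Each key maps a stage to the set of stages it depends on (assumed edges).
STAGE_DEPENDENCIES = {
    "processing": set(),
    "training": {"processing"},
    "evaluation": {"processing", "training"},
    "register_model": {"evaluation"},   # runs only if MSE < threshold
    "batch_transform": {"training"},    # needs the newly trained model
}


def execution_order(dependencies):
    """Return one valid execution order in which every stage follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STAGE_DEPENDENCIES)
print(" -> ".join(order))
```

Stages with no dependency between them, such as Register Model and Batch Transform in this sketch, may appear in either order, which is also what allows an orchestrator to run them concurrently.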

Takeaways:

Building on what we learned from this experiment, we implemented parallel model development and scoring pipelines for four models in production, covering both Purchase and Refinance scenarios.
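One way to picture running several model pipelines in parallel is a small fan-out over model names. The sketch below uses a thread pool and a stand-in `run_pipeline` function. The model names, the two-per-scenario split, and the returned status dictionary are all hypothetical; a real launcher would start a pipeline execution per model and wait for it to finish.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model identifiers; the source only states four models
# covering Purchase and Refinance scenarios.
MODEL_NAMES = [
    "purchase_model_1",
    "purchase_model_2",
    "refinance_model_1",
    "refinance_model_2",
]


def run_pipeline(model_name):
    """Stand-in for starting one pipeline execution and waiting for its result."""
    return {"model": model_name, "status": "Succeeded"}


def run_all(model_names, max_workers=4):
    """Run one pipeline per model concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_pipeline, model_names))


results = run_all(MODEL_NAMES)
```

Threads suit this pattern because each worker mostly waits on a remote job rather than doing local computation.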

image

▶️ How to Run

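--> Clone the repo: git clone https://github.com/krishnamami/Distributed_ML_Sagemaker_Pipelines.git

--> Change into the repo directory: cd Distributed_ML_Sagemaker_Pipelines

--> Install dependencies: pip install -r requirements.txt

--> Run the pipeline: python sage_maker_pipeline.py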

Related Projects

Fine_Tuning_LLM

Markov_Chain_Attribution

Multi Agent Anamoly Detection

Author

Krishna Goud

Head of Data Engineering & MLOps | Rocket LA | LinkedIn

About

Scalable SageMaker pipeline reducing model training time by 40% for enterprise ML.
