This project focuses on hyperparameter optimization for a CatDog classification model, leveraging Aim, MLflow, DVC, CML, Hydra, and Optuna to streamline experiment tracking, model tuning, and reporting with modern ML tools. The experiments in this repository run on a sample dataset of 100 images per class taken from the Chest X-Ray Pneumonia dataset.
## Aim

Aim is an open-source tool designed for experiment tracking and visualization. It provides fast and efficient tracking for large-scale experiments, with an interactive web-based dashboard to compare runs.
- Lightweight and scalable experiment tracking.
- Rich visualizations for metrics and hyperparameter tuning.
- Easy to integrate with frameworks like PyTorch and TensorFlow.
- Start the Aim dashboard:

  ```bash
  aim up
  ```

- Run an experiment and track metrics:

  ```bash
  aim run python src/train.py experiment=<experiment_name>
  ```
## MLflow

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experimentation, model packaging, and deployment.
- Logs parameters, metrics, and artifacts.
- Provides a model registry to store and version models.
- Powerful UI for visualizing and comparing multiple runs.
- Start the MLflow UI:

  ```bash
  mlflow ui
  ```

- Run an experiment and log metrics:

  ```bash
  mlflow run . --experiment-name "<experiment_name>"
  ```
## DVC

DVC (Data Version Control) is a version control system tailored for machine learning projects, handling datasets and models.
- Manages datasets and models with Git-like version control.
- Integrates with cloud storage like Google Drive, AWS S3, and Azure.
- Ensures reproducibility across experiments by tracking datasets and models.
- Install dependencies:

  ```bash
  UV_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu uv sync
  ```

- Pull the dataset from DVC remote storage:

  ```bash
  uv run dvc pull
  ```
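Reproducibility comes from declaring pipeline stages in a `dvc.yaml` file, so `dvc repro` can rebuild exactly what changed. A hypothetical stage definition — the stage name, dependency paths, and output directory below are illustrative, not taken from this repository:

```yaml
stages:
  train:
    cmd: uv run python src/train.py experiment=baseline
    deps:
      - data/          # dataset files tracked by DVC
      - src/train.py
    outs:
      - checkpoints/   # model artifacts, versioned by DVC
```

With such a stage in place, `dvc repro` re-runs training only when the data or code dependencies change, and `dvc push`/`dvc pull` sync the versioned outputs with remote storage.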
## CML

CML (Continuous Machine Learning) integrates machine learning workflows into GitHub Actions or GitLab CI, automating the entire pipeline.
- Automates ML pipelines within CI/CD systems.
- Generates reports for each experiment.
- Supports model evaluation and metrics reporting directly in GitHub/GitLab.
- Create reports and post results automatically during CI runs:

  ```bash
  uv run python scripts/generate_results.py
  ```
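In CI, a workflow step typically runs the report script and posts its output back to the pull request. A hypothetical GitHub Actions sketch — the workflow name, trigger, and `report.md` filename are assumptions, not taken from this repository:

```yaml
name: train-and-report
on: [push]
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: iterative/setup-cml@v2
      - name: Generate and post report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          uv run python scripts/generate_results.py
          cml comment create report.md
```

`cml comment create` turns the generated Markdown into a comment on the triggering commit or PR, so metrics and plots appear directly in the review UI.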
## Hydra

Hydra is a framework for managing complex configurations in machine learning experiments. It simplifies running multiple configurations and allows you to easily override settings on the command line.
- Dynamic configuration management for complex projects.
- Supports multirun for running multiple experiments simultaneously.
- Easily override configuration values from the command line.
- Run a training script with configuration overrides:

  ```bash
  python src/train.py experiment=<experiment_name> +trainer.fast_dev_run=True
  ```

- Run multiple experiments with different configurations:

  ```bash
  python src/train.py --multirun experiment=<experiment_name> model.embed_dim=16,32,64 +trainer.log_every_n_steps=5
  ```
## Optuna

Optuna is an open-source hyperparameter optimization framework designed to efficiently search hyperparameter spaces for the best configurations.
- Efficient hyperparameter search using techniques like Bayesian optimization.
- Parallelization for faster search.
- Visualizations to help analyze search results.
- Run a hyperparameter search with Optuna:

  ```bash
  python src/train.py --multirun hparam=catdog_vit_hparam +trainer.log_every_n_steps=5 hydra.sweeper.n_jobs=4
  ```
### Best Hyperparameters

| # | Parameter | Value |
|---|---|---|
| 0 | name | optuna |
| 1 | model.drop_path_rate | 0.2089407556793585 |
| 2 | model.head_init_scale | 0.9292090024255693 |
| 3 | model.lr | 0.00023458293902856107 |
| 4 | model.weight_decay | 0.0005513192559352005 |
| 5 | data.batch_size | 64 |
| 6 | augmentation.horizontal_flip | True |
| 7 | augmentation.random_crop | True |
| 8 | augmentation.random_rotation | True |
| 9 | regularization.gradient_clipping | 0.15371010694861154 |
| 10 | best_value | 1.0 |
### Trial Results

| # | base_model | num_classes | pretrained | lr | optimizer | weight_decay | scheduler_patience | scheduler_factor | drop_path_rate | head_init_scale | horizontal_flip | random_crop | random_rotation | val_acc | val_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | convnext_nano | 2 | False | 0.0003 | AdamW | 0.0010 | 3 | 0.1000 | 0.2257 | 0.6574 | True | True | True | 0.8000 | 0.5406 |
| 1 | convnext_nano | 2 | False | 0.0002 | AdamW | 0.0006 | 3 | 0.1000 | 0.2089 | 0.9292 | True | True | True | 0.6000 | 0.5929 |
| 2 | convnext_nano | 2 | False | 0.0005 | AdamW | 0.0007 | 3 | 0.1000 | 0.0723 | 1.0152 | True | True | True | 0.7000 | 0.6401 |
| 3 | convnext_nano | 2 | False | 0.0004 | AdamW | 0.0000 | 3 | 0.1000 | 0.2775 | 1.7625 | True | True | True | 0.8000 | 0.5590 |
| 4 | convnext_nano | 2 | False | 0.0002 | AdamW | 0.0002 | 3 | 0.1000 | 0.1194 | 1.6070 | True | True | True | 0.8000 | 0.4880 |
| 5 | convnext_nano | 2 | False | 0.0003 | AdamW | 0.0009 | 3 | 0.1000 | 0.2148 | 0.5226 | True | True | True | 0.8000 | 0.4855 |
| 6 | convnext_nano | 2 | False | 0.0009 | AdamW | 0.0005 | 3 | 0.1000 | 0.1251 | 1.5220 | True | True | True | 0.7000 | 0.6624 |
| 7 | convnext_nano | 2 | False | 0.0009 | AdamW | 0.0003 | 3 | 0.1000 | 0.0952 | 1.1222 | True | True | True | 0.6000 | 0.7140 |
| 8 | convnext_nano | 2 | False | 0.0001 | AdamW | 0.0004 | 3 | 0.1000 | 0.0881 | 1.4465 | True | True | True | 0.7000 | 0.5756 |
| 9 | convnext_nano | 2 | False | 0.0006 | AdamW | 0.0007 | 3 | 0.1000 | 0.0287 | 1.8280 | True | True | True | 0.7000 | 0.4451 |