This is a simple data cleansing project using MySQL and Python. It is based on the video at https://www.youtube.com/watch?v=4UltKCnnnTA&t=168s.
The goal is to load raw layoffs data into a database, apply SQL-based transformations, and produce a cleaned dataset.
- Start the environment/database
docker compose up -d
This starts a MySQL container and runs init.sql to create the database schema. You can connect to the database using your local MySQL Workbench or through the Adminer UI served from Docker (on localhost).
- Load raw data
python3 src/app.py
This script uses pandas to load the raw data in batches into the MySQL database.
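The batching idea can be sketched as follows. This is a minimal illustration, not the contents of src/app.py: it uses the standard-library `csv` and `sqlite3` modules in place of pandas and MySQL so that it runs without extra dependencies, and the table name `layoffs_raw`, the column names, and the sample rows are all hypothetical. With pandas, the same pattern is usually written with `read_csv(..., chunksize=...)` and `DataFrame.to_sql(..., if_exists="append")`.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_in_batches(rows, conn, batch_size=1000):
    """Insert an iterator of row dicts into layoffs_raw, one batch at a time."""
    # Hypothetical schema; the real one is created by init.sql.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS layoffs_raw "
        "(company TEXT, industry TEXT, total_laid_off TEXT)"
    )
    inserted = 0
    while True:
        # Pull at most batch_size rows so the whole file never sits in memory.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO layoffs_raw (company, industry, total_laid_off) "
            "VALUES (:company, :industry, :total_laid_off)",
            batch,
        )
        conn.commit()  # commit per batch so a failure loses one batch at most
        inserted += len(batch)
    return inserted


# Stand-in for data/layoffs.csv; columns and values are made up.
SAMPLE_CSV = (
    "company,industry,total_laid_off\n"
    "Acme,Retail,120\n"
    "Globex,Finance,45\n"
    "Initech,Tech,\n"
)

conn = sqlite3.connect(":memory:")
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
total = load_in_batches(reader, conn, batch_size=2)
print(total)
```

Committing after each batch keeps memory use flat and limits how much work is lost if a later batch fails.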
- Clean the data
src/data_cleansing.sql
These queries demonstrate the cleansing process.
.
├── data                   # raw data
│   └── layoffs.csv
├── db-scripts             # initialization SQL scripts (executed on container startup)
│   └── init.sql
├── docker-compose.yml     # Docker setup for MySQL
├── README.md
└── src
    ├── app.py             # Python script to load raw data into MySQL
    ├── data_cleansing.sql # SQL transformations for data cleansing
    └── utils              # Python helpers
        └── path.py
- Data Ingestion: Python + pandas loads CSV/raw files into MySQL.
- Data Cleansing: SQL queries apply transformations, handle duplicates, and standardize values.
- Outcome: A clean dataset ready for further analysis or reporting.
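To make the cleansing stage concrete, here is a small sketch of standardizing values and removing duplicates with plain SQL. It is not the project's actual cleansing script: it runs against an in-memory SQLite database so it is self-contained, and the table names (`layoffs_raw`, `layoffs_staging`, `layoffs_clean`), columns, and sample rows are assumptions made for illustration. The statements used (`TRIM`, `LIKE`, `SELECT DISTINCT`) are also available in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table with a few deliberately messy rows.
conn.execute(
    "CREATE TABLE layoffs_raw (company TEXT, industry TEXT, total_laid_off INTEGER)"
)
conn.executemany(
    "INSERT INTO layoffs_raw VALUES (?, ?, ?)",
    [
        ("  Acme ", "Retail", 120),         # stray whitespace
        ("Acme", "Retail", 120),            # duplicate once trimmed
        ("Globex", "Crypto Currency", 45),  # inconsistent industry label
        ("Initech", "Crypto", 30),
    ],
)

# Work on a staging copy so the raw data stays untouched.
conn.execute("CREATE TABLE layoffs_staging AS SELECT * FROM layoffs_raw")

# 1. Standardize values.
conn.execute("UPDATE layoffs_staging SET company = TRIM(company)")
conn.execute(
    "UPDATE layoffs_staging SET industry = 'Crypto' WHERE industry LIKE 'Crypto%'"
)

# 2. Remove exact duplicates by keeping only distinct rows.
conn.execute(
    "CREATE TABLE layoffs_clean AS "
    "SELECT DISTINCT company, industry, total_laid_off FROM layoffs_staging"
)

cleaned = conn.execute(
    "SELECT company, industry, total_laid_off FROM layoffs_clean ORDER BY company"
).fetchall()
print(cleaned)
```

Standardizing before deduplicating matters here: the two Acme rows only become identical after the whitespace is trimmed.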