🦠 COVID-19 Data Pipeline End To End with Microsoft Azure

This project is a fully cloud-based data pipeline built using Microsoft's Azure ecosystem. It ingests COVID-19 data from Our World in Data, processes it through various layers of transformation, and stores the curated data in an Azure SQL Database for visualization in Power BI.

🧭 Architecture Overview

🧪 Pipeline Overview

Python API Our World in Data
          ↓

Azure Data Lake Gen2 (parquet)
          ↓

Synapse Serverless - VIEW bronze.covid19data
          ↓

Synapse Serverless - VIEW silver.tables (SQL transformations)
          ↓

Azure Data Factory - Loop + Copy Activity
          ↓

Azure SQL Database (gold.tables)
          ↓

Power BI - Direct connection to Azure SQL Database

🔄 Azure Data Factory Pipeline (ETL Flow)

Below is a screenshot of the ADF pipeline that orchestrates the data movement and transformation process:

This pipeline performs the following steps:

Lookup Views: Reads metadata about views to be processed.
Set Variable: Dynamically assigns view names.
ForEach Loop: Iterates over each view and runs the copy activity.
Copy Activity: Transfers data from Silver Layer to Gold Layer in Azure SQL.

✅ All activities are executed dynamically and succeeded during testing.

🔍 Why I Built This Project

The main motivation for this project was to gain hands-on experience building a real-world, end-to-end data pipeline in the cloud. I wanted to simulate an actual enterprise-grade data flow using modern tools, with special attention to performance, cost-efficiency, and maintainability.

COVID-19 data was chosen because it's publicly available, updated daily, and offers a rich variety of metrics across different countries and dates—perfect for testing time-series transformations, incremental loads, and business intelligence use cases.

📊 Why Our World in Data?

Our World in Data offers a comprehensive, reliable, and well-documented API for COVID-19 data. It's maintained by a reputable academic institution and includes detailed information about cases, deaths, testing, vaccinations, and more, covering all countries globally. This makes it an ideal data source for building a globally applicable analytics solution.

🛠️ Why Microsoft Azure?

I chose the Microsoft Azure ecosystem for the following reasons:

Hands-on Experience: Working with Azure services such as Synapse, Data Factory, and Data Lake allowed me to practice skills in high demand in the industry.
Cost-Effectiveness: Services like Synapse Serverless and Azure SQL Database offer great performance at a low cost, making them suitable for individual projects.
End-to-End Solution: Azure offers all components needed to build a complete pipeline—from ingestion and storage to transformation and visualization—within the same environment.
Real-World Relevance: The design and tools mirror those used in professional data engineering teams, making the project experience as close as possible to enterprise use cases.

🚀 Technologies Used

Python – For extracting data from the API and writing to Azure Data Lake Gen2
Azure Data Lake Gen2 – Raw and curated data storage in Parquet format
Azure Synapse Analytics (Serverless) – SQL views for bronze and silver layer transformations
Azure Data Factory – Orchestrating data movement and copy operations
Azure SQL Database – Gold layer tables for serving data to Power BI
Power BI – Reporting and data visualization

📁 Project Structure

.
├── er_diagram/
├── powerbi/
├── sql/
│   ├── Bronze/bronze_covid19data.sql
│   ├── Silver/
│	      ├── Dimensions/
│	      ├── Facts/
│   └── Gold/
│	      ├── Dimensions/
│	      ├── Facts/
├── src/
│    ├── azure_upload.py
├── main.py
└── README.md

📈 Dashboard

Power BI connects directly to the Azure SQL Database and visualizes trends such as annual cases, deaths, and vaccination progress in Canada over time.

🧠 Key Metrics Presented

Total Confirmed Cases & Deaths

Vaccination Progress (%)

Annual Case Trends

Covid-19 Tests

Hospitalization and ICU Rates Over Time

The interactive dashboard allows the user to filter and drill down into data per year, month, or date group, which facilitates deeper insights into the evolution of the pandemic in Canada.

🧱 Data Model Overview

To support the dashboard visuals, a dimensional data model was designed using the star schema approach. This consists of fact and dimension tables linked by primary and foreign keys, enabling optimized queries and performance in Power BI.

✅ Conclusion and Learnings

This project helped reinforce important concepts in data modeling, ETL, and interactive dashboard creation. By connecting Power BI to an Azure SQL Database, the entire flow from raw data ingestion to actionable insights was demonstrated.

Potential future improvements include:

Adding international comparisons (e.g., USA, BR).

Automating data refreshes via Azure Data Factory.

🏆 This end-to-end solution is a practical example of using modern cloud-based BI tools to deal with real-world public health challenges.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
er_diagram		er_diagram
power_bi		power_bi
sql		sql
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🦠 COVID-19 Data Pipeline End To End with Microsoft Azure

🧭 Architecture Overview

🧪 Pipeline Overview

🔄 Azure Data Factory Pipeline (ETL Flow)

🔍 Why I Built This Project

📊 Why Our World in Data?

🛠️ Why Microsoft Azure?

🚀 Technologies Used

📁 Project Structure

📈 Dashboard

🧠 Key Metrics Presented

🧱 Data Model Overview

✅ Conclusion and Learnings

About

Uh oh!

Releases

Packages

Languages

License

augusto-rosa/covid19-data-pipeline

Folders and files

Latest commit

History

Repository files navigation

🦠 COVID-19 Data Pipeline End To End with Microsoft Azure

🧭 Architecture Overview

🧪 Pipeline Overview

🔄 Azure Data Factory Pipeline (ETL Flow)

🔍 Why I Built This Project

📊 Why Our World in Data?

🛠️ Why Microsoft Azure?

🚀 Technologies Used

📁 Project Structure

📈 Dashboard

🧠 Key Metrics Presented

🧱 Data Model Overview

✅ Conclusion and Learnings

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages