Data Engineering Zoomcamp 2025 Project

Overview

This repository hosts the source code and documentation for the Data Engineering Zoomcamp 2025 project.

The README content is also accessible at https://github.com/urbanclimatefr/de-zoomcamp-2025-project


Goal

The objective of this project is to develop an end-to-end batch data pipeline covering ingestion, processing, transformation, persistence, and visualization. The pipeline uses data from the Hong Kong Observatory and makes hourly temperature readings available to users through a Looker Studio report.


Data Source

The data source is the Hong Kong Observatory's API Web Service, which offers various APIs for collecting real-time weather data.
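
As a rough illustration of what reading from this service can look like, the sketch below downloads one report and flattens its temperature readings into one row per station. The endpoint URL, the `dataType=rhrread` parameter, the assumed payload shape (a `temperature` section with `recordTime` and a `data` list of `place`, `value`, `unit`), and the function names `fetch_report` and `temperature_rows` are assumptions made for illustration, not taken from this project's code. Check them against the Observatory's API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the Hong Kong Observatory API documentation.
HKO_URL = "https://data.weather.gov.hk/weatherAPI/opendata/weather.php"


def fetch_report(data_type="rhrread", lang="en", timeout=30):
    """Download one JSON report (performs a network call)."""
    query = urlencode({"dataType": data_type, "lang": lang})
    with urlopen(f"{HKO_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def temperature_rows(payload):
    """Flatten the assumed `temperature` section into one dict per station."""
    section = payload.get("temperature", {})
    record_time = section.get("recordTime")
    return [
        {
            "station": item["place"],
            "temperature": float(item["value"]),
            "unit": item.get("unit", "C"),
            "record_time": record_time,
        }
        for item in section.get("data", [])
    ]
```

Keeping the download separate from the flattening step means the parsing logic can be exercised with a saved sample payload, without calling the live service.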


Data Collection

Rather than using pre-existing historical data, this project gathers data hourly and runs a batch processing pipeline once a day to handle the collected data. The processed data is then displayed in dashboards created using Google Cloud's Looker Studio.
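
The hourly-collection, daily-processing split can be sketched as a grouping step: hourly readings accumulate, and the daily job picks them up one calendar date at a time. This is a minimal sketch, not the project's actual pipeline code. The row fields (`station`, `temperature`, `record_time` as an ISO 8601 string) and the function name `daily_batches` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_batches(rows):
    """Group hourly rows by calendar date, dropping duplicate readings."""
    batches = defaultdict(dict)
    for row in rows:
        ts = datetime.fromisoformat(row["record_time"])
        # Key on (station, timestamp) so that a repeated hourly run
        # does not add the same reading to the daily batch twice.
        batches[ts.date().isoformat()][(row["station"], ts)] = row
    # Return dates in ascending order, each with its list of unique rows.
    return {day: list(by_key.values()) for day, by_key in sorted(batches.items())}
```

Deduplicating inside the batch step keeps the daily job safe to re-run if an hourly collection was triggered more than once.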


Data Visualization

The final output of this project is a Weather Report built in Looker Studio.

The initial page of the report displays summary statistics, allowing users to filter by Station and/or Date.

The first dashboard on this page indicates the number of records collected for the chosen station and date.

The second dashboard presents the lowest, average, and highest temperatures for the selected station and date.


image
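
The summary statistics on this page amount to a grouped aggregation per station and date. In the report itself Looker Studio does this aggregation; the sketch below only shows an equivalent computation in Python. The row fields (`station`, `temperature`, `record_time` as an ISO 8601 string) and the function name `summarize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows):
    """Return record count and min/avg/max temperature per (station, date)."""
    groups = defaultdict(list)
    for row in rows:
        # The first 10 characters of an ISO 8601 timestamp are YYYY-MM-DD.
        day = row["record_time"][:10]
        groups[(row["station"], day)].append(row["temperature"])
    return {
        key: {
            "records": len(temps),
            "min": min(temps),
            "avg": round(mean(temps), 1),
            "max": max(temps),
        }
        for key, temps in groups.items()
    }
```

Filtering by Station and/or Date in the report then corresponds to selecting a subset of these `(station, date)` keys.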

The second page of the report illustrates the time series temperature data for the selected station and date range.


image

Flowchart


data flow


Data Pipelines


See data pipelines


Prerequisites

Before executing the pipeline, the necessary infrastructure must be provisioned.

Docker Container Creation

This involves building a local Kestra image and running Kestra and Postgres containers.

Kestra


Lessons Learned

  • Early decisions on data visualization help in defining the scope and the type of processing required for the pipeline. Starting from the desired end state and working backward is an effective strategy to maintain focus.

  • Documentation can be as time-consuming as the development process itself.