This project aims to create a rich dataset by merging performance metrics (xG from Understat) with market data (betting odds from football-data.co.uk). The combined dataset is used to analyze team performance, explore the relationship between performance and market expectations, and identify potential market inefficiencies.
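As a rough illustration of how betting odds can be read as market expectations, the sketch below converts decimal odds into implied probabilities. It assumes simple proportional normalisation to remove the bookmaker margin, which is only one of several possible methods and is not necessarily the one used in this project. The function name `implied_probs` and the odds values are made up for the example.

```r
# Convert decimal 1X2 odds into implied probabilities (hypothetical helper).
implied_probs <- function(home, draw, away) {
  raw <- 1 / c(home = home, draw = draw, away = away)  # inverse of decimal odds
  overround <- sum(raw) - 1                            # bookmaker margin
  # Assumption: proportional normalisation so the probabilities sum to 1.
  list(probs = raw / sum(raw), overround = overround)
}

# Illustrative odds, not taken from any real match.
res <- implied_probs(home = 2.0, draw = 3.5, away = 4.0)
print(round(res$probs, 3))
print(round(res$overround, 3))
```

Comparing these implied probabilities with xG-based performance is one way to look for the market inefficiencies mentioned above.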
- Performance Data: `worldfootballR` package (sourcing from Understat.com).
- Market Data: CSV files from football-data.co.uk.
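The two sources typically differ in date format and team naming, so a merge needs a harmonisation step first. The following is a minimal sketch in base R using toy data frames. The Understat-style column names (`datetime`, `home_team`, `away_team`, `home_xG`, `away_xG`), the name mapping, and all values are assumptions for illustration; the football-data.co.uk-style columns (`Date`, `HomeTeam`, `AwayTeam`, `B365H`) follow that site's usual CSV layout, but check them against the actual files.

```r
# Toy stand-ins for the two sources; all values are invented for illustration.
xg <- data.frame(
  datetime  = c("2000-01-01 15:00:00", "2000-01-08 15:00:00"),
  home_team = c("Manchester United", "Arsenal"),
  away_team = c("Arsenal", "Manchester United"),
  home_xG   = c(1.4, 2.1),
  away_xG   = c(0.9, 0.7),
  stringsAsFactors = FALSE
)
odds <- data.frame(
  Date     = c("01/01/2000", "08/01/2000"),
  HomeTeam = c("Man United", "Arsenal"),
  AwayTeam = c("Arsenal", "Man United"),
  B365H    = c(2.1, 1.9),
  stringsAsFactors = FALSE
)

# Assumed mapping from Understat-style names to football-data.co.uk-style names.
name_map <- c("Manchester United" = "Man United")
harmonise <- function(x) {
  mapped <- unname(name_map[x])     # NA where no mapping exists
  ifelse(is.na(mapped), x, mapped)  # keep the original name in that case
}

# Bring both sources onto a common key: match date plus home and away team.
xg$date     <- as.Date(substr(xg$datetime, 1, 10))
xg$HomeTeam <- harmonise(xg$home_team)
xg$AwayTeam <- harmonise(xg$away_team)
odds$date   <- as.Date(odds$Date, format = "%d/%m/%Y")

merged <- merge(xg, odds, by = c("date", "HomeTeam", "AwayTeam"))
print(merged[, c("date", "HomeTeam", "AwayTeam", "home_xG", "away_xG", "B365H")])
```

Comparing `nrow(merged)` with the number of input rows is a quick way to spot fixtures that failed to match because of an unmapped team name.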
The project follows a standard R project structure:
- `/data/raw`: Contains the raw, untouched data files.
- `/data/processed`: Contains the final, cleaned, and merged datasets (`.rds` and `.csv`).
- `/R`: Contains the R scripts for the analysis, numbered in order of execution.
- `/outputs`: Contains saved plots and other analytical outputs.
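For reference, the layout above could be recreated and used roughly as follows. This is a sketch only: `create_layout` is a hypothetical helper, `matches` is a placeholder file name, and the demo runs in a temporary directory rather than in the project itself.

```r
# Create the folder layout under a given project root (hypothetical helper).
create_layout <- function(root = ".") {
  dirs <- file.path(root, c("data/raw", "data/processed", "R", "outputs"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Demo in a temporary directory so the working directory is left untouched.
demo_root <- file.path(tempdir(), "epl_layout_demo")
create_layout(demo_root)

# Save a toy dataset in both formats used for processed data.
toy <- data.frame(match_id = 1:2, home_xG = c(1.2, 0.8))
rds_path <- file.path(demo_root, "data/processed", "matches.rds")
csv_path <- file.path(demo_root, "data/processed", "matches.csv")
saveRDS(toy, rds_path)
write.csv(toy, csv_path, row.names = FALSE)
```

The `.rds` file preserves R column types exactly, while the `.csv` copy is convenient for use outside R.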
This project uses the `renv` package for dependency management, ensuring full reproducibility.
- Clone this repository to your local machine.
- Open the `epl_xg_odds_analysis.Rproj` file in RStudio.
- The `renv` package should automatically prompt you to restore the project library. Type `renv::restore()` in the console if it doesn't.
- Run the scripts in the `/R` folder in numerical order:
  - `01_data_acquisition.R`: Fetches the Understat data.
  - `02_data_processing.R`: Cleans and merges all data sources.
  - `03_initial_analysis.R`: Performs the exploratory analysis and generates plots.
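If you prefer to run the whole pipeline in one go rather than opening each script, something like the sketch below should work. `run_pipeline` is a hypothetical helper, not part of this repository. It assumes every script name starts with a zero-padded number followed by an underscore, so alphabetical order matches execution order.

```r
# Source all numbered scripts in a folder in order (hypothetical helper).
run_pipeline <- function(script_dir = "R") {
  # Match names such as "01_data_acquisition.R"; sort() makes the order explicit.
  scripts <- sort(list.files(script_dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE))
  for (s in scripts) {
    message("Running ", basename(s))
    source(s)  # evaluated in the global environment, like running it interactively
  }
  invisible(basename(scripts))
}

# Only runs when called from a directory that contains an R/ folder.
if (dir.exists("R")) run_pipeline("R")
```

Because each script is sourced into the global environment, objects created by an earlier script remain visible to later ones, just as when running them by hand in the same R session.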