This project performs Exploratory Data Analysis (EDA) on a Premier League dataset to uncover insights about team performance, player stats, match outcomes, and more. The analysis helps identify trends, outliers, and relationships across seasons.
- About the Project
- Dataset
- Objectives
- Technologies Used
- Key Insights
- Visualizations
- Future Work
- Author
The Premier League is one of the most watched football leagues in the world. This project uses EDA techniques to:
- Understand player and team performance
- Analyze match statistics
- Discover season-level patterns
- Explore correlations and trends
The dataset used contains information such as:
- Match results (Home/Away goals, winners)
- Team stats (Possession, Shots, Fouls, etc.)
- Player performance (Goals, Assists, Cards)
- Seasonal summaries
Dataset Source: https://drive.google.com/file/d/1jB20GritU6nWHU2zNYsWM2DuEJIzz8FI/view?usp=sharing
- Clean and preprocess the data
- Generate descriptive statistics
- Identify key trends and patterns
- Visualize relationships using plots and graphs
- Draw conclusions to support decision-making in football analytics
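As a rough illustration of the first two objectives, here is a minimal sketch of cleaning and summarizing match data with Pandas. The column names (`home_team`, `away_team`, `home_goals`, `away_goals`), the helper `clean_matches`, and the sample rows are assumptions made for this example; they are not taken from the actual dataset, whose schema may differ.

```python
import numpy as np
import pandas as pd

# Tiny synthetic sample; column names are hypothetical, not the dataset's real schema.
matches = pd.DataFrame(
    {
        "home_team": ["Arsenal", "Chelsea", "Arsenal", "Everton", "Everton"],
        "away_team": ["Chelsea", "Everton", "Chelsea", "Arsenal", "Arsenal"],
        "home_goals": [2, 1, 2, np.nan, 0],
        "away_goals": [1, 1, 1, 3, 2],
    }
)


def clean_matches(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows with missing goal counts."""
    cleaned = df.drop_duplicates().dropna(subset=["home_goals", "away_goals"])
    # Goal columns become floats when NaN is present; restore integers.
    cleaned = cleaned.astype({"home_goals": int, "away_goals": int})
    return cleaned.reset_index(drop=True)


cleaned = clean_matches(matches)

# Descriptive statistics (count, mean, std, quartiles) for the goal columns.
summary = cleaned[["home_goals", "away_goals"]].describe()
print(summary)
```

Dropping incomplete rows is only one option; depending on how much data is missing, imputing values or keeping the rows for other analyses may be more appropriate.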
- Python 🐍
- Pandas 📊
- NumPy 🔢
- Matplotlib 📈
- Seaborn 🌊
- Google Colab 📓
The EDA and visualizations of the Premier League dataset produced the following major insights:
- Strong positive correlation between shots on target and goals scored, indicating shooting accuracy plays a key role.
- A mild negative correlation between fouls committed and points earned, suggesting that disciplined teams often perform better.
- The goals-scored distribution has a few significant outliers, indicating occasional high-scoring matches or standout players.
- Yellow cards are more consistently distributed, with most teams falling within a typical range, but a few overly aggressive teams stand out.
- Home teams tend to win more matches than away teams, confirming the traditional home advantage trend.
- Away teams score slightly fewer goals on average than home teams.
- Top goal scorers also lead in shots taken and minutes played, highlighting the importance of regular playtime.
- A few players accumulate high assists with fewer goals, indicating their role as creative playmakers.
- Teams with higher clearances and tackles often finish mid-table, showing that strong defense alone may not ensure a top position.
- Goalkeeper saves are highest for lower-ranked teams, indicating sustained defensive pressure and fewer clean sheets.
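Insights such as the shots-to-goals correlation, the home advantage, and the goal outliers could be checked along the following lines. This is a sketch on synthetic data: the column names (`home_shots_on_target`, `home_goals`, `away_goals`) and all values are assumptions for illustration, and the 1.5 × IQR rule is one common outlier convention, not necessarily the one used in the project.

```python
import pandas as pd

# Synthetic per-match rows; column names and values are illustrative only.
matches = pd.DataFrame(
    {
        "home_shots_on_target": [3, 5, 2, 8, 4, 6, 1, 7],
        "home_goals": [1, 2, 0, 7, 1, 2, 0, 3],
        "away_goals": [1, 0, 1, 0, 2, 1, 0, 3],
    }
)

# Pearson correlation between shots on target and goals scored.
shots_goals_corr = matches["home_shots_on_target"].corr(matches["home_goals"])

# Share of matches won by the home side and by the away side (draws excluded).
goal_diff = matches["home_goals"] - matches["away_goals"]
home_win_rate = (goal_diff > 0).mean()
away_win_rate = (goal_diff < 0).mean()

# Flag unusually high-scoring matches with the 1.5 * IQR rule.
q1, q3 = matches["home_goals"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = matches[matches["home_goals"] > upper_fence]

print(f"shots/goals correlation: {shots_goals_corr:.2f}")
print(f"home win rate: {home_win_rate:.2%}, away win rate: {away_win_rate:.2%}")
print(outliers)
```

On real data, the same calls apply per season or per team after a `groupby`, which helps separate a league-wide trend from the effect of a few dominant clubs.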
Here are a few visualizations used in the project:
- 📊 Goals scored per season (Bar chart)
- 🥧 Win distribution by team (Pie chart)
- 🔥 Heatmap of feature correlations
- 🧍 Top 10 goal scorers (Horizontal bar chart)
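For reference, a horizontal bar chart like the top-10 scorers plot can be produced with a few lines of Matplotlib. The sketch below uses made-up player names and goal counts, and the column names `player` and `goals` and the output file name are assumptions, not the project's actual code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up players and goal counts, purely for illustration.
players = pd.DataFrame(
    {
        "player": [f"Player {c}" for c in "ABCDEFGHIJKL"],
        "goals": [12, 27, 5, 18, 9, 21, 3, 15, 7, 24, 11, 1],
    }
)

# Keep the ten highest scorers; sort ascending so the top scorer is drawn last
# and therefore appears at the top of the horizontal chart.
top10 = players.nlargest(10, "goals").sort_values("goals")

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(top10["player"], top10["goals"], color="tab:blue")
ax.set_xlabel("Goals")
ax.set_title("Top 10 goal scorers (synthetic data)")
fig.tight_layout()
fig.savefig("top10_scorers.png")  # in a notebook, plt.show() works as well
```

Sorting before plotting matters because `barh` draws rows bottom-up in the order given.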
- Include data from more seasons
- Add xG (Expected Goals) and advanced metrics
- Perform predictive modeling (e.g., match outcome prediction)
- Build an interactive dashboard using Plotly or Power BI