Super Mario Bros AI Agent 🎮🧠

This project explores the application of Reinforcement Learning (RL) and Computer Vision techniques to train an agent capable of completing a level of the classic Super Mario Bros game. It was developed for the Artificial Intelligence course at the University of Salerno.

📌 Project Summary

Three main approaches were tested:

DDQN (Deep Double Q-Network)
PPO (Proximal Policy Optimization) with different configurations
PPO + YOLOv5 integration for object detection

The agent was trained to complete the level SuperMarioBros-1-1-v0 using gym-super-mario-bros.

🧠 Objectives

Train an agent to autonomously complete a level of Super Mario Bros
Compare RL models (DDQN vs PPO)
Improve vision-based decisions with YOLOv5 object detection
Analyze performance through metrics like reward, Q-values, and policy loss

📁 Project Structure

.
├── docs/                       # Project documentation
│   ├── Traccia.pdf             # Project prompt
│   ├── Relazione Super Mario Bros.pdf   # Final report
│   └── Presentazione Super Mario.pptx   # Presentation slides
│
├── media/                      # Output media
│   ├── completo.mp4            # Full run demo
│   └── Vittoria_Mario.mp4      # Winning episode clip ✅
│
├── prog/                       # Code and notebooks
│   ├── ddqn_agent.py           # DDQN implementation (PyTorch)
│   ├── ppo_512_batch.ipynb     # PPO with 512 steps
│   ├── ppo_2048_batch.ipynb    # PPO with 2048 steps
│   ├── ppo_with_yolo.ipynb    # PPO + YOLOv5 integration
│   └── mario_env/              # Ignored virtual environment
│
├── .gitignore                  # Excludes `mario_env`
└── README.md                   # You're here

🛠️ Technologies Used

Python 3, PyTorch, OpenCV
Stable-Baselines3 (PPO)
YOLOv5 (Roboflow + Ultralytics)
gym-super-mario-bros
Custom wrappers (frame stack, reward shaping)

🎯 Key Results

Model	Victories	Notes
DDQN	1/1000 episodes	High stability
PPO (512 steps)	54/10M steps	Best result overall
PPO (2048)	0	Failed to converge
PPO + YOLOv5	0	Better perception, poor translation to actions

YOLOv5 achieved 94.2% precision and 100% recall, but the integrated agent still failed to win due to difficulty performing complex jumps.

🎮 Gameplay Demo

🎥 Demo Video

Watch the agent win!
📹 Victory Clip

Or check the full episode:
📺 Full Demo

📌 Notes

This project required significant hardware resources (GPU for training PPO/YOLOv5).
Training time: ~48h for PPO + YOLOv5 on Mac without GPU.
All models trained on SuperMarioBros-1-1-v0.

👥 Authors

📄 License

This project is licensed under the CC BY-NC-SA 4.0 License

You may share and adapt this work for non-commercial purposes only, as long as you give appropriate credit and distribute your contributions under the same license.
For commercial use, explicit permission from the authors is required.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Super Mario Bros AI Agent 🎮🧠

📌 Project Summary

🧠 Objectives

📁 Project Structure

🛠️ Technologies Used

🎯 Key Results

🎮 Gameplay Demo

🎥 Demo Video

📌 Notes

👥 Authors

📄 License

About

Uh oh!

Contributors 3

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
docs		docs
media		media
prog		prog
.gitignore		.gitignore
README.md		README.md

Marco210210/SuperMario-RL-DDQN-PPO-YOLOv5

Folders and files

Latest commit

History

Repository files navigation

Super Mario Bros AI Agent 🎮🧠

📌 Project Summary

🧠 Objectives

📁 Project Structure

🛠️ Technologies Used

🎯 Key Results

🎮 Gameplay Demo

🎥 Demo Video

📌 Notes

👥 Authors

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors 3

Uh oh!

Languages