Super Mario Bros AI Agent 🎮🧠

This project explores the application of Reinforcement Learning (RL) and Computer Vision techniques to train an agent capable of completing a level of the classic Super Mario Bros game. It was developed for the Artificial Intelligence course at the University of Salerno.


📌 Project Summary

Three main approaches were tested:

  1. DDQN (Double Deep Q-Network)
  2. PPO (Proximal Policy Optimization) with different configurations
  3. PPO + YOLOv5 integration for object detection

The agent was trained to complete the level SuperMarioBros-1-1-v0 using gym-super-mario-bros.
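
For reference, here is a minimal sketch of how that environment can be created with gym-super-mario-bros and nes-py. The exact wrapper configuration used in the notebooks may differ; this only shows the basic loop.

```python
# Minimal environment setup sketch (assumes gym-super-mario-bros and nes-py are installed;
# the repository's notebooks may configure the environment differently).
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

env = gym_super_mario_bros.make('SuperMarioBros-1-1-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)  # restrict to a small discrete action set

state = env.reset()
done = False
while not done:
    # Random policy, only to exercise the environment loop.
    state, reward, done, info = env.step(env.action_space.sample())
env.close()
```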


🧠 Objectives

  • Train an agent to autonomously complete a level of Super Mario Bros
  • Compare RL models (DDQN vs PPO); a Double DQN target sketch follows this list
  • Improve vision-based decisions with YOLOv5 object detection
  • Analyze performance through metrics like reward, Q-values, and policy loss
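
The Double DQN side of that comparison rests on decoupling action selection from action evaluation. Below is a hedged sketch of the target computation; `online_net`, `target_net`, and the tensor shapes are illustrative and not taken from `ddqn_agent.py`.

```python
# Double DQN target: the online network picks the greedy next action,
# the target network evaluates it, which reduces Q-value overestimation.
import torch

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done.float()) * next_q
```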

📁 Project Structure

.
├── docs/                       # Project documentation
│   ├── Traccia.pdf             # Project prompt
│   ├── Relazione Super Mario Bros.pdf   # Final report
│   └── Presentazione Super Mario.pptx   # Presentation slides
│
├── media/                      # Output media
│   ├── completo.mp4            # Full run demo
│   └── Vittoria_Mario.mp4      # Winning episode clip ✅
│
├── prog/                       # Code and notebooks
│   ├── ddqn_agent.py           # DDQN implementation (PyTorch)
│   ├── ppo_512_batch.ipynb     # PPO with 512 steps
│   ├── ppo_2048_batch.ipynb    # PPO with 2048 steps
│   ├── ppo_with_yolo.ipynb     # PPO + YOLOv5 integration
│   └── mario_env/              # Ignored virtual environment
│
├── .gitignore                  # Excludes `mario_env`
└── README.md                   # You're here

🛠️ Technologies Used

  • Python 3, PyTorch, OpenCV
  • Stable-Baselines3 (PPO)
  • YOLOv5 (Roboflow + Ultralytics)
  • gym-super-mario-bros
  • Custom wrappers (frame stack, reward shaping); a wiring sketch follows this list
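
One plausible way to wire these pieces together for the PPO runs is sketched below. `n_steps=512` mirrors the "PPO (512 steps)" configuration; every other hyperparameter and the exact wrapper stack are assumptions, so the notebooks remain the authoritative reference.

```python
# Hedged sketch: Stable-Baselines3 PPO on a grayscale, frame-stacked Mario environment.
# The reward-shaping wrapper mentioned above is omitted here.
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace
from gym.wrappers import GrayScaleObservation
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    env = gym_super_mario_bros.make('SuperMarioBros-1-1-v0')
    env = JoypadSpace(env, SIMPLE_MOVEMENT)
    env = GrayScaleObservation(env, keep_dim=True)  # single-channel frames
    return env

venv = DummyVecEnv([make_env])
venv = VecFrameStack(venv, 4, channels_order='last')  # stack 4 consecutive frames

model = PPO('CnnPolicy', venv, n_steps=512, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save('ppo_mario_512')
```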

🎯 Key Results

| Model            | Victories          | Notes                                           |
|------------------|--------------------|-------------------------------------------------|
| DDQN             | 1/1000 episodes    | High stability                                  |
| PPO (512 steps)  | 54/10M steps       | Best result overall                             |
| PPO (2048 steps) | 0                  | Failed to converge                              |
| PPO + YOLOv5     | 0                  | Better perception, poor translation to actions  |

YOLOv5 achieved 94.2% precision and 100% recall, but the integrated agent still failed to win due to difficulty performing complex jumps.
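
For context on the PPO + YOLOv5 pipeline, the sketch below shows how a custom-trained YOLOv5 model can be loaded via torch.hub and queried on a game frame. The weight path `best.pt` and the way detections are merged into the agent's observation are assumptions, not details taken from `ppo_with_yolo.ipynb`.

```python
# Hedged sketch: custom YOLOv5 inference on a Mario frame via torch.hub.
# 'best.pt' is an assumed path to the Roboflow/Ultralytics-trained weights.
import torch

yolo = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

def detect(frame_rgb):
    """Return an array of [x1, y1, x2, y2, confidence, class] rows for one RGB frame."""
    results = yolo(frame_rgb)
    return results.xyxy[0].cpu().numpy()

# How these detections are encoded into the PPO observation (e.g. as extra feature
# channels or a low-dimensional vector) is not reproduced here.
```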


🎮 Gameplay Demo


🎥 Demo Video

Watch the agent win!
📹 Victory Clip: media/Vittoria_Mario.mp4

Or check the full episode:
📺 Full Demo: media/completo.mp4


📌 Notes

  • This project required significant hardware resources (GPU for training PPO/YOLOv5).
  • Training time: ~48h for PPO + YOLOv5 on a Mac without a GPU.
  • All models were trained on SuperMarioBros-1-1-v0.

👥 Authors


📄 License

This project is licensed under the CC BY-NC-SA 4.0 License.

You may share and adapt this work for non-commercial purposes only, as long as you give appropriate credit and distribute your contributions under the same license.
For commercial use, explicit permission from the authors is required.
