Table of Contents:
- 🚀 Overview
- 📺 Demo
- ⚙️ Installation
- 📊 Dataset
- 🧹 Project Modules
- 🤖 AI Models
- 🧪 Experiments & Evaluation
- 🏅 Training Approach
- 📊 Ensemble Voting
- 📚 Notebooks & Further Reading
JudgerAI is an NLP application that predicts the outcome of legal cases (which party wins) by learning from the facts and outcomes of past cases. It helps legal professionals to:
- 📈 Increase prediction accuracy
- ⏱️ Save valuable time on case research
- 🧠 Make informed, data-driven decisions
Watch JudgerAI in action:
JudgerAI.2.0.-.Project.Demo.-.Trim.mp4
1️⃣ Clone the repository
git clone https://github.com/MohammedAly22/JudgerAI.git
2️⃣ Download GloVe embeddings (50-dim) from Kaggle and save as:
./GloVe/glove.6B.50d.txt
3️⃣ Download the pre-trained models (large files) and place them in models/.
Download Models from Here
4️⃣ Directory structure:
JudgerAI/
├── csvs/
├── dataset/
├── GloVe/
├── models/
├── src/
└── *.ipynb
5️⃣ Run the app:
streamlit run src/main.py
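Before running the app, it may help to confirm that the files from the steps above are in place. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper that assumes it is run from the `JudgerAI/` root. Because the pre-trained model file names are not listed here, it only checks that `models/` is non-empty.

```python
from pathlib import Path

# Paths taken from the installation steps above; assumes the JudgerAI/ root as cwd.
REQUIRED_FILES = [Path("GloVe/glove.6B.50d.txt"), Path("src/main.py")]
REQUIRED_DIRS = [Path("csvs"), Path("dataset"), Path("models")]


def check_layout(root: Path = Path(".")) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    problems = []
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for rel in REQUIRED_DIRS:
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    # Model file names are not documented here, so only check that models/ has content.
    models_dir = root / "models"
    if models_dir.is_dir() and not any(models_dir.iterdir()):
        problems.append("models/ is empty: download the pre-trained models first")
    return problems


if __name__ == "__main__":
    issues = check_layout()
    print("\n".join(issues) if issues else "Layout OK: run `streamlit run src/main.py`")
```

Saved as a script (the file name is up to you) and run from the repository root, it prints any missing pieces, or a reminder of the Streamlit command when everything is present.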
- Total cases: 3,464
- Key columns: ID, name, href, first_party, second_party, winning_party, winner_index (0/1), facts
- Input: facts → Output: winner_index
Here is the dataset summary:
column | datatype | description |
---|---|---|
ID | int64 | The case ID |
name | string | The case name |
href | string | Hyperlink (URL) to the case |
first_party | string | Name of the first party (petitioner) in the case |
second_party | string | Name of the second party (respondent) in the case |
winning_party | string | Name of the party that won the case |
winner_index | int64 | Index of the winning party: 0 = the first party wins, 1 = the second party wins |
facts | string | The case facts used to determine which party wins |
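The `winner_index` encoding can be illustrated with a small sketch. `winner_name` and `label_is_consistent` are hypothetical helpers, not functions from the JudgerAI codebase. The consistency check assumes `winning_party` is spelled the same way as the party columns, which may not hold for every row of the real dataset.

```python
from typing import TypedDict


class CaseRow(TypedDict):
    # Subset of the dataset columns described in the table above.
    first_party: str
    second_party: str
    winning_party: str
    winner_index: int


def winner_name(row: CaseRow) -> str:
    """Map winner_index to a party name: 0 -> first_party, 1 -> second_party."""
    if row["winner_index"] == 0:
        return row["first_party"]
    if row["winner_index"] == 1:
        return row["second_party"]
    raise ValueError(f"winner_index must be 0 or 1, got {row['winner_index']!r}")


def label_is_consistent(row: CaseRow) -> bool:
    """Check that winner_index agrees with winning_party (case-insensitive exact match)."""
    return winner_name(row).strip().lower() == row["winning_party"].strip().lower()
```

For example, a made-up row with `first_party="Alice Doe"`, `second_party="State of Example"` and `winner_index=1` resolves to `"State of Example"`.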
The code is split into modules for maintainability and clarity:
Module | Location | Responsibilities |
---|---|---|
Preprocessing | src/preprocessing.py | Tokenization, class balancing, anonymization, vectorization |
Plotting | src/plotting.py | Visualizing performance, confusion matrices, ROC-AUC curves, heatmaps |
Utils | src/utils.py | Training helpers, k-fold CV, accuracy/loss summary builders |
Streamlit App | src/main.py | Frontend UI for the demo and deployment |
Deployment Utils | src/deployment_utils.py | Model loading, sample picking, vectorizer generation, word highlighting |
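As an illustration of the anonymization step handled by the Preprocessing module, here is a minimal sketch using Python's `re` module. `anonymize_facts` is a hypothetical function, not the actual code in `src/preprocessing.py`. It assumes the party names come from the `first_party` and `second_party` columns and only replaces full-name matches, so a surname mentioned on its own (e.g. "Smith" after "John Smith") is left unchanged.

```python
import re


def anonymize_facts(facts: str, party_names: list[str], token: str = "_PARTY_") -> str:
    """Replace every occurrence of each party name in `facts` with `token`."""
    # Longest names first, so the alternation prefers "John Smith" over "John".
    names = sorted({n.strip() for n in party_names if n and n.strip()}, key=len, reverse=True)
    if not names:
        return facts
    # (?<!\w) and (?!\w) act like word boundaries but also work for names that
    # end in punctuation (e.g. "Inc."); re.escape neutralizes regex metacharacters.
    alternation = "|".join(re.escape(n) for n in names)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    return pattern.sub(token, facts)
```

For example, `anonymize_facts("John Smith sued the City of Springfield.", ["John Smith", "City of Springfield"])` returns `"_PARTY_ sued the _PARTY_."`.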
JudgerAI incorporates 7 different models:
- Doc2Vec – Documents as dense vectors
- 1D-CNN – Convolutional features over text
- TF-IDF + TextVectorization – Weighted bag-of-words embedding
- GloVe – Global co-occurrence embeddings
- FastText – Subword-enhanced embeddings
- LSTM – Recurrent network that captures long-range dependencies in sequences
- BERT – Contextual pre-trained transformer
This mix of traditional and modern architectures gives the ensemble diverse representations of the case facts.
Three core preprocessing decisions were evaluated:
- Preprocessing steps – stopword removal, stemming, etc.
- Data anonymization – replacing party names in the facts with _PARTY_
- Label class imbalance – handling the imbalance between the two winner_index classes
With each decision either applied or not, this gives 2³ = 8 experiment combinations. Each is run with 4-fold cross-validation, for 8 × 4 = 32 runs per model.
Combination # | Preprocessing | Data Anonymization | Label Class Imbalance |
---|---|---|---|
1 | No | No | No |
2 | No | No | Yes |
3 | No | Yes | No |
4 | No | Yes | Yes |
5 | Yes | No | No |
6 | Yes | No | Yes |
7 | Yes | Yes | No |
8 | Yes | Yes | Yes |
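The eight combinations in the table are every on/off assignment of the three decisions. This short sketch (the flag names are illustrative, not identifiers from the project) regenerates the table rows with `itertools.product` and confirms the 8 × 4 = 32 run count.

```python
from itertools import product

FLAGS = ("preprocessing", "anonymization", "balancing")
N_FOLDS = 4

# product((False, True), repeat=3) lists "No" before "Yes" with the last flag
# varying fastest, which matches the row order of the table above.
combinations = [dict(zip(FLAGS, values)) for values in product((False, True), repeat=len(FLAGS))]

total_runs = len(combinations) * N_FOLDS  # 8 combinations x 4 folds = 32 runs per model

for number, combo in enumerate(combinations, start=1):
    row = " | ".join("Yes" if combo[flag] else "No" for flag in FLAGS)
    print(f"{number} | {row} |")
```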
- 80/20 train/test split
- 4-fold CV on training set
- Best combination selected per model based on accuracy
Workflow:
- Train each model × 8 preprocessing setups × 4 CV folds
- Evaluate and select the best-performing combination for each model
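The split-then-select procedure can be sketched with the standard library alone, under stated assumptions: `train_and_score` is a hypothetical callback standing in for training one model on one fold and returning its validation accuracy, the best combination is taken to be the one with the highest mean accuracy across the 4 folds, and the split is a plain shuffle without stratification. Under these assumptions, the 3,464 cases split into 2,771 training and 693 test cases, and the held-out test set is never used during selection.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def train_test_split_indices(n: int, test_size: float = 0.2, seed: int = 42):
    """Shuffle indices 0..n-1 and split them into (train, test), e.g. 80/20."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_size)
    return indices[n_test:], indices[:n_test]


def k_fold(indices: Sequence[int], k: int = 4):
    """Yield (train, validation) index lists for k roughly equal, disjoint folds."""
    folds = [list(indices[i::k]) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def select_best_combination(
    combinations: Sequence[dict],
    train_indices: Sequence[int],
    train_and_score: Callable[[dict, list, list], float],  # hypothetical callback
    k: int = 4,
):
    """Return (best_combination, mean_accuracy) using k-fold CV on the training set only."""
    scored = []
    for combo in combinations:
        fold_scores = [train_and_score(combo, tr, va) for tr, va in k_fold(train_indices, k)]
        scored.append((mean(fold_scores), combo))
    best_score, best_combo = max(scored, key=lambda pair: pair[0])
    return best_combo, best_score
```

In practice, a stratified splitter (for example scikit-learn's `StratifiedKFold`) would keep the winner_index ratio similar across folds.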
Final predictions are generated by ensemble voting across all tuned models to improve robustness.
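The voting rule is not specified here; one common choice is hard (majority) voting over each model's predicted `winner_index`, sketched below as an assumption. Soft voting, which averages predicted probabilities, is an alternative the project may use instead. With 7 binary classifiers a strict majority always exists, so the tie-breaking rule only matters for an even number of models.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard-voting ensemble.

    predictions[m][i] is model m's predicted winner_index (0 or 1) for case i.
    Returns the most common label per case; on a tie, Counter.most_common keeps
    first-encountered order, so the earliest model's label wins.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_cases = len(predictions[0])
    if any(len(p) != n_cases for p in predictions):
        raise ValueError("every model must predict the same number of cases")
    # zip(*predictions) groups the votes of all models for each case.
    return [Counter(case_votes).most_common(1)[0][0] for case_votes in zip(*predictions)]
```

For example, `majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 1]])` returns `[0, 1, 1]`.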
In-depth explorations are available in the following notebooks:
- BERT_experiments.ipynb
- cnn_experiments.ipynb
- doc2vec_experiments.ipynb
- FastText_experiments.ipynb
- glove_experiments.ipynb
- LSTM_experiments.ipynb
- tf_idf_experiments.ipynb
- voting_experiments.ipynb
Contributions are welcome!
Thank you for exploring JudgerAI and its approach to AI-assisted legal decision-making.