LLM Legal Document Summarization

This repository contains the code and configuration for the LLM Legal Document Summarization project, deployed on Chameleon Cloud. Follow the sections below to understand the system lifecycle from data ingestion to production serving, and see links to the specific implementation files.


1. Value Proposition

Target Customer: Legal analysts at corporate law firms who need fast, accurate summaries of incoming legal documents to accelerate review.

  • Customer Details:
    • Receives >100 documents/day (.pdf, .docx)
    • Needs to look up previous judgements and their summaries by keyword search
    • Requires summary within minutes of upload
    • Ground-truth labels (expert summaries) available after review

Design Influences: Data size, latency requirements, retraining frequency.

2. Scale

  • Offline Data: ~10 GB of raw documents (~20K files) from the Zenodo dataset, plus ~3.3K files of case data from the Kaggle dataset deepcontractor/supreme-court-judgment-prediction (used for the RAG pipeline)
  • Model Size: fine-tuned Llama-2-7B; training uses 2×A100 GPUs
  • Deployment Throughput: ~500 inference requests/day (~1 req/min)

3. Architecture Diagram

[Architecture diagram image]

4. Infrastructure & IaC

Provisioning and configuration are handled via Terraform and Ansible.


5. Persistent Storage

On Chameleon:

  • Object Store:

A notebook provides instructions to create and access the object store, along with code to download the dataset, preprocess it, partition it, and store the resulting splits in the object store (a minimal upload sketch follows the listing below).

Structure and contents:

object-persist-project33

├── production.jsonl
├── test.jsonl
└── train.jsonl
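
As a rough illustration of the final storage step, here is a minimal sketch that uploads the three splits to the object store using python-swiftclient. The actual notebook may use rclone or the OpenStack CLI instead, and the credential environment variables and options shown here are assumptions:

import os
from swiftclient.client import Connection

# Credentials are assumed to come from standard OpenStack environment
# variables (e.g. exported from an application credential / openrc file).
conn = Connection(
    authurl=os.environ["OS_AUTH_URL"],
    user=os.environ["OS_USERNAME"],
    key=os.environ["OS_PASSWORD"],
    os_options={"project_name": os.environ["OS_PROJECT_NAME"]},
    auth_version="3",
)

container = "object-persist-project33"
conn.put_container(container)  # no-op if the container already exists

for split in ("train.jsonl", "test.jsonl", "production.jsonl"):
    with open(split, "rb") as f:
        conn.put_object(container, split, contents=f,
                        content_type="application/json")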

  • Block Store:

We initially created a block volume of 50 GiB, but later extended it to 100 GiB to store our ONNX model, RAG data, etc.

Structure and contents:

block-persist-project33

├── minio_data
│   ├── mlflow-artifacts
│   └── ray
├── postgres_data
└── rag_data
    ├── model_rag
    │   ├── index_to_doc.pkl
    │   └── legal-facts.index
    └── rag_chunks

The mlflow-artifacts folder contains all artifacts generated during training and serving. The ray folder contains Ray Train checkpoints. The postgres_data folder contains the Postgres database files. The rag_data folder contains all data related to our RAG pipeline, built with sentence-transformers/all-MiniLM-L6-v2, including the document chunks, the FAISS vector index, and the index-to-document mapping.


6. Offline Data

Training Dataset & Data Lineage

We use the Zenodo Indian & UK Legal Judgments Dataset containing ~20K court cases and corresponding human-written summaries.

  • Sources: IN-Abs, UK-Abs, and IN-Ext
  • Data Size: ~10 GB total, over 20,000 legal documents and associated summaries - from Zenodo.
  • Format: Paired .txt files for full judgments and summaries

Example Sample (train.jsonl)

{
  "filename": "UKCiv2012.txt",
  "judgement": "The claimant seeks damages following breach of contract. The court heard evidence from both parties. After reviewing the statutory framework and case law precedent, the court finds that the defendant did not fulfill their obligations...",
  "summary": "The defendant breached the contract. The court awarded damages to the claimant.",
  "meta": {
    "doc_words": 2176,
    "sum_words": 132,
    "ratio": 0.06
  }
}

Relation to Customer Use Case

Our target user (a legal analyst at a law firm) regularly deals with such long-form judicial decisions. The Zenodo dataset closely mirrors their real-world workflow:

  • They review lengthy judgment documents daily.
  • They generate or consume summaries internally for client reporting.
  • Our model mimics this process by learning from historic summaries.

About Production Samples

Production samples (the 10% production split):

  • Contain no ground-truth summaries at inference time.
  • In a deployed setting, these represent new, unseen judgments uploaded by users.
  • Once reviewed by a human expert, feedback summaries can be used to retrain the model, thus closing the feedback loop.


7. Data Pipeline

Processing Pipeline

Steps handled in data_preprocessing.py:

  1. Ingestion: Load documents from raw folders.
  2. Merging: Combine segment-wise summaries if full summary not available.
  3. Cleaning: Normalize unicode, remove extra whitespace, lowercase.
  4. Sanity checks: Remove empty/duplicate/missing files.
  5. Filtering: Retain samples with 50–1500 summary words and acceptable doc:summary ratios.
  6. Split: 70% train, 20% test, 10% production — written to *.jsonl (a sketch of steps 5–6 follows this list).
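
A minimal sketch of the filtering and split steps (5–6), assuming samples are already loaded as dicts with judgement and summary fields; the authoritative logic lives in data_preprocessing.py and its exact thresholds and helpers may differ:

import json
import random

def filter_and_split(samples, seed=42):
    """Keep samples passing the word-count/ratio filter, then split 70/20/10."""
    kept = []
    for s in samples:  # each s: {"filename": ..., "judgement": ..., "summary": ...}
        doc_words = len(s["judgement"].split())
        sum_words = len(s["summary"].split())
        ratio = sum_words / max(doc_words, 1)
        if 50 <= sum_words <= 1500 and 0.01 <= ratio <= 0.50:
            s["meta"] = {"doc_words": doc_words, "sum_words": sum_words,
                         "ratio": round(ratio, 2)}
            kept.append(s)

    random.Random(seed).shuffle(kept)
    n = len(kept)
    splits = {"train": kept[: int(0.7 * n)],
              "test": kept[int(0.7 * n): int(0.9 * n)],
              "production": kept[int(0.9 * n):]}
    for name, rows in splits.items():
        with open(f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")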

Data Pipeline Overview

         ┌──────────────────────────────┐
         │    Raw Zenodo Dataset        │
         │  (/data/raw/* subfolders)    │
         └────────────┬─────────────────┘
                      │
                      ▼
         ┌──────────────────────────────┐
         │   Ingestion & File Loading   │
         │ - Load judgment + summary    │
         │ - Handle IN-Abs, UK-Abs,     │
         │   IN-Ext variants            │
         └────────────┬─────────────────┘
                      │
                      ▼
         ┌──────────────────────────────┐
         │     Merging Segment-wise     │
         │ - Combine partial summaries  │
         │   (facts, statute, etc.)     │
         └────────────┬─────────────────┘
                      │
                      ▼
         ┌──────────────────────────────┐
         │         Cleaning Text        │
         │ - Unicode normalization      │
         │ - Lowercasing                │
         │ - Remove extra whitespace    │
         └────────────┬─────────────────┘
                      │
                      ▼
         ┌──────────────────────────────┐
         │       Sanity Checks          │
         │ - Remove empty/missing files │
         │ - Check for duplicates       │
         └────────────┬─────────────────┘
                      │
                      ▼
         ┌──────────────────────────────┐
         │       Statistical Filter     │
         │ - 50–1500 summary words      │
         │ - Ratio: 1–50% of doc length │
         └────────────┬─────────────────┘
                      │
                      ▼
         ┌──────────────────────────────┐
         │        Split & Dump          │
         │ - 70% train                  │
         │ - 20% test                   │
         │ - 10% production             │
         │ → Output as `.jsonl` files   │
         └──────────────────────────────┘

RAG Pipeline: Supreme Court Judgment Summarization

  1. Download & Extract

    • Use Kaggle API to pull deepcontractor/supreme-court-judgment-prediction into /mnt/block/rag_data and unzip.
  2. Load & Inspect

    • Read justice.csv with Pandas to verify row count and columns (name, facts, etc.).
  3. Clean & Serialize

    • Normalize newlines, strip empty lines, and write each case’s facts to rag_txt/{idx}_{safe_name}.txt.
  4. Chunk Documents

    • Tokenize with sentence-transformers/all-MiniLM-L6-v2 (512-token window, 64-token overlap).
    • Save each piece to rag_chunks/{original}_chunkXXX.txt.
  5. Embed & Index

    • Encode chunks via SentenceTransformer.
    • Build a FAISS L2 index over the vectors.
    • Persist model_rag/legal-facts.index and model_rag/index_to_doc.pkl.
  6. Query‐Time Retrieval

    • Embed user query, FAISS search → top-K chunks.
    • Load snippets, assemble the prompt, and send it to the fine-tuned Llama-2 for the final summary (see the sketch after this list).
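
A condensed sketch of steps 5 and 6, using the FAISS and SentenceTransformer calls named above; chunk loading and the final Llama-2 call are simplified, and variable names are illustrative:

import pickle
from pathlib import Path

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Index time: embed every chunk file and build an L2 index.
chunk_paths = sorted(Path("rag_chunks").glob("*.txt"))
chunk_texts = [p.read_text(encoding="utf-8") for p in chunk_paths]
chunk_names = [p.name for p in chunk_paths]

embeddings = model.encode(chunk_texts, convert_to_numpy=True, show_progress_bar=True)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

faiss.write_index(index, "model_rag/legal-facts.index")
with open("model_rag/index_to_doc.pkl", "wb") as f:
    pickle.dump(dict(enumerate(chunk_names)), f)

# Query time: embed the query and return the top-K chunk texts.
def retrieve(query, k=5):
    q_vec = model.encode([query], convert_to_numpy=True)
    _, ids = index.search(q_vec, k)
    return [chunk_texts[i] for i in ids[0]]

# The retrieved chunks are then prepended to the user prompt and sent to the
# fine-tuned Llama-2 model for the final summary.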

8. Model Training

8.1 Provisioning our resources and Jupyter container

8.2 Fine-tuning with LoRA + Ray Train + Lightning + MLflow

  • Training script: Ray-Train/sft_train_llama

  • Frameworks: PyTorch Lightning, Ray Train (DDP + fault‐tolerance), PEFT (LoRA), MLflow for experiment tracking

  • Checkpointing:

    • We save both the best val_loss checkpoint and the last epoch into ./checkpoints/ via Lightning’s ModelCheckpoint(save_top_k=1, save_last=True) callback (see the sketch below).
    • On worker restarts, Ray will supply the last checkpoint directory and Lightning will resume from checkpoints/last.ckpt.
  • Logging:

    • Metrics (train/val loss, epochs) are automatically logged to MLflow via the MLFlowLogger.
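
For reference, a minimal sketch of the checkpointing and logging wiring described above, as it would look inside the per-worker Lightning setup. The experiment name, tracking URI, and trainer options shown here are assumptions, and the actual script additionally wraps this in Ray Train:

import lightning.pytorch as pl
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.loggers import MLFlowLogger

# Keep the best val_loss checkpoint and always write last.ckpt so a
# restarted Ray worker can resume from checkpoints/last.ckpt.
checkpoint_cb = ModelCheckpoint(
    dirpath="./checkpoints/",
    monitor="val_loss",
    mode="min",
    save_top_k=1,
    save_last=True,
)

# Experiment name and tracking URI below are placeholders.
mlflow_logger = MLFlowLogger(
    experiment_name="llama2-legal-summarization",
    tracking_uri="http://<mlflow-host>:<port>",
)

trainer = pl.Trainer(
    max_epochs=3,                # illustrative value
    precision="bf16-mixed",      # illustrative value
    callbacks=[checkpoint_cb],
    logger=mlflow_logger,
)
# trainer.fit(lit_model, datamodule=dm, ckpt_path="last")  # resumes if last.ckpt exists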

8.3 Experiment Tracking

8.4 Retrain Code


9. Model Serving & Evaluation

9.1 Serving and API Endpoint

  • Merged the trained LoRA adapters into the Llama-2-7b base and exported the combined model as an FP16 ONNX file.
  • Benchmarked ONNX Runtime with the CPU, CUDA, and TensorRT execution providers, then selected the fastest execution path (a provider-selection sketch follows this list).
  • Registered the resulting model in MLflow as a checkpoint, which the FastAPI endpoint then pulls for inference.
  • Dockerfile: Dockerfile
  • Input: User Prompt appended with RAG output
  • Output: summary text
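
A minimal sketch of the provider-selection idea mentioned above: time a run of the exported ONNX model under each available execution provider and keep the fastest. The helper name and single-run timing are simplifications; the actual benchmark code in the repository may average over many requests:

import time

import onnxruntime as ort

def fastest_provider(onnx_path, sample_inputs):
    """Time one run of the model under each available provider (illustrative)."""
    candidates = ["TensorrtExecutionProvider",
                  "CUDAExecutionProvider",
                  "CPUExecutionProvider"]
    timings = {}
    for provider in candidates:
        if provider not in ort.get_available_providers():
            continue
        sess = ort.InferenceSession(onnx_path, providers=[provider])
        sess.run(None, sample_inputs)                 # warm-up run
        start = time.perf_counter()
        sess.run(None, sample_inputs)
        timings[provider] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings

# sample_inputs would be a dict mapping the exported graph's input names
# (e.g. input_ids, attention_mask) to numpy arrays of the right shape.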

9.2 Offline Evaluation

  • Ran the PyTest script (tests/test_offline_eval.py) to validate end-to-end preprocessing, inference, and summary format on sample inputs; see Monitoring_and_Evaluation/1_Setup_ModelEvalAndMonitoring.ipynb.
  • Executed the finalized model on the held-out test set to compute ROUGE metrics, then logged all scores to MLflow against the checkpoint registered in Section 9.1 (a minimal scoring sketch follows below).
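
A minimal sketch of the ROUGE evaluation and MLflow logging step, assuming a generate_summary() helper that wraps the served model; the metric names and run name are illustrative:

import json

import mlflow
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def evaluate_and_log(path, generate_summary):
    """Average ROUGE F1 over a .jsonl split and log the scores to MLflow."""
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    n = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            pred = generate_summary(row["judgement"])   # wraps the ONNX inference call
            scores = scorer.score(row["summary"], pred)
            for key in totals:
                totals[key] += scores[key].fmeasure
            n += 1
    metrics = {key: value / n for key, value in totals.items()}
    with mlflow.start_run(run_name="offline-eval"):      # run name is illustrative
        mlflow.log_metrics({f"test_{k}": v for k, v in metrics.items()})
    return metrics

# evaluate_and_log("test.jsonl", generate_summary)  # generate_summary defined elsewhere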

9.3 Load Testing

  • Ran a Locust simulation against the /generate endpoint while monitoring throughput, latency, and errors in Grafana’s “FastAPI Load Test” dashboard; see the load-testing notebook. A minimal Locust sketch follows below.
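
A minimal Locust user class for this kind of test; the payload shape is an assumption and should match the actual /generate request schema:

from locust import HttpUser, task, between

class SummarizationUser(HttpUser):
    wait_time = between(1, 5)   # seconds each simulated analyst waits between requests

    @task
    def generate(self):
        # Payload shape is an assumption; match the real /generate schema.
        self.client.post("/generate",
                         json={"prompt": "Summarize the following judgment: ..."})

This would be launched with, for example, locust -f locustfile.py --host http://<fastapi-host>:<port>, while the Grafana dashboard tracks throughput and latency.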

9.4 Business-Specific Evaluation

9.5 Staged Deployment


10. Online Data & Monitoring


11. CI/CD & Continuous Training

  • GitHub Actions workflow: CI git merge test
  • Triggers: push to main → tests → build Docker images → deploy to staging
  • Flask App: A Flask app takes input from the user, performs a RAG lookup, appends the retrieved context to the user prompt, and sends the augmented prompt to our ONNX model through FastAPI, which returns the summary. The summary is then displayed in the UI, and the user can download the summary text (a minimal request-flow sketch follows below).
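
A simplified sketch of that request flow, reusing the retrieve() helper from the RAG sketch in Section 7; the route name, endpoint URL, and payload shape are assumptions:

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
FASTAPI_URL = "http://<fastapi-host>:<port>/generate"    # assumed serving endpoint

@app.route("/summarize", methods=["POST"])               # route name is illustrative
def summarize():
    user_prompt = request.form["prompt"]
    # RAG lookup: retrieve() is the query-time helper sketched in Section 7.
    context = "\n\n".join(retrieve(user_prompt, k=5))
    augmented_prompt = f"{user_prompt}\n\nRelevant prior cases:\n{context}"
    resp = requests.post(FASTAPI_URL, json={"prompt": augmented_prompt}, timeout=300)
    return jsonify({"summary": resp.json().get("summary", "")})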
