vianarafael/civil-code-copilot

Civil Code Copilot

Civil Code Copilot is a local-first FastAPI app for answering Japanese family‑law questions with grounded citations. It uses a hybrid retriever (BM25 + vector search via ChromaDB) over a small corpus, and can run entirely offline with a local LLM stub. Answers are conservative, cite JP statutes first, and avoid inventing numbers or laws.

Features

  • Hybrid retrieval: vector + BM25 with rank fusion and domain boosts
  • JP focus: prioritizes Civil Code (e.g., 766, 819) and family‐law procedure
  • Visitation‑aware routing: queries mentioning 「面会交流」 are routed to a visitation template and anchored to 民法 766
  • Deterministic composer: quotes short statute snippets; shows banners when sources are missing (no hallucinated citations)
  • Local‑only by default: ships with a minimal HTTP LLM stub
  • Sessioned chat: history and backend choice saved per session
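
The hybrid retrieval bullet mentions rank fusion with domain boosts but does not name the fusion method. As a rough illustration only, here is a minimal sketch using reciprocal rank fusion (RRF), a common choice; whether this repo uses RRF is an assumption, and the function name, the `boosts` parameter, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, boosts=None):
    """Fuse several best-first lists of document IDs into one ranking.

    rankings: list of lists of doc IDs, each ordered best-first.
    k: smoothing constant (60 is a commonly used value for RRF).
    boosts: optional {doc_id: multiplier} standing in for domain boosts
            (hypothetical; not taken from the repo).
    """
    boosts = boosts or {}
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    for doc_id in scores:
        scores[doc_id] *= boosts.get(doc_id, 1.0)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["guide_12", "civil_766", "civil_819"]
vector_hits = ["civil_766", "guide_07", "guide_12"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits], boosts={"civil_766": 1.2})
print(fused)
```

A document that ranks well in both lists rises to the top, and a boost simply scales its fused score, which is one simple way to prioritize statutes over guides.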

Quick Start

  • Requirements: Python 3.11+, macOS/Linux/WSL. No GPU required.
  1. Create a virtual environment
    • python3 -m venv .venv
    • source .venv/bin/activate (Windows: .venv\Scripts\activate)
  2. Install dependencies
    • pip install -U pip
    • pip install -r requirements.txt
    • Optional (improves JP BM25): pip install sudachipy sudachidict_core or pip install janome
  3. Start the local LLM stub (echo‑style)
    • uvicorn dev.local_llm_stub:app --port 8080
    • It listens on http://127.0.0.1:8080/generate.
  4. Run the app
    • In a new terminal: source .venv/bin/activate
    • uvicorn --env-file .env app.main:app --reload
    • Open http://127.0.0.1:8000
  5. Ask a question (JP/EN)
    • The app retrieves supporting text and generates a cautious, cited answer.
    • The language switch (JP/EN/Both) affects only the output language, not retrieval.

Data & Indexing

This repo includes a small processed dataset in data/processed/. To (re)build the Chroma index:

  • python -m rag.index

If you add more processed JSONL (e.g., JP guides or child‑support tables), re‑run the command. Collections:

  • law_jp, law_en — statutes
  • guides_jp, guides_en — practice guides
  • practice_tables — child‑support tables (算定表)
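
To make the collection layout concrete, the sketch below groups processed JSONL records by target collection before indexing. The record fields (`id`, `collection`, `text`) are hypothetical; the actual schema of data/processed/*.jsonl is not described here, so treat this as an illustration rather than the indexer's real logic.

```python
import json
from collections import defaultdict


def group_by_collection(jsonl_lines):
    """Group JSONL records by their (assumed) 'collection' field."""
    groups = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        groups[record["collection"]].append(record)
    return dict(groups)


# Hypothetical records; the text values are placeholders, not real corpus content.
sample = [
    '{"id": "civil_766", "collection": "law_jp", "text": "(statute text)"}',
    '{"id": "guide_07", "collection": "guides_jp", "text": "(guide text)"}',
    '{"id": "civil_819", "collection": "law_jp", "text": "(statute text)"}',
]
groups = group_by_collection(sample)
print({name: len(records) for name, records in groups.items()})
```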

Corpus setup (downloading source documents)

You control what the app can cite by placing raw source files under data/raw/ and generating data/processed/*.jsonl via the ingestion pipeline.

  • See source.yml for the expected documents and their file paths (JP Civil Code XML, guides HTML, 算定表 PDFs, etc.).
  • Place/download the files into the paths referenced in source.yml (or edit source.yml to point to your copies).
  • Build processed chunks:
    • python -m ingest.loader # reads source.yml, writes data/processed/*.jsonl
  • Index into Chroma:
    • python -m rag.index

Notes

  • Statutes (e.g., 民法) should be XML to allow accurate article detection (e.g., 766, 819).
  • Guides can be HTML; the loader chunks large sections automatically.
  • 算定表 PDFs are supported (text extraction with fallback logic; OCR may be needed for scanned pages).

Backends & Environment

By default, the app uses the local stub. Configure via .env (already gitignored):

  • Local default:
    • DEFAULT_LLM_BACKEND=local
    • LOCAL_LLM_URL=http://127.0.0.1:8080/generate
  • Optional OpenAI (incurs cost):
    • ALLOW_EXTERNAL_LLM=true
    • DEFAULT_LLM_BACKEND=openai
    • OPENAI_API_KEY=...
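
The interaction between these variables can be sketched as follows. This is a guess at reasonable behavior, not the app's actual code: the function name `resolve_backend` is hypothetical, and falling back to the local stub when ALLOW_EXTERNAL_LLM is not set to true is an assumption.

```python
import os


def resolve_backend(env=None):
    """Pick the LLM backend from environment-style settings.

    Assumption: an external backend is only honored when
    ALLOW_EXTERNAL_LLM=true; otherwise we fall back to the local stub.
    """
    env = os.environ if env is None else env
    backend = env.get("DEFAULT_LLM_BACKEND", "local").strip().lower()
    allow_external = env.get("ALLOW_EXTERNAL_LLM", "false").strip().lower() == "true"
    if backend == "openai":
        if not allow_external:
            return "local"  # external calls not permitted; stay local
        if not env.get("OPENAI_API_KEY"):
            raise RuntimeError("DEFAULT_LLM_BACKEND=openai requires OPENAI_API_KEY")
    return backend


print(resolve_backend({"DEFAULT_LLM_BACKEND": "openai"}))
```

Requiring an explicit opt-in flag in addition to the backend name keeps an accidental `DEFAULT_LLM_BACKEND=openai` from sending data off the machine or incurring cost.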

Tests

Run all tests:

  • pytest -q

Included tests cover:

  • Issue routing bias for visitation queries
  • No hallucinated statute numbers when statutes are missing
  • Guide‑only output still contains phased schedule guidance (with a law‑missing banner)
  • Retrieval returns a guide + 民法 766 for typical visitation prompts
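
For a sense of what the routing test checks, here is a minimal, self-contained sketch. `route_issue`, the template names, and the English trigger word "visitation" are hypothetical; only the 「面会交流」 trigger and the 民法 766 anchor come from the feature description above.

```python
# Trigger terms: 面会交流 is from the README; "visitation" is an assumed EN equivalent.
VISITATION_TERMS = ("面会交流", "visitation")


def route_issue(query):
    """Return (template_name, anchor_citations) for a user query."""
    lowered = query.lower()
    if any(term in lowered for term in VISITATION_TERMS):
        return "visitation", ["民法 766"]
    return "general", []


def test_visitation_queries_are_routed():
    template, anchors = route_issue("面会交流の調停について教えてください")
    assert template == "visitation"
    assert "民法 766" in anchors


def test_other_queries_are_not_anchored():
    assert route_issue("How is property divided?") == ("general", [])


# pytest would collect the test functions; calling them directly also works.
test_visitation_queries_are_routed()
test_other_queries_are_not_anchored()
```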

CLI Utilities

  • Rebuild index: python -m rag.index
  • Inspect retrieval: python -m rag.retrieve "面会交流 調停" --k 5 --lang jp --debug

Architecture (brief)

  • rag/retrieve.py — Hybrid retriever: BM25 sampling across collections + vector search + rank fusion; deterministic anchors (e.g., article 766 for visitation)
  • rag/compose.py — Conservative composer: statute quotes first, guide quotes second, explicit banners for missing sources
  • rag/index.py — Indexer from data/processed/*.jsonl into Chroma collections
  • app/ — FastAPI app, Jinja templates, CSS
  • db/chat_store.py — SQLite chat store (sessions + messages)
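
The composer's ordering and banner behavior can be illustrated with a small deterministic function. This is a sketch of the idea described for rag/compose.py, not its actual code: the `compose` signature, the dictionary fields, and the banner wording are hypothetical, and the quoted texts are placeholders rather than real statute or guide text.

```python
def compose(statutes, guides, max_quote_chars=120):
    """Build an answer body: statute quotes first, guide quotes second.

    Each item is a dict with 'citation' and 'text' (assumed field names).
    A banner replaces any section whose sources are missing, so no
    citation appears that was not retrieved.
    """
    lines = []
    if statutes:
        lines.append("Statutes:")
        for item in statutes:
            citation = item["citation"]
            quote = item["text"][:max_quote_chars]  # keep quotes short
            lines.append(f'  {citation}: "{quote}"')
    else:
        lines.append("[Banner] No statute text was retrieved; no article numbers are cited.")
    if guides:
        lines.append("Guides:")
        for item in guides:
            citation = item["citation"]
            quote = item["text"][:max_quote_chars]
            lines.append(f'  {citation}: "{quote}"')
    else:
        lines.append("[Banner] No practice guide was retrieved.")
    return "\n".join(lines)


# Guide-only case: the statute banner appears and no article number is invented.
print(compose([], [{"citation": "Guide A", "text": "(guide excerpt)"}]))
```

Because the output depends only on the retrieved items, the same inputs always yield the same text, which is what makes the behavior testable.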

Privacy & Safety

  • Sensitive files are ignored by default:
    • .env, data/case/**, case_profile.yml, db/**, chroma/**
  • Composer and tests explicitly avoid fabricating statute numbers or numeric bands when sources are missing.
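
One way to check for fabricated statute numbers is to compare the article numbers mentioned in an answer against those present in the retrieved sources. The helper below is a hypothetical sketch of such a guard; the regular expression and the function name are assumptions, not the repo's actual checks.

```python
import re

# Matches forms like "民法 766", "民法第766条", "Article 819", "Art. 766".
ARTICLE_RE = re.compile(r"(?:民法|Article|Art\.)\s*(?:第)?\s*(\d+)")


def uncited_articles(answer, sources):
    """Return article numbers that appear in the answer but in no source text."""
    allowed = set()
    for text in sources:
        allowed.update(ARTICLE_RE.findall(text))
    mentioned = set(ARTICLE_RE.findall(answer))
    return sorted(mentioned - allowed, key=int)


# Placeholder strings; the source text here is not real statute text.
sources = ["民法第766条 (statute text)"]
print(uncited_articles("民法 766 and Article 819 may apply.", sources))
```

An empty result means every cited article is backed by a retrieved source; a non-empty one is a signal to drop the citation or show a banner.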

Case profile (personal context)

You can provide light personal context the app can refer to (kept locally and git‑ignored):

  • Create data/case/case_profile.yml (preferred) or case_profile.yml in the repository root.

  • Example:

    initials: "AB"
    age: 36
    children: 1
    income_parent_a: 500
    income_parent_b: 200

  • The composer includes a short “Mapping to your profile” section using these fields. Do not include anything sensitive; keep it high‑level.

Sessions and chat history

  • The app keeps context per session in db/chat.sqlite3 (git‑ignored). Use the Session switcher at the bottom to create/select sessions.
  • To clear history, delete the db/ folder while the app is stopped.
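
A sessions-plus-messages store of this kind can be sketched with the standard sqlite3 module. The table and column names below are hypothetical and are not taken from db/chat_store.py; the example uses an in-memory database so it leaves no files behind.

```python
import sqlite3

# Assumed schema: one row per session (with its backend choice), one per message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    backend TEXT NOT NULL DEFAULT 'local'
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create_session(conn, name, backend="local"):
    cur = conn.execute(
        "INSERT INTO sessions (name, backend) VALUES (?, ?)", (name, backend)
    )
    conn.commit()
    return cur.lastrowid


def add_message(conn, session_id, role, content):
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def history(conn, session_id):
    """Return (role, content) pairs for one session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


conn = open_store()
sid = create_session(conn, "demo")
add_message(conn, sid, "user", "面会交流について")
print(history(conn, sid))
```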

Disclaimer

This project provides legal information, not legal advice. Always consult a qualified attorney for specific cases.
