Civil Code Copilot is a local-first FastAPI app for answering Japanese family‑law questions with grounded citations. It uses a hybrid retriever (BM25 + vector search via ChromaDB) over a small corpus, and can run entirely offline with a local LLM stub. Answers are conservative, cite JP statutes first, and avoid inventing numbers or laws.
- Hybrid retrieval: vector + BM25 with rank fusion and domain boosts
- JP focus: prioritizes Civil Code (e.g., 766, 819) and family‐law procedure
- Visitation‑aware routing: queries mentioning 「面会交流」 are routed to a visitation template and anchored to 民法 766
- Deterministic composer: quotes short statute snippets; shows banners when sources are missing (no hallucinated citations)
- Local‑only by default: ships with a minimal HTTP LLM stub
- Sessioned chat: history and backend choice saved per session
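As a rough illustration of the "rank fusion and domain boosts" idea, here is a minimal sketch using reciprocal rank fusion (RRF). This is an assumption about the general technique, not the repo's actual code in rag/retrieve.py: the function name, the constant k=60, the document ids, and the multiplicative boost are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, boosts=None):
    """Fuse several ranked lists of doc ids into one list, best first."""
    boosts = boosts or {}
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returned.
            scores[doc_id] += 1.0 / (k + rank)
    # Hypothetical domain boost: scale the fused score of favoured documents.
    for doc_id in scores:
        scores[doc_id] *= boosts.get(doc_id, 1.0)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative ids only; real ids depend on how the index was built.
bm25_hits = ["guide_visitation", "civil_766", "civil_819"]
vector_hits = ["civil_766", "guide_visitation", "guide_custody"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits], boosts={"civil_766": 1.5})
```

Without the boost, the two top documents would tie, since each is ranked first in one list and second in the other; the boost is what puts the statute ahead of the guide here.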
- Requirements: Python 3.11+, macOS/Linux/WSL. No GPU required.
- Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate
(Windows: ./.venv/Scripts/activate)
- Install dependencies
pip install -U pip
pip install -r requirements.txt
- Optional (improves JP BM25):
pip install sudachipy sudachidict_core
or
pip install janome
- Start the local LLM stub (echo‑style)
uvicorn dev.local_llm_stub:app --port 8080
- It listens on http://127.0.0.1:8080/generate.
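To check the stub from Python, a request can be built with the standard library alone. The sketch below is hedged: the JSON body with a "prompt" field is an assumption about the stub's request schema, which this README does not specify, so adjust it to match dev/local_llm_stub.py.

```python
import json
import urllib.request

LOCAL_LLM_URL = "http://127.0.0.1:8080/generate"


def build_generate_request(prompt, url=LOCAL_LLM_URL):
    """Build (but do not send) a POST request for the local stub."""
    # ASSUMPTION: the stub accepts a JSON object with a "prompt" key.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("面会交流 調停")
# With the stub running, send it with:
#   urllib.request.urlopen(req, timeout=10).read().decode("utf-8")
```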
- Run the app
- In a new terminal:
source .venv/bin/activate
uvicorn --env-file .env app.main:app --reload
- Open http://127.0.0.1:8000
- Ask a question (JP/EN)
- The app retrieves supporting text and generates a cautious, cited answer.
- Language switch (JP/EN/Both) affects only output language, not retrieval.
This repo includes a small processed dataset in data/processed/. To (re)build the Chroma index:
python -m rag.index
If you add more processed JSONL (e.g., JP guides or child‑support tables), re‑run the command. Collections:
- law_jp, law_en: statutes
- guides_jp, guides_en: practice guides
- practice_tables: child-support tables (算定表)
You control what the app can cite by placing raw source files under data/raw/ and generating data/processed/*.jsonl via the ingestion pipeline.
- See source.yml for the expected documents and their file paths (JP Civil Code XML, guides HTML, 算定表 PDFs, etc.).
- Place/download the files into the paths referenced in source.yml (or edit source.yml to point to your copies).
- Build processed chunks:
python -m ingest.loader
# reads source.yml, writes data/processed/*.jsonl
- Index into Chroma:
python -m rag.index
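Before indexing, it can help to sanity-check what the loader produced. The sketch below groups JSONL records by target collection. The record schema is an assumption: the "collection" and "text" keys are hypothetical, so check a real line in data/processed/ before relying on them.

```python
import json
from collections import defaultdict


def group_chunks_by_collection(lines):
    """Group JSONL records by collection name, skipping blank lines."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # ASSUMPTION: every record carries "collection" and "text" keys.
        grouped[record["collection"]].append(record["text"])
    return dict(grouped)


# Placeholder records; real chunks come from data/processed/*.jsonl.
sample_lines = [
    '{"collection": "law_jp", "text": "(statute chunk)"}',
    "",
    '{"collection": "guides_jp", "text": "(guide chunk 1)"}',
    '{"collection": "guides_jp", "text": "(guide chunk 2)"}',
]
grouped = group_chunks_by_collection(sample_lines)
counts = {name: len(chunks) for name, chunks in grouped.items()}
```

For a real file, pass an open file handle instead of the sample list, since iterating a text file yields its lines.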
Notes
- Statutes (e.g., 民法) should be XML to allow accurate article detection (e.g., 766, 819).
- Guides can be HTML; the loader chunks large sections automatically.
- 算定表 PDFs are supported (text extraction with fallback logic; OCR may be needed for scanned pages).
By default, the app uses the local stub. Configure via .env (already gitignored):
- Local default:
DEFAULT_LLM_BACKEND=local
LOCAL_LLM_URL=http://127.0.0.1:8080/generate
- Optional OpenAI (incurs cost):
ALLOW_EXTERNAL_LLM=true
DEFAULT_LLM_BACKEND=openai
OPENAI_API_KEY=...
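The interaction between these variables can be sketched as follows. This is an assumed policy, not the app's confirmed behavior: the helper resolve_backend is hypothetical, and falling back to the local stub when the external opt-in or the API key is missing is a guess at a safe default.

```python
def resolve_backend(env):
    """Pick an LLM backend name from a mapping of environment variables."""
    backend = env.get("DEFAULT_LLM_BACKEND", "local").strip().lower()
    allow_external = env.get("ALLOW_EXTERNAL_LLM", "false").strip().lower() == "true"
    has_key = bool(env.get("OPENAI_API_KEY"))
    # ASSUMPTION: the external backend needs both the explicit opt-in and a key.
    if backend == "openai" and allow_external and has_key:
        return "openai"
    # Anything else, including unknown names, falls back to the local stub.
    return "local"


local_only = resolve_backend({"DEFAULT_LLM_BACKEND": "openai"})
opted_in = resolve_backend(
    {
        "DEFAULT_LLM_BACKEND": "openai",
        "ALLOW_EXTERNAL_LLM": "true",
        "OPENAI_API_KEY": "sk-placeholder",
    }
)
```

In the app itself the mapping would be os.environ; passing a plain dict keeps the sketch easy to test.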
Run all tests:
pytest -q
Included tests cover:
- Issue routing bias for visitation queries
- No hallucinated statute numbers when statutes are missing
- Guide‑only output still contains phased schedule guidance (with a law‑missing banner)
- Retrieval returns a guide + 民法 766 for typical visitation prompts
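The "no hallucinated statute numbers" check can be pictured as a set comparison between article numbers in the answer and article numbers in the retrieved sources. The sketch below is illustrative only: the regular expression and helper names are assumptions, not the repo's actual test code.

```python
import re

# ASSUMPTION: citations look like "民法 766", "民法第766条", or "Article 766".
ARTICLE_RE = re.compile(r"(?:民法|Article)\s*(?:第\s*)?(\d+)")


def cited_articles(text):
    """Return the set of article numbers mentioned in a piece of text."""
    return set(ARTICLE_RE.findall(text))


def unsupported_citations(answer, sources):
    """Article numbers that appear in the answer but in none of the sources."""
    supported = set()
    for source in sources:
        supported |= cited_articles(source)
    return cited_articles(answer) - supported


grounded = unsupported_citations("See 民法 766.", ["民法第766条 (statute chunk)"])
ungrounded = unsupported_citations("See Article 819.", ["(guide chunk, no statute)"])
```

A test in this style would assert that the set of unsupported citations is empty for every composed answer.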
- Rebuild index:
python -m rag.index
- Inspect retrieval:
python -m rag.retrieve "面会交流 調停" --k 5 --lang jp --debug
- rag/retrieve.py: hybrid retriever (BM25 sampling across collections + vector search + rank fusion) with deterministic anchors (e.g., article 766 for visitation)
- rag/compose.py: conservative composer (statute quotes first, guide quotes second, explicit banners for missing sources)
- rag/index.py: indexer from data/processed/*.jsonl into Chroma collections
- app/: FastAPI app, Jinja templates, CSS
- db/chat_store.py: SQLite chat store (sessions + messages)
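The deterministic anchoring noted for rag/retrieve.py can be sketched as a keyword check that runs before retrieval. The template names and the return shape below are hypothetical; only the keyword 「面会交流」 and the 民法 766 anchor come from this README.

```python
VISITATION_KEYWORD = "面会交流"


def route_query(query):
    """Choose a template and any statute anchors for a user query."""
    if VISITATION_KEYWORD in query:
        # Visitation queries always carry the 民法 766 anchor, whatever the
        # retriever happens to rank first.
        return {"template": "visitation", "anchors": ["民法 766"]}
    # ASSUMPTION: "general" is a placeholder name for the default template.
    return {"template": "general", "anchors": []}


visitation_route = route_query("面会交流 調停")
other_route = route_query("child support table")
```

Because the rule is a plain substring match, the same query always produces the same route, which makes the behavior easy to cover in tests.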
- Sensitive files are ignored by default: .env, data/case/**, case_profile.yml, db/**, chroma/**
- Composer and tests explicitly avoid fabricating statute numbers or numeric bands when sources are missing.
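A minimal sketch of that composing rule, assuming a hypothetical compose function and banner wording (the real logic lives in rag/compose.py and may differ): statute quotes come first, guide quotes second, and a missing source type yields a banner instead of invented text.

```python
def compose(statute_quotes, guide_quotes):
    """Assemble an answer body from quotes, with banners for missing sources."""
    lines = []
    if statute_quotes:
        lines.extend(f"[statute] {quote}" for quote in statute_quotes)
    else:
        # No statute retrieved: say so rather than guessing an article number.
        lines.append("[banner] No statute source found; no article is cited.")
    if guide_quotes:
        lines.extend(f"[guide] {quote}" for quote in guide_quotes)
    else:
        lines.append("[banner] No practice guide found.")
    return "\n".join(lines)


guide_only = compose([], ["(guide chunk)"])
```

Here guide_only starts with the statute banner and still includes the guide quote, mirroring the guide-only case described in the tests.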
You can provide light personal context the app can refer to (kept locally and git‑ignored):
- Create data/case/case_profile.yml (preferred) or case_profile.yml alongside the repo.
- Example:
initials: "AB"
age: 36
children: 1
income_parent_a: 500
income_parent_b: 200
- The composer includes a short “Mapping to your profile” section using these fields. Do not put anything sensitive; keep it high-level.
- The app keeps context per session in db/chat.sqlite3 (git-ignored). Use the Session switcher at the bottom to create/select sessions.
- To clear history, delete the db/ folder while the app is stopped.
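For orientation, a sessions-plus-messages store can be sketched with the standard sqlite3 module. The table and column names below are hypothetical and are not taken from db/chat_store.py; the sketch uses an in-memory database so it does not touch db/chat.sqlite3.

```python
import sqlite3

# ASSUMPTION: a two-table layout; the real schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    backend TEXT NOT NULL DEFAULT 'local'
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open a store at the given path and create the tables if needed."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def new_session(conn, backend="local"):
    """Create a session row and return its id."""
    cur = conn.execute("INSERT INTO sessions (backend) VALUES (?)", (backend,))
    conn.commit()
    return cur.lastrowid


def add_message(conn, session_id, role, content):
    """Append one message to a session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def history(conn, session_id):
    """Return (role, content) pairs for a session in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


conn = open_store()
session_id = new_session(conn)
add_message(conn, session_id, "user", "面会交流 調停")
add_message(conn, session_id, "assistant", "(cited answer)")
```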
This project provides legal information, not legal advice. Always consult a qualified attorney for specific cases.