Civil Code Copilot

Civil Code Copilot is a local-first FastAPI app for answering Japanese family‑law questions with grounded citations. It uses a hybrid retriever (BM25 + vector search via ChromaDB) over a small corpus, and can run entirely offline with a local LLM stub. Answers are conservative, cite JP statutes first, and avoid inventing numbers or laws.

Features

Hybrid retrieval: vector + BM25 with rank fusion and domain boosts
JP focus: prioritizes Civil Code (e.g., 766, 819) and family‐law procedure
Visitation‑aware routing: queries mentioning 「面会交流」 are routed to a visitation template and anchored to 民法 766
Deterministic composer: quotes short statute snippets; shows banners when sources are missing (no hallucinated citations)
Local‑only by default: ships with a minimal HTTP LLM stub
Sessioned chat: history and backend choice saved per session
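
The "rank fusion and domain boosts" idea can be illustrated with reciprocal rank fusion (RRF). This is a minimal sketch, not the code in rag/retrieve.py: the README does not say which fusion formula is used, so RRF is an assumption here, and the document IDs, the `k` constant, and the boost multiplier are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, boosts=None):
    """Fuse several best-first lists of document IDs into one ranking.

    rankings: list of lists of IDs, each ordered best-first.
    k: smoothing constant (60 is a commonly used default for RRF).
    boosts: optional {doc_id: multiplier}, a stand-in for domain boosts.
    """
    boosts = boosts or {}
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    for doc_id in scores:
        scores[doc_id] *= boosts.get(doc_id, 1.0)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical IDs standing in for BM25 and vector search results.
bm25_hits = ["guide_visitation_1", "civil_code_766", "civil_code_819"]
vector_hits = ["civil_code_766", "guide_visitation_2", "guide_visitation_1"]
fused = reciprocal_rank_fusion(
    [bm25_hits, vector_hits], boosts={"civil_code_766": 1.5}
)
print(fused)
```

Because RRF only looks at ranks, BM25 scores and vector distances never need to be normalized onto a common scale.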

Quick Start

Requirements: Python 3.11+, macOS/Linux/WSL. No GPU required.

Create a virtual environment

python3 -m venv .venv
source .venv/bin/activate (Windows: .\.venv\Scripts\activate)

Install dependencies

pip install -U pip
pip install -r requirements.txt
Optional (improves JP BM25): pip install sudachipy sudachidict_core or pip install janome

Start the local LLM stub (echo‑style)

uvicorn dev.local_llm_stub:app --port 8080
It listens on http://127.0.0.1:8080/generate.

Run the app

In a new terminal: source .venv/bin/activate
uvicorn --env-file .env app.main:app --reload
Open http://127.0.0.1:8000

Ask a question (JP/EN)

The app retrieves supporting text and generates a cautious, cited answer.
Language switch (JP/EN/Both) affects only output language, not retrieval.

Data & Indexing

This repo includes a small processed dataset in data/processed/. To (re)build the Chroma index:

python -m rag.index

If you add more processed JSONL (e.g., JP guides or child‑support tables), re‑run the command. The index uses these collections:

law_jp, law_en — statutes
guides_jp, guides_en — practice guides
practice_tables — child‑support tables (算定表)
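
Before re-indexing, it can help to check what is actually in data/processed/. The sketch below only counts JSON records per file using the standard library; it assumes one JSON object per line and makes no assumption about the fields inside each record. `count_chunks` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path


def count_chunks(processed_dir="data/processed"):
    """Return {filename: number of JSON records} for each .jsonl file."""
    counts = {}
    for path in sorted(Path(processed_dir).glob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed record
                    n += 1
        counts[path.name] = n
    return counts


# Usage: print(count_chunks()) from the repository root.
```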

Corpus setup (downloading source documents)

You control what the app can cite by placing raw source files under data/raw/ and generating data/processed/*.jsonl via the ingestion pipeline.

See source.yml for the expected documents and their file paths (JP Civil Code XML, guides HTML, 算定表 PDFs, etc.).
Place/download the files into the paths referenced in source.yml (or edit source.yml to point to your copies).
Build processed chunks:
- python -m ingest.loader # reads source.yml, writes data/processed/*.jsonl
Index into Chroma:
- python -m rag.index

Notes

Statutes (e.g., 民法) should be XML to allow accurate article detection (e.g., 766, 819).
Guides can be HTML; the loader chunks large sections automatically.
算定表 PDFs are supported (text extraction with fallback logic; OCR may be needed for scanned pages).
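
To show why XML makes article detection reliable, here is a minimal sketch that maps article numbers to their text. It assumes an e-Gov style layout in which each article is an `<Article Num="...">` element containing `<Sentence>` elements; the real loader in ingest/ may work differently, and the sample below uses placeholder text rather than actual statute wording.

```python
import xml.etree.ElementTree as ET

# Placeholder document; the element names are an assumption about the input format.
SAMPLE = """<Law><LawBody><MainProvision>
  <Article Num="766">
    <ArticleTitle>第七百六十六条</ArticleTitle>
    <Paragraph Num="1"><ParagraphSentence>
      <Sentence>(placeholder text for article 766)</Sentence>
    </ParagraphSentence></Paragraph>
  </Article>
  <Article Num="819">
    <ArticleTitle>第八百十九条</ArticleTitle>
    <Paragraph Num="1"><ParagraphSentence>
      <Sentence>(placeholder text for article 819)</Sentence>
    </ParagraphSentence></Paragraph>
  </Article>
</MainProvision></LawBody></Law>"""


def extract_articles(xml_text):
    """Return {article number: concatenated sentence text}."""
    root = ET.fromstring(xml_text)
    articles = {}
    for article in root.iter("Article"):
        num = article.get("Num")
        if num is None:
            continue  # skip articles without an explicit number
        sentences = [s.text.strip() for s in article.iter("Sentence") if s.text]
        articles[num] = " ".join(sentences)
    return articles


articles = extract_articles(SAMPLE)
print(sorted(articles))
```

With the article number available as an attribute, no pattern matching over free text is needed to find 766 or 819.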

Backends & Environment

By default, the app uses the local stub. Configure via .env (already gitignored):

Local default:
- DEFAULT_LLM_BACKEND=local
- LOCAL_LLM_URL=http://127.0.0.1:8080/generate
Optional OpenAI (incurs cost):
- ALLOW_EXTERNAL_LLM=true
- DEFAULT_LLM_BACKEND=openai
- OPENAI_API_KEY=...
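
The variables above suggest that the external backend is gated behind an explicit opt-in. The following sketch shows one way such a gate could be resolved. The variable names come from this README, but the fallback behaviour (silently using the local stub when external calls are not allowed) and the `resolve_backend` helper are assumptions, not the app's confirmed logic.

```python
import os


def resolve_backend(env=None):
    """Pick "local" or "openai" from environment-style settings."""
    env = os.environ if env is None else env
    backend = env.get("DEFAULT_LLM_BACKEND", "local").strip().lower()
    allow_external = env.get("ALLOW_EXTERNAL_LLM", "false").strip().lower() == "true"
    if backend == "openai":
        if not allow_external:
            # Assumed behaviour: stay local unless external calls are enabled.
            return "local"
        if not env.get("OPENAI_API_KEY"):
            raise ValueError("OPENAI_API_KEY is required for the openai backend")
        return "openai"
    return "local"


print(resolve_backend({"DEFAULT_LLM_BACKEND": "openai"}))
```

Requiring both settings means a stray DEFAULT_LLM_BACKEND=openai alone cannot trigger paid API calls.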

Tests

Run all tests:

pytest -q

Included tests cover:

Issue routing bias for visitation queries
No hallucinated statute numbers when statutes are missing
Guide‑only output still contains phased schedule guidance (with a law‑missing banner)
Retrieval returns a guide + 民法 766 for typical visitation prompts
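
As an illustration of the "no hallucinated statute numbers" check, the sketch below compares article numbers mentioned in an answer with those present in the retrieved sources. It is not taken from tests/: the regular expression, the helper names, and the sample strings are all hypothetical.

```python
import re
import unicodedata

# Matches e.g. "民法 766", "民法第766条", "Article 819" (illustrative, not exhaustive).
ARTICLE_PATTERN = re.compile(r"(?:民法|Article|Art\.)\s*第?\s*(\d+)")


def cited_articles(text):
    """Return the set of article numbers mentioned in text."""
    # NFKC folds full-width digits into ASCII digits before matching.
    normalized = unicodedata.normalize("NFKC", text)
    return set(ARTICLE_PATTERN.findall(normalized))


def unsupported_articles(answer, sources):
    """Article numbers that appear in the answer but in none of the sources."""
    available = set()
    for source in sources:
        available |= cited_articles(source)
    return cited_articles(answer) - available


def test_answer_cites_only_retrieved_articles():
    sources = ["民法 766 (retrieved statute chunk)"]
    answer = "See 民法 766 for visitation arrangements."
    assert unsupported_articles(answer, sources) == set()
```

A pytest run would collect `test_answer_cites_only_retrieved_articles` automatically; any number in the answer that the sources do not contain shows up in the returned set.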

CLI Utilities

Rebuild index: python -m rag.index
Inspect retrieval: python -m rag.retrieve "面会交流調停" --k 5 --lang jp --debug

Architecture (brief)

rag/retrieve.py — Hybrid retriever: BM25 sampling across collections + vector search + rank fusion; deterministic anchors (e.g., article 766 for visitation)
rag/compose.py — Conservative composer: statute quotes first, guide quotes second, explicit banners for missing sources
rag/index.py — Indexer from data/processed/*.jsonl into Chroma collections
app/ — FastAPI app, Jinja templates, CSS
db/chat_store.py — SQLite chat store (sessions + messages)
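
The composer's ordering (statutes first, guides second, banners for missing sources) can be sketched as a pure function. This is a simplified illustration, not rag/compose.py: the `citation`, `title`, and `text` fields, the banner wording, and the quote length are assumptions.

```python
def compose(statutes, guides, max_quote=120):
    """Build a deterministic answer skeleton from retrieved chunks.

    statutes: list of {"citation": str, "text": str} (hypothetical fields)
    guides:   list of {"title": str, "text": str}    (hypothetical fields)
    """
    lines = []
    if not statutes:
        # Say so explicitly instead of citing an article that was not retrieved.
        lines.append("[Banner] No statute text was retrieved; no articles are cited.")
    for statute in statutes:
        lines.append(f'{statute["citation"]}: "{statute["text"][:max_quote]}"')
    if not guides:
        lines.append("[Banner] No practice guide was retrieved.")
    for guide in guides:
        lines.append(f'{guide["title"]}: "{guide["text"][:max_quote]}"')
    return "\n".join(lines)


print(compose([], [{"title": "Visitation guide", "text": "Start with short visits."}]))
```

Because the output depends only on the retrieved chunks, the same inputs always produce the same text, which is what makes the behaviour testable.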

Privacy & Safety

Sensitive files are ignored by default:
- .env, data/case/**, case_profile.yml, db/**, chroma/**
Composer and tests explicitly avoid fabricating statute numbers or numeric bands when sources are missing.

Case profile (personal context)

You can provide light personal context the app can refer to (kept locally and git‑ignored):

Create data/case/case_profile.yml (preferred) or case_profile.yml alongside the repo.
Example:

initials: "AB"
age: 36
children: 1
income_parent_a: 500
income_parent_b: 200

The composer includes a short “Mapping to your profile” section using these fields. Do not include anything sensitive; keep the profile high‑level.

Sessions and chat history

The app keeps context per session in db/chat.sqlite3 (git‑ignored). Use the Session switcher at the bottom to create/select sessions.
To clear history, delete the db/ folder while the app is stopped.

Disclaimer

This project provides legal information, not legal advice. Always consult a qualified attorney for specific cases.
