
LLM-QA-System Wiki

Welcome to the LLM-QA-System wiki!


Table of Contents

  1. Project Purpose
  2. Why This Project?
  3. What Makes This Project Different?
  4. How It Works (Architecture)
  5. How We Created It
  6. Getting Started
  7. API Reference
  8. Troubleshooting & FAQ

Project Purpose

The LLM-QA-System was created to make technical documentation truly searchable and interactive.
It allows users and teams to ask natural language questions about large, complex documentation and receive context-aware answers.

Motivation:

  • Technical docs become huge and hard to search.
  • Keyword search is too basic; it misses synonyms and context.
  • LLMs and vector search enable "semantic" Q&A.

Goals:

  • High-quality, interactive Q&A over any docs.
  • Modular, open-source, and easy to deploy.
  • Extensible for any domain or team.

Why This Project?

  • Existing solutions are often closed, expensive, or hard to extend.
  • Keyword search misses intent and context.
  • This project is fast, open, modular, and easy to run anywhere.
  • Vision: Empower everyone to "chat with their docs" using state-of-the-art, open technology.

What Makes This Project Different?

  • Modular: Rust backend and Python ML service are independent.
  • Open: All code is open-source and easily extensible.
  • Fast: Async Rust, efficient Python, and FAISS for semantic search.
  • Trustworthy: Returns both answers and the source snippets for verification.
  • Easy to Deploy: One-command Docker Compose setup.
  • Developer Friendly: Clear and documented API, easy to extend or swap out any component.

How It Works (Architecture)

Components:

  • Rust Backend API (Actix-web):
    Receives REST requests, forwards queries to the Python ML service, and returns answers plus source snippets.
  • Python ML Service (FastAPI):
    Loads a transformer model (e.g., BERT), embeds docs and queries, uses FAISS for similarity search, and returns the best matches.
  • FAISS Vector Database:
    Stores vector embeddings of doc chunks for fast semantic search.

Typical Flow:

  1. The user sends a question to the Rust backend (POST /api/query).
  2. The Rust backend forwards the question to the Python ML service.
  3. The Python ML service embeds the question, searches the FAISS index, and finds the best-matching doc snippets.
  4. The Python service returns the answer and sources to the Rust backend.
  5. The Rust backend responds to the user.

Diagram:

Client (UI/CLI) <---> Rust Backend API <---> Python ML Inference Service <---> FAISS Vector DB
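
To make the flow concrete, here is a minimal, illustrative sketch of the Python ML service's side of the pipeline: embed the incoming question, search the FAISS index, and return the best-matching snippets as sources. The model name, file paths under data/, and exact response fields are assumptions for illustration; the real ml-inference/app.py may differ.

    # Illustrative sketch of the /inference flow (not the actual ml-inference/app.py).
    # Assumes a sentence-transformers embedding model and a prebuilt index in data/.
    import faiss
    from fastapi import FastAPI
    from pydantic import BaseModel
    from sentence_transformers import SentenceTransformer

    app = FastAPI()
    model = SentenceTransformer("all-MiniLM-L6-v2")   # hypothetical model choice
    index = faiss.read_index("data/faiss.index")      # built by scripts/build_index.py
    docs = open("data/docs.txt", encoding="utf-8").read().splitlines()

    class Query(BaseModel):
        question: str

    @app.post("/inference")
    def inference(query: Query, top_k: int = 3):
        # Embed the question and retrieve the nearest document chunks.
        vector = model.encode([query.question])
        _, ids = index.search(vector, top_k)
        sources = [docs[i] for i in ids[0] if 0 <= i < len(docs)]
        answer = "Relevant info: " + (sources[0] if sources else "no match found")
        return {"answer": answer, "sources": sources}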

How We Created It

  1. Requirements:

    • Needed a better documentation search/Q&A tool.
    • Decided on features: semantic search, modular, open, easy to deploy.
  2. Technology Choices:

    • Rust (backend): performance, reliability, async support.
    • Python (ML): best ML libraries.
    • FAISS: fast vector search.
    • Docker Compose: easy orchestration.
  3. System Design:

    • RESTful APIs for clear separation.
    • Clean contracts between backend and ML service.
  4. Implementation:

    • Rust backend with /api/query and /api/status.
    • Python FastAPI for model inference and FAISS search.
    • Preprocessing/indexing script for docs.
    • Dockerfiles for both services and shared data volume.
    • CI workflow for automated build/test.
  5. Documentation:

    • Used the GitHub Wiki for step-by-step documentation.

Getting Started

Prerequisites

  • Docker & Docker Compose
  • Python 3.8+ (for preprocessing)
  • Git

Setup Steps

  1. Clone the repository:

    git clone https://github.com/c0d3h01/llm-qa-system.git
    cd llm-qa-system
  2. Add your documentation:

    • Place .txt or .md files into data/raw/.
    • Each line or paragraph will be a chunk.
  3. Build the vector index (see the sketch after these steps):

    pip install -r ml-inference/requirements.txt
    python scripts/build_index.py
  4. Start the system:

    docker compose up --build
  5. Test the API:

    curl http://localhost:8080/api/status
    curl -X POST http://localhost:8080/api/query \
      -H "Content-Type: application/json" \
      -d '{"question": "What is OAuth2?"}'

API Reference

Rust Backend

  • GET /api/status
    Health check. Returns "LLM-QA Backend is running".

  • POST /api/query
    Request:

    { "question": "How do I reset my password?" }

    Response:

    {
      "answer": "Relevant info: ...",
      "sources": [
        "source snippet 1",
        "source snippet 2"
      ]
    }
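
The same request can also be made from Python; this is a small usage sketch against the request/response shape documented above (the URL assumes the default local setup).

    # Query the Rust backend from Python using the documented /api/query contract.
    import requests

    resp = requests.post(
        "http://localhost:8080/api/query",
        json={"question": "How do I reset my password?"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    print("Answer:", data["answer"])
    for source in data["sources"]:
        print("Source:", source)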

Python ML Service

  • POST /inference
    Internal endpoint for answering queries (called by the Rust backend).

Troubleshooting & FAQ

Common Issues

  • ML service fails to start:

    • Ensure Python dependencies are installed:
      pip install -r ml-inference/requirements.txt
    • Check that data/faiss.index and data/docs.txt exist.
  • Rust backend can't connect:

    • Check the ML_INFERENCE_URL environment variable.
    • Use docker compose logs for debugging.
  • Query returns no results:

    • Make sure your docs are in data/raw/ and the index has been rebuilt.

FAQ

  • Can I use my own transformer model?
    Yes, change MODEL_NAME in ml-inference/app.py or set it via an environment variable (see the sketch at the end of this FAQ).

  • How do I add new docs?
    Put them in data/raw/, rebuild the index, and restart the stack.

  • Is this production ready?
    See Improvement Ideas for steps to productionize.

  • Can I run this on Windows/Mac?
    Yes, with Docker Desktop.
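
For the model question above, here is a minimal sketch of selecting the model via the MODEL_NAME environment variable; the default value and loader shown are assumptions, and the actual ml-inference/app.py may differ.

    # Illustrative: pick the transformer model from the MODEL_NAME env var.
    import os
    from sentence_transformers import SentenceTransformer

    MODEL_NAME = os.getenv("MODEL_NAME", "all-MiniLM-L6-v2")  # hypothetical default
    model = SentenceTransformer(MODEL_NAME)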