---
title: ThreadNavigatorAI2.0
emoji: 🧵
colorFrom: red
colorTo: green
sdk: streamlit
sdk_version: 1.33.0
app_file: app.py
pinned: true
---
🚀 ThreadNavigatorAI 2.0 is a modular, multi-agent Reddit analyzer that summarizes threads, verifies claims, and scores insights using LLM-as-a-Judge. Built for high-scale moderation and discourse comprehension under real-world latency and token constraints.

🎯 Designed for folks evaluating top-tier AI engineering talent.
ThreadNavigatorAI 2.0 builds directly on v1.0 by introducing:
| Feature | 1.0 | 🚀 2.0 Upgrade |
|---|---|---|
| Agent Orchestration | LangGraph (Reply, Mod) | Modular pipeline (Summarizer, FC, Eval) |
| Summarization | Prompt + RAG | Semantic + config-driven transformer |
| Moderation | Rule-based + LLM | 🔍 Fact Checker Agent (tool-augmented) |
| Evaluation | Manual | 🧠 LLM-as-a-Judge (rubric-based) |
| UI | Basic Streamlit | ✨ Tabbed layout, latency toggle, download |
| Scalability | Single-thread only | 100 threads (real + mock hybrid) |
```
Thread JSON → [Multi-Agent Orchestration]
├── 🧠 Summarizer Agent → extracts core insights
├── 🧪 Fact Checker Agent → verifies claims via tools
└── 📊 Evaluator Agent → LLM-as-a-Judge on rubric (Relevance, Factuality, ...)
```

- Config-driven model orchestration (via `config.yaml`)
- Latency + model tracking in UI
- Deployed on Hugging Face Spaces (Streamlit + Docker)
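The sequential orchestration above can be sketched in plain Python. The function bodies here are simplified stand-ins for the real LLM-backed agents in `agents/`; only the state-passing pattern is the point:

```python
# Illustrative sketch of the sequential agent pipeline; the function
# bodies are placeholders for the real LLM-backed agents in agents/.
def summarizer(thread: dict) -> dict:
    # Real agent: compresses the whole thread with an LLM.
    return {"summary": f"{len(thread['comments'])} comments condensed"}

def fact_checker(state: dict) -> dict:
    # Real agent: verifies major claims via Serper/Wikipedia tools.
    state["fact_checks"] = []
    return state

def evaluator(state: dict) -> dict:
    # Real agent: scores the summary on a rubric (LLM-as-a-Judge).
    state["scores"] = {"relevance": 5, "coherence": 3, "factuality": 4}
    return state

def run_pipeline(thread: dict) -> dict:
    state = summarizer(thread)
    for agent in (fact_checker, evaluator):  # triggered in sequence
        state = agent(state)
    return state

result = run_pipeline({"comments": ["c1", "c2", "c3"]})
```

Each agent reads and enriches a shared state dict, which is what makes swapping agents in and out straightforward.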
- ✅ Multi-Agent Stack (Summarizer, Fact Checker, Evaluator)
- ✅ Semantic Summarization + Retrieval
- ✅ LLM-as-a-Judge via rubric-based evaluation (like RAGAS)
- ✅ Hybrid Simulation Mode (10 real + 90 mock = 100 total threads)
- ✅ Live Latency Display + Model Attribution
- ✅ Streamlit UX with onboarding, source links, toggles, download
- ✅ Free-tier Deployable (OpenRouter APIs, Streamlit, Hugging Face)
Modular multi-agent architecture modeled after real-world Reddit analysis workflows.
| Agent | Role | Model Used (Free-tier) |
|---|---|---|
| 🧠 Summarizer | Extracts semantic insights from messy Reddit posts using transformer-based compression | `moonshotai/kimi-k2:free` |
| 🧪 Fact Checker | Verifies factuality using retrieval tools | `deepseek/deepseek-r1:free` |
| 📊 Evaluator | Scores output on Relevance, Coherence, and Factuality (LLM-as-a-Judge) | `openrouter/mistralai/mistral-7b-instruct:free` |
| 🔧 Tools | External retrieval & KB (Serper, Wikipedia) | – |
All agents use individual OpenRouter models, with latency tracked per call.
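Per-call latency tracking can be as simple as a timing wrapper around each agent's model call. This is a generic sketch, not the repo's actual `llm_utils.py`; the OpenRouter request itself (an OpenAI-compatible POST to its chat-completions endpoint) is replaced by a stand-in function:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run an agent's model call and return (result, latency_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in for a real OpenRouter chat-completions request.
reply, latency = timed_call(lambda prompt: f"summary of: {prompt}", "thread text")
```

The `(result, latency)` pair is what lets the UI attribute both the output and its cost-in-seconds to a specific agent and model.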
1. Select a Reddit-style thread from 100 examples.
2. Agents are triggered in sequence:
   - 🧠 Summarizer → condenses the entire thread
   - 🧪 Fact Checker → verifies major claims with external data
   - 📊 Evaluator → scores the summary based on LLM rubrics
3. Output is displayed in a polished tabbed UI with latency + model trace.
4. Optionally download the result as JSON.
All agents are modular and can be swapped in or out via `config.yaml`.
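A per-agent routing config of this kind might look like the sketch below. The keys and layout are assumptions for illustration, not the repo's actual `config.yaml`; only the model names are taken from the agent table above:

```yaml
# Hypothetical layout; the real config.yaml may differ.
agents:
  summarizer:
    model: moonshotai/kimi-k2:free
  fact_checker:
    model: deepseek/deepseek-r1:free
    tools: [serper, wikipedia]
  evaluator:
    model: openrouter/mistralai/mistral-7b-instruct:free
    rubric: [relevance, coherence, factuality]
```

Keeping the model id per agent is what makes "swap-in / swap-out" a one-line change rather than a code edit.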
Tabbed layout includes:
- 🧠 Summary (with model and source)
- 🔍 Fact Checks (claim + judgment)
- ⚡ Latency (per agent)
- 📊 Evaluation (scored by LLM)
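The JSON download is assembled from the same per-agent results shown in the tabs. A minimal sketch, assuming field names that mirror the UI (the actual schema in `batch_output.json` may differ); the commented line shows how it would be handed to Streamlit's `st.download_button`:

```python
import json

# One thread's result for the download tab (field names are assumed).
result = {
    "summary": "...",
    "fact_checks": [],
    "evaluation": {"relevance": 5, "coherence": 3, "factuality": 4},
    "latency_s": {"summarizer": 2.1, "fact_checker": 3.4, "evaluator": 1.9},
}
payload = json.dumps(result, indent=2).encode("utf-8")
# st.download_button("Download JSON", payload, "thread_result.json", "application/json")
```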
🔗 Try the fully working UI on Hugging Face:
👉 ThreadNavigatorAI 2.0 – Live Space
No login, no billing: real inference with OpenRouter free-tier models.
- 🧠 Summary: Debate around GPT-4 vs Gemini, with user opinions and factual misunderstandings
- 🔍 Claim: "Gemini was trained on YouTube comments" → 🔴 Incorrect

📊 Eval Score:
- Relevance: 🟢 5 – Very relevant
- Coherence: 🟡 3 – Some redundancy
- Factuality: 🟢 4 – Mostly accurate

⏱️ Latency:
- Summarizer: 2.1s
- FactChecker: 3.4s
- Evaluator: 1.9s
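Scores like the ones above come back from the judge model as free text, so the evaluator has to parse them into numbers. A hypothetical parser for that step (the real prompt and output format in `evaluator_agent.py` may differ):

```python
import re

def parse_rubric(judge_output: str) -> dict:
    """Pull an integer score for each rubric criterion out of judge text."""
    scores = {}
    for criterion in ("Relevance", "Coherence", "Factuality"):
        match = re.search(rf"{criterion}:\s*(\d+)", judge_output)
        if match:
            scores[criterion.lower()] = int(match.group(1))
    return scores

sample = "Relevance: 5 (very relevant)\nCoherence: 3 (some redundancy)\nFactuality: 4"
scores = parse_rubric(sample)
```

Constraining the judge to a fixed `Criterion: N` line format is what keeps this parse reliable across models.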
```
ThreadNavigatorAI2.0/
├── ui/
│   └── app.py                      # Streamlit frontend
├── cli/
│   └── batch_run.py                # Multi-thread processor
├── config/                         # Models and tool settings
│   └── config.yaml
├── agents/
│   ├── summarizer_agent.py
│   ├── factchecker_agent.py
│   └── evaluator_agent.py
├── utils/
│   └── llm_utils.py
├── data/
│   ├── batch_output.json           # 10 threads real + 90 threads simulated inference
│   └── threads_100.json            # 100 simulated threads
├── assets/
│   ├── threadnavigator_flow.png    # Architecture diagram
│   └── threadnavigator-demo.gif    # UI demo
└── requirements.txt
```
- ✅ Represents a real-world pipeline for Reddit moderation/analysis
- ✅ Dynamic toggles: latency, evaluation, download
- ✅ Hybrid inference simulates scale under the free tier
- ✅ LLM-as-a-Judge with model attribution & traceability
- ✅ Fully local, frictionless UI with onboarding
- ✅ Built like an internal debugging tool for social platforms
```bash
git clone https://github.com/rajesh1804/ThreadNavigatorAI2.0
cd ThreadNavigatorAI2.0
pip install -r requirements.txt
```

Create a `.env` file:

```
OPENROUTER_API_KEY=sk-
SERPER_API_KEY=
```

Then run:

```bash
streamlit run app.py
```
| Project | Description | Link |
|---|---|---|
| 🧵 ThreadNavigatorAI 1.0 | RAG-based summarizer + moderator + reply agent using LangGraph | 🔗 View |
| 🚕 RideCastAI | Real-time ride fare/ETA prediction with spatial heatmaps | 🔗 View |
| 🛒 GroceryGPT+ | Vector search + reranking grocery assistant with fuzzy recall | 🔗 View |
| 🎬 StreamWiseAI | Netflix-style movie recommender with Retention Coach Agent | 🔗 View |
- 🧪 `gemma-3-4b-it` rejected evaluation due to instruction tuning being disabled → switched to `mistralai-7b:free`
- 🔁 Reddit data volume limited to 10 real threads due to OpenRouter limits → mock data generated for scalability tests
- ⏱️ Tool-based fact-checking adds ~2–3 seconds of latency → handled with async retries + an optional toggle
- 🧵 No full LangGraph orchestration in this version (ThreadNavigator 1.0 had it) → but this version enables more precise per-agent control
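The retry handling mentioned for tool-based fact-checking can be sketched as a wrapper with exponential backoff. This is a synchronous simplification for illustration; the repo's actual async implementation may differ:

```python
import time

def with_retries(fn, attempts=3, backoff_s=0.05):
    """Retry a flaky tool call, doubling the wait between attempts."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(backoff_s * (2 ** i))

calls = {"n": 0}
def flaky_lookup():
    # Fails once, then succeeds, to mimic a slow external tool.
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("tool lookup timed out")
    return "claim verified"

result = with_retries(flaky_lookup)
```

Pairing this with the UI toggle means users can skip the slow path entirely when they only want a summary.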
Rajesh Marudhachalam – AI/ML Engineer with a focus on building LLM-native applications.
🔗 GitHub | LinkedIn
Projects: RideCastAI, StreamWiseAI, GroceryGPT+
- OpenRouter – Free-tier LLM APIs
- Streamlit – UI framework
- Serper.dev – Web search API
- Wikipedia API – Factual KB
- Hugging Face Spaces – App hosting
MIT License
⭐️ Star this repo if it impressed you. Follow for more elite-level ML + LLM builds.