DiReCT-RAG is a Retrieval-Augmented Generation (RAG) system for clinical decision support, built on the MIMIC-IV-Ext Direct dataset. It combines a medical language model with semantic search to return evidence-based clinical insights and support diagnostic reasoning.
DiReCT-RAG implements a multi-stage retrieval pipeline: clinical notes are parsed, chunked, and embedded, the passages most relevant to a query are retrieved, and a medical language model generates a structured response that reports its sources and a confidence score.
- Advanced RAG Architecture
  - Multi-stage retrieval pipeline with semantic chunking
  - Dynamic confidence scoring system
  - Source consistency evaluation
  - Mixed relevance detection
  - Clinical section parsing and standardization
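As an illustration of clinical section parsing and standardization, here is a minimal sketch. It assumes notes use `Header: text` lines; the `SECTION_ALIASES` table and the `parse_sections` helper are hypothetical and are not taken from the project's code.

```python
import re

# Hypothetical alias table: the mapping the project actually uses is not shown here.
SECTION_ALIASES = {
    "cc": "Chief Complaint",
    "chief complaint": "Chief Complaint",
    "hpi": "History of Present Illness",
    "history of present illness": "History of Present Illness",
    "pmh": "Past Medical History",
    "past medical history": "Past Medical History",
}

# A header is a short run of letters, spaces, or slashes followed by a colon.
HEADER_RE = re.compile(r"^\s*([A-Za-z][A-Za-z /]*?)\s*:\s*(.*)$")


def parse_sections(note: str) -> dict[str, str]:
    """Group note lines under standardized section names."""
    sections: dict[str, list[str]] = {}
    current = "Unlabeled"
    for line in note.splitlines():
        match = HEADER_RE.match(line)
        key = match.group(1).strip().lower() if match else None
        if key in SECTION_ALIASES:
            # Known header: switch section and keep any text after the colon.
            current = SECTION_ALIASES[key]
            sections.setdefault(current, [])
            rest = match.group(2).strip()
            if rest:
                sections[current].append(rest)
        elif line.strip():
            # Unknown header or plain text: stays in the current section.
            sections.setdefault(current, []).append(line.strip())
    return {name: " ".join(parts) for name, parts in sections.items()}
```

Lines with an unrecognized prefix such as `BP: 120/80` are kept as text in the current section rather than opening a new one.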
- Intelligent Query Processing
  - Context-aware medical information retrieval
  - Semantic similarity-based document chunking
  - Maximal Marginal Relevance (MMR) for diverse source selection
  - Dynamic confidence thresholding
- Clinical Response Generation
  - Structured medical analysis format
  - Evidence-based recommendations
  - Source verification and confidence metrics
  - Automated warning generation for low-confidence responses
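The confidence-related features (dynamic scoring, source consistency, mixed relevance detection, thresholding, and low-confidence warnings) could fit together roughly as follows. This is a sketch only: the weights, thresholds, and function names are assumptions chosen for illustration, not the project's actual formula.

```python
from statistics import mean, pstdev


def confidence_score(similarities: list[float]) -> float:
    """Blend average retrieval similarity with agreement between sources."""
    if not similarities:
        return 0.0
    relevance = mean(similarities)
    # For values in [0, 1] the population std-dev is at most 0.5,
    # so doubling it maps the spread onto [0, 1].
    consistency = 1.0 - min(1.0, 2.0 * pstdev(similarities))
    # Assumed weights: 70% relevance, 30% source consistency.
    return round(0.7 * relevance + 0.3 * consistency, 3)


def threshold_for(query: str, base: float = 0.6) -> float:
    """Require more confidence for longer (presumably more complex) queries."""
    extra = min(0.2, 0.01 * len(query.split()))
    return round(base + extra, 3)


def build_warnings(similarities: list[float], query: str) -> list[str]:
    """Return human-readable warnings for a retrieval result."""
    warnings = []
    if confidence_score(similarities) < threshold_for(query):
        warnings.append("Low confidence: verify this answer against primary sources.")
    # Mixed relevance: some sources match well while others barely match.
    if similarities and max(similarities) - min(similarities) > 0.4:
        warnings.append("Mixed relevance: retrieved sources vary widely in similarity.")
    return warnings
```

Here `similarities` stands for the per-source similarity scores returned by the retriever, assumed to lie in [0, 1].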
- Document Ingestion
  - Custom JSON flattening for clinical notes
  - Automatic category/subcategory extraction
  - Clinical section header standardization
  - Metadata enrichment
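The ingestion steps above can be sketched as below. The example assumes that notes are nested JSON and that files are stored as `<category>/<subcategory>/<file>.json`; both the layout and the helper names are assumptions for illustration.

```python
from pathlib import PurePosixPath


def flatten_note(node, path=()):
    """Yield (path, text) pairs for every non-empty leaf in a nested JSON note."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten_note(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten_note(value, path + (str(index),))
    elif node not in (None, ""):
        yield " > ".join(path), str(node)


def extract_categories(file_path: str) -> dict:
    """Derive metadata from an assumed <category>/<subcategory>/<file> layout."""
    parts = PurePosixPath(file_path).parts
    if len(parts) >= 3:
        category, subcategory = parts[-3], parts[-2]
    else:
        category, subcategory = "Unknown", "Unknown"
    return {
        "category": category,
        "subcategory": subcategory,
        "source": parts[-1] if parts else "",
    }
```

Each flattened `(path, text)` pair can then be stored as a document whose metadata is the dictionary returned by `extract_categories`.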
- Vector Embedding System
  - Primary: Google's Generative AI Embeddings (models/embedding-001)
  - Secondary: Clinical ModernBERT support
  - Task-specific retrieval optimization
- Semantic Search Implementation
  - ChromaDB vector store integration
  - MMR search configuration:
    - Retrieved documents: k=5
    - Candidate pool: fetch_k=10
    - Diversity factor: lambda_mult=0.7
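To show what the three MMR parameters control, here is a small dependency-free sketch of Maximal Marginal Relevance. It is not the project's code (which presumably relies on the vector store's built-in MMR search); the function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=5, fetch_k=10, lambda_mult=0.7):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    # Step 1: the candidate pool is the fetch_k documents closest to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]
    selected = []

    def mmr_score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy: similarity to the closest already-selected document.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    # Step 2: greedily pick the candidate with the best relevance/novelty trade-off.
    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking, and lower values penalize near-duplicate sources more strongly. In LangChain, the same three values are typically passed as `search_kwargs` to `as_retriever(search_type="mmr")`.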
- LLM Integration
  - Model: Palmyra-Med-70B-32k via NVIDIA AI Endpoints
  - Specialized medical prompt engineering
  - Context-aware response generation
- Python 3.12+
- LangChain framework
- Streamlit
- ChromaDB
- NVIDIA AI Endpoints
- Google Generative AI
- SentenceTransformers
- HuggingFace Transformers
- PyTorch
- NumPy
- python-dotenv
The system utilizes the MIMIC-IV-Ext Direct dataset (version 1.0.0), which includes:
- Comprehensive clinical notes
- Diagnostic flowcharts
- Medical knowledge graphs
- Structured clinical information across multiple specialties
- Custom implementation for medical text processing
- Coherence-preserving document splitting
- Dynamic chunk size optimization based on content structure
- Multi-factor confidence calculation
- Query complexity adaptation
- Source consistency evaluation
- Automated threshold adjustment
- Structured clinical section parsing
- Professional medical response formatting
- Automated warning generation for low-confidence scenarios
- Dual embedding support (Google AI & Clinical ModernBERT)
- Task-specific optimization
- Runtime embedding selection
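Coherence-preserving splitting can be illustrated with the sketch below, which starts a new chunk when adjacent sentences stop resembling each other or when a chunk grows too long. The bag-of-words `embed` function is a stand-in so the example runs without any model; a real pipeline would use one of the embedding models listed above. The threshold and size limit are assumed values.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2, max_sentences: int = 4):
    """Split text into chunks of consecutive, topically similar sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        topic_shift = bool(current) and (
            similarity(embed(current[-1]), embed(sentence)) < threshold
        )
        if current and (topic_shift or len(current) >= max_sentences):
            # Close the chunk at a topic shift or when it reaches the size limit.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because breaks fall only on sentence boundaries, no sentence is ever cut in half, and `max_sentences` bounds the chunk size when a long passage stays on one topic.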
- Clone the repository:

      git clone https://github.com/saadsohail05/DiReCT-RAG-for-Diagnostic-Reasoning-in-Clinical-Notes.git
      cd DiReCT-RAG-for-Diagnostic-Reasoning-in-Clinical-Notes

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Configure environment variables (for example, in a `.env` file loaded by python-dotenv):

      NVIDIA_API_KEY=your_key_here
      GOOGLE_API_KEY=your_key_here
      HUGGINGFACE_TOKEN=your_token_here

- Initialize the database:

      python mainfile.py

- Start the web interface:

      streamlit run app.py
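Before initializing the database, it can help to confirm that the three keys listed above are set. The following is a hypothetical pre-flight check, not a script shipped with the repository; it reads the process environment, so if the keys live in a `.env` file, call python-dotenv's `load_dotenv()` first.

```python
import os

REQUIRED_KEYS = ("NVIDIA_API_KEY", "GOOGLE_API_KEY", "HUGGINGFACE_TOKEN")
# Placeholder values from the setup instructions count as "not configured".
PLACEHOLDERS = {"your_key_here", "your_token_here"}


def missing_keys(env=None):
    """Return the required keys that are unset, blank, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value in PLACEHOLDERS:
            missing.append(key)
    return missing


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All required keys are set.")
```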
- Language Model
  - Name: Palmyra-Med-70B-32k
  - Provider: NVIDIA AI Endpoints
  - Context Window: 32,768 tokens
  - Domain: Medical/Clinical
- Embedding Models
  - Primary: Google Generative AI Embeddings (models/embedding-001)
  - Secondary: Clinical ModernBERT
  - Implementation: Task-specific optimization for medical content
- LLM Judge: Gemini 2.0 Flash
  - Acts as an automated clinical QA evaluator
  - Zero temperature setting for consistent evaluation
  - Processes full clinical notes for context
  - Sample size: Up to 100 documents
- Automated Relevance Assessment:
  - 1-5 scoring scale where:
    - 1: Irrelevant/incorrect
    - 5: Highly accurate and relevant
  - Independent evaluation per clinical note
  - Systematic scoring by LLM judge
- Model Confidence:
  - Built-in confidence metrics
  - Real-time validation
- Automated tracking
- Comprehensive evaluation pipeline
- Error handling and logging
- Category-based performance analysis
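The evaluation steps above (1-5 judge scores, error handling, and category-based analysis) could be aggregated along these lines. This is a sketch under assumptions: the judge's reply is taken to be free text containing a single digit from 1 to 5, and the helper names are illustrative.

```python
import re
from collections import defaultdict
from statistics import mean


def parse_score(judge_reply: str):
    """Extract the first standalone 1-5 rating from the judge's reply, if any."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


def summarize(results):
    """Average judge scores per category.

    `results` is an iterable of (category, judge_reply) pairs. Replies without
    a parseable score are counted rather than silently dropped.
    """
    by_category = defaultdict(list)
    unparsed = 0
    for category, reply in results:
        score = parse_score(reply)
        if score is None:
            unparsed += 1
        else:
            by_category[category].append(score)
    return {
        "per_category": {c: round(mean(s), 2) for c, s in by_category.items()},
        "unparsed": unparsed,
    }
```

Tracking the `unparsed` count separately keeps malformed judge replies from skewing the per-category averages.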
This system is designed for research and educational purposes only. It should not be used as a substitute for professional medical advice, diagnosis, or treatment. All clinical decision-making should be performed by qualified healthcare providers.
This project utilizes the MIMIC-IV-Ext Direct dataset and follows all applicable licensing and usage restrictions. For detailed licensing information, please refer to the LICENSE file.
- MIMIC-IV Dataset: https://physionet.org/content/mimiciv/2.2/
- LangChain Documentation: https://python.langchain.com/docs/get_started/introduction.html
- NVIDIA AI Endpoints: https://www.nvidia.com/en-us/gpu-cloud/ai-endpoints/