| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +Redis Retrieval Optimizer is a scientific framework for benchmarking and optimizing information retrieval systems using Redis. It supports systematic evaluation of vector search, hybrid retrieval, BM25, and reranking strategies. |
| 8 | + |
| 9 | +## Development Commands |
| 10 | + |
| 11 | +### Setup |
| 12 | +```bash |
| 13 | +make install # Install dependencies with Poetry |
| 14 | +make redis-start # Start Redis Stack container (required for tests) |
| 15 | +make redis-stop # Stop Redis container |
| 16 | +``` |
| 17 | + |
| 18 | +### Testing & Quality |
| 19 | +```bash |
| 20 | +make test # Run full pytest suite |
| 21 | +make check # Run linting + tests |
| 22 | +make format # Black + isort formatting |
| 23 | +make check-types # MyPy type checking |
| 24 | +make lint # Full linting (format + mypy) |
| 25 | +``` |
| 26 | + |
| 27 | +### Single Test Execution |
| 28 | +```bash |
| 29 | +poetry run pytest tests/unit/test_cost_fn.py::test_specific_function |
| 30 | +poetry run pytest tests/integration/test_grid.py -v |
| 31 | +``` |
| 32 | + |
| 33 | +## Architecture Overview |
| 34 | + |
| 35 | +### Core Study Types |
| 36 | +- **Grid Study** (`redis_retrieval_optimizer/grid_study.py`) - Systematic parameter exploration |
| 37 | +- **Bayesian Study** (`redis_retrieval_optimizer/bayes_study.py`) - Optuna-based optimization |
| 38 | +- **Search Study** (`redis_retrieval_optimizer/search_study.py`) - Test methods on existing indices |
| 39 | + |
| 40 | +### Search Methods (`redis_retrieval_optimizer/search_methods/`) |
| 41 | +All methods follow the `SearchMethodInput` → `SearchMethodOutput` interface: |
| 42 | +- **BM25** - Lexical search |
| 43 | +- **Vector** - Semantic search |
| 44 | +- **Hybrid** - Combined lexical + semantic |
| 45 | +- **Rerank** - Two-stage retrieval with cross-encoder |
| 46 | +- **Weighted RRF** - Reciprocal Rank Fusion |
| 47 | + |
| 48 | +### Configuration System |
| 49 | +- YAML-based study configurations |
| 50 | +- Pydantic schema validation (`redis_retrieval_optimizer/schema.py`) |
| 51 | +- Configuration examples in `tests/integration/*_data/` |
| 52 | + |
| 53 | +## Key Dependencies |
| 54 | + |
| 55 | +- **RedisVL** (>=0.8.1) - Primary Redis vector library |
| 56 | +- **Optuna** (>=4.3.0) - Bayesian optimization |
| 57 | +- **BEIR** (>=2.1.0) - IR benchmarking datasets |
| 58 | +- **RANX** (>=0.3.20) - Evaluation metrics (NDCG, precision, recall) |
| 59 | +- **Redis** (`redis-py`, >=5.0) - Direct Redis client |
| 60 | +- **Poetry** - Dependency management (not uv) |
| 61 | + |
| 62 | +## Redis Requirements |
| 63 | + |
| 64 | +- **Redis Stack** container with vector search capabilities |
| 65 | +- Tests require Redis 7.0+ for full functionality |
| 66 | +- Use `make redis-start` to ensure proper Redis version |
| 67 | + |
| 68 | +## Testing Architecture |
| 69 | + |
| 70 | +- **Integration tests** require a running Redis instance |
| 71 | +- **Unit tests** for isolated functionality |
| 72 | +- **Configuration-driven** tests with YAML fixtures |
| 73 | +- **pytest-asyncio** for async test support |
| 74 | + |
| 75 | +## Common Development Patterns |
| 76 | + |
| 77 | +### Adding New Search Methods |
| 78 | +1. Implement in `redis_retrieval_optimizer/search_methods/` |
| 79 | +2. Follow the `SearchMethodInput` → `SearchMethodOutput` interface |
| 80 | +3. Add it to the method registry of the appropriate study type |
| 81 | +4. Create unit tests and integration tests |
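The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `ExampleSearchInput`, `ExampleSearchOutput`, `keyword_overlap_search`, and `EXAMPLE_METHOD_REGISTRY` are hypothetical stand-ins, and the real `SearchMethodInput` / `SearchMethodOutput` fields and registry layout may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ExampleSearchInput:
    """Hypothetical stand-in for SearchMethodInput (real fields may differ)."""
    queries: Dict[str, str]  # query id -> query text
    corpus: Dict[str, str]   # doc id -> document text
    top_k: int = 10


@dataclass
class ExampleSearchOutput:
    """Hypothetical stand-in for SearchMethodOutput (real fields may differ)."""
    # query id -> {doc id: score}
    run: Dict[str, Dict[str, float]] = field(default_factory=dict)


def keyword_overlap_search(search_input: ExampleSearchInput) -> ExampleSearchOutput:
    """Toy method: score each document by how many query terms it contains."""
    output = ExampleSearchOutput()
    for query_id, query_text in search_input.queries.items():
        terms = set(query_text.lower().split())
        scores = {}
        for doc_id, doc_text in search_input.corpus.items():
            overlap = len(terms & set(doc_text.lower().split()))
            if overlap:
                scores[doc_id] = float(overlap)
        # Keep the top_k highest-scoring documents for this query.
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        output.run[query_id] = dict(ranked[: search_input.top_k])
    return output


# Hypothetical registry mapping a config name to its implementation.
EXAMPLE_METHOD_REGISTRY: Dict[str, Callable[[ExampleSearchInput], ExampleSearchOutput]] = {
    "keyword_overlap": keyword_overlap_search,
}
```

Keeping every method behind the same input and output types is what lets a study swap methods by name without changing the evaluation code.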
| 82 | + |
| 83 | +### Study Configuration |
| 84 | +```yaml |
| 85 | +embedding_models: |
| 86 | + - type: "hf" |
| 87 | + model: "sentence-transformers/all-MiniLM-L6-v2" |
| 88 | + dim: 384 |
| 89 | + embedding_cache_name: "vec-cache" |
| 90 | + |
| 91 | +search_methods: ["bm25", "vector", "hybrid"] |
| 92 | +vector_data_types: ["float16", "float32"] |
| 93 | +``` |
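To give a feel for how many trials a configuration like this implies, here is a small sketch that expands it as a Cartesian product. It assumes the grid is simply every combination of model, search method, and vector data type; `expand_grid` is a hypothetical helper, and the framework's own enumeration may differ.

```python
from itertools import product

# Values copied from the YAML example; the dict mirrors its structure.
study_config = {
    "embedding_models": [
        {"type": "hf", "model": "sentence-transformers/all-MiniLM-L6-v2", "dim": 384},
    ],
    "search_methods": ["bm25", "vector", "hybrid"],
    "vector_data_types": ["float16", "float32"],
}


def expand_grid(config):
    """Return one trial dict per combination of model, method, and data type."""
    return [
        {"model": model["model"], "search_method": method, "vector_data_type": dtype}
        for model, method, dtype in product(
            config["embedding_models"],
            config["search_methods"],
            config["vector_data_types"],
        )
    ]


trials = expand_grid(study_config)
print(len(trials))  # 1 model x 3 methods x 2 data types = 6
```

A lexical method such as BM25 does not depend on the vector data type, so a real study may prune combinations that a naive product treats as distinct.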
| 94 | + |
| 95 | +### Data Requirements |
| 96 | +- **Corpus** - Documents to index (JSON format) |
| 97 | +- **Queries** - Search queries (JSON format) |
| 98 | +- **Qrels** - Relevance judgments (JSON format) |
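As an illustration of how the three inputs relate, the sketch below uses a BEIR-style layout. That layout is an assumption (the file only says the data is JSON), and `check_qrels` is a hypothetical helper rather than part of the framework.

```python
import json

# Assumed BEIR-style shapes; the format the framework expects may differ.
corpus = {
    "doc1": {"title": "Redis", "text": "Redis is an in-memory data store."},
    "doc2": {"title": "BM25", "text": "BM25 is a lexical ranking function."},
}
queries = {"q1": "what is redis?"}
qrels = {"q1": {"doc1": 1}}  # query id -> {doc id: relevance grade}


def check_qrels(corpus, queries, qrels):
    """Return descriptions of qrels entries that point at unknown ids."""
    problems = []
    for query_id, judgments in qrels.items():
        if query_id not in queries:
            problems.append(f"unknown query id: {query_id}")
        for doc_id in judgments:
            if doc_id not in corpus:
                problems.append(f"unknown doc id: {doc_id} (query {query_id})")
    return problems


# Round-trip through JSON to confirm the structures are serializable.
assert json.loads(json.dumps(qrels)) == qrels
print(check_qrels(corpus, queries, qrels))
```

Checking ids up front is cheap, and it can catch a mismatched corpus or qrels file before an entire study runs against it.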
| 99 | + |
| 100 | +## Performance Considerations |
| 101 | + |
| 102 | +- Use **RedisVL** for high-level operations |
| 103 | +- Use **redis-py** directly for custom low-level operations |
| 104 | +- Vector data types: float16 halves vector storage relative to float32, at the cost of reduced precision |
| 105 | +- Use batch operations for large datasets |
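The float16 versus float32 trade-off can be made concrete with the standard library alone. The sketch below packs one 384-dimensional vector both ways and compares byte size and round-trip error; the vector values are arbitrary, and `encode` and `max_roundtrip_error` are hypothetical helpers that say nothing about how RedisVL stores vectors internally.

```python
import struct

DIM = 384  # matches the all-MiniLM-L6-v2 dimension in the config example
vector = [0.1 * ((i % 7) - 3) for i in range(DIM)]  # arbitrary illustrative values


def encode(values, fmt_char):
    """Pack floats as little-endian bytes: 'e' is float16, 'f' is float32."""
    return struct.pack(f"<{len(values)}{fmt_char}", *values)


def max_roundtrip_error(values, fmt_char):
    """Largest absolute difference after encoding and decoding."""
    decoded = struct.unpack(f"<{len(values)}{fmt_char}", encode(values, fmt_char))
    return max(abs(a - b) for a, b in zip(values, decoded))


size16 = len(encode(vector, "e"))
size32 = len(encode(vector, "f"))
print(size16, size32)  # 768 1536

# float16 loses more precision than float32 on the same values.
print(max_roundtrip_error(vector, "e") > max_roundtrip_error(vector, "f"))
```

Whether the halved storage is worth the precision loss depends on the dataset and metric, which is the kind of question a study over `vector_data_types` is meant to answer.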
| 106 | + |
| 107 | +## Troubleshooting |
| 108 | + |
| 109 | +### Redis Connection Issues |
| 110 | +- Ensure Redis Stack is running: `make redis-start` |
| 111 | +- Check Redis version compatibility (7.0+ recommended) |
| 112 | +- Verify RedisVL compatibility with Redis version |
| 113 | + |
| 114 | +### Test Failures |
| 115 | +- Run `make redis-start` before testing |
| 116 | +- Check for stale Redis indices from previous tests |
| 117 | +- Use `make clean` to clear build artifacts |