
🎯 The Objective
AEGIS-RAG is a production-grade Retrieval-Augmented Generation system designed for reliable question answering over policy documents.
It focuses on retrieval quality, ranking precision, and safe abstention to minimize hallucinations.
🏗️ The Architecture & Methodology
The system follows a three-stage pipeline:
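- Hybrid Retrieval: Dense (semantic) and BM25 (lexical) retrieval, combined with Reciprocal Rank Fusion (RRF)
- Reranking: A cross-encoder rescores the fused candidates to improve top-K relevance
- Generation: An LLM answers from the retrieved context only, with a no-answer fallback when that context is insufficient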
This separation enables independent optimization of retrieval, ranking, and generation.
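To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not the project's actual code or settings.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is an assumed default, not a project value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the dense and BM25 retrievers for one query.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([dense_ranking, bm25_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only rank positions, it needs no score normalization between the dense and lexical retrievers, which is one reason it suits hybrid setups like this one.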
📊 Key Performance Metrics
- Recall@5: 0.85
- nDCG@5: 0.80
- Semantic Match: 0.75
- Latency: ~0.73s (with reranking)
Reranking improves accuracy but increases latency roughly 7×.
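For readers unfamiliar with the two retrieval metrics, the sketch below shows one common way to compute Recall@k and nDCG@k for a single query, assuming binary relevance labels. The exact definitions and evaluation harness behind the numbers above are not stated here, so treat the function names and sample data as illustrative only.

```python
import math


def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved, relevant, k=5):
    """nDCG@k with binary gains: a relevant document at position i (0-based)
    contributes 1 / log2(i + 2)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    # Ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: two of three relevant documents are retrieved.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d9"}

recall = recall_at_k(retrieved, relevant)
ndcg = ndcg_at_k(retrieved, relevant)
print(f"Recall@5={recall:.3f}, nDCG@5={ndcg:.3f}")
```

Dataset-level figures are typically the mean of these per-query values.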
💡 Core Insights & Business Impact
- Retrieval quality is the main driver of performance
- Reranking improves precision but reduces throughput
- Safe abstention reduces hallucinations by declining to answer when the retrieved context is insufficient
- Fine-tuning is not necessary for small, structured datasets
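One way the safe-abstention idea could be realized is a relevance gate in front of the generator: if no reranked chunk clears a threshold, the system returns a fixed no-answer message instead of calling the LLM. The sketch below is an assumption about how such a gate might look. The names `answer_or_abstain`, `NO_ANSWER`, and `min_score`, the 0.3 threshold, and the assumption that scores are normalized to [0, 1] are all hypothetical. `generate` stands in for whatever LLM call the system uses.

```python
NO_ANSWER = "I don't know based on the provided documents."


def answer_or_abstain(question, scored_chunks, generate, min_score=0.3, top_k=3):
    """Answer from sufficiently relevant chunks, or abstain.

    scored_chunks: list of (text, score) pairs, e.g. from a reranker.
    generate: callable taking a prompt string and returning the model's answer.
    min_score: hypothetical relevance threshold, to be tuned on validation data.
    """
    kept = [(text, score) for text, score in scored_chunks if score >= min_score]
    if not kept:
        # Nothing relevant enough was retrieved: abstain without calling the LLM.
        return NO_ANSWER
    kept.sort(key=lambda pair: pair[1], reverse=True)
    context = "\n".join(text for text, _ in kept[:top_k])
    prompt = (
        "Answer using only the context below. "
        f"If the answer is not in the context, reply exactly: {NO_ANSWER}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)


# Stub generator so the example runs without a model.
def fake_llm(prompt):
    return "20 days"


chunks = [
    ("Employees accrue 20 days of leave per year.", 0.82),
    ("Office hours are 9 to 5.", 0.10),
]
print(answer_or_abstain("How many leave days do employees get?", chunks, fake_llm))
print(answer_or_abstain("What is the travel budget?", chunks[1:], fake_llm))
```

The gate and the prompt instruction are complementary: the threshold catches retrieval misses cheaply, while the instruction covers cases where chunks look relevant but do not contain the answer.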
⚙️ Technical Stack
Python, LangChain, Chroma, SentenceTransformers, BM25, Cross-Encoders, Ollama, RAGAS