Open-source RAG projects split into three layers: indexing, retrieval evaluation, and answer generation.
This organization hosts three repositories that cover different parts of a Retrieval-Augmented Generation pipeline. Each repo has a narrow scope, its own FastAPI service, Docker setup, and notebooks.
The projects share the BEIR SciFact dataset and Qdrant as the vector store. They are built as separate services so each layer can be developed, tested, and replaced on its own.
- RAG quality depends heavily on indexing and retrieval — not just the language model.
- Each layer lives in its own repo with a clear boundary and its own tests.
- SciFact gives a fixed public dataset so results are comparable across repos.
- The same patterns show up in all three projects: FastAPI, Docker, config files, and step-by-step notebooks.
Raw documents
│
▼
┌──────────────────────────────┐
│ rag-data-indexing-service │ load → clean → chunk → embed → Qdrant
└──────────────────────────────┘
│ output: searchable vector index
▼
┌──────────────────────────────┐
│ rag-retrieval-benchmark │ dense / BM25 / hybrid / RRF → metrics
└──────────────────────────────┘
│ output: retrieval scores and benchmark reports
▼
┌──────────────────────────────┐
│ production-rag-answering-api│ retrieve → generate → cite → validate
└──────────────────────────────┘
│ output: grounded answers with citations
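The clean and chunk steps in the first box can be illustrated with a small sketch. This is not the code from rag-data-indexing-service; the function names `clean_text` and `chunk_words`, the word-based windowing, and the default sizes are all assumptions chosen to keep the example short.

```python
import re


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words.

    Hypothetical helper: a real indexer may chunk by tokens or sentences.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    doc = clean_text("  alpha  beta\n gamma delta   epsilon zeta eta ")
    for chunk in chunk_words(doc, size=3, overlap=1):
        print(chunk)
```

Overlapping windows are a common way to keep a sentence that straddles a boundary retrievable from either chunk; the right size and overlap depend on the embedding model and the documents.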
| Repository | Layer | Description | Docs |
|---|---|---|---|
| rag-data-indexing-service | Indexing | Ingests raw documents, cleans and chunks text, generates embeddings, and stores indexed chunks in Qdrant. | README |
| rag-retrieval-benchmark | Retrieval | Benchmarks retrieval strategies on SciFact using Recall@k, MRR, and related metrics. No answer generation. | README |
| production-rag-answering-api | Answering | Full RAG answering API: retrieval, context building, grounded generation, citations, validation, caching, and tracing. | README |
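For readers unfamiliar with the metrics named in the table, here is a minimal sketch of Recall@k and MRR using their standard definitions. The function names and the dictionary-based input shapes are assumptions for illustration and may differ from what rag-retrieval-benchmark actually uses.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: dict[str, list[str]], qrels: dict[str, set[str]]
) -> float:
    """Average reciprocal rank over every query that has relevance labels."""
    if not qrels:
        return 0.0
    total = sum(
        reciprocal_rank(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


if __name__ == "__main__":
    # Made-up document ids, not SciFact data.
    runs = {"q1": ["d3", "d1"], "q2": ["d5", "d9"]}
    qrels = {"q1": {"d1"}, "q2": {"d5"}}
    print(recall_at_k(runs["q1"], qrels["q1"], k=1))
    print(mean_reciprocal_rank(runs, qrels))
```

Recall@k rewards finding all relevant documents within the cutoff, while MRR only looks at how early the first relevant one appears, so the two can rank retrieval strategies differently.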
If you are new to the stack, work through the repos in this order:
- Indexing — rag-data-indexing-service: build a searchable vector index from documents.
- Retrieval — rag-retrieval-benchmark: compare retrieval methods before touching generation.
- Answering — production-rag-answering-api: run the end-to-end answering pipeline through the API.
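As an example of what comparing retrieval methods can involve, the sketch below fuses a dense ranking and a BM25 ranking with Reciprocal Rank Fusion (RRF). It follows the usual RRF formula, score(d) = sum over rankings of 1 / (k + rank); the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample ids are assumptions rather than code taken from the benchmark repo.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) in every list that contains it,
    with ranks starting at 1. Ties are broken by document id so the
    output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense = ["d1", "d2", "d3"]  # hypothetical dense-retriever output
    bm25 = ["d2", "d4", "d1"]   # hypothetical BM25 output
    print(rrf_fuse([dense, bm25]))
```

RRF works on ranks only, so it needs no score normalization between the dense and lexical retrievers, which is one reason it is a popular baseline for hybrid search.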
| Goal | Repository |
|---|---|
| Document ingestion and Qdrant indexing only | rag-data-indexing-service |
| Retrieval metrics and strategy comparison only | rag-retrieval-benchmark |
| End-to-end RAG answering API | production-rag-answering-api |
Each repository README has a Quick Start section with make commands and notebook order.
These are not toy scripts. Each repo includes:
- HTTP API — FastAPI endpoints for the main workflow.
- CLI — command-line entry points where a service layer is not enough.
- Configuration — YAML files and environment variables instead of hard-coded values.
- Docker Compose — local stack with Qdrant and related services.
- Tests — pytest suites; the retrieval benchmark also ships evaluation metrics.
- Notebooks — phased walkthroughs that call the same code as the API.
- Observability — Phoenix tracing in the answering API.
- Developers who want to see RAG split into clear, testable layers.
- Engineers who need retrieval metrics before adding an LLM.
- Anyone working through SciFact as a small, reproducible RAG dataset.
| Component | Used in |
|---|---|
| Python, FastAPI | All three repos |
| Qdrant | All three repos |
| Docker / Docker Compose | All three repos |
| sentence-transformers | All three repos |
| SciFact (BEIR) | All three repos |
| Phoenix / OpenTelemetry | Answering API |
| SQLite cache | Answering API |
Clone the repo you need, copy the env file, and start the stack. Details differ per project — follow the linked README for notebook order and data download steps.
# Indexing
git clone https://github.com/RAG-Implementation/rag-data-indexing-service.git
cd rag-data-indexing-service
cp .env.example .env
make up
# Retrieval benchmark
git clone https://github.com/RAG-Implementation/rag-retrieval-benchmark.git
cd rag-retrieval-benchmark
cp .env.example .env
make up
# Answering API
git clone https://github.com/RAG-Implementation/production-rag-answering-api.git
cd production-rag-answering-api
cp .env.example .env
make build && make up
Questions or collaboration: Max Ghadri on LinkedIn.
All repositories in this organization are released under the MIT License.
