FORetrieval

FORetrieval is a multimodal document retrieval library built on top of colpali-engine. It indexes document pages as images using late-interaction models (ColPali, ColQwen2, ColQwen2.5) and retrieves the most relevant pages for a given query. It is used by FORag as its retrieval backend.

Key features:

  • Four storage backends: local (legacy ColPali .pt files), qdrant (default, embedded), milvus (Milvus Lite), and remote (HTTP-delegated vector-DB server)
  • Remote embedding server: offload all embedding computation to a remote vLLM GPU server; the local machine needs no GPU
  • Metadata generation: filesystem metadata always; AI-generated tags, language detection, and short descriptions optionally
  • Metadata filtering: filter the retrieval pool by ext, mtime, language, tags, document_type, or arbitrary regex patterns before scoring
  • Docling ingestion: optional semantic PDF chunking using Docling, producing image chunks aligned with document structure
  • Heatmap and circle visualisation: relevance overlays for retrieved pages

Installation

uv sync

# Optional extras:
uv sync --extra qdrant            # Qdrant storage backend (recommended for large indexes)
uv sync --extra milvus            # Milvus Lite storage backend
uv sync --extra docling           # Docling-based PDF chunking
uv sync --extra embedding_server  # Remote vLLM embedding server (adds paramiko for auto-deploy)
uv sync --extra vector_db_server  # Remote vector-DB server (adds fastapi, uvicorn, paramiko)
uv sync --extra quantization      # 4-bit / 8-bit local model quantization (adds bitsandbytes)

Releases

FORetrieval uses CalVer (YYYY.MM.MICRO). Releases are published on GitHub only (no PyPI) and inherit the visibility of this private repository.
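
Because the fields are numeric, release tags should be compared as numbers rather than as strings. The helper below is a minimal sketch of that idea; parse_calver is a hypothetical name, not part of FORetrieval, and it assumes tags look like v2026.5.0.

```python
import re

_CALVER = re.compile(r"^v?(\d{4})\.(\d{1,2})\.(\d+)$")


def parse_calver(tag: str) -> tuple:
    """Split a YYYY.MM.MICRO tag (optional leading 'v') into three integers."""
    match = _CALVER.match(tag.strip())
    if match is None:
        raise ValueError(f"not a CalVer tag: {tag!r}")
    year, month, micro = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {tag!r}")
    return year, month, micro


# Tuples compare field by field, so 2026.10.0 sorts after 2026.5.0;
# a plain string comparison would put "v2026.5.0" last.
newest = max(["v2026.5.0", "v2026.10.0", "v2025.12.3"], key=parse_calver)
```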

Install a specific release directly from a git tag (requires SSH access to the repo):

uv pip install "foretrieval @ git+ssh://git@github.com/FOR-sight-ai/FORetrieval.git@v2026.5.0"

# With extras:
uv pip install "foretrieval[qdrant,vector_db_server] @ git+ssh://git@github.com/FOR-sight-ai/FORetrieval.git@v2026.5.0"

Or download the .whl / .tar.gz attached to a release on the Releases page and install it with uv pip install <file>.

Pre-requisites

Poppler

Required by pdf2image for PDF-to-image conversion:

Debian / Ubuntu

sudo apt-get install -y poppler-utils

Flash-Attention (optional)

Speeds up ColQwen2 / Gemma-based models significantly:

uv pip install flash-attn

Hardware

ColPali-family models have several billion parameters, so a GPU is strongly recommended for indexing and search. Older or low-end GPUs work fine as long as they are sm_70 or newer; CPU is supported but slow.

Quick usage

from foretrieval import MultiModalRetrieverModel

# Index a folder of PDFs
model = MultiModalRetrieverModel.from_pretrained(
    "vidore/colqwen2.5-v0.2",
    index_root="my_indexes",
    storage_backend="qdrant",   # use Qdrant backend (default)
)
model.index(
    input_path="path/to/docs/",
    index_name="my_index",
    store_collection_with_index=True,
)
# Indexing is recursive: all files in subdirectories are also indexed.
# Use update_index_from_folder() to add only new files to an existing index,
# also recursing into subdirectories.

# Load an existing index and search
model = MultiModalRetrieverModel.from_index(
    index_path="my_index",
    index_root="my_indexes",
)
results = model.search("maximum output current", k=3)
for r in results:
    print(r.doc_id, r.page_num, r.score)

Storage backends

FORetrieval supports four backends for storing and searching embeddings:

| Backend | storage_backend value | Dependency | Scoring | Typical use |
| --- | --- | --- | --- | --- |
| Local | "local" | | Exact MAX_SIM (in-RAM) | Development, small corpora |
| Qdrant (default) | "qdrant" | foretrieval[qdrant] | Exact MAX_SIM (native) | Large local indexes, best accuracy |
| Milvus | "milvus" | foretrieval[milvus] | Approximate (mean-pool ANN + late-interaction rerank) | Milvus ecosystem |
| Remote | "remote" | httpx (core dep) | Delegated to server | GPU/network-separated deployments |

The backend is fixed when an index is first created. It cannot be changed without recreating the index.
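
The two scoring modes in the table can be illustrated with a few lines of code. Exact MAX_SIM compares every query vector against every page vector. The approximate mode first shortlists pages using one pooled vector per page, then reranks the shortlist with MAX_SIM. This is a minimal pure-Python sketch on small lists of floats, not FORetrieval's implementation: the function names are illustrative, and the brute-force shortlist stands in for a real ANN index.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim(query: Sequence[Vector], page: Sequence[Vector]) -> float:
    """Late-interaction score: best page match per query vector, summed."""
    return sum(max(dot(q, p) for p in page) for q in query)


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Average a multi-vector embedding into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def search_with_rerank(query, pages: Dict[str, list], candidate_limit: int, k: int):
    """Shortlist by pooled similarity, then rerank the shortlist with MaxSim."""
    pooled_query = mean_pool(query)
    shortlist = sorted(
        pages,
        key=lambda name: dot(pooled_query, mean_pool(pages[name])),
        reverse=True,
    )[:candidate_limit]
    return sorted(shortlist, key=lambda name: max_sim(query, pages[name]), reverse=True)[:k]


query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "a": [[1.0, 0.0], [0.0, 1.0]],    # matches both query vectors
    "b": [[1.0, 0.0], [1.0, 0.0]],    # matches only the first
    "c": [[-1.0, 0.0], [0.0, -1.0]],  # matches neither
}
best = search_with_rerank(query, pages, candidate_limit=2, k=1)
```

Pages "a" and "b" tie in the pooled pass, and only the MaxSim rerank separates them. That is why the approximate mode keeps a rerank step, and why a page dropped from the shortlist can never be recovered.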

# Local (on-disk .pt files)
model = MultiModalRetrieverModel.from_pretrained(..., storage_backend="local")

# Qdrant (embedded, on-disk)
model = MultiModalRetrieverModel.from_pretrained(..., storage_backend="qdrant")

# Milvus Lite (file-based)
model = MultiModalRetrieverModel.from_pretrained(..., storage_backend="milvus")

# Remote server (server holds collections; local machine stays stateless)
from foretrieval.vector_db_server import VectorDBServerConfig

model = MultiModalRetrieverModel.from_pretrained(
    "athrael-soju/colqwen3.5-4.5B-v3",
    storage_backend="remote",
    storage_config={
        "url": "http://gpu-server:18000",
        "backend": "qdrant",   # server-side backend
    },
)

# Load existing index — backend and server URL auto-read from index_config.json.gz
# api_key must be re-supplied at load time (it is never persisted to disk)
model = MultiModalRetrieverModel.from_index(
    "my_index",
    index_root=".",
    storage_config={"api_key": "my-secret"},
)

Note: The deprecated storage_qdrant=True/False flag still works (maps to storage_backend="qdrant"/"local") but will be removed in a future release.

Metadata generation

Metadata can be attached to each document at indexing time. Two levels are available:

Filesystem metadata (no AI required): always populated from the file itself.

| Field | Source |
| --- | --- |
| stem, ext, mime | filename and MIME type |
| mtime | file modification time (ISO-8601 UTC) |
| page_count | number of pages (PDFs only) |
| author, title | embedded PDF metadata (may be absent) |
| image_width, image_height | dimensions (images only) |
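
To make the first rows of the table concrete, here is a rough standard-library sketch of how such fields could be derived. It is not FORetrieval's code: filesystem_metadata is a hypothetical helper, and the fields that need a PDF or image parser (page_count, author, title, image dimensions) are left out.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def filesystem_metadata(path: Path) -> dict:
    """Collect the fields that need nothing beyond the file itself."""
    mime, _encoding = mimetypes.guess_type(path.name)
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return {
        "stem": path.stem,
        "ext": path.suffix,
        "mime": mime,                # None when the type cannot be guessed
        "mtime": mtime.isoformat(),  # ISO-8601 with an explicit UTC offset
    }
```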

AI-generated metadata (requires an LLM provider): language, tags, document_type, short_description.

from pathlib import Path

from foretrieval.metadata import ai_metadata_provider_factory
from foretrieval.models_metadata import build_metadata_list_for_dir

# No-AI provider: filesystem fields only
provider = ai_metadata_provider_factory(None)

# AI provider: enriches with language, tags, document_type, short_description
provider = ai_metadata_provider_factory({
    "provider": "openrouter",
    "name": "mistralai/mistral-small-3.2-24b-instruct",
    "api_key": "...",
})

metadata_list = build_metadata_list_for_dir(Path("docs/"), provider)

model.index(
    input_path="docs/",
    index_name="my_index",
    metadata=metadata_list,
)

Metadata filtering

When an index was built with metadata, search() accepts a filter_metadata dict that restricts the scoring pool to matching documents only.

Declared filter fields

from foretrieval.models_metadata import MetadataFilter

# Only PDF files
results = model.search("max current", k=3, filter_metadata={"ext": ".pdf"})

# Files modified after a date
results = model.search("max current", k=3, filter_metadata={
    "mtime": {">=": "2025-01-01T00:00:00Z"}
})

# Multiple criteria (AND by default)
results = model.search("max current", k=3, filter_metadata={
    "ext": ".pdf",
    "language": "en",
})

# OR logic
results = model.search("max current", k=3, filter_metadata={
    "ext": [".pdf", ".docx"],
    "logic": "OR",
})

| Filter field | Type | Description |
| --- | --- | --- |
| ext | str or list[str] | File extension(s) |
| mtime | dict | Operators: >=, <=, >, <, == against ISO-8601 string |
| language | str or list[str] | Language code(s), e.g. "en" |
| tags | str or list[str] | Any tag in common (requires AI metadata) |
| document_type | str or list[str] | Document type (requires AI metadata) |
| logic | "AND" or "OR" | How to combine criteria (default: "AND") |

Any other key is matched by exact string equality against the stored metadata dict.
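
The semantics described above can be sketched as a small predicate over one document's metadata dict. This is an illustration under stated assumptions, not the library's filter code: matches and its helper are hypothetical names, mtime values are compared as strings (which only works when both sides use the same ISO-8601 format and timezone), and the regex key is ignored here.

```python
from typing import Any

_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def _criterion_matches(stored: Any, wanted: Any) -> bool:
    if stored is None:
        return False
    if isinstance(wanted, dict):  # operator form, e.g. {">=": "2025-01-01T00:00:00Z"}
        return all(_OPS[op](stored, bound) for op, bound in wanted.items())
    wanted_values = wanted if isinstance(wanted, list) else [wanted]
    if isinstance(stored, list):  # e.g. tags: any value in common
        return any(value in stored for value in wanted_values)
    return stored in wanted_values


def matches(metadata: dict, filter_metadata: dict) -> bool:
    """Apply every criterion, combined with AND (default) or OR."""
    criteria = {k: v for k, v in filter_metadata.items() if k not in ("logic", "regex")}
    combine = any if filter_metadata.get("logic", "AND") == "OR" else all
    return combine(_criterion_matches(metadata.get(k), v) for k, v in criteria.items())


doc = {"ext": ".pdf", "language": "fr", "mtime": "2025-03-01T00:00:00Z", "tags": ["motor", "spec"]}
pdf_and_english = matches(doc, {"ext": ".pdf", "language": "en"})
pdf_or_english = matches(doc, {"ext": ".pdf", "language": "en", "logic": "OR"})
```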

Regex pattern matching

Use the regex field for substring or pattern matching on any text field. Patterns use Python re.search and are always case-insensitive:

# Files whose name contains "general"
results = model.search("max current", k=3, filter_metadata={
    "regex": {"stem": "general"}
})

# Title contains "motor" or "pump"
results = model.search("specs", k=3, filter_metadata={
    "regex": {"title": "motor|pump"}
})

# Combine with ext filter
results = model.search("specs", k=3, filter_metadata={
    "ext": ".pdf",
    "regex": {"stem": "^report_2025"},
})

When the filter matches no documents, search() returns an empty list [] without raising.
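
The behaviour of the regex field follows directly from re.search with the IGNORECASE flag: a pattern matches anywhere in the value unless it is anchored. The sketch below shows this on plain dicts. regex_matches is a hypothetical helper, and requiring all patterns to match is an assumption made for the example.

```python
import re


def regex_matches(metadata: dict, patterns: dict) -> bool:
    """True when every pattern is found in its field, ignoring case."""
    for field, pattern in patterns.items():
        value = metadata.get(field)
        if value is None or re.search(pattern, str(value), re.IGNORECASE) is None:
            return False
    return True


docs = [
    {"stem": "Report_2025_Q1", "title": "Pump maintenance"},
    {"stem": "general_notes", "title": None},
]

# "^report_2025" is anchored at the start but still matches "Report_2025_Q1"
# because matching is case-insensitive.
pool = [d for d in docs if regex_matches(d, {"stem": "^report_2025"})]

# A pattern that matches nothing leaves an empty pool rather than an error.
empty_pool = [d for d in docs if regex_matches(d, {"stem": "invoice"})]
```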

Docling ingestion

FORetrieval optionally uses Docling to convert PDFs into semantically meaningful image chunks rather than whole pages. Each chunk corresponds to a coherent region of text and associated figures.

model = MultiModalRetrieverModel.from_pretrained(
    "vidore/colqwen2.5-v0.2",
    ingestion={"backend": "docling"},
    index_root="my_indexes",
)
model.index(input_path="docs/", index_name="chunked_index")

Results include a chunk_num field identifying the exact Docling chunk within the page.

Running the test suite

Install the dev dependencies first:

uv sync --extra dev

Unit tests

No API keys, no GPU required — runs in seconds:

pytest -m "not slow and not integration"

Metadata tests (no AI)

pytest tests/test_metadata_no_ai.py

Metadata tests (with AI)

Set at least one API key:

export OPENROUTER_API_KEY=...
export OPENAI_API_KEY=...
export MISTRAL_API_KEY=...
export OLLAMA_HOST=http://localhost:11434   # + optionally OLLAMA_MODEL (default: mistral-small-latest)
pytest tests/test_metadata_ai.py -v

All configured LLM providers are detected automatically, and the suite runs once per provider.

Vector-store backend tests

Unit tests for all backends (no GPU, backends mocked or run in-process):

# Local backend
pytest tests/test_vector_store_local.py

# Qdrant backend (unit: mock client; slow: embedded Qdrant round-trip)
pytest tests/test_vector_store_qdrant.py -m "not slow"
pytest tests/test_qdrant.py -m "not slow and not integration"

# Milvus backend (unit: mock client; slow: Milvus Lite round-trip)
pytest tests/test_vector_store_milvus.py -m "not slow"

# Remote backend (unit: HTTP calls mocked; server app: FastAPI TestClient)
pytest tests/test_vector_store_remote.py
pytest tests/test_vector_db_server_app.py
pytest tests/test_vector_db_server_config.py
pytest tests/test_vector_db_server_client.py
pytest tests/test_vector_db_server_manager.py

# Factory and backend dispatch
pytest tests/test_vector_store_factory.py
pytest tests/test_colpali_backend_dispatch.py

Integration tests (require a live vector-DB server — see Remote vector-DB server section):

# Set the server URL so the integration tests are not skipped
export FORETRIEVAL_TEST_DB_SERVER_URL=http://localhost:18000
pytest tests/ -m "slow and integration" -v

Metadata filter tests

pytest tests/test_metadata_filter.py

Slow tests (GPU-dependent)

Full ColPali indexing and search:

pytest -m slow

Markers reference

| Marker | Meaning |
| --- | --- |
| slow | GPU-dependent or computationally expensive |
| integration | Requires a live external service (API key, Ollama daemon, or vector-DB server) |
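
Custom markers need to be registered with pytest, otherwise pytest -m emits unknown-marker warnings. How this repository registers them is not shown here; one common approach is a pytest_configure hook in conftest.py, sketched below. The MARKERS dict and its descriptions are assumptions for the example.

```python
# conftest.py (sketch)
MARKERS = {
    "slow": "GPU-dependent or computationally expensive",
    "integration": "requires a live external service",
}


def pytest_configure(config):
    """Register the custom markers used with `pytest -m ...`."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test is then tagged with @pytest.mark.slow or @pytest.mark.integration and selected or excluded with the -m expressions shown in the sections above.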

Remote embedding server

FORetrieval can offload all embedding computation to a remote GPU server running vLLM. The local machine only loads the processor (tokenizer + image preprocessor) — no model weights, no GPU required locally.

Requirements:

  • vLLM ≥ 0.19.0 on the remote server
  • Only ColQwen3 / ColQwen3.5 models are supported by the vLLM /pooling endpoint. ColPali, ColQwen2, and ColQwen2.5 are not supported.
  • Recommended model: athrael-soju/colqwen3.5-4.5B-v3 (rank 3 on ViDoRe V3, 320-dim, Apache 2.0)

Quick start

from foretrieval import MultiModalRetrieverModel
from foretrieval.embedding_server import EmbeddingServerConfig

cfg = EmbeddingServerConfig(
    url="http://gpu-server:8000",
    model_name="athrael-soju/colqwen3.5-4.5B-v3",
)

model = MultiModalRetrieverModel.from_pretrained(
    "athrael-soju/colqwen3.5-4.5B-v3",
    index_root="my_indexes",
    embedding_server=cfg,
)
model.index("path/to/docs/", index_name="my_index")
results = model.search("maximum altitude", k=3)

Auto-deploy

Set auto_deploy=True to have FORetrieval SSH to the GPU server and start the vLLM Docker container automatically if it is not already running. Requires foretrieval[embedding_server] (adds paramiko).

cfg = EmbeddingServerConfig(
    url="http://gpu-server:8000",
    model_name="athrael-soju/colqwen3.5-4.5B-v3",
    auto_deploy=True,
    ssh_host="gpu-server",       # SSH target
    ssh_user="myuser",           # optional, defaults to $USER
    n_gpus=-1,                   # -1 = all available GPUs (auto-detected via nvidia-smi)
)

The manager pulls vllm/vllm-openai:latest, starts the container with --tensor-parallel-size N, and writes a metadata file at ~/.foretrieval/deployment.json on the remote. Subsequent calls detect the running container and skip redeployment.
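
The n_gpus=-1 setting has to be resolved to a concrete number before it can be passed as --tensor-parallel-size. The sketch below shows one way to do that from the text printed by nvidia-smi -L, which lists one "GPU n: ..." line per device. It is an assumption about the approach, not the manager's code; both function names are hypothetical, and the sample output uses placeholder UUIDs.

```python
def count_gpus(nvidia_smi_list_output: str) -> int:
    """Count devices in the output of `nvidia-smi -L`."""
    return sum(
        1 for line in nvidia_smi_list_output.splitlines() if line.startswith("GPU ")
    )


def tensor_parallel_args(n_gpus: int, detected: int) -> list:
    """Turn the configured n_gpus (-1 = all detected) into vLLM CLI arguments."""
    size = detected if n_gpus == -1 else n_gpus
    if size < 1 or size > detected:
        raise ValueError(f"requested {size} GPUs but {detected} were detected")
    return ["--tensor-parallel-size", str(size)]


sample = (
    "GPU 0: NVIDIA A100 (UUID: GPU-aaaa)\n"
    "GPU 1: NVIDIA A100 (UUID: GPU-bbbb)\n"
)
args = tensor_parallel_args(-1, count_gpus(sample))
```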

Authentication and SSL

cfg = EmbeddingServerConfig(
    url="https://gpu-server:8000",
    model_name="athrael-soju/colqwen3.5-4.5B-v3",
    api_key="my-secret-token",   # Authorization: Bearer header
    verify_ssl=False,            # for self-signed certificates
)

Deploy vLLM with --api-key my-secret-token to require authentication.
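
On the wire, these two options correspond to an Authorization: Bearer header and to a TLS context with certificate verification switched off. The standard-library sketch below shows both, without sending any request. The helper names are hypothetical, and FORetrieval's own HTTP client may set these up differently.

```python
import ssl
import urllib.request
from typing import Optional


def build_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare a POST request, adding the bearer token when one is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers, method="POST")


def make_ssl_context(verify_ssl: bool = True) -> ssl.SSLContext:
    """Default context, or one that accepts self-signed certificates."""
    context = ssl.create_default_context()
    if not verify_ssl:
        context.check_hostname = False  # must be disabled before verify_mode
        context.verify_mode = ssl.CERT_NONE
    return context


request = build_request("https://gpu-server:8000/pooling", api_key="my-secret-token")
context = make_ssl_context(verify_ssl=False)
```

Disabling verification removes protection against man-in-the-middle attacks, so it is best limited to trusted networks.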

SSH tunnel (firewalled servers)

If port 8000 is not directly reachable, open an SSH tunnel first:

ssh -fNL 8000:localhost:8000 gpu-server

Then use http://localhost:8000 as the URL.

EmbeddingServerConfig reference

| Field | Default | Description |
| --- | --- | --- |
| url | required | Base URL of the vLLM server |
| model_name | required | HuggingFace model ID (must contain colqwen3) |
| auto_deploy | False | SSH + Docker auto-deploy |
| ssh_host | None | SSH hostname (required when auto_deploy=True) |
| ssh_user | None | SSH username (defaults to $USER) |
| ssh_key_path | None | Path to SSH private key (defaults to SSH agent) |
| n_gpus | -1 | Number of GPUs (-1 = all available) |
| port | 8000 | Port exposed on the remote server |
| hf_token | None | HuggingFace token for gated models |
| api_key | None | Bearer token for server authentication |
| verify_ssl | True | Verify SSL certificates |
| batch_size | 4 | Images per request (auto-halved on OOM) |
| request_timeout | 120 | HTTP timeout in seconds |
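
The "auto-halved on OOM" behaviour of batch_size can be pictured as a retry loop that shrinks the batch and tries the same position again. This is a sketch of the general pattern, not the library's implementation. OutOfMemory, embed_all, and embed_batch are hypothetical names, and how the real client detects an out-of-memory response is not specified here.

```python
from typing import Callable, Sequence


class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for whatever error signals an OOM on the server."""


def embed_all(items: Sequence, embed_batch: Callable[[Sequence], list], batch_size: int = 4) -> list:
    """Embed items batch by batch, halving the batch size after each OOM."""
    results: list = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(embed_batch(batch))
        except OutOfMemory:
            if batch_size == 1:
                raise  # a single item does not fit; nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with the smaller batch
        start += len(batch)
    return results
```

The smaller size is kept for the rest of the run, which avoids hitting the same failure on every subsequent batch.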

Remote vector-DB server

FORetrieval can offload all vector-store operations (indexing, search, fetch) to a remote HTTP server. The local machine only stores the processor and the shared sidecar files — collections live entirely on the server.

Requires: foretrieval[vector_db_server] (adds fastapi, uvicorn, paramiko).

Quick start

Start the server manually on a remote host:

pip install "foretrieval[qdrant,milvus,vector_db_server]"   # or: uv pip install "foretrieval[qdrant,milvus,vector_db_server]"
uvicorn foretrieval.vector_db_server.server:app --host 0.0.0.0 --port 18000
# or: foretrieval-db-server   (console script)

Then use it from the client:

from foretrieval import MultiModalRetrieverModel

model = MultiModalRetrieverModel.from_pretrained(
    "athrael-soju/colqwen3.5-4.5B-v3",
    storage_backend="remote",
    storage_config={
        "url": "http://gpu-server:18000",
        "backend": "qdrant",   # server-side storage backend: local | qdrant | milvus
    },
)
model.index("path/to/docs/", index_name="my_index")
results = model.search("maximum altitude", k=3)

Auto-deploy

Set auto_deploy=True to have FORetrieval SSH to the remote host, build the Docker image from the local foretrieval source, and start the container automatically. Requires foretrieval[vector_db_server] and Docker on the remote host.

model = MultiModalRetrieverModel.from_pretrained(
    "athrael-soju/colqwen3.5-4.5B-v3",
    storage_backend="remote",
    storage_config={
        "url": "http://gpu-server:18000",
        "backend": "qdrant",
        "auto_deploy": True,
        "ssh_host": "gpu-server",
        "data_dir": "/var/lib/foretrieval_db",  # bind-mounted into container
    },
)

The manager:

  1. Uploads the foretrieval/ package source to ~/foretrieval_db_build/ via SSH.
  2. Runs docker build -t foretrieval-vector-db:local on the remote.
  3. Starts the container: docker run -p 18000:18000 -v <data_dir>:/data ….
  4. Writes metadata to ~/.foretrieval/db_deployment.json on the remote. Subsequent calls detect the running container and skip re-deployment.

Authentication and SSL

storage_config={
    "url": "https://gpu-server:18000",
    "backend": "qdrant",
    "api_key": "my-secret-token",   # Authorization: Bearer header
    "verify_ssl": False,            # for self-signed certificates
}

Start the server with FOR_DB_API_KEY=my-secret-token to require authentication.

SSH tunnel (firewalled servers)

If port 18000 is not directly reachable, open an SSH tunnel first:

ssh -fNL 18000:localhost:18000 gpu-server

Then use http://localhost:18000 as the URL.

Server environment variables

| Variable | Default | Description |
| --- | --- | --- |
| FOR_DB_DATA_DIR | /data | Root directory where collections are persisted |
| FOR_DB_API_KEY | "" | Bearer token (auth disabled if empty) |
| FOR_DB_HOST | 0.0.0.0 | Bind address |
| FOR_DB_PORT | 18000 | Bind port |
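
As an illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object. ServerSettings and load_settings are hypothetical names, so the server's actual startup code may differ. The variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    data_dir: str
    api_key: str
    host: str
    port: int

    @property
    def auth_enabled(self) -> bool:
        # An empty token means requests are accepted without authentication.
        return bool(self.api_key)


def load_settings(env=os.environ) -> ServerSettings:
    """Read the FOR_DB_* variables, falling back to the documented defaults."""
    return ServerSettings(
        data_dir=env.get("FOR_DB_DATA_DIR", "/data"),
        api_key=env.get("FOR_DB_API_KEY", ""),
        host=env.get("FOR_DB_HOST", "0.0.0.0"),
        port=int(env.get("FOR_DB_PORT", "18000")),
    )
```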

VectorDBServerConfig reference

| Field | Default | Description |
| --- | --- | --- |
| url | required | Base URL of the vector-DB server |
| backend | "qdrant" | Server-side storage backend (local, qdrant, or milvus) |
| storage_config | None | Extra backend-specific config forwarded to the server (e.g. {"candidate_limit": 128} for Milvus) |
| auto_deploy | False | SSH + Docker auto-deploy |
| ssh_host | None | SSH hostname (required when auto_deploy=True) |
| ssh_user | None | SSH username (defaults to $USER) |
| ssh_key_path | None | Path to SSH private key (defaults to SSH agent) |
| port | 18000 | Port exposed on the remote server |
| api_key | None | Bearer token for server authentication (never persisted to disk) |
| verify_ssl | True | Verify SSL certificates |
| request_timeout | 120 | HTTP timeout in seconds |
| data_dir | /var/lib/foretrieval_db | Data path on the remote host (bind-mounted into container) |

Persistence and reload

When a remote index is exported (model._export_index()), the index_config.json.gz on the local filesystem stores storage_backend="remote" and the server URL. Sensitive fields (api_key) are never persisted to disk. To reload the index later:

model = MultiModalRetrieverModel.from_index(
    "my_index",
    index_root=".",
    storage_config={"api_key": "my-secret"},   # re-supply at load time
)
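
The reason api_key must be re-supplied is that sensitive fields are dropped before the config is written. The sketch below shows that pattern with gzip and json. The file layout, save_config, and load_config are assumptions for illustration and do not describe the real index_config.json.gz schema.

```python
import gzip
import json
from pathlib import Path
from typing import Optional

SENSITIVE_FIELDS = {"api_key"}


def save_config(path: Path, storage_config: dict) -> None:
    """Write the config to disk with sensitive fields stripped."""
    public = {k: v for k, v in storage_config.items() if k not in SENSITIVE_FIELDS}
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump({"storage_backend": "remote", "storage_config": public}, fh)


def load_config(path: Path, overrides: Optional[dict] = None) -> dict:
    """Read the saved config and merge in values supplied at load time."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        saved = json.load(fh)
    saved["storage_config"].update(overrides or {})
    return saved
```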

Local model quantization

For local (non-remote) inference, 4-bit and 8-bit quantization reduce VRAM usage via BitsAndBytes. Requires foretrieval[quantization] and a CUDA device.

model = MultiModalRetrieverModel.from_pretrained(
    "vidore/colqwen2.5-v0.2",
    load_in_4bit=True,                  # or load_in_8bit=True
    bnb_4bit_quant_type="nf4",          # "nf4" (default) or "fp4"
    bnb_4bit_compute_dtype="float16",   # compute dtype
)
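
A back-of-the-envelope calculation shows where the VRAM saving comes from: weight memory scales with bits per parameter. The sketch below counts weights only. It ignores activations, quantization constants, and any layers kept in higher precision, and the 3-billion parameter count is an assumed round number for illustration, not a measured figure for a specific model.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


n_params = 3e9  # assumed parameter count, for illustration only
estimates = {bits: weight_memory_gib(n_params, bits) for bits in (16, 8, 4)}
for bits, gib in estimates.items():
    print(f"{bits:>2}-bit weights: about {gib:.1f} GiB")
```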

Acknowledgements

FORetrieval was originally forked from Byaldi, a wrapper around the ColPali repository. It has since diverged significantly to add metadata generation and filtering, Qdrant storage, Docling ingestion, and heatmap visualisation.
