This repository contains a production-ready, modular AI system that verifies insurance-style damage claims using submitted images, claim conversations, risk history profiles, and evidence rules.
The core design prioritizes visual evidence as the primary source of truth. User history adds contextual metadata but never overrides visual facts.
The goal of this system is to automatically evaluate customer claims across three main object types:
- Cars (e.g., front_bumper, side_mirror, door)
- Laptops (e.g., screen, keyboard, hinge)
- Packages (e.g., box, seal, contents)
For each claim, the system evaluates:
- Claim Understanding: What damage is reported on which part?
- Evidence Standards: Do the submitted images meet the minimum evidence required by the claims policy?
- Multi-Image Quality and Damage Detection: Are the images blurry? Is the correct object and part visible? Is there actual damage?
- Contextual Risk: Does the customer have a history of frequent rejections or manual reviews?
- Deterministic Decision: Is the claim supported, contradicted, or is there not enough information to evaluate it?
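The sketch below illustrates one way the "visual evidence first" principle could be expressed as a deterministic rule. The `VisualFindings` fields and the `decide` function are hypothetical names chosen for illustration; the rule order is an assumption and may differ from the repository's decision engine.

```python
from dataclasses import dataclass


@dataclass
class VisualFindings:
    """Hypothetical summary of the image analysis for one claim."""
    images_usable: bool   # at least one image passed the quality checks
    part_visible: bool    # the claimed object part appears in the images
    damage_present: bool  # damage of the claimed kind was detected


def decide(findings: VisualFindings, evidence_met: bool, risk_flags: list[str]) -> dict:
    """Return a status driven only by visual evidence and evidence rules."""
    if not evidence_met or not findings.images_usable or not findings.part_visible:
        status = "not_enough_information"
    elif findings.damage_present:
        status = "supported"
    else:
        status = "contradicted"
    # History flags are attached as context and never change the status.
    return {"status": status, "risk_flags": list(risk_flags)}
```

Because the risk flags are only copied into the result, a customer with a poor history and clear visual damage still receives `supported`.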
The pipeline processes each claim sequentially through the following modules:
graph TD
A[Claims Input / claims.csv] --> B[Claim Parser]
B --> C[Evidence Checker]
C --> D[Image Analyzer]
D --> E[Risk Assessor]
E --> F[Decision Engine]
F --> G[Output Generator / output.csv]
- Claim Parser (src/claim_parser.py): Normalizes conversations into semantic labels (issue_type, object_part, and issue_family).
- Evidence Checker (src/evidence_checker.py): Matches claims against policy requirements inside evidence_requirements.csv.
- Image Analyzer (src/image_analyzer.py): Executes OpenCV visual quality checks (variance of Laplacian for blur, pixel brightness statistics for glare/low-light) and handles multi-modal model analysis (via Gemini, OpenAI, or Mock).
- Risk Assessor (src/risk_assessor.py): Loads user_history.csv to flag anomalies.
- Decision Engine (src/decision_engine.py): Determines final status (supported, contradicted, not_enough_information) using rule-based decision trees.
- Output Generator (src/output_generator.py): Compiles fields and saves results into output.csv.
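To make the quality checks concrete, here is a dependency-free sketch of the variance-of-Laplacian blur measure and a mean-brightness check. It mirrors the idea behind OpenCV's `cv2.Laplacian(gray, cv2.CV_64F).var()` but evaluates interior pixels only, so values near the border differ. The function names, flag names, and all three thresholds are assumptions for illustration, not the repository's values.

```python
from statistics import mean, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a list of rows of grayscale values; it must be at least 3x3.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def quality_flags(gray, blur_threshold=100.0, dark_below=40.0, bright_above=220.0):
    """Return quality flags for one image (thresholds are hypothetical)."""
    flags = []
    if laplacian_variance(gray) < blur_threshold:
        flags.append("blurry")  # few sharp edges: low Laplacian variance
    brightness = mean(p for row in gray for p in row)
    if brightness < dark_below:
        flags.append("low_light")
    elif brightness > bright_above:
        flags.append("glare")
    return flags
```

A uniform image has zero Laplacian variance and is flagged as blurry, while a high-contrast pattern passes; the threshold in between has to be tuned on real claim photos.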
- Python 3.11+
From the root directory containing the code/ folder:
pip install -r code/requirements.txt
The main script provides arguments to run the verification engine, execute evaluation tasks, and configure providers.
Run the claims processor on the test set:
python code/main.py --claims-file dataset/claims.csv --output-file output.csv --vision-provider mock
To calculate metrics on the sample claims dataset:
python code/evaluation/main.py --run-evaluation --sample-claims-file dataset/sample_claims.csv --vision-provider mock
This writes evaluation metrics and reports under the code/evaluation/ directory:
- code/evaluation/metrics.json
- code/evaluation/confusion_matrix.csv
- code/evaluation/sample_predictions.csv
- code/evaluation/evaluation_report.md
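As a rough illustration of what an evaluation over the three statuses involves, the sketch below builds a confusion matrix and an accuracy figure from expected and predicted labels. The function names and the choice of metrics are assumptions; the contents of metrics.json are not specified here.

```python
from collections import Counter

LABELS = ["supported", "contradicted", "not_enough_information"]


def confusion_matrix(expected, predicted):
    """Rows are expected labels, columns are predicted labels (LABELS order)."""
    counts = Counter(zip(expected, predicted))
    return [[counts[(e, p)] for p in LABELS] for e in LABELS]


def accuracy(expected, predicted):
    """Fraction of claims whose predicted status matches the expected one."""
    if not expected:
        return 0.0
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)
```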
The system defines a pluggable adapter client (BaseVisionModel in src/models.py). You can swap adapters using CLI arguments:
The mock provider runs fully offline using local image heuristics and sample mappings. It is useful for test suites and development.
python code/main.py --vision-provider mock
The gemini provider uses Gemini 2.5 Flash for multimodal parsing and verification.
- Export your API key:
# Unix
export GEMINI_API_KEY="your-key"
# Windows PowerShell
$env:GEMINI_API_KEY="your-key"
- Run the main file:
python code/main.py --vision-provider gemini
The openai provider uses GPT-4o for visual checks.
- Export your API key:
# Unix
export OPENAI_API_KEY="your-key"
# Windows PowerShell
$env:OPENAI_API_KEY="your-key"
- Run the main file:
python code/main.py --vision-provider openai
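The adapter pattern and the fallback to the mock provider when an API key is unset could look roughly like the sketch below. Only the class name BaseVisionModel and the two environment variable names come from this README; the `analyze` method, its return shape, `MockVisionModel`, and `resolve_provider` are hypothetical.

```python
import os
from abc import ABC, abstractmethod


class BaseVisionModel(ABC):
    """Common interface every provider adapter implements (method is assumed)."""

    @abstractmethod
    def analyze(self, image_path: str, claim_text: str) -> dict:
        ...


class MockVisionModel(BaseVisionModel):
    """Offline stand-in that never calls a remote API."""

    def analyze(self, image_path: str, claim_text: str) -> dict:
        return {"provider": "mock", "damage_present": None}


# Environment variable that each live provider needs.
API_KEY_VARS = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_provider(requested: str, env=None) -> str:
    """Return the requested provider, or "mock" when its key is missing."""
    env = os.environ if env is None else env
    key_var = API_KEY_VARS.get(requested)
    if key_var is not None and env.get(key_var):
        return requested
    return "mock"
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.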
Claim input fields (claims.csv):
- user_id: Unique identifier for the claimant
- image_paths: Semicolon-separated path strings
- user_claim: Conversation dialogue text
- claim_object: car | laptop | package
User history fields (user_history.csv):
- user_id: Customer identifier
- past_claim_count: Total claims filed
- accept_claim: Supported claims count
- manual_review_claim: Manual review flags count
- rejected_claim: Contradicted/rejected claims count
- last_90_days_claim_count: Claims in recent 90 days
- history_flags: Semicolon-separated tags (e.g. user_history_risk;manual_review_required)
- history_summary: Text summary of risk profile
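The semicolon-separated fields (image_paths and history_flags) need to be split after the CSV itself is parsed. A minimal sketch using the standard csv module follows; the helper names and the sample row are made up for illustration.

```python
import csv
import io


def split_tags(value: str) -> list[str]:
    """Split a semicolon-separated field, dropping empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]


def read_claims(text: str) -> list[dict]:
    """Parse claims CSV text and expand image_paths into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["image_paths"] = split_tags(row["image_paths"])
        rows.append(row)
    return rows


# Hypothetical sample row; the quoted field keeps the comma inside the claim text.
SAMPLE = (
    "user_id,image_paths,user_claim,claim_object\n"
    'u1,img/a.jpg;img/b.jpg,"The screen is cracked, see photos.",laptop\n'
)
claims = read_claims(SAMPLE)
```

The same `split_tags` helper applies to history_flags values such as `user_history_risk;manual_review_required`.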
- API Key Dependency: Live validation requires model keys. When unset, the system falls back gracefully to Mock Mode.
- Local Heuristics: Blur detection uses a default variance-of-Laplacian threshold, which may flag sharp images with a shallow depth of field as blurry.
- Rate Limits (RPM/TPM): Paid API tiers should be used for concurrency. For free tiers, sequential processing with a 4-second delay between requests is recommended.
- Batching: When processing thousands of claims, run the process in batches using a message queue (e.g., Celery + RabbitMQ) to allow asynchronous, retry-safe execution.
- Caching: Cache prompt templates and image inputs to reduce duplicate requests and model token usage.
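The rate-limit and caching notes above can be combined in a small wrapper around whatever function calls the model. This is a sketch under assumptions: `ThrottledCachedClient` and its parameters are hypothetical, the cache is in-memory only, and the clock and sleep functions are injectable so the delay can be tested without waiting.

```python
import hashlib
import time


class ThrottledCachedClient:
    """Wrap a model call with a minimum interval and a request cache."""

    def __init__(self, analyze_fn, min_interval=4.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._analyze_fn = analyze_fn
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the most recent real request
        self._cache = {}

    def analyze(self, image_bytes: bytes, prompt: str):
        # Identical prompt + image pairs are answered from the cache.
        key = hashlib.sha256(prompt.encode("utf-8") + b"\0" + image_bytes).hexdigest()
        if key in self._cache:
            return self._cache[key]
        # Wait until at least min_interval has passed since the last request.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        result = self._analyze_fn(image_bytes, prompt)
        self._last_call = self._clock()
        self._cache[key] = result
        return result
```

Cached hits skip both the delay and the model call, so duplicate images in a batch cost nothing extra.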