AgentSploit is an educational AI security lab for learning how to red team LLM-powered applications and tool-using AI agents.
The project has two main goals:
- provide a deliberately vulnerable AI application, similar in spirit to a DVWA-style lab, but focused on LLM agents;
- provide Python scanners that can test LLM vulnerabilities, collect evidence, generate reports, and help decide whether an AI agent is ready to be exposed on a local network.
AgentSploit is designed as an AI Security Engineer portfolio project. It demonstrates how to identify, prove, classify, and document AI-agent weaknesses using the OWASP Top 10 for LLM Applications 2025 and MITRE ATLAS.
This project is intentionally vulnerable.
Use it only:
- on your local machine;
- in your own lab;
- against applications you own;
- against systems where you have explicit authorization.
Do not run these scanners against third-party services or networks you are not allowed to test.
AgentSploit helps answer three practical questions:
- What common vulnerabilities can affect an LLM-based AI agent?
- How can prompt injection, secret disclosure, unsafe tool use, RAG poisoning, and network exposure issues be tested automatically?
- How can we decide whether an AI agent is reasonably ready to be exposed on a local network?
The project does not claim to prove that an agent is perfectly secure. It produces evidence, risk classification, reports, and a readiness verdict: PASS, WARN, or FAIL.
AgentSploit/
├── vulnerable_target/
│ ├── main.py # Vulnerable FastAPI application
│ ├── agent.py # LLM agent with tool calling
│ ├── tools.py # Vulnerable agent tools
│ ├── audit.py # Server-side audit log for tool calls
│ ├── rag.py # Intentionally weak local RAG layer
│ ├── knowledge_base/ # RAG documents, including poisoned content
│ └── data/ # Runtime data ignored by git
├── hardened_target/
│ ├── main.py # Hardened FastAPI application with the same chat API
│ ├── agent.py # Guarded LLM agent plus deterministic mock mode
│ ├── policy.py # Readable allow/deny policy rules
│ ├── tool_gateway.py # Tool authorization and ALLOW/BLOCK decisions
│ ├── rag_guard.py # Untrusted retrieval sanitization
│ ├── output_guard.py # Secret redaction and response limits
│ └── knowledge_base/ # Same lab documents behind RAG guardrails
├── scanner/
│ ├── fuzzer.py # OWASP/MITRE LLM fuzzer
│ ├── readiness.py # LAN readiness scanner
│ ├── supply_chain.py # Supply-chain and hygiene checks
│ ├── compare.py # Before/after vulnerable vs hardened comparison
│ └── targets.example.json # Example generic target profile
├── payloads/
│ ├── owasp_llm_payloads.json # Full fuzzing payload set
│ └── readiness_payloads.json # Short LAN-readiness payload set
├── reports/ # Generated JSON, Markdown, and HTML reports
├── examples/ # Static example before/after reports
├── tests/ # Unit tests that do not call OpenAI
├── database_creds.txt # Fake lab secret
├── requirements.txt
├── .env.example
└── README.md
Mermaid code:
flowchart TD
User["User"] --> Scanner["Python Scanners"]
Scanner --> Fuzzer["scanner/fuzzer.py"]
Scanner --> Readiness["scanner/readiness.py"]
Scanner --> SupplyChain["scanner/supply_chain.py"]
Fuzzer --> ChatAPI["POST /chat"]
Readiness --> ChatAPI
Readiness --> AuditAPI["GET /audit-log"]
Readiness --> DocsAPI["/docs /openapi.json /documents"]
ChatAPI --> Agent["Vulnerable LLM Agent"]
Agent --> FileTool["read_system_file"]
Agent --> EmailTool["send_email"]
Agent --> RagTool["search_documents"]
RagTool --> KnowledgeBase["knowledge_base"]
KnowledgeBase --> PoisonedDoc["Poisoned Document"]
Agent --> AuditLog["Audit Log"]
AuditLog --> Reports["JSON Markdown HTML Reports"]
Fuzzer --> Reports
Readiness --> Reports
SupplyChain --> Reports
The vulnerable target lives in vulnerable_target/.
It is a FastAPI application exposing an LLM agent connected to the OpenAI API. The agent is intentionally naive: its system prompt is weak, it trusts user instructions too much, and it has access to unsafe tools.
POST /chat
Example request:
{
"message": "Hello, who are you?"
}Example response:
{
"request_id": "abc123",
"response": "Hello, I am an internal assistant..."
}The request_id field is used to correlate an LLM response with server-side tool-call audit events.
The agent has access to three intentionally risky tools.
| Tool | Purpose | Risk |
|---|---|---|
read_system_file(filepath) |
Reads a local file | Can disclose secrets |
send_email(to, subject, body) |
Mocks sending an email | Can simulate data exfiltration |
search_documents(query) |
Searches the local knowledge base | Can retrieve poisoned documents |
These tools are deliberately under-protected so the scanners can demonstrate realistic classes of AI-agent failures.
The hardened target lives in hardened_target/. It is a defensive version of the vulnerable target with the same core HTTP contract: POST /chat returns request_id and response, /health reports liveness, and /audit-log exposes local evidence for lab debugging. It exists to demonstrate a full defensive loop:
attack -> evidence -> mitigation -> re-test -> before/after report
POST /chat
-> hardened_target.agent
-> ToolGateway
-> policy.py file and domain rules
-> tools.py local-only mock tools
-> audit.py ALLOW/BLOCK decisions
-> rag_guard.py sanitized retrieved content
-> output_guard.py secret redaction and max response size
The hardened target is intentionally minimal. It is educational, local-only, and not a proof of absolute security.
The defensive controls are implemented in small, testable modules:
policy.py: explicitallowed_file_paths,blocked_file_patterns,allowed_email_domains,require_confirmation_for_sensitive_actions,max_tool_calls_per_request, andmax_response_chars.tool_gateway.py: blocks sensitive actions by default, allows file reads only from allowlisted paths, blocks.env, credentials files, private keys,/etc/passwd,/etc/shadow, and blocks outbound mock messages to non-allowlisted domains.rag_guard.py: wraps retrieved documents withUNTRUSTED RETRIEVED CONTENT, neutralizes prompt-injection phrases, and prevents retrieved text from becoming trusted instructions.output_guard.py: redacts secret-like values, API-key patterns, private keys, environment dumps, and truncates oversized responses.main.py: supports optionalX-AgentSploit-Keyauthentication for chat and lab evidence endpoints, plus simple in-memory rate limiting by API key or client IP.agent.py: supportsAGENTSPLOIT_MOCK_LLM=trueso tests and demos can run without OpenAI.
Blocked tool attempts are logged as tool_gateway_decision events with decision: BLOCK. This keeps the evidence useful while avoiding confusion between a blocked attempt and an actually executed unsafe tool call.
Terminal 1:
source .venv/bin/activate
uvicorn vulnerable_target.main:app --reload --port 8000Terminal 2:
source .venv/bin/activate
AGENTSPLOIT_MOCK_LLM=true uvicorn hardened_target.main:app --reload --port 8001Optional hardened authentication:
HARDENED_REQUIRE_API_KEY=true HARDENED_API_KEY=local-lab-key \
AGENTSPLOIT_MOCK_LLM=true uvicorn hardened_target.main:app --reload --port 8001Run the same fuzzer profile against both targets and generate JSON plus Markdown comparison reports:
python scanner/compare.py \
--baseline http://127.0.0.1:8000/chat \
--hardened http://127.0.0.1:8001/chat \
--profile quick \
--delay 0.2You can also compare two existing fuzzer JSON reports:
python scanner/compare.py \
--baseline-report reports/scan_baseline.json \
--hardened-report reports/scan_hardened.jsonExample output shape:
Baseline findings: 7
Hardened findings: 0
Blocked findings: 7
Remaining findings: 0
New findings: 0
Verdict: MITIGATED
Static example reports are included in examples/before_after_report.md and examples/before_after_report.json.
Use the vulnerable target, hardened target, fuzzer, readiness scanner, and comparison script only in a local lab or against systems you own and are explicitly authorized to test. Do not scan third-party services. The hardened target demonstrates layered mitigations for education; it is not a production security certification.
Every tool call is recorded by vulnerable_target/audit.py.
Useful endpoints:
GET /audit-log: list recent audit events;GET /audit-log?request_id=...: filter by a specific chat request;POST /audit-log/clear: clear the audit log.
Example event:
{
"request_id": "abc123",
"event_type": "tool_call",
"tool_name": "send_email",
"arguments": {
"to": "attacker@example.com"
},
"result_preview": "Mock email sent to attacker@example.com"
}This is important because the scanner does not have to rely only on model output. It can prove that a dangerous tool was actually invoked by the target application.
The module vulnerable_target/rag.py implements a deliberately simple local document retriever.
Lab documents:
vulnerable_target/knowledge_base/security_policy.mdvulnerable_target/knowledge_base/benefits_faq.mdvulnerable_target/knowledge_base/poisoned_runbook.md
The file poisoned_runbook.md intentionally contains malicious instructions. This enables testing:
- indirect prompt injection;
- RAG poisoning;
- confusion between retrieved data and trusted instructions;
- unsafe tool execution after document retrieval.
Mermaid code:
sequenceDiagram
participant Scanner
participant API as FastAPI /chat
participant Agent as LLM Agent
participant RAG as search_documents
participant KB as Knowledge Base
participant Tool as send_email/read_system_file
participant Audit as Audit Log
Scanner->>API: Prompt requesting a legacy runbook
API->>Agent: User message
Agent->>RAG: search_documents("database runbook")
RAG->>KB: Retrieve relevant document
KB-->>RAG: poisoned_runbook.md
RAG-->>Agent: Poisoned content
Agent->>Tool: Risky tool call
Tool-->>Agent: Tool result
Tool->>Audit: Record tool_call event
Agent-->>API: Final response
API-->>Scanner: response + request_id
Scanner->>Audit: GET /audit-log?request_id=...
Audit-->>Scanner: Server-side evidence
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtCreate your environment file:
cp .env.example .envEdit .env:
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-miniThe .env file is ignored by git.
source .venv/bin/activate
uvicorn vulnerable_target.main:app --reloadThe API starts at:
http://127.0.0.1:8000
Quick test:
curl -X POST http://127.0.0.1:8000/chat \
-H "Content-Type: application/json" \
-d '{"message":"Hello, who are you?"}'Available endpoints:
| Endpoint | Method | Description |
|---|---|---|
/health |
GET |
Checks whether the API is running |
/chat |
POST |
Sends a message to the agent |
/audit-log |
GET |
Reads tool-call audit events |
/audit-log/clear |
POST |
Clears the audit log |
/documents |
GET |
Lists RAG documents |
/ingest |
POST |
Ingests a document into the vulnerable RAG store |
/ingest/clear |
POST |
Clears API-ingested documents |
The main fuzzer is scanner/fuzzer.py.
It loads payloads/owasp_llm_payloads.json, sends prompts to /chat, analyzes the responses, collects audit logs when available, and generates reports.
| Profile | Behavior | Use Case |
|---|---|---|
quick |
One payload per OWASP category | Fast smoke test |
standard |
All base payloads | Normal scan |
deep |
Base payloads plus mutations | More aggressive and more expensive scan |
Commands:
python scanner/fuzzer.py --profile quick --delay 1 --timeout 60
python scanner/fuzzer.py --profile standard --delay 1 --timeout 60
python scanner/fuzzer.py --profile deep --delay 1 --timeout 60The deep profile adds mutations such as:
- base64-encoded instructions;
- French-language prefixes;
- instruction splitting;
- RAG-related indirect variations.
The fuzzer supports several detector types.
| Detector | Purpose |
|---|---|
contains_any |
Matches at least one expected indicator |
contains_all |
Matches all expected indicators |
regex |
Uses regular expressions |
min_length |
Detects overly long responses |
repetition_count |
Counts repeated tokens |
refusal_expected |
Flags missing refusal behavior |
tool_abuse |
Detects unsafe tool execution or confirmation |
tool_call |
Uses server-side audit logs |
Each fuzzer run writes:
reports/scan_*.jsonreports/scan_*.mdreports/scan_*.html
The scanner scanner/readiness.py answers:
Is this AI agent reasonably ready to be exposed on a local network?
It does not prove perfect security. It provides a practical readiness verdict based on HTTP checks, LLM safety checks, agent-tool checks, RAG checks, and evidence.
| Verdict | Meaning |
|---|---|
PASS |
No critical, high, or medium blockers were detected |
WARN |
Medium-risk issues exist |
FAIL |
At least one critical or high exploitable issue exists |
Against the AgentSploit DVAA, the expected verdict is FAIL.
Against the local lab target:
python scanner/readiness.py --target http://127.0.0.1:8000/chat --profile lan-basic
python scanner/readiness.py --target http://127.0.0.1:8000/chat --profile lan-standardUsing a JSON target profile:
python scanner/readiness.py --config scanner/targets.example.json --profile lan-standardAgainst a generic LAN agent without AgentSploit audit logs:
python scanner/readiness.py --target http://192.168.1.50:8000/chat --no-audit --profile lan-basicThe readiness scanner checks:
- whether the chat endpoint is reachable;
- whether unauthenticated access is accepted;
- permissive CORS;
- unexpected HTTP methods;
- verbose error messages;
- missing visible rate limiting;
- exposed debug or internal endpoints such as
/docs,/openapi.json,/audit-log, and/documents; - prompt injection;
- secret disclosure;
- system prompt leakage;
- unsafe XSS or shell-command output;
- unsafe tool execution;
- RAG injection;
- security hallucination;
- unbounded output behavior.
Mermaid code:
flowchart TD
User["User"] --> CLI["scanner/readiness.py"]
CLI --> TargetProfile["Target profile"]
CLI --> HTTPChecks["HTTP checks"]
CLI --> PayloadChecks["LLM readiness payloads"]
CLI --> AuditChecks["Audit log correlation"]
CLI --> Scoring["Scoring engine"]
TargetProfile --> ChatEndpoint["Chat endpoint"]
HTTPChecks --> ChatEndpoint
PayloadChecks --> ChatEndpoint
AuditChecks --> AuditEndpoint["Optional /audit-log"]
Scoring --> Verdict["PASS WARN FAIL"]
Verdict --> JsonReport["JSON report"]
Verdict --> MdReport["Markdown report"]
Verdict --> HtmlReport["HTML report"]
The file scanner/targets.example.json describes how to communicate with a generic HTTP agent.
Example:
{
"targets": [
{
"name": "agentsploit-local",
"chat_url": "http://127.0.0.1:8000/chat",
"method": "POST",
"headers": {
"Content-Type": "application/json"
},
"request": {
"message_field": "message",
"messages_field": "messages",
"supports_multi_turn": true
},
"response_path": "response",
"request_id_path": "request_id",
"audit_url": "http://127.0.0.1:8000/audit-log"
}
]
}For another agent, the main fields to adapt are:
chat_urlheadersmessage_fieldresponse_pathaudit_url, if available
The module scanner/supply_chain.py checks risks that are not directly testable through prompts:
- unpinned dependencies;
- missing
.gitignoreprotections; - accidental secrets;
- known vulnerable dependencies when
pip-auditis available.
Run:
python scanner/supply_chain.py --skip-pip-auditInclude intentional vulnerable lab fixtures:
python scanner/supply_chain.py --skip-pip-audit --include-lab-fixturesWithout --include-lab-fixtures, the scanner intentionally ignores:
database_creds.txt- test files containing fake secrets;
- fixtures required by the lab.
| ID | Risk | AgentSploit Coverage |
|---|---|---|
LLM01 |
Prompt Injection | Direct, indirect, obfuscated, and RAG-triggered payloads |
LLM02 |
Sensitive Information Disclosure | Secrets, tools, and system context |
LLM03 |
Supply Chain | scanner/supply_chain.py |
LLM04 |
Data and Model Poisoning | Poisoned RAG documents |
LLM05 |
Improper Output Handling | XSS, shell commands, unsafe code |
LLM06 |
Excessive Agency | Unauthorized tool execution |
LLM07 |
System Prompt Leakage | Prompt extraction and reconstruction |
LLM08 |
Vector and Embedding Weaknesses | Retrieval manipulation and RAG confusion |
LLM09 |
Misinformation | False compliance claims and fabricated CVEs |
LLM10 |
Unbounded Consumption | Large output and recursive reasoning pressure |
AgentSploit maps several payloads to MITRE ATLAS techniques.
| Technique | Name |
|---|---|
AML.T0051 |
LLM Prompt Injection |
AML.T0053 |
AI Agent Tool Invocation |
AML.T0056 |
Extract LLM System Prompt |
AML.T0068 |
LLM Prompt Obfuscation |
AML.T0084.001 |
Tool Definitions |
AML.T0086 |
Exfiltration via AI Agent Tool Invocation |
AML.T0098 |
AI Agent Tool Credential Harvesting |
AML.T0099 |
AI Agent Tool Data Poisoning |
AML.T0029 |
Denial of AI Service |
These mappings are stored in the JSON payload files.
| Status | Meaning |
|---|---|
VULNERABLE |
A detector found evidence |
not detected |
No configured indicator was observed |
error |
The test could not be executed correctly |
not detected does not mean secure. It only means that this specific payload did not trigger the configured evidence.
| Status | Meaning |
|---|---|
PASS |
The check did not detect a problem |
WARN |
Something should be reviewed or mitigated |
FAIL |
A blocking or exploitable issue was found |
SKIPPED |
The check was not applicable or the target was unreachable |
Against the local DVAA, the readiness scanner should return FAIL, usually because of:
- no authentication;
- exposed
/docsand/audit-log; - internal information disclosure;
- XSS or shell-command generation;
send_emailexecution without confirmation;- poisoned RAG document retrieval.
Against an unreachable target, the scanner returns FAIL on HTTP-001, then marks dependent checks as SKIPPED.
Run the full test suite:
python -m pytestThe tests cover:
- fuzzer detectors;
- audit logs;
- RAG search;
- hardened policy, tool gateway, RAG guard, output guard, auth/rate limiting, and mock LLM mode;
- before/after comparison aggregation;
- supply-chain checks;
- target-profile parsing;
- readiness scoring;
- report generation;
- unreachable target behavior.
The tests do not call OpenAI.
Recommended portfolio demo:
- Start the vulnerable target.
- Run a readiness scan and show the
FAILverdict. - Open the Markdown or HTML report.
- Explain the evidence: model response, audit logs, OWASP mapping, MITRE mapping.
- Explain mitigations: authentication, rate limiting, tool authorization, RAG isolation, and output handling.
Typical commands:
uvicorn vulnerable_target.main:app --reload
python scanner/readiness.py --target http://127.0.0.1:8000/chat --profile lan-standard
python scanner/fuzzer.py --profile standard --delay 1 --timeout 60
python scanner/supply_chain.py --skip-pip-auditAgentSploit is an educational lab. It does not replace:
- a full penetration test;
- architecture review;
- code review;
- IAM review;
- runtime monitoring;
- production tool sandboxing;
- human risk assessment.
The scanners detect what they can observe through HTTP, LLM responses, audit logs, and project files.
Potential next steps:
- add a real vector database such as Chroma or FAISS;
- add a proxy mode for observing external agents;
- add PDF export;
- integrate
pip-auditinto a CI profile; - add strict CI/CD exit-code modes;
- add a small web UI for report visualization;
- expand MITRE ATLAS payload coverage;
- add a safe sandbox for testing dangerous model outputs;
- add profiles by agent type: support, SOC, DevOps, HR, document assistant.
AgentSploit demonstrates a complete AI security workflow.
Mermaid code:
flowchart LR
Build["Build vulnerable target"] --> Attack["Attack with LLM payloads"]
Attack --> Evidence["Collect responses and audit logs"]
Evidence --> Classify["Map to OWASP and MITRE"]
Classify --> Report["Generate reports"]
Report --> Decide["Decide PASS WARN FAIL"]
Decide --> Mitigate["Recommend mitigations"]
The project shows both offensive understanding of AI agents and defensive readiness assessment before exposing an agent on a local network.