Experiments for the fullsend platform. Each tests a hypothesis about autonomous agent infrastructure, security, tooling, or workflows.

| # | Experiment | Status |
|---|---|---|
| 0001 | Agent outage fire drill | Active |
| 0002 | Claude-based ADR drift scanner | Concluded |
| 0003 | ADR-0046 drift scanner | Concluded |
| 0004 | Zero-config autonomous bug fix engine | Concluded |
| 0005 | Agent scoped tools triage | Concluded |
| 0006 | Code agent evaluation | Concluded |
| 0007 | GitHub Actions agent runtime MVP | Concluded |
| 0008 | Guardrails evaluation | Concluded |
| 0009 | Hermes-inspired security patterns | Concluded |
| 0010 | Host-side API server for sandboxed agents | Concluded |
| 0011 | Integration Service design doc drift | Concluded |
| 0012 | Model Armor vs AI agent triage | Concluded |
| 0013 | OpenShell policy bypass | Concluded |
| 0014 | OpenShell sandbox evaluation | Concluded |
| 0015 | Prompt injection defense-in-depth | Concluded |
| 0016 | Promptfoo for agent evaluation in CI | Concluded |
| 0017 | Reasoning monitor | Active |
| 0018 | Runner hello world | Active |
| 0019 | Skills | Active |
| 0020 | Target repository skills in triage | Concluded |
| 0021 | Tool scoping | Concluded |
| 0022 | Claude GitHub App auth | Concluded |
| 0023 | Review cache publication policy | Concluded |

Experiments follow a numbered directory convention. See AGENTS.md for full details.
- Naming: `NNNN-short-description/` (zero-padded 4-digit number)
- Frontmatter: YAML with `title`, `status`, and optional `topics`
- Statuses: Active, Concluded, Abandoned, Merged
- Template: `0000-experiment-template`
- Linting: `hack/lint-experiment-numbers` and `hack/lint-experiment-frontmatter` enforce conventions via pre-commit
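
The conventions above can be sketched as a pair of checks. This is a minimal illustration, not the logic of the `hack/` lint scripts: the function names, the exact slug pattern (lowercase letters, digits, and hyphens), and the case-sensitive status comparison are all assumptions made for the example.

```python
from __future__ import annotations

import re

# Assumed pattern: four digits, a hyphen, then a lowercase hyphenated slug.
NAME_RE = re.compile(r"\d{4}-[a-z0-9]+(?:-[a-z0-9]+)*")

# Statuses as listed in this README; exact-case matching is an assumption.
ALLOWED_STATUSES = {"Active", "Concluded", "Abandoned", "Merged"}


def check_name(dirname: str) -> bool:
    """Return True if a directory name looks like NNNN-short-description."""
    return NAME_RE.fullmatch(dirname.rstrip("/")) is not None


def check_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems found in already-parsed frontmatter."""
    problems = []
    # title and status are required; topics is optional and not checked here.
    for key in ("title", "status"):
        if not meta.get(key):
            problems.append(f"missing required field: {key}")
    status = meta.get("status")
    if status and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems
```

For instance, `check_name("0000-experiment-template")` passes, while a name such as `7-foo` fails because the number is not zero-padded to four digits. The frontmatter check takes a plain dictionary so the sketch stays independent of any particular YAML parser.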