feat(ai): sandboxed harness adapters (provider-agnostic sandbox layer) [WIP] #774
AlemTuzlak wants to merge 44 commits into
…xample

New @tanstack/ai-claude-code package that runs Claude Code (via @anthropic-ai/claude-agent-sdk) as a TanStack AI chat backend. Unlike HTTP provider adapters, this is a harness adapter: Claude Code owns the agent loop and executes its built-in tools (bash, file edits, search) server-side.

- Stream translator maps Agent SDK messages to AG-UI events; harness tool activity arrives as already-resolved TOOL_CALL_*/TOOL_CALL_RESULT pairs and runs always finish with stop/length (never tool_calls), so the engine never re-executes harness tools. Every started tool call is guaranteed a result (synthesized on abort) to keep the engine's pending-call scan safe.
- TanStack toolDefinition() server tools are bridged into the harness as an in-process MCP server (raw JSON Schema passthrough, no zod round-trip). Client-side/approval tools fail fast (documented v1 limitation).
- Stateful sessions: session id surfaced via a claude-code.session-id CUSTOM event; resume via modelOptions.sessionId (+ forkSession).
- Structured output uses the SDK's native outputFormat json_schema.
- settingSources defaults to ['project'] so servers don't inherit user-level ~/.claude config from the host machine.
- E2E: excluded from the aimock matrix (subprocess can't carry X-Test-Id isolation); covered by 44 unit tests plus a gated live smoke spec (CLAUDE_CODE_E2E=1).

Also adds examples/ts-react-coding-agent: a TanStack Start app demoing session resume, the harness tool timeline, read-only/edit permission modes, tool bridging, and a sandboxed scratch workspace, with the agent registry structured so future Codex/Gemini CLI harness adapters can slot in.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ters

Add two new coding-agent harness adapters alongside Claude Code:

- @tanstack/ai-codex drives OpenAI Codex via @openai/codex-sdk with local tool execution, resumable sessions (modelOptions.sessionId), structured output, and a localhost MCP bridge for TanStack server tools.
- @tanstack/ai-gemini-cli drives `gemini --acp` over the Agent Client Protocol with token-level streaming, resumable sessions, a configurable permission policy, and headless ACP auth method selection (authMethodId) so runs never stall on an interactive auth picker.

Wire both into the ts-react-coding-agent example: the agent dropdown keeps every harness selectable, and a server function (createServerFn) reports which agents are actually configured at runtime so the UI can surface a setup dialog for unconfigured ones. Includes adapter docs and changesets.

Co-authored-by: Cursor <cursoragent@cursor.com>

Add the @tanstack/ai-opencode package, an OpenCode harness adapter that drives OpenCode (via @opencode-ai/sdk) as a TanStack AI chat backend with local tool execution, token-level streaming, stateful sessions, and TanStack tool bridging over a localhost MCP server. Wires the adapter into the ts-react-coding-agent example, adds the OpenCode adapter docs page, and anchors the OpenCode.md gitignore entry so it no longer shadows the docs page on case-insensitive filesystems.

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts: # pnpm-lock.yaml
…e, withSandbox, workspace, policy

- @tanstack/ai-sandbox: provider-agnostic SandboxHandle/SandboxProvider/SandboxCapabilities contracts
- capability tokens (SandboxCapability + optional SandboxStore/Locks), in-memory store/lock defaults
- defineSandbox lazy controller + ensure state machine (resume->restoreSnapshot->create+bootstrap) with capability-aware degradation
- withSandbox middleware (setup provides handle; onFinish/onError snapshot+destroy)
- defineWorkspace (git/local/none + skills + secrets), provider-agnostic bootstrapWorkspace
- defineSandboxPolicy + evaluateCommand (glob, deny>ask>allow), compound sandbox key (secrets excluded)
- export DefinedChatMiddleware/AnyChatMiddleware from @tanstack/ai for portable middleware authoring
- 22 unit tests (ensure/policy/key/store); types + lint clean

Refs sandbox proposal (Phase A).

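The `deny>ask>allow` precedence named above can be illustrated with a small sketch. Everything here is an assumption for illustration: the type names, the rule shape, the `*`-only glob dialect, and the function name `evaluateCommandSketch` are hypothetical and are not taken from the package's actual `defineSandboxPolicy`/`evaluateCommand` API.

```typescript
// Hypothetical types: the real policy shape in @tanstack/ai-sandbox may differ.
type Decision = "allow" | "ask" | "deny";

interface CommandRule {
  pattern: string; // glob where `*` matches any run of characters
  decision: Decision;
}

interface CommandPolicy {
  defaultDecision: Decision;
  rules: CommandRule[];
}

// Convert a simple `*` glob into an anchored RegExp, escaping everything else.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// deny > ask > allow: the strictest matching rule wins, regardless of rule order.
// When no rule matches, the policy's default decision applies.
function evaluateCommandSketch(policy: CommandPolicy, command: string): Decision {
  const matched = policy.rules
    .filter((rule) => globToRegExp(rule.pattern).test(command))
    .map((rule) => rule.decision);
  if (matched.includes("deny")) return "deny";
  if (matched.includes("ask")) return "ask";
  if (matched.includes("allow")) return "allow";
  return policy.defaultDecision;
}

const examplePolicy: CommandPolicy = {
  defaultDecision: "ask",
  rules: [
    { pattern: "git *", decision: "allow" },
    { pattern: "git push*", decision: "ask" },
    { pattern: "rm -rf *", decision: "deny" },
  ],
};
```

With this example policy, `git push origin main` matches both the `allow` and the `ask` rule, and the stricter `ask` wins. Resolving by strictness rather than by rule order means a broad allow rule cannot accidentally shadow a narrower restriction.
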
…git helper

- @tanstack/ai-sandbox-local-process: SandboxHandle over host fs/child_process (no isolation, dev loop)
- virtual /workspace root mapped to a real host dir with path containment
- exec/spawn (duplex stdin, streamed stdout), localhost port channel, env, fork via dir copy, durable fs resume-by-dir
- core: createExecBackedGit helper (shared by providers without native git); bootstrap clones into the handle's own root
- 10 unit tests (fs/exec/spawn/lifecycle/fork/bootstrap/ensure); types + lint clean

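Path containment for a virtual `/workspace` root could look roughly like the sketch below. The helper name `resolveContained`, the error messages, and the choice to treat relative paths as relative to the virtual root are all assumptions; the provider's real implementation is not shown in this PR text. The sketch is purely lexical and does not address symlinks.

```typescript
// Hypothetical helper: names and behaviour are illustrative only.
const VIRTUAL_ROOT = "/workspace";

// Map a virtual path onto a host directory, collapsing `.` and `..` segments
// and refusing anything that would climb above the virtual root.
function resolveContained(hostRoot: string, virtualPath: string): string {
  let relative: string;
  if (virtualPath === VIRTUAL_ROOT || virtualPath.startsWith(VIRTUAL_ROOT + "/")) {
    relative = virtualPath.slice(VIRTUAL_ROOT.length);
  } else if (virtualPath.startsWith("/")) {
    throw new Error(`Path is outside ${VIRTUAL_ROOT}: ${virtualPath}`);
  } else {
    relative = virtualPath; // assumption: relative paths resolve against the virtual root
  }

  const stack: string[] = [];
  for (const segment of relative.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (stack.length === 0) {
        throw new Error(`Path escapes ${VIRTUAL_ROOT}: ${virtualPath}`);
      }
      stack.pop();
    } else {
      stack.push(segment);
    }
  }

  const root = hostRoot.replace(/\/+$/, "");
  return stack.length > 0 ? `${root}/${stack.join("/")}` : root;
}
```

For example, `resolveContained("/tmp/sbx", "/workspace/src/a.ts")` yields `/tmp/sbx/src/a.ts`, while `/workspace/../etc/passwd` is rejected.
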
…runner

- @tanstack/ai: TextOptions.capabilities carries the middleware capability context so harness adapters can read provided capabilities (getSandbox(options.capabilities)) from chatStream; populated by the engine
- @tanstack/ai-sandbox: spawnNdjson/toLines: spawn an agent CLI in a sandbox and stream parsed NDJSON stdout (the reusable harness-execution primitive)
- tests: toLines buffering + spawnNdjson parsing (core), real spawn+NDJSON via local-process (11); 25 core tests; types + lint clean

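The line buffering that NDJSON parsing depends on can be sketched as follows. This is a standalone illustration, not the package's `toLines`/`spawnNdjson` code: the `LineBuffer` class and `parseNdjsonChunks` function are hypothetical names, and the real helpers presumably work on streams rather than arrays of chunks.

```typescript
// Hypothetical sketch: stdout arrives in arbitrary chunks, so a JSON record
// may be split across two chunks and must be buffered until its newline.
class LineBuffer {
  private pending = "";

  // Feed a chunk; returns every complete, non-empty line seen so far.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    this.pending = parts.pop() ?? ""; // last part is an incomplete line (or "")
    return parts
      .map((line) => line.replace(/\r$/, ""))
      .filter((line) => line.length > 0);
  }

  // Call at end of stream: emits a final line that had no trailing newline.
  flush(): string[] {
    const tail = this.pending.trim();
    this.pending = "";
    return tail.length > 0 ? [tail] : [];
  }
}

// Parse a sequence of stdout chunks into NDJSON records.
function parseNdjsonChunks(chunks: string[]): unknown[] {
  const buffer = new LineBuffer();
  const records: unknown[] = [];
  for (const chunk of chunks) {
    for (const line of buffer.push(chunk)) records.push(JSON.parse(line));
  }
  for (const line of buffer.flush()) records.push(JSON.parse(line));
  return records;
}
```

The sketch lets `JSON.parse` throw on a malformed line; whether the real primitive skips, surfaces, or aborts on bad lines is not stated in the commit message.
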
Changeset Version Preview: 9 package(s) bumped directly, 31 bumped as dependents.
@tanstack/ai
@tanstack/ai-angular
@tanstack/ai-anthropic
@tanstack/ai-claude-code
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-codex
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-gemini-cli
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-mcp
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-opencode
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-sandbox
@tanstack/ai-sandbox-cloudflare
@tanstack/ai-sandbox-docker
@tanstack/ai-sandbox-local-process
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
…ential leakage

Security review (PR #774):

- argument injection: insert '--' end-of-options separators before positionals (clone url/target, add paths) and reject url/ref/dir/path values beginning with '-' (flag-smuggling guard)
- secrets in argv: stop embedding the auth token in the clone URL (leaked via ps/logs); use a one-shot credential.helper that reads the token from the child ENV, single-quoted so the outer shell never expands it
- 4 unit tests pinning: token absent from argv + present in env, '--' separators, leading-dash rejection, quote escaping

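A minimal sketch of the three guards described above (leading-dash rejection, the `--` separator, and an env-backed credential helper) might look like this. The function names, the `GIT_TOKEN` variable name, and the `x-access-token` username are assumptions made for illustration; the actual `createExecBackedGit` code is not reproduced here.

```typescript
// Hypothetical helpers illustrating the hardening; not the package's actual code.

// Reject values that git could mistake for an option (flag smuggling).
function assertNotFlagLike(label: string, value: string): void {
  if (value.startsWith("-")) {
    throw new Error(`${label} must not begin with '-': ${value}`);
  }
}

// POSIX single-quoting: the shell expands nothing inside '...'.
function shellSingleQuote(value: string): string {
  const escaped = value.replace(/'/g, "'\\''");
  return "'" + escaped + "'";
}

interface CloneCommand {
  command: string;
  env: Record<string, string>;
}

function buildCloneCommand(
  url: string,
  targetDir: string,
  ref: string,
  token: string,
): CloneCommand {
  assertNotFlagLike("url", url);
  assertNotFlagLike("dir", targetDir);
  assertNotFlagLike("ref", ref);

  // One-shot helper: git runs it through sh, which reads GIT_TOKEN (assumed
  // variable name) from the child environment. The token never appears in argv.
  const helper =
    '!f() { echo "username=x-access-token"; echo "password=$GIT_TOKEN"; }; f';

  const command = [
    "git",
    "-c",
    `credential.helper=${shellSingleQuote(helper)}`,
    "clone",
    "--branch",
    shellSingleQuote(ref),
    "--", // end of options: everything after this is positional
    shellSingleQuote(url),
    shellSingleQuote(targetDir),
  ].join(" ");

  return { command, env: { GIT_TOKEN: token } };
}
```

Because the helper string is single-quoted, the outer shell passes `$GIT_TOKEN` through literally, and only the shell that git spawns for the helper expands it from the child environment.
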
- @tanstack/ai-sandbox-docker: SandboxHandle over a Docker container
- create/resume-by-id/restoreSnapshot(commit image)/destroy; durable fs across stop/start
- exec + duplex spawn via dockerode exec + stream demux; fs over base64 piping (binary-safe, no tar dep)
- commit-based snapshot + fork; host.docker.internal gateway for host MCP reachability; publishPorts -> ports.connect
- exec-backed git reused from core
- 3 integration tests (gated on a reachable daemon), verified green against a real daemon: exec, fs+binary round-trip, snapshot, resume, spawn streaming, ensure+bootstrap
- pnpm-workspace: declare dockerode's optional native deps (cpu-features, ssh2) as not-built (JS fallback, local socket)

- claudeCodeText now declares requires:[SandboxCapability] and spawns the claude CLI INSIDE the sandbox via sandbox.process (claude -p --output-format stream-json), reusing translateSdkStream for the stdout NDJSON
- prompt fed via stdin (not argv); session id surfaced as before; emits a file.changed CUSTOM event with the git diff after the run
- permission-mode/allowed/disallowed/add-dir/max-turns/system-prompt mapped to CLI flags; default permission-mode bypassPermissions (sandbox is isolated)
- drop @anthropic-ai/claude-agent-sdk + @modelcontextprotocol/sdk deps; remove the in-process tool bridge (chat()-tools MCP proxy deferred; adapter rejects tools for now); provider-options self-contained
- spawnNdjson gains an option to feed stdin
- deterministic test via a fake claude CLI in a real local-process sandbox (24 tests); types + lint clean

Runnable demo (examples/sandbox-coding-agent) that runs Claude Code inside a sandbox to fix a bug end-to-end via chat() + withSandbox:

- bootstraps a tiny git repo with a deliberate bug, asks the agent to fix it, streams output + prints the git diff
- Docker provider by default (installs the claude CLI in setup); SANDBOX=local runs on the host process
- README with prerequisites + run instructions for manual e2e verification

…lag mapping; changesets

- SandboxPolicyCapability: withSandbox provides the definition policy (conditionally); harness adapters read it via getOptional
- claude-code maps defineSandboxPolicy (default decision + fileWrite/network caps + tool-name command rules) onto --permission-mode/--allowedTools/--disallowedTools (best-effort; fine-grained command globs await the MCP permission-prompt tool)
- changesets for the sandbox layer + updated claude-code changeset for the in-sandbox behavior
- policy-map unit tests (5)

- docs/sandbox/overview.md: mental model, providers, defineWorkspace/defineSandboxPolicy, lifecycle/resume, events, the runnable example (no as-casts; latest model id)
- docs/config.json: new Sandboxes section (addedAt 2026-06-16)
- packages/ai-sandbox/skills/ai-sandbox: agent skill covering the sandbox APIs + critical rules
- ship skills in the package files
- test:docs green

…n-sandbox agent

- startHostToolBridge: host-side Streamable-HTTP MCP server exposing chat() server tools; the in-sandbox claude calls mcp__tanstack__<tool>, proxied back to the host where execute() runs (closures/DB/secrets). Per-run bearer token; binds for host.docker.internal reachability from Docker
- adapter wires --mcp-config when tools are present, picks localhost vs host.docker.internal by provider, and tears the bridge down after the run; tools no longer rejected
- 3 host-side tests via the MCP SDK client (list/call/error/auth), verified green without needing claude
- docs + skill updated to describe the tool-proxy

- @tanstack/ai-sandbox-cloudflare: cloudflareSandbox() on @cloudflare/sandbox (edge, inside a Worker)
- uniform SandboxHandle: exec, base64-backed fs, exec-backed git, exposePort preview URLs (previewHostname), setEnvVars; spawn via startProcess+onOutput queue
- ephemeral disk + no GA snapshots -> durableFilesystem/snapshots false (withSandbox re-bootstraps across cold starts); background processes have no stdin (documented; stdin-fed harnesses need local-process/docker)
- compiles against the real @cloudflare/sandbox types; 7 deterministic handle tests against a mock Sandbox (fs round-trip, exec, spawn queue, stdin limitation, port). Runtime verification pending a Workers runtime
- align @cloudflare/workers-types version with the workspace (sherif)

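The capability-aware degradation mentioned here (a provider with `durableFilesystem` and `snapshots` both false falls straight through to create+bootstrap) can be pictured as a small planning function. This is a sketch under assumptions: the `planEnsure` name, the stored-state shape, and the idea that resume is gated on `durableFilesystem` are hypothetical, and the real ensure state machine in `defineSandbox` is presumably asynchronous and handles failures between steps.

```typescript
// Hypothetical shapes: only the two capability flags are named in the source.
interface ProviderCapabilities {
  durableFilesystem: boolean;
  snapshots: boolean;
}

interface StoredSandboxState {
  sandboxId?: string;
  snapshotId?: string;
}

type EnsureStep =
  | { kind: "resume"; sandboxId: string }
  | { kind: "restoreSnapshot"; snapshotId: string }
  | { kind: "createAndBootstrap" };

// Build the ordered list of strategies to attempt:
// resume -> restoreSnapshot -> create+bootstrap.
// A step is skipped when the provider lacks the capability or no id is stored.
function planEnsure(
  capabilities: ProviderCapabilities,
  stored: StoredSandboxState,
): EnsureStep[] {
  const steps: EnsureStep[] = [];
  if (capabilities.durableFilesystem && stored.sandboxId !== undefined) {
    steps.push({ kind: "resume", sandboxId: stored.sandboxId });
  }
  if (capabilities.snapshots && stored.snapshotId !== undefined) {
    steps.push({ kind: "restoreSnapshot", snapshotId: stored.snapshotId });
  }
  // Always available as the last resort.
  steps.push({ kind: "createAndBootstrap" });
  return steps;
}
```

Under these assumptions, a provider reporting both flags as false always plans a single `createAndBootstrap` step, which matches the re-bootstrap-on-cold-start behaviour described for Cloudflare.
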
- codexText declares requires:[SandboxCapability]; spawns 'codex exec --experimental-json' inside the sandbox (mirroring @openai/codex-sdk's own CLI invocation), prompt via stdin, JSONL thread events → existing translateThreadEvents
- sandbox mode / approval policy / reasoning effort / add-dir / skip-git-repo-check / config mapped to codex CLI flags; resume via 'resume <id>'
- drop @openai/codex-sdk + @modelcontextprotocol/sdk + the in-process tool bridge; provider-options self-contained; chat()-tools bridging deferred (rejects tools)
- deterministic fake-codex-CLI test in a real local-process sandbox (27 tests); types/lint/knip/sherif clean

- geminiCliText declares requires:[SandboxCapability]; spawns 'gemini --acp' inside the sandbox and drives it over ACP via the sandbox's duplex process IO
- new spawnHandleToAcpTransport adapts a SpawnHandle into the Uint8Array WebStreams ndJsonStream needs; all @agentclientprotocol/sdk protocol handling reused unchanged
- drop local child_process spawn + @modelcontextprotocol/sdk + in-process tool bridge; chat()-tools bridging deferred (rejects tools); structuredOutput throws not-supported
- transport-adapter + requires-sandbox tests (36); types/lint/knip/sherif clean

- opencodeText declares requires:[SandboxCapability]; spawns 'opencode serve' inside the sandbox, waits for readiness, exposes the port, and connects @opencode-ai/sdk's HTTP client via baseUrl (reusing startOpencodeSession's connect path)
- new startOpencodeServerInSandbox helper (readiness detection + port exposure); Docker needs publishPorts:[port]
- drop @modelcontextprotocol/sdk + in-process tool bridge; chat()-tools bridging deferred (rejects tools); structuredOutput throws not-supported; permission governed by the dynamic handler
- server-helper + requires-sandbox tests (36); types/lint/knip/sherif clean

…4 harness adapters

- move startHostToolBridge + BRIDGED_MCP_SERVER_NAME + hostForSandbox into @tanstack/ai-sandbox core (shared); add @modelcontextprotocol/sdk dep there; tool-bridge test relocated to core
- claude-code: import bridge from core; build --mcp-config from the bridge (drop local bridge + dep)
- codex: bridge tools via --config mcp_servers.<name>.url + bearer_token
- gemini-cli: bridge tools via ACP newSession mcpServers (http + Authorization header)
- opencode: bridge tools via OPENCODE_CONFIG_CONTENT mcp.remote (url + bearer header) at server spawn
- all adapters no longer reject tools; bridged tool names feed the permission handlers; changesets updated
- types/eslint/lib/build/knip/sherif green across all 5 packages

- @tanstack/ai-sandbox: shared resolveApproval (policy + client approvals -> allow/deny/needs-approval), stable approvalId, buildApprovalRequestedEvent (AG-UI CUSTOM 'approval-requested')
- @tanstack/ai: TextOptions.approvals threaded from the engine's initialApprovals so harness adapters resolve ask-policy permission requests against the client's decisions (resume-based loop)
- 11 unit tests for the resolver/keying/event; @tanstack/ai 1033 tests still pass

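The resolver described above combines a policy decision with the client's recorded approvals. A rough sketch, assuming a hash of the tool name plus canonicalised input as the stable id, is shown below. The names `approvalIdFor` and `resolveApprovalSketch`, the FNV-1a hash, and the `Record<string, boolean>` approvals shape are all assumptions; the PR text does not say how the real `approvalId` is derived or how approvals are stored.

```typescript
// Hypothetical sketch of approval keying and resolution.
type PolicyDecision = "allow" | "ask" | "deny";
type Resolution = "allow" | "deny" | "needs-approval";

// Serialise a value with object keys sorted, so key order never changes the id.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, inner]) => `${JSON.stringify(key)}:${canonicalize(inner)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// 32-bit FNV-1a, used here only as a dependency-free stand-in for a real hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// The same tool call produces the same id across runs, which is what lets a
// re-run find the decision the client recorded earlier.
function approvalIdFor(toolName: string, input: Record<string, unknown>): string {
  return `${toolName}:${fnv1a(canonicalize(input))}`;
}

function resolveApprovalSketch(
  decision: PolicyDecision,
  approvalId: string,
  approvals: Record<string, boolean>,
): Resolution {
  if (decision === "allow") return "allow";
  if (decision === "deny") return "deny";
  // decision === "ask": defer to the client's recorded answer, if any.
  if (approvalId in approvals) {
    return approvals[approvalId] ? "allow" : "deny";
  }
  return "needs-approval";
}
```

In a resume-based loop, a `needs-approval` result would be surfaced to the client as an approval request; once the client answers, the next run carries the decision in `approvals` and the same id resolves to `allow` or `deny`.
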
Wire client-in-the-loop approvals through every in-sandbox harness adapter, built on the shared approval primitives in @tanstack/ai-sandbox.

- core bridge: optional permission-prompt tool on startHostToolBridge, and export PermissionToolResult so adapters can type their resolver.
- claude-code: enforce the sandbox policy via --permission-prompt-tool; an `ask` action with no client decision yet emits an approval-requested event and denies, so the client approves and re-runs to continue.
- gemini-cli / opencode: resolveInteractivePermission consults policy + client approvals, collects approval-requested events, and yields them after the stream (coercing nullable ACP tool titles).
- codex: map defineSandboxPolicy onto `codex exec`'s coarse knobs (sandbox mode, approval_policy, network_access). codex exec is non-interactive with no per-action host callback, so the resume-based approval flow is not available for codex (documented); adds policy-map + tests.
- changesets updated to describe the interactive-approval behavior.

Add provider-agnostic sandbox file-event hooks and a runnable demo that
uses them.
Hooks (@tanstack/ai-sandbox):
- watchWorkspace(handle, { onEvent }) + watchWithHooks(handle, hooks) emit
typed FileEvents (create/change/delete). A native fs.watch fast-path is
used when the provider advertises it; otherwise a portable `find -printf`
mtime snapshot-diff poll runs (no extra deps / image changes). .git and
node_modules are ignored by default.
- withSandboxFileEvents() middleware surfaces events into the chat() stream
as CUSTOM `sandbox.file` events, interleaved with the agent's output.
- local-process gains the native fs.watch seam (Node recursive watch on
Windows/macOS; Linux falls back to the poll).
Example (examples/sandbox-issue-triage):
- Fetches the first open issue on TanStack/ai, clones the repo into a
sandbox, runs Claude Code inside it to triage the issue and write
ISSUE-REPORT.md, reads it back via sandbox.fs, and writes a local report
with the observed file events appended. Two entrypoints: process + docker.
Docs/skill updated; changeset added.
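The portable fallback described in this commit (an mtime snapshot-diff poll) reduces to comparing two path-to-mtime maps. The sketch below assumes each poll produces lines of the form `<mtime> <path>`, as GNU `find -printf '%T@ %p\n'` would print. That format string, the type names, and the functions `parseSnapshot` and `diffSnapshots` are illustrative assumptions, not the package's actual code.

```typescript
// Hypothetical event and snapshot shapes for illustration.
interface FileEventSketch {
  type: "create" | "change" | "delete";
  path: string;
}

type Snapshot = Map<string, number>; // path -> mtime (seconds since epoch)

// Parse "<mtime> <path>" lines, skipping ignored directories.
function parseSnapshot(
  output: string,
  ignored: string[] = [".git", "node_modules"],
): Snapshot {
  const snapshot: Snapshot = new Map();
  for (const line of output.split("\n")) {
    const space = line.indexOf(" ");
    if (space === -1) continue;
    const mtime = Number(line.slice(0, space));
    const path = line.slice(space + 1);
    if (Number.isNaN(mtime) || path.length === 0) continue;
    if (path.split("/").some((segment) => ignored.includes(segment))) continue;
    snapshot.set(path, mtime);
  }
  return snapshot;
}

// Compare two snapshots: new paths are creates, changed mtimes are changes,
// and paths missing from the newer snapshot are deletes.
function diffSnapshots(previous: Snapshot, next: Snapshot): FileEventSketch[] {
  const events: FileEventSketch[] = [];
  next.forEach((mtime, path) => {
    const before = previous.get(path);
    if (before === undefined) {
      events.push({ type: "create", path });
    } else if (before !== mtime) {
      events.push({ type: "change", path });
    }
  });
  previous.forEach((_mtime, path) => {
    if (!next.has(path)) events.push({ type: "delete", path });
  });
  return events;
}
```

A poll loop would keep the last snapshot, run the listing command inside the sandbox on an interval, and pass each resulting event to the `onEvent` callback. Comparing mtimes can miss an edit that does not change the recorded mtime, which is a known trade-off of this approach.
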
…ry, runtime capability
…ithSandboxFileEvents
…s, refresh example README
Done + tested + committed

Core: `@tanstack/ai-sandbox`: `SandboxHandle`/`SandboxProvider`/`SandboxCapabilities` contracts; capability tokens (`SandboxCapability` + optional `SandboxStore`/`Locks`/`SandboxPolicy`); in-memory store/lock; `defineSandbox` + ensure state machine (resume→restoreSnapshot→create+bootstrap, capability-aware degradation); `withSandbox`; `defineWorkspace`; `bootstrapWorkspace`; `defineSandboxPolicy` + `evaluateCommand`; compound key; hardened `createExecBackedGit`; `spawnNdjson`.

Providers
- `local-process`: host fs/child_process (dev loop, no isolation).
- `docker`: dockerode; create/resume/restoreSnapshot/destroy, exec + duplex spawn, base64 fs, commit-snapshot + fork. Integration tests verified against a real daemon.
- `cloudflare`: `@cloudflare/sandbox` (edge/Workers); exec/base64-fs/exposePort/setEnvVars; ephemeral-disk degradation. Compiles against real CF types; runtime verify needs a Workers runtime.

Harness adapters (all run in-sandbox, declare `requires:[SandboxCapability]`)
- `claude-code`: `claude -p --output-format stream-json` via stdin; reuses translate; `file.changed` diff event; policy→CLI flag mapping; MCP tool-proxy (host Streamable-HTTP MCP server proxies chat() tools back to the host, verified via the MCP SDK client).
- `codex`: `codex exec --experimental-json` (mirrors `@openai/codex-sdk`'s own invocation).
- `gemini-cli`: `gemini --acp` driven over ACP via a `SpawnHandle`→WebStream transport (ACP protocol reused).
- `opencode`: spawns `opencode serve` in-sandbox, exposes the port, connects the SDK client via `baseUrl`.

Core wiring: `@tanstack/ai`: `TextOptions.capabilities`; `DefinedChatMiddleware`/`AnyChatMiddleware` exports.

Also: `examples/sandbox-coding-agent` (runnable local e2e), `docs/sandbox/overview.md` + nav, `ai-sandbox` agent skill, changesets for every package, `git-exec` security hardening.

Verification: ~180 unit/integration tests across the sandbox packages (real Docker; deterministic fake-CLI tests in real local-process sandboxes for claude/codex; transport/server-helper + mock tests for gemini/opencode/cloudflare; MCP bridge via the MCP SDK client). `@tanstack/ai` 1033 tests still pass. types/eslint/build/knip/sherif/docs all green. Live agent-in-sandbox runs are the manual e2e (via the example; needs the agent CLIs + keys).

Remaining (documented)
`defineSandboxPolicy` → permission-mode / allowed/disallowed-tools mapping + each harness's native permission modes. The full resume loop is entangled with each harness's permission-prompt contract + chat()'s resume/persistence and needs the live CLIs to verify.