
fix(ci): run anvil L1 as root so state dump can write the bind mount #29

Open
douglance wants to merge 1 commit into fix/missing-anvil-state from fix/anvil-state-bindmount-perms


@douglance

Collaborator

Problem

The Publish Testnode workflow fails at Generate snapshot with:

Snapshot source missing non-empty Anvil state file after waiting 30000ms:
  config/anvil-state/state.json

The full init sequence (L1→L2→L3 rollups, token bridges, deposits) succeeds — it only dies when capturing the snapshot because anvil's state.json never lands on the host.
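The failure message implies that the snapshot step polls for a non-empty state file with a 30000ms budget. A minimal TypeScript sketch of such a wait follows, assuming Node's `fs/promises` API. The helper name `waitForNonEmptyFile` and the 500ms poll interval are hypothetical; only the 30000ms figure comes from the error, and the repo's own helper may behave differently.

```typescript
import { stat } from "node:fs/promises";

// Hypothetical helper: resolve true once `path` exists with size > 0,
// or false once `timeoutMs` has elapsed. Name and poll interval are assumptions.
async function waitForNonEmptyFile(
  path: string,
  timeoutMs = 30000,
  pollMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    try {
      // A missing or zero-byte file both count as "not ready yet".
      if ((await stat(path)).size > 0) return true;
    } catch (err) {
      // ENOENT means nothing has been dumped yet; surface any other error.
      if ((err as { code?: string }).code !== "ENOENT") throw err;
    }
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

A caller would turn `false` into the "missing non-empty Anvil state file" error. When anvil cannot write the bind mount, `stat` keeps failing with ENOENT, so the loop simply runs out its budget without any louder signal.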

Root cause

ghcr.io/foundry-rs/foundry:v1.3.5 runs anvil as the non-root foundry user (uid 1000), confirmed from the image config:

User: 'foundry'

The L1 service dumps state into a host bind mount (../config/anvil-state:/state, --state=/state/state.json --state-interval=1). On Linux CI runners that directory is owned by the runner user rather than uid 1000 and is not writable by other users, so anvil (uid 1000) can't create state.json in it. Both the periodic and the on-exit state dumps fail silently, and no snapshot is produced.

Docker Desktop on macOS maps bind-mount writes to the host user regardless of container uid, which is why this only broke in CI ("works locally, fails in CI").
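To make the uid mismatch concrete, here is a simplified TypeScript model of the Linux permission check for creating a file inside a directory. It is a hypothetical illustration, not code from this repo: the helper `canCreateFileIn` is invented, the runner-owned directory (uid/gid 1001, mode 0o755) is an assumed example, and the model ignores ACLs, supplementary groups, and user-namespace remapping.

```typescript
// Hypothetical, simplified model of the Linux DAC check for creating a file
// in a directory. Ignores ACLs, supplementary groups, and userns remapping.

interface DirInfo {
  ownerUid: number;
  ownerGid: number;
  mode: number; // permission bits, e.g. 0o755
}

interface Proc {
  uid: number;
  gid: number;
}

function canCreateFileIn(dir: DirInfo, proc: Proc): boolean {
  // Root in a container keeps CAP_DAC_OVERRIDE under Docker's default
  // capability set, so directory permission bits do not stop it.
  if (proc.uid === 0) return true;

  // Pick the owner, group, or "other" triplet, in the kernel's order.
  let triplet: number;
  if (proc.uid === dir.ownerUid) {
    triplet = (dir.mode >> 6) & 0o7;
  } else if (proc.gid === dir.ownerGid) {
    triplet = (dir.mode >> 3) & 0o7;
  } else {
    triplet = dir.mode & 0o7;
  }

  // Creating a directory entry needs write (0o2) and search/execute (0o1).
  const WRITE_AND_SEARCH = 0o3;
  return (triplet & WRITE_AND_SEARCH) === WRITE_AND_SEARCH;
}

// Assumed runner-owned bind-mount source: uid/gid 1001, mode 0o755.
const stateDir: DirInfo = { ownerUid: 1001, ownerGid: 1001, mode: 0o755 };

console.log(canCreateFileIn(stateDir, { uid: 1000, gid: 1000 })); // false: anvil as `foundry`
console.log(canCreateFileIn(stateDir, { uid: 0, gid: 0 })); // true: anvil with user: "0:0"
```

Under these assumptions the `foundry` user falls into the "other" triplet (r-x), which lacks the write bit, while uid 0 is exempt from the check entirely.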

Fix

Run the L1 anvil service as root (user: "0:0") so the state dump can write the bind mount. One-line change plus an explanatory comment.

Proof

Verified by dispatching the previously-failing Publish Testnode workflow on this branch — see the linked green run in the PR comments.

Follow-up (not in this PR)

finalizeFreshInit runs docker compose down before waitForAnvilStateFile, so the failure-dump step captures no anvil logs (containers are already gone). Worth reordering log capture before teardown so future failures are diagnosable.
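One possible shape for that reordering, as a hypothetical TypeScript sketch. The wrapper `finalizeWithLogs` and the step names `captureLogs`, `composeDown`, and `waitForStateFile` are invented for illustration and injected as functions, since the real finalizeFreshInit is not part of this PR.

```typescript
// Hypothetical step functions; the real finalizeFreshInit is not shown here.
interface TeardownSteps {
  captureLogs: () => Promise<void>; // e.g. save `docker compose logs` output
  composeDown: () => Promise<void>; // stop containers; anvil does its on-exit dump
  waitForStateFile: () => Promise<boolean>; // true once state.json is non-empty
}

async function finalizeWithLogs(steps: TeardownSteps): Promise<void> {
  // Grab logs while the containers still exist; after `down` they are gone.
  await steps.captureLogs();
  await steps.composeDown();
  if (!(await steps.waitForStateFile())) {
    throw new Error("Anvil state file missing after teardown; see captured logs");
  }
}
```

Capturing logs unconditionally is deliberate in this sketch: a missing state file is only detected after `down`, by which point the containers, and their logs, no longer exist.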

The foundry image runs anvil as the non-root `foundry` user (uid 1000).
On Linux CI runners the bind-mounted config/anvil-state directory is owned
by the runner user, so anvil cannot write /state/state.json; the periodic
(--state-interval=1) and on-exit state dumps silently fail and no snapshot
is produced. Docker Desktop on macOS masks this by mapping bind-mount writes
to the host user regardless of container uid, so it only failed in CI.

Running the L1 service as root lets the state dump land in the bind mount.

Fixes the "Snapshot source missing non-empty Anvil state file" failure in
the Publish Testnode workflow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@douglance

Collaborator Author

Proof: fix verified in CI.

Dispatched the previously failing Publish Testnode workflow on this branch, scoped to the exact combo that was failing (l2 / v3.2), with a throwaway version=v0.0.0-anvil-perms-test tag:

  • Run: https://github.com/OffchainLabs/arbitrum-testnode/actions/runs/28607460859 (success)
  • Generate snapshot (init, retry until captured) — ✅ passed on the first attempt (this is the step that failed 3×/run with Snapshot source missing non-empty Anvil state file)
  • Whole job (publish-testnode-image (l2, v3.2, l2, false)) green through Build and push testnode image, 9m0s total.

With user: "0:0", anvil can write /state/state.json into the bind mount on the Linux runner, so the snapshot is captured. Note: this run pushed a throwaway image tag v0.0.0-anvil-perms-test to GHCR that can be deleted.
