Skip to content

ci(dispatch): retry the sesh install on transient release-fetch failures#38

Merged
mike-diff merged 1 commit into
mainfrom
ci-dispatch-install-retry
Jun 24, 2026
Merged

ci(dispatch): retry the sesh install on transient release-fetch failures#38
mike-diff merged 1 commit into
mainfrom
ci-dispatch-install-retry

Conversation

@mike-diff

Copy link
Copy Markdown
Owner

Closes #none (CI reliability fix, no issue).

Problem

The dispatch workflow intermittently fails at the install-sesh step with curl: (22) ... 404 on releases/latest/download/sesh-linux-amd64. Root cause is a race with release.yml: that workflow deletes and recreates the rolling release on every push to main, and for the few seconds the rebuild takes the asset URL 404s. A dispatch run whose install step lands in that window dies on a transient 404. Observed in run 28070847550: install 404 at 02:25:03, release republished at 02:25:05 (2s later).

Changes

  • .github/workflows/dispatch.yml: the Install sesh step now runs install.sh in a retry loop (5 attempts, 10s apart) only for transient fetch failures: curl exit 22 (HTTP error, e.g. 404) and 18 (partial transfer). A checksum mismatch or any other failure is non-transient and aborts immediately with no retry.

install.sh itself is untouched: it is the public installer and its conservative no-retry curl flags are correct for one-shot user installs. The race tolerance belongs in the dispatch workflow, where concurrent workflows against the same rolling release actually happen.

How to test

  • Simulated the retry control flow against five exit-code sequences: 404-then-success recovers; checksum mismatch (rc=1) aborts on attempt 1; persistent 404 exhausts 5 attempts then fails; happy-path first-try success does not retry; two 404s then success recovers on attempt 3. The safety property (never retry a checksum mismatch) holds.
  • Next real dispatch that races a release rebuild will recover instead of failing.

Verification

go build ./... && go vet ./... pass (no Go touched). YAML valid, 12 steps. Retry logic simulated across 5 cases. No em dashes per AGENTS.md.

The repo's release.yml deletes and recreates the rolling release on every
push to main (delete tag, rebuild assets, republish). For the few seconds
that rebuild takes, releases/latest/download/<asset> returns 404. A dispatch
run whose install step lands in that window dies on a transient 404, as run
28070847550 did: the install step hit 404 at 02:25:03, two seconds before
the release was republished at 02:25:05.

install.sh is the public installer and keeps its conservative no-retry curl
flags (correct for one-shot user installs). The race tolerance belongs in
the dispatch workflow, where concurrent workflows against the same rolling
release actually happen.

Retry only transient fetch failures: curl exit 22 (HTTP error, e.g. 404) and
18 (partial transfer). A checksum mismatch is deterministic and is NOT
retried (a corrupt or tampered download must fail loudly). Five attempts,
10s between, then fail.

Verified by simulating the control flow: 404-then-success recovers; checksum
mismatch aborts immediately; persistent 404 exhausts then fails; happy path
no retry. build/vet green; no em dashes per AGENTS.md.
@mike-diff mike-diff merged commit 2f65548 into main Jun 24, 2026
2 checks passed
@mike-diff mike-diff deleted the ci-dispatch-install-retry branch June 24, 2026 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant