
test(utils): fix flaky SIGTERM-ignored terminate test (bash tail-exec)#81

Merged
CMGS merged 1 commit into master from fix/flaky-terminate-test-exec-opt
Jul 4, 2026

CMGS commented Jul 4, 2026

The test job on master (7e4406f) failed with a 120s package timeout in utils, hung in TestTerminateProcess_SIGTERMIgnored_FallsBackToKill (58s in). A rerun passed: the test is flaky, not a regression, and the failure was not caused by #80 (which only added two fast pure-function tests to the package).

Root cause

The helper process bash -c 'trap "" TERM; sleep 60' is tail-exec-optimized by bash: because sleep is the last command, bash execs it in place, so argv0 in /proc/<pid>/cmdline becomes sleep instead of bash. Verified on Ubuntu 24.04 with bash 5.2.21 (the CI image) and with bash 5.3.
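For context, /proc/<pid>/cmdline holds the process's argv strings separated by NUL bytes, so argv0 is everything before the first NUL. The sketch below shows that parsing plus a filepath.Base(argv0) comparison. The helper names (argv0Base, cmdlineMatches) are hypothetical and this is not the project's VerifyProcessCmdline; the two sample payloads model the cmdline before and after bash tail-execs into sleep.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// argv0Base returns filepath.Base of the first NUL-separated field of a
// /proc/<pid>/cmdline payload, or "" if that field is empty.
// Hypothetical helper for illustration only.
func argv0Base(cmdline []byte) string {
	fields := bytes.SplitN(cmdline, []byte{0}, 2)
	if len(fields) == 0 || len(fields[0]) == 0 {
		return ""
	}
	return filepath.Base(string(fields[0]))
}

// cmdlineMatches applies the rule described in the PR: only proceed
// when filepath.Base(argv0) equals the expected binary name.
func cmdlineMatches(cmdline []byte, want string) bool {
	return argv0Base(cmdline) == want
}

func main() {
	// Before the exec lands: bash is still argv0.
	before := []byte("/usr/bin/bash\x00-c\x00trap \"\" TERM; sleep 60\x00")
	// After bash tail-execs: the same PID now reports sleep as argv0.
	after := []byte("sleep\x0060\x00")

	fmt.Println(cmdlineMatches(before, "bash")) // true
	fmt.Println(cmdlineMatches(after, "bash"))  // false
}
```

The same PID yields different answers depending on when the file is read, which is exactly the window the race below exploits.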

TerminateProcess(pid, "bash", "sleep", …) calls VerifyProcessCmdline, which requires filepath.Base(argv0) == "bash" before it will signal. It is a startup race: cmd.Start() forks bash, which then takes a few ms to exec into sleep.

  • Usually the cmdline check reads bash … before the exec lands → check passes → SIGTERM (trapped) → grace → SIGKILL → fast pass.
  • Occasionally the exec wins → cmdline is sleep → check fails → TerminateProcess no-ops → the undead sleep 60 runs its full 60s → the test blocks on cmd.Wait() → the utils package exceeds its 120s -timeout.

(macOS never hit this: verifyProcessCmdline returns errVerifyUnsupported there, so VerifyProcessCmdline falls back to IsProcessAlive and always proceeds.)

Fix

Append "; :" so that sleep is no longer the last command and bash cannot tail-exec; the process stays resident as bash, which keeps argv0 stable. This is the same guard TestFindVMMByCmdline already uses (sleep 60 && :). Production TerminateProcess callers pass real binaries (cloud-hypervisor) that never rename themselves, so only the test was affected.
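The change can be sketched as the helper command before and after. The command strings are taken from the commit message and the description of the fix; the function names oldHelper and fixedHelper are hypothetical, and nothing here starts a process.

```go
package main

import (
	"fmt"
	"os/exec"
)

// oldHelper: sleep is the last command, so bash may exec it in place
// and /proc/<pid>/cmdline can start reporting "sleep" as argv0.
func oldHelper() *exec.Cmd {
	return exec.Command("bash", "-c", `trap "" TERM; sleep 60`)
}

// fixedHelper: the trailing ":" (the shell's no-op builtin) leaves bash
// work to do after sleep returns, so it cannot tail-exec and argv0 stays "bash".
func fixedHelper() *exec.Cmd {
	return exec.Command("bash", "-c", `trap "" TERM; sleep 60; :`)
}

func main() {
	// Args is built without starting the process.
	fmt.Printf("%q\n", oldHelper().Args)
	fmt.Printf("%q\n", fixedHelper().Args)
}
```

Either "; :" or "&& :" works for this purpose, since both leave a command after sleep.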

Verification

  • ubuntu 24.04 bash 5.2.21: old command → cmdline sleep 60; fixed command → cmdline bash -c ... (argv0 stays bash).
  • Fixed test on real Linux (go 1.26, pidfd path) under -race, 50 iterations: all pass, no hang (181s total with uniform timings; a single flake would have added a ~60s outlier).
  • Local: build, make lint (linux+darwin, 0 issues), golangci-lint fmt --diff clean.

…e test

bash -c "trap \"\" TERM; sleep 60" tail-execs into sleep, so /proc/pid/cmdline
argv0 becomes "sleep". TerminateProcess verifies argv0 == "bash" before signaling;
when the exec wins the startup race against that check it no-ops, the undead
sleep runs its full 60s, and the utils package trips its 120s test timeout.
A trailing "; :" keeps bash resident so the cmdline stays "bash".
CMGS merged commit d9fa4df into master Jul 4, 2026
4 checks passed