feat: surface unrecoverable container errors during pod wait #13
Open
Longwt123 wants to merge 4 commits into
When a CI job references a non-existent image or one it lacks permission to pull, the pod stays in Pending and waitForPodPhases previously timed out with only a generic phase-status message. GitHub Actions users had no indication of the real cause.

Detect unrecoverable container waiting reasons (ImagePullBackOff, ErrImagePull, InvalidImageName, CreateContainerConfigError, CreateContainerError) on both init and regular containers, and fail fast with the container name, reason, and Kubernetes message so the error is visible in the Actions log.

Refactor getPodPhase into readPod + parsePodPhase so the pod object can be inspected for container errors, and add unit tests covering parsePodPhase, getContainerErrors, and waitForPodPhases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
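The container check described in this commit message can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the PodLike and ContainerError types and the formatContainerErrors helper are assumptions made to keep the example self-contained, and only the function name getContainerErrors and the five waiting reasons come from the commit message.

```typescript
// Assumed minimal shapes mirroring the parts of a Kubernetes pod status used here.
interface ContainerStatusLike {
  name: string
  state?: { waiting?: { reason?: string; message?: string } }
}

interface PodLike {
  status?: {
    phase?: string
    initContainerStatuses?: ContainerStatusLike[]
    containerStatuses?: ContainerStatusLike[]
  }
}

interface ContainerError {
  container: string
  reason: string
  message: string
}

// Waiting reasons listed in the commit message as unrecoverable.
const UNRECOVERABLE_WAITING_REASONS = new Set<string>([
  'ImagePullBackOff',
  'ErrImagePull',
  'InvalidImageName',
  'CreateContainerConfigError',
  'CreateContainerError'
])

// Collect unrecoverable waiting states from both init and regular containers.
function getContainerErrors(pod: PodLike): ContainerError[] {
  const statuses = [
    ...(pod.status?.initContainerStatuses ?? []),
    ...(pod.status?.containerStatuses ?? [])
  ]
  const errors: ContainerError[] = []
  for (const status of statuses) {
    const reason = status.state?.waiting?.reason
    if (reason !== undefined && UNRECOVERABLE_WAITING_REASONS.has(reason)) {
      errors.push({
        container: status.name,
        reason,
        message: status.state?.waiting?.message ?? ''
      })
    }
  }
  return errors
}

// Hypothetical helper: render the errors as one line for the Actions log.
function formatContainerErrors(errors: ContainerError[]): string {
  return errors
    .map(e => `container "${e.container}": ${e.reason}: ${e.message}`)
    .join('; ')
}
```

A wait loop could call getContainerErrors on each poll and throw with the formatted string as soon as the result is non-empty, rather than backing off until the timeout.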
…r reasons
Extend the error feedback mechanism introduced in the previous commit with:

- describePodFailure(): aggregates pod phase, conditions, container statuses
  and Warning events into a single human-readable diagnostic string.
  Never throws, so it is safe to call from any error path.
- describePodWarningEvents(): best-effort retrieval of recent Warning K8s
  events (requires optional "events" RBAC permission). Degrades gracefully
  when the permission is missing.
- getUnrecoverableWaitingReasons(): allows operators to extend the built-in
  fast-fail whitelist via the ACTIONS_RUNNER_K8S_UNRECOVERABLE_WAITING_REASONS
  environment variable without a code change. Built-in defaults cannot be
  removed.
- All three failure paths in waitForPodPhases now attach full diagnostics:
  1. Non-backoff phase (e.g. Failed): includes pod details + events
  2. Unrecoverable container error (e.g. ImagePullBackOff): fail-fast with
     diagnostics instead of waiting for timeout
  3. Timeout: includes pod details so the user can see WHY the pod never
     became ready
- README: document the optional "events" permission and the new env var.
- Tests: 8 new test cases (19 total) covering describePodFailure,
  describePodWarningEvents, getUnrecoverableWaitingReasons, and edge cases
  such as forbidden events API and unreadable pods.
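As a rough illustration of the extendable whitelist described above, the merge of built-in and operator-supplied reasons might look like the sketch below. Several details are assumptions rather than facts from the PR: the comma-separated value format, the whitespace trimming, and passing the environment in as a parameter (the real function presumably reads process.env directly). Only the function name, the environment variable name, and the five default reasons come from the commit messages.

```typescript
const ENV_UNRECOVERABLE_WAITING_REASONS =
  'ACTIONS_RUNNER_K8S_UNRECOVERABLE_WAITING_REASONS'

// Built-in defaults; the commit message states these cannot be removed.
const DEFAULT_UNRECOVERABLE_WAITING_REASONS: readonly string[] = [
  'ImagePullBackOff',
  'ErrImagePull',
  'InvalidImageName',
  'CreateContainerConfigError',
  'CreateContainerError'
]

// Assumption: the variable holds a comma-separated list of extra reasons.
// Taking env as an argument keeps the sketch testable without Node typings.
function getUnrecoverableWaitingReasons(
  env: Record<string, string | undefined>
): Set<string> {
  // Start from the defaults so the variable can only add entries.
  const reasons = new Set<string>(DEFAULT_UNRECOVERABLE_WAITING_REASONS)
  const extra = env[ENV_UNRECOVERABLE_WAITING_REASONS] ?? ''
  for (const item of extra.split(',')) {
    const reason = item.trim()
    if (reason.length > 0) {
      reasons.add(reason)
    }
  }
  return reasons
}
```

Because the set is seeded with the defaults and entries are only ever added, an empty or malformed value leaves the built-in behavior unchanged, which matches the "cannot be removed" guarantee stated in the commit message.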
CLA Signature Pass
Longwt123, thanks for your pull request. All authors of the commits have signed the CLA. 👍
The project pins prettier@2.6.2 in package-lock.json, but the previous commit was formatted with prettier 3.x, which has different line-wrapping rules for template literals. Re-format with prettier 2.6.2 to pass CI.
v2.329.0 was deprecated by GitHub and rejected at the broker level with "Runner version v2.329.0 is deprecated and cannot receive messages.", causing runner pods to crash-loop immediately after connecting.

Also fix Dockerfile layer ordering: switch to root before COPY so that the subsequent chown is not run as the unprivileged runner user.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
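Background

In K8s mode, when a CI job references an image that does not exist or that it has no permission to pull, the Pod gets stuck in Pending. Previously, waitForPodPhases would simply keep backing off until it timed out and then report only a generic phase error, so GitHub Actions users could not see the real cause of the failure.

Changes

- Fail fast on unrecoverable container waiting reasons (ImagePullBackOff, ErrImagePull, InvalidImageName, CreateContainerConfigError, CreateContainerError).
- Refactor getPodPhase into readPod + parsePodPhase so the Pod object can be reused for container checks.
- Add unit tests covering parsePodPhase, getContainerErrors, and waitForPodPhases.

Testing

- tsc --noEmit passes

🤖 Generated with Claude Code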