Skip to content

Consolidate AI review CI quality gate#3550

Open
masih wants to merge 22 commits into
mainfrom
masih/consolidate-ai-reviewers
Open

Consolidate AI review CI quality gate#3550
masih wants to merge 22 commits into
mainfrom
masih/consolidate-ai-reviewers

Conversation

@masih

@masih masih commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Consolidate the AI reviewers into a single required CI job with workaround for merge queue.

masih added 2 commits June 4, 2026 19:16
Consolidate the AI reviewers into a single required CI job with
workaround for merge queue.
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 25, 2026, 1:15 PM

@masih masih marked this pull request as ready for review June 4, 2026 17:24
@cursor

cursor Bot commented Jun 4, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
It changes merge gating behavior (single required status, merge-queue auto-pass, and advisory neutral/skipped), so mis-matched check names or conclusion handling could block or incorrectly allow merges.

Overview
Adds a new AI Review GitHub Actions workflow that exposes one required commit status instead of relying on each AI reviewer check separately.

The workflow runs on PR updates, on Bugbot / Claude check-run events, and on merge_group. It uses a GitHub App token to list check runs for the PR head SHA, takes the latest run per reviewer (substring name match), and sets a single AI Review status: failure if either reviewer reports blocking conclusions, success when both are done and nothing blocks (neutral/skipped are advisory), pending while reviews are still running, and auto-pass on merge queue because reviewers do not re-run there.

Concurrency is keyed by head SHA with cancel-in-progress: false so the last evaluation wins for last-write-wins commit statuses.

Reviewed by Cursor Bugbot for commit 0d77ef8. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread .github/workflows/ai-review.yml Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c19330e836

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/ai-review.yml Outdated
with:
script: |
const sha = context.payload.pull_request.head.sha
const deadline = Date.now() + 8 * 60 * 1000

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow enough time for Claude reviews

Claude Code's Code Review docs state that reviews complete in 20 minutes on average, so this 8-minute deadline (with a 10-minute job timeout) will make normal PRs fail this required gate while Claude Code Review is still pending. In repositories where branch protection requires this consolidated check, those PRs remain blocked until a manual rerun or new push even though the review would later complete successfully.

Useful? React with 👍 / 👎.

name: AI Review

on:
pull_request:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Trigger the gate when draft PRs become ready

With a bare pull_request trigger, GitHub only runs this workflow for the default opened/synchronize/reopened activity types, not ready_for_review. For PRs opened as drafts, Claude's automatic review does not run until the PR is marked ready, so this gate can time out on the draft and publishing it won't start a new gate run; the required check stays failed until a new push or manual rerun.

Useful? React with 👍 / 👎.

@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.15%. Comparing base (04ca0f6) to head (0d77ef8).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3550      +/-   ##
==========================================
- Coverage   59.12%   58.15%   -0.97%     
==========================================
  Files        2259     2176      -83     
  Lines      186487   176890    -9597     
==========================================
- Hits       110255   102869    -7386     
+ Misses      66351    64929    -1422     
+ Partials     9881     9092     -789     
Flag Coverage Δ
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.
see 83 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml
Comment thread .github/workflows/ai-review.yml
Comment thread .github/workflows/ai-review.yml
Comment thread .github/workflows/ai-review.yml
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ai-review.yml Outdated
Comment thread .github/workflows/ecr.yml Outdated
Comment on lines +91 to +94
const at = r => new Date(r.started_at ?? r.updated_at ?? r.created_at ?? 0)
const latestFor = key =>
allRuns.filter(r => r.name.toLowerCase().includes(key)).sort((a, b) => at(b) - at(a))[0]
const runs = REVIEWERS.map(r => latestFor(r.key))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The fallback chain r.started_at ?? r.updated_at ?? r.created_at ?? 0 at line 91 references fields that don't exist on the GitHub check_run REST schema — only started_at and completed_at are top-level on check_run; created_at/updated_at live on check_suite — so the chain effectively collapses to r.started_at ?? 0. When a maintainer clicks Re-run, GitHub creates a new check_run with status: queued and started_at: null, so at(newRun) = new Date(null) = epoch and the older completed run wins the descending sort; latestFor returns the stale run with its prior conclusion for the entire rerun window. Combined with the merge_group auto-pass at lines 81-84, a maintainer-triggered rerun + queue-for-merge can land the PR with a stale success before the rerun verdict arrives. Fix: prefer non-completed runs when one exists for the same reviewer, or use check_run.id (monotonically increasing per repo) as the tiebreaker.

Extended reasoning...

What the bug is

Line 91 defines the sort key for picking the latest check_run per reviewer:

const at = r => new Date(r.started_at ?? r.updated_at ?? r.created_at ?? 0)

The fallback chain was added in response to inline comment 3372143715 (and is the same shape the author proposed for the rerun-staleness problem cursor[bot] flagged in 3363611828). But per the GitHub Checks Runs REST API schema, a check_run object only has these top-level fields: id, head_sha, node_id, external_id, url, html_url, details_url, status, conclusion, started_at, completed_at, output, name, check_suite, app, pull_requests, deployment. There is no top-level created_at or updated_at — those fields exist on the nested check_suite resource. So r.updated_at and r.created_at evaluate to undefined for every run and the chain reduces to r.started_at ?? 0.

Why this matters during reruns

When a maintainer clicks Re-run on Cursor Bugbot or Claude Code Review, GitHub creates a fresh check_run with status: queued and started_at: null (per the spec, started_at is nullable on check_run; cursor[bot] confirmed this exact behavior in inline 3363611828: "a freshly rerequested run is often queued with a null started_at"). The prior completed run is still returned by listForRef. Now the sort runs:

  • at(newQueuedRerun) = new Date(null).getTime() = 0 (epoch)
  • at(oldCompletedRun) = valid started_at timestamp (≫ 0)
  • Descending sort: old completed run wins

latestFor returns the OLD run. Line 102 sees run.status === 'completed' (the old one), and the script falls through to whichever verdict branch matches the OLD run's conclusion. The newly-queued rerun is invisible to the script during its entire queued+in_progress window, and the GitHub webhook does not emit a check_run event for the queued→in_progress transition, so the workflow has no other opportunity to re-evaluate until the rerun completes.

Why existing code doesn't prevent it

The author clearly intended the fallback chain to handle this case — it was added specifically in response to the queued-started_at-is-null problem cursor[bot] flagged. The fix landed but doesn't work because the chosen fallback fields are not in the check_run schema. This is a near-miss, not a missed-case: the structural intent is correct, the field names are wrong.

Impact

Once AI Review is wired into branch protection as the consolidated required gate (the explicit goal of this PR), the rerun-window staleness has two failure modes:

  • Stale failure / action_required → gate stays red. Annoying but recoverable: the rerun completes, fires check_run.completed, the workflow re-evaluates and posts the correct verdict.
  • Stale success / neutral → gate stays green during rerun. This is the dangerous one. Lines 81-84 auto-pass on merge_group events without consulting reviewer state at all. So a maintainer who triggered a rerun and then immediately queues the PR for merge can have the merge queue fire, see AI Review = success from the stale verdict, auto-pass merge_group's own evaluation, and land the PR before the rerun's verdict arrives. If the rerun would have surfaced a regression that the prior run missed, it merges into main with the gate showing green throughout.

Step-by-step proof (stale success → unsafe merge)

  1. t=0 — Both reviewers complete success on commit X. The script writes AI Review = success.
  2. t=5m — Maintainer clicks Re-run on Cursor Bugbot via the Checks UI. GitHub creates check_run #N+1 named Cursor Bugbot on head_sha = X with status: queued, started_at: null, completed_at: null. check_run.created fires.
  3. t=5m+ε — The job runs (the [created, rerequested] subscription wakes it). listForRef returns both the old completed Bugbot run AND the new queued one. The script computes at(new) = new Date(null) = epoch and at(old) = valid started_at > 0. Descending sort: old wins. latestFor('bugbot') returns the OLD completed run.
  4. t=5m+ε — Line 102: run.status === 'completed' ✓. Line 104: REJECTING.includes('success') false. Line 106: PASSING.includes('success') ✓ — script treats it as passing. With both reviewers passing per the OLD state, line 123 runs and posts AI Review = success (or, if the pending-overwrite branch is hit instead, it remains at the prior success). Either way: gate is green.
  5. t=6m — Maintainer queues PR for merge. merge_group fires. Lines 81-84 auto-pass with setStatus("success", "Merge queue — auto-pass").
  6. t=6m+ε — Merge proceeds. Bugbot rerun has not yet completed.
  7. t=8m — Bugbot rerun completes with failure (caught a regression the prior run missed). Too late — already in main.

The runtime semantics of new Date(null).getTime() === 0 are unambiguous JS: new Date(null) equals Thu Jan 01 1970 00:00:00 GMT+0000, and Math.max(0, anyPositiveTimestamp) always picks the positive value. The trace above follows directly.

How to fix it

Several viable options, in roughly increasing order of robustness:

  1. Prefer non-completed runs when both exist for a reviewer — most directly captures the intent ("a rerun is in flight, ignore the prior verdict"):
const latestFor = key => {
  const matches = allRuns.filter(r => r.name.toLowerCase().includes(key))
  const inFlight = matches.find(r => r.status !== 'completed')
  if (inFlight) return inFlight
  return matches.sort((a, b) => new Date(b.started_at ?? 0) - new Date(a.started_at ?? 0))[0]
}
  1. Sort by check_run.id as a stable tiebreaker — check_run ids are monotonically increasing per repo, so the newest run always wins regardless of started_at nullity:
const at = r => [new Date(r.started_at ?? 0).getTime(), r.id]
const latestFor = key => allRuns
  .filter(r => r.name.toLowerCase().includes(key))
  .sort((a, b) => { const [ta, ia] = at(a), [tb, ib] = at(b); return tb - ta || ib - ia })[0]
  1. Replace the broken fallback with completed_at (which IS on the schema) — narrower fix but only useful as a tiebreaker because queued runs have completed_at: null too.

Any of these unsticks the rerun-staleness case the original fallback chain was meant to address.

Comment thread .github/workflows/ecr.yml Outdated
@@ -1,5 +1,7 @@
name: ECR

BREAK AND SEE IF AI CATCHES THIS!!!!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The PR adds the literal line BREAK AND SEE IF AI CATCHES THIS!!!! at .github/workflows/ecr.yml:3 between the name: and on: keys. This is a bare scalar at the document root mixed with mapping entries, so it is not valid YAML and GitHub Actions will mark ecr.yml as 'Invalid workflow file' — no sei-chain container images (including mock- and mock_block_validation- variants) will be built or pushed to ECR on push to main/release/** or via workflow_dispatch after merge. Remove lines 2–3 before merging.

Extended reasoning...

What the bug is

.github/workflows/ecr.yml line 3 contains the bare unquoted text BREAK AND SEE IF AI CATCHES THIS!!!! (no leading #), sandwiched between the mapping keys name: ECR on line 1 and on: on line 5. The document root is a YAML mapping (its keys are name, on, jobs), so a non-mapping scalar at the same indentation level is a syntax error — a YAML document cannot mix a bare scalar with mapping entries in the same context.

How it manifests

Running a YAML parser against the file fails:

$ python3 -c "import yaml; yaml.safe_load(open(".github/workflows/ecr.yml"))"
yaml.scanner.ScannerError: while scanning a simple key
  ...
could not find expected ":" in .github/workflows/ecr.yml, line 3, column 1

GitHub Actions uses the same YAML semantics at workflow load time, so when this lands on main/release/**, the Actions tab will show Invalid workflow file for ecr.yml and the workflow will refuse to dispatch.

Why existing code does not prevent it

The line is a top-level addition with no surrounding guard. The workflow is otherwise correct, but a single malformed line at the document root is enough for GitHub to reject the entire file — there is no partial parsing or "skip bad lines" fallback.

Impact

.github/workflows/ecr.yml is the workflow that produces and publishes the sei-chain Docker images to ECR. It triggers on:

  • push to main and release/** branches
  • workflow_dispatch

For each push, it builds and pushes three image tags to {ecr-registry}/sei/sei-chain:

  • <sha> (production)
  • mock-<sha> (mock balances build)
  • mock_block_validation-<sha> (mock block-validation build)

Once this PR merges, every push to main/release/** will silently fail to dispatch ecr.yml. No new container images will be built or pushed. Anything downstream that pulls {ecr-registry}/sei/sei-chain:<sha> for a post-merge commit will not find the image, and deployment/CI flows depending on those tags break until a follow-up PR removes the line.

Why this happened and how to fix

The latest commit on this PR is 74dd752 Test neagtive case with AI review. The commit message is explicit: this line was inserted intentionally to test whether the new consolidated AI Review gate this PR introduces catches such regressions. It must not land. cursor[bot] independently flagged the same line on the same commit (inline 3373932492, "Accidental debug line in workflow", High Severity).

Fix: delete lines 2–3 of .github/workflows/ecr.yml (the blank line and the BREAK line) so the file goes back to:

name: ECR

on:
  push:
    ...

Step-by-step proof

  1. GitHub event: a commit lands on main after this PR merges.
  2. GitHub Actions loads .github/workflows/ecr.yml.
  3. The YAML parser reaches line 3 (BREAK AND SEE IF AI CATCHES THIS!!!!).
  4. Line 1 (name: ECR) established the document root as a mapping. Line 3 begins at column 1 with a non-mapping token (no : follows on the same logical line; the next token is on: on line 5 after a blank line).
  5. Parser raises ScannerError: could not find expected ":" in .github/workflows/ecr.yml, line 3, column 1.
  6. GitHub Actions surfaces "Invalid workflow file" and refuses to dispatch ecr.yml for this commit.
  7. None of the three image tags (sei-chain:<sha>, sei-chain:mock-<sha>, sei-chain:mock_block_validation-<sha>) is built or pushed to ECR.
  8. The same outcome repeats for every subsequent push to main/release/** and every workflow_dispatch invocation, until a follow-up PR removes the line.

🔬 also observed by cursor

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Want higher recall? High effort reviews run extra passes and find more bugs. A team admin can switch effort levels in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0d77ef8. Configure here.

(github.event_name == 'pull_request' && github.event.pull_request.draft == false) ||
(github.event_name == 'check_run' &&
(contains(github.event.check_run.name, 'Bugbot') ||
contains(github.event.check_run.name, 'Claude')))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case sensitive check_run filter

Medium Severity

The check_run job guard uses case-sensitive contains() on Bugbot and Claude, but the workflow comments and aggregation script treat reviewer names as case-insensitive. When a reviewer’s check-run name differs in casing, the job does not run on completion, so the AI Review commit status may remain pending even after both reviewers have finished.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0d77ef8. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant