[Experiment] code-review: BCQuality integration arm (live skills) by gggdttt · Pull Request #715 · microsoft/BC-Bench

gggdttt · 2026-06-29T15:20:46Z

Experiment Description

Enable the live BCQuality integration arm for the code-review category — the "after" side of the BCApps #8700 change. Instead of static in-repo checklists, the agent consumes microsoft/BCQuality at runtime: clone (pinned SHA) → filter → route through skills/entry.md, then emit the BC-Bench review.json schema.

This is the counterpart to the inline-knowledge (pre-#8700) arm (experiment/code-review/inline-knowledge). Together, the two arms let us test the expected ordering against the vanilla baseline: vanilla < inline knowledge < live BCQuality.

Configuration Changes

Custom instructions (instructions.enabled: true)
Skills (skills.enabled: true)
Custom agents
MCP servers
Other: bcquality.enabled: true — code-review-only switch. The filtered BCQuality clone becomes the Copilot CWD (knowledge read before the diff); the repo under review is granted via --add-dir; static instruction injection is skipped. No effect on bug-fix / test-generation.
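For reviewers, the CWD / --add-dir switch described above can be pictured roughly as follows. This is a minimal sketch, not the code in copilot/agent.py: `Invocation`, `build_invocation`, and `base_argv` are hypothetical names, and the non-live behaviour (running inside the repo under review) is an assumption. Only the `--add-dir` flag and the "clone becomes CWD" rule come from this description.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Invocation:
    cwd: Path        # directory the agent process starts in
    argv: list       # full command line as a list of strings


def build_invocation(base_argv, bcquality_clone, repo_under_review, live):
    """Hypothetical helper: pick the CWD and extra flags for one review run."""
    if not live:
        # Assumption: non-live arms run inside the repo under review.
        return Invocation(cwd=Path(repo_under_review), argv=list(base_argv))
    # Live arm: the filtered BCQuality clone is the CWD, and the repo
    # under review is granted to the agent via --add-dir.
    argv = [*base_argv, "--add-dir", str(repo_under_review)]
    return Invocation(cwd=Path(bcquality_clone), argv=argv)
```

With `live=True` the knowledge base is what the agent sees first, and the diff's repository is reachable only through the extra directory grant.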

Key pieces:

config.yaml: bcquality: section (repo + pinned ref SHA, enabled-layers, disabled-skills, knowledge allow/deny globs, task-context dimensions). enabled: true on this branch.
agent/shared/codereview_bcquality.py: clone_bcquality (pinned SHA, shallow), filter_clone (mirrors Invoke-BCQualityFilter.ps1, writes _filter-report.json), task-context writer, bootstrap prompt routing through skills/entry.md.
copilot/agent.py: live branch wiring (clone as CWD, --add-dir, hooks into the clone).
types.py: ExperimentConfiguration.bcquality flag → routes results to the Experiment Leaderboard.
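To illustrate the filter step, here is a minimal sketch of allow/deny glob filtering that writes a _filter-report.json. It is not the actual filter_clone and does not claim to mirror Invoke-BCQualityFilter.ps1: the function name `filter_clone_sketch`, the report keys `kept` and `removed`, and the use of `fnmatch` semantics (where `*` also matches `/`) are all assumptions made for illustration.

```python
import fnmatch
import json
from pathlib import Path

REPORT_NAME = "_filter-report.json"


def filter_clone_sketch(clone_dir, allow, deny):
    """Keep files matching an allow glob and no deny glob; delete the rest."""
    root = Path(clone_dir)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    kept, removed = [], []
    for path in files:
        rel = path.relative_to(root).as_posix()
        # Leave git metadata and a previous report untouched.
        if rel.startswith(".git/") or rel == REPORT_NAME:
            continue
        allowed = any(fnmatch.fnmatchcase(rel, g) for g in allow)
        denied = any(fnmatch.fnmatchcase(rel, g) for g in deny)
        if allowed and not denied:
            kept.append(rel)
        else:
            path.unlink()
            removed.append(rel)
    report = {"kept": kept, "removed": removed}
    (root / REPORT_NAME).write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

Writing the report next to the filtered files keeps each run auditable: the report records exactly which knowledge files the agent could have read.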

Agent & Model

Agent: GitHub Copilot CLI
Model: (default)
Category: code-review

Hypothesis / Expected Outcome

Consuming BCQuality's live knowledge base should match or exceed the pre-#8700 inline checklists on finding quality (precision/recall/F1 vs gold), since it carries the same domain knowledge plus ongoing BCQuality updates and explicit knowledge-backed routing. Expected ordering: vanilla < inline knowledge < live BCQuality.
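For reference, the micro-averaged metrics can be computed as sketched below. This assumes each dataset entry has already been reduced to counts of true positives, false positives, and false negatives against gold. How a finding is matched to a gold comment is not specified here and is outside the sketch, and `micro_prf` is a hypothetical name.

```python
def micro_prf(per_entry_counts):
    """Micro-averaged precision, recall, F1 from (tp, fp, fn) tuples per entry."""
    counts = list(per_entry_counts)
    # Micro averaging pools the counts over all entries before dividing.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because counts are pooled first, entries with many findings weigh more than entries with few, which is the usual difference from a macro average.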

Notes

Draft only — entry point describing exactly what is evaluated; not meant to merge.
All 81 codereview.jsonl entries target microsoft/BCApps.
Recreated after the experiment branches were renamed to the symmetric experiment/code-review/* scheme (supersedes the former [Code-review]: live BCQuality consumption + faithful pre-#8700 old-inline baseline arm #696).

Adds a bcquality config section (disabled by default) and a Python module that clones BCQuality at a pinned SHA, filters it per enabled-layers and knowledge globs, builds the task-context, and generates a bootstrap prompt that routes through skills/entry.md, replicating how microsoft/BCApps consumes microsoft/BCQuality today. Not yet wired into the agent; no effect on existing categories.
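One common way to get a shallow checkout of a single pinned commit is init, then fetch with depth 1, then checkout, sketched below. This is an assumption about the technique, not a copy of clone_bcquality: the helper names are hypothetical and the repository URL and SHA in any call are placeholders. `run_git` also shows the "surface git stderr on failure" idea from a later commit.

```python
import subprocess


def pinned_clone_commands(repo_url, sha, dest):
    """Git command lines that fetch exactly one commit (depth 1) into dest."""
    return [
        ["git", "init", "--quiet", dest],
        ["git", "-C", dest, "remote", "add", "origin", repo_url],
        ["git", "-C", dest, "fetch", "--quiet", "--depth", "1", "origin", sha],
        ["git", "-C", dest, "checkout", "--quiet", "--detach", "FETCH_HEAD"],
    ]


def run_git(argv):
    """Run one git command; put git's stderr into the error on failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{' '.join(argv)} failed: {result.stderr.strip()}")


def clone_pinned(repo_url, sha, dest):
    """Shallow-fetch a single commit so every run sees the same knowledge base."""
    for argv in pinned_clone_commands(repo_url, sha, dest):
        run_git(argv)
```

Pinning to a SHA rather than a branch keeps benchmark runs comparable even while BCQuality itself keeps changing.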

…line arm

- Extract the 6 faithful domain checklists (accessibility/performance/privacy/security/style/upgrade) verbatim from BCApps 30e2b18ca3^ (the version BCApps shipped before adopting BCQuality), NOT the benchmark-tuned experiment snapshot
- AGENTS.md: add review section routing /review through the 6 domain checklists
- Enables a faithful before/after comparison: vanilla < old inline < live BCQuality
- Inert by default (instructions.enabled=false); arm activated via config toggle

wenjiefan and others added 18 commits June 25, 2026 14:36

code-review: wire live BCQuality path into copilot agent + tests

15c3feb

- ExperimentConfiguration: add bcquality flag
- copilot agent: live BCQuality branch (clone CWD, --add-dir repo, skip static injection)
- add 23 unit tests for codereview_bcquality module

code-review: markdown formatting in BCApps AGENTS.md (blank line before list)

058a5e1

Merge main into code-review-live-bcquality (sync leaderboard + main updates)

2fa1ccd

code-review docs: add Experiment Leaderboard table (vanilla / inline / BCQuality arms)

e8713f7

code-review docs: add Agent column, drop Vanilla reference from Experiment Leaderboard

b19f7e5

Fix pre-commit whitespace in instruction files; rename F1 column to Micro F1

fd275cd

code-review: address self-review (reuse review.json constant, deterministic severity mapping, relocate bcquality module to agent/shared)

7da69c5

code-review: reuse review.json constant + deterministic BCQuality severity mapping

5bf1745

code-review: cache BCQuality clone per-SHA (clone once, copy+filter per entry); surface git stderr on failure

edd6dbd

code-review: drop BCQuality clone cache (clone is cheap); keep git stderr surfacing

b07213b

code-review: externalize BCQuality bootstrap prompt to config.yaml Jinja2 template

72f7c51

code-review: add super-skill execution-discipline / progress markers to BCQuality bootstrap prompt

0dc121c

Merge branch 'main' into private/wenjiefan/code-review-live-bcquality

5a3b5a7

code-review: make BCQuality task-context goal/inputs-available config-driven

4c6c104

Merge branch 'main' into private/wenjiefan/code-review-live-bcquality

88b3ea7

code-review: activate BCQuality integration arm (bcquality.enabled=true)

b747ce6

gggdttt wants to merge 18 commits into main from experiment/code-review/bcquality-integration
