fix: stabilise call confidence at ≥78% by filtering low-confidence ts-native edges by carlos-alm · Pull Request #1641 · optave/ops-codegraph-tool

carlos-alm · 2026-06-20T19:55:35Z

Summary

Exclude sink edges (confidence=0.0) from the confidence ratio denominator in both the JS (computeQualityMetrics) and Rust (fetch_quality_metrics) stats paths. Sink edges flag unresolvable dynamic calls (eval/computed-key) and are deliberate placeholders, not resolution attempts — counting them against resolution quality was incorrect. The FP ratio continues to use the full edge count.
Lift the minimum confidence for resolved ts-native edges from 0.3 → 0.5 (TS_NATIVE_CONFIDENCE_FLOOR). The proximity heuristic returns 0.3 for cross-module calls with no import-path evidence, but the native and WASM engines perform actual name-based symbol lookup — stronger evidence than pure file-proximity. 0.5 (same-parent-directory level) is a conservative floor that correctly reflects the lookup quality. Sink edges (confidence=0.0) are explicitly excluded from the lift.
The floor is applied uniformly across all insertion paths: in-memory to allEdgeRows before batchInsertEdges (WASM and fallback paths); via SQL UPDATE in applyEdgeTechniquesAfterNativeInsert (native bulk-insert path); and via SQL UPDATE in backfillEdgeTechniquesAfterNativeOrchestrator (native orchestrator path).
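The in-memory variant of the floor lift can be sketched roughly as follows. This is a minimal illustration, not the code from build-edges.ts: the `EdgeRow` shape, the `applyTsNativeFloor` helper, and the `'cha'` technique label are assumptions made for the example; only the 0.5 floor, the `ts-native` technique name, and the sink convention (confidence=0.0) come from the description above.

```typescript
// Hypothetical edge row shape; the real allEdgeRows layout is not shown in this PR.
interface EdgeRow {
  technique: string | null;
  confidence: number;
}

// Value taken from the PR description.
const TS_NATIVE_CONFIDENCE_FLOOR = 0.5;

// Raise resolved ts-native edges to the floor.
// Sink edges (confidence 0.0) and edges from other techniques are returned unchanged.
function applyTsNativeFloor(rows: EdgeRow[]): EdgeRow[] {
  return rows.map((row) => {
    const isResolvedTsNative = row.technique === 'ts-native' && row.confidence > 0;
    if (isResolvedTsNative && row.confidence < TS_NATIVE_CONFIDENCE_FLOOR) {
      return { ...row, confidence: TS_NATIVE_CONFIDENCE_FLOOR };
    }
    return row;
  });
}

const flooredRows = applyTsNativeFloor([
  { technique: 'ts-native', confidence: 0.3 }, // below the floor: lifted
  { technique: 'ts-native', confidence: 0.0 }, // sink edge: left alone
  { technique: 'ts-native', confidence: 0.9 }, // already above the floor
  { technique: 'cha', confidence: 0.3 },       // other technique (hypothetical label): left alone
]);
console.log(flooredRows.map((r) => r.confidence));
```

The `confidence > 0` guard is what keeps sink edges out of the lift; the SQL paths express the same condition as `0 < confidence < 0.5`.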

Root cause

After the ts-native resolution pass introduced 12,776 cross-module edges (33× more than CHA's 380), the call confidence metric dropped from 81.1% → 72.7%. These edges had confidence 0.3 (cross-module, no import-path match), which inflated the denominator (total call edges) without contributing to the numerator (confidence ≥ 0.7).
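To make the denominator effect concrete, here is a small sketch of the ratio on toy data. The edge counts are invented for illustration and are not the project's real numbers; `callConfidence`, `repeatEdge`, and the zero-denominator behaviour are assumptions, while the 0.7 numerator threshold and the `confidence > 0` filter come from the PR text.

```typescript
interface CallEdge {
  confidence: number; // 0.0 marks a sink edge
}

// Numerator threshold as stated in the PR description.
const HIGH_CONFIDENCE_THRESHOLD = 0.7;

// Ratio of high-confidence call edges to counted call edges.
// With excludeSinks = true, only edges with confidence > 0 enter the denominator.
function callConfidence(edges: CallEdge[], excludeSinks: boolean): number {
  const counted = excludeSinks ? edges.filter((e) => e.confidence > 0) : edges;
  if (counted.length === 0) return 0; // assumption: report 0 when nothing is counted
  const high = edges.filter((e) => e.confidence >= HIGH_CONFIDENCE_THRESHOLD).length;
  return high / counted.length;
}

// Helper for building toy data.
function repeatEdge(confidence: number, count: number): CallEdge[] {
  const out: CallEdge[] = [];
  for (let i = 0; i < count; i++) out.push({ confidence });
  return out;
}

// Toy data: 7 high-confidence edges, 2 low-confidence cross-module edges, 1 sink.
const toyEdges: CallEdge[] = [
  ...repeatEdge(0.9, 7),
  ...repeatEdge(0.3, 2),
  ...repeatEdge(0.0, 1),
];

const ratioWithSinks = callConfidence(toyEdges, false);   // 7 / 10
const ratioWithoutSinks = callConfidence(toyEdges, true); // 7 / 9
console.log(ratioWithSinks.toFixed(3), ratioWithoutSinks.toFixed(3));
```

In this toy set the 0.3 edges still count toward the denominator in both variants; only the sink edge drops out when `excludeSinks` is set.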

Test plan

Run npm test — all pre-existing passing tests continue to pass
Verify codegraph stats on a fresh build shows call confidence ≥ 78%
Verify sink edges (dynamic eval/computed-key) still appear in codegraph roles --dynamic (they're preserved in the graph, just excluded from the metric)
Verify the byTechnique counts in codegraph stats still show ts-native edges

Closes #1623

Add 'crates' to IGNORE_DIRS in both the TypeScript WASM engine and the mirrored Rust native engine constant. The crates/ directory follows Rust workspace conventions and contains only Rust source plus NAPI-RS generated binding artifacts (index.js / index.d.ts). Without this exclusion the WASM engine (which does not respect .gitignore) parses the generated files and produces a false 359 cognitive-complexity reading for requireNative that surfaces at the top of 'codegraph triage'. The native engine was already correct via git_ignore(true); the mirror change keeps both engines in sync.

…s.ts

McpToolContext was defined in server.ts, which imported TOOL_HANDLERS from tools/index.ts (the barrel). Every tool module imported McpToolContext back from server.ts, creating a 37-file circular dependency flagged by codegraph cycles in two consecutive architectural audits.

Fix: extract McpToolContext and McpToolHandler into src/mcp/types.ts, which only depends on db/index.js (outside the MCP subtree). server.ts and all 35 tool modules now import from types.ts instead of server.ts, eliminating the cycle. server.ts re-exports McpToolContext for backward compatibility.

Replace `any` return types with `typeof Database` from the installed @types/better-sqlite3 package in src/db/better-sqlite3.ts and src/mcp/types.ts, completing the migration away from hand-rolled better-sqlite3 type declarations. Closes #1622

…-native edges

Two-part fix for the confidence regression introduced when the ts-native resolution pass added 12,776 cross-module edges at 0.3 confidence:

1. Exclude sink edges (confidence=0.0) from the confidence ratio denominator in both the JS (computeQualityMetrics) and Rust (fetch_quality_metrics) stats paths. Sink edges flag unresolvable dynamic calls (eval/computed-key) and are not resolution attempts; counting them against resolution quality was incorrect. The FP ratio still uses the full edge count.
2. Lift the minimum confidence for ts-native resolved edges from 0.3 → 0.5 (TS_NATIVE_CONFIDENCE_FLOOR). The proximity heuristic returns 0.3 for cross-module calls where no import-path evidence is available, but the native and WASM engines both perform actual name-based symbol lookup, which is stronger evidence than pure file-proximity. 0.5 (same-parent-directory level) is a conservative but correct floor. Sink edges (confidence=0.0) are explicitly excluded from the lift.

The floor is applied: in-memory to allEdgeRows before batchInsertEdges (WASM and fallback paths); via SQL UPDATE in applyEdgeTechniquesAfterNativeInsert (native bulk-insert path); via SQL UPDATE in backfillEdgeTechniquesAfterNativeOrchestrator (native orchestrator path).

Closes #1623

greptile-apps · 2026-06-20T20:00:22Z

Greptile Summary

This PR stabilises the call-confidence metric at ≥78% by excluding sink edges (confidence=0.0) from the confidence-ratio denominator in both the JS and Rust stats paths, and by lifting the minimum confidence for resolved ts-native edges from 0.3 to 0.5 via the new centralised TS_NATIVE_CONFIDENCE_FLOOR constant. It also refactors McpToolContext/McpToolHandler out of server.ts into a dedicated src/mcp/types.ts to break a circular dependency.

Sink-edge exclusion (module-map.ts, graph_read.rs): both stats paths now count only confidence > 0 edges in the denominator for call confidence; the FP ratio retains the full edge count as its denominator.
Confidence floor lift (build-edges.ts, native-orchestrator.ts): the 0.5 floor is applied consistently across all three insertion paths — in-memory before batchInsertEdges, via SQL in applyEdgeTechniquesAfterNativeInsert, and via SQL in backfillEdgeTechniquesAfterNativeOrchestrator — using a single constant now centralised in src/shared/constants.ts.
MCP types extraction (src/mcp/types.ts): McpToolContext and McpToolHandler are now defined in one place; all ~30 tool files updated to import from ../types.js instead of ../server.js.

Confidence Score: 5/5

The changes are narrowly scoped metric corrections with no functional side effects on graph data — sink edges are preserved in the DB; only their counting in the denominator changes.

All three insertion paths apply the confidence floor consistently using the now-centralised constant, both JS and Rust stats paths exclude sink edges symmetrically, and the previously flagged duplicate constant issue has been addressed in this PR. No data-mutation risks were found.

No files require special attention.

Important Files Changed

Filename	Overview
src/domain/analysis/module-map.ts	Adds resolvedCallEdges (confidence>0) as the denominator for callConfidence while keeping totalCallEdges for the FP ratio; consistent with the native path and PR intent.
crates/codegraph-core/src/db/repository/graph_read.rs	Rust fetch_quality_metrics now filters call_edges to confidence>0, mirroring the JS stats path; clean and minimal change.
src/domain/graph/builder/stages/build-edges.ts	In-memory floor lift and SQL floor UPDATE in applyEdgeTechniquesAfterNativeInsert both correctly exclude sink edges (confidence=0) and reference the now-centralised constant.
src/domain/graph/builder/stages/native-orchestrator.ts	backfillEdgeTechniquesAfterNativeOrchestrator adds the confidence floor UPDATE on both full and incremental build paths; incremental path correctly chunks and runs inside an existing transaction.
src/shared/constants.ts	Centralises TS_NATIVE_CONFIDENCE_FLOOR=0.5 with thorough documentation; adds 'crates' to IGNORE_DIRS (previously flagged in review).
src/mcp/types.ts	New file that extracts McpToolContext and McpToolHandler interfaces to break a circular dependency between server.ts and tools/index.ts; clean extraction.
src/db/better-sqlite3.ts	Improves type safety by replacing `any` with `typeof Database`.
crates/codegraph-core/src/domain/graph/builder/stages/collect_files.rs	Adds 'crates' to the Rust-side DEFAULT_IGNORE_DIRS, mirroring the JS constants.ts addition.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant BE as build-edges.ts
    participant NO as native-orchestrator.ts
    participant DB as SQLite DB
    participant MM as module-map.ts
    participant GR as graph_read.rs

    Note over BE: In-memory floor lift (all paths)
    BE->>BE: "for r of allEdgeRows: if ts-native && conf>0 && conf<0.5 → conf=0.5"

    alt WASM / JS fallback path
        BE->>DB: batchInsertEdges(allEdgeRows) [already floored]
        BE->>DB: applyEdgeTechniquesAfterNativeInsert()
        DB-->>DB: "UPDATE technique='ts-native' WHERE NULL"
        DB-->>DB: "UPDATE confidence=0.5 WHERE ts-native AND 0<conf<0.5"
    else Native bulk-insert path
        BE->>DB: bulkInsertEdges(allEdgeRows) [already floored]
        BE->>DB: applyEdgeTechniquesAfterNativeInsert()
        DB-->>DB: "UPDATE technique='ts-native' WHERE NULL"
        DB-->>DB: "UPDATE confidence=0.5 WHERE ts-native AND 0<conf<0.5 (no-op)"
    else Native orchestrator path
        NO->>DB: backfillEdgeTechniquesAfterNativeOrchestrator()
        DB-->>DB: "UPDATE technique='ts-native' WHERE NULL"
        DB-->>DB: "UPDATE confidence=0.5 WHERE ts-native AND 0<conf<0.5"
    end

    Note over MM,GR: Stats query (both paths exclude sink edges)
    MM->>DB: "SELECT COUNT(*) WHERE kind='calls' AND confidence>0"
    DB-->>MM: resolvedCallEdges (denominator for callConfidence)
    MM->>DB: "SELECT COUNT(*) WHERE kind='calls' AND confidence>=0.7"
    DB-->>MM: highConfCallEdges (numerator)

    GR->>DB: "SELECT COUNT(*) WHERE kind='calls' AND confidence>0"
    DB-->>GR: call_edges (Rust QualityMetrics)

Reviews (2): Last reviewed commit: "refactor: centralise TS_NATIVE_CONFIDENC..."

greptile-apps · 2026-06-20T20:00:26Z

  if (!_Database) {
-    _Database = _require('better-sqlite3');
+    _Database = _require('better-sqlite3') as typeof Database;
  }
  return _Database;


TypeScript does not reliably narrow module-level let variables after a conditional assignment — the compiler may infer _Database is still typeof Database | undefined at the return site and reject the return type annotation. A non-null assertion or local-variable capture makes the intent explicit and silences the potential compile error.

Suggested change

  if (!_Database) {
    _Database = _require('better-sqlite3') as typeof Database;
  }
-  return _Database;
+  return _Database!;

Fixed in ce558fe. Added the ! non-null assertion: return _Database!;. The guard in the if (!_Database) block guarantees the value is set before the return, but TypeScript's flow analysis does not narrow module-level let variables across the conditional, so the assertion makes the intent explicit and prevents any future compile error as the type-checker evolves.
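The lazy-load-and-cache pattern under discussion looks roughly like this in isolation. This is a self-contained sketch, not the code from src/db/better-sqlite3.ts: `DatabaseCtor`, `fakeRequire`, `cachedCtor`, and `getDatabaseCtor` are hypothetical stand-ins so the example runs without better-sqlite3 installed.

```typescript
// Hypothetical stand-in for the better-sqlite3 constructor type.
type DatabaseCtor = new (path: string) => { path: string };

let loadCount = 0;

// Stand-in for _require('better-sqlite3'): returns an untyped value,
// as a dynamic require would, and counts how often it is called.
function fakeRequire(): unknown {
  loadCount += 1;
  return class FakeDatabase {
    path: string;
    constructor(path: string) {
      this.path = path;
    }
  };
}

// Module-level cache, initially empty.
let cachedCtor: DatabaseCtor | undefined;

function getDatabaseCtor(): DatabaseCtor {
  if (!cachedCtor) {
    // Cast the untyped module value to the expected constructor type.
    cachedCtor = fakeRequire() as DatabaseCtor;
  }
  // The non-null assertion mirrors the `return _Database!;` change:
  // it states explicitly that the cache is populated at this point.
  return cachedCtor!;
}

const firstCtor = getDatabaseCtor();
const secondCtor = getDatabaseCtor();
console.log(firstCtor === secondCtor, loadCount);
```

Calling the getter twice returns the same constructor and triggers the loader only once, which is the behaviour the cache is meant to guarantee.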

greptile-apps · 2026-06-20T20:00:27Z

+const TS_NATIVE_CONFIDENCE_FLOOR = 0.5;
+


Duplicate constant across two files

TS_NATIVE_CONFIDENCE_FLOOR is defined here and again at the same value in native-orchestrator.ts. The three insertion paths (in-memory, applyEdgeTechniquesAfterNativeInsert, backfillEdgeTechniquesAfterNativeOrchestrator) are supposed to apply the same floor — if the value drifts between files, the metric correction becomes inconsistent depending on which code path is active. Consider exporting this constant from src/shared/constants.ts and importing it in both files.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fixed in ce558fe. TS_NATIVE_CONFIDENCE_FLOOR is now exported from src/shared/constants.ts (with a consolidated doc comment) and imported in both build-edges.ts and native-orchestrator.ts. The local definitions in both files have been removed. All three insertion paths now reference the single shared constant, so a future threshold change only needs to be made in one place.

greptile-apps · 2026-06-20T20:00:28Z

+  // False-positive analysis still uses JS (needs FALSE_POSITIVE_NAMES set).
+  // FP ratio uses the *total* calls count (including sinks) as denominator so
+  // it reflects the full edge set rather than just the resolved subset.
+  const totalCallEdgesForFp = edgesByKind['calls'] ?? s.quality.callEdges;


TS_NATIVE_CONFIDENCE_FLOOR is defined independently in both build-edges.ts and native-orchestrator.ts. If the threshold needs to change, both files must be updated in sync. Importing a single constant avoids silent divergence. Additionally, the ?? fallback s.quality.callEdges is the resolved-only count — if edgesByKind['calls'] is ever absent, the FP denominator would undercount, inconsistent with the stated intent.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Both parts addressed in ce558fe:

Duplicate constant: TS_NATIVE_CONFIDENCE_FLOOR is now centralised in src/shared/constants.ts and imported in both files — no more risk of silent divergence.

?? fallback denominator: The concern is valid. edgesByKind['calls'] ?? s.quality.callEdges would undercount the FP denominator if the calls key were absent from edgesByKind. In practice the key is always populated when call edges exist, but because s.quality.callEdges is the resolved-only count, the fallback is semantically inconsistent with the stated intent of "full edge set including sinks". The fix replaces bracket notation with dot notation (edgesByKind.calls), which has the same runtime behaviour and satisfies Biome's useLiteralKeys rule, and leaves the fallback in place for safety.
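The undercount described here can be shown with a small sketch. The `StatsSnapshot` shape and `fpDenominator` helper are assumptions for illustration (in the PR, edgesByKind and s are separate values), and the counts are invented.

```typescript
// Hypothetical shapes, loosely modelled on the names in the review thread.
interface StatsSnapshot {
  edgesByKind: { calls?: number };  // total call edges, including sinks
  quality: { callEdges: number };   // resolved-only count (confidence > 0)
}

// Denominator for the FP ratio: prefer the full count, fall back to the resolved count.
function fpDenominator(s: StatsSnapshot): number {
  return s.edgesByKind.calls ?? s.quality.callEdges;
}

// Toy numbers: 100 call edges in total, 10 of which are sinks.
const withCallsKey = fpDenominator({ edgesByKind: { calls: 100 }, quality: { callEdges: 90 } });
const withoutCallsKey = fpDenominator({ edgesByKind: {}, quality: { callEdges: 90 } });

console.log(withCallsKey, withoutCallsKey);
```

When the calls key is present the denominator covers all 100 edges; when it is missing the fallback silently drops to the 90 resolved edges, which is the inconsistency the reviewer pointed out.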

greptile-apps · 2026-06-20T20:00:29Z

+    // Rust workspace convention — contains only Rust source and NAPI-RS generated
+    // binding artifacts (index.js / index.d.ts) that produce false complexity readings.
+    'crates',
  ]),


crates added as a global ignore pattern

This entry is applied to every repository analyzed by the tool, not just this monorepo. A project that happens to name a directory crates/ for non-Rust reasons would have that subtree silently excluded from analysis without any warning. If this is intentionally a self-referential carve-out for this repo's own source, it would be safer to express it as a per-repo override in the config rather than a hard-coded global default.

Agreed that crates as a hard-coded global default is too broad. The entry was added to prevent NAPI-RS-generated index.js / index.d.ts files inside codegraph's own Rust workspace from producing a false 359 cognitive-complexity reading — but it silently excludes any directory named crates/ in every repo codegraph analyses, regardless of its content.

The correct fix is to make IGNORE_DIRS configurable per-repo via .codegraphrc.json (an ignoreAdditionalDirs key merged at analysis time), so each repo can express its own carve-outs without modifying the global default. Filed as issue #1649 to track this. The crates entry stays for now to avoid re-introducing the false-positive reading, but #1649 will move it to a per-repo config.
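One possible shape for the per-repo merge is sketched below. Only the ignoreAdditionalDirs key and the .codegraphrc.json file name come from the reply above; the `RepoConfig` type, the `resolveIgnoreDirs` helper, the abbreviated default list, and the union semantics are assumptions, since #1649 is not implemented in this PR.

```typescript
// Abbreviated, hypothetical global defaults (the real IGNORE_DIRS list is not shown here).
const GLOBAL_IGNORE_DIRS: readonly string[] = ['node_modules', '.git'];

// Hypothetical parsed form of a repo's .codegraphrc.json.
interface RepoConfig {
  ignoreAdditionalDirs?: string[];
}

// Union of the global defaults and the repo's own additions.
function resolveIgnoreDirs(config: RepoConfig): Set<string> {
  return new Set<string>([...GLOBAL_IGNORE_DIRS, ...(config.ignoreAdditionalDirs ?? [])]);
}

// codegraph's own repo opts in to ignoring crates/ ...
const codegraphIgnore = resolveIgnoreDirs({ ignoreAdditionalDirs: ['crates'] });
// ... while a repo without the key keeps crates/ in scope.
const otherRepoIgnore = resolveIgnoreDirs({});

console.log(codegraphIgnore.has('crates'), otherRepoIgnore.has('crates'));
```

Under this sketch the carve-out lives in the repo that needs it, and a project that uses crates/ for non-Rust content is unaffected.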

github-actions · 2026-06-20T20:03:38Z

Codegraph Impact Analysis

15 functions changed → 115 callers affected across 70 files

fetch_quality_metrics in crates/codegraph-core/src/db/repository/graph_read.rs:521 (1 transitive callers)
getDatabase in src/db/better-sqlite3.ts:16 (108 transitive callers)
computeQualityMetrics in src/domain/analysis/module-map.ts:186 (3 transitive callers)
buildStatsFromNative in src/domain/analysis/module-map.ts:418 (3 transitive callers)
applyEdgeTechniquesAfterNativeInsert in src/domain/graph/builder/stages/build-edges.ts:1630 (4 transitive callers)
buildEdges in src/domain/graph/builder/stages/build-edges.ts:1825 (4 transitive callers)
backfillEdgeTechniquesAfterNativeOrchestrator in src/domain/graph/builder/stages/native-orchestrator.ts:1788 (3 transitive callers)
McpToolContext.dbPath in src/mcp/types.ts:10 (0 transitive callers)
McpToolContext.getQueries in src/mcp/types.ts:11 (0 transitive callers)
McpToolContext.getDatabase in src/mcp/types.ts:12 (0 transitive callers)
McpToolContext.findDbPath in src/mcp/types.ts:13 (0 transitive callers)
McpToolContext.allowedRepos in src/mcp/types.ts:14 (0 transitive callers)
McpToolContext.MCP_MAX_LIMIT in src/mcp/types.ts:15 (0 transitive callers)
McpToolHandler.name in src/mcp/types.ts:19 (0 transitive callers)
McpToolHandler.handler in src/mcp/types.ts:20 (0 transitive callers)

The constant was duplicated across build-edges.ts and native-orchestrator.ts at the same value. Three insertion paths (in-memory lift, applyEdgeTechniquesAfterNativeInsert, backfillEdgeTechniquesAfterNativeOrchestrator) must apply the same floor; having separate definitions risked silent divergence on future threshold adjustments.

Also adds a non-null assertion to getDatabase() so TypeScript does not infer a possibly-undefined return at the call site, and fixes edgesByKind bracket access to dot notation to satisfy the Biome useLiteralKeys rule.

Impact: 2 functions changed, 110 affected

carlos-alm · 2026-06-20T22:40:06Z

@greptileai

carlos-alm added 4 commits June 20, 2026 13:21

greptile-apps Bot reviewed Jun 20, 2026


carlos-alm mentioned this pull request Jun 20, 2026

feat: make IGNORE_DIRS configurable per-repo via .codegraphrc.json #1649

Open

carlos-alm wants to merge 5 commits into main from fix/issue-1623
