Add Neo4j-backed TypeScript analysis backend and formalize backend contracts#159

Open
rahlk wants to merge 6 commits into main from feat/neo4j-backend-support

@rahlk rahlk commented Jun 20, 2026

Collaborator

Summary

Adds a Neo4j-backed analysis backend for TypeScript and formalizes the façade↔backend contract across all three languages so additional backends can be dropped in cleanly.

The latest codeanalyzer-typescript can emit its analysis graph to Neo4j (--emit neo4j, schema schema.neo4j.json). This PR adds an SDK backend that answers the exact same get_* query surface as the in-memory backend, but by running Cypher over that graph instead of walking the pydantic / NetworkX structures.

What's included

Neo4j TypeScript backend (cldk/analysis/typescript/neo4j/)

  • TSNeo4jBackend — a drop-in alternative to TSCodeanalyzer implementing the full query surface (call graph, callers/callees, class hierarchy, call sites, decorators, symbol/method/field lookups, imports/exports/variables, …) via Cypher.
  • It can populate the database for you over Bolt (runs the analyzer with --emit neo4j --neo4j-uri), or query a DB that is already loaded (build_db=False).
  • Results are re-hydrated into the same cldk.models.typescript pydantic objects. Fields the projection inherently flattens (collapsed comments/type-params, aggregated import edges, the three round-tripped CALLS tag keys) are reconstructed best-effort and documented inline.
  • Selected through the façade with a single optional parameter:
from cldk import CLDK
from cldk.analysis import AnalysisLevel
from cldk.analysis.typescript.neo4j import Neo4jConnectionConfig

analysis = CLDK(language="typescript").analysis(
    project_path="…",
    analysis_level=AnalysisLevel.call_graph,
    neo4j_config=Neo4jConnectionConfig(uri="bolt://localhost:7687", username="neo4j", password="…"),
)

No neo4j_config → the existing in-memory backend, unchanged.
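The selection rule above can be sketched as follows. This is a minimal illustration, not the façade's actual code: `select_backend`, `InMemoryBackend`, and `Neo4jBackend` are hypothetical stand-ins, and the local `Neo4jConnectionConfig` only mirrors the three fields shown in the snippet above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Neo4jConnectionConfig:
    """Stand-in mirroring the fields used in the PR's example (assumption)."""
    uri: str
    username: str
    password: str


class InMemoryBackend:
    """Hypothetical stand-in for the existing in-memory backend."""
    name = "in-memory"


class Neo4jBackend:
    """Hypothetical stand-in for the Neo4j-backed backend."""
    name = "neo4j"

    def __init__(self, config: Neo4jConnectionConfig) -> None:
        self.config = config


def select_backend(neo4j_config: Optional[Neo4jConnectionConfig] = None):
    """Pick a backend from the single optional parameter (hypothetical helper)."""
    if neo4j_config is None:
        # No config supplied: keep the default, in-memory behavior.
        return InMemoryBackend()
    return Neo4jBackend(neo4j_config)
```

Keeping the choice behind one optional argument means existing callers, which never pass `neo4j_config`, see no change.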

Shared backend contracts (ABCs)

  • TSAnalysisBackend, JavaAnalysisBackend, PythonAnalysisBackend — abstract bases declaring each façade's backend surface. The existing backends (TSCodeanalyzer, JCodeanalyzer, PyCodeanalyzer) and the new TSNeo4jBackend subclass them, so the relationship is enforced by the type system and at instantiation time rather than only by convention. This sets up equivalent Neo4j backends for Java/Python later.
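As a rough illustration of what "enforced at instantiation time" means, the sketch below uses Python's `abc` module on a reduced, hypothetical two-method surface. `AnalysisBackend`, the method names, and the helper functions are assumptions made for the example; the real ABCs declare far more methods.

```python
import inspect
from abc import ABC, abstractmethod
from typing import List, Optional


class AnalysisBackend(ABC):
    """Hypothetical, reduced stand-in for a backend contract."""

    @abstractmethod
    def get_callers(self, qualified_name: str) -> List[str]: ...

    @abstractmethod
    def get_callees(self, qualified_name: str) -> List[str]: ...


class CompleteBackend(AnalysisBackend):
    def get_callers(self, qualified_name: str) -> List[str]:
        return []

    def get_callees(self, qualified_name: str) -> List[str]:
        return []


class IncompleteBackend(AnalysisBackend):
    # get_callees is missing, so this class cannot be instantiated.
    def get_callers(self, qualified_name: str) -> List[str]:
        return []


class DriftedBackend(CompleteBackend):
    # Implements everything, but renames a parameter.
    def get_callers(self, name: str) -> List[str]:
        return []


def instantiation_error(cls) -> Optional[str]:
    """Return the TypeError message raised on instantiation, or None."""
    try:
        cls()
    except TypeError as exc:
        return str(exc)
    return None


def signature_mismatches(abc_cls, impl_cls) -> List[str]:
    """Names of abstract methods whose signature differs in impl_cls."""
    return sorted(
        name
        for name in abc_cls.__abstractmethods__
        if inspect.signature(getattr(abc_cls, name))
        != inspect.signature(getattr(impl_cls, name))
    )
```

The ABC alone catches a missing method, but not a renamed parameter; a signature comparison like the one in `signature_mismatches` is one way a contract test could cover that gap.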

Misc

  • Optional neo4j extra (pip install cldk[neo4j]) for the driver.
  • Regression test ensuring use_ray is forwarded through the Python façade (parity with the existing use_codeql / cache_dir guards).
  • Tests: live-DB integration tests for the Neo4j backend (auto-skip when no Neo4j is reachable) mirroring the in-memory backend's sample-app expectations; no-DB backend-selection and contract tests for all three languages.
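One plausible way to implement the auto-skip is a cheap TCP probe of the Bolt port before any live-DB test runs. The helper below is a sketch under that assumption; `neo4j_reachable` is a hypothetical name, and the PR's tests may detect availability differently (for example through the driver itself).

```python
import socket
from urllib.parse import urlparse


def neo4j_reachable(uri: str = "bolt://localhost:7687", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds.

    This only checks that something is listening; it does not authenticate
    or confirm that the listener is actually Neo4j.
    """
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 7687  # default Bolt port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# With pytest, a test module could then opt out in one line, e.g.:
#   pytestmark = pytest.mark.skipif(not neo4j_reachable(), reason="no Neo4j")
```

A probe like this keeps the suite green on machines without a database while still running the integration tests wherever one is available.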

Notes

  • The in-memory and Neo4j TypeScript backends were validated method-by-method against the same sample project for identical results.
  • The Neo4j emission requires codeanalyzer-typescript 0.4.0+; the SDK pydantic models may need a follow-up bump to accept its newer entrypoints field for the in-memory path.

Generated by Claude Code

rahlk and others added 6 commits June 18, 2026 08:39
Signed-off-by: Rahul Krishna <rkrsn@ibm.com>
…analyzer-ts graph)

Add TSNeo4jBackend, a drop-in alternative to the in-memory TSCodeanalyzer that
answers the exact same get_* query surface (call graph, callers/callees, class
hierarchy, call sites, decorators, symbol/method/field lookups, imports/exports/
variables, ...) by running Cypher over a live Neo4j graph instead of walking the
pydantic / NetworkX structures.

The graph is the one codeanalyzer-typescript emits with `--emit neo4j` (schema
schema.neo4j.json). The backend can populate the database for you over Bolt
(running the analyzer with --emit neo4j --neo4j-uri), or query a DB that is
already loaded (build_db=False). Results are re-hydrated into the same
cldk.models.typescript pydantic objects the in-memory backend returns; lossy
fields inherent to the projection (collapsed comments/type-params, aggregated
import edges, the three round-tripped CALLS tag keys) are reconstructed
best-effort and documented inline.

- cldk/analysis/typescript/neo4j/: backend, model reconstruction, Neo4jConnectionConfig.
- TypeScriptAnalysis / CLDK.analysis: optional neo4j_config selects the backend;
  default behavior unchanged.
- pyproject: optional `neo4j` extra for the driver.
- tests: live-DB integration tests (skipped when no Neo4j reachable) mirroring the
  in-memory backend's sample-app expectations, plus no-DB backend-selection unit tests.

Extract TSAnalysisBackend (cldk/analysis/typescript/backend.py), an abstract base
declaring the full 40-method query surface the TypeScriptAnalysis facade delegates
to. Both backends now implement it:
  - TSCodeanalyzer (in-memory pydantic / NetworkX)
  - TSNeo4jBackend (Cypher over Neo4j)

The facade<->backend relationship is now enforced by the type system and at
instantiation time, instead of matching only by convention. Facade `backend` is
typed against the ABC. Added a contract test asserting both backends subclass it,
fully implement it, and preserve every method signature.

Mirror the TypeScript TSAnalysisBackend pattern for Java and Python, in
anticipation of Neo4j/Cypher backends for those languages too:

  - cldk/analysis/java/backend.py:   JavaAnalysisBackend (36-method surface);
    JCodeanalyzer now subclasses it.
  - cldk/analysis/python/backend.py: PythonAnalysisBackend (21-method surface);
    PyCodeanalyzer now subclasses it.

Both facades type their `backend` attribute against the ABC, so a future
alternative backend can be selected without touching the facade. Added contract
tests for each (subclass, abstract/not-instantiable, fully-implemented, and that
the ABC covers every method the facade delegates to).

use_ray is already lifted all the way up (CLDK.analysis → PythonAnalysis →
PyCodeanalyzer → AnalysisOptions.using_ray), but unlike use_codeql and cache_dir
it had no regression test. Add test_use_ray_forwarded_through_facade mirroring the
existing use_codeql guard, so the facade can't silently drop the flag.
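
The shape of such a forwarding guard can be sketched with stand-in classes. Everything below is a simplified assumption: the classes only mimic the chain named in the commit message, the `analysis` function stands in for CLDK.analysis, and the real test likely patches or inspects the actual classes instead.

```python
from dataclasses import dataclass


@dataclass
class AnalysisOptions:
    """Stand-in for the options object at the bottom of the chain."""
    using_ray: bool = False


class PyCodeanalyzer:
    """Stand-in backend: turns the flag into analyzer options."""

    def __init__(self, use_ray: bool = False) -> None:
        self.options = AnalysisOptions(using_ray=use_ray)


class PythonAnalysis:
    """Stand-in facade: must pass use_ray on to its backend."""

    def __init__(self, use_ray: bool = False) -> None:
        self.backend = PyCodeanalyzer(use_ray=use_ray)


def analysis(use_ray: bool = False) -> PythonAnalysis:
    """Stand-in for the top-level entry point."""
    return PythonAnalysis(use_ray=use_ray)


def test_use_ray_forwarded_through_facade() -> None:
    # The flag set at the top must arrive unchanged at the bottom.
    assert analysis(use_ray=True).backend.options.using_ray is True
    # And the default must stay off.
    assert analysis().backend.options.using_ray is False
```

If any layer stopped passing `use_ray` along, the first assertion would fail, which is the silent drop the commit guards against.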