feat(bridges): LLM typed-entity Atlas extractor (replace keyword junk) by wolverin0 · Pull Request #166 · wolverin0/memorymaster

wolverin0 · 2026-06-23T01:43:46Z

Replaces the deterministic Atlas claim extractor, which produced 5,544 misclassified subject-line wrappers on the live VM (subject="whatsapp_contact", GitHub bot emails labeled 'commitment'). That noise is why the life-memory never worked (mm-d993).

What

memorymaster/bridges/atlas_llm_extractor.py reads evidence bodies (+sender+date) via call_llm and emits 0..N typed claims (person/company/project/commitment/decision/preference/fact/event) — or nothing for newsletters/bot-notifications. Spec: .planning/ATLAS-LLM-EXTRACTOR-SPEC.md.

- Strict-JSON parse + graceful skip (empty/malformed/raising LLM → skip+degraded, never a junk fallback)
- Validation rejects bare source-name subjects, the exact whatsapp_contact junk subject, any *_contact, and generic placeholders → the misclassification cannot recur (direct regression test)
- Every claim routes through service.ingest (sensitivity filter applies, tested)
- Provider-aware citations (the hardcoded whatsapp:// bug fixed)
- extract-atlas-claims --extractor {llm,deterministic} (default llm); deterministic path preserved → LifeAgent bridge contract intact
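The parse-and-validate behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in atlas_llm_extractor.py: `parse_claims`, `is_valid_subject`, `extract`, the deny-lists, and the claim field names `type` and `subject` are all hypothetical, and `call_llm` is passed in as a plain callable because its real signature is not shown in this PR.

```python
import json

# Type vocabulary copied from the PR description.
ALLOWED_TYPES = {"person", "company", "project", "commitment",
                 "decision", "preference", "fact", "event"}
# Hypothetical deny-lists; the real ones are defined by the extractor and its spec.
SOURCE_NAMES = {"whatsapp", "gmail", "outlook", "gcal", "gdrive", "atlas"}
PLACEHOLDERS = {"unknown", "n/a", "none", "subject", "contact"}


def parse_claims(raw):
    """Return (claims, degraded). Only a JSON array is accepted; anything else is skipped."""
    if not isinstance(raw, str) or not raw.strip():
        return [], True
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return [], True
    if not isinstance(data, list):
        return [], True
    # Non-object array items are dropped rather than repaired.
    return [c for c in data if isinstance(c, dict)], False


def is_valid_subject(subject):
    """Reject empty, bare source-name, placeholder, and *_contact subjects."""
    s = str(subject or "").strip().lower()
    if not s or s in SOURCE_NAMES or s in PLACEHOLDERS:
        return False
    return not s.endswith("_contact")


def extract(call_llm, prompt):
    """Call the LLM and return (valid_claims, degraded); never emit a fallback claim."""
    try:
        raw = call_llm(prompt)
    except Exception:
        return [], True  # a raising LLM degrades to "no claims"
    claims, degraded = parse_claims(raw)
    kept = [c for c in claims
            if isinstance(c.get("type"), str)
            and c["type"] in ALLOWED_TYPES
            and is_valid_subject(c.get("subject"))]
    return kept, degraded
```

Note that an empty array (`[]`) is treated as a legitimate zero-claim answer, which is what a newsletter or bot notification should produce, while empty or malformed output is counted as degraded.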

Verification

Hermetic tests (mocked LLM) for all spec §5 cases + the validation guard. Broad atlas/extract/cli sweep: 212 passed. Security review: degrades_gracefully + sensitivity_routed confirmed; both blocking findings fixed before commit.

Follow-up (private, not in this PR)

Run the new extractor over the real VM evidence + merge the good typed claims into the brain.

The deterministic atlas_claim_extractor produced misclassified subject-line wrappers: on the live VM, 5,544 claims like "Atlas commitment evidence from vercel[bot]: Re:[repo]" with subject="whatsapp_contact" and whatsapp:// cites even for email (mm-d993). That noise is why the life-memory never worked.

New memorymaster/bridges/atlas_llm_extractor.py reads evidence BODIES (+sender +date) via call_llm and emits 0..N TYPED claims (person/company/project/commitment/decision/preference/fact/event), or NOTHING for newsletters/bot notifications. Per spec .planning/ATLAS-LLM-EXTRACTOR-SPEC.md:
- strict-JSON-array parse with graceful skip (empty/malformed/raising LLM -> skip + degraded counter, never a fallback junk claim)
- validation REJECTS bare source-name subjects, the exact "whatsapp_contact" junk subject, any *_contact, and generic placeholders -> the misclassification cannot recur (direct regression test)
- routes every claim through service.ingest (sensitivity filter applies)
- provider-aware citations (gmail://, outlook://, gcal://, gdrive://, whatsapp://, atlas://)
- stable idempotency key; dry_run support

CLI: extract-atlas-claims gains --extractor {llm,deterministic} (default llm), --model, --dry-run. Deterministic path preserved unchanged; the LifeAgent bridge contract is intact (it just gets the good extractor now). Pinned the deterministic contract test to --extractor deterministic.

Hermetic tests (mocked LLM): typed extraction, noise->0-claims regression, graceful degrade, sensitivity routing, idempotency, provider citation, dry_run, CLI dispatch, and the validation guard.

Security review: degrades_gracefully + sensitivity_routed confirmed; the two blocking findings (whatsapp_contact gap + default-flip test regression) fixed before commit.
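For readers unfamiliar with the two smaller items in that commit message, provider-aware citations and a stable idempotency key, here is one way they could look. This is a sketch under assumptions: the URI schemes come from the commit message, but the provider names, the atlas:// fallback, the helper names `citation_for` and `idempotency_key`, and the fields hashed into the key are all hypothetical.

```python
import hashlib

# Schemes are the ones listed in the commit message; the provider keys are assumed.
SCHEMES = {
    "gmail": "gmail://",
    "outlook": "outlook://",
    "gcal": "gcal://",
    "gdrive": "gdrive://",
    "whatsapp": "whatsapp://",
}


def citation_for(provider, evidence_id):
    """Build a citation from the evidence's own provider instead of a hardcoded scheme."""
    # Assumption: unknown providers fall back to the generic atlas:// scheme.
    scheme = SCHEMES.get(str(provider).strip().lower(), "atlas://")
    return f"{scheme}{evidence_id}"


def idempotency_key(evidence_id, claim_type, subject, text):
    """Derive a key that stays the same when the same claim is extracted again."""
    parts = [
        str(evidence_id),
        str(claim_type),
        str(subject).strip().lower(),
        " ".join(str(text).split()),  # collapse whitespace so reformatting does not change the key
    ]
    # The unit separator keeps field boundaries unambiguous before hashing.
    payload = "\x1f".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key of this kind, re-running the extractor over the same evidence can be detected as a duplicate ingest rather than creating a second claim.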

…r deterministic

Flipping the extract-atlas-claims default to --extractor llm routed three contract tests in test_atlas_contract.py through the LLM path. They assert the deterministic keyword-matcher's output (matched>=1), so with no LLM configured (CI) they got 0 matched and failed.

My local full-suite run masked this: a real LLM provider is configured locally, so the llm path returned matches.

Pin all three to --extractor deterministic (they test the deterministic contract, which is preserved). Verified by running the atlas tests with all LLM env vars unset (CI mirror): 92 passed.
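The "CI mirror" check described above, running with all LLM env vars unset, can be reproduced locally with a small helper along these lines. It is only a sketch: the variable prefixes are guesses, since the PR does not name the variables memorymaster actually reads, and `llm_env_unset` is a hypothetical helper rather than something from the repository.

```python
import os
from contextlib import contextmanager
from unittest.mock import patch

# Hypothetical prefixes; substitute whatever variables the project really reads.
LLM_ENV_PREFIXES = ("OPENAI_", "ANTHROPIC_", "GEMINI_", "OLLAMA_")


@contextmanager
def llm_env_unset(prefixes=LLM_ENV_PREFIXES):
    """Temporarily remove LLM-related env vars so tests see a CI-like environment."""
    # patch.dict snapshots os.environ and restores it when the block exits.
    with patch.dict(os.environ):
        for name in [n for n in os.environ if n.startswith(prefixes)]:
            del os.environ[name]
        yield
```

Wrapping the contract tests in such a block makes a locally configured provider invisible, so a test that silently depends on the llm path fails on the developer machine the same way it would in CI.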

…+ document v4.0.0→ work) (#167)

* release: v4.1.0 — local-filesystem (Everything) bridge + LLM Atlas extractor

Surfaces the previously-merged-but-undocumented work since v4.0.0:
- #161 local-filesystem search bridge (resolve_project, Everything ES.exe, path redaction)
- #166 LLM typed-entity Atlas extractor
- #162 detect-first installer; #165 steward tiers-every-cycle; #163 CI conftest; #164 IP scrub

README Key features + CHANGELOG now document the Everything integration.

* docs(release): frame local-search honestly — optional, Windows/Codex-focused, graceful-degrade

Re-verified the Everything bridge works end-to-end (es.exe 1.1.0.27): resolve-project finds real projects with explainable scoring, memory-first cache hits on repeat, paths redacted. But the value is narrow (path-resolution for weak-file-search agents + cross-session recall), not a headline; speed is sub-second either way. Docs now say so.

wolverin0 added 2 commits June 22, 2026 22:41

wolverin0 merged commit c6d3177 into main Jun 23, 2026
9 checks passed

wolverin0 deleted the feat/atlas-llm-extractor branch June 23, 2026 02:33

wolverin0 mentioned this pull request Jun 24, 2026

release: v4.1.0 — local-search bridge + LLM Atlas extractor (surface + document v4.0.0→ work) #167

Merged

wolverin0 merged 2 commits into main from feat/atlas-llm-extractor
