fix(sec): SPAC writer atomicity, redemption LLM input cap, partial-success extractor outcome by sroussey · Pull Request #170 · workglow-dev/sec

sroussey · 2026-06-27T08:53:20Z

Summary

Three correctness/security fixes stacked as separate commits.

1. SPAC writer atomicity + monotonic history chain

SpacReportWriter.snapshot() derived valid_from from wall-clock next.updated_at and only de-collided against the currently-open history row, so clock skew or a stale-replay could invert the chain or back-date a history snapshot. rebuild() and snapshot() also read-modify-write without any lock, so two concurrent writers on the same CIK could leave two valid_to == null rows.

Anchor valid_from to the filing data (filingDate for non-stale writes, the existing row's as_of for stale replays) with strict monotonicity enforced against the max of all prior closed/open valid_to values.
New withSpacCikLock wraps the rebuild critical section — SQLite BEGIN IMMEDIATE, Postgres pg_advisory_xact_lock keyed on CIK, in-memory keyed mutex fallback. Backend dispatch checks the active repo class so in-memory test backends never reach getDb().
New SpacWriteLock.test.ts asserts exactly one open history row after 3 parallel writers on the same CIK.
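The in-memory fallback described above can be pictured as a promise chain per key. This is a minimal sketch under assumptions, not the PR's code: `withKeyedLock`, `writer`, the `HistoryRow` shape, and the sample CIK are hypothetical names used only for illustration.

```typescript
// Hypothetical keyed mutex: async work for the same key runs one at a time.
const lockTails = new Map<string, Promise<void>>();

async function withKeyedLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  // Whatever is already queued for this key (or nothing at all).
  const previous = lockTails.get(key) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => (release = resolve));
  // Publish our tail before awaiting, so later callers queue behind us.
  const tail = previous.then(() => current);
  lockTails.set(key, tail);
  await previous;
  try {
    return await fn();
  } finally {
    release();
    // Drop the entry if nobody queued behind us, so the map does not grow.
    if (lockTails.get(key) === tail) lockTails.delete(key);
  }
}

// Illustrative history rows: exactly one row should have valid_to === null.
type HistoryRow = { valid_to: string | null };
const historyRows: HistoryRow[] = [];

// A read-modify-write writer with simulated I/O between the read and the write.
async function writer(stamp: string): Promise<void> {
  await withKeyedLock("0001234567", async () => {
    const open = historyRows.filter((row) => row.valid_to === null);
    await new Promise((resolve) => setTimeout(resolve, 5));
    for (const row of open) row.valid_to = stamp; // close the open row(s)
    historyRows.push({ valid_to: null });          // open the new snapshot
  });
}
```

Without the lock, three parallel `writer` calls would each read an empty open set and each push an open row. With it, each writer sees the row left open by its predecessor and closes it first.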

2. Cap redemption AI input bytes

processRedemption8K joined the primary doc + every EX-99 exhibit markdown unconditionally into runStructured, with MAX_TOKENS=4096 bounding only the model's completion. A multi-megabyte EX-99 ran up token bills and widened the prompt-injection surface proportional to filing size.

Per-exhibit cap (200k chars) + total cap (400k chars). Oversized exhibits are dropped (not truncated — partial spans break source-span verification).
Full-drop records an OVERSIZED_INPUT dead-letter without invoking the model.
Partial-drop records an informational <section>-partial-oversized dead-letter so operators can triage filings whose largest exhibit was skipped.
Bump redemption extractor 1.0.0 → 1.1.0 (prompt shape changed → confidence calibration drifts → fresh dev cycle, matching the S-1/424 precedent in PR #165, "fix: six HIGH-priority hardening fixes (prompt-injection seal + 8-K storage + XML entity expansion)").
Add OVERSIZED_INPUT to DEAD_LETTER_REASON_CODES.
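The capping rules above can be sketched as a pure function. This is an illustration under assumptions, not the PR's implementation. The 200k and 400k limits come from the description. The `Doc` shape, the function name, the outcome labels, and applying the total cap greedily in document order are assumptions.

```typescript
const MAX_EXHIBIT_CHARS = 200_000; // per-exhibit cap from the PR description
const MAX_TOTAL_CHARS = 400_000;   // total cap from the PR description

interface Doc {
  name: string;
  markdown: string;
}

interface CappedInput {
  kept: Doc[];
  dropped: string[];
  // "ok": nothing dropped; "partial": some dropped; "oversized": everything dropped.
  outcome: "ok" | "partial" | "oversized";
}

function capRedemptionInput(docs: Doc[]): CappedInput {
  const kept: Doc[] = [];
  const dropped: string[] = [];
  let total = 0;
  for (const doc of docs) {
    const size = doc.markdown.length;
    // Drop the whole document rather than truncating it: a cut-off text
    // would break source-span verification downstream.
    if (size > MAX_EXHIBIT_CHARS || total + size > MAX_TOTAL_CHARS) {
      dropped.push(doc.name);
      continue;
    }
    kept.push(doc);
    total += size;
  }
  let outcome: CappedInput["outcome"] = "ok";
  if (dropped.length > 0) outcome = kept.length === 0 ? "oversized" : "partial";
  return { kept, dropped, outcome };
}
```

In this sketch a caller would skip the model call and record the `OVERSIZED_INPUT` dead-letter on `"oversized"`, and record the informational partial dead-letter on `"partial"`.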

3. Partial-success outcome on `extractor_runs`

makeRunSection catches MODEL_INVALID_OUTPUT / LOW_CONFIDENCE_ALL / UNVERIFIED_SOURCE_SPAN, writes a dead-letter, and returns without throwing. ProcessAccessionDocFormTask then recorded a success row even when every section dead-lettered, so sec version coverage counted those as covered and drop-previous purged the dead-letter rows operators needed for triage.

Add tri-state outcome column (success / partial / failure) to extractor_runs. success boolean kept as outcome === "success" for back-compat.
After successful parse+store, ProcessAccessionDocFormTask queries pending section-level dead-letters for the filing and writes outcome = "partial" when any exist.
countSuccessfulAtVersion and listFilingsWithoutSuccessfulRun count only outcome = "success"; partial rows stay eligible for retry-dead-letters.
Legacy rows backfill outcome from the existing success boolean (partial breakdown is unknowable for them); SQLite setupAllDatabases gets a one-shot ADD COLUMN migration for pre-existing databases.
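The tri-state logic above can be sketched as follows. It is a hedged illustration: the row shape and the helper names (`deriveOutcome`, `makeRunRow`, `backfillOutcome`, `countSuccessful`) are hypothetical and are not the repository's functions.

```typescript
type ExtractorOutcome = "success" | "partial" | "failure";

interface ExtractorRunRow {
  accession: string;
  version: string;
  outcome: ExtractorOutcome;
  success: boolean; // kept for back-compat: always outcome === "success"
}

// Parse+store failed -> failure; stored but with pending section-level
// dead-letters -> partial; otherwise success.
function deriveOutcome(parsedAndStored: boolean, pendingDeadLetters: number): ExtractorOutcome {
  if (!parsedAndStored) return "failure";
  return pendingDeadLetters > 0 ? "partial" : "success";
}

function makeRunRow(
  accession: string,
  version: string,
  parsedAndStored: boolean,
  pendingDeadLetters: number,
): ExtractorRunRow {
  const outcome = deriveOutcome(parsedAndStored, pendingDeadLetters);
  return { accession, version, outcome, success: outcome === "success" };
}

// Legacy rows only carry the boolean, so "partial" cannot be recovered.
function backfillOutcome(legacySuccess: boolean): ExtractorOutcome {
  return legacySuccess ? "success" : "failure";
}

// Coverage counts only full successes; partial rows stay eligible for retry.
function countSuccessful(rows: ExtractorRunRow[], version: string): number {
  return rows.filter((row) => row.version === version && row.outcome === "success").length;
}
```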

PR #169 merge-order note

withSpacCikLock does NOT yet handle a nested transaction held by the caller. On main today, recomputeAndSaveDeals issues no inner BEGIN, so the SQLite BEGIN IMMEDIATE here is the only transaction in the rebuild stack. PR #169 (SpacDealReplace.ts) introduces its own transaction in recomputeSpacDeals.

If this PR lands first, PR #169 should rebase its transaction to either skip BEGIN when an outer lock holds it, or detect the active transaction and use SAVEPOINT instead. If PR #169 lands first, this PR's SQLite path may need a similar guard.
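One way to picture the SAVEPOINT option is a small scope helper that tracks nesting depth: the outermost call issues `BEGIN IMMEDIATE`, and nested calls use savepoints. This is only a sketch under assumptions. `SqlExecutor` and `TxScope` are hypothetical, the executor is synchronous for brevity, and neither PR is known to use this shape.

```typescript
// Hypothetical minimal executor; a real driver would run the SQL.
interface SqlExecutor {
  exec(sql: string): void;
}

class TxScope {
  private depth = 0;
  private readonly db: SqlExecutor;

  constructor(db: SqlExecutor) {
    this.db = db;
  }

  run<T>(fn: () => T): T {
    const outermost = this.depth === 0;
    const savepoint = `sp_${this.depth}`;
    // Only the outermost scope opens a real transaction.
    this.db.exec(outermost ? "BEGIN IMMEDIATE" : `SAVEPOINT ${savepoint}`);
    this.depth++;
    let result: T;
    try {
      result = fn();
    } catch (err) {
      this.depth--;
      if (outermost) {
        this.db.exec("ROLLBACK");
      } else {
        // Undo the nested work, then pop the savepoint off the stack.
        this.db.exec(`ROLLBACK TO ${savepoint}`);
        this.db.exec(`RELEASE ${savepoint}`);
      }
      throw err;
    }
    this.depth--;
    this.db.exec(outermost ? "COMMIT" : `RELEASE ${savepoint}`);
    return result;
  }
}

// Recording executor used to show which statements would be issued.
class RecordingExecutor implements SqlExecutor {
  readonly statements: string[] = [];
  exec(sql: string): void {
    this.statements.push(sql);
  }
}
```

With this shape, an inner transaction nested inside the lock's outer one would never issue a second `BEGIN`, which SQLite rejects while a transaction is already active.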

Test plan

bun run build — clean
bun test src/storage/spac/ — all pass
bun test src/sec/forms/miscellaneous-filings/ — all pass (incl. new oversized tests)
bun test src/task/forms/ — all pass
bun test src/storage/versioning/ — all pass (incl. new partial-outcome tests)
bun test src/cli/queries/ — VersionCoverage tests still pass
Full bun test: 1386 pass; the remaining failures are pre-existing FetchDailyIndexTask / FetchQuarterlyIndexTask network timeouts, unrelated to this PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Generated by Claude Code

SpacReportWriter.snapshot() derived valid_from from wall-clock next.updated_at and only de-collided against the currently-open history row, so clock skew or a stale replay could invert the chain or back-date a history snapshot. rebuild() and snapshot() also read-modify-write without any lock, so two concurrent writers on the same CIK could leave two valid_to == null rows.

Anchor valid_from to the data: filingDate for non-stale writes, the existing row's as_of for stale replays, with strict monotonicity enforced against the max of all prior closed/open valid_to values. Wrap the rebuild critical section in withSpacCikLock: SQLite BEGIN IMMEDIATE, Postgres pg_advisory_xact_lock keyed on CIK, in-memory keyed mutex fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01V3e3m8cMRy5stFhDzGmZrF
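The anchoring rule in this commit can be sketched as two small functions. This is a hedged illustration, not the writer's code: the function names and row shape are hypothetical, timestamps are assumed to be ISO-8601 strings, and bumping by one millisecond to keep the chain strictly increasing is an assumption. Including each prior row's valid_from in the floor is also an assumption, made so an open row (valid_to null) still constrains the next snapshot.

```typescript
interface SpacHistoryRow {
  valid_from: string;      // ISO-8601 timestamp
  valid_to: string | null; // null marks the currently open row
}

// Non-stale writes anchor to the filing date; stale replays reuse the
// existing row's as_of instead of the wall clock.
function pickAnchor(isStale: boolean, filingDate: string, existingAsOf: string): string {
  return isStale ? existingAsOf : filingDate;
}

// Return a valid_from that is strictly later than every prior boundary.
function nextValidFrom(anchor: string, history: SpacHistoryRow[]): string {
  let floor = -Infinity;
  for (const row of history) {
    floor = Math.max(floor, Date.parse(row.valid_from));
    if (row.valid_to !== null) floor = Math.max(floor, Date.parse(row.valid_to));
  }
  const anchorMs = Date.parse(anchor);
  // Keep the data-derived anchor when it already respects the chain;
  // otherwise step just past the latest prior boundary.
  const chosen = anchorMs > floor ? anchorMs : floor + 1;
  return new Date(chosen).toISOString();
}
```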

processRedemption8K joined the primary doc + every EX-99 exhibit markdown unconditionally into runStructured, with MAX_TOKENS=4096 bounding only the model's completion. A multi-megabyte EX-99 ran up token bills and widened the prompt-injection surface in proportion to filing size.

Cap per-exhibit at 200k chars and total at 400k chars; oversized exhibits are dropped (not truncated, since a partial span breaks source-span verification). Full-drop records an OVERSIZED_INPUT dead-letter without invoking the model; partial-drop records an additional informational <section>-partial-oversized dead-letter so operators can triage filings whose largest exhibit was skipped.

Bump redemption extractor version 1.0.0 -> 1.1.0: the model now sees a different prompt shape, so confidence calibration drifts; treat as a fresh dev cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01V3e3m8cMRy5stFhDzGmZrF

makeRunSection catches MODEL_INVALID_OUTPUT / LOW_CONFIDENCE_ALL / UNVERIFIED_SOURCE_SPAN, writes a dead-letter, and returns without throwing. ProcessAccessionDocFormTask then recorded a success extractor_run row even when every section dead-lettered, so sec version coverage counted them as covered and drop-previous purged the dead-letter rows operators needed for triage.

Add a three-state outcome column (success / partial / failure) to extractor_runs. ProcessAccessionDocFormTask now queries the pending section-level dead-letters for the filing it just stored and writes outcome = partial when any exist. countSuccessfulAtVersion and listFilingsWithoutSuccessfulRun count only outcome = success; partial rows stay eligible for retry-dead-letters. Legacy rows backfill outcome from the existing success boolean (partial breakdown is unknowable for them); SQLite gets a one-shot ADD COLUMN migration in setupAllDatabases for pre-existing databases.

Also tightens SpacWriteLock's backend dispatch to test the dealRepository class rather than the SEC_DB_TYPE token alone: tests register the token as sqlite while binding in-memory storages, so the env-only check spuriously opened a stray SQLite file via getDb().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01V3e3m8cMRy5stFhDzGmZrF
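The dispatch change in this commit (choose the lock backend from the bound repository class, not from the configured database type) might look roughly like this. The three repository classes and `pickLockBackend` are stand-ins invented for illustration; the real class names are not shown in this PR.

```typescript
// Stand-in repository classes; the real ones live in the storage layer.
class InMemoryDealRepository {
  readonly kind = "memory";
}
class SqliteDealRepository {
  readonly kind = "sqlite";
}
class PostgresDealRepository {
  readonly kind = "postgres";
}

type LockBackend = "memory" | "sqlite" | "postgres";

// Decide from the repository instance that is actually bound. The configured
// DB type is deliberately not consulted: tests may set it to "sqlite" while
// binding in-memory storage.
function pickLockBackend(dealRepository: object): LockBackend {
  if (dealRepository instanceof SqliteDealRepository) return "sqlite";
  if (dealRepository instanceof PostgresDealRepository) return "postgres";
  return "memory"; // in-memory and unknown repositories never touch getDb()
}
```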

claude added 3 commits June 27, 2026 08:27

This was referenced Jun 28, 2026

fix(sec): close defang bypass + thread pool client through recomputeSpacDeals #172 (Open)

fix(sec): gate SQLite SPAC lock through in-process mutex + auto-resolve oversized dead-letter #173 (Open)

sroussey wants to merge 3 commits into main from claude/wonderful-hypatia-j1anwi
