Skip to content

improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs#5012

Open
TheodoreSpeaks wants to merge 7 commits into
mainfrom
improvement/table-mothership-speed
Open

improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs#5012
TheodoreSpeaks wants to merge 7 commits into
mainfrom
improvement/table-mothership-speed

Conversation

@TheodoreSpeaks

@TheodoreSpeaks TheodoreSpeaks commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Context

Mothership's user_table tool lagged the fast/safe paths the tables feature already has. This PR lands three of the four findings from the audit; the fourth (the function_execute large-table CSV mount) is deferred — see note at the bottom.

Changes

C — limit bounds. query_rows took an unbounded limit (whole table into one tool result) and the filter ops took unbounded limits (every matching row's id+data into memory). Now rejected above MAX_QUERY_LIMIT (1000) / MAX_BULK_OPERATION_SIZE (1000) with the contract's message text.

D — query_rows waste + paging. Dropped the always-on executions-metadata load (withExecutions: false), and added keyset pagination: accepts an after cursor, returns nextCursor when a default-order page fills (rejects after+sort). Offset paging was O(n²) walking a big table.

E — async job parity for imports/deletes. import_file / create_from_file (CSV/TSV ≥8MB) and delete_rows_by_filter (>1000 matches, no explicit limit) now dispatch the same trigger.dev jobs the UI routes use, claiming the per-table job slot so they can't race a running background job. Smaller imports stay inline but claim/release the slot. The model polls progress via query_rows (rows appear as an import lands; the delete mask makes query_rows reflect the post-delete view immediately) — no dedicated job-status op. TableImportPayload.deleteSourceFile flag (default true) so Mothership imports of persistent workspace files don't delete the source.

Tests

user-table.test.ts (limit clamps, keyset after/nextCursor, after+sort rejection, slot claim/release, async dispatch payloads) and import-runner.test.ts (source-file cleanup default vs deleteSourceFile: false). bun run lint:check, type-check, and check:api-validation green.

Deferred (not in this PR)

  • function_execute large-table mount (finding B): the keyset CSV drain fixed the 100-row truncation but materialized the whole table in the web-container heap before the 50MB check, risking OOM on large/enterprise tables. Backed out here (mount stays at main's 100-row behavior); proper fix — incremental byte-bound + filter/limit/columns selection, or by-reference signed-URL fetch — is a separate effort.

Model-facing docs

Catalog change documenting after / limit maxes / background behavior is in simstudioai/copilot#312.

🤖 Generated with Claude Code

TheodoreSpeaks and others added 5 commits June 12, 2026 03:03
… + tenant-scoped query performance (#4915)

* feat(tables): paginated background row-delete jobs via table_jobs

* fix(tables): address review on async row-delete (filtered count, scoped optimistic clear, Cmd+A select-all, hide delete from tray)

* improvement(tables): filter-aware select-all runs, delete-job read mask, keyset index + autovacuum tuning

* feat(tables): run import/delete/export/backfill jobs on trigger.dev with in-process fallback

* improvement(tables): raise delete page to 10k and export batch to 5k

* improvement(tables): raise CSV import batch to 5k rows (param-cap bounded)

* feat(tables): surface export jobs in the header tray with progress, cancel, and download

* improvement(tables): surface exports as derived tables-scoped toasts instead of the import tray

* Revert "improvement(tables): surface exports as derived tables-scoped toasts instead of the import tray"

This reverts commit 1ea5871.

* fix(tables): preserve export storage key (NoSuchKey) and unify jobs in one spinner tray

* improvement(tables): jobs tray icon reflects aggregate state (spinner/check/alert)

* fix(tables): restore jobs tray on the tables list (dropped in staging merge)

* improvement(tables): keyset-paginate export row reads (offset paging was O(n^2) over large tables)

* perf(tables): keyset pagination for grid infinite scroll

Default-order row pages now cursor on (order_key, id) instead of OFFSET —
each page is an index seek on tableOrderKeyIdx, where OFFSET re-scans and
discards every prior row (O(N²) across deep scrolls and full drains like
select-all/export-to-clipboard). Sorted views keep offset paging; the
contract refines after+sort as mutually exclusive. v1 public rows API is
unchanged (extends the unrefined base, omits after).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): show export in job tray immediately on kickoff

The export-jobs query's poll only self-sustains once a running job is
already in the cache, so a freshly kicked export stayed invisible until
an SSE event or page refresh. Invalidate the tray query on kickoff
success so the icon appears right away.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): surface real row/column write errors in toasts

Drizzle wraps DB errors in DrizzleQueryError whose message is the failed
SQL — the real cause (e.g. the row-limit trigger's RAISE) sits on .cause,
so the routes' substring classification never matched and everything fell
through to generic 500s ("Failed to insert row"). Add rootErrorMessage
(cause-chain unwrap) and a shared rowWriteErrorResponse classifier that
consolidates the per-route pattern lists and rewrites the trigger message
into a friendly "Row limit exceeded — capped at N rows". Applied across
the app and v1 row-write routes and the columns route.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound filtered row counts (12.7s -> 0.6s)

JSONB filter predicates (->> ILIKE / range casts) are opaque to the
planner: it estimates a handful of matches and picks a parallel seq scan
over the entire shared user_table_rows relation — every tenant's rows —
for the page-0 COUNT(*), so any non-equality filter on a large table cost
10s+ regardless of how few rows matched. Run filtered counts in a
transaction with SET LOCAL enable_seqscan = off, forcing the
tenant-bounded bitmap plan. Unfiltered counts keep their index-only scan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound Cmd+F search and stream its window (75s -> 2s)

Same planner trap as the filtered count, compounded: the lateral
jsonb_each_text ILIKE is unestimatable, so findRowMatches on a 1M-row
table seq-scanned the whole 12M-row shared relation and disk-sorted
~120MB of window input (75s measured). SET LOCAL enable_seqscan=off
bounds the scan to the tenant; on the default order, additionally
penalizing bitmap/sort/parallel steers the planner onto the already-
sorted (table_id, order_key, id) index walk so row_number() streams
with no sort at all (2s measured). Flags only penalize plan shapes —
a custom sort still sorts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound sorted pages and filtered write selections

Extends the seqscan fix to every remaining jsonb-predicate path, all
measured on a 1M-row table in a 12M-row shared relation:
- sorted page query (ORDER BY data->>'col'): 9.7s/page -> 0.76s, and deep
  pages stop spilling ~130MB sorts to disk
- updateRowsByFilter / deleteRowsByFilter row selection: 14.4s -> bounded
- delete-job worker selectRowIdPage with a filter: 12.6s/page -> bounded
- dispatcher filtered-scope window walk: same shape, same fix

Shared withSeqscanOff helper moves to lib/table/planner.ts (service +
dispatcher both consume it; dispatcher can't import service).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-scoped containment index (migration 0232)

The plain GIN on user_table_rows.data matched @> candidates across every
tenant sharing the relation — a hot value in someone else's table
inflated everyone's equality filters (1.07M candidates fetched for a
33k-row match, lossy bitmap, 1.1s). Replace it with btree_gin
(table_id, data jsonb_path_ops): the tenant intersection happens inside
the index and paths are single hashed entries. Rare-equality probe
326ms -> 17ms with zero wasted candidates; unique-constraint checks and
upsert conflict lookups ride the same index. The new index is smaller
than the one it replaces (529MB vs 694MB on the 12M-row dev relation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound unique-constraint checks (3.5s -> <1s per write)

The unique check runs lower(data->>'col') = $1 LIMIT 1 on every insert
and cell edit touching a unique column. The predicate is unestimatable
and a unique (non-conflicting) value never exits early, so the planner
seq-scanned all 12.3M shared-relation rows per check — 3.5s measured.
Tenant-bound both the single and batch variants; the batch path sets the
flag on the caller's transaction when one is supplied (SET LOCAL dies at
its commit, and the statements that follow are tenant-scoped writes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound upsert conflict lookup

Same unestimatable data->>key predicate as the unique checks; an
insert-path upsert has no existing match so the lookup can't exit early
and seq-scans the whole shared relation. The upsert already runs in a
transaction — set the planner flag on it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(tables): consolidate executor types onto planner exports

service.ts kept a private DbTransaction alias and two inline
typeof db | DbTransaction unions after planner.ts began exporting the
canonical DbTransaction/DbExecutor — import those instead. From the
/simplify review of the perf series; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tests): drop narrow schema mock override in process-contents test

The local vi.mock('@sim/db/schema') stubbed only document/knowledgeBase,
but the file's import graph reaches lib/table/service whose module scope
now references tableJobs. The global schema mock already covers all of
it — rely on it per the testing rules instead of re-mocking.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): scope cancels and counts to the filtered selection (review)

Addresses the open Bugbot/Greptile findings on filtered select-all:
- Filtered runs no longer cancel the whole table: cancelWorkflowGroupRuns
  takes a filter — it stops only dispatches with that exact filter scope
  and only in-flight cells on matching rows (semi-join); whole-table and
  differently-scoped dispatches keep running, their cancelled cells
  skipped via cancelledAt > requestedAt.
- Stop on a filtered select-all sends the filter through cancel-runs
  (contract + route + mutation) instead of a table-wide cancel.
- runColumnBodySchema rejects rowIds + filter together (mirrors
  deleteTableRowsBodySchema).
- Select-all delete clears the selection in onSuccess, not at click, so
  a failed kickoff restores both rows and selection.
- Clipboard copy/cut estimates use the filter-aware total (rowTotal)
  instead of the whole-table rowCount.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: retrigger CI (Actions dropped the previous push events)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: bump api-validation route baseline to 807 (staging route + merge)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): release job claim when trigger.dev dispatch fails

If tasks.trigger (or its dynamic imports) throws after
markTableJobRunning, the ghost running row held the table's
one-write-job slot until the stale-job janitor fired (~15-20 min of
409s). All four kickoff routes now release the claim and rethrow; the
backfill runner releases and warns (a failed backfill never fails the
schema change). Greptile P1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(db): squash 0232 into 0231 (one migration for the PR)

Both are branch-only — no environment has applied them through the
migration ledger yet — so the tenant-scoped GIN (btree_gin extension,
index swap) folds into 0231_table_jobs_and_keyset. Snapshot chain
re-pointed; drizzle-kit generate confirms zero drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): context-menu delete label shows the true select-all count

Under select-all the context menu counted only the loaded page ("Delete
1000 rows" on a 999k-row table) while the action correctly deletes every
matching row via the background job. Delete now gets its own count from
the filter-aware total minus deselections; the run-action labels keep the
loaded-row count since those actions act on loaded rows only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): context-menu bulk actions act on the full select-all scope

Follow-up to the label fix: under select-all the context menu's Run /
Re-run / Stop only acted on the loaded page of rows. They now route
through the same scopes as the action bar — runs dispatch by filter
(whole table when unfiltered), Stop uses the filter-scoped cancel — and
all labels share one true count (filter-aware total minus deselections,
locale-formatted). Like the action bar, filter-scoped runs ignore
deselections (the run API has no exclusion set).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(tables): exclusion set for select-all runs and stops

Select-all minus deselected rows now means exactly that for every bulk
action, not just delete. runColumnBodySchema and cancelTableRunsBodySchema
accept excludeRowIds (bounded by MAX_EXCLUDE_ROW_IDS, select-all scope
only); the dispatch scope persists it and the dispatcher window walk,
eager bulk-clear, pre-run cancel, and filter/table-scoped cancel all skip
excluded rows. Client threads exclusions from the selection through the
action bar and the grid context menu, including the optimistic stamps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): spare excluded-row dispatches on Stop; no orphan placeholder table

Two Bugbot findings on the exclusion work:
- Select-all-minus-deselections Stop (no filter) cancelled every active
  dispatch table-wide, killing row-scoped runs on deselected rows.
  markActiveDispatchesCancelled now spares dispatches whose scope.rowIds
  are fully contained in the exclusion set (coalesce(false) keeps
  table-wide dispatches cancellable).
- Create-mode import: a failed trigger.dev dispatch released the job
  claim but left the just-created placeholder table in the workspace.
  Archive it on the failure path (no hard-delete surface exists).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): row counts reflect a running delete everywhere

A mid-delete refresh resurrected the old counts: the optimistic update
stripped cached rows but left page-0 totalCount (footer / select-all
label) at the old total, and list/detail counts reported raw row_count
including doomed-but-not-yet-deleted rows.

- onMutate now sets the active view's totalCount to the kept rows and
  decrements the cached detail rowCount by the doomed estimate
- the kickoff persists that estimate on the job (payload.doomedCount,
  clamped server-side); getTableById/listTables subtract the
  not-yet-deleted remainder (doomedCount - rows_processed) while the
  delete runs, so refetched counts match the read path's delete mask

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* copy(tables): drop background mention from delete confirm

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): clear select-all immediately when a delete kicks off

The header checkbox lingered as a minus over the optimistically-emptied
grid: rowSelectionCoversAll treats zero rows as not-covered, and the
selection clear waited for the kickoff's onSuccess. Clear at click
(failed kickoffs visibly restore rows + toast; re-selecting is cheap)
and render an empty grid's header checkbox unchecked regardless — a
selection over zero rows is vacuous.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): export takes the async path while a delete job runs

The sync/async export choice reads rowCount, which is a doomed-estimate-
adjusted number during a running delete (and the estimate is client-
supplied) — an overstated estimate could route a still-large masked set
through the synchronous stream. Mid-delete exports now always run as a
job: safe at any size, and exports bypass the one-job-per-table gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(build): stop uploads setup from sweeping the project into route graphs

next build (Turbopack) failed with "Two or more assets with different
content were emitted to the same output path" on the server-root chunk.
Root cause: setup.server.ts's unscoped path.resolve(process.cwd()) made
node-file-tracing sweep the entire project — next.config.ts included —
into every route graph reaching lib/uploads (the files/upload route and,
since the export job, the export-async path). Two producers emitted the
swept config into same-named chunks; staging's latest commits made their
contents diverge and the names collided. Annotate the path derivation
with turbopackIgnore per the NFT warning's own remediation — the build
passes and all ~390 "unexpected file in NFT list" warnings disappear.

Also inline the releaseJobClaim dynamic imports in the kickoff routes to
plain static imports — service is already statically imported there.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…post-index ANALYZE guard (#4997)

* fix(tables): per-batch delete-job commits, real trigger.dev retries, post-index ANALYZE guard

* fix(tables): resume job progress across retries, rethrow root cause for clean failure messages
… up tables feature (#4995)

* improvement(tables): migrate inputs to emcn chip components and clean up tables feature

* fix(tables): address review feedback — stale filter column label, shared FieldError in enrichment config

* improvement(tables): scope create-table callback to stable mutateAsync
…me/drop prompt

`drizzle-kit push --force` only suppresses the data-loss confirm, not the
rename-vs-drop disambiguation prompt. That prompt fires whenever a diff both
adds and drops tables/columns at once (e.g. migration 0231 created
sim_trigger_state while dropping the workspace_notification_* tables), and in
CI it crashes with a bare "Interactive prompts require a TTY" stack trace.

Catch that specific failure in the dev push step and emit a GitHub error
annotation explaining the cause and the fix (drop the stale objects on the dev
DB to match schema.ts — the same DROPs the versioned migration already applied
to staging/prod), instead of leaving an opaque trace. Exit status is preserved
either way.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nds, background import/delete jobs

Mothership's table operations lagged the fast paths the tables feature
already has:

- function_execute mounted inputTables via queryRows with defaults,
  silently truncating the sandbox CSV to 100 rows and paying for a count
  and execution metadata it never used. Mounts now drain the keyset
  export reader page by page, remap stored column-id keys to display
  names (headers previously didn't match id-keyed cell data), mount
  tables in parallel, and count toward the 50MB sandbox mount budget.
- user_table query_rows accepted unbounded limits (whole table into one
  tool result) and only offset paging. It now enforces the contracts'
  MAX_QUERY_LIMIT, skips execution metadata, accepts the keyset `after`
  cursor, and returns `nextCursor` when a default-order page fills.
- import_file / create_from_file / delete_rows_by_filter mutated without
  claiming the per-table job slot, racing running background jobs, and
  ran whole imports inline in the chat request. CSV/TSV imports ≥8MB and
  unbounded filter-deletes matching >1000 rows now dispatch the same
  trigger.dev jobs the UI routes use (slot claim, release-on-dispatch-
  failure, delete mask); inline imports claim and release the slot. A new
  get_job operation surfaces the table's derived job state for polling.
- TableImportPayload grows deleteSourceFile (default true) so Mothership
  imports of persistent workspace files survive the worker's single-use
  source cleanup.

Generated tool catalog artifacts regenerated from simstudioai/copilot#289.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@vercel

vercel Bot commented Jun 12, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 16, 2026 2:09am

Request Review

@cursor

cursor Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Large change set touching schema-backed job claiming, background delete/export, and filter-scoped mutations; incorrect masking or job lifecycle could cause data visibility issues or stuck table write slots.

Overview
Tables background work moves from per-table import* fields to a unified table_jobs model (jobStatus, jobId, jobType, … on list/detail APIs). Imports claim the same slot via markTableJobRunning; new POST …/delete-async runs paginated deletes from filter + excludeRowIds + created_at cutoff (optional Trigger.dev), and POST …/export-async plus GET …/export/download handle large exports with storage cleanup on cron prune. Job cancel is generalized to POST …/job/cancel; stale-job cron now fails stuck import/delete jobs and prunes old terminal jobs (deleting export files best-effort).

Bulk workflow APIs accept filter / excludeRowIds on column run and cancel-runs, with shared tableFilterError validation. Row GET adds keyset after cursor parsing. Column/row write routes share rootErrorMessage / rowWriteErrorResponse (including friendly row-limit toasts).

Grid UX treats select-all as “all rows matching the filter” with an exclusion set, kicks off async delete instead of loading every id, and routes Play/Refresh/Stop/context-menu actions through filter-scoped dispatch when appropriate. Exports above a threshold (or during a running delete) go async; SSE job events drive tray refresh and auto-download. Sidebars/filters adopt ChipInput / ChipDropdown and shared sidebar field components.

CI: dev db:push surfaces actionable errors when Drizzle hits non-TTY rename/drop prompts.

Reviewed by Cursor Bugbot for commit 0e47d6e. Bugbot is set up for automated code reviews on this repo. Configure here.

sort: validated.sort ? wire.sortIn(validated.sort) : undefined,
limit: validated.limit,
offset: validated.offset,
after: validated.after,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP rows missing next cursor

Medium Severity

The GET rows handler now accepts an after keyset cursor and passes it to queryRows, but the JSON response never includes a nextCursor when a full page is returned. REST clients cannot advance past the first default-order page without guessing offsets or reusing Mothership-only tooling.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7310c06. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR migrates Mothership's user_table tool to match the performance and safety guarantees already present in the HTTP table routes: limit caps on query_rows and bulk ops, keyset (cursor) pagination for the default row order, and background-job dispatch for large CSV imports and large filtered deletes.

  • Limit enforcement (C): query_rows now rejects requests above MAX_QUERY_LIMIT (1,000) and filter-based bulk ops above MAX_BULK_OPERATION_SIZE (1,000), mirroring the HTTP contract validators.
  • Keyset paging (D): queryRows accepts an after: { orderKey, id } cursor and returns nextCursor when a default-order page fills; after+sort is rejected at both the contract and tool layer.
  • Async job parity (E): CSV/TSV imports ≥ 8 MB and unbounded filtered deletes > 1,000 matches now dispatch trigger.dev tasks (or detached workers) with the same table_jobs ownership gate that prevents racing; small inline imports also claim/release the slot; deleteSourceFile: false preserves persistent workspace files.

Confidence Score: 5/5

Safe to merge. The ownership gate (partial-unique index + onConflictDoNothing) is atomic; claim/release is correctly handled in both the inline and async dispatch paths. The visibility mask for running delete jobs is logically equivalent to the worker's row-selection predicate, keeping mid-job reads consistent.

The three core changes — limit enforcement, keyset pagination, and async job dispatch — are all narrowly scoped and well-tested. The null-orderKey pagination edge case is the only correctness concern, and it only surfaces for tables that have rows predating migration 0226 in a workspace where fractional ordering is now enabled; any such rows continue to be reachable via offset paging. All DB writes involving the new table_jobs table are transactionally safe.

The keyset cursor logic in user-table.ts (query_rows case, lines 583-599) and the corresponding queryRows path in service.ts are worth a second look for any tables with partially-backfilled order_key values.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/tools/server/table/user-table.ts Core Mothership tool handler: adds limit validation, keyset cursor pagination, and async dispatch for large imports/deletes; claim/release logic is correctly guarded via finally/internal error handling.
apps/sim/lib/table/service.ts Adds pendingDeleteMask (visibility mask for running delete jobs), keyset seek in queryRows, job lifecycle helpers, and export/delete page helpers; contains the noted extra DB query per queryRows call for the mask.
apps/sim/lib/table/delete-runner.ts New background delete worker: keyset-paginates by id, ownership-gates per page, commits batches in short transactions, and handles cancellation/supersede via JobSupersededError.
apps/sim/lib/table/import-runner.ts Migrated to unified job-status APIs and adds deleteSourceFile: false guard to preserve persistent workspace files.
packages/db/migrations/0233_table_jobs_and_keyset.sql Creates table_jobs table with partial-unique index for the one-active-job-per-table gate, migrates existing import state, adds tenant-scoped GIN index for JSONB filter queries, and adds keyset index on (table_id, id) for the delete worker.
apps/sim/lib/api/contracts/tables.ts Splits tableRowsQuerySchema into base/refined pair, adds after cursor field with after+sort mutual exclusion, and adds new export-async, delete-async, and job-cancel contracts.
apps/sim/app/api/cron/cleanup-stale-executions/route.ts Updated janitor now marks stale table_jobs (any type) as failed and prunes terminal jobs older than 24 h, with best-effort export-file deletion on prune.
apps/sim/lib/table/validation.ts Wraps per-column unique-constraint checks in withSeqscanOff to prevent full-relation sequential scans; correctly handles external-transaction callers.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant M as Mothership Model
    participant UT as user-table tool
    participant SVC as table/service
    participant TJ as table_jobs (DB)
    participant W as Background Worker

    Note over M,W: Large CSV import via import_file
    M->>UT: import_file(fileId, tableId)
    UT->>SVC: markTableJobRunning(tableId, importId, 'import')
    SVC->>TJ: INSERT job row (partial-unique index gate)
    TJ-->>SVC: "claimed=true"
    SVC-->>UT: true
    UT->>W: dispatchImportJob(payload)
    UT-->>M: success + jobId
    M->>UT: "query_rows(tableId, after=cursor)"
    UT->>SVC: queryRows + pendingDeleteMask check
    SVC-->>UT: rows + nextCursor
    UT-->>M: rows + nextCursor
    W->>SVC: markJobReady(tableId, importId)

    Note over M,W: Unbounded filter-delete via delete_rows_by_filter
    M->>UT: delete_rows_by_filter(tableId, filter)
    UT->>SVC: "queryRows count — totalCount > 1000"
    UT->>SVC: markTableJobRunning(tableId, jobId, 'delete', payload)
    SVC->>TJ: INSERT job row with filter+cutoff payload
    UT->>W: dispatchDeleteJob(jobId, filter, cutoff)
    UT-->>M: success + jobId, rows hidden immediately
    W->>SVC: selectRowIdPage keyset by id
    W->>SVC: deletePageByIds per-batch transactions
    W->>SVC: markJobReady — mask lifts
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant M as Mothership Model
    participant UT as user-table tool
    participant SVC as table/service
    participant TJ as table_jobs (DB)
    participant W as Background Worker

    Note over M,W: Large CSV import via import_file
    M->>UT: import_file(fileId, tableId)
    UT->>SVC: markTableJobRunning(tableId, importId, 'import')
    SVC->>TJ: INSERT job row (partial-unique index gate)
    TJ-->>SVC: "claimed=true"
    SVC-->>UT: true
    UT->>W: dispatchImportJob(payload)
    UT-->>M: success + jobId
    M->>UT: "query_rows(tableId, after=cursor)"
    UT->>SVC: queryRows + pendingDeleteMask check
    SVC-->>UT: rows + nextCursor
    UT-->>M: rows + nextCursor
    W->>SVC: markJobReady(tableId, importId)

    Note over M,W: Unbounded filter-delete via delete_rows_by_filter
    M->>UT: delete_rows_by_filter(tableId, filter)
    UT->>SVC: "queryRows count — totalCount > 1000"
    UT->>SVC: markTableJobRunning(tableId, jobId, 'delete', payload)
    SVC->>TJ: INSERT job row with filter+cutoff payload
    UT->>W: dispatchDeleteJob(jobId, filter, cutoff)
    UT-->>M: success + jobId, rows hidden immediately
    W->>SVC: selectRowIdPage keyset by id
    W->>SVC: deletePageByIds per-batch transactions
    W->>SVC: markJobReady — mask lifts
Loading

Reviews (4): Last reviewed commit: "improvement(mothership): drop get_job op..." | Re-trigger Greptile

Comment on lines 291 to 326
const tableMounts = await Promise.all(
inputTables.map(async (tableRef) => {
const tableId =
typeof tableRef === 'string'
? tableRef
: tableRef && typeof tableRef === 'object'
? (tableRef as CanonicalTableInput).tableId || (tableRef as CanonicalTableInput).path
: undefined
if (!tableId) return null
const table = await resolveTableRef(tableId, tablePathLookup)
if (!table || table.workspaceId !== workspaceId) {
throw new Error(
`Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.`
)
}
const csvContent = await buildTableCsvForMount(table)
const sandboxPath =
typeof tableRef === 'object' && tableRef !== null
? (tableRef as CanonicalTableInput).sandboxPath
: undefined
if (!tableId) continue
const table = await resolveTableRef(tableId, tablePathLookup)
if (!table || table.workspaceId !== workspaceId) {
throw new Error(
`Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.`
)
}
const rows = await queryRows(table, {}, 'copilot-fn-exec')

const allKeys = new Set(table.schema.columns.map((column) => column.name))
for (const row of rows.rows ?? []) {
if (row.data && typeof row.data === 'object') {
for (const key of Object.keys(row.data as Record<string, unknown>)) {
allKeys.add(key)
}
return {
path: sandboxPath || `/home/user/tables/${table.id}.csv`,
content: csvContent,
}
}
const headers = Array.from(allKeys)
const csvLines = [headers.join(',')]
for (const row of rows.rows ?? []) {
const data = (row.data || {}) as Record<string, unknown>
csvLines.push(
headers
.map((h) => {
const val = data[h]
const str = val === null || val === undefined ? '' : String(val)
return str.includes(',') || str.includes('"') || str.includes('\n')
? `"${str.replace(/"/g, '""')}"`
: str
})
.join(',')
})
)
for (const mount of tableMounts) {
if (!mount) continue
if (totalSize + mount.content.length > MAX_TOTAL_SIZE) {
throw new Error(
`Mounting table "${mount.path}" would exceed the ${MAX_TOTAL_SIZE / 1024 / 1024}MB total mount limit. Mount fewer or smaller tables.`
)
}
const csvContent = csvLines.join('\n')
const sandboxPath =
typeof tableRef === 'object' && tableRef !== null
? (tableRef as CanonicalTableInput).sandboxPath
: undefined
sandboxFiles.push({
path: sandboxPath || `/home/user/tables/${table.id}.csv`,
content: csvContent,
})
totalSize += mount.content.length
sandboxFiles.push(mount)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Full table CSVs built before budget enforcement

All input tables are serialized in parallel via Promise.all before any size check runs. For each table the buildTableCsvForMount loop fetches every row page-by-page and accumulates the entire CSV in memory — no size gate inside the loop. The outer budget check (totalSize + mount.content.length > MAX_TOTAL_SIZE) only fires after every table has been fully loaded.

Concrete failure: a user mounts two enterprise-plan tables each at ~40MB of rows. Both CSVs are built (80MB in memory) before the loop sees that the second one exceeds the 50MB cap and throws. File and directory mounts check record.size before fetching; tables lack that pre-flight guard. Under traffic, several concurrent requests doing this can exhaust the web container's heap.

Comment on lines +785 to +818
if (args.limit === undefined) {
const { totalCount } = await queryRows(
table,
{ filter: idFilter, limit: 1, withExecutions: false },
requestId
)
const matchCount = totalCount ?? 0
if (matchCount > TABLE_LIMITS.MAX_BULK_OPERATION_SIZE) {
const doomedCount = Math.min(matchCount, table.rowCount)
const cutoff = new Date()
const jobId = generateId()
const payload: TableDeleteJobPayload = {
filter: idFilter,
cutoff: cutoff.toISOString(),
doomedCount,
}
assertNotAborted()
const claimed = await markTableJobRunning(table.id, jobId, 'delete', payload)
if (!claimed) {
return { success: false, message: 'A job is already in progress for this table' }
}
await dispatchDeleteJob({
jobId,
tableId: table.id,
workspaceId,
filter: idFilter,
cutoff,
})
return {
success: true,
message: `Started background delete of ${doomedCount} matching rows (job ${jobId}). The rows are hidden from reads immediately; call get_job to track progress.`,
data: { jobId, doomedCount },
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 doomedCount can overestimate when rows are inserted between count and dispatch

matchCount comes from queryRows(..., { limit: 1, withExecutions: false }) and reflects the delete-mask-filtered row count. Between that count query and markTableJobRunning, rows inserted after the cutoff timestamp could match the filter, making matchCount stale. The cutoff passed to the background worker correctly caps new-row inclusion, but doomedCount stored in the job payload — and surfaced in the get_job response message — would over-report the number of rows being removed. Not a correctness issue for the worker, but the user-visible estimate is an upper bound, not an exact figure.

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR closes three performance gaps in Mothership's table operations: sandbox table mounts now drain all rows via keyset pagination instead of silently truncating to 100, query_rows gains a 1000-row limit cap and keyset cursor support, and large CSV/TSV imports plus unbounded filter-deletes are handed off to the existing background-job infrastructure (with a new get_job poll operation). The underlying data model also migrates: per-import state columns on user_table_definitions are replaced by a new table_jobs table with a partial-unique index enforcing one running write-job per table.

  • function-execute mounts: buildTableCsvForMount pages through selectExportRowPage in 5k-row chunks, remaps column-id keys to display names, and counts toward the 50 MB budget — fixing empty-column mounts on id-keyed tables and the silent 100-row truncation.
  • query_rows contract parity: limit now rejected above 1000, withExecutions: false, and keyset after/nextCursor accepted (rejected alongside sort); inline imports/deletes claim the one-write-job slot to block interleaving.
  • Background import/delete in Mothership: CSV/TSV ≥ 8 MB dispatches tableImportTask; unbounded filter-deletes above 1000 rows dispatch tableDeleteTask with a pendingDeleteMask that hides doomed rows immediately from all read paths.

Confidence Score: 3/5

Safe to merge for most workloads; enterprise tables with hundreds of thousands of rows could exhaust Vercel function memory during a sandbox mount before the 50 MB budget check fires.

The core job-slot migration, keyset pagination, background import/delete dispatch, and schema change are all well-structured and tested. The one concern worth resolving before a wide enterprise rollout is buildTableCsvForMount: all table CSVs are fully assembled in parallel memory before the budget check, so a user who mounts a 500k-row enterprise table gets a function OOM rather than a clean rejection.

apps/sim/lib/copilot/tools/handlers/function-execute.ts — the buildTableCsvForMount loop and the parallel Promise.all mount assembly need a per-page budget check to avoid OOM on large enterprise tables.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/tools/handlers/function-execute.ts Replaces the truncated 100-row queryRows mount with a keyset-draining buildTableCsvForMount that reads all rows and remaps id-keyed cells back to display names. Has a memory safety issue: the full CSV is materialized in memory before the 50 MB budget check runs, and pendingDeleteMask is called once per page (N+1 pattern).
apps/sim/lib/table/service.ts Major overhaul: migrates import-status fields from user_table_definitions to table_jobs, adds selectExportRowPage (position-keyset export drain), pendingDeleteMask for delete-job visibility, selectRowIdPage/deletePageByIds for async delete workers, and latestJobForTable/latestJobsForTables for the new job fields. pendingDeleteMask now runs on every queryRows call regardless of table state.
apps/sim/lib/copilot/tools/server/table/user-table.ts Adds query_rows limit clamping, keyset cursor (after/nextCursor), background delete dispatch for unbounded deletes, background import dispatch for large CSV/TSV files, get_job polling operation, and job-slot claim/release for inline imports. Logic is well-structured with proper guards and rollback paths.
apps/sim/lib/table/import-runner.ts Renames import-specific service calls to generic job variants (markImportFailed→markJobFailed, etc.), adds deleteSourceFile flag (defaults true for UI single-use uploads; pass false for Mothership persistent workspace files), and updates SSE events to the new job event kind.
packages/db/migrations/0233_table_jobs_and_keyset.sql Creates table_jobs with partial unique index (one running write-job per table), migrates existing import_status rows idempotently via ON CONFLICT DO NOTHING, tunes autovacuum on user_table_rows, and adds two CONCURRENT indexes (table_id+id keyset, tenant-scoped btree_gin) while dropping the old cross-tenant GIN index.
apps/sim/lib/table/delete-runner.ts New async delete worker: walks rows in keyset pages (selectRowIdPage → deletePageByIds), checks job ownership on each page, applies cutoff/filter/exclusion scope, emits SSE events, and marks the job terminal or failed.
packages/db/schema.ts Adds tableJobs table with unique partial index, replaces the old cross-tenant GIN index on user_table_rows with a tenant-scoped btree_gin index, adds table_id+id keyset index, and removes import_* columns from userTableDefinitions.
apps/sim/lib/table/dispatcher.ts Extends DispatchScope with filter/excludeRowIds for select-all-under-filter and select-all-minus-deselections, wraps the dispatcherStep page query in withSeqscanOff for jsonb-filtered scopes, and adds scope-aware markActiveDispatchesCancelled with JSONB scope comparison.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Mothership: import_file / create_from_file] --> B{CSV/TSV 8 MB?}
    B -- No --> C[markTableJobRunning claim slot]
    C --> D[Inline import parseFileRows + batchInsert]
    D --> E[releaseJobClaim]
    B -- Yes --> F[createTable with jobStatus=running or markTableJobRunning]
    F --> G{isTriggerDevEnabled?}
    G -- Yes --> H[tasks.trigger tableImportTask]
    G -- No --> I[runDetached runTableImport]
    H -- dispatch error --> J[releaseJobClaim / deleteTable]
    I --> K[runTableImport streaming keyset import]
    H --> K

    L[Mothership: delete_rows_by_filter] --> M{limit === undefined?}
    M -- Yes --> N[queryRows count with pendingDeleteMask]
    N --> O{matchCount > 1000?}
    O -- No --> P[deleteRowsByFilter inline]
    O -- Yes --> Q[markTableJobRunning claim]
    Q --> R[dispatchDeleteJob]
    R --> S[runTableDelete keyset page walk]
    M -- No --> P

    T[Mothership: query_rows] --> U{limit > 1000?}
    U -- Yes --> V[return error]
    U -- No --> W[queryRows with pendingDeleteMask + after cursor]
    W --> X[return rows + nextCursor]

    Y[Mothership: get_job] --> Z[getTableById latestJobForTable]
    Z --> AA[return job status / rowCount]
Loading

Reviews (2): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile

Comment on lines 73 to 108
* would corrupt values.
*/
function formatMountCsvValue(value: unknown): string {
if (value === null || value === undefined) return ''
if (value instanceof Date) return value.toISOString()
if (typeof value === 'object') return JSON.stringify(value)
return String(value)
}

/**
* Serializes a full table to CSV for a sandbox mount. Walks the keyset export
* reader page by page so every row is included (`queryRows` with defaults
* silently truncated mounts to its 100-row page and paid for a count and
* execution metadata the CSV never used), and remaps stored column-id keys
* back to display names so headers line up with cell values.
*/
async function buildTableCsvForMount(table: TableDefinition): Promise<string> {
const nameById = buildNameById(table.schema)
const headers = table.schema.columns.map((c) => c.name)
const lines = [toCsvRow(headers)]
let after: { position: number; id: string } | null = null
while (true) {
const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE)
for (const row of page) {
const data = rowDataIdToName(row.data, nameById)
lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header]))))
}
if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n')
const last = page[page.length - 1]
after = { position: last.position, id: last.id }
}
}

async function resolveInputFiles(
workspaceId: string,
inputFiles?: unknown[],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Full CSV materialized in memory before the 50 MB budget check

buildTableCsvForMount drains every row into a lines[] array and only returns after the full table is assembled. The budget check in the caller happens after all table CSVs are built via Promise.all. For an enterprise table with 100k+ rows, the CSV string can be hundreds of megabytes or more before the check ever fires — the budget guard prevents the file from reaching the sandbox, but the memory damage is already done (and in the parallel case, all tables are materialized simultaneously).

The old queryRows path implicitly bounded this to 100 rows. A straightforward fix is to track a running byte count inside the drain loop and throw early once the caller's budget would be exceeded — or pass maxBytes as a parameter and abort the page walk.

Comment on lines 73 to 108
* would corrupt values.
*/
function formatMountCsvValue(value: unknown): string {
if (value === null || value === undefined) return ''
if (value instanceof Date) return value.toISOString()
if (typeof value === 'object') return JSON.stringify(value)
return String(value)
}

/**
* Serializes a full table to CSV for a sandbox mount. Walks the keyset export
* reader page by page so every row is included (`queryRows` with defaults
* silently truncated mounts to its 100-row page and paid for a count and
* execution metadata the CSV never used), and remaps stored column-id keys
* back to display names so headers line up with cell values.
*/
async function buildTableCsvForMount(table: TableDefinition): Promise<string> {
const nameById = buildNameById(table.schema)
const headers = table.schema.columns.map((c) => c.name)
const lines = [toCsvRow(headers)]
let after: { position: number; id: string } | null = null
while (true) {
const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE)
for (const row of page) {
const data = rowDataIdToName(row.data, nameById)
lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header]))))
}
if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n')
const last = page[page.length - 1]
after = { position: last.position, id: last.id }
}
}

async function resolveInputFiles(
workspaceId: string,
inputFiles?: unknown[],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 pendingDeleteMask fetched on every page — N+1 DB round-trips during drain

selectExportRowPage calls pendingDeleteMask(table) unconditionally, so a 100k-row table drained in 20 pages makes 20 extra DB round-trips just to check whether a delete job is running. The mask is derived from table.id which is constant, and the delete job status won't change meaningfully mid-drain. Fetching it once before the loop and threading it through would eliminate the repeated queries.

Comment thread apps/sim/lib/table/service.ts
…er to follow-up

The keyset CSV drain for inputTables fixed the 100-row truncation but
introduced an unbounded in-memory materialization: a large table (e.g.
enterprise 1M-row) is drained whole into the web-container heap before
the 50MB mount check runs, risking OOM. Reverting to main's behavior
(100-row mount) for now; the proper fix (incremental byte-bound +
filter/limit/columns selection, or by-reference signed-URL fetch) is
deferred to a separate effort so this PR can ship the user_table speed
work (C/D/E) without the memory regression.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@TheodoreSpeaks TheodoreSpeaks changed the title improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs Jun 16, 2026
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

Comment thread apps/sim/lib/table/service.ts
const nextCursor =
!args.sort && result.rows.length === result.limit && lastRow?.orderKey
? { orderKey: lastRow.orderKey, id: lastRow.id }
: undefined

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nextCursor requires truthy orderKey

Low Severity

query_rows only returns nextCursor when the last row has a truthy orderKey. A full page whose last row lacks order_key (e.g. not yet backfilled) omits the cursor even though more rows may exist.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c1c4160. Configure here.

…s via query_rows

Per review, get_job is unnecessary: the model observes background
import/delete progress with query_rows (rows appearing as an import lands;
the delete mask already makes query_rows reflect the post-delete view).
Removes the operation and repoints the dispatch messages at query_rows.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0e47d6e. Configure here.

if (excluded.has(targetId)) excluded.delete(targetId)
else excluded.add(targetId)
return { kind: 'all', excluded: excluded.size === 0 ? undefined : excluded }
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shift-click shrinks select-all scope

Medium Severity

After a filter-wide select-all (rowSelection.kind === 'all'), shift-clicking row checkboxes only updates the loaded page: the handler materializes selection to in-memory row ids instead of keeping the exclusion-set model. Unloaded matching rows drop out of delete, run, and stop actions that rely on select-all semantics.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0e47d6e. Configure here.

onStopAllRows(queryOptions.filter ?? undefined, excluded)
} else {
onStopRows(contextMenuRowIds)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop count ignores select-all scope

Low Severity

For a filter-wide select-all, the context menu stop action calls onStopAllRows with the filter, but runningInContextSelection still sums runningByRowId only over contextMenuRowIds (loaded rows). The Stop item can stay hidden or show the wrong count when running workflows sit on unloaded matching rows.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0e47d6e. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants