improvement(execution, connectors): offload large function inputs, increase connector limits + better error propagation by icecrasher321 · Pull Request #5089 · simstudioai/sim

icecrasher321 · 2026-06-16T03:42:06Z

Summary

Fixes a class of 10 MB limit failures across workflow execution and KB connectors.

Function blocks: over-budget resolved block-output context values are offloaded to durable large-value refs and lazily re-read in the sandbox (sim.values.read), so a JS function can merge medium files without busting the 10 MB inter-block request-body cap (the original "Seedance" merge failure).
KB connectors — never silent: oversized files now surface as visible failed KB documents (with a reason) instead of being silently dropped — at listing time (GitHub/S3/Dropbox/OneDrive/SharePoint) and fetch time (GitLab/Azure/Google Drive via a shared ConnectorFileTooLargeError).
KB connectors — memory safety: unbounded response.text() downloads replaced with streaming readBodyWithLimit (cancels past the cap; closes a Dropbox OOM/DoS gap).
KB connectors — cap raised: per-file limit raised from a hardcoded 10 MB to the canonical 100 MB KB document limit (CONNECTOR_MAX_FILE_BYTES), except Google Drive's export path (Google's hard 10 MB export-API limit).
Sync engine: classifyExternalDoc for reconciliation, bulk skipDocuments (writes failed rows that are excluded from the stuck-doc retry sweep), byte-bounded batch concurrency so the raised cap can't OOM the worker, and a metadata.fileSize ?? size fallback so skipped rows show the real size.
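To make the "memory safety" bullet concrete, here is a minimal sketch of a capped streaming read. It is not the PR's `readBodyWithLimit`, whose signature is not shown on this page; `ChunkReader`, `BodyTooLargeError`, and `readCapped` are hypothetical names chosen for illustration.

```typescript
// Hypothetical minimal reader shape. The reader returned by
// `response.body.getReader()` on a fetch Response matches it structurally.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>
  cancel(): Promise<void>
}

// Illustrative error type, not the PR's ConnectorFileTooLargeError.
class BodyTooLargeError extends Error {
  readonly maxBytes: number
  constructor(maxBytes: number) {
    super(`body exceeds ${maxBytes} bytes`)
    this.name = 'BodyTooLargeError'
    this.maxBytes = maxBytes
  }
}

// Reads chunks until the stream ends or the cap is crossed. On overflow the
// stream is cancelled, so the remainder of the body is never buffered.
async function readCapped(reader: ChunkReader, maxBytes: number): Promise<Uint8Array> {
  const chunks: Uint8Array[] = []
  let total = 0
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    if (!value) continue
    total += value.byteLength
    if (total > maxBytes) {
      await reader.cancel()
      throw new BodyTooLargeError(maxBytes)
    }
    chunks.push(value)
  }
  // Concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.byteLength
  }
  return out
}
```

The point of the pattern is that peak memory is bounded by the cap plus one chunk, whereas an unbounded `response.text()` buffers whatever the remote server sends.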

Type of Change

Bug fix

Testing

WiP

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…onnector size limits

Addresses a class of 10 MB limit failures:

- executor/variables: offload over-budget function block-output context values to durable large-value refs (lazy `sim.values.read`) so JS function blocks can merge medium files without exceeding the 10 MB inter-block request-body cap.
- connectors: stream downloads via `readBodyWithLimit` (memory-safe), and surface oversized files as visible `failed` KB documents instead of silently dropping them: listing-time for github/s3/dropbox/onedrive/sharepoint, fetch-time for gitlab/azure/google-drive via a shared `ConnectorFileTooLargeError`. Raise the per-file cap from a hardcoded 10 MB to the canonical 100 MB KB document limit (`CONNECTOR_MAX_FILE_BYTES`), except Google Drive's export path (Google's hard 10 MB export-API limit).
- sync-engine: `classifyExternalDoc` + bulk `skipDocuments` (failed rows with a reason, excluded from retry), byte-bounded batch concurrency to cap peak worker memory at the raised cap, and a `metadata.fileSize ?? size` fallback.

# Conflicts: # apps/sim/connectors/utils.test.ts

vercel · 2026-06-16T03:42:11Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	Jun 16, 2026 7:59pm

cursor · 2026-06-16T04:06:26Z

PR Summary

Medium Risk
Touches workflow execution payloads, SSE terminal delivery, and many connector download paths; behavior changes (higher limits, failed visible rows, offload) are intentional but broad across sync and execute paths.

Overview
Addresses 10 MB request-body and unbounded-download failures in workflow execution and KB connector syncs.

Function blocks: Resolved block-output values that would blow the ~6 MB inline budget (data + display) are offloaded to durable large-value refs and read in the JS sandbox via sim.values.read, so merging several medium payloads no longer fails the internal function route. Display code shows placeholders for offloaded values.
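The offload decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the resolver's actual code: `offloadOverBudget`, `readOffloaded`, the `LargeValueRef` shape, and the in-memory `Map` standing in for durable storage are all hypothetical, and values are assumed to be JSON-serializable.

```typescript
// Hypothetical ref shape. The real large-value ref format is not shown here.
interface LargeValueRef {
  kind: 'large-value-ref'
  key: string
  bytes: number
}

// UTF-8 byte length of a string, computed from its code points.
function utf8ByteLength(s: string): number {
  let bytes = 0
  for (const ch of s) {
    const cp = ch.codePointAt(0)!
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4
  }
  return bytes
}

// Keeps values inline while they fit the running budget. A value that would
// push the total past the budget is written to the store and replaced by a ref.
function offloadOverBudget(
  values: Record<string, unknown>,
  budgetBytes: number,
  store: Map<string, string> // stand-in for durable storage (assumption)
): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  let used = 0
  for (const [name, value] of Object.entries(values)) {
    const json = JSON.stringify(value)
    const bytes = utf8ByteLength(json)
    if (used + bytes <= budgetBytes) {
      used += bytes
      out[name] = value
      continue
    }
    const key = `ref-${store.size + 1}`
    store.set(key, json)
    const ref: LargeValueRef = { kind: 'large-value-ref', key, bytes }
    out[name] = ref
  }
  return out
}

// Lazy read on the consumer side, loosely analogous to sim.values.read.
function readOffloaded(ref: LargeValueRef, store: Map<string, string>): unknown {
  const json = store.get(ref.key)
  if (json === undefined) throw new Error(`missing large value ${ref.key}`)
  return JSON.parse(json)
}
```

With a 20-byte budget, a 12-byte value stays inline, a following 102-byte value becomes a ref, and a later 3-byte value still fits inline, so only the values that actually overflow pay the storage round trip.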

Workflow SSE: If the Redis event buffer rejects a write (e.g. oversized block output), terminal events still go out on the live SSE stream so the UI does not hang on “running.”
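A minimal sketch of that fallback, assuming a buffer whose write can throw and a separate live-stream sender. `EventBuffer`, `publishTerminalEvent`, and the event shape are hypothetical, and whether the real route sends live on every event or only after a buffer failure is not stated here; this sketch sends live unconditionally.

```typescript
// Hypothetical event and buffer shapes for illustration only.
interface ExecutionEvent {
  type: string
  data: unknown
}

interface EventBuffer {
  // May throw, for example when the payload is too large for the buffer.
  write(event: ExecutionEvent): void
}

// Attempts the buffered write, but never lets a buffer rejection prevent the
// terminal event from reaching the live stream.
function publishTerminalEvent(
  event: ExecutionEvent,
  buffer: EventBuffer,
  sendLive: (event: ExecutionEvent) => void
): { buffered: boolean } {
  let buffered = true
  try {
    buffer.write(event)
  } catch {
    // Swallow the rejection: replay from the buffer is lost for this event,
    // but the connected client still learns that the run has finished.
    buffered = false
  }
  sendLive(event)
  return { buffered }
}
```

The trade-off is that a client reconnecting later cannot replay the rejected event from the buffer, which is presumably preferable to a UI stuck on "running".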

KB connectors: A shared CONNECTOR_MAX_FILE_BYTES cap (aligned with manual KB upload, up from a hardcoded 10 MB) and readBodyWithLimit replace unbounded response.text() calls and truncation. Oversized files become failed rows with a skippedReason via markSkipped / stubOrSkipBySize / ConnectorFileTooLargeError, at listing and fetch time, across storage-style connectors (Dropbox, OneDrive, SharePoint, S3, GitHub, GitLab, Azure DevOps, Google Drive downloads, Zoom VTT). takeIndexableWithinCap keeps skipped files from consuming maxFiles. Google Workspace export stays on Google’s 10 MB API limit.
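The listing-time half of this pattern might look like the sketch below. The 100 MB figure comes from the PR description; `ListedFile`, `DocStub`, and `stubOrSkip` are hypothetical stand-ins rather than the PR's `stubOrSkipBySize` and `markSkipped`.

```typescript
// 100 MB, the per-file cap stated in the PR description.
const MAX_FILE_BYTES = 100 * 1024 * 1024

// Hypothetical shapes for illustration.
interface ListedFile {
  id: string
  name: string
  size?: number // some providers do not report a size at listing time
}

interface DocStub {
  externalId: string
  title: string
  size?: number
  contentDeferred: boolean
  skippedReason?: string
}

// Produces a skipped stub when the provider reports a size over the cap.
// Files with no reported size, or within the cap, get a normal deferred stub
// and are checked again when their content is fetched.
function stubOrSkip(file: ListedFile, maxBytes: number = MAX_FILE_BYTES): DocStub {
  if (file.size !== undefined && file.size > maxBytes) {
    return {
      externalId: file.id,
      title: file.name,
      size: file.size,
      contentDeferred: false,
      skippedReason: `File is ${file.size} bytes, over the ${maxBytes}-byte limit`,
    }
  }
  return { externalId: file.id, title: file.name, size: file.size, contentDeferred: true }
}
```

Returning a stub with a reason, instead of filtering the file out of the listing, is what lets the sync engine record a visible failed row.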

Sync engine: classifyExternalDoc, bulk skipDocuments, byte-bounded chunkOpsByByteBudget for hydration concurrency, and stuck-doc retry excludes rows with no storageKey.
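One way to bound hydration batches by both count and bytes is sketched below. It is not the PR's `chunkOpsByByteBudget`; the function name, the `bytes` field, and the rule that a single oversized op gets a batch of its own are assumptions.

```typescript
// Splits ops into batches so that each batch holds at most `maxCount` ops and
// at most `maxBytes` of summed source bytes. An op larger than `maxBytes` on
// its own still gets a batch to itself, so no op is ever dropped.
function chunkByByteBudget<T extends { bytes: number }>(
  ops: T[],
  maxCount: number,
  maxBytes: number
): T[][] {
  const batches: T[][] = []
  let current: T[] = []
  let currentBytes = 0
  for (const op of ops) {
    const full =
      current.length > 0 &&
      (current.length >= maxCount || currentBytes + op.bytes > maxBytes)
    if (full) {
      batches.push(current)
      current = []
      currentBytes = 0
    }
    current.push(op)
    currentBytes += op.bytes
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

If each batch is processed concurrently, peak memory is roughly bounded by `maxBytes`, or by the largest single file, rather than by `maxCount` multiplied by the per-file cap.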

Agent skill memory-load-check documents when to apply the connector size pattern.

Reviewed by Cursor Bugbot for commit e1bece6.

greptile-apps · 2026-06-16T04:15:59Z

Greptile Summary

This PR tackles a class of 10 MB cap failures across workflow execution and KB connectors by offloading oversized function-block context values to durable large-value refs, raising the per-file connector limit to the canonical 100 MB KB document cap, and replacing silent drops of oversized files with visible failed KB rows.

Function blocks: maybeOffloadInlineFunctionContextValue in the resolver measures the inline footprint of each block-output reference and offloads values that would push the request body past a 6 MB combined budget to durable storage, lazily re-read in the sandbox via sim.values.read.
KB connectors: All file-storage connectors (Dropbox, OneDrive, SharePoint, GitHub, GitLab, Azure DevOps, S3, Google Drive, Zoom) now use shared utilities — readBodyWithLimit, stubOrSkipBySize, markSkipped, takeIndexableWithinCap — to stream-cap downloads, surface oversized files as visible failed rows rather than silently dropping them, and apply the cap correctly to the indexable-file quota.
Sync engine: classifyExternalDoc centralises reconciliation logic; chunkOpsByByteBudget bounds peak worker memory by splitting batches on both count and summed source bytes; skipDocuments bulk-inserts content-less failed rows; the stuck-doc retry sweep is guarded with isNotNull(storageKey) to exclude unskippable rows.

Confidence Score: 4/5

Safe to merge with awareness of the two P2 notes — no blocking defects introduced.

The changes are broad (19 files, three distinct subsystems) but well-structured: shared utilities have unit tests, the sync engine's new classification and batching paths are tested, and the resolver offload logic includes thorough test coverage. The two observations are minor: the takeIndexableWithinCap boundary behaviour is explicitly documented as intentional, and the storageKey null assumption holds because addDocument uploads before committing the row. No correctness regression or data-loss path was identified.

apps/sim/lib/knowledge/connectors/sync-engine.ts and apps/sim/connectors/utils.ts warrant a second read — the new takeIndexableWithinCap boundary behaviour means oversized files that appear after the indexable quota is saturated are still silently excluded from the failed-row surface.
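The boundary behaviour flagged here can be illustrated with a small sketch. It assumes a loop that stops at the first indexable document past the quota; the real `takeIndexableWithinCap` is not shown on this page, and `takeWithinCap` is a hypothetical stand-in.

```typescript
// Skipped documents ride along without counting against the quota, but the
// loop stops at the first indexable document past `maxFiles`, so any skipped
// documents listed after that point are not returned.
function takeWithinCap<T extends { skippedReason?: string }>(docs: T[], maxFiles: number): T[] {
  const taken: T[] = []
  let indexable = 0
  for (const doc of docs) {
    if (doc.skippedReason !== undefined) {
      taken.push(doc)
      continue
    }
    if (indexable >= maxFiles) break
    taken.push(doc)
    indexable++
  }
  return taken
}
```

For a listing of `a, big1, b, big2, c, big3` (the `big*` entries being skipped) with `maxFiles = 2`, this returns `a, big1, b, big2`: `big3` never becomes a failed row because the loop has already stopped at `c`.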

Important Files Changed

Filename	Overview
apps/sim/lib/knowledge/connectors/sync-engine.ts	Major additions: classifyExternalDoc, chunkOpsByByteBudget, skipDocuments, and byte-bounded batch concurrency. Logic is sound; skipped-file visibility and duplicate-insert safety are correctly handled via existingByExternalId. One P2 note around the storageKey null filter assumption.
apps/sim/connectors/utils.ts	New shared utilities: CONNECTOR_MAX_FILE_BYTES, readBodyWithLimit, markSkipped, stubOrSkipBySize, isSkippedDocument, takeIndexableWithinCap, ConnectorFileTooLargeError. Implementation is correct; the takeIndexableWithinCap loop-break behavior means skipped items after the cap boundary can be silently dropped (P2, documented as intentional).
apps/sim/executor/variables/resolver.ts	Adds maybeOffloadInlineFunctionContextValue to offload oversized block-output values to durable large-value refs before they bust the 10 MB request body cap. Budget tracking and authorization (mergeLargeValueKeys) look correct; fallback on store failure leaves value inline.
apps/sim/app/api/workflows/[id]/execute/route.ts	Wraps event-buffer writes in try/catch so a buffer rejection (e.g. oversized payload) falls through to live SSE delivery rather than crashing the stream. terminalEventPublished is still set so finalization closes cleanly.
apps/sim/connectors/dropbox/dropbox.ts	Replaces isSupportedFile (silently filtered oversized) with isDownloadableFile + stubOrSkipBySize, fixes the OOM/DoS gap by replacing response.text() with readBodyWithLimit, and surfaces oversized files as visible failed rows at both listing and fetch time.
apps/sim/connectors/google-drive/google-drive.ts	Export path adds 403/exportSizeLimitExceeded detection to throw ConnectorFileTooLargeError; download path switches from truncating response.text() to readBodyWithLimit at 100 MB; both paths now surface oversized files as skipped failed rows.
apps/sim/connectors/s3/s3.ts	Listing now uses stubOrSkipBySize + takeIndexableWithinCap; getDocument returns markSkipped stubs instead of null for oversized objects; slicedSome computation changed from explicit slice tracking to documents.length < stubs.length (semantically equivalent).
apps/sim/connectors/zoom/zoom.ts	Applies the standard connector size pattern (stubOrSkipBySize at listing, readBodyWithLimit at fetch, markSkipped on overflow) to VTT transcript downloads; adds fileSize to metadata for accurate size display in KB UI.
apps/sim/connectors/types.ts	Adds optional skippedReason field to ExternalDocument; well-documented with clear semantics for the sync engine.
apps/sim/lib/execution/payloads/large-value-ref.ts	Exports formatLargeValueSize so the resolver can generate human-readable placeholder display text for offloaded refs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Connector listDocuments] --> B{size reported?}
    B -- yes, size > cap --> C[stubOrSkipBySize → markSkipped\nskippedReason set, contentDeferred=false]
    B -- no size or within cap --> D[Normal stub\ncontentDeferred=true]
    C --> E[takeIndexableWithinCap\nSkipped items ride along,\nnot counted against quota]
    D --> E
    E --> F[classifyExternalDoc]
    F -- skip, no existing row --> G[skip op → skipDocuments\nbulk-insert failed rows]
    F -- skip, existing row --> H[unchanged ++\nlast-known-good kept]
    F -- add/update --> I[contentOps]
    I --> J[chunkOpsByByteBudget\ncount + byte budget]
    J --> K{contentDeferred?}
    K -- yes --> L[getDocument hydration]
    L -- skippedReason at fetch time --> G
    L -- content OK --> M[addDocument / updateDocument]
    K -- no --> M
    G --> N[DB: failed row\nstorageKey=null]
    M --> O[DB: pending row\nstorageKey set]
    O --> P[stuck-doc retry sweep\nisNotNull storageKey excludes N]
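
The classification branch of the flowchart (node F) could be sketched as below. The skip and unchanged rules follow the diagram; deciding between add, update, and unchanged by comparing a content hash is an assumption, and `classifyDoc` and its types are hypothetical names.

```typescript
type Classification = 'skip' | 'unchanged' | 'add' | 'update'

// Hypothetical shapes for illustration.
interface ListedDoc {
  externalId: string
  contentHash?: string
  skippedReason?: string
}

interface ExistingRow {
  contentHash: string
}

function classifyDoc(
  doc: ListedDoc,
  existingByExternalId: Map<string, ExistingRow>
): Classification {
  const existing = existingByExternalId.get(doc.externalId)
  if (doc.skippedReason !== undefined) {
    // A skipped doc with an existing row keeps the last-known-good content.
    // Without an existing row it becomes a new failed row.
    return existing ? 'unchanged' : 'skip'
  }
  if (!existing) return 'add'
  // Assumption: change detection compares content hashes.
  return existing.contentHash === doc.contentHash ? 'unchanged' : 'update'
}
```

Keeping the decision in one pure function makes each branch of the diagram easy to unit-test without touching the database.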

Reviews (2): Last reviewed commit: "fix accounting issue"

icecrasher321 · 2026-06-16T19:59:16Z

@greptile

icecrasher321 · 2026-06-16T19:59:20Z

bugbot run

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e1bece6.

icecrasher321 added 4 commits June 15, 2026 20:25

Merge branch 'staging' into improvement/ten-mb-lims

2019e88

# Conflicts: # apps/sim/connectors/utils.test.ts

fix zoom

66b0f58

update skill

26cf668

vercel Bot deployed to Preview June 16, 2026 03:47 View deployment

icecrasher321 changed the title ~~fix(execution): offload large function inputs~~ improvement(execution, connectors): offload large function inputs, increase connector limits + better error propagation Jun 16, 2026

icecrasher321 marked this pull request as ready for review June 16, 2026 04:06

cursor Bot reviewed Jun 16, 2026

Comment thread apps/sim/connectors/dropbox/dropbox.ts

Comment thread apps/sim/executor/variables/resolver.ts

greptile-apps Bot reviewed Jun 16, 2026

Comment thread apps/sim/lib/knowledge/connectors/sync-engine.ts

icecrasher321 added 3 commits June 16, 2026 11:24

Merge branch 'staging' into improvement/ten-mb-lims

9908401

address comments + fix terminal event in sse stream

dfb1b33

fix accounting issue

e1bece6

vercel Bot temporarily deployed to Preview June 16, 2026 19:59 Inactive

cursor Bot reviewed Jun 16, 2026

Comment thread apps/sim/connectors/utils.ts

icecrasher321 merged commit feca5fa into staging Jun 16, 2026
16 checks passed

icecrasher321 merged 7 commits into staging from improvement/ten-mb-lims
