improvement(execution, connectors): offload large function inputs, increase connector limits + better error propagation #5089

Merged
icecrasher321 merged 7 commits into staging from improvement/ten-mb-lims on Jun 16, 2026

@icecrasher321

Collaborator

Summary

Fixes a class of 10 MB limit failures across workflow execution and KB connectors.

  • Function blocks: over-budget resolved block-output context values are offloaded to durable large-value refs and lazily re-read in the sandbox (sim.values.read), so a JS function can merge medium files without busting the 10 MB inter-block request-body cap (the original "Seedance" merge failure).
  • KB connectors — never silent: oversized files now surface as visible failed KB documents (with a reason) instead of being silently dropped — at listing time (GitHub/S3/Dropbox/OneDrive/SharePoint) and fetch time (GitLab/Azure/Google Drive via a shared ConnectorFileTooLargeError).
  • KB connectors — memory safety: unbounded response.text() downloads replaced with streaming readBodyWithLimit (cancels past the cap; closes a Dropbox OOM/DoS gap).
  • KB connectors — cap raised: per-file limit raised from a hardcoded 10 MB to the canonical 100 MB KB document limit (CONNECTOR_MAX_FILE_BYTES), except Google Drive's export path (Google's hard 10 MB export-API limit).
  • Sync engine: classifyExternalDoc classification, bulk skipDocuments (failed rows, excluded from the stuck-doc retry sweep), byte-bounded batch concurrency so the raised cap can't OOM the worker, and a metadata.fileSize ?? size fallback so skipped rows show the real size.
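
For readers unfamiliar with the streaming pattern behind the memory-safety bullet, here is a minimal sketch of a byte-capped read. It is not the PR's `readBodyWithLimit`: the function name `readWithLimit`, the `BodyTooLargeSketchError` class, and the use of an `AsyncIterable<Uint8Array>` in place of a fetch `Response` body are all assumptions made to keep the example self-contained.

```typescript
// Hypothetical error type; the PR's ConnectorFileTooLargeError may look different.
class BodyTooLargeSketchError extends Error {
  limitBytes: number
  constructor(limitBytes: number) {
    super(`body exceeds ${limitBytes} bytes`)
    this.name = 'BodyTooLargeSketchError'
    this.limitBytes = limitBytes
  }
}

// Reads chunks until the source ends or the cap is exceeded.
// Throwing inside `for await` invokes the iterator's return(), which gives
// the source a chance to stop producing (the "cancel past the cap" step).
async function readWithLimit(
  chunks: AsyncIterable<Uint8Array>,
  limitBytes: number
): Promise<Uint8Array> {
  const parts: Uint8Array[] = []
  let total = 0
  for await (const chunk of chunks) {
    total += chunk.byteLength
    if (total > limitBytes) throw new BodyTooLargeSketchError(limitBytes)
    parts.push(chunk)
  }
  // Concatenate only after the size is known to be within the cap.
  const out = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    out.set(part, offset)
    offset += part.byteLength
  }
  return out
}
```

Unlike an unbounded `response.text()`, peak memory here is bounded by the cap plus one chunk, regardless of how large the remote file claims to be.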

Type of Change

  • Bug fix

Testing

WiP

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…onnector size limits

Addresses a class of 10 MB limit failures:

- executor/variables: offload over-budget function block-output context values to
  durable large-value refs (lazy `sim.values.read`) so JS function blocks can merge
  medium files without exceeding the 10 MB inter-block request-body cap.
- connectors: stream downloads via `readBodyWithLimit` (memory-safe), and surface
  oversized files as visible `failed` KB documents instead of silently dropping them
  — listing-time for github/s3/dropbox/onedrive/sharepoint, fetch-time for
  gitlab/azure/google-drive via a shared `ConnectorFileTooLargeError`. Raise the
  per-file cap from a hardcoded 10 MB to the canonical 100 MB KB document limit
  (`CONNECTOR_MAX_FILE_BYTES`), except Google Drive's export path (Google's hard
  10 MB export-API limit).
- sync-engine: `classifyExternalDoc` + bulk `skipDocuments` (failed rows with a
  reason, excluded from retry), byte-bounded batch concurrency to cap peak worker
  memory at the raised cap, and a `metadata.fileSize ?? size` fallback.
# Conflicts:
#	apps/sim/connectors/utils.test.ts
@vercel

vercel Bot commented Jun 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Jun 16, 2026 7:59pm |

@icecrasher321 icecrasher321 changed the title from "fix(execution): offload large function inputs" to "improvement(execution, connectors): offload large function inputs, increase connector limits + better error propagation" Jun 16, 2026
@icecrasher321 icecrasher321 marked this pull request as ready for review June 16, 2026 04:06
@cursor

cursor Bot commented Jun 16, 2026

PR Summary

Medium Risk
Touches workflow execution payloads, SSE terminal delivery, and many connector download paths; behavior changes (higher limits, failed visible rows, offload) are intentional but broad across sync and execute paths.

Overview
Addresses 10 MB request-body and unbounded-download failures in workflow execution and KB connector syncs.

Function blocks: Resolved block-output values that would blow the ~6 MB inline budget (data + display) are offloaded to durable large-value refs and read in the JS sandbox via sim.values.read, so merging several medium payloads no longer fails the internal function route. Display code shows placeholders for offloaded values.
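
A rough sketch of the budgeting decision described above, under stated assumptions: the ref shape, the synchronous `store` callback, and the size estimate (JSON string length rather than encoded bytes) are all hypothetical and are not taken from the resolver in this PR.

```typescript
// Assumed ref shape; the real large-value ref format is not shown on this page.
interface LargeValueRefSketch {
  __largeValueRef: true
  key: string
  approxBytes: number
}

// Rough size estimate: JSON length in UTF-16 code units.
// This undercounts non-ASCII text compared with real encoded bytes.
function approxInlineSize(value: unknown): number {
  const json: string | undefined = JSON.stringify(value)
  return json === undefined ? 0 : json.length
}

// Keeps values inline while they fit the running budget; anything that
// would push the total over the budget is stored and replaced by a ref.
function resolveWithinBudget(
  values: Record<string, unknown>,
  budget: number,
  store: (value: unknown) => string // hypothetical: persists and returns a key
): Record<string, unknown> {
  const resolved: Record<string, unknown> = {}
  let used = 0
  for (const name of Object.keys(values)) {
    const size = approxInlineSize(values[name])
    if (used + size <= budget) {
      resolved[name] = values[name]
      used += size
    } else {
      const ref: LargeValueRefSketch = {
        __largeValueRef: true,
        key: store(values[name]),
        approxBytes: size,
      }
      resolved[name] = ref
    }
  }
  return resolved
}
```

In this sketch a small value that arrives after an offloaded one can still be inlined, because the budget only counts what actually stays in the request body.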

Workflow SSE: If the Redis event buffer rejects a write (e.g. oversized block output), terminal events still go out on the live SSE stream so the UI does not hang on “running.”
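
The fallback can be pictured roughly as follows. This is a sketch only: the `EventBufferSketch` and `LiveStreamSketch` interfaces and the `publishTerminalEvent` name are assumptions, not the route's actual types.

```typescript
type TerminalEventSketch = { type: string; payload: unknown }

// Hypothetical interfaces standing in for the Redis buffer and the SSE writer.
interface EventBufferSketch {
  write(event: TerminalEventSketch): Promise<void>
}
interface LiveStreamSketch {
  send(event: TerminalEventSketch): void
}

// A buffer rejection (for example an oversized payload) is swallowed so the
// terminal event still reaches the live stream and the UI can leave "running".
async function publishTerminalEvent(
  event: TerminalEventSketch,
  buffer: EventBufferSketch,
  live: LiveStreamSketch
): Promise<{ buffered: boolean }> {
  let buffered = true
  try {
    await buffer.write(event)
  } catch {
    buffered = false
  }
  live.send(event)
  return { buffered }
}
```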

KB connectors: Shared CONNECTOR_MAX_FILE_BYTES (aligned with manual KB upload, up from hardcoded 10 MB) plus readBodyWithLimit instead of unbounded response.text() / truncation. Oversized files become failed rows with skippedReason via markSkipped / stubOrSkipBySize / ConnectorFileTooLargeError, at listing and fetch time, across storage-style connectors (Dropbox, OneDrive, SharePoint, S3, GitHub, GitLab, Azure DevOps, Google Drive downloads, Zoom VTT). takeIndexableWithinCap so skipped files do not consume maxFiles. Google Workspace export stays on Google’s 10 MB API limit.
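
As an illustration of the listing-time check, here is a hypothetical re-implementation. The helper names match the ones mentioned above, but the signatures, the `DocStubSketch` shape, and the reason text are assumptions; only the 100 MB figure and the `skippedReason` / `contentDeferred` fields come from this PR.

```typescript
const MAX_FILE_BYTES_SKETCH = 100 * 1024 * 1024 // the 100 MB cap named in this PR

// Assumed minimal document stub; the real ExternalDocument has more fields.
interface DocStubSketch {
  externalId: string
  title: string
  contentDeferred: boolean
  skippedReason?: string
}

// Turns a stub into a content-less row that carries a visible reason.
function markSkipped(stub: DocStubSketch, reason: string): DocStubSketch {
  return { ...stub, contentDeferred: false, skippedReason: reason }
}

// Skips only when the provider reported a size and it is over the cap;
// files with no reported size are left for the fetch-time check.
function stubOrSkipBySize(
  stub: DocStubSketch,
  size: number | undefined,
  capBytes: number = MAX_FILE_BYTES_SKETCH
): DocStubSketch {
  if (size !== undefined && size > capBytes) {
    return markSkipped(stub, `File is ${size} bytes, over the ${capBytes}-byte limit`)
  }
  return stub
}
```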

Sync engine: classifyExternalDoc, bulk skipDocuments, byte-bounded chunkOpsByByteBudget for hydration concurrency, and stuck-doc retry excludes rows with no storageKey.
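
A minimal sketch of byte-bounded batching, assuming each operation carries a known source size. The `SyncOpSketch` shape, the function name, and the choice to give an over-budget item its own chunk are assumptions, not the PR's `chunkOpsByByteBudget`.

```typescript
interface SyncOpSketch {
  id: string
  sizeBytes: number
}

// Splits ops into chunks bounded by both item count and summed bytes.
// A single op larger than maxBytes still gets a chunk of its own.
function chunkByByteBudget(
  ops: SyncOpSketch[],
  maxCount: number,
  maxBytes: number
): SyncOpSketch[][] {
  const chunks: SyncOpSketch[][] = []
  let current: SyncOpSketch[] = []
  let currentBytes = 0
  for (const op of ops) {
    const wouldOverflow =
      current.length >= maxCount || currentBytes + op.sizeBytes > maxBytes
    if (current.length > 0 && wouldOverflow) {
      chunks.push(current)
      current = []
      currentBytes = 0
    }
    current.push(op)
    currentBytes += op.sizeBytes
  }
  if (current.length > 0) chunks.push(current)
  return chunks
}
```

With a count-only limit, a batch of several near-cap files could be held in memory at once; bounding by summed bytes is what keeps the raised per-file cap from multiplying peak worker memory.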

Agent skill memory-load-check documents when to apply the connector size pattern.

Reviewed by Cursor Bugbot for commit e1bece6.

Comment thread apps/sim/connectors/dropbox/dropbox.ts
Comment thread apps/sim/executor/variables/resolver.ts
@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Contributor

Greptile Summary

This PR tackles a class of 10 MB cap failures across workflow execution and KB connectors by offloading oversized function-block context values to durable large-value refs, raising the per-file connector limit to the canonical 100 MB KB document cap, and replacing silent drops of oversized files with visible failed KB rows.

  • Function blocks: maybeOffloadInlineFunctionContextValue in the resolver measures the inline footprint of each block-output reference and offloads values that would push the request body past a 6 MB combined budget to durable storage, lazily re-read in the sandbox via sim.values.read.
  • KB connectors: All file-storage connectors (Dropbox, OneDrive, SharePoint, GitHub, GitLab, Azure DevOps, S3, Google Drive, Zoom) now use shared utilities — readBodyWithLimit, stubOrSkipBySize, markSkipped, takeIndexableWithinCap — to stream-cap downloads, surface oversized files as visible failed rows rather than silently dropping them, and apply the cap correctly to the indexable-file quota.
  • Sync engine: classifyExternalDoc centralises reconciliation logic; chunkOpsByByteBudget bounds peak worker memory by splitting batches on both count and summed source bytes; skipDocuments bulk-inserts content-less failed rows; the stuck-doc retry sweep is guarded with isNotNull(storageKey) to exclude unskippable rows.
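
The reconciliation rules can be sketched as a small pure function that follows the skip / add / update branches in this review's flowchart. The types, the function name, and the use of a content hash to detect changes are assumptions; the PR page does not show how `classifyExternalDoc` actually compares documents.

```typescript
type ClassificationSketch = 'add' | 'update' | 'unchanged' | 'skip'

interface ExternalDocSketch {
  externalId: string
  contentHash: string // assumed change-detection field
  skippedReason?: string
}

interface ExistingRowSketch {
  externalId: string
  contentHash: string
}

function classifyDocSketch(
  doc: ExternalDocSketch,
  existing: ExistingRowSketch | undefined
): ClassificationSketch {
  if (doc.skippedReason) {
    // A skipped doc with an existing row keeps the last-known-good row;
    // without one, it becomes a visible failed row.
    return existing ? 'unchanged' : 'skip'
  }
  if (!existing) return 'add'
  return existing.contentHash === doc.contentHash ? 'unchanged' : 'update'
}
```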

Confidence Score: 4/5

Safe to merge with awareness of the two P2 notes — no blocking defects introduced.

The changes are broad (19 files, three distinct subsystems) but well-structured: shared utilities have unit tests, the sync engine's new classification and batching paths are tested, and the resolver offload logic includes thorough test coverage. The two observations are minor: the takeIndexableWithinCap boundary behaviour is explicitly documented as intentional, and the storageKey null assumption holds because addDocument uploads before committing the row. No correctness regression or data-loss path was identified.

apps/sim/lib/knowledge/connectors/sync-engine.ts and apps/sim/connectors/utils.ts warrant a second read — the new takeIndexableWithinCap boundary behaviour means oversized files that appear after the indexable quota is saturated are still silently excluded from the failed-row surface.
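
To make the boundary behaviour concrete, here is a hypothetical loop with the same shape as the one described: skipped stubs ride along without consuming quota, and the loop breaks at the first indexable stub past the quota. The exact break position in the real `takeIndexableWithinCap` is an assumption.

```typescript
interface CapStubSketch {
  id: string
  skippedReason?: string
}

function takeWithinCapSketch(stubs: CapStubSketch[], maxIndexable: number): CapStubSketch[] {
  const taken: CapStubSketch[] = []
  let indexable = 0
  for (const stub of stubs) {
    if (stub.skippedReason) {
      taken.push(stub) // visible failed row, not counted against the quota
      continue
    }
    if (indexable >= maxIndexable) break // everything after this point is dropped
    taken.push(stub)
    indexable++
  }
  return taken
}

const listed: CapStubSketch[] = [
  { id: 'a' },
  { id: 'BIG1', skippedReason: 'too large' },
  { id: 'b' },
  { id: 'c' },
  { id: 'BIG2', skippedReason: 'too large' },
]
// With a quota of 2, 'c' triggers the break, so 'BIG2' never surfaces as a failed row.
const kept = takeWithinCapSketch(listed, 2).map((s) => s.id)
```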

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/knowledge/connectors/sync-engine.ts | Major additions: classifyExternalDoc, chunkOpsByByteBudget, skipDocuments, and byte-bounded batch concurrency. Logic is sound; skipped-file visibility and duplicate-insert safety are correctly handled via existingByExternalId. One P2 note around the storageKey null filter assumption. |
| apps/sim/connectors/utils.ts | New shared utilities: CONNECTOR_MAX_FILE_BYTES, readBodyWithLimit, markSkipped, stubOrSkipBySize, isSkippedDocument, takeIndexableWithinCap, ConnectorFileTooLargeError. Implementation is correct; the takeIndexableWithinCap loop-break behavior means skipped items after the cap boundary can be silently dropped (P2, documented as intentional). |
| apps/sim/executor/variables/resolver.ts | Adds maybeOffloadInlineFunctionContextValue to offload oversized block-output values to durable large-value refs before they bust the 10 MB request body cap. Budget tracking and authorization (mergeLargeValueKeys) look correct; fallback on store failure leaves value inline. |
| apps/sim/app/api/workflows/[id]/execute/route.ts | Wraps event-buffer writes in try/catch so a buffer rejection (e.g. oversized payload) falls through to live SSE delivery rather than crashing the stream. terminalEventPublished is still set so finalization closes cleanly. |
| apps/sim/connectors/dropbox/dropbox.ts | Replaces isSupportedFile (silently filtered oversized) with isDownloadableFile + stubOrSkipBySize, fixes the OOM/DoS gap by replacing response.text() with readBodyWithLimit, and surfaces oversized files as visible failed rows at both listing and fetch time. |
| apps/sim/connectors/google-drive/google-drive.ts | Export path adds 403/exportSizeLimitExceeded detection to throw ConnectorFileTooLargeError; download path switches from truncating response.text() to readBodyWithLimit at 100 MB; both paths now surface oversized files as skipped failed rows. |
| apps/sim/connectors/s3/s3.ts | Listing now uses stubOrSkipBySize + takeIndexableWithinCap; getDocument returns markSkipped stubs instead of null for oversized objects; slicedSome computation changed from explicit slice tracking to documents.length < stubs.length (semantically equivalent). |
| apps/sim/connectors/zoom/zoom.ts | Applies the standard connector size pattern (stubOrSkipBySize at listing, readBodyWithLimit at fetch, markSkipped on overflow) to VTT transcript downloads; adds fileSize to metadata for accurate size display in KB UI. |
| apps/sim/connectors/types.ts | Adds optional skippedReason field to ExternalDocument; well-documented with clear semantics for the sync engine. |
| apps/sim/lib/execution/payloads/large-value-ref.ts | Exports formatLargeValueSize so the resolver can generate human-readable placeholder display text for offloaded refs. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Connector listDocuments] --> B{size reported?}
    B -- yes, size > cap --> C[stubOrSkipBySize → markSkipped\nskippedReason set, contentDeferred=false]
    B -- no size or within cap --> D[Normal stub\ncontentDeferred=true]
    C --> E[takeIndexableWithinCap\nSkipped items ride along,\nnot counted against quota]
    D --> E
    E --> F[classifyExternalDoc]
    F -- skip, no existing row --> G[skip op → skipDocuments\nbulk-insert failed rows]
    F -- skip, existing row --> H[unchanged ++\nlast-known-good kept]
    F -- add/update --> I[contentOps]
    I --> J[chunkOpsByByteBudget\ncount + byte budget]
    J --> K{contentDeferred?}
    K -- yes --> L[getDocument hydration]
    L -- skippedReason at fetch time --> G
    L -- content OK --> M[addDocument / updateDocument]
    K -- no --> M
    G --> N[DB: failed row\nstorageKey=null]
    M --> O[DB: pending row\nstorageKey set]
    O --> P[stuck-doc retry sweep\nisNotNull storageKey excludes N]

Reviews (2): Last reviewed commit: "fix accounting issue"

Comment thread apps/sim/lib/knowledge/connectors/sync-engine.ts
@icecrasher321

Collaborator Author

@greptile

@icecrasher321

Collaborator Author

bugbot run

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e1bece6.

Comment thread apps/sim/connectors/utils.ts
@icecrasher321 icecrasher321 merged commit feca5fa into staging Jun 16, 2026
16 checks passed