ci: Version Packages by github-actions[bot] · Pull Request #778 · TanStack/ai

github-actions · 2026-06-17T13:25:38Z

This PR was opened by the Changesets release GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine: whenever you add more changesets to main, this PR will be updated.

Releases

@tanstack/ai@0.32.0

Minor Changes

#624 8fa6cc5 - Add a Google Veo video adapter (geminiVideo / createGeminiVideo) and the
per-model typed-duration video contract it is built on (#534, #634).

@tanstack/ai (additive, non-breaking): VideoAdapter /
BaseVideoAdapter gain a TModelDurationByName generic (defaulting to
Record<string, number>, preserving today's duration?: number typing for
adapters without a map) plus two introspection methods with safe defaults:
- availableDurations() — a DurationOptions tagged union
  (discrete | range | mixed | none) describing the durations the current
  model accepts. Default: { kind: 'none' }.
- snapDuration(seconds) — coerce raw seconds to the closest valid duration
  (snapToDurationOption is exported for adapter authors). Default:
  undefined.
generateVideo({ duration }) is now typed per model via
VideoDurationForAdapter<TAdapter>.
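
To make the duration contract concrete, here is a minimal sketch of snapping raw seconds against a DurationOptions-like union. The field names (values, min, max), the helper names closestTo and snapSketch, the handling of the mixed kind, and the tie-breaking rule are all assumptions for illustration. The changelog only names the four kinds and the snapDuration(7) → 6 example for 4 | 6 | 8; this is not the exported snapToDurationOption implementation.

```typescript
// Hypothetical shape: the changelog names the four kinds but not their fields.
type DurationOptionsSketch =
  | { kind: 'discrete'; values: number[] }
  | { kind: 'range'; min: number; max: number }
  | { kind: 'mixed'; values: number[]; min: number; max: number }
  | { kind: 'none' }

// Closest entry in a list. On a tie the lower value wins (an assumption,
// chosen so that 7 snaps to 6 for the set 4 | 6 | 8).
function closestTo(values: number[], seconds: number): number | undefined {
  let best: number | undefined
  for (const v of values.slice().sort((a, b) => a - b)) {
    if (best === undefined || Math.abs(v - seconds) < Math.abs(best - seconds)) {
      best = v
    }
  }
  return best
}

function snapSketch(
  options: DurationOptionsSketch,
  seconds: number,
): number | undefined {
  switch (options.kind) {
    case 'none':
      // No duration information: nothing to snap to.
      return undefined
    case 'discrete':
      return closestTo(options.values, seconds)
    case 'range':
      // Clamp into [min, max].
      return Math.min(options.max, Math.max(options.min, seconds))
    case 'mixed':
      // Assumed semantics: values inside the range pass through, anything
      // else snaps to the closest discrete value or range bound.
      if (seconds >= options.min && seconds <= options.max) return seconds
      return closestTo(options.values.concat([options.min, options.max]), seconds)
  }
}
```

With the Veo 3 set, snapSketch({ kind: 'discrete', values: [4, 6, 8] }, 7) returns 6, which matches the adapter example in this entry.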

@tanstack/ai-gemini: new Veo adapter over the long-running
:predictLongRunning operation, supporting veo-3.1-generate-preview,
veo-3.1-fast-generate-preview, veo-3.0-generate-001,
veo-3.0-fast-generate-001, and veo-2.0-generate-001:
- geminiVideo('veo-3.0-generate-001') → duration?: 4 | 6 | 8
  (Veo 2: 5 | 6 | 8); adapter.snapDuration(7) → 6.
- Multimodal prompts: the first un-roled / 'start_frame' image part
  becomes the input image, 'end_frame' → lastFrame, 'reference' /
  'character' → referenceImages.
- size takes Veo aspect ratios ('16:9' | '9:16'); everything else from
  the SDK's GenerateVideosConfig (e.g. resolution, generateAudio,
  negativePrompt) is available through modelOptions.
- Responsible-AI filtering is surfaced as a failed job with the filter
  reasons.
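
The role routing in the multimodal-prompt bullet above can be sketched roughly as follows. The part shape (a flat url plus optional role), the names ImagePartSketch, VeoInputsSketch and routeVeoImages, and the choice to ignore unroutable parts are assumptions; the real adapter reads metadata.role from prompt parts and may treat extra images differently.

```typescript
type RoleSketch =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

// Hypothetical, simplified image part.
interface ImagePartSketch {
  type: 'image'
  url: string
  role?: RoleSketch
}

// Field names follow the changelog: input image, lastFrame, referenceImages.
interface VeoInputsSketch {
  image?: string
  lastFrame?: string
  referenceImages: string[]
}

function routeVeoImages(parts: ImagePartSketch[]): VeoInputsSketch {
  const out: VeoInputsSketch = { referenceImages: [] }
  for (const part of parts) {
    if (part.role === 'end_frame') {
      out.lastFrame = part.url
    } else if (part.role === 'reference' || part.role === 'character') {
      out.referenceImages.push(part.url)
    } else if (
      (part.role === undefined || part.role === 'start_frame') &&
      out.image === undefined
    ) {
      // Only the first un-roled / start_frame image becomes the input image.
      out.image = part.url
    }
    // Anything else (a second un-roled image, mask, control) is skipped in
    // this sketch; the changelog does not say what the adapter does.
  }
  return out
}
```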
Note: Veo result URLs are served by the Gemini Files API and require the
Google API key to download (x-goog-api-key header or key query
parameter).
#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).
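
As a rough illustration of the prompt shape and the runtime backstop described above, the sketch below builds an ordered prompt with role and tag hints and rejects parts a model does not accept. The part field names (content, source, value), the type names, and assertModalitiesSketch are assumptions for illustration, not the library's exported types or its actual check.

```typescript
type PartTypeSketch = 'text' | 'image' | 'video' | 'audio'

interface TextPartSketch {
  type: 'text'
  content: string
}

interface MediaPartSketch {
  type: 'image' | 'video' | 'audio'
  source: { type: 'url' | 'data'; value: string }
  metadata?: {
    role?: 'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'
    tag?: string
  }
}

type MediaPromptSketch = string | Array<TextPartSketch | MediaPartSketch>

// Runtime backstop: throw when a part type is outside the model's modalities.
function assertModalitiesSketch(
  prompt: MediaPromptSketch,
  allowed: ReadonlyArray<PartTypeSketch>,
  model: string,
): void {
  if (typeof prompt === 'string') return
  for (const part of prompt) {
    if (allowed.indexOf(part.type) === -1) {
      throw new Error(`${model} does not accept ${part.type} prompt parts`)
    }
  }
}

// Order matters: each text part refers to the image that follows it.
const examplePrompt: MediaPromptSketch = [
  { type: 'text', content: 'Not like this:' },
  {
    type: 'image',
    source: { type: 'url', value: 'https://example.test/avoid.png' },
    metadata: { role: 'reference', tag: 'avoid' },
  },
  { type: 'text', content: 'more like this:' },
  {
    type: 'image',
    source: { type: 'url', value: 'https://example.test/target.png' },
    metadata: { role: 'reference', tag: 'target' },
  },
]
```

In the real SDK the same mistake is reported first as a compile-time type error; a check like this one only stands in for the runtime throw.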

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single conversion point from the canonical interleaved prompt shape to flattened text plus per-modality part buckets.
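
For adapter authors, the flattening step might look roughly like the sketch below. It is not the real resolveMediaPrompt() signature or return shape: flattenPromptSketch, the bucket names, the part fields, and joining text with a single space are all assumptions.

```typescript
type ModalitySketch = 'image' | 'video' | 'audio'

interface TextPartSketch {
  type: 'text'
  content: string
}

interface MediaPartSketch {
  type: ModalitySketch
  source: { type: 'url' | 'data'; value: string }
}

type PromptSketch = string | Array<TextPartSketch | MediaPartSketch>

interface ResolvedSketch {
  text: string
  image: MediaPartSketch[]
  video: MediaPartSketch[]
  audio: MediaPartSketch[]
}

function flattenPromptSketch(prompt: PromptSketch): ResolvedSketch {
  const resolved: ResolvedSketch = { text: '', image: [], video: [], audio: [] }
  if (typeof prompt === 'string') {
    resolved.text = prompt
    return resolved
  }
  const texts: string[] = []
  for (const part of prompt) {
    if (part.type === 'text') {
      texts.push(part.content)
    } else {
      // Media parts keep their relative order inside each bucket.
      resolved[part.type].push(part)
    }
  }
  resolved.text = texts.join(' ')
  return resolved
}
```

Providers that accept interleaved content can skip this step and map parts one by one; the buckets matter for providers that take a single text prompt plus separate media fields.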

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5]:
- @tanstack/ai-event-client@0.6.3

@tanstack/ai-client@0.18.0

Minor Changes

#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single conversion point from the canonical interleaved prompt shape to flattened text plus per-modality part buckets.

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().
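
Because the prompt now crosses the wire as structured data, a server route may want to validate it before calling generateImage() / generateVideo(). The guard below is a minimal sketch under assumed part shapes; isMediaPromptSketch is a hypothetical helper, not something exported by @tanstack/ai-client.

```typescript
// Accepts a plain string, or an array whose entries all carry a known part type.
function isMediaPromptSketch(value: unknown): boolean {
  if (typeof value === 'string') return true
  if (!Array.isArray(value)) return false
  return value.every((part: unknown) => {
    if (typeof part !== 'object' || part === null) return false
    const type = (part as { type?: unknown }).type
    return type === 'text' || type === 'image' || type === 'video' || type === 'audio'
  })
}

// Browser side: the prompt is serialized as JSON in the request body.
const requestBody = JSON.stringify({
  prompt: [
    { type: 'text', content: 'Animate this frame' },
    {
      type: 'image',
      source: { type: 'url', value: 'https://example.test/frame.png' },
      metadata: { role: 'start_frame' },
    },
  ],
})

// Server side: parse, then validate before handing the prompt to the SDK.
const receivedPrompt: unknown = JSON.parse(requestBody).prompt
const promptIsValid = isMediaPromptSketch(receivedPrompt)
```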

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-event-client@0.6.3

@tanstack/ai-fal@0.9.0

Minor Changes

#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
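
The fal.ai defaults for endpoints that are not in the generated map can be sketched as a small mapper. Only the field names come from the changelog; falDefaultFieldsSketch, the flat { url, role } part shape, and the fallback for start_frame / end_frame on image endpoints are assumptions.

```typescript
type FalRoleSketch =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

interface FalImageSketch {
  url: string
  role?: FalRoleSketch
}

function falDefaultFieldsSketch(
  images: FalImageSketch[],
  endpointKind: 'image' | 'video',
): Record<string, string | string[]> {
  const fields: Record<string, string | string[]> = {}
  const plain: string[] = []
  const references: string[] = []

  for (const img of images) {
    if (img.role === 'mask') fields['mask_url'] = img.url
    else if (img.role === 'control') fields['control_image_url'] = img.url
    else if (img.role === 'reference' || img.role === 'character') references.push(img.url)
    else if (endpointKind === 'video' && img.role === 'start_frame') fields['start_image_url'] = img.url
    else if (endpointKind === 'video' && img.role === 'end_frame') fields['end_image_url'] = img.url
    // Un-roled images (and, in this sketch, frame roles on image endpoints)
    // fall through to the generic image field(s).
    else plain.push(img.url)
  }

  if (references.length > 0) fields['reference_image_urls'] = references
  const first = plain[0]
  if (plain.length === 1 && first !== undefined) fields['image_url'] = first
  else if (plain.length > 1) fields['image_urls'] = plain
  return fields
}
```

Endpoints listed in the generated map would override these names, for example image_urls for nano-banana edit.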
A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single conversion point from the canonical interleaved prompt shape to flattened text plus per-modality part buckets.

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-gemini@0.17.0

Minor Changes

#624 8fa6cc5 - Add a Google Veo video adapter (geminiVideo / createGeminiVideo) and the
per-model typed-duration video contract it is built on (#534, #634).

@tanstack/ai (additive, non-breaking): VideoAdapter /
BaseVideoAdapter gain a TModelDurationByName generic (defaulting to
Record<string, number>, preserving today's duration?: number typing for
adapters without a map) plus two introspection methods with safe defaults:
- availableDurations() — a DurationOptions tagged union
  (discrete | range | mixed | none) describing the durations the current
  model accepts. Default: { kind: 'none' }.
- snapDuration(seconds) — coerce raw seconds to the closest valid duration
  (snapToDurationOption is exported for adapter authors). Default:
  undefined.
generateVideo({ duration }) is now typed per model via
VideoDurationForAdapter<TAdapter>.

@tanstack/ai-gemini: new Veo adapter over the long-running
:predictLongRunning operation, supporting veo-3.1-generate-preview,
veo-3.1-fast-generate-preview, veo-3.0-generate-001,
veo-3.0-fast-generate-001, and veo-2.0-generate-001:
- geminiVideo('veo-3.0-generate-001') → duration?: 4 | 6 | 8
  (Veo 2: 5 | 6 | 8); adapter.snapDuration(7) → 6.
- Multimodal prompts: the first un-roled / 'start_frame' image part
  becomes the input image, 'end_frame' → lastFrame, 'reference' /
  'character' → referenceImages.
- size takes Veo aspect ratios ('16:9' | '9:16'); everything else from
  the SDK's GenerateVideosConfig (e.g. resolution, generateAudio,
  negativePrompt) is available through modelOptions.
- Responsible-AI filtering is surfaced as a failed job with the filter
  reasons.
Note: Veo result URLs are served by the Gemini Files API and require the
Google API key to download (x-goog-api-key header or key query
parameter).
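
The two download options in the note above could be prepared like this. Both helper names are hypothetical, and the sketch only builds the request pieces; it assumes the caller passes them to fetch() or another HTTP client.

```typescript
// Option 1: send the key in the x-goog-api-key header.
function veoDownloadRequestSketch(
  resultUrl: string,
  apiKey: string,
): { url: string; headers: Record<string, string> } {
  return { url: resultUrl, headers: { 'x-goog-api-key': apiKey } }
}

// Option 2: append the key as a `key` query parameter.
function veoDownloadUrlWithKeySketch(resultUrl: string, apiKey: string): string {
  const separator = resultUrl.indexOf('?') === -1 ? '?' : '&'
  return `${resultUrl}${separator}key=${encodeURIComponent(apiKey)}`
}
```

The header form keeps the key out of URLs that may end up in logs, so it is usually the safer choice on a server.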
#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single conversion point from the canonical interleaved prompt shape to flattened text plus per-modality part buckets.

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-grok@0.12.0

Minor Changes

#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
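
The Grok size template from the list above ('16:9_2k', resolution suffix optional) can be split as in the sketch below. parseGrokSizeSketch and its return shape are hypothetical, and how the adapter forwards the two values to xAI is not stated in this changelog.

```typescript
interface GrokSizeSketch {
  aspectRatio: string
  resolution?: string
}

function parseGrokSizeSketch(size: string): GrokSizeSketch {
  // Template: aspectRatio, optionally followed by "_" and a resolution.
  const [aspectRatio, resolution] = size.split('_')
  if (aspectRatio === undefined || !/^\d+:\d+$/.test(aspectRatio)) {
    throw new Error(`Invalid size "${size}", expected e.g. "16:9" or "16:9_2k"`)
  }
  return resolution === undefined ? { aspectRatio } : { aspectRatio, resolution }
}
```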
A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single conversion point from the canonical interleaved prompt shape to flattened text plus per-modality part buckets.

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/openai-base@0.8.5

@tanstack/ai-openai@0.15.0

Minor Changes

#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
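
The OpenAI image routing described in the list above amounts to a small decision table. The sketch below encodes the limits stated in this entry; routeOpenAIImageSketch is a hypothetical helper, and throwing when the source-image limit is exceeded is an assumption, since the changelog only states the limits.

```typescript
type OpenAIImageRouteSketch = 'generate' | 'edit'

// Maximum source images per model, as listed in the changelog.
const MAX_SOURCE_IMAGES: Record<string, number> = {
  'gpt-image-2': 16,
  'gpt-image-1': 16,
  'gpt-image-1-mini': 16,
  'dall-e-2': 1,
}

function routeOpenAIImageSketch(
  model: string,
  imagePartCount: number,
): OpenAIImageRouteSketch {
  // Text-only prompts keep using plain generation.
  if (imagePartCount === 0) return 'generate'
  const max = MAX_SOURCE_IMAGES[model]
  if (max === undefined) {
    // e.g. dall-e-3, which rejects image parts.
    throw new Error(`${model} does not accept image prompt parts`)
  }
  if (imagePartCount > max) {
    throw new Error(`${model} accepts at most ${max} source image(s)`)
  }
  return 'edit'
}
```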
A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single conversion point from the canonical interleaved prompt shape to flattened text plus per-modality part buckets.

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0
- @tanstack/openai-base@0.8.5

@tanstack/ai-openrouter@0.14.0

Minor Changes

#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
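
The fal.ai defaults listed above (for endpoints that are not in the generated map) can be sketched as a small routing function. This is a minimal sketch under stated assumptions, not the adapter's actual implementation: `ImageInput`, `FalFields`, and `defaultFalFields` are hypothetical names, and the sketch assumes every input has already been resolved to a URL string. Only the field names and the role-to-field rules are taken from the notes.

```typescript
type Role =
  | 'reference' | 'mask' | 'control'
  | 'start_frame' | 'end_frame' | 'character'

// Hypothetical input shape: one already-resolved URL plus an optional role.
interface ImageInput { url: string; role?: Role }
type FalFields = Record<string, string | string[]>

function defaultFalFields(images: ImageInput[]): FalFields {
  const fields: FalFields = {}
  const unroled: string[] = []
  const references: string[] = []

  for (const img of images) {
    switch (img.role) {
      case 'mask': fields['mask_url'] = img.url; break
      case 'control': fields['control_image_url'] = img.url; break
      case 'reference':
      case 'character': references.push(img.url); break
      // The notes list these two defaults for video endpoints.
      case 'start_frame': fields['start_image_url'] = img.url; break
      case 'end_frame': fields['end_image_url'] = img.url; break
      default: unroled.push(img.url)
    }
  }

  if (references.length > 0) fields['reference_image_urls'] = references

  // Un-roled images: a single one goes to image_url, several to image_urls.
  const [first] = unroled
  if (first !== undefined) {
    if (unroled.length === 1) fields['image_url'] = first
    else fields['image_urls'] = unroled
  }
  return fields
}
```

Endpoints with nonstandard fields are resolved from the generated map instead, so a lookup in that map would take precedence over these defaults.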

A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single point for lowering the canonical interleaved prompt shape into flattened text plus per-modality part buckets.
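
To make "flattened text plus per-modality part buckets" concrete, here is a minimal sketch of that kind of lowering. It is not the real resolveMediaPrompt(): the `Part` and `Flattened` shapes, the `flattenPrompt` name, and the choice to join text parts with a single space are all assumptions made for illustration.

```typescript
// Hypothetical, simplified part shape: media parts are reduced to a URL.
type Part =
  | { type: 'text'; content: string }
  | { type: 'image' | 'video' | 'audio'; url: string }

interface Flattened {
  text: string
  images: string[]
  videos: string[]
  audios: string[]
}

function flattenPrompt(prompt: string | Part[]): Flattened {
  const out: Flattened = { text: '', images: [], videos: [], audios: [] }
  if (typeof prompt === 'string') return { ...out, text: prompt }

  const texts: string[] = []
  for (const part of prompt) {
    if (part.type === 'text') texts.push(part.content)
    else if (part.type === 'image') out.images.push(part.url)
    else if (part.type === 'video') out.videos.push(part.url)
    else out.audios.push(part.url)
  }
  // Assumption: text parts are joined with a space, in their original order.
  out.text = texts.join(' ')
  return out
}
```

Each bucket keeps its parts in prompt order, which matters for providers that address inputs by position.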

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-angular@0.1.4

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-anthropic@0.15.5

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-code-mode@0.2.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-code-mode-skills@0.2.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-code-mode@0.2.9

@tanstack/ai-devtools-core@0.4.12

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-event-client@0.6.3

@tanstack/ai-elevenlabs@0.2.24

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-event-client@0.6.3

Patch Changes

#624 8fa6cc5 - generateImage() and generateVideo() now accept a multimodal prompt: a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful — "not like this (image), more like this (image)" — and each media part may carry a metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field, plus an informational metadata.tag label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).
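
The compile-time narrowing mentioned above can be approximated with a per-model modality map and an indexed-access type. This is a sketch of the general technique only: the `InputModalities` map, the part shapes, and `checkPrompt` are hypothetical and do not reflect the SDK's real types; the two model names are taken from the notes.

```typescript
// Hypothetical modality map: which part kinds each model accepts.
interface InputModalities {
  'dall-e-3': 'text'
  'gpt-image-1': 'text' | 'image'
}

type TextPart = { type: 'text'; content: string }
type ImagePart = { type: 'image'; url: string }
type PartByModality = { text: TextPart; image: ImagePart }

// The accepted parts for a model are looked up through its modality entry.
type PromptFor<M extends keyof InputModalities> =
  | string
  | Array<PartByModality[InputModalities[M]]>

// Hypothetical stand-in for a generate call; returns the number of parts.
function checkPrompt<M extends keyof InputModalities>(
  model: M,
  prompt: PromptFor<M>,
): number {
  return typeof prompt === 'string' ? 1 : prompt.length
}

// Accepted: this model's entry includes 'image'.
checkPrompt('gpt-image-1', [
  { type: 'text', content: 'edit this' },
  { type: 'image', url: 'https://example.com/a.png' },
])

// @ts-expect-error image parts are rejected for a text-only model
checkPrompt('dall-e-3', [{ type: 'image', url: 'https://example.com/a.png' }])
```

Type checks disappear at runtime, which is why the notes also mention a runtime throw as a backstop for untyped callers.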

Prompt text is always sent verbatim — the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance @Image1, OpenAI / FLUX.2 "image 1" prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:
- OpenAI image — Prompts with image parts route gpt-image-2 / gpt-image-1 / gpt-image-1-mini to images.edit() (up to 16 source images plus optional mask); dall-e-2 routes to images.edit() with one source image; dall-e-3 rejects image parts at compile time and at runtime.
- OpenAI video — Sora-2 / Sora-2-Pro accept a single image part as input_reference; passing more than one throws.
- Gemini image — Native models (gemini-*-flash-image, "nano-banana") map prompt parts 1:1 onto multimodal contents, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- fal.ai — Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (362 endpoints with nonstandard fields, e.g. nano-banana edit → image_urls, Kling i2v start frame → image_url, Veo first-last-frame → first_frame_url / last_frame_url). Defaults for endpoints not in the map: single → image_url, multiple → image_urls; role: 'mask' → mask_url; role: 'control' → control_image_url; role: 'reference' / 'character' → reference_image_urls; video role: 'start_frame' / 'end_frame' → start_image_url / end_image_url. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with pnpm generate:fal-image-fields (a unit test fails when it goes stale). In FalImageProviderOptions / FalVideoProviderOptions, media-conditioning fields the mappers can populate (image_url, start_image_url, video_url, audio_url, …) are demoted from required to optional — supply them as prompt parts, or keep passing them explicitly via modelOptions.
- Grok — New grok-imagine-image / grok-imagine-image-quality models. Prompts with image parts route to xAI's JSON /v1/images/edits endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). role: 'mask' / 'control' throw. Their size uses an aspectRatio_resolution template ('16:9_2k', suffix optional) mirroring Gemini's native image models. grok-2-image-1212 remains text-to-image only.
- OpenRouter — Prompt parts map 1:1 onto multimodal text / image_url chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); data sources become data URIs.
- Anthropic — Unchanged (no image generation API).
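
The Grok size template in the list above ('16:9_2k', with the resolution suffix optional) can be split as sketched below. `parseSize` and `ParsedSize` are hypothetical names, and the regular expression is an assumption about the template's shape; the set of resolution values the provider actually accepts is not stated in the notes and is not validated here.

```typescript
interface ParsedSize { aspectRatio: string; resolution?: string }

// Split an "aspectRatio_resolution" string; the "_resolution" part is optional.
function parseSize(size: string): ParsedSize {
  const match = /^(\d+:\d+)(?:_(\w+))?$/.exec(size)
  if (!match) throw new Error(`Unrecognised size: ${size}`)

  const [, aspectRatio, resolution] = match
  if (aspectRatio === undefined) throw new Error(`Unrecognised size: ${size}`)

  // Omit the resolution key entirely when no suffix was given.
  return resolution === undefined
    ? { aspectRatio }
    : { aspectRatio, resolution }
}
```

For example, `parseSize('16:9_2k')` yields an aspect ratio of '16:9' and a resolution of '2k', while `parseSize('16:9')` yields only the aspect ratio.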

A new resolveMediaPrompt() utility (exported from @tanstack/ai) gives adapter authors a single point for lowering the canonical interleaved prompt shape into flattened text plus per-modality part buckets.

On the client side, ImageGenerateInput.prompt and VideoGenerateInput.prompt (@tanstack/ai-client, and the useGenerateImage / useGenerateVideo hooks built on them) are widened from string to the same MediaPrompt shape, so prompt parts can be sent from the browser through your server route to generateImage() / generateVideo().

Closes #618.

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-groq@0.4.6

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/openai-base@0.8.5

@tanstack/ai-isolate-cloudflare@0.2.25

Patch Changes

Updated dependencies []:
- @tanstack/ai-code-mode@0.2.9

@tanstack/ai-isolate-node@0.1.34

Patch Changes

Updated dependencies []:
- @tanstack/ai-code-mode@0.2.9

@tanstack/ai-isolate-quickjs@0.1.34

Patch Changes

Updated dependencies []:
- @tanstack/ai-code-mode@0.2.9

@tanstack/ai-mcp@0.1.4

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-ollama@0.8.5

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/ai-preact@0.9.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-react@0.15.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-react-ui@0.8.9

Patch Changes

Updated dependencies [8fa6cc5]:
- @tanstack/ai-client@0.18.0
- @tanstack/ai-react@0.15.9

@tanstack/ai-solid@0.13.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-solid-ui@0.7.9

Patch Changes

Updated dependencies [8fa6cc5]:
- @tanstack/ai-client@0.18.0
- @tanstack/ai-solid@0.13.9

@tanstack/ai-svelte@0.13.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-vue@0.13.9

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0
- @tanstack/ai-client@0.18.0

@tanstack/ai-vue-ui@0.2.21

Patch Changes

Updated dependencies []:
- @tanstack/ai-vue@0.13.9

@tanstack/openai-base@0.8.5

Patch Changes

Updated dependencies [8fa6cc5, 8fa6cc5]:
- @tanstack/ai@0.32.0

@tanstack/preact-ai-devtools@0.1.55

Patch Changes

Updated dependencies []:
- @tanstack/ai-devtools-core@0.4.12

@tanstack/react-ai-devtools@0.2.55

Patch Changes

Updated dependencies []:
- @tanstack/ai-devtools-core@0.4.12

@tanstack/solid-ai-devtools@0.2.55

Patch Changes

Updated dependencies []:
- @tanstack/ai-devtools-core@0.4.12

ci: Version Packages

1e93bf1

github-actions[bot] wants to merge 1 commit into main from changeset-release/main
