ci: Version Packages #778
Open
github-actions[bot] wants to merge 1 commit into
This PR was opened by the Changesets release GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine: whenever you add more changesets to main, this PR will be updated.
Releases
@tanstack/ai@0.32.0
Minor Changes
#624
8fa6cc5 - Add a Google Veo video adapter (`geminiVideo` / `createGeminiVideo`) and the per-model typed-duration video contract it is built on (#534, #634).

`@tanstack/ai` (additive, non-breaking): `VideoAdapter` / `BaseVideoAdapter` gain a `TModelDurationByName` generic (defaulting to `Record<string, number>`, preserving today's `duration?: number` typing for adapters without a map) plus two introspection methods with safe defaults:

- `availableDurations()`: a `DurationOptions` tagged union (`discrete | range | mixed | none`) describing the durations the current model accepts. Default: `{ kind: 'none' }`.
- `snapDuration(seconds)`: coerce raw seconds to the closest valid duration (`snapToDurationOption` is exported for adapter authors). Default: `undefined`.

`generateVideo({ duration })` is now typed per model via `VideoDurationForAdapter<TAdapter>`.

`@tanstack/ai-gemini`: new Veo adapter over the long-running `:predictLongRunning` operation, supporting `veo-3.1-generate-preview`, `veo-3.1-fast-generate-preview`, `veo-3.0-generate-001`, `veo-3.0-fast-generate-001`, and `veo-2.0-generate-001`:

- `geminiVideo('veo-3.0-generate-001')` → `duration?: 4 | 6 | 8` (Veo 2: `5 | 6 | 8`); `adapter.snapDuration(7)` → `6`.
- `'start_frame'` image part becomes the input image, `'end_frame'` → `lastFrame`, `'reference'` / `'character'` → `referenceImages`.
- `size` takes Veo aspect ratios (`'16:9' | '9:16'`); everything else from the SDK's `GenerateVideosConfig` (e.g. `resolution`, `generateAudio`, `negativePrompt`) is available through `modelOptions`.
- […] reasons.
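The duration contract described in this entry can be illustrated with a small sketch. Only the four `kind` tags and the `snapDuration(7)` → `6` result come from the release notes; the field names (`values`, `min`, `max`) and the `snapToNearest` helper are assumptions made for illustration, not the library's actual definitions. Ties are resolved toward the shorter duration, which is consistent with that one documented result.

```typescript
// Hypothetical shape: only the `kind` tags are taken from the release notes.
type DurationOptions =
  | { kind: 'discrete'; values: ReadonlyArray<number> }
  | { kind: 'range'; min: number; max: number }
  | { kind: 'mixed'; values: ReadonlyArray<number>; min: number; max: number }
  | { kind: 'none' }

// Pick the candidate closest to `seconds`; on a tie the smaller value wins.
function nearestOf(
  seconds: number,
  values: ReadonlyArray<number>,
): number | undefined {
  const sorted = [...values].sort((a, b) => a - b)
  let best: number | undefined
  for (const v of sorted) {
    // Strict `<` keeps the earlier (shorter) candidate when distances are equal.
    if (best === undefined || Math.abs(v - seconds) < Math.abs(best - seconds)) {
      best = v
    }
  }
  return best
}

function clamp(seconds: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, seconds))
}

// Illustrative stand-in for a snapping helper (not the exported function).
function snapToNearest(
  seconds: number,
  options: DurationOptions,
): number | undefined {
  switch (options.kind) {
    case 'none':
      return undefined
    case 'discrete':
      return nearestOf(seconds, options.values)
    case 'range':
      return clamp(seconds, options.min, options.max)
    case 'mixed':
      // Consider the discrete values and the clamped range value together.
      return nearestOf(seconds, [
        ...options.values,
        clamp(seconds, options.min, options.max),
      ])
  }
}
```

With the Veo 3 durations from the notes, `snapToNearest(7, { kind: 'discrete', values: [4, 6, 8] })` returns `6`, and a `{ kind: 'none' }` adapter returns `undefined`, matching the documented defaults.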
Note: Veo result URLs are served by the Gemini Files API and require the Google API key to download (`x-goog-api-key` header or `key` query parameter).
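As a rough sketch of what that download step might look like, the helper below attaches the key either as the `x-goog-api-key` header or as the `key` query parameter. The function names and the `KeyPlacement` type are hypothetical; only the header name and the query parameter name come from the note above.

```typescript
// Two placements named in the note: a request header or a `key` query parameter.
type KeyPlacement = 'header' | 'query'

interface DownloadRequest {
  url: string
  headers: Record<string, string>
}

// Hypothetical helper: builds the request without performing any network I/O.
function buildVeoDownloadRequest(
  resultUrl: string,
  apiKey: string,
  placement: KeyPlacement = 'header',
): DownloadRequest {
  if (placement === 'header') {
    return { url: resultUrl, headers: { 'x-goog-api-key': apiKey } }
  }
  const url = new URL(resultUrl)
  url.searchParams.set('key', apiKey)
  return { url: url.toString(), headers: {} }
}

// Hypothetical helper: fetches the video bytes using the header placement.
async function downloadVeoResult(
  resultUrl: string,
  apiKey: string,
): Promise<Uint8Array> {
  const { url, headers } = buildVeoDownloadRequest(resultUrl, apiKey)
  const response = await fetch(url, { headers })
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`)
  }
  return new Uint8Array(await response.arrayBuffer())
}
```

The header placement is used by default here because it keeps the key out of the URL, which tends to end up in logs; that is a design preference of this sketch, not something the release notes prescribe.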
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`: a plain string, or an ordered array of content parts (`TextPart` / `ImagePart` / `VideoPart` / `AudioPart`) for image-conditioned generation, image-to-image, multi-reference, image-to-video, and edit / inpaint flows. Part order is meaningful ("not like this (image), more like this (image)"), and each media part may carry a `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field, plus an informational `metadata.tag` label for your own bookkeeping. The accepted part types are narrowed per model at compile time via each adapter's input-modality map, so passing an image part to a text-only model is a type error (with a clear runtime throw as backstop).

Prompt text is always sent verbatim: the SDK never injects or rewrites in-prompt referencing markers. To reference inputs from your prompt, write the provider's own convention (fal Kling / Seedance `@Image1`, OpenAI / FLUX.2 `"image 1"` prose, Gemini content descriptions); see the image-generation docs for the per-provider table.

Provider behavior in this release:

- […] `gpt-image-2` / `gpt-image-1` / `gpt-image-1-mini` to `images.edit()` (up to 16 source images plus optional mask); `dall-e-2` routes to `images.edit()` with one source image; `dall-e-3` rejects image parts at compile time and at runtime.
- […] `input_reference`; passing more than one throws.
- […] `gemini-*-flash-image`, "nano-banana") map prompt parts 1:1 onto multimodal `contents`, preserving interleaved order. Imagen is text-only (compile-time + runtime rejection).
- […] `image_urls`, Kling i2v start frame → `image_url`, Veo first-last-frame → `first_frame_url` / `last_frame_url`). Defaults for endpoints not in the map: single → `image_url`, multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`; video `role: 'start_frame'` / `'end_frame'` → `start_image_url` / `end_image_url`. Per-model prompt modalities are derived at the type level from the SDK's endpoint input types. Regenerate the map after a fal SDK bump with `pnpm generate:fal-image-fields` (a unit test fails when it goes stale). In `FalImageProviderOptions` / `FalVideoProviderOptions`, media-conditioning fields the mappers can populate (`image_url`, `start_image_url`, `video_url`, `audio_url`, …) are demoted from required to optional: supply them as prompt parts, or keep passing them explicitly via `modelOptions`.
- […] `grok-imagine-image` / `grok-imagine-image-quality` models. Prompts with image parts route to xAI's JSON `/v1/images/edits` endpoint (up to 3 source images, addressed by xAI in request order; the prompt is sent verbatim). `role: 'mask'` / `'control'` throw. Their `size` uses an `aspectRatio_resolution` template (`'16:9_2k'`, suffix optional) mirroring Gemini's native image models. `grok-2-image-1212` remains text-to-image only.
- […] `text` / `image_url` chat content parts, preserving interleaved order, and are forwarded to the underlying image model. URL sources pass through verbatim (no fetching or re-encoding in your process); `data` sources become data URIs.

A new `resolveMediaPrompt()` utility (exported from `@tanstack/ai`) is the single downrev point from the canonical interleaved prompt shape to flattened text + per-modality part buckets, for adapter authors.

On the client side, `ImageGenerateInput.prompt` and `VideoGenerateInput.prompt` (`@tanstack/ai-client`, and the `useGenerateImage` / `useGenerateVideo` hooks built on them) are widened from `string` to the same `MediaPrompt` shape, so prompt parts can be sent from the browser through your server route to `generateImage()` / `generateVideo()`.

Closes #618.
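To make "flattened text + per-modality part buckets" concrete, here is a minimal sketch of that kind of reduction. It is not the real `resolveMediaPrompt()` and does not claim its signature: the part shapes (`content`, `url`), the `flattenPrompt` name, and the choice to join text parts with a newline are all assumptions for illustration. Only the part kinds, the role names, and the `metadata.role` / `metadata.tag` fields come from the entry above.

```typescript
type MediaRole =
  | 'reference' | 'mask' | 'control'
  | 'start_frame' | 'end_frame' | 'character'

// Hypothetical part shapes; the real TextPart / ImagePart types may differ.
interface TextPart { type: 'text'; content: string }
interface MediaPart {
  type: 'image' | 'video' | 'audio'
  url: string
  metadata?: { role?: MediaRole; tag?: string }
}
type PromptPart = TextPart | MediaPart
type MediaPrompt = string | ReadonlyArray<PromptPart>

interface FlattenedPrompt {
  text: string
  images: MediaPart[]
  videos: MediaPart[]
  audios: MediaPart[]
}

// Illustrative reduction: text is concatenated unchanged, media parts are
// bucketed by modality, and the original order is kept inside each bucket.
function flattenPrompt(prompt: MediaPrompt): FlattenedPrompt {
  const out: FlattenedPrompt = { text: '', images: [], videos: [], audios: [] }
  if (typeof prompt === 'string') {
    out.text = prompt
    return out
  }
  const texts: string[] = []
  for (const part of prompt) {
    switch (part.type) {
      case 'text': texts.push(part.content); break
      case 'image': out.images.push(part); break
      case 'video': out.videos.push(part); break
      case 'audio': out.audios.push(part); break
    }
  }
  out.text = texts.join('\n') // Assumed separator; text itself is not rewritten.
  return out
}
```

An interleaved prompt such as text, image, text, image then yields both text fragments in order plus an `images` bucket whose order matches the prompt, which is what a provider addressing inputs "in request order" would need.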
Patch Changes
Updated dependencies [8fa6cc5]:
@tanstack/ai-client@0.18.0
Minor Changes
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-fal@0.9.0
Minor Changes
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
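For fal endpoints that are not in the generated field map, the documented defaults route image parts by role. The sketch below is one possible reading of those defaults, not the adapter's code: the `ImageInput` shape and the `routeFalDefaults` name are hypothetical, and how unlabeled images combine with role-tagged ones is an assumption. The target field names are the ones listed in the entry.

```typescript
type MediaRole =
  | 'reference' | 'mask' | 'control'
  | 'start_frame' | 'end_frame' | 'character'

// Hypothetical input shape: an image URL plus an optional role hint.
interface ImageInput { url: string; role?: MediaRole }

type FalFields = Record<string, string | string[]>

// Illustrative default routing for endpoints without a field-map entry.
function routeFalDefaults(images: ReadonlyArray<ImageInput>): FalFields {
  const fields: FalFields = {}
  const unlabeled: string[] = []
  const references: string[] = []

  for (const image of images) {
    switch (image.role) {
      case 'mask': fields['mask_url'] = image.url; break
      case 'control': fields['control_image_url'] = image.url; break
      case 'start_frame': fields['start_image_url'] = image.url; break
      case 'end_frame': fields['end_image_url'] = image.url; break
      case 'reference':
      case 'character':
        references.push(image.url)
        break
      default:
        unlabeled.push(image.url) // No role hint: falls under single/multiple.
    }
  }

  if (references.length > 0) fields['reference_image_urls'] = references

  // single -> image_url, multiple -> image_urls
  const [first, ...rest] = unlabeled
  if (first !== undefined) {
    if (rest.length === 0) fields['image_url'] = first
    else fields['image_urls'] = unlabeled
  }
  return fields
}
```

Under these assumptions, one unlabeled image produces `{ image_url }`, while two unlabeled images plus a `'mask'` part produce `{ image_urls, mask_url }`.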
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-gemini@0.17.0
Minor Changes
#624
8fa6cc5 - Add a Google Veo video adapter (`geminiVideo` / `createGeminiVideo`) and the per-model typed-duration video contract it is built on (#534, #634). The full entry is identical to the first #624 entry listed under @tanstack/ai@0.32.0 above.
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-grok@0.12.0
Minor Changes
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
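The `grok-imagine-image` models take `size` as an `aspectRatio_resolution` template such as `'16:9_2k'`, with the suffix optional. A minimal parsing sketch follows; the `parseSizeTemplate` name and the `ParsedSize` shape are hypothetical, and the parser deliberately does not validate which ratios or resolutions xAI accepts, since the release notes do not list them.

```typescript
interface ParsedSize {
  aspectRatio: string
  resolution?: string
}

// Illustrative parser for the `aspectRatio_resolution` template.
// It only checks the overall shape, e.g. '16:9' or '16:9_2k'.
function parseSizeTemplate(size: string): ParsedSize {
  const match = /^(\d+:\d+)(?:_(\w+))?$/.exec(size)
  if (match === null) {
    throw new Error(`Unrecognized size template: ${size}`)
  }
  const aspectRatio = match[1]
  const resolution = match[2]
  if (aspectRatio === undefined) {
    throw new Error(`Unrecognized size template: ${size}`)
  }
  // Omit `resolution` entirely when the optional suffix is absent.
  return resolution === undefined ? { aspectRatio } : { aspectRatio, resolution }
}
```

Here `parseSizeTemplate('16:9_2k')` yields an aspect ratio of `'16:9'` and a resolution of `'2k'`, while `parseSizeTemplate('16:9')` yields only the aspect ratio.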
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-openai@0.15.0
Minor Changes
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
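The per-model source-image limits stated for the OpenAI image models (up to 16 for the `gpt-image-*` models, one for `dall-e-2`, none for `dall-e-3`) suggest a runtime backstop along the following lines. This is a sketch under assumptions: the `MAX_SOURCE_IMAGES` table and `checkSourceImageCount` are hypothetical names, the optional mask is not counted, and the adapter's real checks and error messages may differ.

```typescript
// Limits as stated in the release notes; the optional mask is not counted here.
const MAX_SOURCE_IMAGES: Readonly<Record<string, number>> = {
  'gpt-image-2': 16,
  'gpt-image-1': 16,
  'gpt-image-1-mini': 16,
  'dall-e-2': 1,
  'dall-e-3': 0,
}

// Illustrative runtime guard mirroring the compile-time narrowing.
function checkSourceImageCount(model: string, count: number): void {
  const max = MAX_SOURCE_IMAGES[model]
  if (max === undefined) return // Unknown model: this sketch does not validate.
  if (count > max) {
    throw new Error(
      max === 0
        ? `${model} does not accept image parts`
        : `${model} accepts at most ${max} source image(s), got ${count}`,
    )
  }
}
```

A guard like this only catches what the type system cannot, for example prompts assembled dynamically at runtime.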
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-openrouter@0.14.0
Minor Changes
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
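The mapping onto `text` / `image_url` chat content parts, with URL sources passed through and `data` sources turned into data URIs, might look roughly like this. The `ImageSource` and `PromptPart` shapes and the `toChatContent` name are assumptions for illustration; the output follows the OpenAI-compatible chat content-part format.

```typescript
// Hypothetical source shape: either a URL or inline base64 data.
type ImageSource =
  | { kind: 'url'; url: string }
  | { kind: 'data'; mimeType: string; base64: string }

type PromptPart =
  | { type: 'text'; content: string }
  | { type: 'image'; source: ImageSource }

// OpenAI-compatible chat content parts.
type ChatContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

// Illustrative 1:1 mapping that preserves the interleaved order of the prompt.
function toChatContent(parts: ReadonlyArray<PromptPart>): ChatContentPart[] {
  return parts.map((part): ChatContentPart => {
    if (part.type === 'text') {
      return { type: 'text', text: part.content }
    }
    const url =
      part.source.kind === 'url'
        ? part.source.url // Passed through verbatim, no fetching.
        : `data:${part.source.mimeType};base64,${part.source.base64}`
    return { type: 'image_url', image_url: { url } }
  })
}
```

Because the mapping is positional, a prompt of text, image, text comes out as three content parts in the same order.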
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-angular@0.1.4
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-anthropic@0.15.5
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-code-mode@0.2.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-code-mode-skills@0.2.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-devtools-core@0.4.12
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-elevenlabs@0.2.24
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-event-client@0.6.3
Patch Changes
#624
8fa6cc5 - `generateImage()` and `generateVideo()` now accept a multimodal `prompt`. The full entry is identical to the #624 entry listed under @tanstack/ai@0.32.0 above.
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-groq@0.4.6
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-isolate-cloudflare@0.2.25
Patch Changes
@tanstack/ai-isolate-node@0.1.34
Patch Changes
@tanstack/ai-isolate-quickjs@0.1.34
Patch Changes
@tanstack/ai-mcp@0.1.4
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-ollama@0.8.5
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-preact@0.9.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-react@0.15.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-react-ui@0.8.9
Patch Changes
Updated dependencies [8fa6cc5]:
@tanstack/ai-solid@0.13.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-solid-ui@0.7.9
Patch Changes
Updated dependencies [8fa6cc5]:
@tanstack/ai-svelte@0.13.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-vue@0.13.9
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/ai-vue-ui@0.2.21
Patch Changes
@tanstack/openai-base@0.8.5
Patch Changes
Updated dependencies [8fa6cc5, 8fa6cc5]:
@tanstack/preact-ai-devtools@0.1.55
Patch Changes
@tanstack/react-ai-devtools@0.2.55
Patch Changes
@tanstack/solid-ai-devtools@0.2.55
Patch Changes