Video Generation Node - Builder Studio Docs

The Video Generation node creates a new video from a text prompt and optional reference media. It is provider-bound and runs asynchronously: the node submits a job, polls the provider for status, and stores the finished video on the canvas as a synced media asset.

Generation is separate from the Video node

Use Video Generation to synthesize a new clip. Use the Video node to store, play, and pass through a video that already exists on the canvas. The generated output is written as durable media that downstream nodes consume on the video-out port.

What it does

Resolves a prompt from the node field or a connected text input.
Resolves the provider and model from the node preset and submits a generation job to the matching API route.
Resolves reference media — a start frame, an optional last frame, and for reference-to-video models, arrays of reference images, videos, and audio.
Polls the provider until the video is ready, then stores it durably and emits it on video-out.

Providers

Provider	Route	Models
Google	`/api/generate-video`	Google Veo through the Google GenAI SDK. Supports a primary reference image, additional reference images, and a last-frame image, validated against per-model capabilities.
fal	`/api/fal/generate-video`	Kling, Seedance, MiniMax, and Wan endpoints. The default endpoint is `bytedance/seedance-2.0/reference-to-video` when none is provided.

Two providers, not five

Like image generation, video generation resolves only the google and fal providers in the current build.

fal models

These fal video endpoints have first-class request contracts that define their generation mode and which reference fields they accept. Endpoints outside the contract list are accepted as-is and default to `image_url` as the reference field.

Model	Endpoint id	Mode
Kling 3.0 Pro	`fal-ai/kling-video/v3/pro/image-to-video`	Image-to-video
Kling 3.0 Standard	`fal-ai/kling-video/v3/standard/image-to-video`	Image-to-video
Seedance 2.0	`bytedance/seedance-2.0/image-to-video`, `bytedance/seedance-2.0/reference-to-video`	Image-to-video, reference-to-video
Seedance 2.0 Fast	`bytedance/seedance-2.0/fast/text-to-video`, `.../fast/image-to-video`, `.../fast/reference-to-video`	Text, image, and reference to video
MiniMax Hailuo 2.3	`fal-ai/minimax/hailuo-2.3/standard/text-to-video`	Text-to-video
Wan 2.7	`fal-ai/wan/v2.7/text-to-video`, `.../image-to-video`, `.../reference-to-video`	Text, image, and reference to video
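The contract lookup with its fallback can be pictured as a small table keyed by endpoint id. This is a minimal sketch, not the actual implementation: `EndpointContract`, `CONTRACTS`, `resolve_contract`, and the mode labels are hypothetical names, the table holds only two of the endpoints listed above, and the `reference_field` values for listed endpoints are placeholders. Only the `image_url` default for unlisted endpoints comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EndpointContract:
    mode: str                       # hypothetical labels: "text", "image", "reference", "unknown"
    reference_field: Optional[str]  # request field that would carry the reference image


# Hypothetical subset of the contract list; reference_field values are placeholders.
CONTRACTS = {
    "fal-ai/kling-video/v3/pro/image-to-video": EndpointContract("image", "image_url"),
    "fal-ai/minimax/hailuo-2.3/standard/text-to-video": EndpointContract("text", None),
}

# Endpoints outside the contract list are accepted with an image_url default.
FALLBACK = EndpointContract("unknown", "image_url")


def resolve_contract(endpoint_id: str) -> EndpointContract:
    """Return the first-class contract if one exists, else the fallback."""
    return CONTRACTS.get(endpoint_id, FALLBACK)
```

Keeping the fallback as an explicit value, rather than raising on unknown ids, mirrors the "accepted as-is" behavior described above.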

Generation modes

Text-to-video — prompt only. Used by MiniMax, Seedance Fast text-to-video, and Wan text-to-video.
Image-to-video — a start frame drives motion from the prompt. Image-to-video endpoints require a reference image, and most also accept a last-frame image (end_image_url or tail_image_url).
Reference-to-video — arrays of reference images, videos, and audio guide the result. Reference audio requires at least one visual reference to accompany it.
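The two input rules stated in this list (image-to-video needs a reference image, and reference audio needs a visual companion) could be checked before submission along these lines. This is an illustrative sketch: the function name, the mode strings, and the error messages are assumptions, not the node's actual validation code.

```python
def validate_mode_inputs(mode, start_image=None, ref_images=(), ref_videos=(), ref_audio=()):
    """Return a list of problems for the given mode; an empty list means OK."""
    problems = []
    if mode == "image-to-video" and not start_image:
        problems.append("image-to-video requires a reference image")
    if mode == "reference-to-video" and ref_audio and not (ref_images or ref_videos):
        # Audio alone is not enough: at least one image or video must accompany it.
        problems.append("reference audio requires at least one visual reference")
    return problems
```

For example, `validate_mode_inputs("text-to-video")` returns an empty list, since text-to-video needs only a prompt.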

fal reference media limits

Seedance reference-to-video accepts up to 9 reference images, 3 videos, and 3 audio files, capped at 12 reference files in total. Wan reference-to-video accepts up to 3 reference images and up to 3 videos, but no more than 3 files combined. Every reference URL must be a public HTTPS URL and must pass the outbound URL safety checks.
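A pre-flight check for these limits might look like the sketch below. The numeric caps come from the paragraph above. Everything else is an assumption: the `LIMITS` table shape, the function name, and the zero audio cap for Wan (the text lists only images and videos for Wan). The URL check covers the scheme only and does not stand in for the real outbound safety checks.

```python
from urllib.parse import urlparse

# Per-family caps from the limits above. "audio": 0 for Wan is an assumption.
LIMITS = {
    "seedance": {"images": 9, "videos": 3, "audio": 3, "total": 12},
    "wan": {"images": 3, "videos": 3, "audio": 0, "total": 3},
}


def check_reference_media(family, images=(), videos=(), audio=()):
    """Return a list of human-readable problems; an empty list means OK."""
    caps = LIMITS[family]
    groups = {"images": list(images), "videos": list(videos), "audio": list(audio)}
    problems = []
    for kind, urls in groups.items():
        if len(urls) > caps[kind]:
            problems.append(f"too many {kind}: {len(urls)} > {caps[kind]}")
        for url in urls:
            # Scheme check only; the real safety checks are not described here.
            if urlparse(url).scheme != "https":
                problems.append(f"not an HTTPS URL: {url}")
    total = sum(len(urls) for urls in groups.values())
    if total > caps["total"]:
        problems.append(f"too many reference files: {total} > {caps['total']}")
    return problems
```

Note that a Seedance request can satisfy every per-type cap and still fail: 9 images, 3 videos, and 3 audio files add up to 15, which exceeds the combined cap of 12.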

Inputs

Prompt — required, under the shared maximum prompt length.
Reference image — required for image-to-video models, optional otherwise, passed inline or as an HTTPS URL.
Aspect ratio — applied when the endpoint supports it. Seedance and Wan publish their supported ratios; unsupported ratios are rejected.
Model parameters — endpoint-specific extras such as duration, resolution, generate_audio, negative_prompt, cfg_scale, and seed. Keys unsupported by the chosen endpoint are filtered out.
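The two behaviors described for aspect ratio and model parameters differ: unsupported ratios are rejected, while unsupported parameter keys are silently dropped. A minimal sketch of that difference follows. The function names are hypothetical, and the allow-list and ratio set are passed in by the caller because the real per-endpoint lists are not given here.

```python
def filter_model_params(params, allowed_keys):
    """Drop keys the chosen endpoint does not support; keep the rest unchanged."""
    return {key: value for key, value in params.items() if key in allowed_keys}


def check_aspect_ratio(ratio, supported_ratios):
    """Reject a ratio the endpoint does not publish as supported."""
    if ratio not in supported_ratios:
        raise ValueError(f"unsupported aspect ratio: {ratio}")
    return ratio


# Hypothetical allow-list for some endpoint, for illustration only.
EXAMPLE_ALLOWED = {"duration", "generate_audio", "seed"}
cleaned = filter_model_params(
    {"duration": 5, "cfg_scale": 0.5, "seed": 7}, EXAMPLE_ALLOWED
)
```

Here `cleaned` keeps `duration` and `seed` and loses `cfg_scale`, with no error raised.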

Outputs

On success the node emits the finished video on the video-out port and stores it as durable media. The same asset is reusable by later sessions and by other nodes, including the Video node.

Run a generation

Add the node and set a prompt
Add an AI Video node and type a prompt or connect a text source. The configuration panel opens automatically.
Pick a model and reference media
Choose the provider preset and model. For image-to-video, connect a reference image. For reference-to-video, supply reference images, videos, or audio within the model's limits.
Submit and wait
Running the node submits an asynchronous job and returns a job id with a signed status token. The node polls the provider status endpoint until the video completes or fails.
Reuse the output
The stored video flows out of video-out for use in downstream steps.

Async and polling

Both providers run asynchronously. A POST to the route submits the job and returns a job id, a signed status token, and a poll interval. A separate GET request to the same route re-checks provider and model policy, polls the job, and on completion downloads the result into durable storage. Long jobs run under an extended request timeout.
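From a client's point of view, the submit-then-poll flow reduces to a loop like the one below. It is a sketch under stated assumptions: `fetch_status` is a caller-supplied function that would perform the status request, and the `state`, `completed`, `failed`, and `error` names are hypothetical, since the actual status payload is not documented here.

```python
import time


def poll_until_done(fetch_status, poll_interval_s, timeout_s,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job completes, fails, or times out.

    fetch_status returns a dict; the "state" key and its values are assumed.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        # Wait for the interval the submit response suggested before re-checking.
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a caller would normally leave both at their defaults and pass the poll interval returned by the submit request.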

Agent and API notes

Agents set the node prompt, provider preset, and reference media, then run the node and consume video-out. The example below is a request body sent directly to the fal video route. On the server side, reserved internal keys are stripped, unknown model parameters for the chosen endpoint are dropped, every reference URL is safety-checked, and the request is bound to a canvas.

{
  "prompt": "slow dolly across a neon-lit street at night",
  "endpointId": "fal-ai/kling-video/v3/pro/image-to-video",
  "imageUrl": "https://example.com/reference-frame.png",
  "modelParams": {
    "duration": 5,
    "generate_audio": false
  }
}
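An agent could assemble that body with a small helper such as the one sketched here. The field names (`prompt`, `endpointId`, `imageUrl`, `modelParams`) are taken from the example above; the helper name and its empty-prompt check are assumptions, and the sketch stops short of sending the request because authentication and canvas binding are not described on this page.

```python
import json


def build_fal_video_body(prompt, endpoint_id, image_url=None, model_params=None):
    """Assemble a request body shaped like the example; optional fields are omitted when unset."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    body = {"prompt": prompt, "endpointId": endpoint_id}
    if image_url is not None:
        body["imageUrl"] = image_url
    if model_params:
        body["modelParams"] = dict(model_params)  # copy so the caller's dict is not shared
    return body


payload = json.dumps(
    build_fal_video_body(
        "slow dolly across a neon-lit street at night",
        "fal-ai/kling-video/v3/pro/image-to-video",
        image_url="https://example.com/reference-frame.png",
        model_params={"duration": 5, "generate_audio": False},
    ),
    indent=2,
)
```

Serializing with `json.dumps` turns Python's `False` into JSON `false`, so `payload` matches the example body field for field.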
