Audio Generation Node
Generate speech and audio with ElevenLabs TTS.
Audio generation produces spoken voice on the canvas through ElevenLabs. It covers two voice modes: text-to-speech, which narrates a prompt with a chosen voice, and speech-to-speech, which re-voices an input recording while preserving its delivery. Both run through the shared /api/generate-audio route and store the result as durable media.
What it does
- Text-to-speech: converts a prompt into narrated audio with a selected voice and model.
- Speech-to-speech: converts an input recording into the same content spoken by a target voice. The Speech to Speech node reads its audio from a connected audio-in input.
- Decodes the returned audio, stores it durably, and emits it on the audio-out port.
Provider and defaults
Audio generation is bound to ElevenLabs. When the node does not specify a voice or model, the route applies these defaults.
| Setting | Default | Applies to |
|---|---|---|
| Voice | EXAVITQu4vr4xnSDxMaL (Sarah) | Text-to-speech and speech-to-speech |
| Text-to-speech model | eleven_multilingual_v2 | Text-to-speech |
| Speech-to-speech model | eleven_multilingual_sts_v2 | Speech-to-speech |
| Output format | audio/mpeg (MP3) | All modes |
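The fallback behaviour in the table can be sketched as below. This is a minimal illustration, not the route's actual code: the helper name apply_defaults and the dict-based request shape are assumptions, and only the default values and the mode, voice, and model field names come from this page.

```python
# Default values copied from the table above; the helper itself is hypothetical.
DEFAULT_VOICE = "EXAVITQu4vr4xnSDxMaL"  # Sarah
DEFAULT_MODELS = {
    "tts": "eleven_multilingual_v2",
    "sts": "eleven_multilingual_sts_v2",
}
OUTPUT_MIME_TYPE = "audio/mpeg"  # MP3, all modes


def apply_defaults(body: dict) -> dict:
    """Return a copy of the request body with a missing voice or model filled in."""
    mode = body.get("mode")
    if mode not in DEFAULT_MODELS:
        raise ValueError(f"unsupported mode: {mode!r}")
    resolved = dict(body)
    # Treat missing or empty values as "not specified".
    if not resolved.get("voice"):
        resolved["voice"] = DEFAULT_VOICE
    if not resolved.get("model"):
        resolved["model"] = DEFAULT_MODELS[mode]
    return resolved
```

Under these assumptions, a speech-to-speech body that names neither a voice nor a model would resolve to the Sarah voice and eleven_multilingual_sts_v2, while any explicitly supplied value is left untouched.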
Inputs
- Prompt / text: required for text-to-speech. The route accepts either text or prompt.
- Audio input: required for speech-to-speech, supplied as inline base64 (audioBase64) or a public HTTPS URL (audioUrl). The Speech to Speech node resolves this from its connected audio input.
- Voice: an ElevenLabs voice id. The route enforces voice usage access when shared credentials are used.
- Voice settings: optional stability, similarity boost, style, speed, and speaker boost, each clamped to its valid range.
- Text-to-speech extras: language code, seed, surrounding text for continuity, request id stitching, pronunciation dictionaries, and text-normalization options.
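The per-mode input requirements and the clamping of voice settings might look roughly like the sketch below. The field names text, prompt, audioBase64, and audioUrl come from this page. The voice-setting key names and the numeric ranges are assumptions made for illustration and should be checked against the ElevenLabs API reference before relying on them.

```python
from urllib.parse import urlparse

# Assumed key names and ranges; this page does not state them.
VOICE_SETTING_RANGES = {
    "stability": (0.0, 1.0),
    "similarityBoost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def clamp_voice_settings(settings: dict) -> dict:
    """Clamp known numeric settings into range and coerce speaker boost to a bool."""
    clamped = {}
    for key, (low, high) in VOICE_SETTING_RANGES.items():
        if key in settings:
            clamped[key] = min(max(float(settings[key]), low), high)
    if "useSpeakerBoost" in settings:  # hypothetical key name
        clamped["useSpeakerBoost"] = bool(settings["useSpeakerBoost"])
    return clamped


def validate_source(body: dict) -> None:
    """Raise ValueError if the input required by the selected mode is missing."""
    mode = body.get("mode")
    if mode == "tts":
        if not (body.get("text") or body.get("prompt")):
            raise ValueError("tts requires 'text' or 'prompt'")
    elif mode == "sts":
        if body.get("audioBase64"):
            return
        url = body.get("audioUrl") or ""
        if urlparse(url).scheme != "https":
            raise ValueError("sts requires 'audioBase64' or an HTTPS 'audioUrl'")
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
```

Clamping rather than rejecting out-of-range values matches the wording of the list above, where each setting is described as clamped to its valid range.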
Outputs
On success the node emits MP3 audio on the audio-out port and stores it as durable media. Downstream nodes consume it like any other audio asset, and it remains reusable across sessions.
Generate voice audio
Add a voice node
Add a Text to Speech or Speech to Speech node. The configuration panel opens automatically so you can choose a voice and model.
Provide the source
For text-to-speech, type a prompt or connect a text source. For speech-to-speech, connect an audio input on audio-in.
Tune voice and settings
Pick a voice and adjust stability, similarity, style, or speed. Speech-to-speech can also remove background noise from the source recording.
Run and reuse
Run the node to generate audio. The stored result flows out of audio-out for downstream steps.
Agent and API notes
Both modes post to /api/generate-audio with a mode field. The route is rate-limited, bound to a canvas for credential scoping, and idempotent per provider operation so a retried run does not double-charge. The examples below show a text-to-speech and a speech-to-speech body.
```json
{
  "mode": "tts",
  "text": "Welcome to Builder Studio.",
  "voice": "EXAVITQu4vr4xnSDxMaL",
  "model": "eleven_multilingual_v2"
}
```

```json
{
  "mode": "sts",
  "voice": "EXAVITQu4vr4xnSDxMaL",
  "model": "eleven_multilingual_sts_v2",
  "audioUrl": "https://example.com/source-recording.mp3"
}
```
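For agents or scripts that call the route directly, a request could be assembled along these lines. This is a sketch under stated assumptions: the base URL is a placeholder, the helper names are hypothetical, and authentication and canvas-binding fields are omitted because this page does not document them. The response format is also not described here, so the sketch returns the raw bytes.

```python
import json
import urllib.request

ROUTE_PATH = "/api/generate-audio"  # route named on this page


def build_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST to the audio route without sending it."""
    if body.get("mode") not in ("tts", "sts"):
        raise ValueError("mode must be 'tts' or 'sts'")
    return urllib.request.Request(
        url=base_url.rstrip("/") + ROUTE_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_audio(base_url: str, body: dict, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response bytes.

    Any credentials the deployment requires would need to be added to the
    request; they are left out because this page does not describe them.
    """
    request = build_request(base_url, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Separating build_request from generate_audio keeps the body construction testable without network access, which is convenient because the route is rate-limited.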