Text to Speech
Action · ElevenLabs · Updated May 2026
How do I generate speech with ElevenLabs?
Short answer: Drop the "ElevenLabs → Text to Speech" action anywhere in your workflow, map the inputs from upstream nodes, and publish.
Inputs
The fields this action accepts.
Every field can be mapped from an upstream trigger, AI step, or table row, or set to a hard-coded literal.
| Field | Type | Required | Description |
|---|---|---|---|
| Voice ID (`voice_id`) | string | Required | The voice to use. Find voice IDs with the List Voices action or in your ElevenLabs dashboard. |
| Text (`text`) | string | Required | The text to convert to speech (max 5,000 characters on the standard plan). |
| Model (`model_id`) | options | Optional | Options: Multilingual v2 (highest quality), Turbo v2.5 (low latency), Monolingual v1 (English only). |
| Stability (`stability`) | string | Optional | Voice stability, from 0.0 to 1.0. Lower values are more expressive; higher values are more consistent. |
| Similarity Boost (`similarity_boost`) | string | Optional | Voice clarity and similarity, from 0.0 to 1.0. Higher values stay closer to the original voice. |
| Output Format (`output_format`) | options | Optional | Options: MP3 (44.1 kHz, 128 kbps), MP3 (44.1 kHz, 192 kbps), PCM (16 kHz), PCM (44.1 kHz). |
Sample request
{"voice_id": "21m00Tcm4TlvDq8ikWAM", "text": "Hello, welcome to our platform. We're glad to have you here.", "model_id": "{{trigger.model_id}}", "stability": "0.5", "similarity_boost": "0.75"}
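The checks implied by the Inputs table can be sketched in Python as a small payload builder. The helper name `build_tts_payload` is hypothetical and is not part of Tiny Command or any ElevenLabs SDK; it assumes the 5,000-character standard-plan limit and the 0.0 to 1.0 ranges listed above, and keeps the voice settings as strings because the table types them that way.

```python
import json

# Hypothetical helper; limits are taken from the Inputs table on this page.
MAX_TEXT_CHARS = 5000  # standard-plan limit stated for the Text field


def build_tts_payload(voice_id, text, model_id=None, stability=None, similarity_boost=None):
    """Build a Text to Speech input payload, validating the documented ranges."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} chars; limit is {MAX_TEXT_CHARS}")

    payload = {"voice_id": voice_id, "text": text}
    if model_id is not None:
        payload["model_id"] = model_id

    # Both voice settings are typed as strings holding a value from 0.0 to 1.0.
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if value is None:
            continue  # optional field left unset
        number = float(value)
        if not 0.0 <= number <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        payload[name] = str(value)
    return payload


# Usage: mirrors the sample request, minus the templated model_id.
payload = build_tts_payload(
    "21m00Tcm4TlvDq8ikWAM",
    "Hello, welcome to our platform.",
    stability=0.5,
    similarity_boost="0.75",
)
print(json.dumps(payload))
```

Validating before the action runs means an out-of-range value fails in your own step, with a clear message, rather than as an ElevenLabs error in the run log.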
Returns
{"note": "Binary audio data — pipe to a file or downstream service","content_type": "audio/mpeg"}
Use these fields in downstream nodes for routing, logging, or error handling.
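If a downstream step writes the audio to disk, a minimal Python sketch looks like this. It assumes you already have the raw bytes and the `content_type` value from the Returns object. The `save_audio` helper and its extension mapping are hypothetical, and because only `audio/mpeg` is documented on this page, any other type falls back to a generic `.bin` extension instead of a guessed one.

```python
from pathlib import Path

# Hypothetical mapping: only audio/mpeg is documented for this action.
EXTENSIONS = {"audio/mpeg": ".mp3"}


def save_audio(audio: bytes, content_type: str, out_dir: str, stem: str) -> Path:
    """Write binary audio output to a file whose extension follows its content type."""
    if not audio:
        raise ValueError("empty audio payload")
    # Drop any parameters (e.g. "; rate=44100") before looking up the extension.
    base_type = content_type.split(";", 1)[0].strip().lower()
    path = Path(out_dir) / (stem + EXTENSIONS.get(base_type, ".bin"))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return path
```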
FAQ
Questions about Text to Speech.
What does the Text to Speech action do in ElevenLabs?
It generates high-quality audio from text using ElevenLabs voices, either stock or cloned. It is the premium-quality TTS option compared with faster or cheaper alternatives, suited to broadcast-quality narration, audiobook production, and branded voice content.
What inputs does Text to Speech require?
Two inputs are required: Voice ID and Text. Every input accepts a static value or a variable from any upstream node in your workflow.
Can I use dynamic inputs from earlier workflow nodes?
Yes. Any field on this action can pull values from upstream nodes, whether that's a form response, a trigger payload, an AI output, or a lookup result.
What happens if ElevenLabs returns an error?
The workflow pauses on the failed node, the error message is captured in the run log, and you can retry the run with one click. Auto-retry policies are configurable per workflow, using exponential backoff for up to 5 attempts.
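The exact retry schedule isn't documented here. As an illustration of exponential backoff in general, not Tiny Command's own implementation, this sketch assumes a 1-second base delay that doubles after each failure and reads "up to 5 attempts" as five calls in total. `call_with_backoff` is a hypothetical helper; the `sleep` parameter exists only so the delays can be inspected.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles after each failure: 1s, 2s, 4s, 8s (capped at max_delay).
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

With these assumed defaults, a call that keeps failing waits 1 + 2 + 4 + 8 = 15 seconds in total before the fifth and final error is raised.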
Does Text to Speech support batch operations?
Yes. Run Text to Speech inside a Loop node to process arrays. Tiny Command handles ElevenLabs' rate limits automatically, so you don't have to throttle requests manually.
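When a script is longer than the Text field allows, one option is to split it upstream and feed the resulting array to a Loop node. This Python sketch assumes the 5,000-character standard-plan limit and splits after sentence-ending punctuation. `chunk_text` is hypothetical; it collapses the whitespace between sentences to a single space and hard-wraps any single sentence that exceeds the limit on its own.

```python
import re


def chunk_text(text: str, limit: int = 5000) -> list:
    """Split text into chunks of at most `limit` characters, preferring sentence breaks."""
    # Split after ., !, or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a sentence that is longer than the limit by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Keeping sentence boundaries intact usually gives more natural prosody at the joins than cutting at a fixed character count.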
More actions
Other ElevenLabs actions.
Instant Voice Clone
Trains a custom voice from a short audio sample (a few minutes is ideal). Once trained, its voice ID can be used in Text to Speech for consistent narration, which suits brand-voice or character-voice production at scale.
Generate Sound Effect
Generates short audio effects from a text prompt ("door creak", "thunder", "applause"). Useful for video production workflows that need sound effects alongside generated narration.
Get ElevenLabs Account Info
Returns the connected account's details, including the remaining character quota for the current period. Run it as a pre-flight check before bulk-narration batches to avoid running out of quota mid-batch.
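A pre-flight check can compare a batch's total length with the remaining quota. This sketch assumes the account info output exposes `character_limit` and `character_count`; those field names are an assumption, so confirm them in your run log. Python's `len()` also only approximates how ElevenLabs counts billable characters. `quota_shortfall` is a hypothetical helper.

```python
def quota_shortfall(batch: list, account_info: dict) -> int:
    """Return how many characters the batch needs beyond the remaining quota (0 = it fits)."""
    # Assumed field names for the account info output.
    remaining = account_info["character_limit"] - account_info["character_count"]
    needed = sum(len(text) for text in batch)
    return max(0, needed - remaining)
```

A workflow could branch on the result: continue into the Loop when it returns 0, and alert or stop otherwise.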
Get Voice
Returns a voice's metadata — name, gender, age, accent, description. Useful for "show me the voice configuration before generating" workflows.
List Voices
Returns stock voices plus any custom-cloned voices for the account. Useful for voice-picker UIs at workflow setup or for inventorying available voice catalogs.
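For a voice picker, a workflow step can look up a voice ID by name in the List Voices output. This sketch assumes the output contains a `voices` array whose items carry `voice_id`, `name`, and `category`; check the real shape in your run log. `find_voice_id` is hypothetical.

```python
from typing import Optional


def find_voice_id(list_voices_output: dict, name: str, category: Optional[str] = None) -> str:
    """Return the voice_id of the first voice whose name matches, ignoring case."""
    for voice in list_voices_output.get("voices", []):
        if voice.get("name", "").casefold() != name.casefold():
            continue
        # Optionally restrict to one category, e.g. only cloned voices.
        if category is not None and voice.get("category") != category:
            continue
        return voice["voice_id"]
    raise LookupError(f"no voice named {name!r}")
```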
Speech to Text
Transcribes audio using ElevenLabs' speech recognition. ElevenLabs is better known for TTS, but its STT can be competitive with Deepgram and AssemblyAI for some use cases. Useful for voice-agent workflows that run entirely on ElevenLabs.