ElevenLabs

ElevenLabs provides high-quality text-to-speech (TTS) and speech-to-speech (STS) audio generation. This provider uses the elevenlabs Python SDK.

Audio only

ElevenLabs supports TTS and STS audio generation. Use this provider with AudioGenerationConfig and generate_tts() / generate_sts().


Installation

pip install tarash-gateway[elevenlabs]

Quick Example — Text-to-Speech (TTS)

from tarash.tarash_gateway import generate_tts
from tarash.tarash_gateway.models import (
    AudioGenerationConfig,
    AudioOutputFormat,
    TTSRequest,
)

config = AudioGenerationConfig(
    provider="elevenlabs",
    model="eleven_multilingual_v2",
    api_key="YOUR_ELEVENLABS_KEY",
)

request = TTSRequest(
    text="Hello, welcome to Tarash Gateway!",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # "Rachel" voice
    output_format=AudioOutputFormat(format="mp3", sample_rate=44100, bitrate=128),
)

response = generate_tts(config, request)
print(response.request_id)    # Unique request ID
print(response.content_type)  # e.g. "audio/mpeg"
# response.audio holds the generated audio as a base64-encoded string

Quick Example — Speech-to-Speech (STS)

from tarash.tarash_gateway import generate_sts
from tarash.tarash_gateway.models import (
    AudioGenerationConfig,
    AudioOutputFormat,
    STSRequest,
)

config = AudioGenerationConfig(
    provider="elevenlabs",
    model="eleven_multilingual_v2",
    api_key="YOUR_ELEVENLABS_KEY",
)

request = STSRequest(
    audio="https://example.com/input-speech.mp3",  # URL, bytes, or base64
    voice_id="21m00Tcm4TlvDq8ikWAM",
    output_format=AudioOutputFormat(format="mp3", sample_rate=44100, bitrate=128),
)

response = generate_sts(config, request)
print(response.request_id)
# response.audio holds the generated audio as a base64-encoded string

Async variants are also available:

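from tarash.tarash_gateway import generate_tts_async, generate_sts_async

# Call these from inside an async function (for example one started with asyncio.run).
# tts_request and sts_request are the TTSRequest / STSRequest objects built above.
tts_response = await generate_tts_async(config, tts_request)
sts_response = await generate_sts_async(config, sts_request)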

TTS Parameters

| Parameter | Required | Supported | Notes |
|---|---|---|---|
| text | Yes | Yes | The text to convert to speech |
| voice_id | Yes | Yes | ElevenLabs voice identifier |
| output_format | No | Yes | AudioOutputFormat with format, sample_rate, bitrate |
| voice_settings | No | Yes | Dict with stability, similarity_boost, etc. |
| language_code | No | Yes | Language hint (e.g. "en", "es", "fr") |
| extra_params | No | Yes | Any additional ElevenLabs API parameters |

STS Parameters

| Parameter | Required | Supported | Notes |
|---|---|---|---|
| audio | Yes | Yes | Input audio: URL, raw bytes, or base64 string |
| voice_id | Yes | Yes | ElevenLabs voice identifier |
| output_format | No | Yes | AudioOutputFormat with format, sample_rate, bitrate |
| voice_settings | No | Yes | Dict with stability, similarity_boost, etc. (JSON-encoded for the STS multipart form) |
| extra_params | No | Yes | Any additional ElevenLabs API parameters |
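
The note on voice_settings says the dict is JSON-encoded for the STS multipart form. The snippet below is a minimal sketch of that encoding step using only the standard library. The helper name encode_voice_settings is hypothetical and is not part of the gateway; the provider is expected to handle this step internally.

```python
import json


def encode_voice_settings(voice_settings: dict) -> str:
    """Serialize voice settings to a JSON string suitable for a form field.

    Hypothetical helper for illustration only.
    """
    return json.dumps(voice_settings)


encoded = encode_voice_settings({"stability": 0.5, "similarity_boost": 0.75})
print(encoded)  # {"stability": 0.5, "similarity_boost": 0.75}
```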

Output Format

The output_format is converted to an ElevenLabs format string by joining the fields that are set (format, sample_rate, bitrate, in that order) with underscores. Unset fields are omitted:

AudioOutputFormat(format="mp3", sample_rate=44100, bitrate=128)  # → "mp3_44100_128"
AudioOutputFormat(format="pcm", sample_rate=16000)               # → "pcm_16000"
AudioOutputFormat(format="mp3")                                  # → "mp3"
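
The joining rule can be sketched in a few lines. This is an illustration of the behavior shown in the three examples, not the gateway's actual code: the helper name to_elevenlabs_format is hypothetical, and it assumes unset fields are represented as None.

```python
from typing import Optional


def to_elevenlabs_format(
    fmt: str,
    sample_rate: Optional[int] = None,
    bitrate: Optional[int] = None,
) -> str:
    """Join the fields that are set with underscores (hypothetical helper)."""
    parts = [fmt, sample_rate, bitrate]
    # Skip unset (None) fields so "mp3" alone stays "mp3".
    return "_".join(str(p) for p in parts if p is not None)


print(to_elevenlabs_format("mp3", 44100, 128))  # mp3_44100_128
print(to_elevenlabs_format("pcm", 16000))       # pcm_16000
print(to_elevenlabs_format("mp3"))              # mp3
```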

Voice Settings

Pass provider-specific voice tuning via voice_settings:

request = TTSRequest(
    text="Hello!",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    voice_settings={
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": True,
    },
)

Supported Models

ElevenLabs model IDs are not hardcoded — any valid ElevenLabs model ID can be passed in the config. Common models include:

| Model ID | Quality | Latency | Notes |
|---|---|---|---|
| eleven_multilingual_v2 | Highest | Standard | 29 languages, best quality |
| eleven_flash_v2_5 | Good | Low | Optimized for low latency |
| eleven_turbo_v2_5 | High | Medium | Balance of speed and quality |

config = AudioGenerationConfig(
    provider="elevenlabs",
    model="eleven_flash_v2_5",  # Any valid ElevenLabs model ID
    api_key="YOUR_ELEVENLABS_KEY",
)

Provider-Specific Notes

Authentication: The api_key must always be passed explicitly in the AudioGenerationConfig. There is no automatic environment variable fallback.
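
Because there is no environment fallback, one option is to read the key yourself and pass it in. The sketch below assumes the key is stored in an environment variable named ELEVENLABS_API_KEY; that name and the helper load_api_key are assumptions chosen for illustration, not something the gateway defines.

```python
import os


def load_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing.

    The variable name is an assumption; use whatever your deployment sets.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before creating the config")
    return api_key


# Usage (requires tarash-gateway to be installed):
# config = AudioGenerationConfig(
#     provider="elevenlabs",
#     model="eleven_multilingual_v2",
#     api_key=load_api_key(),
# )
```

Failing early with a clear message avoids sending a request with an empty key.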

No progress callbacks: ElevenLabs streams audio chunks directly with no status or polling events. Progress callbacks (on_progress) are accepted for interface compatibility but are not invoked.

STS audio input: The audio field in STSRequest accepts multiple formats:

  • A URL string (e.g. "https://example.com/audio.mp3") — downloaded automatically
  • Raw bytes
  • A base64-encoded string
  • A dict with a "content" key containing raw bytes
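
To make the accepted shapes concrete, here is a rough sketch of how such inputs could be normalized to raw bytes. It is not the gateway's implementation: the helper audio_to_bytes, the URL-prefix check, and the 30-second download timeout are all assumptions made for illustration.

```python
import base64
import urllib.request
from typing import Union

AudioInput = Union[str, bytes, dict]


def audio_to_bytes(audio: AudioInput, timeout: float = 30.0) -> bytes:
    """Normalize the accepted STS input shapes to raw bytes (hypothetical helper)."""
    if isinstance(audio, (bytes, bytearray)):
        return bytes(audio)
    if isinstance(audio, dict):
        # Dict form: raw bytes live under the "content" key.
        return bytes(audio["content"])
    if isinstance(audio, str):
        if audio.startswith(("http://", "https://")):
            # URL form: download the file.
            with urllib.request.urlopen(audio, timeout=timeout) as resp:
                return resp.read()
        # Any other string is treated as base64.
        return base64.b64decode(audio, validate=True)
    raise TypeError(f"Unsupported audio input type: {type(audio).__name__}")
```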

Response format: Audio is returned as a base64-encoded string in response.audio. The content_type field indicates the MIME type (e.g. "audio/mpeg" for MP3).
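
Since the audio arrives as base64 text, it has to be decoded before it can be played or saved. The following is a minimal sketch using only the standard library. The save_audio helper and the MIME-to-extension mapping are assumptions for illustration; only the "audio/mpeg" entry is taken from this page.

```python
import base64
from pathlib import Path

# Assumed mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav"}


def save_audio(audio_b64: str, content_type: str, stem: str = "output") -> Path:
    """Decode base64 audio and write it to <stem><ext>; returns the path."""
    ext = EXTENSIONS.get(content_type, ".bin")  # unknown types fall back to .bin
    path = Path(stem + ext)
    path.write_bytes(base64.b64decode(audio_b64))
    return path


# Usage with a gateway response:
# path = save_audio(response.audio, response.content_type)
```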

Timeout configuration: Use AudioGenerationConfig.timeout to control the maximum wait time in seconds (default: 240).