Hume AI
Hume AI provides expressive text-to-speech via Octave models with emotional control. Tarash uses the official hume Python SDK.
TTS Only
Hume AI supports text-to-speech (TTS) only. Speech-to-speech (STS) is not available.
Installation
Install tarash-gateway with the Hume extra:
pip install tarash-gateway[hume]
This installs the hume SDK (>=0.9.0).
Quick Example
from tarash.tarash_gateway import generate_tts
from tarash.tarash_gateway.models import AudioGenerationConfig, TTSRequest

config = AudioGenerationConfig(
    provider="hume",
    model="octave-2",
    api_key="YOUR_HUME_KEY",
)

request = TTSRequest(
    text="Welcome to the future of expressive speech synthesis!",
    voice_id="Kora",
    voice_settings={"description": "excited and warm", "speed": 1.1},
)

response = generate_tts(config, request)
print(f"Duration: {response.duration}s")
Supported Models
| Model ID | Version | Notes |
|---|---|---|
| octave-2 / hume-v2 | 2 | Latest Octave model |
| octave-1 / hume-v1 | 1 | Previous Octave model |
The model version is extracted automatically from the model name. For example, octave-2 resolves to version "2" and hume-v1 resolves to version "1".
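The exact extraction rule is not spelled out beyond the two examples above, so the following is only a minimal sketch of one way it could work, assuming the version is the trailing integer of the model name. The helper name `extract_model_version` and the regular expression are hypothetical and are not taken from Tarash.

```python
import re


def extract_model_version(model: str) -> str:
    """Return the trailing integer of a model name as a string.

    Hypothetical helper: assumes names shaped like "octave-2" or "hume-v1".
    """
    match = re.search(r"(\d+)$", model)
    if match is None:
        raise ValueError(f"no version found in model name: {model!r}")
    return match.group(1)


print(extract_model_version("octave-2"))
print(extract_model_version("hume-v1"))
```

Raising on an unrecognized name is a design choice of this sketch; the gateway may handle unknown names differently.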
Parameters
| Parameter | TTSRequest field | Notes |
|---|---|---|
| Text | text | The text to synthesize into speech |
| Voice | voice_id | Voice name (default) or voice ID (see Voice Lookup Modes below) |
| Output format | output_format | AudioOutputFormat(format="mp3"), passed as {"type": format} to the API |
| Description | voice_settings={"description": "..."} | Voice style guidance (e.g., "excited and warm", "calm and soothing") |
| Speed | voice_settings={"speed": 1.2} | Speech rate multiplier |
| Trailing silence | voice_settings={"trailing_silence": 0.5} | Seconds of silence appended after the utterance |
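To make the mapping in the table concrete, here is a rough sketch of how the request fields might be translated into API-side values. It uses plain dictionaries rather than the hume SDK types, and both the function name `build_utterance_fields` and the shape of the returned dictionary are assumptions made for illustration, not the gateway's actual code.

```python
from typing import Any, Optional

# voice_settings keys that the table above maps onto the utterance.
UTTERANCE_KEYS = ("description", "speed", "trailing_silence")


def build_utterance_fields(
    text: str,
    output_format: str,
    voice_settings: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical translation of TTSRequest fields into payload values."""
    settings = voice_settings or {}
    utterance: dict[str, Any] = {"text": text}
    for key in UTTERANCE_KEYS:
        if key in settings:
            # Copy only the documented utterance settings; other keys are ignored here.
            utterance[key] = settings[key]
    # The output format is sent as {"type": format}.
    return {"utterance": utterance, "format": {"type": output_format}}


payload = build_utterance_fields(
    "Hello world",
    "mp3",
    {"description": "calm and soothing", "speed": 1.2},
)
print(payload)
```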
Extra Parameters
Advanced or Hume-specific parameters can be passed through extra_params:
request = TTSRequest(
    text="Hello world",
    voice_id="Kora",
    extra_params={
        "context": {"text": "Previously on the show..."},
    },
)
Voice Lookup Modes
Hume supports three ways to specify a voice, controlled via voice_settings:
Name Lookup (default)
By default, voice_id is treated as a voice name with the HUME_AI provider:
request = TTSRequest(
    text="Hello!",
    voice_id="Kora",  # Looked up by name
)
# Sends: {"name": "Kora", "provider": "HUME_AI"}
ID Lookup
Set voice_id_mode to "id" to look up the voice by its unique identifier:
request = TTSRequest(
    text="Hello!",
    voice_id="a1b2c3d4-...",
    voice_settings={"voice_id_mode": "id"},
)
# Sends: {"id": "a1b2c3d4-...", "provider": "HUME_AI"}
Custom Voice Provider
Override the voice provider with voice_provider:
request = TTSRequest(
    text="Hello!",
    voice_id="my-custom-voice",
    voice_settings={"voice_provider": "CUSTOM"},
)
# Sends: {"name": "my-custom-voice", "provider": "CUSTOM"}
You can combine voice_id_mode and voice_provider:
request = TTSRequest(
    text="Hello!",
    voice_id="a1b2c3d4-...",
    voice_settings={"voice_id_mode": "id", "voice_provider": "CUSTOM"},
)
# Sends: {"id": "a1b2c3d4-...", "provider": "CUSTOM"}
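The lookup rules in this section can be summarized in a few lines of code. The sketch below reproduces the four "Sends" payloads shown above; the function name `build_voice_spec` is hypothetical, and the real gateway may be structured differently.

```python
from typing import Any, Optional


def build_voice_spec(
    voice_id: str,
    voice_settings: Optional[dict[str, Any]] = None,
) -> dict[str, str]:
    """Hypothetical reconstruction of the voice payload described above."""
    settings = voice_settings or {}
    # voice_id_mode="id" switches from name lookup (the default) to ID lookup.
    key = "id" if settings.get("voice_id_mode") == "id" else "name"
    # voice_provider overrides the default HUME_AI provider.
    provider = settings.get("voice_provider", "HUME_AI")
    return {key: voice_id, "provider": provider}


print(build_voice_spec("Kora"))
print(build_voice_spec("my-custom-voice", {"voice_provider": "CUSTOM"}))
```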
Provider-Specific Notes
SDK requirement: Install the hume package via the extra: pip install tarash-gateway[hume].
Authentication: api_key must be passed explicitly in AudioGenerationConfig. There is no automatic environment variable reading.
Utterances: Text is wrapped into a single PostedUtterance object internally. Voice specification, description, speed, and trailing silence are all set on the utterance.
Response metadata: The response includes duration from the Hume generation object, along with generation_id, file_size, and encoding details in raw_response.
No streaming: Hume returns the complete audio in a single response. The on_progress callback is accepted but unused.
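Because the key is not read from the environment automatically (see the Authentication note above), one option is to read it yourself and pass it into AudioGenerationConfig. The following is a small sketch; the variable name HUME_API_KEY and the helper `load_hume_api_key` are assumptions chosen for this example, not names the gateway looks for.

```python
import os
from typing import Mapping


def load_hume_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from an environment-like mapping, or fail loudly.

    HUME_API_KEY is an arbitrary name chosen for this sketch.
    """
    key = env.get("HUME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HUME_API_KEY is not set")
    return key


# Usage with the gateway (left as a comment so the sketch runs without it installed):
# config = AudioGenerationConfig(
#     provider="hume",
#     model="octave-2",
#     api_key=load_hume_api_key(),
# )
```

Failing early with a clear error avoids sending a request with an empty key.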