Hume AI

Hume AI provides expressive text-to-speech via Octave models with emotional control. Tarash uses the official hume Python SDK.

TTS Only

Hume AI supports text-to-speech (TTS) only. Speech-to-speech (STS) is not available.

Installation

Install tarash-gateway with the Hume extra:

pip install "tarash-gateway[hume]"

This installs the hume SDK (>=0.9.0).

Quick Example

from tarash.tarash_gateway import generate_tts
from tarash.tarash_gateway.models import AudioGenerationConfig, TTSRequest

config = AudioGenerationConfig(
    provider="hume",
    model="octave-2",
    api_key="YOUR_HUME_KEY",
)

request = TTSRequest(
    text="Welcome to the future of expressive speech synthesis!",
    voice_id="Kora",
    voice_settings={"description": "excited and warm", "speed": 1.1},
)

response = generate_tts(config, request)
print(f"Duration: {response.duration}s")

Supported Models

Model ID	Version	Notes
`octave-2` / `hume-v2`	2	Latest Octave model
`octave-1` / `hume-v1`	1	Previous Octave model

The model version is extracted automatically from the model name. For example, octave-2 resolves to version "2" and hume-v1 resolves to version "1".
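The extraction rule can be pictured with a short sketch. This is an illustration only, assuming the version is the run of digits at the end of the model name; extract_model_version is a hypothetical helper, not the function Tarash uses internally.

```python
import re


def extract_model_version(model: str) -> str:
    """Return the trailing version digits of a model name such as "octave-2"."""
    # Assumption: the version is the digits after the last hyphen,
    # optionally prefixed with "v" (as in "hume-v1").
    match = re.search(r"-v?(\d+)$", model)
    if match is None:
        raise ValueError(f"Cannot determine version from model name: {model!r}")
    return match.group(1)


print(extract_model_version("octave-2"))
print(extract_model_version("hume-v1"))
```

Under this rule, both aliases in each table row map to the same version string.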

Parameters

Parameter	TTSRequest field	Notes
Text	`text`	The text to synthesize into speech
Voice	`voice_id`	Voice name (default) or voice ID (see Voice Lookup Modes below)
Output format	`output_format`	`AudioOutputFormat(format="mp3")` — passed as `{"type": format}` to the API
Description	`voice_settings={"description": "..."}`	Voice style guidance (e.g., `"excited and warm"`, `"calm and soothing"`)
Speed	`voice_settings={"speed": 1.2}`	Speech rate multiplier
Trailing silence	`voice_settings={"trailing_silence": 0.5}`	Seconds of silence appended after the utterance

Extra Parameters

Advanced or Hume-specific parameters can be passed through extra_params:

request = TTSRequest(
    text="Hello world",
    voice_id="Kora",
    extra_params={
        "context": {"text": "Previously on the show..."},
    },
)
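One plausible reading of "passed through" is a shallow merge of extra_params into the request arguments built from the standard fields. The sketch below illustrates that idea only: merge_extra_params and the base argument names are hypothetical, and whether extra_params may override an existing key is an assumption, not documented behavior.

```python
from typing import Any, Optional


def merge_extra_params(
    base_args: dict[str, Any],
    extra_params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Shallow-merge extra_params over base_args without mutating either."""
    merged = dict(base_args)
    # Assumption: keys in extra_params win when both dicts define the same key.
    merged.update(extra_params or {})
    return merged


# Hypothetical base arguments derived from the standard TTSRequest fields.
base_args = {"utterances": [{"text": "Hello world"}], "version": "2"}
extra = {"context": {"text": "Previously on the show..."}}
print(merge_extra_params(base_args, extra))
```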

Voice Lookup Modes

Hume voices can be specified in three ways. Name lookup is the default; the other two are enabled through voice_settings:

Name Lookup (default)

By default, voice_id is treated as a voice name with the HUME_AI provider:

request = TTSRequest(
    text="Hello!",
    voice_id="Kora",  # Looked up by name
)
# Sends: {"name": "Kora", "provider": "HUME_AI"}

ID Lookup

Set voice_id_mode to "id" to look up the voice by its unique identifier:

request = TTSRequest(
    text="Hello!",
    voice_id="a1b2c3d4-...",
    voice_settings={"voice_id_mode": "id"},
)
# Sends: {"id": "a1b2c3d4-...", "provider": "HUME_AI"}

Custom Voice Provider

Override the voice provider with voice_provider:

request = TTSRequest(
    text="Hello!",
    voice_id="my-custom-voice",
    voice_settings={"voice_provider": "CUSTOM"},
)
# Sends: {"name": "my-custom-voice", "provider": "CUSTOM"}

You can combine voice_id_mode and voice_provider:

request = TTSRequest(
    text="Hello!",
    voice_id="a1b2c3d4-...",
    voice_settings={"voice_id_mode": "id", "voice_provider": "CUSTOM"},
)
# Sends: {"id": "a1b2c3d4-...", "provider": "CUSTOM"}
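The four "Sends" comments in this section follow one rule, which can be sketched as a small function. This is a minimal illustration of the documented mapping; build_voice_spec is a hypothetical name, and treating any voice_id_mode other than "id" as name lookup is an assumption.

```python
from typing import Any, Optional


def build_voice_spec(
    voice_id: str,
    voice_settings: Optional[dict[str, Any]] = None,
) -> dict[str, str]:
    """Map voice_id and voice_settings to the voice dict sent to Hume."""
    settings = voice_settings or {}
    # "id" mode sends the value under "id"; otherwise it is sent as a name.
    key = "id" if settings.get("voice_id_mode") == "id" else "name"
    # The provider defaults to HUME_AI unless voice_provider overrides it.
    provider = settings.get("voice_provider", "HUME_AI")
    return {key: voice_id, "provider": provider}


print(build_voice_spec("Kora"))
print(build_voice_spec("a1b2c3d4-...", {"voice_id_mode": "id"}))
print(build_voice_spec("my-custom-voice", {"voice_provider": "CUSTOM"}))
```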

Provider-Specific Notes

SDK requirement: The hume package is installed through the extra, pip install "tarash-gateway[hume]".

Authentication: api_key must be passed explicitly in AudioGenerationConfig. Tarash does not read the key from an environment variable automatically.
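Because the key is never picked up implicitly, applications that keep secrets in the environment need to read it themselves. A minimal sketch follows, assuming the key is stored in a variable named HUME_API_KEY; both that variable name and the resolve_hume_api_key helper are hypothetical choices for illustration.

```python
import os
from typing import Mapping


def resolve_hume_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the Hume key from the environment and fail loudly if it is missing."""
    # Assumption: the application stores the key under HUME_API_KEY.
    key = env.get("HUME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HUME_API_KEY is not set; pass api_key explicitly.")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(resolve_hume_api_key({"HUME_API_KEY": "example-key"}))
```

The returned string can then be passed as api_key when constructing AudioGenerationConfig.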

Utterances: Text is wrapped into a single PostedUtterance object internally. Voice specification, description, speed, and trailing silence are all set on the utterance.
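The shape of that single utterance can be sketched with a plain dict standing in for PostedUtterance. This is an illustration under assumptions: build_utterance is a hypothetical helper, and omitting unset fields (rather than sending nulls) is a guess about the internal behavior.

```python
from typing import Any, Optional


def build_utterance(
    text: str,
    voice: Optional[dict[str, str]] = None,
    voice_settings: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Collect the text, voice, and style fields into one utterance dict."""
    settings = voice_settings or {}
    utterance: dict[str, Any] = {"text": text}
    if voice is not None:
        utterance["voice"] = voice
    # Copy only the utterance-level settings; lookup keys such as
    # voice_id_mode and voice_provider affect the voice dict instead.
    for field in ("description", "speed", "trailing_silence"):
        if field in settings:
            utterance[field] = settings[field]
    return utterance


print(
    build_utterance(
        "Welcome to the future of expressive speech synthesis!",
        voice={"name": "Kora", "provider": "HUME_AI"},
        voice_settings={"description": "excited and warm", "speed": 1.1},
    )
)
```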

Response metadata: The response's duration is taken from the Hume generation object. The generation_id, file_size, and encoding details are available in raw_response.

No streaming: Hume returns the complete audio in a single response. The on_progress callback is accepted but unused.