Replicate

Replicate is a platform for running open-source AI models. Tarash supports video generation via Kling, Kling Lip Sync, Luma Dream Machine, Minimax (Hailuo), Wan, and Google Veo 3.

Installation

pip install "tarash-gateway[replicate]"

Quick Example

from tarash.tarash_gateway import generate_video
from tarash.tarash_gateway.models import (
    VideoGenerationConfig,
    VideoGenerationRequest,
    ImageType,
)

config = VideoGenerationConfig(
    provider="replicate",
    model="kwaivgi/kling-v2.1",
    api_key="YOUR_REPLICATE_TOKEN",
)

# Kling v2.1 is image-to-video only, so image_list must contain an image
request = VideoGenerationRequest(
    prompt="The kite soars higher into the stormy sky",
    duration_seconds=5,
    image_list=[
        ImageType(image="https://example.com/kite.jpg", type="first_frame"),
    ],
)

response = generate_video(config, request)
print(response.video)

Google Veo 3 via Replicate

config = VideoGenerationConfig(
    provider="replicate",
    model="google/veo-3",
    api_key="YOUR_REPLICATE_TOKEN",
)

request = VideoGenerationRequest(
    prompt="A bamboo forest in early morning mist",
    duration_seconds=8,
    aspect_ratio="16:9",
)

response = generate_video(config, request)

Kling Lip Sync via Replicate

config = VideoGenerationConfig(
    provider="replicate",
    model="kwaivgi/kling-lip-sync",
    api_key="YOUR_REPLICATE_TOKEN",
)

# Audio-driven lip sync
request = VideoGenerationRequest(
    prompt="lipsync",
    video="https://example.com/talking-head.mp4",
    extra_params={
        "audio_file": "https://example.com/speech.mp3",
    },
)

response = generate_video(config, request)

Text-to-speech lip sync (no audio file needed):

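# Reuses the kwaivgi/kling-lip-sync config defined above
request = VideoGenerationRequest(
    prompt="lipsync",
    video="https://example.com/talking-head.mp4",
    extra_params={
        "text": "Hello, this is a lip sync demo!",  # Spoken via TTS
        "voice_id": "en_AOT",
        "voice_speed": 1.0,  # Allowed range: 0.8–2.0
    },
)

response = generate_video(config, request)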

Parameters

Parameter	Required	Supported	Models	Notes
`prompt`	✅	✅	All	Text description of the video
`duration_seconds`	—	✅	Kling, Minimax, Veo 3	Integer seconds
`image_list` (first_frame)	—	✅	Kling, Luma	Start frame
`image_list` (last_frame)	—	✅	Luma	End frame
`image_list` (reference)	—	✅	Minimax	Reference image
`enhance_prompt`	—	✅	Minimax	As `prompt_optimizer`
`aspect_ratio`	—	✅	Luma, Veo 3	Passed through
`video`	—	✅	Kling Lip Sync	Input video URL for lip sync
`extra_params.audio_file`	—	✅	Kling Lip Sync	Audio file URL (.mp3/.wav/.m4a/.aac)
`extra_params.text`	—	✅	Kling Lip Sync	Text for TTS (if no audio)
`extra_params.voice_id`	—	✅	Kling Lip Sync	Voice ID for TTS (default: `en_AOT`)
`extra_params.voice_speed`	—	✅	Kling Lip Sync	TTS speech rate (0.8–2.0)
`extra_params.video_id`	—	✅	Kling Lip Sync	Kling video ID (alt to `video`)
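`seed`	—	—	—	Not mapped for registered models; passed through only by the generic fallback
`negative_prompt`	—	—	—	Not mapped for registered models; passed through only by the generic fallback
`generate_audio`	—	—	—	Not supported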

Supported Models

Model names on Replicate often include a version hash (e.g., `minimax/video-01:abc123`). Tarash strips the hash before registry lookup and then matches by prefix, so you can pass version-pinned names without changing the rest of your config.

Model ID / Prefix	Duration Options	Image-to-Video	Notes
`kwaivgi/kling-lip-sync`	—	—	Kling Lip Sync. Video + audio or text+TTS input.
`kwaivgi/kling`	5s, 10s	✅	Kling v2.1. Image input required.
`luma/`	—	✅	Matches any `luma/*` model (Dream Machine)
`minimax/`	6s, 10s	✅	Matches any `minimax/*` model
`hailuo/`	6s, 10s	✅	Matches any `hailuo/*` model
`wan-video/`	—	✅	Wan video models
`google/veo-3`	4s, 6s, 8s	✅	Google Veo 3 via Replicate
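The duration options in the table can be checked before a request is sent. The following is a minimal sketch, not Tarash's own validation: `ALLOWED_DURATIONS` and `is_duration_listed` are hypothetical names, the values are copied from the table above, and how Tarash itself treats an unlisted duration is not stated here.

```python
# Duration options copied from the Supported Models table.
# Prefixes shown with "—" in the table are left out, meaning "no listed restriction".
ALLOWED_DURATIONS = {
    "kwaivgi/kling": (5, 10),
    "minimax/": (6, 10),
    "hailuo/": (6, 10),
    "google/veo-3": (4, 6, 8),
}


def is_duration_listed(prefix: str, duration_seconds: int) -> bool:
    """Return True if the duration is listed for the prefix, or if no list exists."""
    allowed = ALLOWED_DURATIONS.get(prefix)
    return allowed is None or duration_seconds in allowed
```

For example, `is_duration_listed("google/veo-3", 8)` is true, while `is_duration_listed("kwaivgi/kling", 8)` is false because only 5 and 10 seconds are listed for Kling.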

Example with version hash:

config = VideoGenerationConfig(
    provider="replicate",
    model="minimax/video-01:abc123def456",  # Hash stripped, matches "minimax/" prefix
    api_key="...",
)

Provider-Specific Notes

Kling Lip Sync supports two input modes. Provide either `audio_file` (audio-driven) or `text` + `voice_id` (TTS-driven). The input video can be a URL passed in the `video` field or a Kling-generated video referenced by `extra_params.video_id`. The video should be 2–10 seconds long, 720p–1080p, and under 100 MB.
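The two input modes can be kept apart with a small helper that builds `extra_params`. This is a sketch under stated assumptions: `build_lip_sync_params` is a hypothetical helper, not part of Tarash, and treating the two modes as mutually exclusive is an assumption drawn from the word "either" above. The `en_AOT` default and the 0.8–2.0 speed range come from the Parameters table.

```python
from typing import Optional


def build_lip_sync_params(
    audio_file: Optional[str] = None,
    text: Optional[str] = None,
    voice_id: str = "en_AOT",
    voice_speed: float = 1.0,
) -> dict:
    """Build extra_params for exactly one lip sync mode (hypothetical helper)."""
    # Assumption: the modes are exclusive, so exactly one of the two must be set.
    if (audio_file is None) == (text is None):
        raise ValueError("Provide exactly one of audio_file or text")

    if audio_file is not None:
        # Audio-driven mode: only the audio URL is needed.
        return {"audio_file": audio_file}

    # TTS-driven mode: check the documented speech-rate range first.
    if not 0.8 <= voice_speed <= 2.0:
        raise ValueError("voice_speed must be between 0.8 and 2.0")
    return {"text": text, "voice_id": voice_id, "voice_speed": voice_speed}
```

The returned dictionary can be passed as `extra_params` to `VideoGenerationRequest`, alongside `video` or `extra_params.video_id`.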

Kling v2.1 requires image input. The `kwaivgi/kling` model supports only image-to-video. If `image_list` contains no image, a `ValidationError` is raised.

Manual polling. Unlike Fal's event streaming, Replicate uses a manual status-polling loop. Tarash checks the prediction status every `poll_interval` seconds, up to `max_poll_attempts` times. Terminal statuses: `succeeded`, `failed`, `canceled`.
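The polling behavior described above can be sketched as a plain loop. This is an illustration rather than Tarash's implementation: `poll_until_terminal`, the `get_status` callable, the injectable `sleep`, the default values, and raising `TimeoutError` when attempts run out are all assumptions. Only the `poll_interval` and `max_poll_attempts` names and the three terminal statuses come from the text.

```python
import time
from typing import Callable

# Terminal statuses as listed in the note above.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,      # assumed default, for illustration only
    max_poll_attempts: int = 120,    # assumed default, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal status or attempts run out."""
    for _ in range(max_poll_attempts):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Not finished yet: wait before asking again.
        sleep(poll_interval)
    raise TimeoutError(
        f"No terminal status after {max_poll_attempts} attempts"
    )
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. In practice `get_status` would wrap whatever call fetches the prediction's current status.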

Version hash handling. A model name containing `:` is split at the colon and the version hash is discarded before registry lookup:

minimax/video-01:abc123  →  lookup: minimax/video-01  →  prefix match: minimax/
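The lookup above can be sketched in a few lines. This is a minimal illustration, not Tarash's registry code: `REGISTRY_PREFIXES`, `strip_version_hash`, and `match_registry_prefix` are hypothetical names, and listing the more specific prefix first (so that `kwaivgi/kling-lip-sync` is not captured by `kwaivgi/kling`) is an assumption about ordering.

```python
from typing import Optional

# Prefixes copied from the Supported Models table.
# Assumption: more specific prefixes are checked first.
REGISTRY_PREFIXES = [
    "kwaivgi/kling-lip-sync",
    "kwaivgi/kling",
    "luma/",
    "minimax/",
    "hailuo/",
    "wan-video/",
    "google/veo-3",
]


def strip_version_hash(model: str) -> str:
    """Return the model name without a trailing ':<version hash>'."""
    return model.split(":", 1)[0]


def match_registry_prefix(model: str) -> Optional[str]:
    """Return the first registry prefix that the hash-free name starts with."""
    name = strip_version_hash(model)
    for prefix in REGISTRY_PREFIXES:
        if name.startswith(prefix):
            return prefix
    return None  # no registry entry matched
```

With these definitions, `match_registry_prefix("minimax/video-01:abc123")` yields `"minimax/"`, and an unregistered name such as `"acme/unknown"` yields `None`.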

Generic fallback. For models not in the registry, Tarash applies generic mappers that pass `prompt`, `seed`, `negative_prompt`, and `aspect_ratio` through unchanged and drop every other field.
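The effect of the generic fallback on a request can be pictured with a small filter. This is a sketch, not Tarash's mapper code: `PASS_THROUGH_FIELDS` and `generic_map` are hypothetical names, requests are modeled as plain dictionaries, and omitting fields whose value is `None` is an assumption.

```python
# Fields the generic fallback is described as passing through unchanged.
PASS_THROUGH_FIELDS = ("prompt", "seed", "negative_prompt", "aspect_ratio")


def generic_map(request_fields: dict) -> dict:
    """Keep only the pass-through fields that are set; drop everything else."""
    return {
        name: request_fields[name]
        for name in PASS_THROUGH_FIELDS
        if request_fields.get(name) is not None  # assumption: unset fields are omitted
    }
```

For instance, a request carrying `prompt`, `seed`, and `duration_seconds` would reach an unregistered model with only `prompt` and `seed`, so fields such as `duration_seconds` or `image_list` are silently lost on this path.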