OmniHuman (via Fal.ai)

ByteDance OmniHuman generates vivid, high-quality videos from an image of a human figure paired with an audio file. The character's emotions and movements follow the audio closely, with synchronized lip movements and expressive body motion.

Quick Example

from tarash.tarash_gateway import generate_video
from tarash.tarash_gateway.models import VideoGenerationConfig, VideoGenerationRequest

# Select the Fal provider and the OmniHuman v1.5 model.
config = VideoGenerationConfig(
    provider="fal",
    model="fal-ai/bytedance/omnihuman/v1.5",
    api_key="YOUR_FAL_KEY",  # replace with your own Fal API key
)

# The human figure image goes in image_list as a reference image;
# audio_url and turbo_mode are passed through extra_params.
request = VideoGenerationRequest(
    prompt="A person speaking naturally",
    image_list=[{"type": "reference", "image": "https://example.com/person.jpg"}],
    resolution="720p",
    extra_params={
        "audio_url": "https://example.com/speech.mp3",
        "turbo_mode": True,
    },
)

# Submit the request and print the video field of the response.
response = generate_video(config, request)
print(response.video)

Supported Models

| Model ID | Cost | Notes |
| --- | --- | --- |
| fal-ai/bytedance/omnihuman/v1.5 | $0.16/sec | Latest version, improved quality |
| fal-ai/bytedance/omnihuman | $0.16/sec | Original version |
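
As a rough budgeting aid, the per-second rates listed above can be turned into a cost estimate. This is a minimal sketch, assuming the charge scales linearly with the length of the generated video; this page does not confirm how billing is rounded or metered. `estimate_cost` and `COST_PER_SECOND` are hypothetical names for illustration, not part of tarash or the Fal API.

```python
# Per-second rates copied from the Supported Models table (USD).
COST_PER_SECOND = {
    "fal-ai/bytedance/omnihuman/v1.5": 0.16,
    "fal-ai/bytedance/omnihuman": 0.16,
}


def estimate_cost(model: str, seconds: float) -> float:
    """Hypothetical helper: estimated USD cost for `seconds` of video.

    Assumes linear per-second billing, which this page does not confirm.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if model not in COST_PER_SECOND:
        raise ValueError(f"unknown model: {model}")
    # Round to cents for display purposes.
    return round(COST_PER_SECOND[model] * seconds, 2)


print(estimate_cost("fal-ai/bytedance/omnihuman/v1.5", 30))
```

Treat the result as an estimate only and check the provider's pricing page for the actual charge.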

Parameters

| Parameter | Field | Required | Notes |
| --- | --- | --- | --- |
| image_url | image_list (type: reference) | | Human figure image URL |
| audio_url | extra_params | | Audio file URL (max 30s for 1080p, 60s for 720p) |
| prompt | prompt | | Optional text guidance for video generation |
| resolution | resolution | | 720p or 1080p (default: 1080p) |
| turbo_mode | extra_params | | Faster generation with slight quality trade-off |
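
Because the audio length limit depends on the chosen resolution, it can help to check the clip length before submitting a request. The following is a minimal sketch, assuming you already know the audio duration in seconds. `build_extra_params` and `MAX_AUDIO_SECONDS` are hypothetical names for illustration, not part of tarash or the Fal API, and the `turbo_mode=False` default is an assumption, since this page does not state one.

```python
# Maximum audio length per resolution, from the Parameters table (seconds).
MAX_AUDIO_SECONDS = {"1080p": 30, "720p": 60}


def build_extra_params(audio_url: str, audio_seconds: float,
                       resolution: str = "1080p",
                       turbo_mode: bool = False) -> dict:
    """Hypothetical helper: validate audio length, then build extra_params.

    The turbo_mode default of False is an assumption, not a documented value.
    """
    if resolution not in MAX_AUDIO_SECONDS:
        raise ValueError(f"resolution must be one of {sorted(MAX_AUDIO_SECONDS)}")
    limit = MAX_AUDIO_SECONDS[resolution]
    if audio_seconds > limit:
        raise ValueError(
            f"audio is {audio_seconds}s; {resolution} allows at most {limit}s"
        )
    return {"audio_url": audio_url, "turbo_mode": turbo_mode}


# A 45-second clip fits the 720p limit but would be rejected at 1080p.
params = build_extra_params("https://example.com/speech.mp3", 45, resolution="720p")
print(params)
```

The returned dictionary has the same shape as the `extra_params` argument in the Quick Example, so it could be passed to `VideoGenerationRequest` together with the matching `resolution`.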