# OmniHuman (via Fal.ai)
ByteDance OmniHuman generates vivid, high-quality videos from a human figure image paired with an audio file. The character's emotions and movements maintain a strong correlation with the audio, producing synchronized lip movements and expressive body motion.
## Quick Example

```python
from tarash.tarash_gateway import generate_video
from tarash.tarash_gateway.models import VideoGenerationConfig, VideoGenerationRequest

config = VideoGenerationConfig(
    provider="fal",
    model="fal-ai/bytedance/omnihuman/v1.5",
    api_key="YOUR_FAL_KEY",
)

request = VideoGenerationRequest(
    prompt="A person speaking naturally",
    image_list=[{"type": "reference", "image": "https://example.com/person.jpg"}],
    resolution="720p",
    extra_params={
        "audio_url": "https://example.com/speech.mp3",
        "turbo_mode": True,
    },
)

response = generate_video(config, request)
print(response.video)
```
## Supported Models

| Model ID | Cost | Notes |
|---|---|---|
| `fal-ai/bytedance/omnihuman/v1.5` | $0.16/sec | Latest version, improved quality |
| `fal-ai/bytedance/omnihuman` | $0.16/sec | Original version |
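Because both models bill at $0.16 per second of output, cost scales linearly with clip length. The helper below is an illustrative sketch for budgeting, not part of the tarash or Fal.ai APIs:

```python
# Illustrative helper, not part of the tarash API: estimate Fal.ai billing
# for an OmniHuman clip at the published $0.16/sec rate.
OMNIHUMAN_RATE_PER_SEC = 0.16

def estimate_cost(duration_sec: float) -> float:
    """Return the estimated charge in USD for a clip of the given length."""
    if duration_sec <= 0:
        raise ValueError("duration must be positive")
    return round(duration_sec * OMNIHUMAN_RATE_PER_SEC, 2)

print(estimate_cost(30))  # a 30-second clip costs about $4.80
```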
## Parameters

| Parameter | Field | Required | Notes |
|---|---|---|---|
| `image_url` | `image_list` (type: `reference`) | ✅ | Human figure image URL |
| `audio_url` | `extra_params` | ✅ | Audio file URL (max 30s for 1080p, 60s for 720p) |
| `prompt` | `prompt` | — | Optional text guidance for video generation |
| `resolution` | `resolution` | — | `720p` or `1080p` (default: `1080p`) |
| `turbo_mode` | `extra_params` | — | Faster generation with slight quality trade-off |
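Note that the audio limit depends on resolution: 30 seconds at 1080p, 60 seconds at 720p. A quick client-side check before submitting can avoid a failed call. This validator is an illustrative sketch under those documented limits, not part of the tarash library:

```python
# Illustrative pre-flight check, not part of the tarash API: enforce the
# documented OmniHuman audio limits (30s at 1080p, 60s at 720p).
MAX_AUDIO_SEC = {"1080p": 30, "720p": 60}

def validate_audio_length(resolution: str, audio_sec: float) -> None:
    """Raise ValueError if the audio is too long for the chosen resolution."""
    limit = MAX_AUDIO_SEC.get(resolution)
    if limit is None:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if audio_sec > limit:
        raise ValueError(
            f"audio is {audio_sec}s but {resolution} allows at most {limit}s"
        )

validate_audio_length("720p", 45)   # OK: within the 60s limit at 720p
# validate_audio_length("1080p", 45)  # would raise ValueError (over 30s)
```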