AI video & image model directory

Compare AI video & image models before you generate

Browse video, image, audio, and preparation models. Compare pricing, max duration, resolution, audio support, input modes, and strengths to choose the right model.

Price before render · Pay-as-you-go · Updated specs · Video, image & audio

Choose the right model for your workflow

Cinematic video · Film-like motion and scenes · Best: Kling 3 Pro
Native audio & lip sync · Dialogue and ambience · Best: Seedance 2.0
Fast drafts · Quick iterations · Best: Seedance 2.0 Fast
Image-to-video · Bring images to life · Best: Seedance 2.0
Product ads · Ad-ready clips and references · Best: Veo 3.1
Prompt control · Strong adherence · Best: Kling 3 Pro
Longest duration · Longer clips and output limits · Best: LTX 2.3 Fast
Best value · Lower-cost iteration · Best: LTX 2.3 Fast

Recommended starting points

Start here if you want a quick shortlist before comparing every model by price, duration, resolution, and supported inputs.

ByteDance · 8.5/10

Seedance 2.0

Best for premium multi-shot AI video with native audio, lip sync, and realistic motion.

From: $0.38/s
Up to: 15s
Max res.: 4K

Kling 3 Pro

Best for cinematic control, image-to-video, prompt adherence, and voice-led sequences.

From: $0.22/s
Up to: 15s
Max res.: 1080p

Veo 3.1

Best for ad-ready shots, references, first/last-frame control, and extend workflows.

From: $0.52/s
Up to: 8s
Max res.: 4K

Happy Horse 1.1

Best for Alibaba text, image, and reference-to-video workflows with native audio and lip sync.

From: $0.18/s
Up to: 15s
Max res.: 1080p

Lightricks · 6.7/10

LTX 2.3 Fast

Best for low-cost drafts, longer clips, fast prompt tests, and 4K workflows.

From: $0.05/s
Up to: 20s
Max res.: 4K

Luma Ray 3.2

Best for current Luma Modify Video, Reframe, guide-frame edits, and silent motion tests.

From: $0.13/s
Up to: 10s
Max res.: 1080p

Browse AI engines by type

Start with video generation, then switch to image generation or audio and lip sync without mixing engine families.

Models

Score: 8.5/10

ByteDance

Seedance 2.0

Strengths: Audio & Lip Sync · Visual Quality

From: $0.38/s
Max dur.: 15s
Max res.: 4K

Supports: T2V · I2V · V2V · First/Last · Extend · Lip sync

Best for premium multi-shot AI video with native audio, lip sync, and realistic motion.

Score: 8.3/10

Kling

Kling 3 Pro

Strengths: Controllability · Prompt Adherence

From: $0.22/s
Max dur.: 15s
Max res.: 1080p

Supports: T2V · I2V · First/Last · Lip sync · Audio

Best for cinematic control, image-to-video, prompt adherence, and voice-led sequences.

Score: 7.9/10

Google

Veo 3.1

Strengths: Audio & Lip Sync · Prompt Adherence

From: $0.52/s
Max dur.: 8s
Max res.: 4K

Supports: T2V · I2V · V2V · First/Last · Extend · Lip sync

Best for ad-ready shots, references, first/last-frame control, and extend workflows.

Score: 8.3/10

Alibaba

Happy Horse 1.1

Strengths: Audio & Lip Sync · Prompt Adherence

From: $0.18/s
Max dur.: 15s
Max res.: 1080p

Supports: T2V · I2V · Lip sync · Audio

Best for Alibaba text, image, and reference-to-video workflows with native audio and lip sync.

Score: 7.1/10

Lightricks

LTX 2.3 Pro

Strengths: Controllability · Audio & Lip Sync

From: $0.08/s
Max dur.: 20s
Max res.: 4K

Supports: T2V · I2V · V2V · First/Last · Extend · Lip sync

Best for all-in-one LTX workflows, retakes, audio, video-to-video, and 4K output.

Score: 7.8/10

ByteDance

Seedance 2.0 Fast

Strengths: Audio & Lip Sync · Speed & Stability

From: $0.14/s
Max dur.: 15s
Max res.: 720p

Supports: T2V · I2V · V2V · First/Last · Extend · Lip sync

Best for quick drafts, lower-cost iterations, shot planning, and native audio tests.

Use model cards to review capabilities first, then jump into the video or image hub for a tighter shortlist.

Not sure which AI video model to choose?

Compare popular model pairs side by side before generating.

Seedance 2.0 vs Kling 3 Pro
Seedance 2.0 vs Seedance 2.0 Fast
Kling 3 Pro vs Veo 3.1
Happy Horse 1.1 vs Seedance 2.0
Happy Horse 1.1 vs Veo 3.1
Veo 3.1 Fast vs Veo 3.1 Lite

Compare pricing, duration, and output limits

Video models are usually priced per second, while image models are priced per image or output size. Use the cards to compare starting price, max duration, resolution, audio support, and input modes before generating.

Pricing

Compare per-second video pricing and per-image still pricing before rendering.

Duration

Check max video length before choosing a model for drafts or production clips.

Output quality

Compare 720p, 1080p, 4K, audio, input modes, and model-specific limits.

AI model specs, pricing, and examples FAQ

Which AI video model should I start with?

Start with Seedance 2.0 for native audio and realistic motion, Kling 3 Pro for cinematic control, Veo 3.1 for high-quality prompt adherence, and LTX 2.3 Fast for fast drafts and lower-cost iterations.

Which models support native audio or lip sync?

Seedance, Kling, Veo, Sora, LTX, Wan, and other models may support audio or lip sync depending on the exact version. Check each model card for Audio, Lip sync, T2V, I2V, V2V, and First/Last support.
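As a rough illustration of checking support across cards, the sketch below stores each card's capability tags as a set and returns the models that cover every required tag. The tags are copied from the model cards on this page. The `SUPPORT` table and the `models_supporting` helper are hypothetical names for this example, not part of any published API.

```python
# Capability tags as listed on the model cards above (illustrative snapshot).
FULL = {"T2V", "I2V", "V2V", "First/Last", "Extend", "Lip sync"}

SUPPORT = {
    "Seedance 2.0": set(FULL),
    "Kling 3 Pro": {"T2V", "I2V", "First/Last", "Lip sync", "Audio"},
    "Veo 3.1": set(FULL),
    "Happy Horse 1.1": {"T2V", "I2V", "Lip sync", "Audio"},
    "LTX 2.3 Pro": set(FULL),
    "Seedance 2.0 Fast": set(FULL),
}


def models_supporting(required):
    """Return model names (alphabetical) whose card lists every required tag."""
    required = set(required)
    # `required <= modes` is a subset test: every required tag must be present.
    return sorted(name for name, modes in SUPPORT.items() if required <= modes)


print(models_supporting({"I2V", "Audio"}))
# ['Happy Horse 1.1', 'Kling 3 Pro']
```

Tags can differ between versions of the same model family, so treat a table like this as a snapshot and confirm against the current card.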

How is AI video pricing calculated?

Most video models are priced per second of generated output. Image models are usually priced per image or by output size. Open each model page for exact pricing and limits.
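To make the per-second arithmetic concrete, here is a minimal sketch that multiplies a card's "From" price by the clip length. It assumes a flat per-second rate. Because "From" is a starting price, the result is best read as a lower bound, and the real rate may vary with resolution, audio, or other settings. The `estimate_min_cost` helper and the price table are hypothetical names for this example.

```python
from decimal import Decimal

# Starting prices in USD per second, copied from the model cards on this page.
# These are "From" prices, so actual rates may be higher for some settings.
FROM_PRICE_PER_SECOND = {
    "Seedance 2.0": Decimal("0.38"),
    "Kling 3 Pro": Decimal("0.22"),
    "Veo 3.1": Decimal("0.52"),
    "LTX 2.3 Fast": Decimal("0.05"),
}


def estimate_min_cost(model: str, seconds: int) -> Decimal:
    """Lower-bound estimate: starting per-second price times clip length."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return FROM_PRICE_PER_SECOND[model] * seconds


print(estimate_min_cost("Veo 3.1", 8))        # 4.16
print(estimate_min_cost("LTX 2.3 Fast", 20))  # 1.00
```

Decimal is used instead of float so that small currency amounts do not pick up binary rounding noise.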

What is the maximum duration for AI video models?

Max duration varies by model. On this page, limits range from 8 seconds (Veo 3.1) to 20 seconds (LTX 2.3), with most models capped at 15 seconds. Use the duration filter or compare cards to find the right model.
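The duration check can be sketched as a filter over card data: keep the models whose max duration covers the clip you need, then sort by starting price. The figures are copied from the model cards on this page. The `ModelCard` class and the `models_for_duration` helper are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    from_price: float  # starting price, USD per second
    max_seconds: int   # max clip duration
    max_res: str       # max resolution label


# Values taken from the cards above (illustrative snapshot).
CARDS = [
    ModelCard("Seedance 2.0", 0.38, 15, "4K"),
    ModelCard("Kling 3 Pro", 0.22, 15, "1080p"),
    ModelCard("Veo 3.1", 0.52, 8, "4K"),
    ModelCard("Happy Horse 1.1", 0.18, 15, "1080p"),
    ModelCard("LTX 2.3 Pro", 0.08, 20, "4K"),
    ModelCard("Seedance 2.0 Fast", 0.14, 15, "720p"),
]


def models_for_duration(cards, seconds):
    """Models that can render `seconds` in one clip, cheapest first."""
    fits = [c for c in cards if c.max_seconds >= seconds]
    return sorted(fits, key=lambda c: c.from_price)


for card in models_for_duration(CARDS, 12):
    print(f"{card.name}: from ${card.from_price:.2f}/s, up to {card.max_seconds}s")
```

For a 12-second clip this drops Veo 3.1 (8-second limit) and lists the remaining models from lowest to highest starting price.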

Where can I find prompt examples?

Use the examples pages for model-specific prompts and outputs, including LTX, Kling, Seedance, Veo, Wan, Sora, and Pika examples.

What is the difference between video models and image models?

Video models generate motion clips from text, images, video references, or first/last frames. Image models generate still images, edits, product visuals, and references that can be used before animation.