15+ AI Video Models

      AI Model Specifications

      ClipStudios includes 15+ AI video generation models across three paid tiers.

      • Starter (€20/mo, 1,000 credits): Seedance 1.5 Pro, Wan, Runway, Kling 2.1.
      • Plus (€40/mo, 2,000 credits): adds Kling 2.6 and 3.0, Veo 3.1 Fast, Wan 2.6 / 2.7, AI Effects Studio, and Motion Control.
      • Pro (from €99/mo, 5,000+ credits): adds Veo 3.1 Quality, Kling 2.5 / 3.0 Pro, Runway Gen-3 Alpha, and Voice Studio Expressive V3.

      All paid plans include text-to-video, image-to-video, lip-sync, and commercial licensing with zero watermarks.
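      As a rough illustration of how a monthly credit allowance translates into output, the sketch below divides each plan's credits by a per-clip cost. The allowances come from the plan list above, with Pro taken at its 5,000-credit minimum; the per-clip costs are example figures quoted in the model cards on this page. The names `PLAN_CREDITS` and `clips_per_month` are hypothetical and not part of any ClipStudios API.

```python
# Monthly credit allowances from the plan list (Pro shown at its 5,000 minimum).
PLAN_CREDITS = {"Starter": 1000, "Plus": 2000, "Pro": 5000}


def clips_per_month(plan: str, credits_per_clip: int) -> int:
    """Return how many whole clips a plan's monthly credits cover."""
    if credits_per_clip <= 0:
        raise ValueError("credits_per_clip must be positive")
    return PLAN_CREDITS[plan] // credits_per_clip


# Example per-clip costs quoted in the model cards on this page.
print(clips_per_month("Starter", 14))  # Seedance 1.5 Pro, 720p, 4 seconds
print(clips_per_month("Plus", 60))     # Veo 3.1 Fast, 720p, 8 seconds
print(clips_per_month("Pro", 250))     # Veo 3.1 Quality, 1080p, 8 seconds
```

      Because the listed credit costs are examples, treat the results as estimates rather than guaranteed quotas.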

      Quick model facts

      Starter Tier
      Credit costs are examples

      Entry-level models available on Starter subscription

      ByteDance

      ByteDance Seedance Lite

      Budget-Friendly
      Simple Videos

      Visual Storytelling Powered by Intelligent Video Generation

      Text-to-Video and Image-to-Video

      Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

      Available Resolutions:

      480p
      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      480p: 5s = 10 credits, 10s = 20 credits
      720p: 5s = 23 credits, 10s = 45 credits

      Runway

      Runway

      Creative
      Artistic

      Runway Gen-3 Alpha is a high-fidelity AI video model for cinematic motion, smooth transitions, and precise camera control. It supports text-to-video and image-to-video and excels at realistic movement, stylized visuals, and seamless scene continuity. Best for product reveals, atmospheric scenes, FPV fly-throughs, and tracking shots. Use direct, descriptive prompts that define motion and camera behavior. Shorter clips (4–6 seconds) work best. Does not generate audio.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      720p: 5s = 12 credits, 10s = 30 credits
      1080p: 5s = 30 credits

      ByteDance

      Seedance 1.5 Pro

      AI Audio
      Complete Output
      Best Value

      Seedance 1.5 Pro is a motion-focused text-to-video and image-to-video model built for dynamic scenes, camera movement, and action. It supports multi-shot sequences with "Shot Switch" and optional AI-generated audio (sound effects and ambience) in one pass. Use motion-first prompts: subject, action, and camera. Adverbs like "slowly" or "quickly" control motion intensity. Negative prompts are not supported.
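      The motion-first pattern described above (subject, action, camera, with an adverb setting the pace) can be sketched as a small string helper. This is only an illustration of the prompt shape: the function name `motion_first_prompt` and its parameters are hypothetical, and the example wording is invented rather than taken from ClipStudios or ByteDance guidance.

```python
def motion_first_prompt(subject: str, action: str, camera: str, pace: str = "") -> str:
    """Join subject, action, and camera into one motion-first prompt.

    `pace` is an optional adverb such as "slowly" or "quickly".
    """
    # Put the adverb directly before the action so it reads naturally.
    action_part = f"{pace} {action}".strip()
    return f"{subject} {action_part}. {camera}."


prompt = motion_first_prompt(
    subject="A red kite",
    action="rises over the dunes",
    camera="Camera tracks upward from ground level",
    pace="slowly",
)
print(prompt)
```

      The helper has no negative-prompt argument, since the description states that negative prompts are not supported for this model.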

      Text-to-Video and Image-to-Video

      Available Resolutions:

      480p
      720p
      1080p

      Available Durations:

      4 seconds
      8 seconds
      12 seconds

      Credit Costs:

      480p: 4s = 7 credits, 8s = 14 credits, 12s = 19 credits
      720p: 4s = 14 credits, 8s = 28 credits, 12s = 42 credits

      Wan AI

      Wan 2.2 A14B Turbo

      Fast Generation
      Prototyping

      Wan 2.2 is a text-to-video and image-to-video model with improved motion fidelity, prompt adherence, and cinematic lighting. It is tuned for short, emotion-driven videos (joy, calm, healing, inspiration) where mood, motion, and atmosphere matter more than long-form narrative. Use clear visual details and atmosphere words (warm, gentle, cinematic) for best results.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      480p
      720p

      Available Durations:

      5 seconds

      Credit Costs:

      480p: 5s = 40 credits
      720p: 5s = 80 credits

      Plus Tier
      Credit costs are examples

      Advanced models available on Plus subscription

      ByteDance

      ByteDance Seedance Pro

      Quality Balance
      Marketing

      Visual Storytelling Powered by Intelligent Video Generation

      Text-to-Video and Image-to-Video

      Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

      Available Resolutions:

      480p
      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      480p: 5s = 14 credits, 10s = 28 credits
      720p: 5s = 30 credits, 10s = 60 credits

      ByteDance

      ByteDance Seedance Pro Fast

      Fast Premium
      Agencies

      Visual Storytelling Powered by Intelligent Video Generation

      Image-to-Video

      Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      720p: 5s = 16 credits, 10s = 36 credits
      1080p: 5s = 36 credits, 10s = 72 credits

      Kuaishou

      Kling 2.6

      AI Audio
      Expressive

      Text-to-Video and Image-to-Video

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      1080p: 5s = 55 credits, 10s = 110 credits

      Kling AI

      Kling v2.1 Pro

      Professional
      Commercial

      Kling 2.1 Pro is a high-quality text-to-video and image-to-video model known for realistic motion, scene consistency, and strong camera control. It excels at action sequences, nature and landscape videos, and scenes with natural physics and movement. Structure prompts as: Subject → Movement → Scene → Camera / Lighting / Atmosphere. Use natural language and cinematic camera terms (close-up, tracking shot, dolly). Supports prompts up to 2,500 characters. Negative prompts are supported. Audio must be added in post-production.
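      The prompt structure above (Subject → Movement → Scene → Camera / Lighting / Atmosphere) and the 2,500-character limit can be sketched as a small builder with a length check. This is a minimal sketch under stated assumptions: `build_kling_prompt`, the `request` dictionary, and its key names are hypothetical and do not reflect the actual Kling or ClipStudios request format.

```python
MAX_PROMPT_CHARS = 2500  # limit stated in the Kling 2.1 Pro description


def build_kling_prompt(subject: str, movement: str, scene: str, style: str) -> str:
    """Assemble Subject -> Movement -> Scene -> Camera/Lighting/Atmosphere."""
    subject, movement, scene, style = (
        part.strip() for part in (subject, movement, scene, style)
    )
    prompt = f"{subject} {movement} {scene}. {style}."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt


request = {
    "prompt": build_kling_prompt(
        subject="A mountain biker",
        movement="descends a rocky trail at speed",
        scene="through an alpine forest at dawn",
        style="Tracking shot, low golden light, light mist",
    ),
    # Negative prompts are supported by this model; the key name is hypothetical.
    "negative_prompt": "blurry, distorted limbs",
}
print(request["prompt"])
```

      Checking the length before submitting lets an overlong prompt fail early in your own tooling rather than at generation time.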

      Image-to-Video

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      1080p: 5s = 50 credits, 10s = 100 credits

      Kling AI

      Kling v2.1 Standard

      Social Media
      Quick Content

      Kling 2.1 is a high-quality text-to-video and image-to-video model focused on realistic motion, scene consistency, and strong camera control. It produces visually coherent videos with natural physics and movement, and works well for action sequences, nature and landscape content, and scenes that need smooth motion. It supports cinematic camera language (pan, dolly, tracking, handheld) and negative prompts, and works best with clear, natural-language prompts that describe subject, movement, scene, and lighting. Kling 2.1 does not generate audio; add music or voiceover in post-production.

      Image-to-Video

      Available Resolutions:

      720p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      720p: 5s = 25 credits, 10s = 50 credits

      Kling AI

      Kling v2.5 Turbo Pro

      Premium Quality
      Advertising

      Next-Generation Video Production with Fluid Motion Control

      Text-to-Video and Image-to-Video

      Kling 2.5 Turbo is a Kuaishou model offering text-to-video and image-to-video generation. This iteration emphasises better understanding of creative prompts, smoother motion transitions, and more consistent output.

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      1080p: 5s = 42 credits, 10s = 84 credits

      Google

      Veo 3.1 Fast

      High Quality
      Fast Turnaround

      Veo 3.1 Fast is a cinematic AI video model built for high-fidelity motion, precise camera control, and structured storytelling. It suits cinematic storytelling, product demos, educational content, and scene-based narratives. Describe the scene and environment, specify actions, and add camera movement (pan, dolly, static). Think in scenes, not single frames. More detail generally improves results.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p

      Available Durations:

      8 seconds

      Credit Costs:

      720p: 8s = 60 credits

      Wan AI

      Wan 2.5

      Detailed Scenes
      Storytelling

      Wan 2.5 is a fast text-to-video and image-to-video model aimed at beginners and high-frequency creators. It produces short-form videos with built-in audio and lip-sync, suited to social media, marketing, and educational content. Keep prompts simple and descriptive. Best for TikTok, Instagram, and YouTube Shorts.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      720p: 5s = 60 credits, 10s = 120 credits
      1080p: 5s = 100 credits, 10s = 200 credits

      Wan AI

      Wan 2.6

      Extended Duration
      Storytelling

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      15 seconds

      Credit Costs:

      720p: 5s = 70 credits, 10s = 140 credits, 15s = 210 credits
      1080p: 5s = 105 credits, 10s = 210 credits, 15s = 315 credits

      Pro Tier
      Credit costs are examples

      Premium models available on Pro subscription

      Kling AI

      Kling v2.1 Master

      Studio Quality
      Enterprise

      Kling 2.1 Master is the enhanced Kling 2.1 variant with stronger prompt adherence and higher visual consistency. It outputs 1080p only and is best for final-quality outputs. It excels at realistic motion, scene consistency, and cinematic camera control. Use natural language and cinematic camera terms. Negative prompts are supported. You can generate audio and merge it with the video in post-production.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      1080p: 5s = 160 credits, 10s = 320 credits

      Google

      Veo 3.1 Quality

      Cinematic
      Premium Content

      Veo 3.1 Quality is a cinematic AI video model for high-fidelity motion, precise camera control, and structured visual storytelling. It suits cinematic storytelling, product demos, educational content, and scene-based narratives. Describe the scene and environment, specify actions, and add camera movement (pan, dolly, static). Use temporal language (slow, deliberate movement) for more natural motion. Clarity and intent matter more than length.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      8 seconds

      Credit Costs:

      720p: 8s = 250 credits
      1080p: 8s = 250 credits

      Wan AI

      Wan 2.6 Video-to-Video

      Style Transfer
      Reference Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds

      Credit Costs:

      720p: 5s = 70 credits, 10s = 140 credits
      1080p: 5s = 105 credits, 10s = 210 credits

      Not Sure Which Model to Choose?

      Take our quick quiz to find the perfect model and plan for your needs.
