HappyHorse 1.1 AI Video Generator

HappyHorse 1.1 is Alibaba's upgraded multimodal AI video model for 3–15s clips, with smoother motion, stronger subject consistency, better prompt following, more natural visual texture, and native audio-video generation.

Key Features of HappyHorse 1.1

Stronger Motion and Temporal Consistency

HappyHorse 1.1 improves motion modeling and frame-to-frame consistency, especially for fighting, dancing, running, turning, vehicle movement, and camera-follow shots. Compared with 1.0, it reduces the slow-motion feel, ghosting, and disconnected action beats.
Example Prompt:
A ferocious red dragon (elemental) erupts from the sea, soaring into the sky and circling rapidly above the ship, whipping up enormous waves. The dynamic camera follows the dragon as it cuts through the storm, rolling through towering swells and disappearing into the distance.

More Stable Multi-Reference R2V

The upgraded multi-reference reference-to-video (R2V) workflow supports up to 9 reference images. This helps preserve a person's face, clothing, product details, brand elements, and environment across short clips, making it useful for e-commerce ads, livestream-style videos, product demos, and character-based content.

Better Long-Prompt and Scene Planning

HappyHorse 1.1 improves long-context understanding, role relationships, scene planning, and camera-language interpretation. It is better at following prompts that describe who is speaking, where characters stand, how emotions change, and how the camera cuts between shots.
Example Prompt:
A bustling futuristic market on another planet, where alien merchants hawk glowing fruits, robots roam everywhere, floating holographic advertisements fill the air, and colorful lights are visible all around, captured in a cinematic handheld camera style.

More Natural Visual Texture

The model has been tuned for more realistic skin texture, facial detail, hair rendering, lighting, shadows, and local stability. It reduces the oily or over-processed look seen in some 1.0 outputs, so portraits and short-drama visuals look more natural.

Native Audio-Video Generation

HappyHorse generates audio and video together rather than simply adding sound afterward. Version 1.1 improves speech rhythm, pauses, emotional tone, background music, ambient sound, and audio-visual sync, although instrument-performance scenes may still need manual review.

HappyHorse 1.1 Parameters

| Parameter | Value | Notes |
| --- | --- | --- |
| Release Date | June 22, 2026 | Officially released as Alibaba's upgraded HappyHorse video generation model. |
| Model Size | 15B parameters | A 15-billion-parameter multimodal video generation model. |
| Architecture | Unified multimodal Transfusion / single-stream Transformer | Text, image, video, and audio tokens are processed in one model instead of separate stitched modules. |
| Transformer Depth | 40 layers | Reported as a unified 40-layer Transformer architecture. |
| Generation Modes | Text-to-video, image-to-video, reference-to-video, video editing | Covers written prompts, still image animation, multi-reference video creation, and video editing scenarios. |
| Duration | 3–15 seconds | Single generated clips support short-form video lengths. |
| Resolution | 720p / 1080p | Both HD and full HD generation are supported. |
| Frame Rate | 24 fps | Suitable for cinematic short-form clips. |
| Aspect Ratio | Custom / flexible | Supports flexible output ratios for horizontal, vertical, square, and other creative formats. |
| Reference Images | Up to 9 images | Useful for locking characters, products, outfits, scenes, and brand elements. |
| Audio | Supported | Outputs video with audio, including dialogue, ambience, music, and sound effects. |
| Denoising | DMD-2 distillation, 8 denoising steps | Reduces generation steps and improves efficiency. |
| CFG | Removed | Classifier-free guidance is removed to improve efficiency. |
| Inference Speed | About 38 s for a 5 s 1080p clip on one NVIDIA H100 | Reported benchmark for short 1080p generation. |
| 720p Price | 0.9 RMB/sec list price; as low as 0.54 RMB/sec promo | Promo pricing depends on platform and campaign. |
| 1080p Price | 1.2 RMB/sec list price; as low as 0.72 RMB/sec promo | The 1080p list price is down 25% from HappyHorse 1.0's 1.6 RMB/sec. |
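
To make the per-second pricing concrete, here is a minimal Python sketch that estimates the cost of a single clip and checks the stated 25% price drop. The prices come from the parameter list above; the function names (`clip_cost`, `percent_drop`) and the `tier` argument are illustrative assumptions, not part of any official HappyHorse tooling, and actual promo pricing depends on platform and campaign.

```python
# Prices in RMB per second of generated video, as listed in the parameters above.
PRICES_RMB_PER_SEC = {
    "720p": {"list": 0.90, "promo": 0.54},
    "1080p": {"list": 1.20, "promo": 0.72},
}
# HappyHorse 1.0 list price for 1080p, as stated in the parameters above.
HAPPYHORSE_1_0_1080P_LIST = 1.60


def clip_cost(resolution: str, seconds: float, tier: str = "list") -> float:
    """Estimate the cost in RMB of one clip (hypothetical helper)."""
    if not 3 <= seconds <= 15:
        raise ValueError("single clips are stated to run 3 to 15 seconds")
    return round(PRICES_RMB_PER_SEC[resolution][tier] * seconds, 2)


def percent_drop(old: float, new: float) -> float:
    """Percentage reduction from old to new, rounded to one decimal."""
    return round((old - new) / old * 100, 1)


print(clip_cost("1080p", 5))           # 6.0 RMB at list price
print(clip_cost("720p", 15, "promo"))  # 8.1 RMB at the lowest promo price
print(percent_drop(HAPPYHORSE_1_0_1080P_LIST, 1.20))  # 25.0
```

At these figures, the lowest promo price works out to 40% below list for both resolutions (0.54 / 0.9 and 0.72 / 1.2 are both 0.6).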

HappyHorse 1.1 Use Cases

E-Commerce Product and Live-Selling Videos

Use multiple reference images to combine a spokesperson, product, outfit, and livestream-style room into one short ad clip. This is useful when product color, packaging, lipstick shade, clothing, or brand details must stay consistent instead of looking only approximately correct.

Short Drama, Brand Story, and Game CG Concepts

HappyHorse 1.1 is better suited than 1.0 to emotional dialogue, multi-shot indoor scenes, action sequences, cinematic brand teasers, and stylized game CG concepts because it improves motion continuity, long-prompt planning, camera-language understanding, and natural facial texture.

HappyHorse 1.1 Frequently Asked Questions

What is HappyHorse 1.1?

HappyHorse 1.1 is Alibaba's upgraded AI video generation model for short clips. It focuses on smoother motion, stronger subject consistency, better prompt following, more natural image quality, and improved audio-video sync.

What generation modes does HappyHorse 1.1 support?

It supports text-to-video, image-to-video, multi-reference reference-to-video (R2V), and video editing workflows for short AI video creation.

How long can HappyHorse 1.1 videos be?

Single generated clips support 3 to 15 seconds, which fits short ads, social videos, character clips, product demos, and short-drama shots.

What resolutions are supported?

HappyHorse 1.1 supports 720p and 1080p generation, with flexible aspect ratios for different content formats.

How many reference images can HappyHorse 1.1 use?

The multi-reference workflow supports up to 9 reference images, helping the model preserve character faces, clothing, products, scenes, and brand elements.
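
The limits given in the answers above (four generation modes, 3 to 15 seconds, 720p or 1080p, up to 9 reference images) can be collected into a small pre-flight check before submitting a job. The sketch below is only an illustration: `ClipRequest`, `validate`, and the mode strings are hypothetical names chosen here, not an official HappyHorse API.

```python
from dataclasses import dataclass, field

# Limits as stated on this page; the identifiers themselves are hypothetical.
ALLOWED_MODES = {"text-to-video", "image-to-video", "reference-to-video", "video-editing"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
MIN_SECONDS, MAX_SECONDS = 3, 15
MAX_REFERENCE_IMAGES = 9


@dataclass
class ClipRequest:
    """Hypothetical description of one generation job."""
    mode: str
    seconds: float
    resolution: str
    reference_images: list = field(default_factory=list)


def validate(req: ClipRequest) -> list:
    """Return a list of human-readable problems; an empty list means the request fits the stated limits."""
    problems = []
    if req.mode not in ALLOWED_MODES:
        problems.append(f"unsupported mode: {req.mode}")
    if not MIN_SECONDS <= req.seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    if req.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {req.resolution}")
    if len(req.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    return problems


ok = ClipRequest("reference-to-video", 10, "1080p", ["face.png", "product.png"])
bad = ClipRequest("text-to-video", 20, "4k")
print(validate(ok))   # []
print(validate(bad))  # two problems: duration and resolution
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request at once.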

How is HappyHorse 1.1 different from HappyHorse 1.0?

Version 1.1 keeps the same general technical direction but improves motion continuity, multi-reference subject locking, complex prompt understanding, visual texture, and audio expression. It also lowers the 1080p list price compared with 1.0.

Does HappyHorse 1.1 generate audio?

Yes. HappyHorse 1.1 can generate speech, ambience, music, and sound effects together with the video.

What are the main limitations?

It can still struggle with complex physics, crowded background faces, edge-case multi-subject scenes, and instrument-performance audio sync. For commercial use, outputs should still be reviewed before publishing.

Ready to Test HappyHorse 1.1?

Use HappyHorse 1.1 to explore short AI videos with smoother action, more stable reference subjects, stronger prompt following, and native audio. It is especially useful for short drama, e-commerce ads, brand concepts, and game-style video ideas.
