Optimized to output at 2048×2048 and other high-res aspect ratios. Expect crisp detail, stable composition, and strong fidelity.
A 32× spatial compression VAE reduces tokens while preserving structure — enabling 2K quality with 1K-like token counts for faster inference.
Combines a multimodal encoder for scene understanding with a glyph-aware ByT5 encoder for improved text rendering and multilingual prompts.
Optional prompt rewriting adds visual clarity; the refiner ups detail and reduces artifacts — use both for the best looks.
Distilled variant supports fewer steps with competitive quality — great for quick previews and iterative creation.
Supports 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 — all at 2K-class resolutions for social, print, and ads.
Describe subjects, scene, lighting, and any on-image text. Add styles (photoreal, anime, cinematic, etc.).
Toggle PromptEnhancer for richer detail and the Refiner for extra sharpness and fewer artifacts.
Preview with distilled steps for speed, then upscale or refine to finalize and download.
Generate native 2K images with strong prompt following, multilingual support, and optional enhancements — all in your browser.