The weights for Krea 2 (K2), the new image generation model from Krea, have leaked online, making it possible to run the multimodal model on local hardware.

What Happened

Weights for the Krea 2 (K2) model, a single-stream multimodal diffusion transformer with 12.9 billion parameters, have leaked and are now publicly accessible. The model uses Qwen3-VL-4B-Instruct as its text encoder and the Qwen-Image autoencoder as its VAE. The community has already published Diffusers versions of both the Base model and the accelerated Turbo model (8 steps), as well as optimized FP8 weights for running on consumer GPUs with 16 to 24 GB of VRAM.
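The link between the parameter count and the VRAM range can be checked with simple arithmetic. The sketch below is a rough estimate, not a measurement. It counts only the 12.9 billion transformer weights and assumes BF16 (2 bytes per parameter) as the unquantized baseline, which the source does not state. The text encoder, the VAE, and activations need additional memory on top of these figures.

```python
# Rough weight-memory estimate for the K2 transformer alone.
# Assumptions (not from the source): BF16 is the unquantized format;
# text encoder, VAE, and activation memory are ignored.

GB = 1e9  # decimal gigabytes


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights, in GB."""
    return num_params * bytes_per_param / GB


K2_PARAMS = 12.9e9  # parameter count quoted in the article
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

footprints = {
    name: weight_memory_gb(K2_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gb in footprints.items():
    print(f"{name}: {gb:.1f} GB")
```

Under these assumptions, the FP8 weights take about half the memory of a 2-byte format, which is consistent with the FP8 builds being the ones aimed at 16 to 24 GB cards.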

Context

Krea 2 represents a transition to the DiT (Diffusion Transformer) architecture, in which text and visual contexts are integrated into a single token stream. Using a powerful LLM-based encoder such as Qwen3-VL is intended to improve prompt adherence.
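To make the "single stream" idea concrete, here is a minimal toy sketch in plain Python. It is not Krea 2's implementation: the token values are made up, there is one attention head, and the query, key, and value projections are replaced by the identity. The only point it illustrates is that text and image tokens are concatenated into one sequence, so every image token attends to every text token in the same attention pass.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (toy version)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens
        ]
        weights = softmax(scores)
        # Weighted average of all tokens (values are the tokens themselves).
        out.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    return out


# Hypothetical embeddings: 2 text tokens and 3 image-patch tokens, dim 4.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_tokens = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
]

# Single stream: one concatenated sequence, one attention pass.
stream = text_tokens + image_tokens
mixed = self_attention(stream)

# Only the image positions would be decoded into pixels afterwards.
image_out = mixed[len(text_tokens):]
```

After the pass, each image token carries a contribution from the text tokens, which is the mechanism by which the prompt conditions the image in this kind of design.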

Why It Matters for the Industry

The appearance of Krea 2 weights illustrates the growing prominence of multimodal DiT architectures and the integration of LLMs into media generation pipelines. It is likely to accelerate research into open-weights solutions comparable in quality to closed APIs and gives the community a reference point for future state-of-the-art generation models.

Why It Matters for Users

Users can now run high-quality, Krea-level generation locally via ComfyUI or Diffusers. The FP8 versions make the model accessible to owners of high-end consumer graphics cards (e.g., RTX 3090 or 4090), providing a high degree of stylistic control without paid cloud subscriptions.

What Is Currently Unknown / Limitations

Enthusiasts and the corporate sector are in very different positions: for the community, the leak is a tool for customization, while for enterprises, using leaked weights carries a critical risk of compliance and intellectual property violations.

Author

Look at AI, Editorial Staff