T2I Models
yandex/stable-diffusion-3.5-medium-alchemist • Text-to-Image • 26 • 6
arXiv:2506.23044 • 61
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model • arXiv:2507.01953 • 18
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory • arXiv:2507.01945 • 76
4KAgent: Agentic Any Image to 4K Super-Resolution • arXiv:2507.07105 • 106
T-LoRA: Single Image Diffusion Model Customization Without Overfitting • arXiv:2507.05964 • 120
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS • arXiv:2507.07136 • 40
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining • arXiv:2507.14119 • 60
DesignLab: Designing Slides Through Iterative Detection and Correction • arXiv:2507.17202 • 51
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation • arXiv:2507.16116 • 13
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts • arXiv:2507.20939 • 57
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again • arXiv:2507.22058 • 40
Qwen-Image Technical Report • arXiv:2508.02324 • 272
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation • arXiv:2508.07981 • 63
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale • arXiv:2508.10711 • 145
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning • arXiv:2508.20751 • 89
Emu3.5: Native Multimodal Models are World Learners • arXiv:2510.26583 • 111
OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation • arXiv:2510.26213 • 10
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks • arXiv:2510.25760 • 17
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models • arXiv:2511.10629 • 127
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation • arXiv:2511.14993 • 231
Back to Basics: Let Denoising Generative Models Denoise • arXiv:2511.13720 • 69
Light-X: Generative 4D Video Rendering with Camera and Illumination Control • arXiv:2512.05115 • 11
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance • arXiv:2512.08765 • 132
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality • arXiv:2512.07951 • 50
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing • arXiv:2512.06065 • 29
Towards Scalable Pre-training of Visual Tokenizers for Generation • arXiv:2512.13687 • 105
Few-Step Distillation for Text-to-Image Generation: A Practical Guide • arXiv:2512.13006 • 10
EgoX: Egocentric Video Generation from a Single Exocentric Video • arXiv:2512.08269 • 119
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer • arXiv:2601.14250 • 47
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss • arXiv:2602.02493 • 42
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing • arXiv:2602.02437 • 77
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation • arXiv:2602.01756 • 22