ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning • Paper • 2605.20342 • Published 11 days ago • 33
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation • Paper • 2605.18739 • Published 12 days ago • 111
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing • Paper • 2604.22586 • Published Apr 24 • 16
Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets • Paper • 2604.22294 • Published Apr 24 • 18
LLM Safety From Within: Detecting Harmful Content with Internal Representations • Paper • 2604.18519 • Published Apr 20 • 26
DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction • Paper • 2604.21518 • Published Apr 23 • 28
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond • Paper • 2604.22748 • Published Apr 24 • 227
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models • Paper • 2603.22003 • Published Mar 23 • 12
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published Mar 12 • 22
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought • Paper • 2603.22847 • Published Mar 24 • 26
RealMaster: Lifting Rendered Scenes into Photorealistic Video • Paper • 2603.23462 • Published Mar 24 • 33
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation • Paper • 2603.23500 • Published Mar 24 • 36
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning • Paper • 2603.23483 • Published Mar 24 • 63
DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models • Paper • 2603.23499 • Published Mar 24 • 51
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents • Paper • 2603.22386 • Published Mar 23 • 57
SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM • Paper • 2603.23386 • Published Mar 24 • 40
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding • Paper • 2603.22458 • Published Mar 23 • 136
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG • Paper • 2603.23497 • Published Mar 24 • 91