Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)? Paper • 2605.30557 • Published 5 days ago • 5
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation Paper • 2605.14269 • Published 19 days ago • 9
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Paper • 2603.16792 • Published Mar 17 • 3
Error-Driven Scene Editing for 3D Grounding in Large Language Models Paper • 2511.14086 • Published Nov 18, 2025 • 7