Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published Jun 1 • 57
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published Jun 1 • 236
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search Paper • 2605.20244 • Published May 18 • 4
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published May 22 • 81
bilabila/b-b7_olr_ts10_gru_hib_costdyn_util_w3_sym7_202601_lossq_ms400k_h12 68k • Updated May 23 • 4 • 1
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published May 21 • 171
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes Paper • 2605.15843 • Published May 15 • 6
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published May 14 • 79
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices Paper • 2605.10933 • Published May 11 • 4
Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts Paper • 2602.03473 • Published May 8 • 11