Idea
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper
• 2402.14083
• Published
• 47
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published
• 627
Genie: Generative Interactive Environments
Paper
• 2402.15391
• Published
• 72
Humanoid Locomotion as Next Token Prediction
Paper
• 2402.19469
• Published
• 29
ViTAR: Vision Transformer with Any Resolution
Paper
• 2403.18361
• Published
• 55
Simulating Classroom Education with LLM-Empowered Agents
Paper
• 2406.19226
• Published
• 32
MIRAI: Evaluating LLM Agents for Event Forecasting
Paper
• 2407.01231
• Published
• 18
Prithvi WxC: Foundation Model for Weather and Climate
Paper
• 2409.13598
• Published
• 45
Selective Attention Improves Transformer
Paper
• 2410.02703
• Published
• 25
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper
• 2411.17465
• Published
• 90
Chimera: Improving Generalist Model with Domain-Specific Experts
Paper
• 2412.05983
• Published
• 9
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper
• 2412.08635
• Published
• 49
Large Action Models: From Inception to Implementation
Paper
• 2412.10047
• Published
• 36
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published
• 108
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Paper
• 2412.14123
• Published
• 11
Cosmos World Foundation Model Platform for Physical AI
Paper
• 2501.03575
• Published
• 82
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
• 2501.04682
• Published
• 99
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Paper
• 2411.04983
• Published
• 13
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
• 2502.05171
• Published
• 153
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper
• 2502.05173
• Published
• 64
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper
• 2502.11089
• Published
• 168
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Paper
• 2502.15007
• Published
• 174
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Paper
• 2502.20395
• Published
• 45
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper
• 2503.14456
• Published
• 153
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper
• 2503.15558
• Published
• 50
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published
• 303
Paper
• 2504.00927
• Published
• 56
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published
• 110
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Paper
• 2504.08388
• Published
• 42
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Paper
• 2504.10157
• Published
• 17
Adaptive Computation Pruning for the Forgetting Transformer
Paper
• 2504.06949
• Published
• 3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Paper
• 2505.02707
• Published
• 85
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published
• 28
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper
• 2508.05004
• Published
• 130
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper
• 2508.11987
• Published
• 72
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
Paper
• 2511.07885
• Published
• 10