interesting stuff
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper
• 2309.11495
• Published
• 40
Adapting Large Language Models via Reading Comprehension
Paper
• 2309.09530
• Published
• 82
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper
• 2309.09400
• Published
• 87
Language Modeling Is Compression
Paper
• 2309.10668
• Published
• 84
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
• 2309.09117
• Published
• 40
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Paper
• 2309.13356
• Published
• 38
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Paper
• 2309.15098
• Published
• 7
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper
• 2309.14717
• Published
• 46
Qwen Technical Report
Paper
• 2309.16609
• Published
• 38
Effective Long-Context Scaling of Foundation Models
Paper
• 2309.16039
• Published
• 31
Large Language Models Cannot Self-Correct Reasoning Yet
Paper
• 2310.01798
• Published
• 36
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper
• 2310.03714
• Published
• 37
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper
• 2310.09263
• Published
• 40
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper
• 2310.11453
• Published
• 106
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper
• 2310.11511
• Published
• 79
H2O Open Ecosystem for State-of-the-art Large Language Models
Paper
• 2310.13012
• Published
• 9
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper
• 2310.16836
• Published
• 14
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper
• 2310.16795
• Published
• 27
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper
• 2310.17157
• Published
• 14
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
• 2310.17680
• Published
• 74
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper
• 2310.19102
• Published
• 11
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Paper
• 2310.18356
• Published
• 24
Does GPT-4 Pass the Turing Test?
Paper
• 2310.20216
• Published
• 17
CodePlan: Repository-level Coding using LLMs and Planning
Paper
• 2309.12499
• Published
• 80
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Paper
• 2311.00059
• Published
• 20
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper
• 2311.00945
• Published
• 16
Unveiling Safety Vulnerabilities of Large Language Models
Paper
• 2311.04124
• Published
• 9
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Paper
• 2311.07463
• Published
• 15
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Paper
• 2311.10642
• Published
• 25
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
• 2311.13600
• Published
• 47
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper
• 2311.03099
• Published
• 30
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
• 2312.03491
• Published
• 34
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper
• 2312.04474
• Published
• 34
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
• 2312.03818
• Published
• 34
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper
• 2401.02994
• Published
• 52
Mixtral of Experts
Paper
• 2401.04088
• Published
• 160
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper
• 2401.04658
• Published
• 27
The Impact of Reasoning Step Length on Large Language Models
Paper
• 2401.04925
• Published
• 18
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper
• 2401.06951
• Published
• 26
Extending LLMs' Context Window with 100 Samples
Paper
• 2401.07004
• Published
• 16
Self-Rewarding Language Models
Paper
• 2401.10020
• Published
• 152
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Paper
• 2401.10838
• Published
• 9
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Paper
• 2401.12070
• Published
• 45
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper
• 2401.14196
• Published
• 70
jinaai/jina-embeddings-v2-base-de
Feature Extraction
• 0.2B • Updated
• 867k
• 82
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper
• 2401.15024
• Published
• 73
Weaver: Foundation Models for Creative Writing
Paper
• 2401.17268
• Published
• 45
TrustLLM: Trustworthiness in Large Language Models
Paper
• 2401.05561
• Published
• 69
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper
• 2402.01739
• Published
• 28
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Paper
• 2402.02834
• Published
• 17
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper
• 2402.03620
• Published
• 117
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper
• 2402.04291
• Published
• 50
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Paper
• 2402.06619
• Published
• 57
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper
• 2402.07827
• Published
• 48
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
• 2402.07456
• Published
• 46
Scaling Laws for Fine-Grained Mixture of Experts
Paper
• 2402.07871
• Published
• 13
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published
• 68
Computing Power and the Governance of Artificial Intelligence
Paper
• 2402.08797
• Published
• 15
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper
• 2402.09727
• Published
• 38
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper
• 2402.10193
• Published
• 21
Chain-of-Thought Reasoning Without Prompting
Paper
• 2402.10200
• Published
• 109
How to Train Data-Efficient LLMs
Paper
• 2402.09668
• Published
• 43
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper
• 2402.10379
• Published
• 31
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
• Published
• 45
OneBit: Towards Extremely Low-bit Large Language Models
Paper
• 2402.11295
• Published
• 24
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper
• 2402.13249
• Published
• 15
Coercing LLMs to do and reveal (almost) anything
Paper
• 2402.14020
• Published
• 13
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published
• 116
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
• 2402.14905
• Published
• 134
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published
• 627
GPTVQ: The Blessing of Dimensionality for LLM Quantization
Paper
• 2402.15319
• Published
• 22
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper
• 2403.03853
• Published
• 66
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper
• 2403.07508
• Published
• 77
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper
• 2401.15391
• Published
• 6
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Paper
• 2403.12895
• Published
• 32
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published
• 72
LLM Agent Operating System
Paper
• 2403.16971
• Published
• 73
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper
• 2403.15447
• Published
• 16
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper
• 2403.16627
• Published
• 22
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Paper
• 2403.05313
• Published
• 9
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
SAM 2: Segment Anything in Images and Videos
Paper
• 2408.00714
• Published
• 120
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Paper
• 2408.00754
• Published
• 23
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published
• 78
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper
• 2408.07055
• Published
• 68
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper
• 2408.11796
• Published
• 58
Automated Design of Agentic Systems
Paper
• 2408.08435
• Published
• 40
ColPali: Efficient Document Retrieval with Vision Language Models
Paper
• 2407.01449
• Published
• 51
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper
• 2408.15518
• Published
• 42
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper
• 2408.15998
• Published
• 86
Configurable Foundation Models: Building LLMs from a Modular Perspective
Paper
• 2409.02877
• Published
• 32
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
• 2409.17146
• Published
• 121
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
Paper
• 2409.18943
• Published
• 28
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Paper
• 2409.17066
• Published
• 28
Differential Transformer
Paper
• 2410.05258
• Published
• 180
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper
• 2410.02707
• Published
• 47
Pixtral 12B
Paper
• 2410.07073
• Published
• 69
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Paper
• 2410.05355
• Published
• 35
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper
• 2410.10814
• Published
• 51
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper
• 2410.13848
• Published
• 35
Why Does the Effective Context Length of LLMs Fall Short?
Paper
• 2410.18745
• Published
• 17
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
• 2410.16048
• Published
• 29
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Paper
• 2410.17856
• Published
• 52
GPT-4o System Card
Paper
• 2410.21276
• Published
• 87
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper
• 2410.21169
• Published
• 30
Stealing User Prompts from Mixture of Experts
Paper
• 2410.22884
• Published
• 16
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper
• 2410.23168
• Published
• 24
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published
• 108
How to Synthesize Text Data without Model Collapse?
Paper
• 2412.14689
• Published
• 53
Qwen2.5 Technical Report
Paper
• 2412.15115
• Published
• 377
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
• 2501.04519
• Published
• 288
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper
• 2501.09751
• Published
• 46
GameFactory: Creating New Games with Generative Interactive Videos
Paper
• 2501.08325
• Published
• 67
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published
• 441
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published
• 255
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
• 2502.06703
• Published
• 152
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper
• 2502.08910
• Published
• 148
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Paper
• 2502.08235
• Published
• 59
Qwen2.5-VL Technical Report
Paper
• 2502.13923
• Published
• 213
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
• 2502.14502
• Published
• 91
Thus Spake Long-Context Large Language Model
Paper
• 2502.17129
• Published
• 73
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published
• 69
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published
• 113
EuroBERT: Scaling Multilingual Encoders for European Languages
Paper
• 2503.05500
• Published
• 81
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Paper
• 2503.13288
• Published
• 51
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper
• 2503.11576
• Published
• 147
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper
• 2503.18878
• Published
• 119
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published
• 170
PaperBench: Evaluating AI's Ability to Replicate AI Research
Paper
• 2504.01848
• Published
• 37
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper
• 2504.13161
• Published
• 93
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper
• 2504.15521
• Published
• 64
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Paper
• 2504.18415
• Published
• 49
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper
• 2504.20571
• Published
• 98
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published
• 189
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
Paper
• 2505.07591
• Published
• 11
Qwen3 Technical Report
Paper
• 2505.09388
• Published
• 334
Emerging Properties in Unified Multimodal Pretraining
Paper
• 2505.14683
• Published
• 133
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper
• 2506.05176
• Published
• 79
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
Paper
• 2506.06751
• Published
• 71
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
• 2506.16406
• Published
• 131
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published
• 159
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper
• 2507.05566
• Published
• 115
A Survey on Latent Reasoning
Paper
• 2507.06203
• Published
• 93
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Paper
• 2507.10532
• Published
• 90
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
• Published
• 88
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published
• 122
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
VibeVoice Technical Report
Paper
• 2508.19205
• Published
• 143
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Paper
• 2509.25154
• Published
• 30
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper
• 2510.03215
• Published
• 98
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
Paper
• 2510.04849
• Published
• 115
LightMem: Lightweight and Efficient Memory-Augmented Generation
Paper
• 2510.18866
• Published
• 114
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper
• 2510.26692
• Published
• 125
Diffusion Language Models are Super Data Learners
Paper
• 2511.03276
• Published
• 129
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Paper
• 2511.08577
• Published
• 108
Qwen3-VL Technical Report
Paper
• 2511.21631
• Published
• 157
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published
• 311
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published
• 86