interesting stuff
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper
• 2309.11495
• Published
• 40
Adapting Large Language Models via Reading Comprehension
Paper
• 2309.09530
• Published
• 82
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper
• 2309.09400
• Published
• 87
Language Modeling Is Compression
Paper
• 2309.10668
• Published
• 84
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
• 2309.09117
• Published
• 40
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Paper
• 2309.13356
• Published
• 38
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Paper
• 2309.15098
• Published
• 7
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper
• 2309.14717
• Published
• 46
Qwen Technical Report
Paper
• 2309.16609
• Published
• 38
Effective Long-Context Scaling of Foundation Models
Paper
• 2309.16039
• Published
• 31
Large Language Models Cannot Self-Correct Reasoning Yet
Paper
• 2310.01798
• Published
• 36
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper
• 2310.03714
• Published
• 37
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper
• 2310.09263
• Published
• 40
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper
• 2310.11453
• Published
• 106
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper
• 2310.11511
• Published
• 79
H2O Open Ecosystem for State-of-the-art Large Language Models
Paper
• 2310.13012
• Published
• 9
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper
• 2310.16836
• Published
• 14
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper
• 2310.16795
• Published
• 27
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper
• 2310.17157
• Published
• 14
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
• 2310.17680
• Published
• 74
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper
• 2310.19102
• Published
• 11
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Paper
• 2310.18356
• Published
• 24
Does GPT-4 Pass the Turing Test?
Paper
• 2310.20216
• Published
• 17
CodePlan: Repository-level Coding using LLMs and Planning
Paper
• 2309.12499
• Published
• 80
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Paper
• 2311.00059
• Published
• 20
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper
• 2311.00945
• Published
• 16
Unveiling Safety Vulnerabilities of Large Language Models
Paper
• 2311.04124
• Published
• 9
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Paper
• 2311.07463
• Published
• 15
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Paper
• 2311.10642
• Published
• 25
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
• 2311.13600
• Published
• 47
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper
• 2311.03099
• Published
• 30
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
• 2312.03491
• Published
• 34
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper
• 2312.04474
• Published
• 34
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
• 2312.03818
• Published
• 34
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper
• 2401.02994
• Published
• 52
Mixtral of Experts
Paper
• 2401.04088
• Published
• 160
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper
• 2401.04658
• Published
• 27
The Impact of Reasoning Step Length on Large Language Models
Paper
• 2401.04925
• Published
• 18
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper
• 2401.06951
• Published
• 26
Extending LLMs' Context Window with 100 Samples
Paper
• 2401.07004
• Published
• 16
Self-Rewarding Language Models
Paper
• 2401.10020
• Published
• 152
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Paper
• 2401.10838
• Published
• 9
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Paper
• 2401.12070
• Published
• 45
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper
• 2401.14196
• Published
• 70
jinaai/jina-embeddings-v2-base-de
Feature Extraction
• 0.2B • Updated
• 867k
• 82
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper
• 2401.15024
• Published
• 73
Weaver: Foundation Models for Creative Writing
Paper
• 2401.17268
• Published
• 45
TrustLLM: Trustworthiness in Large Language Models
Paper
• 2401.05561
• Published
• 69
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper
• 2402.01739
• Published
• 28
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Paper
• 2402.02834
• Published
• 17
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper
• 2402.03620
• Published
• 117
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper
• 2402.04291
• Published
• 50
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Paper
• 2402.06619
• Published
• 57
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper
• 2402.07827
• Published
• 48
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
• 2402.07456
• Published
• 46
Scaling Laws for Fine-Grained Mixture of Experts
Paper
• 2402.07871
• Published
• 13
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published
• 68
Computing Power and the Governance of Artificial Intelligence
Paper
• 2402.08797
• Published
• 15
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper
• 2402.09727
• Published
• 38
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper
• 2402.10193
• Published
• 21
Chain-of-Thought Reasoning Without Prompting
Paper
• 2402.10200
• Published
• 109
How to Train Data-Efficient LLMs
Paper
• 2402.09668
• Published
• 43
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper
• 2402.10379
• Published
• 31
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
• Published
• 45
OneBit: Towards Extremely Low-bit Large Language Models
Paper
• 2402.11295
• Published
• 24
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper
• 2402.13249
• Published
• 15
Coercing LLMs to do and reveal (almost) anything
Paper
• 2402.14020
• Published
• 13
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published
• 116
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
• 2402.14905
• Published
• 134
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published
• 627
GPTVQ: The Blessing of Dimensionality for LLM Quantization
Paper
• 2402.15319
• Published
• 22
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper
• 2403.03853
• Published
• 66
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper
• 2403.07508
• Published
• 77
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper
• 2401.15391
• Published
• 6
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Paper
• 2403.12895
• Published
• 32
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published
• 72
LLM Agent Operating System
Paper
• 2403.16971
• Published
• 73
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper
• 2403.15447
• Published
• 16
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper
• 2403.16627
• Published
• 22
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Paper
• 2403.05313
• Published
• 9
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
SAM 2: Segment Anything in Images and Videos
Paper
• 2408.00714
• Published
• 120
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Paper
• 2408.00754
• Published
• 23
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published
• 78
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper
• 2408.07055
• Published
• 68
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper
• 2408.11796
• Published
• 58
Automated Design of Agentic Systems
Paper
• 2408.08435
• Published
• 40
ColPali: Efficient Document Retrieval with Vision Language Models
Paper
• 2407.01449
• Published
• 51
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper
• 2408.15518
• Published
• 42
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper
• 2408.15998
• Published
• 86
Configurable Foundation Models: Building LLMs from a Modular Perspective
Paper
• 2409.02877
• Published
• 32
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
• 2409.17146
• Published
• 121
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
Paper
• 2409.18943
• Published
• 28
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Paper
• 2409.17066
• Published
• 28
Differential Transformer
Paper
• 2410.05258
• Published
• 180
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper
• 2410.02707
• Published
• 47
Pixtral 12B
Paper
• 2410.07073
• Published
• 69
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Paper
• 2410.05355
• Published
• 35
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper
• 2410.10814
• Published
• 51
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper
• 2410.13848
• Published
• 35
Why Does the Effective Context Length of LLMs Fall Short?
Paper
• 2410.18745
• Published
• 17
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
• 2410.16048
• Published
• 29
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Paper
• 2410.17856
• Published
• 52
GPT-4o System Card
Paper
• 2410.21276
• Published
• 87
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper
• 2410.21169
• Published
• 30
Stealing User Prompts from Mixture of Experts
Paper
• 2410.22884
• Published
• 16
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper
• 2410.23168
• Published
• 24
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published
• 108
How to Synthesize Text Data without Model Collapse?
Paper
• 2412.14689
• Published
• 53
Qwen2.5 Technical Report
Paper
• 2412.15115
• Published
• 377
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
• 2501.04519
• Published
• 288
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper
• 2501.09751
• Published
• 46
GameFactory: Creating New Games with Generative Interactive Videos
Paper
• 2501.08325
• Published
• 67
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published
• 441
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published
• 255
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
• 2502.06703
• Published
• 152
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper
• 2502.08910
• Published
• 148
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Paper
• 2502.08235
• Published
• 59
Qwen2.5-VL Technical Report
Paper
• 2502.13923
• Published
• 213
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
• 2502.14502
• Published
• 91
Thus Spake Long-Context Large Language Model
Paper
• 2502.17129
• Published
• 73
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published
• 69
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published
• 113
EuroBERT: Scaling Multilingual Encoders for European Languages
Paper
• 2503.05500
• Published
• 81
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Paper
• 2503.13288
• Published
• 51
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper
• 2503.11576
• Published
• 147
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper
• 2503.18878
• Published
• 119
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published
• 170
PaperBench: Evaluating AI's Ability to Replicate AI Research
Paper
• 2504.01848
• Published
• 37
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper
• 2504.13161
• Published
• 93
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper
• 2504.15521
• Published
• 64
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Paper
• 2504.18415
• Published
• 49
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper
• 2504.20571
• Published
• 98
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published
• 189
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
Paper
• 2505.07591
• Published
• 11
Qwen3 Technical Report
Paper
• 2505.09388
• Published
• 334
Emerging Properties in Unified Multimodal Pretraining
Paper
• 2505.14683
• Published
• 133
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper
• 2506.05176
• Published
• 79
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
Paper
• 2506.06751
• Published
• 71
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
• 2506.16406
• Published
• 131
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published
• 159
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper
• 2507.05566
• Published
• 115
A Survey on Latent Reasoning
Paper
• 2507.06203
• Published
• 93
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Paper
• 2507.10532
• Published
• 90
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
• Published
• 88
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published
• 122
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
VibeVoice Technical Report
Paper
• 2508.19205
• Published
• 143
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Paper
• 2509.25154
• Published
• 30
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper
• 2510.03215
• Published
• 98
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
Paper
• 2510.04849
• Published
• 115
LightMem: Lightweight and Efficient Memory-Augmented Generation
Paper
• 2510.18866
• Published
• 114
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper
• 2510.26692
• Published
• 125
Diffusion Language Models are Super Data Learners
Paper
• 2511.03276
• Published
• 129
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Paper
• 2511.08577
• Published
• 108
Qwen3-VL Technical Report
Paper
• 2511.21631
• Published
• 157
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published
• 311
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published
• 86