Papers that exist - a DogManTC Collection

updated about 24 hours ago

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19, 2025 • 45
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 93
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Paper • 2602.03120 • Published 14 days ago • 1
TADA! Tuning Audio Diffusion Models through Activation Steering

Paper • 2602.11910 • Published 5 days ago • 2
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Paper • 2602.13191 • Published 4 days ago • 26
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics

Paper • 2602.12617 • Published 4 days ago • 19
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

Paper • 2602.13013 • Published 4 days ago • 7
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published 8 days ago • 41
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 37
Latent Flow Transformer

Paper • 2505.14513 • Published May 20, 2025 • 29
LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference

Paper • 2601.02569 • Published Jan 5
LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published Sep 18, 2024 • 35
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model

Paper • 2411.04496 • Published Nov 7, 2024 • 22
FoNE: Precise Single-Token Number Embeddings via Fourier Features

Paper • 2502.09741 • Published Feb 13, 2025 • 15
FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17, 2025 • 10
Distilling Token-Trained Models into Byte-Level Models

Paper • 2602.01007 • Published 17 days ago
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling

Paper • 2502.14553 • Published Feb 20, 2025
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction

Paper • 2505.11254 • Published May 16, 2025 • 48
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 507
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering

Paper • 2409.08250 • Published Sep 12, 2024 • 1
LightMem: Lightweight and Efficient Memory-Augmented Generation

Paper • 2510.18866 • Published Oct 21, 2025 • 114
The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30, 2025 • 117
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30, 2025 • 124
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Paper • 2602.10224 • Published 7 days ago • 18
ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation

Paper • 2601.21912 • Published 19 days ago • 1
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Paper • 2405.13792 • Published May 22, 2024 • 1
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations

Paper • 2505.02819 • Published May 5, 2025 • 26
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Paper • 2502.16894 • Published Feb 24, 2025 • 32
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Paper • 2602.12205 • Published 5 days ago • 76
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Paper • 2602.11761 • Published 5 days ago • 6
CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Paper • 2602.01766 • Published 15 days ago
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Paper • 2601.17367 • Published 24 days ago • 33
MemFly: On-the-Fly Memory Optimization via Information Bottleneck

Paper • 2602.07885 • Published 9 days ago • 7
Voxtral Realtime

Paper • 2602.11298 • Published 6 days ago • 15
UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory

Paper • 2602.10652 • Published 6 days ago • 2
Weight Decay Improves Language Model Plasticity

Paper • 2602.11137 • Published 6 days ago • 2
Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens

Paper • 2602.10229 • Published 7 days ago • 5
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

Paper • 2602.09713 • Published 7 days ago • 8
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Paper • 2602.07106 • Published 11 days ago • 11
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Paper • 2602.08711 • Published 8 days ago • 25
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning

Paper • 2602.10622 • Published 6 days ago • 26