FireRedASR2S
[Code] [Paper] [Model] [Blog] [Demo]
FireRedASR2S is a state-of-the-art (SOTA), industrial-grade, all-in-one ASR system presented in the paper FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System. It integrates four modules into a unified pipeline: ASR, Voice Activity Detection (VAD), Spoken Language Identification (LID), and Punctuation Prediction (Punc).
To use the system, first clone the official repository and install the dependencies. Then you can use the following Python API:
```python
from fireredasr2s import FireRedAsr2System, FireRedAsr2SystemConfig

# Initialize the system with default config
asr_system_config = FireRedAsr2SystemConfig()
asr_system = FireRedAsr2System(asr_system_config)

# Process an audio file (16 kHz, 16-bit, mono PCM)
result = asr_system.process("assets/hello_zh.wav")
print(result['text'])
# Output: 你好世界。 ("Hello, world.")
```
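The snippet above expects 16 kHz, 16-bit, mono PCM input. As a minimal sketch, the check below uses only Python's standard `wave` module to confirm that a WAV file matches that format before it is passed to the system. The helper name `check_wav_format` is hypothetical and not part of the FireRedASR2S API, and the sketch only inspects the header; it does not resample or convert audio.

```python
import wave


def check_wav_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper (not part of fireredasr2s). Defaults correspond to
    16 kHz, 16-bit (2 bytes per sample), mono PCM.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz"
            )
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, "
                f"expected {sample_width_bytes} bytes"
            )
        if wav.getnchannels() != channels:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected {channels}"
            )
    return problems
```

Calling `check_wav_format("assets/hello_zh.wav")` before `asr_system.process(...)` would return an empty list for a conforming file and a list of readable mismatch messages otherwise.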
FireRedASR2-LLM achieves an average character error rate (CER) of 2.89% on 4 public Mandarin benchmarks and 11.55% on 19 public Chinese dialect and accent benchmarks, outperforming competitive baselines including Doubao-ASR, Qwen3-ASR, and Fun-ASR. Lower CER is better.
| Model | Mandarin (Avg CER%) | Dialects (Avg CER%) |
|---|---|---|
| FireRedASR2-LLM | 2.89 | 11.55 |
| FireRedASR2-AED | 3.05 | 11.67 |
| Doubao-ASR | 3.69 | 15.39 |
| Qwen3-ASR | 3.76 | 11.85 |
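For readers unfamiliar with the metric, CER is conventionally defined as the character-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the number of reference characters. The sketch below implements that standard definition in plain Python. It is an illustration only: the function names are hypothetical, and it omits any text normalization (such as punctuation stripping) that a benchmark scoring script may apply, since the source does not describe one.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using two rolling rows."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (ref char missing in hyp)
                    curr[j - 1] + 1,     # insertion (extra char in hyp)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate as a percentage of reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

For example, a four-character reference with one substituted character in the hypothesis gives `cer("你好世界", "你好视界") == 25.0`.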
```bibtex
@article{xu2026fireredasr2s,
  title={FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System},
  author={Xu, Kaituo and Jia, Yan and Huang, Kai and Chen, Junjie and Li, Wenpeng and Liu, Kun and Xie, Feng-Long and Tang, Xu and Hu, Yao},
  journal={arXiv preprint arXiv:2603.10420},
  year={2026}
}
```