arxiv:2410.17196

VoiceBench: Benchmarking LLM-Based Voice Assistants

Published on Oct 22, 2024

Abstract

VoiceBench, a new benchmark, evaluates LLM-based voice assistants in real-world scenarios with diverse speaker characteristics, environmental factors, and content, highlighting current limitations and guiding future research.

Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering a significantly improved user experience compared to traditional text-based interactions. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress in the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge evaluation with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics, environmental factors, and content variations. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench also includes both real and synthetic spoken instructions that incorporate the above three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.
