How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise

2025-10-04 22:08 GMT · aimagpro.com

Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation must measure end-to-end task success, barge-in behavior and latency, and hallucination-under-noise—alongside ASR, safety, and instruction following. VoiceBench offers a multi-facet speech-interaction benchmark across general knowledge, instruction following, safety, and robustness to speaker/environment/content variations, but […]
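The summary lists several metrics without showing how they sit next to one another in a report. Below is a minimal Python sketch that computes corpus-level WER alongside task success, barge-in handling and hallucination-under-noise from a hypothetical per-episode log. Several things here are assumptions for illustration only: the `Episode` record and its field names, the 10 dB "noisy" cutoff, and the 0.5 s barge-in budget. None of them come from VoiceBench or the original post. The `hallucinated` flag is assumed to be supplied by an external judge (human or model); the sketch only aggregates it.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median
from typing import Optional


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete ref[i-1]
                          curr[j - 1] + 1,     # insert hyp[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


@dataclass
class Episode:
    # Hypothetical per-episode log record; every field name is illustrative.
    reference: str                            # what the user actually said
    transcript: str                           # what the ASR front end produced
    task_completed: bool                      # did the agent reach the goal state?
    snr_db: float                             # signal-to-noise ratio of the input audio
    hallucinated: bool                        # set by an external judge, not computed here
    barge_in_at: Optional[float] = None       # s: user started talking over the agent
    agent_stopped_at: Optional[float] = None  # s: agent speech halted (None = never)


def summarize(episodes: list[Episode], noisy_snr_db: float = 10.0,
              barge_in_budget_s: float = 0.5) -> dict[str, Optional[float]]:
    """Aggregate ASR accuracy and interaction-level metrics over a test set.

    noisy_snr_db and barge_in_budget_s are assumed thresholds, not standard values.
    """
    if not episodes:
        raise ValueError("need at least one episode")

    # Corpus-level WER: total word edits divided by total reference words.
    ref_words = [e.reference.split() for e in episodes]
    total_ref = sum(len(r) for r in ref_words)
    if total_ref == 0:
        raise ValueError("references contain no words")
    edits = sum(word_edit_distance(r, e.transcript.split())
                for r, e in zip(ref_words, episodes))

    task_success = sum(e.task_completed for e in episodes) / len(episodes)

    # Barge-in: an interruption counts as handled only if the agent stopped
    # speaking within the latency budget; never stopping counts as a failure.
    barge_ins = [e for e in episodes if e.barge_in_at is not None]
    latencies = [e.agent_stopped_at - e.barge_in_at for e in barge_ins
                 if e.agent_stopped_at is not None]
    handled = sum(lat <= barge_in_budget_s for lat in latencies)

    # Hallucination-under-noise: share of low-SNR episodes judged as hallucinated.
    noisy = [e for e in episodes if e.snr_db <= noisy_snr_db]

    return {
        "wer": edits / total_ref,
        "task_success_rate": task_success,
        "barge_in_success_rate": handled / len(barge_ins) if barge_ins else None,
        "median_barge_in_latency_s": median(latencies) if latencies else None,
        "hallucination_under_noise_rate":
            sum(e.hallucinated for e in noisy) / len(noisy) if noisy else None,
    }
```

Reporting these numbers side by side makes the post's point concrete: a run can show a low WER while task success or barge-in handling stays poor. The sketch uses corpus-level WER (summed edits over summed reference words) rather than a mean of per-utterance WERs, which avoids over-weighting very short utterances.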
The post How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise appeared first on MarkTechPost.