Comparing human and machine speech recognition in noise with QuickSIN.

IF 1.2 Q3 ACOUSTICS

JASA express letters, Pub Date: 2024-09-01, DOI: 10.1121/10.0028612

Malcolm Slaney, Matthew B Fitzgerald


Citations: 0

Abstract

A test is proposed to characterize the performance of speech recognition systems. The QuickSIN test is used by audiologists to measure the ability of humans to recognize continuous speech in noise. This test yields the signal-to-noise ratio at which individuals can correctly recognize 50% of the keywords in low-context sentences. It is argued that a metric for automatic speech recognizers will ground the performance of automatic speech-in-noise recognizers to human abilities. Here, it is demonstrated that the performance of modern recognizers, built using millions of hours of unsupervised training data, is anywhere from normal to mildly impaired in noise compared to human participants.
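The abstract describes the test's output as the signal-to-noise ratio at which 50% of keywords are recognized. A minimal sketch of how such a threshold might be computed from keyword counts follows. It is not taken from the paper: it assumes the commonly described QuickSIN list layout (six sentences presented from 25 dB down to 0 dB SNR in 5 dB steps, five keywords each), a Spearman-Kärber style estimate, and a 2 dB normal-hearing reference for converting SNR-50 to SNR loss. The function names and default parameters are hypothetical.

```python
def snr50_db(correct_per_sentence, start_snr_db=25.0, step_db=5.0, keywords_per_sentence=5):
    """Estimate the SNR (dB) at which 50% of keywords are recognized.

    Spearman-Karber style estimate. The defaults are ASSUMED values for a
    standard QuickSIN list; they are not stated in the abstract.
    """
    total_correct = sum(correct_per_sentence)
    # Start half a step above the easiest level, then subtract one step
    # for every full sentence's worth of correctly recognized keywords.
    return start_snr_db + step_db / 2.0 - step_db * total_correct / keywords_per_sentence


def snr_loss_db(correct_per_sentence, normal_snr50_db=2.0):
    """SNR loss relative to an ASSUMED normal-hearing SNR-50 of 2 dB."""
    return snr50_db(correct_per_sentence) - normal_snr50_db


# Hypothetical keyword counts for one list, from the easiest SNR to the hardest.
listener_scores = [5, 5, 5, 5, 3, 2]   # 25 of 30 keywords correct
recognizer_scores = [5, 5, 5, 4, 2, 0]  # 21 of 30 keywords correct

listener_snr50 = snr50_db(listener_scores)
recognizer_snr50 = snr50_db(recognizer_scores)
print(listener_snr50, snr_loss_db(listener_scores))
print(recognizer_snr50, snr_loss_db(recognizer_scores))
```

Under these assumptions the same scoring function applies to a human listener's responses and to a recognizer's transcripts, which is what allows the two to be placed on one scale.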


Source journal

JASA express letters

CiteScore

1.70

Self-citation rate

0.00%

Publications