Exploring Practical Metrics to Support Automatic Speech Recognition Evaluations.

JCR: Q3 (Health Professions)
E A Draffan, Mike Wald, Chaohai Ding, Yunjia Li
{"title":"探索实用指标,以支持自动语音识别评估。","authors":"E A Draffan,&nbsp;Mike Wald,&nbsp;Chaohai Ding,&nbsp;Yunjia Li","doi":"10.3233/SHTI230636","DOIUrl":null,"url":null,"abstract":"<p><p>Recent studies into the evaluation of automatic speech recognition for its quality of output in the form of text have shown that using word error rate to see how many mistakes exist in English does not necessarily help the developer of automatic transcriptions or captions. Confidence levels as to the type of errors being made remain low because mistranslations from speech to text are not always captured with a note that details the reason for the error. There have been situations in higher education where students requiring captions and transcriptions have found that some academic lecture results are littered with word errors which means that comprehension levels drop and those with cognitive, physical and sensory disabilities are particularly affected. Despite the incredible improvements in general understanding of conversational automatic speech recognition, academic situations tend to include numerous domain specific terms and the lecturers may be non-native speakers, coping with recording technology in noisy situations. This paper aims to discuss the way additional metrics are used to capture issues and feedback into the machine learning process to enable enhanced quality of output and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.</p>","PeriodicalId":39242,"journal":{"name":"Studies in Health Technology and Informatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring Practical Metrics to Support Automatic Speech Recognition Evaluations.\",\"authors\":\"E A Draffan,&nbsp;Mike Wald,&nbsp;Chaohai Ding,&nbsp;Yunjia Li\",\"doi\":\"10.3233/SHTI230636\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Recent studies into the evaluation of automatic speech recognition for its quality of output in the form of text have shown that using word error rate to see how many mistakes exist in English does not necessarily help the developer of automatic transcriptions or captions. Confidence levels as to the type of errors being made remain low because mistranslations from speech to text are not always captured with a note that details the reason for the error. There have been situations in higher education where students requiring captions and transcriptions have found that some academic lecture results are littered with word errors which means that comprehension levels drop and those with cognitive, physical and sensory disabilities are particularly affected. Despite the incredible improvements in general understanding of conversational automatic speech recognition, academic situations tend to include numerous domain specific terms and the lecturers may be non-native speakers, coping with recording technology in noisy situations. This paper aims to discuss the way additional metrics are used to capture issues and feedback into the machine learning process to enable enhanced quality of output and more inclusive practices for those using virtual conferencing systems. 
The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.</p>\",\"PeriodicalId\":39242,\"journal\":{\"name\":\"Studies in Health Technology and Informatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Studies in Health Technology and Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/SHTI230636\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Health Professions\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in Health Technology and Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI230636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Health Professions","Score":null,"Total":0}
Citations: 0

Abstract


Recent studies into the evaluation of automatic speech recognition (ASR) for the quality of its text output have shown that using word error rate to count how many mistakes occur in English does not necessarily help the developer of automatic transcriptions or captions. Confidence about the types of errors being made remains low, because speech-to-text errors are not always captured with a note detailing the reason for the error. There have been situations in higher education where students requiring captions and transcriptions have found that some academic lecture outputs are littered with word errors, which means that comprehension levels drop and those with cognitive, physical and sensory disabilities are particularly affected. Despite the incredible improvements in general understanding of conversational ASR, academic situations tend to include numerous domain-specific terms, and lecturers may be non-native speakers coping with recording technology in noisy situations. This paper discusses how additional metrics can be used to capture issues and feed them back into the machine learning process, enabling enhanced quality of output and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.
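The abstract's point of departure is word error rate (WER), the standard ASR metric. Purely as background, and not something taken from the paper itself, WER is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. A minimal Python sketch, using hypothetical example sentences:

# Minimal WER sketch (background illustration only, not from the paper).
# WER = (substitutions + deletions + insertions) / number of reference words,
# computed here via word-level Levenshtein edit distance.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical lecture fragment in which two domain-specific words are misrecognised.
reference = "the lecturer explained convolutional neural networks"
hypothesis = "the lecture explained convolution neural networks"
print(f"WER = {word_error_rate(reference, hypothesis):.2f}")  # 2 substitutions / 6 words = 0.33

As the paper argues, a single figure like this says nothing about which words failed or why, which is the gap the additional, paralinguistic metrics are intended to fill.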

Source journal
Studies in Health Technology and Informatics (Health Professions: Health Information Management)
CiteScore: 1.20
Self-citation rate: 0.00%
Articles published: 1463
Journal description: This book series was started in 1990 to promote research conducted under the auspices of the EC programmes’ Advanced Informatics in Medicine (AIM) and Biomedical and Health Research (BHR) bioengineering branch. A driving aspect of international health informatics is that telecommunication technology, rehabilitative technology, intelligent home technology and many other components are moving together to form one integrated world of information and communication media.