{"title":"Exploring Practical Metrics to Support Automatic Speech Recognition Evaluations.","authors":"E A Draffan, Mike Wald, Chaohai Ding, Yunjia Li","doi":"10.3233/SHTI230636","DOIUrl":null,"url":null,"abstract":"<p><p>Recent studies into the evaluation of automatic speech recognition for its quality of output in the form of text have shown that using word error rate to see how many mistakes exist in English does not necessarily help the developer of automatic transcriptions or captions. Confidence levels as to the type of errors being made remain low because mistranslations from speech to text are not always captured with a note that details the reason for the error. There have been situations in higher education where students requiring captions and transcriptions have found that some academic lecture results are littered with word errors which means that comprehension levels drop and those with cognitive, physical and sensory disabilities are particularly affected. Despite the incredible improvements in general understanding of conversational automatic speech recognition, academic situations tend to include numerous domain specific terms and the lecturers may be non-native speakers, coping with recording technology in noisy situations. This paper aims to discuss the way additional metrics are used to capture issues and feedback into the machine learning process to enable enhanced quality of output and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.</p>","PeriodicalId":39242,"journal":{"name":"Studies in Health Technology and Informatics","volume":"306 ","pages":"305-310"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in Health Technology and Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI230636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Health Professions","Score":null,"Total":0}
引用次数: 0
Abstract
Recent studies into the evaluation of automatic speech recognition for its quality of output in the form of text have shown that using word error rate to see how many mistakes exist in English does not necessarily help the developer of automatic transcriptions or captions. Confidence levels as to the type of errors being made remain low because mistranslations from speech to text are not always captured with a note that details the reason for the error. There have been situations in higher education where students requiring captions and transcriptions have found that some academic lecture results are littered with word errors which means that comprehension levels drop and those with cognitive, physical and sensory disabilities are particularly affected. Despite the incredible improvements in general understanding of conversational automatic speech recognition, academic situations tend to include numerous domain specific terms and the lecturers may be non-native speakers, coping with recording technology in noisy situations. This paper aims to discuss the way additional metrics are used to capture issues and feedback into the machine learning process to enable enhanced quality of output and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.
期刊介绍:
This book series was started in 1990 to promote research conducted under the auspices of the EC programmes’ Advanced Informatics in Medicine (AIM) and Biomedical and Health Research (BHR) bioengineering branch. A driving aspect of international health informatics is that telecommunication technology, rehabilitative technology, intelligent home technology and many other components are moving together and form one integrated world of information and communication media.