Comparing speech recognition services for HCI applications in behavioral health
P. Chlebek, Elizabeth Shriberg, Yang Lu, T. Rutowski, A. Harati, R. Oliveira
Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers, September 2020. DOI: 10.1145/3410530.3414372
Abstract
Behavioral health conditions such as depression and anxiety are a global concern, and there is growing interest in employing speech technology to screen and monitor patients remotely. Language modeling approaches require automatic speech recognition (ASR), and multiple privacy-compliant ASR services are commercially available. We use a corpus of over 60 hours of speech from a behavioral health task and compare ASR performance for four commercial vendors. We expected similar performance, but found large differences between the top and next-best performer, for both mobile (48% relative WER increase) and laptop (67% relative WER increase) data. Results suggest the importance of benchmarking ASR systems in this domain. Additionally, we find that WER is not systematically related to depression itself. Performance is, however, affected by the diverse audio quality of users' personal devices, and possibly by the overall style of speech in this domain.
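To make the headline numbers concrete, the sketch below shows how word error rate (WER) and a relative WER increase between vendors are typically computed. It is a minimal illustration, not the paper's evaluation pipeline: the vendor names and transcripts are hypothetical, and the WER implementation is a standard word-level Levenshtein alignment.

```python
# Minimal sketch: word error rate (WER) and relative WER increase between two
# hypothetical ASR vendors. Transcripts and vendor names are illustrative only,
# not data from the paper.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def relative_wer_increase(best: float, other: float) -> float:
    """Relative increase of `other` over `best`, e.g. 0.48 for a 48% increase."""
    return (other - best) / best if best > 0 else float("inf")


if __name__ == "__main__":
    reference = "i have been feeling tired and anxious most days"
    hypotheses = {
        "vendor_a": "i have been feeling tired and anxious most day",
        "vendor_b": "i been feeling tired in anxious most day",
    }
    wers = {name: wer(reference, hyp) for name, hyp in hypotheses.items()}
    best = min(wers, key=wers.get)
    for name, w in sorted(wers.items(), key=lambda kv: kv[1]):
        line = f"{name}: WER = {w:.1%}"
        if name != best:
            line += f" ({relative_wer_increase(wers[best], w):.0%} relative increase over {best})"
        print(line)
```

In practice a benchmark like the one described would aggregate WER over the full corpus per vendor and per device type (mobile vs. laptop) before computing such relative differences; the per-utterance computation above is only the building block.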