Decoding disparities: evaluating automatic speech recognition system performance in transcribing Black and White patient verbal communication with nurses in home healthcare.

IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES
JAMIA Open | Pub Date: 2024-12-10 | eCollection Date: 2024-12-01 | DOI: 10.1093/jamiaopen/ooae130
Maryam Zolnoori, Sasha Vergez, Zidu Xu, Elyas Esmaeili, Ali Zolnour, Krystal Anne Briggs, Jihye Kim Scroggins, Seyed Farid Hosseini Ebrahimabad, James M Noble, Maxim Topaz, Suzanne Bakken, Kathryn H Bowles, Ian Spens, Nicole Onorato, Sridevi Sridharan, Margaret V McDonald
{"title":"解码差异:评估自动语音识别系统在转录黑人和白人患者与护士的口头交流中的表现。","authors":"Maryam Zolnoori, Sasha Vergez, Zidu Xu, Elyas Esmaeili, Ali Zolnour, Krystal Anne Briggs, Jihye Kim Scroggins, Seyed Farid Hosseini Ebrahimabad, James M Noble, Maxim Topaz, Suzanne Bakken, Kathryn H Bowles, Ian Spens, Nicole Onorato, Sridevi Sridharan, Margaret V McDonald","doi":"10.1093/jamiaopen/ooae130","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>As artificial intelligence evolves, integrating speech processing into home healthcare (HHC) workflows is increasingly feasible. Audio-recorded communications enhance risk identification models, with automatic speech recognition (ASR) systems as a key component. This study evaluates the transcription accuracy and equity of 4 ASR systems-Amazon Web Services (AWS) General, AWS Medical, Whisper, and Wave2Vec-in transcribing patient-nurse communication in US HHC, focusing on their ability in accurate transcription of speech from Black and White English-speaking patients.</p><p><strong>Materials and methods: </strong>We analyzed audio recordings of patient-nurse encounters from 35 patients (16 Black and 19 White) in a New York City-based HHC service. Overall, 860 utterances were available for study, including 475 drawn from Black patients and 385 from White patients. Automatic speech recognition performance was measured using word error rate (WER), benchmarked against a manual gold standard. Disparities were assessed by comparing ASR performance across racial groups using the linguistic inquiry and word count (LIWC) tool, focusing on 10 linguistic dimensions, as well as specific speech elements including repetition, filler words, and proper nouns (medical and nonmedical terms).</p><p><strong>Results: </strong>The average age of participants was 67.8 years (SD = 14.4). Communication lasted an average of 15 minutes (range: 11-21 minutes) with a median of 1186 words per patient. Of 860 total utterances, 475 were from Black patients and 385 from White patients. Amazon Web Services General had the highest accuracy, with a median WER of 39%. However, all systems showed reduced accuracy for Black patients, with significant discrepancies in LIWC dimensions such as \"Affect,\" \"Social,\" and \"Drives.\" Amazon Web Services Medical performed best for medical terms, though all systems have difficulties with filler words, repetition, and nonmedical terms, with AWS General showing the lowest error rates at 65%, 64%, and 53%, respectively.</p><p><strong>Discussion: </strong>While AWS systems demonstrated superior accuracy, significant disparities by race highlight the need for more diverse training datasets and improved dialect sensitivity. 
Addressing these disparities is critical for ensuring equitable ASR performance in HHC settings and enhancing risk prediction models through audio-recorded communication.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"7 4","pages":"ooae130"},"PeriodicalIF":2.5000,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631515/pdf/","citationCount":"0","resultStr":"{\"title\":\"Decoding disparities: evaluating automatic speech recognition system performance in transcribing Black and White patient verbal communication with nurses in home healthcare.\",\"authors\":\"Maryam Zolnoori, Sasha Vergez, Zidu Xu, Elyas Esmaeili, Ali Zolnour, Krystal Anne Briggs, Jihye Kim Scroggins, Seyed Farid Hosseini Ebrahimabad, James M Noble, Maxim Topaz, Suzanne Bakken, Kathryn H Bowles, Ian Spens, Nicole Onorato, Sridevi Sridharan, Margaret V McDonald\",\"doi\":\"10.1093/jamiaopen/ooae130\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>As artificial intelligence evolves, integrating speech processing into home healthcare (HHC) workflows is increasingly feasible. Audio-recorded communications enhance risk identification models, with automatic speech recognition (ASR) systems as a key component. This study evaluates the transcription accuracy and equity of 4 ASR systems-Amazon Web Services (AWS) General, AWS Medical, Whisper, and Wave2Vec-in transcribing patient-nurse communication in US HHC, focusing on their ability in accurate transcription of speech from Black and White English-speaking patients.</p><p><strong>Materials and methods: </strong>We analyzed audio recordings of patient-nurse encounters from 35 patients (16 Black and 19 White) in a New York City-based HHC service. Overall, 860 utterances were available for study, including 475 drawn from Black patients and 385 from White patients. Automatic speech recognition performance was measured using word error rate (WER), benchmarked against a manual gold standard. Disparities were assessed by comparing ASR performance across racial groups using the linguistic inquiry and word count (LIWC) tool, focusing on 10 linguistic dimensions, as well as specific speech elements including repetition, filler words, and proper nouns (medical and nonmedical terms).</p><p><strong>Results: </strong>The average age of participants was 67.8 years (SD = 14.4). Communication lasted an average of 15 minutes (range: 11-21 minutes) with a median of 1186 words per patient. Of 860 total utterances, 475 were from Black patients and 385 from White patients. Amazon Web Services General had the highest accuracy, with a median WER of 39%. However, all systems showed reduced accuracy for Black patients, with significant discrepancies in LIWC dimensions such as \\\"Affect,\\\" \\\"Social,\\\" and \\\"Drives.\\\" Amazon Web Services Medical performed best for medical terms, though all systems have difficulties with filler words, repetition, and nonmedical terms, with AWS General showing the lowest error rates at 65%, 64%, and 53%, respectively.</p><p><strong>Discussion: </strong>While AWS systems demonstrated superior accuracy, significant disparities by race highlight the need for more diverse training datasets and improved dialect sensitivity. 
Addressing these disparities is critical for ensuring equitable ASR performance in HHC settings and enhancing risk prediction models through audio-recorded communication.</p>\",\"PeriodicalId\":36278,\"journal\":{\"name\":\"JAMIA Open\",\"volume\":\"7 4\",\"pages\":\"ooae130\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2024-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631515/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JAMIA Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamiaopen/ooae130\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/12/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooae130","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/12/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract


Objectives: As artificial intelligence evolves, integrating speech processing into home healthcare (HHC) workflows is increasingly feasible. Audio-recorded communications enhance risk identification models, with automatic speech recognition (ASR) systems as a key component. This study evaluates the transcription accuracy and equity of 4 ASR systems (Amazon Web Services [AWS] General, AWS Medical, Whisper, and Wave2Vec) in transcribing patient-nurse communication in US HHC, focusing on their ability to accurately transcribe speech from Black and White English-speaking patients.
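
For readers unfamiliar with these systems, the sketch below shows how an open-source model such as Whisper can turn an audio recording into a hypothesis transcript. It is illustrative only and not the study's pipeline; the file name and model size are assumptions.

```python
# Minimal sketch: transcribing a recording with the open-source openai-whisper
# package (one of the system families evaluated in the study).
# "patient_nurse_visit.wav" and the "base.en" checkpoint are hypothetical choices.
import whisper

model = whisper.load_model("base.en")                  # small English-only Whisper checkpoint
result = model.transcribe("patient_nurse_visit.wav")   # produces a hypothesis transcript
print(result["text"])                                   # raw ASR output to be scored later
```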

Materials and methods: We analyzed audio recordings of patient-nurse encounters from 35 patients (16 Black and 19 White) in a New York City-based HHC service. Overall, 860 utterances were available for study, including 475 drawn from Black patients and 385 from White patients. Automatic speech recognition performance was measured using word error rate (WER), benchmarked against a manual gold standard. Disparities were assessed by comparing ASR performance across racial groups using the Linguistic Inquiry and Word Count (LIWC) tool, focusing on 10 linguistic dimensions as well as specific speech elements, including repetition, filler words, and proper nouns (medical and nonmedical terms).
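
As an aside for implementation-minded readers, WER against a manual gold standard can be computed with a standard open-source tool. The snippet below is a minimal sketch using the jiwer library with invented example strings; it is not the study's data or code.

```python
# Minimal sketch: word error rate (WER) of an ASR hypothesis against a manual
# gold-standard transcript, using the open-source jiwer library.
# Both strings are invented examples, not utterances from the study.
import jiwer

reference = "the nurse asked about my blood pressure medication"    # manual gold standard
hypothesis = "the nurse asked about blood pressure medications"     # ASR system output

# WER = (substitutions + deletions + insertions) / number of reference words
wer = jiwer.wer(reference, hypothesis)
print(f"WER = {wer:.2%}")
```

In practice the same computation would be repeated per utterance and then summarized (e.g., as a median) within each patient group to compare systems.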

Results: The average age of participants was 67.8 years (SD = 14.4). Communication lasted an average of 15 minutes (range: 11-21 minutes) with a median of 1186 words per patient. Of 860 total utterances, 475 were from Black patients and 385 from White patients. Amazon Web Services General had the highest accuracy, with a median WER of 39%. However, all systems showed reduced accuracy for Black patients, with significant discrepancies in LIWC dimensions such as "Affect," "Social," and "Drives." Amazon Web Services Medical performed best for medical terms, though all systems had difficulty with filler words, repetition, and nonmedical terms; AWS General showed the lowest error rates for these at 65%, 64%, and 53%, respectively.

Discussion: While AWS systems demonstrated superior accuracy, significant disparities by race highlight the need for more diverse training datasets and improved dialect sensitivity. Addressing these disparities is critical for ensuring equitable ASR performance in HHC settings and enhancing risk prediction models through audio-recorded communication.

Source journal
JAMIA Open (Medicine - Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 102
Review time: 16 weeks