{"title":"揭示弱监督端到端模型在卒中后失语症患者语音识别中的潜力","authors":"Giulia Sanguedolce, P. Naylor, F. Geranmayeh","doi":"10.18653/v1/2023.clinicalnlp-1.24","DOIUrl":null,"url":null,"abstract":"Post-stroke speech and language deficits (aphasia) significantly impact patients’ quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the intensive doses of therapy recommended, due to healthcare costs and/or inadequate services. Automatic Speech Recognition (ASR) may help overcome these difficulties by improving diagnostic rates and providing feedback during tailored therapy. However, its performance is often unsatisfactory due to the high variability in speech errors and scarcity of training datasets. This study assessed the performance of Whisper, a recently released end-to-end model, in patients with post-stroke aphasia (PWA). We tuned its hyperparameters to achieve the lowest word error rate (WER) on aphasic speech. WER was significantly higher in PWA compared to age-matched controls (10.3% vs 38.5%, p<0.001). We demonstrated that worse WER was related to the more severe aphasia as measured by expressive (overt naming, and spontaneous speech production) and receptive (written and spoken comprehension) language assessments. Stroke lesion size did not affect the performance of Whisper. Linear mixed models accounting for demographic factors, therapy duration, and time since stroke, confirmed worse Whisper performance with left hemispheric frontal lesions.We discuss the implications of these findings for how future ASR can be improved in PWA.","PeriodicalId":216954,"journal":{"name":"Clinical Natural Language Processing Workshop","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Uncovering the Potential for a Weakly Supervised End-to-End Model in Recognising Speech from Patient with Post-Stroke Aphasia\",\"authors\":\"Giulia Sanguedolce, P. Naylor, F. Geranmayeh\",\"doi\":\"10.18653/v1/2023.clinicalnlp-1.24\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Post-stroke speech and language deficits (aphasia) significantly impact patients’ quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the intensive doses of therapy recommended, due to healthcare costs and/or inadequate services. Automatic Speech Recognition (ASR) may help overcome these difficulties by improving diagnostic rates and providing feedback during tailored therapy. However, its performance is often unsatisfactory due to the high variability in speech errors and scarcity of training datasets. This study assessed the performance of Whisper, a recently released end-to-end model, in patients with post-stroke aphasia (PWA). We tuned its hyperparameters to achieve the lowest word error rate (WER) on aphasic speech. WER was significantly higher in PWA compared to age-matched controls (10.3% vs 38.5%, p<0.001). We demonstrated that worse WER was related to the more severe aphasia as measured by expressive (overt naming, and spontaneous speech production) and receptive (written and spoken comprehension) language assessments. Stroke lesion size did not affect the performance of Whisper. Linear mixed models accounting for demographic factors, therapy duration, and time since stroke, confirmed worse Whisper performance with left hemispheric frontal lesions.We discuss the implications of these findings for how future ASR can be improved in PWA.\",\"PeriodicalId\":216954,\"journal\":{\"name\":\"Clinical Natural Language Processing Workshop\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Natural Language Processing Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2023.clinicalnlp-1.24\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Natural Language Processing Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2023.clinicalnlp-1.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
卒中后言语和语言缺陷(失语症)显著影响患者的生活质量。许多症状轻微的人仍未得到诊断,由于医疗费用和/或服务不足,大多数人没有接受建议的高剂量治疗。自动语音识别(ASR)可以通过提高诊断率和在量身定制的治疗过程中提供反馈来帮助克服这些困难。然而,由于语音错误的高度可变性和训练数据集的稀缺性,其性能往往不令人满意。这项研究评估了Whisper(最近发布的端到端模型)在中风后失语(PWA)患者中的表现。我们调整了它的超参数,以达到最低的失语错误率(WER)。与年龄匹配的对照组相比,PWA组的WER明显更高(10.3% vs 38.5%, p<0.001)。我们证明,通过表达性(公开命名和自发言语产生)和接受性(书面和口头理解)语言评估,越差的WER与越严重的失语症有关。脑卒中病变大小不影响Whisper的性能。线性混合模型考虑了人口统计学因素、治疗持续时间和中风后的时间,证实了左半球额叶病变患者的耳语表现更差。我们讨论了这些发现对未来如何改善PWA的ASR的意义。
Uncovering the Potential for a Weakly Supervised End-to-End Model in Recognising Speech from Patient with Post-Stroke Aphasia
Post-stroke speech and language deficits (aphasia) significantly impact patients’ quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the intensive doses of therapy recommended, due to healthcare costs and/or inadequate services. Automatic Speech Recognition (ASR) may help overcome these difficulties by improving diagnostic rates and providing feedback during tailored therapy. However, its performance is often unsatisfactory due to the high variability in speech errors and scarcity of training datasets. This study assessed the performance of Whisper, a recently released end-to-end model, in patients with post-stroke aphasia (PWA). We tuned its hyperparameters to achieve the lowest word error rate (WER) on aphasic speech. WER was significantly higher in PWA compared to age-matched controls (10.3% vs 38.5%, p<0.001). We demonstrated that worse WER was related to the more severe aphasia as measured by expressive (overt naming, and spontaneous speech production) and receptive (written and spoken comprehension) language assessments. Stroke lesion size did not affect the performance of Whisper. Linear mixed models accounting for demographic factors, therapy duration, and time since stroke, confirmed worse Whisper performance with left hemispheric frontal lesions.We discuss the implications of these findings for how future ASR can be improved in PWA.