Authors: Kwang Hyeon Kim, Byung-Jou Lee, Hae-Won Koo
Journal: Korean Journal of Neurotrauma, volume 20, issue 3, pages 168-179
Published: 2024-09-23 (eCollection 2024-09-01)
DOI: 10.13004/kjnt.2024.20.e30 (https://doi.org/10.13004/kjnt.2024.20.e30)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11450341/pdf/
Feasibility Study of Parkinson's Speech Disorder Evaluation With Pre-Trained Deep Learning Model for Speech-to-Text Analysis.
Objective: This study investigates the feasibility of using a pre-trained deep learning Wav2Vec (wave-to-vec) model for speech-to-text analysis in individuals with speech disorders caused by Parkinson's disease (PD).
Methods: A publicly available dataset of speech recordings from healthy controls (HC) and individuals diagnosed with PD was used, together with the accompanying Hoehn and Yahr (H&Y) staging, Movement Disorder Society Unified Parkinson's Disease Rating Scale (UPDRS) Part I and Part II scores, and gender information. The Wav2Vec model was applied to the PD patient data to perform speech-to-text analysis. The tasks were word letter classification, word match probability assessment, and analysis of the speech waveform characteristics provided by the model's output.
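To make the "word letter classification" step more concrete, here is a minimal sketch of greedy CTC decoding, the scheme commonly used to turn per-frame letter probabilities from Wav2Vec 2.0 speech-recognition checkpoints into text. The abstract does not say which checkpoint or decoding method was used, so the toy vocabulary `VOCAB`, the function `greedy_ctc_decode`, and the frame probabilities are all hypothetical and for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy vocabulary: index 0 is the CTC blank, "|" marks a word boundary.
VOCAB = ["<pad>", "|", "A", "C", "T"]
BLANK_ID = 0


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]]) -> Tuple[str, float]:
    """Return (decoded text, mean probability of the emitted symbols).

    The mean covers every emitted symbol, including word delimiters.
    """
    symbols: List[str] = []
    confidences: List[float] = []
    previous_id = BLANK_ID
    for probs in frame_probs:
        # Per-frame letter classification: pick the most probable symbol.
        best_id = max(range(len(probs)), key=probs.__getitem__)
        # CTC rule: emit only non-blank symbols that differ from the previous frame.
        if best_id != BLANK_ID and best_id != previous_id:
            symbols.append(VOCAB[best_id])
            confidences.append(probs[best_id])
        previous_id = best_id
    text = "".join(symbols).replace("|", " ").strip()
    mean_conf = sum(confidences) / len(confidences) if confidences else 0.0
    return text, mean_conf


# Six made-up frames over the five-symbol vocabulary.
frames = [
    [0.10, 0.00, 0.10, 0.70, 0.10],  # C
    [0.10, 0.00, 0.10, 0.70, 0.10],  # C again (repeat, collapsed)
    [0.80, 0.00, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.00, 0.80, 0.05, 0.05],  # A
    [0.10, 0.00, 0.05, 0.05, 0.80],  # T
    [0.10, 0.90, 0.00, 0.00, 0.00],  # word delimiter
]
text, conf = greedy_ctc_decode(frames)
word_count = len(text.split())
print(text, round(conf, 2), word_count)
```

The mean symbol probability is only one possible stand-in for a "word match probability"; the study may define that quantity differently.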
Results: The dataset comprised 20 cases. Among individuals with PD, the H&Y score averaged 2.50±0.67, the UPDRS II-part 5 score averaged 0.70±1.00, and the UPDRS III-part 18 score averaged 0.80±0.98. The number of words in the text decoded by speech recognition averaged 299.10±16.79 and 259.80±93.39 for the HC and PD groups, respectively. The degree of agreement for all syllables was calculated based on the speech process. The accuracy for the reading sentences was 0.31 and 0.10, respectively.
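The quantities reported above (decoded word counts as mean±SD, and an accuracy for read sentences) can be sketched as follows. The abstract does not give the exact definitions, so this assumes accuracy is the fraction of reference words matched in order in the decoded text, and that the spread is the sample standard deviation. The helper names and example sentences are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean, stdev
from typing import List, Sequence


def normalize(text: str) -> List[str]:
    """Lower-case, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'") for w in text.lower().split())
    return [w for w in words if w]


def word_match_accuracy(reference: str, decoded: str) -> float:
    """Fraction of reference words found, in order, in the decoded text (assumed definition)."""
    ref, hyp = normalize(reference), normalize(decoded)
    if not ref:
        return 0.0
    blocks = SequenceMatcher(a=ref, b=hyp, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / len(ref)


def summarize(values: Sequence[float]) -> str:
    """Format values as mean±SD using the sample standard deviation (assumed)."""
    return f"{mean(values):.2f}±{stdev(values):.2f}"


# Made-up example: one reference sentence and one decoded transcript.
accuracy = word_match_accuracy("The quick brown fox jumps.", "the quick brown box")
# Made-up decoded word counts for three recordings.
summary = summarize([3, 5, 4])
print(accuracy, summary)
```

An order-preserving match is stricter than a bag-of-words overlap, so a transcript with the right words in the wrong order scores lower here.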
Conclusion: This study aimed to demonstrate the effectiveness of Wav2Vec (wave-to-vec) in enhancing speech-to-text analysis for patients with speech disorders. The findings could pave the way for clinical tools that improve diagnosis, evaluation, and communication support for this population.