Advanced optimization strategies for combining acoustic features and speech recognition error rates in multi-stage classification of Parkinson's disease severity.

IF 2.8 · CAS Zone 4 (Medicine) · JCR Q2 ENGINEERING, BIOMEDICAL

Biomedical Engineering Letters. Pub Date: 2025-03-07; eCollection Date: 2025-05-01; DOI: 10.1007/s13534-025-00465-9

S I M M Raton Mondol, Ryul Kim, Sangmin Lee

Biomedical Engineering Letters, vol. 15, no. 3, pp. 497-511.

Citations: 0

Abstract

Recent research has made significant progress in identifying individuals with Parkinson's disease (PD) using speech analysis techniques. However, these studies have often treated the early and advanced stages of PD as equivalent, overlooking speech impairments and symptoms that can differ markedly from one stage to the next. This research aims to improve diagnostic accuracy by using advanced optimization strategies to combine speech recognition results (character error rates) with the acoustic features of vowels. The dysphonia features of three sustained Korean vowels, /아/ (a), /이/ (i), and /우/ (u), were examined for their diversity and strong correlations. Four well-established machine-learning classifiers (Random Forest, Support Vector Machine, k-Nearest Neighbors, and Multi-Layer Perceptron) were employed to ensure consistent and reliable analysis. By fine-tuning the Whisper model specifically for PD speech recognition and optimizing it for each PD severity level, we significantly improved the discriminability between severity levels. Combined with the vowel data, this enhancement enabled more precise classification, improving detection accuracy by 5.87% for 3-level severity classification on the PD "ON"-state dataset and by 7.8% for 3-level severity classification on the PD "OFF"-state dataset. This comprehensive approach not only evaluates the effectiveness of different feature extraction methods but also minimizes the variance across the final classification models, allowing the different severity levels of PD to be detected more effectively.
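The abstract does not spell out how the character error rate is computed or how it is combined with the vowel features. The sketch below shows one common definition of CER (character-level Levenshtein distance divided by the number of reference characters) and a simple feature-level concatenation. Everything in it is an assumption for illustration, not the authors' method: treating each precomposed Hangul syllable as one character (counting jamo instead would give different values), ignoring whitespace, the hypothetical names `levenshtein`, `cer` and `fuse_features`, and the reading of "optimizing it for each severity level" as one CER per severity-specific fine-tuned model.

```python
"""Minimal sketch: character error rate (CER) and feature-level fusion.

Assumptions for illustration only (not taken from the paper):
- CER counts Unicode code points (precomposed Hangul syllables), ignoring whitespace.
- Per-vowel acoustic features and per-model CERs are simply concatenated.
- All function names and dictionary keys below are hypothetical.
"""
from typing import Dict, List, Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from the empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when the characters match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER = (S + D + I) / N, where N is the number of non-space reference characters."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference transcript must contain at least one character")
    return levenshtein(ref, hyp) / len(ref)


def fuse_features(
    vowel_features: Dict[str, List[float]],
    cers_per_model: Dict[str, float],
) -> List[float]:
    """Concatenate per-vowel acoustic features, then one CER per ASR model.

    Keys are sorted so that every speaker's vector has the same column order.
    """
    fused: List[float] = []
    for vowel in sorted(vowel_features):
        fused.extend(vowel_features[vowel])
    for model in sorted(cers_per_model):
        fused.append(cers_per_model[model])
    return fused


if __name__ == "__main__":
    print(cer("안녕하세요", "안녕하세여"))  # one substitution over five syllables: prints 0.2
    # Hypothetical values: two acoustic features per vowel and one CER per
    # severity-specific model ("level1".."level3" are placeholder names).
    vowels = {"a": [0.011, 0.042], "i": [0.009, 0.037], "u": [0.013, 0.051]}
    cers = {"level1": 0.12, "level2": 0.21, "level3": 0.34}
    print(fuse_features(vowels, cers))
```

Each fused vector could then be fed to any of the four classifiers named in the abstract. Plain concatenation is the simplest fusion choice; the paper's "advanced optimization strategies" may weight or select features differently.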




Source journal

Biomedical Engineering Letters (ENGINEERING, BIOMEDICAL)

CiteScore: 6.80

Self-citation rate: 0.00%

Journal description: Biomedical Engineering Letters (BMEL) aims to present innovative experimental science and technological developments in the biomedical field, as well as clinical applications of new developments. Articles must contain original biomedical engineering content, defined as the development, theoretical analysis, and evaluation/validation of a new technique. BMEL publishes the following types of papers: original articles, review articles, editorials, and letters to the editor. All papers are reviewed in a single-blind fashion.