Developing a smart system for binary classification of disordered voices using machine learning

IF 1.8 · Zone 4 (Medicine) · JCR Q2, Otorhinolaryngology
Yat Chun Au, Manwa L. Ng
American Journal of Otolaryngology, vol. 46, no. 4, Article 104672. Published 2025-05-08.
DOI: 10.1016/j.amjoto.2025.104672
https://www.sciencedirect.com/science/article/pii/S0196070925000754
Citations: 0

Abstract

Objectives

A voice disorder is characterized by disruptions in voice quality caused by problems with vocal fold vibration during phonation. This study explored the use of machine learning, specifically Random Forest (RF) and Decision Tree (DT) models, to classify normophonic and disordered voices from acoustic features. The two classifiers were compared, and the diagnostic utility of individual acoustic parameters was evaluated across multilingual databases, with an emphasis on Cantonese voice samples.

Methods

Sustained vowel /a/ recordings were extracted from the Saarbruecken Voice Database, the Perceptual Voice Qualities Database, and a local Cantonese clinical repository. A total of 1,986 samples were used for training and testing. Twenty-nine acoustic features were extracted using Parselmouth, a Python interface to Praat. The RF and DT models were trained on overseas data, validated on local Cantonese recordings, and compared on classification accuracy, sensitivity, specificity, and F1 score. Feature importance was assessed using Mean Decrease in Impurity (MDI) and Mean Decrease in Accuracy (MDA). Receiver Operating Characteristic (ROC) analysis was performed to evaluate the discriminative ability of each acoustic parameter by sex and dataset origin.
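As a hedged illustration of the comparison criteria named above, the following sketch derives accuracy, sensitivity, specificity, precision, and F1 score from the confusion counts of a binary classifier in which "disordered" is the positive class. The `ConfusionCounts` type, the `classification_metrics` helper, and the example counts are hypothetical; none of them are taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for a binary classifier ('disordered' = positive class)."""
    tp: int  # disordered voices correctly flagged
    fp: int  # normophonic voices wrongly flagged
    tn: int  # normophonic voices correctly passed
    fn: int  # disordered voices missed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the standard binary classification metrics from raw counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate (recall)
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Hypothetical counts for illustration only; not results from the study.
example = ConfusionCounts(tp=40, fp=10, tn=140, fn=10)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, because accuracy alone can look high even when many disordered voices are missed.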

Results

The RF model outperformed the DT model, achieving an accuracy of 89%, a precision of 79%, and an F1 score of 77%, compared with 78% accuracy and a 61% F1 score for DT. RF showed higher true positive and true negative rates and a lower false negative rate, making it more suitable for clinical applications. Acoustic feature analysis identified age, CSID, and the shimmer and jitter measures as key contributors to classification performance. ROC analyses revealed that CSID and stdevF0Hz were reliable discriminators for male voices, while CSID, localabsoluteJitter, apq11Shimmer, and localdbShimmer demonstrated strong classification performance in female voices across all datasets. However, threshold variability between the local and overseas datasets highlights the need for population-specific calibration.
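To make the point about threshold variability concrete, here is a minimal sketch of a single-feature ROC analysis. It computes the area under the curve with the rank formulation and picks a cut-off by maximizing Youden's J (sensitivity + specificity - 1). The abstract does not state how thresholds were selected, so Youden's J is an assumption here, as is the convention that higher feature values indicate a disordered voice. The function names and both cohorts are synthetic and purely illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) per candidate threshold.

    A sample is predicted 'disordered' (label 1) when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one sample of each class")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a random positive sample
    scores higher than a random negative one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the threshold that maximizes Youden's J = TPR - FPR."""
    best = max(roc_points(scores, labels), key=lambda point: point[2] - point[1])
    return best[0]


# Synthetic feature values (0 = normophonic, 1 = disordered); not study data.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cohort_a = [0.2, 0.3, 0.4, 0.7, 0.5, 0.6, 0.8, 0.9]
cohort_b = [0.5, 0.6, 0.7, 1.0, 0.8, 0.9, 1.1, 1.2]  # same shape, shifted upward

for name, scores in (("cohort A", cohort_a), ("cohort B", cohort_b)):
    print(name, "AUC:", auc(scores, labels), "threshold:", youden_threshold(scores, labels))
```

In this toy setup both cohorts separate equally well (identical AUC), yet the best cut-off differs between them, which is the kind of situation where a threshold calibrated on one population would misclassify samples from another.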

Conclusion

This study underscores the potential of machine learning, particularly the RF algorithm, to improve the accuracy of voice disorder diagnosis by automating acoustic feature analysis. Integrating such models into clinical practice could offer more reliable, non-invasive methods for the early detection and management of voice disorders, thereby improving patient outcomes. Future research should focus on expanding dataset diversity and on further validation to enhance the generalizability and clinical applicability of these findings.