Machine Learning-Based Estimation of Hoarseness Severity Using Acoustic Signals Recorded During High-Speed Videoendoscopy.

IF 2.5 | Zone 4 (Medicine) | JCR Q1 | Audiology & Speech-Language Pathology
Tobias Schraut, Michael Döllinger, Melda Kunduk, Matthias Echternach, Stephan Dürr, Julia Werz, Anne Schützenberger
Journal of Voice. Journal Article. Published 2025-01-03. DOI: 10.1016/j.jvoice.2024.12.008 (https://doi.org/10.1016/j.jvoice.2024.12.008)
Citations: 0

Abstract


Objectives: This study investigates the use of sustained phonations recorded during high-speed videoendoscopy (HSV) for machine learning-based assessment of hoarseness severity (H). The performance of this approach is compared with conventional recordings obtained during voice therapy to evaluate key differences and limitations of HSV-derived acoustic recordings.

Methods: A database of 617 voice recordings with a duration of 250 ms was gathered during HSV examination (HS). Two databases comprising 809 vowels recorded during voice therapy were used for comparison, examining recording durations of 1 second (VT-1) and 250 ms (VT-2). A total of 490 features were extracted, including perturbation and noise characteristics, spectral and cepstral coefficients, as well as features based on modulation spectrum, nonlinear dynamic analysis, entropy, and empirical mode decomposition. Model development focused on selecting a minimal-optimal feature subset and suitable classification algorithms. Recordings were classified into two groups of hoarseness based on auditory-perceptual ratings by experts, yielding a continuous hoarseness score ŷ. Model performance was evaluated based on classification accuracy, correlation between predicted scores ŷ ∈ [0,1] and subjective ratings H ∈ {0,1,2,3}, and correlation between the relative change in quantitative and subjective ratings.
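The evaluation idea described above (a binary classifier whose probability output doubles as a continuous score ŷ, which is then compared with the four-level rating H) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the data are synthetic, the split H >= 2 for the two hoarseness groups is an assumption (the abstract does not state the threshold), Pearson correlation is an assumed choice of coefficient, and every function and variable name is hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function with output in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_score(w, b, x):
    """Continuous score y_hat in [0, 1] for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Synthetic stand-in data (NOT the study's recordings or features).
rng = random.Random(0)

def make_sample():
    h = rng.randint(0, 3)  # perceptual rating H in {0, 1, 2, 3}
    x = [(h - 1.5) + rng.gauss(0.0, 1.0) for _ in range(5)]  # five noisy features
    return x, h

train = [make_sample() for _ in range(200)]
test = [make_sample() for _ in range(100)]

THRESHOLD = 2  # assumed split: H >= 2 counts as the more hoarse group

w, b = fit_logistic([x for x, _ in train],
                    [int(h >= THRESHOLD) for _, h in train])

scores = [predict_score(w, b, x) for x, _ in test]
ratings = [h for _, h in test]
accuracy = sum(int(s >= 0.5) == int(h >= THRESHOLD)
               for s, h in zip(scores, ratings)) / len(test)
r = pearson(scores, ratings)
print(f"accuracy={accuracy:.3f}  corr(y_hat, H)={r:.3f}")
```

The classifier only ever sees two classes, yet its probability output can still be correlated with the finer four-level rating, which is the property the second evaluation criterion relies on.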

Results: Logistic regression combined with five acoustic features achieved a classification accuracy of 0.863 (VT-1), 0.847 (VT-2), and 0.742 (HS) on the test sets. A correlation of 0.797 (VT-1), 0.763 (VT-2), and 0.637 (HS) was obtained between ŷ and H, respectively. For 21 test subjects with two recordings, the model yielded a correlation of 0.592 (VT-1), 0.486 (VT-2), and 0.088 (HS) between Δŷ and ΔH.
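The change-based measure reported above can be illustrated with a short sketch. It assumes that Δ denotes the plain difference between a subject's second and first recording (the abstract does not define it further) and that Pearson correlation is used. The five subjects below are made-up values for illustration only, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation; raises ValueError on bad or constant input."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)

def change_correlation(pairs):
    """pairs: one (y_hat_1, H_1, y_hat_2, H_2) tuple per subject."""
    delta_score = [y2 - y1 for y1, _, y2, _ in pairs]    # change in model score
    delta_rating = [h2 - h1 for _, h1, _, h2 in pairs]   # change in expert rating
    return pearson(delta_score, delta_rating)

# Made-up values for five hypothetical subjects (not data from the study).
pairs = [
    (0.20, 1, 0.60, 2),
    (0.80, 3, 0.50, 2),
    (0.40, 1, 0.45, 1),
    (0.90, 3, 0.30, 1),
    (0.10, 0, 0.70, 2),
]

r_change = change_correlation(pairs)
print(f"corr(delta y_hat, delta H) = {r_change:.3f}")
```

A value near zero, such as the 0.088 reported for HS, would mean that changes in the model score carry almost no information about changes in the expert rating, even when the absolute scores correlate reasonably well.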

Conclusion: While acoustic signals recorded during HSV show potential for quantitative hoarseness assessment, they are less reliable than voice therapy recordings due to practical challenges associated with oral laryngeal examination. Addressing these limitations, for example, through the use of flexible nasal endoscopy, could improve the quality of HSV-derived acoustic recordings and voice assessments.

Source journal
Journal of Voice (Medicine: Otorhinolaryngology)
CiteScore: 4.00
Self-citation rate: 13.60%
Articles published: 395
Review time: 59 days
Journal description: The Journal of Voice is widely regarded as the world's premiere journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.