Leveraging large language models for automated detection of velopharyngeal dysfunction in patients with cleft palate.

IF 3.2, Q1, Health Care Sciences & Services
Frontiers in Digital Health. Pub Date: 2025-03-28. eCollection Date: 2025-01-01. DOI: 10.3389/fdgth.2025.1552746
Myranda Uselton Shirk, Catherine Dang, Jaewoo Cho, Hanlin Chen, Lily Hofstetter, Jack Bijur, Claiborne Lucas, Andrew James, Ricardo-Torres Guzman, Andrea Hiller, Noah Alter, Amy Stone, Maria Powell, Matthew E Pontell

Abstract

Background: Hypernasality, a hallmark of velopharyngeal insufficiency (VPI), is a speech disorder with significant psychosocial and functional implications. Conventional diagnostic methods rely heavily on specialized expertise and equipment, posing challenges in resource-limited settings. This study explores the application of OpenAI's Whisper model for automated hypernasality detection, offering a scalable and efficient alternative to traditional approaches.

Methods: The Whisper model was adapted for binary classification by replacing its sequence-to-sequence decoder with a custom classification head. A dataset of 184 audio recordings, including 96 hypernasal (cases) and 88 non-hypernasal samples (controls), was used for training and evaluation. The Whisper model's performance was compared to traditional machine learning approaches, including support vector machines (SVM) and random forest (RF) classifiers.
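As a rough illustration of the adaptation described in Methods, the sketch below mean-pools a sequence of encoder frame vectors and passes the result through a single logistic output unit. It is a minimal sketch under assumptions, not the authors' implementation: the abstract does not specify the head's architecture, pooling strategy, or layer sizes, so `mean_pool`, `ClassificationHead`, the 4-dimensional features, and the weights are all hypothetical, and the frame vectors are made-up stand-ins for real Whisper encoder outputs.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average encoder frame vectors over time into one fixed-size vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


class ClassificationHead:
    """Hypothetical binary head: one linear unit followed by a sigmoid."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias

    def predict_proba(self, frames: Sequence[Sequence[float]]) -> float:
        """Return the estimated probability of the positive (hypernasal) class."""
        pooled = mean_pool(frames)
        logit = sum(w * x for w, x in zip(self.weights, pooled)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


# Stand-in for encoder output: 3 time frames, 4 features each (made-up values).
fake_encoder_frames = [
    [0.2, -0.1, 0.4, 0.0],
    [0.0, 0.3, 0.2, -0.2],
    [0.4, 0.1, 0.0, 0.2],
]

# Weights are arbitrary; in practice they would be learned during fine-tuning.
head = ClassificationHead(weights=[1.0, -1.0, 0.5, 0.0], bias=0.0)
probability = head.predict_proba(fake_encoder_frames)
label = "hypernasal" if probability >= 0.5 else "non-hypernasal"
```

The point of the design is that the sequence-to-sequence decoder, which emits text tokens, is no longer needed: a recording is reduced to one fixed-size vector and then to a single probability.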

Results: The Whisper-based model effectively detected hypernasality in speech, achieving a test accuracy of 97% and an F1-score of 0.97. It significantly outperformed SVM and RF classifiers, which achieved accuracies of 88.1% and 85.7%, respectively. Whisper demonstrated robust performance across diverse recording conditions and required minimal training data, showcasing its scalability and efficiency for hypernasality detection.
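For readers less familiar with the two reported metrics, the following sketch shows how accuracy and F1-score are computed from a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical and chosen only for illustration; the abstract does not report the study's confusion matrix or test-set size.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only; NOT the study's test set.
example = binary_metrics(tp=8, fp=2, fn=1, tn=9)
```

Accuracy counts all correct predictions, while F1 is the harmonic mean of precision and recall for the positive (hypernasal) class, so the two can diverge when the classes are imbalanced.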

Conclusion: This study demonstrates the effectiveness of the Whisper-based model for hypernasality detection. By providing a reliable pretest probability, the Whisper model can serve as a triaging mechanism to prioritize patients for further evaluation, reducing diagnostic delays and optimizing resource allocation.
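The triage idea in the Conclusion could be sketched as follows: recordings whose predicted probability meets a cutoff are flagged and ordered so that the highest-probability cases are reviewed first. This is an assumption-laden illustration, not a procedure described in the abstract; `triage_order`, the patient identifiers, the scores, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Tuple


def triage_order(scores: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    """Return IDs scoring at or above the threshold, highest probability first."""
    flagged = [(pid, p) for pid, p in scores if p >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in flagged]


# Made-up model outputs: (patient identifier, predicted probability of hypernasality).
queue = triage_order([("patient_a", 0.35), ("patient_b", 0.92), ("patient_c", 0.61)])
```

In a real deployment the cutoff would need to be chosen against clinical priorities, for example favoring sensitivity so that few true cases are left out of the review queue.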
