Myranda Uselton Shirk, Catherine Dang, Jaewoo Cho, Hanlin Chen, Lily Hofstetter, Jack Bijur, Claiborne Lucas, Andrew James, Ricardo Torres-Guzman, Andrea Hiller, Noah Alter, Amy Stone, Maria Powell, Matthew E Pontell
{"title":"利用大型语言模型自动检测腭裂患者的腭咽功能障碍。","authors":"Myranda Uselton Shirk, Catherine Dang, Jaewoo Cho, Hanlin Chen, Lily Hofstetter, Jack Bijur, Claiborne Lucas, Andrew James, Ricardo-Torres Guzman, Andrea Hiller, Noah Alter, Amy Stone, Maria Powell, Matthew E Pontell","doi":"10.3389/fdgth.2025.1552746","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Hypernasality, a hallmark of velopharyngeal insufficiency (VPI), is a speech disorder with significant psychosocial and functional implications. Conventional diagnostic methods rely heavily on specialized expertise and equipment, posing challenges in resource-limited settings. This study explores the application of OpenAI's Whisper model for automated hypernasality detection, offering a scalable and efficient alternative to traditional approaches.</p><p><strong>Methods: </strong>The Whisper model was adapted for binary classification by replacing its sequence-to-sequence decoder with a custom classification head. A dataset of 184 audio recordings, including 96 hypernasal (cases) and 88 non-hypernasal samples (controls), was used for training and evaluation. The Whisper model's performance was compared to traditional machine learning approaches, including support vector machines (SVM) and random forest (RF) classifiers.</p><p><strong>Results: </strong>The Whisper-based model effectively detected hypernasality in speech, achieving a test accuracy of 97% and an F1-score of 0.97. It significantly outperformed SVM and RF classifiers, which achieved accuracies of 88.1% and 85.7%, respectively. Whisper demonstrated robust performance across diverse recording conditions and required minimal training data, showcasing its scalability and efficiency for hypernasality detection.</p><p><strong>Conclusion: </strong>This study demonstrates the effectiveness of the Whisper-based model for hypernasality detection. By providing a reliable pretest probability, the Whisper model can serve as a triaging mechanism to prioritize patients for further evaluation, reducing diagnostic delays and optimizing resource allocation.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1552746"},"PeriodicalIF":3.2000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986712/pdf/","citationCount":"0","resultStr":"{\"title\":\"Leveraging large language models for automated detection of velopharyngeal dysfunction in patients with cleft palate.\",\"authors\":\"Myranda Uselton Shirk, Catherine Dang, Jaewoo Cho, Hanlin Chen, Lily Hofstetter, Jack Bijur, Claiborne Lucas, Andrew James, Ricardo-Torres Guzman, Andrea Hiller, Noah Alter, Amy Stone, Maria Powell, Matthew E Pontell\",\"doi\":\"10.3389/fdgth.2025.1552746\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Hypernasality, a hallmark of velopharyngeal insufficiency (VPI), is a speech disorder with significant psychosocial and functional implications. Conventional diagnostic methods rely heavily on specialized expertise and equipment, posing challenges in resource-limited settings. This study explores the application of OpenAI's Whisper model for automated hypernasality detection, offering a scalable and efficient alternative to traditional approaches.</p><p><strong>Methods: </strong>The Whisper model was adapted for binary classification by replacing its sequence-to-sequence decoder with a custom classification head. 
A dataset of 184 audio recordings, including 96 hypernasal (cases) and 88 non-hypernasal samples (controls), was used for training and evaluation. The Whisper model's performance was compared to traditional machine learning approaches, including support vector machines (SVM) and random forest (RF) classifiers.</p><p><strong>Results: </strong>The Whisper-based model effectively detected hypernasality in speech, achieving a test accuracy of 97% and an F1-score of 0.97. It significantly outperformed SVM and RF classifiers, which achieved accuracies of 88.1% and 85.7%, respectively. Whisper demonstrated robust performance across diverse recording conditions and required minimal training data, showcasing its scalability and efficiency for hypernasality detection.</p><p><strong>Conclusion: </strong>This study demonstrates the effectiveness of the Whisper-based model for hypernasality detection. By providing a reliable pretest probability, the Whisper model can serve as a triaging mechanism to prioritize patients for further evaluation, reducing diagnostic delays and optimizing resource allocation.</p>\",\"PeriodicalId\":73078,\"journal\":{\"name\":\"Frontiers in digital health\",\"volume\":\"7 \",\"pages\":\"1552746\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986712/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in digital health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fdgth.2025.1552746\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdgth.2025.1552746","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Leveraging large language models for automated detection of velopharyngeal dysfunction in patients with cleft palate.
Background: Hypernasality, a hallmark of velopharyngeal insufficiency (VPI), is a speech disorder with significant psychosocial and functional implications. Conventional diagnostic methods rely heavily on specialized expertise and equipment, posing challenges in resource-limited settings. This study explores the application of OpenAI's Whisper model for automated hypernasality detection, offering a scalable and efficient alternative to traditional approaches.
Methods: The Whisper model was adapted for binary classification by replacing its sequence-to-sequence decoder with a custom classification head. A dataset of 184 audio recordings, comprising 96 hypernasal samples (cases) and 88 non-hypernasal samples (controls), was used for training and evaluation. The Whisper-based classifier's performance was compared with that of traditional machine learning approaches, namely support vector machine (SVM) and random forest (RF) classifiers.
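The paper does not include an implementation, but the adaptation described above can be sketched as follows: the Whisper audio encoder is retained, the text decoder is dropped, and a small classification head maps pooled encoder states to a hypernasal / non-hypernasal label. The checkpoint size (whisper-base), mean pooling over time, and head layout are assumptions for illustration, not details reported in the study.

```python
# Hypothetical sketch of the adaptation: Whisper's encoder is kept, the
# sequence-to-sequence decoder is replaced by a classification head.
# Checkpoint, pooling, and head architecture are assumptions.
import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel


class WhisperHypernasalityClassifier(nn.Module):
    def __init__(self, checkpoint: str = "openai/whisper-base", n_classes: int = 2):
        super().__init__()
        # Load the full seq2seq model but use only its audio encoder.
        self.encoder = WhisperModel.from_pretrained(checkpoint).encoder
        hidden = self.encoder.config.d_model
        # Custom classification head standing in for the text decoder.
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden // 2, n_classes),
        )

    def forward(self, input_features: torch.Tensor) -> torch.Tensor:
        # input_features: (batch, 80 mel bins, 3000 frames) from the feature extractor.
        states = self.encoder(input_features).last_hidden_state  # (batch, frames, hidden)
        pooled = states.mean(dim=1)                              # average over time
        return self.head(pooled)                                 # (batch, n_classes) logits


feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
model = WhisperHypernasalityClassifier()
# `waveform` is a 16 kHz mono recording loaded elsewhere (e.g., with librosa):
# inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
# logits = model(inputs.input_features)
```

Trained with a standard cross-entropy loss, the softmax over these logits would yield the per-recording probability of hypernasality referred to in the conclusion.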
Results: The Whisper-based model effectively detected hypernasality in speech, achieving a test accuracy of 97% and an F1-score of 0.97. It significantly outperformed SVM and RF classifiers, which achieved accuracies of 88.1% and 85.7%, respectively. Whisper demonstrated robust performance across diverse recording conditions and required minimal training data, showcasing its scalability and efficiency for hypernasality detection.
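For context, the SVM and RF baselines that the Whisper-based model outperformed would conventionally be trained on hand-crafted acoustic features rather than learned encoder representations. The sketch below assumes MFCC summary statistics as the feature set and default hyperparameters; the paper does not report the exact features or settings used.

```python
# Hypothetical sketch of the traditional baselines: SVM and random forest
# classifiers on hand-crafted acoustic features (MFCC means and standard
# deviations are an assumed feature choice, not one reported in the paper).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mfcc_features(path: str) -> np.ndarray:
    """Summarize a recording as per-coefficient MFCC means and standard deviations."""
    waveform, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)  # (13, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


# `paths` and `labels` (1 = hypernasal, 0 = non-hypernasal) are assumed to come
# from the dataset loader.
# X = np.stack([mfcc_features(p) for p in paths])
# X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, stratify=labels)
# svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)
# rf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
# for name, clf in [("SVM", svm), ("RF", rf)]:
#     pred = clf.predict(X_test)
#     print(name, accuracy_score(y_test, pred), f1_score(y_test, pred))
```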
Conclusion: This study demonstrates the effectiveness of the Whisper-based model for hypernasality detection. By providing a reliable pretest probability, the Whisper model can serve as a triaging mechanism to prioritize patients for further evaluation, reducing diagnostic delays and optimizing resource allocation.