A novel fusion architecture for detecting Parkinson’s Disease using semi-supervised speech embeddings
Authors: Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, Md Saiful Islam, Ehsan Hoque
Journal: NPJ Parkinson's Disease (impact factor 6.7; JCR Q1, Neurosciences)
Publication date: 2025-06-20
Publication type: Journal Article
DOI: 10.1038/s41531-025-00956-7 (https://doi.org/10.1038/s41531-025-00956-7)
Citation count: 0
Abstract
We introduce a framework for screening Parkinson’s disease (PD) using English pangram utterances. Our dataset includes 1306 participants (392 with PD) from both home and clinical settings, covering diverse demographics (53.2% female). We used deep learning embeddings from Wav2Vec 2.0, WavLM, and ImageBind to capture speech dynamics indicative of PD. Our novel fusion model for PD classification aligns different speech embeddings into a cohesive feature space, outperforming baseline alternatives. In a stratified randomized split, the model achieved an AUROC of 88.9% and an accuracy of 85.7%. Statistical bias analysis showed equitable performance across sex, ethnicity, and age subgroups, with robustness across various disease durations and PD stages. Detailed error analysis revealed higher misclassification rates in specific age ranges for males and females, aligning with clinical insights. External testing yielded AUROCs of 82.1% and 78.4% on two clinical datasets, and an AUROC of 77.4% on an unseen general spontaneous English speech dataset, demonstrating versatility in natural speech analysis and potential for global accessibility and health equity.
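The abstract reports two metrics, AUROC and accuracy. As a reading aid, here is a minimal sketch of how they are conventionally computed from binary labels and model scores. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold.

    The 0.5 default is an illustrative assumption, not the paper's setting.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


# Toy data invented for illustration only (1 = PD, 0 = non-PD).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    print("AUROC:", round(auroc(toy_labels, toy_scores), 3))
    print("Accuracy:", accuracy(toy_labels, toy_scores))
```

AUROC is independent of any threshold, whereas accuracy depends on the chosen cut-off and on class balance. With 392 of 1306 participants having PD (about 30%), the two numbers carry different information.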
About the journal:
npj Parkinson's Disease is a comprehensive open access journal that covers a wide range of research areas related to Parkinson's disease. It publishes original studies in basic science, translational research, and clinical investigations. The journal is dedicated to advancing our understanding of Parkinson's disease by exploring various aspects such as anatomy, etiology, genetics, cellular and molecular physiology, neurophysiology, epidemiology, and therapeutic development. By providing free and immediate access to the scientific and Parkinson's disease community, npj Parkinson's Disease promotes collaboration and knowledge sharing among researchers and healthcare professionals.