A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque
arXiv - CS - Sound, published 2024-05-21
https://doi.org/arxiv-2405.17206
Citation count: 0
Abstract
We present a framework to recognize Parkinson's disease (PD) from speech
recordings of an English pangram utterance, collected using a web application
across diverse recording settings and environments, including participants' homes.
Our dataset comprises a global cohort of 1306 participants, 392 of whom are diagnosed with PD.
Leveraging the diversity of the dataset, which spans various demographic
properties (such as age, sex, and ethnicity), we used deep learning embeddings
derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind
to represent the speech dynamics associated with PD. Our novel fusion model for
PD classification, which aligns different speech embeddings into a cohesive
feature space, demonstrated superior performance over standard
concatenation-based fusion models and other baselines (including models built
on traditional acoustic features). In a randomized data split configuration,
the model achieved an Area Under the Receiver Operating Characteristic Curve
(AUROC) of 88.94% and an accuracy of 85.65%. Rigorous statistical analysis
confirmed that our model performs equitably across various demographic
subgroups in terms of sex, ethnicity, and age, and remains robust regardless of
disease duration. Furthermore, our model, when tested on two entirely unseen
test datasets collected from clinical settings and from a PD care center,
maintained AUROC scores of 82.12% and 78.44%, respectively. This affirms the
model's robustness and its potential to enhance accessibility and health
equity in real-world applications.
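
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUROC and accuracy can be computed from binary labels and model scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions. AUROC is computed here through its pairwise-ranking interpretation, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores contribute half a win. This is the pairwise formulation,
    which is O(n_pos * n_neg) and intended only for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold.

    The 0.5 default is an assumption for this sketch, not a value
    taken from the paper.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


# Toy data (hypothetical): 1 = PD, 0 = non-PD.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

print(f"AUROC:    {auroc(labels, scores):.2%}")
print(f"Accuracy: {accuracy(labels, scores):.2%}")
```

Note that AUROC depends only on how the scores rank the examples, whereas accuracy depends on the chosen threshold, which is why the two figures can differ noticeably on the same predictions.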