The Noor Project: fair transformer transfer learning for autism spectrum disorder recognition from speech

Najla D Al Futaisi, Björn W Schuller, Fabien Ringeval, Maja Pantic

Frontiers in Digital Health, vol. 7, article 1274675. Published 2025-08-18. DOI: 10.3389/fdgth.2025.1274675
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12399525/pdf/
Abstract
Early detection is crucial for managing incurable disorders, particularly autism spectrum disorder (ASD). Unfortunately, a considerable number of individuals with ASD receive a late diagnosis or remain undiagnosed. Speech plays a critical role in ASD, as a significant number of affected individuals experience speech impairments or remain non-verbal. To address this, we use speech analysis for automatic ASD recognition in children by classifying their speech as either autistic or typically developing. However, due to the lack of large labelled datasets, we leverage two smaller datasets to explore deep transfer learning methods. We investigate two fine-tuning approaches: (1) Discriminative Fine-Tuning (D-FT), which is pre-trained on a related dataset before being tuned on a similar task, and (2) Wav2Vec 2.0 Fine-Tuning (W2V2-FT), which leverages self-supervised speech representations pre-trained on a larger, unrelated dataset. We perform two distinct classification tasks: (a) a binary task to determine typicality, classifying speech as either that of a typically developing (TD) child or an atypically developing (AD) child; and (b) a four-class diagnosis task, which further classifies atypical cases into ASD, dysphasia (DYS), or pervasive developmental disorder-not otherwise specified (NOS), alongside TD. This research aims to improve early recognition strategies, particularly for individuals with ASD. The findings suggest that transfer learning methods can be a valuable tool for autism recognition from speech. For the typicality classification task (TD vs. AD), the D-FT model achieved the highest test UAR (94.8%), outperforming W2V2-FT (91.5%). In the diagnosis task (TD, ASD, DYS, NOS), D-FT also demonstrated superior performance (60.9% UAR) compared to W2V2-FT (54.3%). These results highlight the potential of transfer learning for speech-based ASD recognition and underscore the challenges of multi-class classification with limited labelled data.
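The scores above are reported as UAR (unweighted average recall), the mean of per-class recalls, which, unlike plain accuracy, is not inflated by the majority class in an imbalanced dataset such as this one. A minimal sketch of the metric (the function name is illustrative, not from the paper):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls; each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Imbalanced toy example: 8 TD recordings, 2 AD recordings.
y_true = ["TD"] * 8 + ["AD"] * 2
y_pred = ["TD"] * 8 + ["AD", "TD"]  # one AD sample misclassified as TD
print(round(unweighted_average_recall(y_true, y_pred), 3))  # → 0.75
```

Here plain accuracy would be 90%, but the UAR of 75% exposes that half of the minority (AD) class was missed, which is why UAR is the standard metric for clinical speech-classification tasks with skewed class sizes.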
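The two tasks are nested: the binary typicality task is obtained by collapsing the three atypical diagnoses (ASD, DYS, NOS) into a single AD class. A small illustrative mapping (names are mine, not from the paper) makes the relationship explicit:

```python
# Collapse the four diagnostic labels onto the binary typicality task:
# TD stays TD; ASD, DYS, and NOS all count as atypical development (AD).
TYPICALITY = {"TD": "TD", "ASD": "AD", "DYS": "AD", "NOS": "AD"}

def to_typicality(labels):
    """Map four-class diagnosis labels to the two-class typicality labels."""
    return [TYPICALITY[label] for label in labels]

diagnosis = ["TD", "ASD", "DYS", "NOS", "TD"]
print(to_typicality(diagnosis))  # → ['TD', 'AD', 'AD', 'AD', 'TD']
```

This nesting also explains the large gap between the two reported results (94.8% vs. 60.9% UAR for D-FT): separating the three atypical diagnoses from one another is a much harder problem than detecting atypicality itself, especially with limited labelled data.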