Reproducible comparison and interpretation of machine learning classifiers to predict autism on the ABIDE multimodal dataset
Yilan Dong, Dafnis Batalle, Maria Deprez
medRxiv - Neurology, 2024-09-04
DOI: https://doi.org/10.1101/2024.09.04.24313055
Abstract
Autism is a neurodevelopmental condition affecting ∼1% of the population. Recently, machine learning models have been trained to classify participants with autism using their neuroimaging features, though the reported performance of these models varies across the literature. Differences in experimental setup hamper direct comparison of different machine learning approaches. In this paper, five of the most widely used and best-performing machine learning models in the field were trained to distinguish participants with autism from typically developing (TD) participants, using functional connectivity matrices, structural volumetric measures and phenotypic information from the Autism Brain Imaging Data Exchange (ABIDE) dataset. Their performance was compared under the same evaluation standard. The models implemented were: graph convolutional networks (GCN), edge-variational graph convolutional networks (EV-GCN), fully connected networks (FCN), an auto-encoder followed by a fully connected network (AE-FCN) and a support vector machine (SVM). Our results show that all models performed similarly, achieving a classification accuracy of around 70%. This suggests that differences in inclusion criteria, data modalities and evaluation pipelines, rather than differences between machine learning models, may explain the variation in accuracy reported in the published literature. The highest accuracy in our framework was obtained by an ensemble of GCN models trained on a combination of functional MRI and structural MRI features, reaching a classification accuracy of 72.2% and AUC = 0.78 on the test set. Combining structural and functional modalities gave higher predictive ability than using features from a single modality. Ensemble methods also improved model performance. Furthermore, we investigated the stability of the features identified by the different machine learning models using the SmoothGrad interpretation method. The FCN model demonstrated the highest stability in selecting the relevant features that contribute to model decision making. Code available at: https://github.com/YilanDong19/Machine-learning-with-ABIDE.
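To illustrate the SmoothGrad idea mentioned in the abstract, here is a minimal sketch in Python. It is not the authors' implementation (see their repository for that). It assumes a toy logistic classifier with hand-written gradients in place of a trained network, and the names `input_gradient` and `smoothgrad`, the weights, and the defaults `n_samples=50` and `noise_scale=0.1` are all hypothetical choices made for illustration. The technique itself averages the gradient of the model output with respect to the input over several noisy copies of that input, which tends to give a less noisy feature-importance map than a single gradient.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def input_gradient(x, w, b):
    """Gradient of the toy model output sigmoid(w.x + b) with respect to x."""
    p = sigmoid(x @ w + b)
    return p * (1.0 - p) * w


def smoothgrad(x, w, b, n_samples=50, noise_scale=0.1, seed=0):
    """Average the input gradient over Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    # Noise standard deviation as a fraction of the input's value range.
    sigma = noise_scale * (x.max() - x.min())
    grads = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        grads += input_gradient(noisy, w, b)
    return grads / n_samples


# Hypothetical toy model: four input features, the second one is unused.
w = np.array([2.0, 0.0, -1.0, 0.5])
b = 0.1
x = np.array([0.3, -1.2, 0.8, 0.0])

saliency = smoothgrad(x, w, b)
# Rank features from most to least important by absolute saliency.
ranking = np.argsort(-np.abs(saliency))
```

For a real network, the analytic gradient would be replaced by automatic differentiation. Feature stability could then be assessed by comparing the top-ranked features across repeated training runs, though how the paper measures stability is not specified in the abstract.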