Overcoming Site Variability in Multisite fMRI Studies: an Autoencoder Framework for Enhanced Generalizability of Machine Learning Models.
Authors: Fahad Almuqhim, Fahad Saeed
Journal: Neuroinformatics, volume 23, issue 3, article 46 (published 2025-09-02)
DOI: https://doi.org/10.1007/s12021-025-09746-1
Harmonizing multisite functional magnetic resonance imaging (fMRI) data is crucial for eliminating site-specific variability that hinders the generalizability of machine learning (ML) models. Traditional harmonization techniques, such as ComBat, depend on additive and multiplicative factors and may struggle to capture the non-linear interactions between scanner hardware, acquisition protocols, and signal variations across imaging sites. In addition, these statistical techniques require data from all sites during model training, which may have the unintended consequence of data leakage for ML models trained on the harmonized data. Such models may show low reliability and reproducibility when tested on unseen data sets, limiting their applicability for general clinical use. In this study, we propose autoencoders (AEs) as an alternative for harmonizing multisite fMRI data. Our framework leverages the non-linear representation learning capabilities of AEs to reduce site-specific effects while preserving biologically meaningful features. Our evaluation on the Autism Brain Imaging Data Exchange I (ABIDE-I) dataset, containing 1,035 subjects collected from 17 centers, demonstrates statistically significant improvements in leave-one-site-out (LOSO) cross-validation. All AE variants (AE, SAE, TAE, and DAE) significantly outperformed the baseline model (p < 0.01), with mean accuracy improvements ranging from 3.41% to 5.04%. Our findings demonstrate the potential of AEs to harmonize multisite neuroimaging data effectively, enabling robust downstream analyses across various neuroscience applications while reducing data leakage and preserving neurobiological features. Our open-source code is available at https://github.com/pcdslab/Autoencoder-fMRI-Harmonization.
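To make the leave-one-site-out (LOSO) protocol and the data-leakage concern concrete, here is a minimal Python sketch. It is not the authors' implementation (see their repository for that). The toy data, the function names `loso_splits`, `fit_scaler`, and `apply_scaler`, and the use of simple standardization as a stand-in for a harmonization model are all assumptions made for illustration. The point it shows is that any harmonization parameters are fitted on the training sites only and then applied to the held-out site.

```python
from collections import defaultdict
from statistics import mean, pstdev


def loso_splits(sites):
    """Yield (held_out_site, train_idx, test_idx), one fold per unique site."""
    by_site = defaultdict(list)
    for i, site in enumerate(sites):
        by_site[site].append(i)
    for held_out in sorted(by_site):
        test_idx = list(by_site[held_out])
        train_idx = sorted(
            i for site, idx in by_site.items() if site != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def fit_scaler(values):
    """Fit mean and std on training values only (stand-in for a harmonizer)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero for constant input
    return mu, sigma


def apply_scaler(values, params):
    """Apply previously fitted parameters to new values."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: one scalar feature per subject, three sites.
sites = ["A", "A", "B", "B", "C", "C"]
feature = [1.0, 2.0, 10.0, 12.0, 5.0, 7.0]

folds = []
for held_out, train_idx, test_idx in loso_splits(sites):
    # Parameters come from training sites only, so the held-out site
    # never influences the transformation applied to it.
    params = fit_scaler([feature[i] for i in train_idx])
    test_scaled = apply_scaler([feature[i] for i in test_idx], params)
    folds.append((held_out, train_idx, test_idx, test_scaled))
    print(held_out, len(train_idx), len(test_idx))
```

In a real pipeline the scaler would be replaced by the harmonization model and followed by a classifier trained on the same training indices. The fold structure stays the same.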
About the journal:
Neuroinformatics publishes original articles and reviews with an emphasis on data structure and software tools related to analysis, modeling, integration, and sharing in all areas of neuroscience research. The editors particularly invite contributions on: (1) Theory and methodology, including discussions on ontologies, modeling approaches, database design, and meta-analyses; (2) Descriptions of developed databases and software tools, and of the methods for their distribution; (3) Relevant experimental results, such as reports accompanied by the release of massive data sets; (4) Computational simulations of models integrating and organizing complex data; and (5) Neuroengineering approaches, including hardware, robotics, and information theory studies.