Confounding Effects on the Performance of Machine Learning Analysis of Static Functional Connectivity Computed from rs-fMRI Multi-site Data.

IF 2.7, Zone 4 (Medicine), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Neuroinformatics Pub Date : 2023-10-01 Epub Date: 2023-08-15 DOI:10.1007/s12021-023-09639-1

Oswaldo Artiles, Zeina Al Masry, Fahad Saeed

Neuroinformatics, pages 651-668. Publication type: Journal Article. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11877654/pdf/

Citations: 0

Abstract

Resting-state functional magnetic resonance imaging (rs-fMRI) is a non-invasive imaging technique widely used in neuroscience to study the functional connectivity of the human brain. While multi-site rs-fMRI data can help reveal the inner workings of the brain, acquiring and processing these data poses many challenges. One challenge is the variability associated with different acquisition sites and different MRI machine vendors. Other factors, such as population heterogeneity across sites in variables like the age and gender of the subjects, must also be considered. Because most machine-learning models are developed using these multi-site rs-fMRI data sets, the intrinsic confounding effects can adversely affect the generalizability and reliability of these computational methods and can impose upper limits on the classification scores. This work aims to identify the phenotypic and imaging variables that produce the confounding effects and to control these effects. Our goal is to maximize the classification scores obtained from the machine learning analysis of the Autism Brain Imaging Data Exchange (ABIDE) rs-fMRI multi-site data. To achieve this goal, we propose novel stratification methods that produce homogeneous sub-samples of the 17 ABIDE sites, and we generate new features from the static functional connectivity values using multiple linear regression models, ComBat harmonization models, and normalization methods. The main results obtained with our statistical models and methods are an accuracy of 76.4%, a sensitivity of 82.9%, and a specificity of 77.0%, which are 8.8%, 20.5%, and 7.5% above the baseline classification scores obtained from the machine learning analysis of the static functional connectivity computed from the ABIDE rs-fMRI multi-site data.
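To make the idea of controlling confounds with multiple linear regression more concrete, the following is a minimal sketch and not the authors' pipeline. It fits an ordinary least-squares model of each connectivity feature on the confounds (age, sex, and one-hot site indicators) and keeps the residuals as new features. It also shows how accuracy, sensitivity, and specificity are computed from binary predictions. The data are synthetic, all function and variable names (`residualize`, `classification_scores`, `fc`) are hypothetical, and ComBat harmonization is not implemented here.

```python
import numpy as np

def residualize(features, confounds):
    """Return features with the linear effect of the confounds removed.

    features:  (n_subjects, n_features) array, e.g. connectivity values.
    confounds: (n_subjects, n_confounds) array, e.g. age, sex, site dummies.
    """
    n = features.shape[0]
    # Design matrix with an intercept column followed by the confounds.
    design = np.column_stack([np.ones(n), confounds])
    # Least-squares fit of every feature column on the design matrix.
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ beta

def classification_scores(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels coded 0/1.

    Assumes both classes occur in y_true (otherwise a ratio is undefined).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return float(accuracy), float(sensitivity), float(specificity)

# Synthetic example (assumed values, for illustration only).
rng = np.random.default_rng(0)
n_subjects, n_edges = 200, 50
age = rng.uniform(7, 40, n_subjects)
sex = rng.integers(0, 2, n_subjects)
site = rng.integers(0, 3, n_subjects)
site_dummies = np.eye(3)[site][:, 1:]  # first site is the reference level
confounds = np.column_stack([age, sex, site_dummies])

# Connectivity features contaminated by age and site effects.
fc = rng.normal(size=(n_subjects, n_edges)) + 0.05 * age[:, None] + 0.5 * site[:, None]
fc_clean = residualize(fc, confounds)
```

In a real classification study the regression coefficients would normally be estimated on the training folds only and then applied to the held-out subjects, so that no information leaks from test data into the features.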

Source journal

Neuroinformatics (Medicine - Computer Science: Interdisciplinary Applications)

CiteScore

6.00

Self-citation rate

6.70%

Articles published

Review time

3 months

About the journal: Neuroinformatics publishes original articles and reviews with an emphasis on data structure and software tools related to analysis, modeling, integration, and sharing in all areas of neuroscience research. The editors particularly invite contributions on: (1) Theory and methodology, including discussions on ontologies, modeling approaches, database design, and meta-analyses; (2) Descriptions of developed databases and software tools, and of the methods for their distribution; (3) Relevant experimental results, such as reports accompanied by the release of massive data sets; (4) Computational simulations of models integrating and organizing complex data; and (5) Neuroengineering approaches, including hardware, robotics, and information theory studies.