Elizabeth Kresock, Bryan Dawkins, Henry Luttbeg, Yijie Jamie Li, Rayus Kuplicki, B A McKinney
PLoS ONE 20(3): e0319346. Published 2025-03-06. DOI: 10.1371/journal.pone.0319346
Centrality nearest-neighbor projected-distance regression (C-NPDR) feature selection for correlation-based predictors with application to resting-state fMRI study of major depressive disorder.
Background: Nearest-neighbor projected-distance regression (NPDR) is a metric-based machine learning feature selection algorithm that uses distances between samples and projected differences between variables to identify variables or features that may interact to affect the prediction of complex outcomes. Typical tabular bioinformatics data consist of separate variables of interest, such as genes or proteins. In contrast, resting-state functional MRI (rs-fMRI) data consist of a time series for each brain region of interest (ROI) in each subject, and these within-brain time series are typically transformed into correlations between pairs of ROIs. These pairwise correlations can then be used as inputs for feature selection or other machine learning methods. Straightforward feature selection returns the most significant pairs of ROIs; however, it would also be useful to know the importance of individual ROIs.
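As a minimal sketch of the feature construction described above, assuming each subject's data arrive as a NumPy array of shape (time points, ROIs) and that Pearson correlation is the pairwise measure, the upper triangle of each subject's ROI-by-ROI correlation matrix can be flattened into one feature per ROI pair. The function names `correlation_features` and `build_feature_matrix` are hypothetical, and the data are random placeholders rather than the study's preprocessing pipeline.

```python
import numpy as np

def correlation_features(ts: np.ndarray) -> tuple[np.ndarray, list[tuple[int, int]]]:
    """Turn one subject's ROI time series into pairwise correlation features.

    ts: array of shape (n_timepoints, n_rois), one column per ROI.
    Returns the upper-triangle Pearson correlations (one value per ROI pair)
    and the (i, j) ROI index pair that each feature belongs to.
    """
    corr = np.corrcoef(ts, rowvar=False)        # (n_rois, n_rois) correlation matrix
    i, j = np.triu_indices(corr.shape[0], k=1)  # each unordered pair once, no diagonal
    return corr[i, j], list(zip(i.tolist(), j.tolist()))

def build_feature_matrix(subject_ts: list[np.ndarray]) -> np.ndarray:
    """Stack per-subject correlation vectors into an (n_subjects, n_pairs) matrix."""
    return np.vstack([correlation_features(ts)[0] for ts in subject_ts])

# Placeholder data: 3 subjects, 200 time points, 5 ROIs.
rng = np.random.default_rng(0)
subjects = [rng.standard_normal((200, 5)) for _ in range(3)]
X = build_feature_matrix(subjects)
print(X.shape)  # (3, 10): 5 ROIs give 5*4/2 = 10 ROI pairs
```

With R ROIs this yields R(R-1)/2 features per subject, which is why selecting pairs alone does not directly rank the individual ROIs.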
Results: We extend NPDR to compute the importance of individual ROIs from correlation-based features. We introduce correlation-difference and centrality-based versions of NPDR. Centrality-based NPDR can be used with any centrality method and can also be coupled with importance scores other than NPDR, such as random forest importance scores. We develop a new simulation method that uses random network theory to generate artificial correlation-based predictors whose variations in correlation affect class prediction.
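The simulation idea can be illustrated with a small sketch. This is not the authors' simulation method: it assumes an Erdős–Rényi random network over the ROIs, generates each subject's time series by mixing independent noise along network edges, and makes cases differ from controls by strengthening the edges that touch a few hypothetical "functional" ROIs. All function names and parameter values (`p_edge`, `base_w`, `boost`, `n_functional`, and so on) are illustrative assumptions.

```python
import numpy as np

def random_network(n_rois, p_edge, rng):
    """Erdos-Renyi G(n, p) graph as a symmetric 0/1 adjacency matrix with zero diagonal."""
    upper = np.triu(rng.random((n_rois, n_rois)) < p_edge, k=1)
    return (upper | upper.T).astype(float)

def simulate_subject(adj, functional, boost, n_time, rng, base_w=0.2):
    """One subject's ROI time series whose correlations follow the network.

    Each ROI is its own noise plus weighted noise from its network neighbours:
    x_t = M z_t with M = I + W, so the covariance M M^T is always positive
    semi-definite. Edges touching a 'functional' ROI get extra weight `boost`.
    """
    n = adj.shape[0]
    touches = np.zeros((n, n), dtype=bool)
    touches[functional, :] = True
    touches[:, functional] = True
    weights = adj * (base_w + boost * touches)  # edge weights W (zero where no edge)
    mixing = np.eye(n) + weights
    z = rng.standard_normal((n_time, n))
    return z @ mixing.T  # shape (n_time, n_rois)

def simulate_dataset(n_rois=20, n_per_class=30, n_time=150, p_edge=0.2,
                     n_functional=3, boost=0.6, seed=0):
    """Controls (y=0) use boost=0; cases (y=1) strengthen the functional ROIs' edges."""
    rng = np.random.default_rng(seed)
    adj = random_network(n_rois, p_edge, rng)
    # Assumption: use the highest-degree ROIs as 'functional' so they have edges to perturb.
    functional = np.argsort(adj.sum(axis=1))[-n_functional:]
    iu = np.triu_indices(n_rois, k=1)
    X, y = [], []
    for label, b in ((0, 0.0), (1, boost)):
        for _ in range(n_per_class):
            ts = simulate_subject(adj, functional, b, n_time, rng)
            X.append(np.corrcoef(ts, rowvar=False)[iu])  # pairwise correlation features
            y.append(label)
    return np.array(X), np.array(y), adj, functional

X, y, adj, functional = simulate_dataset()
print(X.shape)  # (60, 190): 2 classes x 30 subjects, 20*19/2 ROI pairs
```

Because the mixing matrix M = I + W gives covariance M M^T, every simulated covariance is positive semi-definite, so the correlation features are always valid; the known functional ROIs then serve as ground truth for scoring a feature selection method.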
Conclusions: We compared feature selection methods by how well they detected the simulated functional ROIs, and we applied the new centrality NPDR approach to a resting-state fMRI study of participants with major depressive disorder (MDD) and healthy controls. We found that the brain regions with the strongest network effect on MDD include the middle temporal gyrus, the inferior temporal gyrus, and the dorsal entorhinal cortex. The resulting feature selection and simulation approaches can be applied to other domains that use correlation-based features.
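Ranking individual ROIs from pair-level scores can be sketched by treating the scores as edge weights of an ROI network and computing a node centrality. This sketch assumes one non-negative importance score per ROI pair (negative scores are clipped to zero, an assumption of this example) and offers eigenvector centrality or weighted degree (strength). The abstract states only that C-NPDR can use any centrality method and importance scores other than NPDR, so these specific choices and the helper names are hypothetical.

```python
import numpy as np

def pair_scores_to_matrix(scores, pairs, n_rois):
    """Place one importance score per ROI pair into a symmetric ROI x ROI matrix."""
    w = np.zeros((n_rois, n_rois))
    for s, (i, j) in zip(scores, pairs):
        w[i, j] = w[j, i] = s
    return w

def eigenvector_centrality(w):
    """Leading eigenvector of a symmetric non-negative matrix, normalised to sum to 1."""
    _, vecs = np.linalg.eigh(w)  # eigenvalues come back in ascending order
    v = np.abs(vecs[:, -1])      # eigenvector of the largest eigenvalue; fix arbitrary sign
    return v / v.sum()

def rank_rois(scores, pairs, n_rois, method="eigenvector"):
    """Return ROI indices ordered from most to least central, plus the centrality values."""
    w = pair_scores_to_matrix(np.clip(scores, 0.0, None), pairs, n_rois)  # assumption: drop negatives
    if method == "strength":
        c = w.sum(axis=1)        # weighted degree: sum of each ROI's pair scores
    else:
        c = eigenvector_centrality(w)
    return np.argsort(-c), c

# Toy example: 4 ROIs; every pair involving ROI 0 scores highly.
i, j = np.triu_indices(4, k=1)
pairs = list(zip(i.tolist(), j.tolist()))  # (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
scores = np.array([3.0, 3.0, 3.0, 0.5, 0.5, 0.5])
order, centrality = rank_rois(scores, pairs, 4)
print(order[0])  # 0: ROI 0 is the most central
```

Eigenvector centrality rewards an ROI whose strong pairs connect it to other highly scored ROIs, whereas strength simply sums an ROI's pair scores; which is preferable depends on the kind of network effect one wants to capture.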
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage