Centrality nearest-neighbor projected-distance regression (C-NPDR) feature selection for correlation-based predictors with application to resting-state fMRI study of major depressive disorder.
Elizabeth Kresock, Bryan Dawkins, Henry Luttbeg, Yijie Jamie Li, Rayus Kuplicki, B A McKinney
PLoS ONE 20(3): e0319346. Published 2025-03-06. DOI: 10.1371/journal.pone.0319346. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884682/pdf/
Abstract
Background: Nearest-neighbor projected-distance regression (NPDR) is a metric-based machine learning feature selection algorithm that uses distances between samples and projected differences between variables to identify variables or features that may interact to affect the prediction of complex outcomes. Typical tabular bioinformatics data consist of separate variables of interest, such as genes or proteins. In contrast, resting-state functional MRI (rs-fMRI) data consist of a time series for each brain region of interest (ROI) in each subject, and these within-brain time series are typically transformed into correlations between pairs of ROIs. These pairwise correlations can then be used as inputs for feature selection or other machine learning methods. Straightforward feature selection would return the most significant pairs of ROIs; however, it would also be useful to know the importance of individual ROIs.
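To make the feature construction concrete, here is a minimal Python sketch under stated assumptions: each subject's rs-fMRI data are a dictionary mapping a hypothetical ROI label to a list of time points, each unordered ROI pair becomes one Pearson-correlation feature, and the projected difference for a feature between two subjects is taken as the plain absolute difference (NPDR implementations may rescale it). The sample distance is assumed to be Manhattan, i.e., the sum of the projected differences. All function names are illustrative and do not come from an NPDR package.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_features(timeseries):
    """Map {roi: time series} to {(roi_a, roi_b): r}, one feature per ROI pair."""
    rois = sorted(timeseries)
    return {(a, b): pearson(timeseries[a], timeseries[b])
            for a, b in combinations(rois, 2)}


def projected_difference(feats_i, feats_j, pair):
    """Simplified projection of the i-j distance onto one ROI-pair feature."""
    return abs(feats_i[pair] - feats_j[pair])


def manhattan_distance(feats_i, feats_j):
    """Assumed sample distance: sum of projected differences over all pair features."""
    return sum(projected_difference(feats_i, feats_j, p) for p in feats_i)


# Two toy subjects with three ROIs and four time points each.
subj1 = correlation_features({"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]})
subj2 = correlation_features({"A": [1, 2, 3, 4], "B": [1, 3, 2, 4], "C": [4, 3, 2, 1]})
print(subj1[("A", "B")], manhattan_distance(subj1, subj2))
```

With R ROIs this yields R(R - 1)/2 pair features per subject, which is why ranking individual ROIs, rather than only pairs, needs an extra aggregation step.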
Results: We extend NPDR to compute the importance of individual ROIs from correlation-based features. We introduce correlation-difference and centrality-based versions of NPDR. Centrality-based NPDR can be combined with any centrality method, and also with importance scores other than NPDR, such as random forest importance scores. We develop a new simulation method using random network theory to generate artificial correlation-based predictors in which variations in correlation affect class prediction.
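The centrality step can be sketched as follows, assuming the pair importance scores (from NPDR, random forest, or another method) are stored in a dictionary keyed by ROI pair. Each score becomes the weight of an edge in an undirected ROI network; negative scores are clipped to zero (an assumption made here so the weights suit eigenvector centrality), and each ROI's importance is read off as a node centrality. Two centralities are shown: strength (weighted degree) and eigenvector centrality by power iteration on A + I, a shift that prevents the iteration from oscillating on bipartite graphs. These choices and the function names are illustrative; the abstract states only that any centrality method can be used.

```python
def importance_network(pair_scores):
    """Symmetric ROI-by-ROI weight map built from ROI-pair importance scores."""
    w = {}
    for (a, b), s in pair_scores.items():
        s = max(s, 0.0)  # assumption: negative importance contributes no weight
        w.setdefault(a, {})[b] = s
        w.setdefault(b, {})[a] = s
    return w


def strength_centrality(w):
    """Weighted degree: sum of the importance scores of all pairs touching each ROI."""
    return {roi: sum(nbrs.values()) for roi, nbrs in w.items()}


def eigenvector_centrality(w, iters=500, tol=1e-12):
    """Power iteration on (A + I), rescaled each step so the largest entry is 1."""
    rois = sorted(w)
    x = {r: 1.0 for r in rois}
    for _ in range(iters):
        new = {r: x[r] + sum(wt * x[n] for n, wt in w[r].items()) for r in rois}
        top = max(new.values()) or 1.0
        new = {r: v / top for r, v in new.items()}
        if max(abs(new[r] - x[r]) for r in rois) < tol:
            return new
        x = new
    return x


# Toy pair scores: ROI "A" participates in every informative pair.
scores = {("A", "B"): 2.0, ("A", "C"): 2.0, ("B", "C"): -0.5, ("A", "D"): 1.0}
net = importance_network(scores)
print(strength_centrality(net))
print(eigenvector_centrality(net))
```

Strength only counts an ROI's own pairs, whereas eigenvector centrality also rewards ROIs connected to other important ROIs; swapping one function for another is the sense in which the centrality method is interchangeable.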
Conclusions: We compared feature selection methods by how well they detected the simulated functional ROIs, and we applied the new centrality NPDR approach to a resting-state fMRI study of major depressive disorder (MDD) participants and healthy controls. We determined that the areas of the brain that have the strongest network effect on MDD include the middle temporal gyrus, the inferior temporal gyrus, and the dorsal entorhinal cortex. The resulting feature selection and simulation approaches can be applied to other domains that use correlation-based features.
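The abstract does not say which detection metric was used in the comparison, so the following is only a hypothetical illustration of one common way to score detection of simulated functional ROIs: rank ROIs by an importance score (for example, an ROI centrality) and report precision and recall among the top k. The function name, the cutoff k, and the toy scores are assumptions.

```python
def precision_recall_at_k(scores, functional_rois, k):
    """Precision and recall of the top-k ROIs against the simulated functional set."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    top_k = set(ranked[:k])
    hits = len(top_k & set(functional_rois))
    return hits / k, hits / len(functional_rois)


# Toy importance scores; ROIs "A" and "D" play the role of simulated functional ROIs.
roi_scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.3}
print(precision_recall_at_k(roi_scores, {"A", "D"}, k=2))
print(precision_recall_at_k(roi_scores, {"A", "D"}, k=3))
```

Because a single cutoff can favor one method by chance, sweeping k (or averaging over many simulated replicates) gives a fairer comparison between feature selection methods.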