A Novel Ensemble Machine Learning Approach for Interpretable Modeling, Feature Extraction and Selection With Applications to Medical and Biomedical Signals and Data

Impact factor 1.5; Zone 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)
Bo Sun, Hua-Liang Wei
DOI: 10.1002/cpe.70697
Journal: Concurrency and Computation-Practice & Experience, 38(8)
Publication date: 2026-04-15
Publication type: Journal Article
Full text: https://onlinelibrary.wiley.com/doi/10.1002/cpe.70697
Citations: 0

Abstract

Feature extraction and selection are crucial in biomedical data analysis to address high dimensionality, reduce computational complexity, and enhance model interpretability. However, traditional methods often focus on individual feature importance, overlooking complex inter-feature relationships, especially when processing and modeling dynamic and time-series data. In this study, we propose a novel framework that integrates Feature Co-occurrence Networks (FCN) with global importance scoring via the PageRank algorithm, which is built on a parametric Nonlinear AutoRegressive with eXogenous inputs (NARX) model structure to better capture temporal dependencies in sequential data. The proposed NARX-FCN-PageRank approach combines the strengths of multiple feature selection strategies while leveraging network analysis to identify stable and representative feature subsets. Extensive evaluations across diverse biomedical datasets, including both static and dynamic scenarios, demonstrate that our method effectively reduces feature dimensionality without compromising predictive performance. Moreover, the network visualizations provide valuable insights into the interdependencies and centrality of selected features, supporting model interpretability and enhancing trustworthiness. The NARX-FCN-PageRank framework thus offers a versatile and interpretable solution for feature selection in biomedical data analysis, with the potential to facilitate more efficient and reliable modeling in clinical and medical research applications.
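
The combination of a feature co-occurrence network with PageRank scoring can be pictured with the minimal sketch below. It is not the authors' implementation: the abstract does not say how the network is built or weighted, so everything here is an assumption made for illustration. In particular, the sketch assumes that several feature selection runs each return a subset of candidate terms, that an edge weight counts how often two terms are selected together, and that a plain weighted PageRank with damping 0.85 is used. The term names such as `y(k-1)` and `u(k-1)` are hypothetical lagged NARX regressors, and `cooccurrence_graph` and `pagerank` are names invented for this example.

```python
from itertools import combinations


def cooccurrence_graph(subsets):
    """Build a weighted undirected graph from selected feature subsets.

    weights[a][b] is the number of subsets in which a and b appear together.
    """
    weights = {}
    for subset in subsets:
        unique = sorted(set(subset))
        for feature in unique:
            weights.setdefault(feature, {})
        for a, b in combinations(unique, 2):
            weights[a][b] = weights[a].get(b, 0) + 1
            weights[b][a] = weights[b].get(a, 0) + 1
    return weights


def pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = sorted(weights)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(max_iter):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new_rank = {}
        for v in nodes:
            incoming = sum(
                rank[u] * w / strength[u] for u, w in weights[v].items()
            )
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical output of three feature selection runs (assumed, not from the paper).
subsets = [
    ["y(k-1)", "u(k-1)", "u(k-2)"],
    ["y(k-1)", "u(k-1)"],
    ["y(k-1)", "u(k-1)", "y(k-2)"],
]

ranks = pagerank(cooccurrence_graph(subsets))
ordered = sorted(ranks, key=ranks.get, reverse=True)
for term in ordered:
    print(f"{term}: {ranks[term]:.4f}")
```

Terms that are selected often, and together with other frequently selected terms, receive the highest scores. A reduced feature set could then be formed by keeping the top-ranked terms, although the cut-off rule is a further choice the abstract does not specify.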


Source journal
Concurrency and Computation-Practice & Experience (category: Engineering and Technology - Computer Science, Theory and Methods)
CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review time: 9.6 months
Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers and authoritative research review papers in the overlapping fields of: parallel and distributed computing; high-performance computing; computational and data science; artificial intelligence and machine learning; big data applications, algorithms, and systems; network science; ontologies and semantics; security and privacy; cloud/edge/fog computing; green computing; and quantum computing.