Correspondence: Accuracy Is Not Enough: Stability‐Aware Feature Selection for Reproducible Biomarker Discovery

IF 12 | Zone 1 (Medicine) | Q1 ALLERGY

Allergy | Pub Date: 2025-09-23 | DOI: 10.1111/all.70075

Yoshiyasu Takefuji

Citations: 0

Abstract

Random forest (RF) models can achieve high predictive accuracy, yet their model-specific feature importances may be unstable and misleading. Using an allergy benchmark dataset (10,000 instances, 11 features), we compared five selection strategies: RF, logistic regression, feature agglomeration (FA), highly variable gene selection (HVGS), and Spearman correlation. For each, we evaluated cross-validated accuracy with the top five features and again after removing the top two and reselecting the top three. RF attained 0.9999 accuracy with the top five but fell to 0.8836 after removal and showed unstable rankings; logistic regression maintained 0.9116 but was also unstable. FA, HVGS, and Spearman achieved near-perfect accuracy (0.9999) with the top five and declined only modestly (to 0.9076–0.9116) with stable rankings. The results underscore that accuracy does not imply reliable importance; stability-aware, model-agnostic, or unsupervised methods better support reproducible biomarker discovery.
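The "remove the top two, reselect the top three" stability check described in the abstract can be illustrated with a minimal sketch. This is not the author's code: it covers only a Spearman-based ranking (written in pure Python), it skips the cross-validated accuracy step, and the data generator `make_toy_data`, the feature names `f0`..`f5`, and the helper `stability_check` are hypothetical names introduced here for illustration.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_features(columns, target):
    """Order feature names by |Spearman rho| with the target, strongest first."""
    scores = {name: abs(spearman(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def stability_check(columns, target, top_k=5, n_remove=2):
    """Select top_k, drop the top n_remove, reselect, and compare the order."""
    full = rank_features(columns, target)[:top_k]
    removed = set(full[:n_remove])
    reduced = {n: c for n, c in columns.items() if n not in removed}
    reselected = rank_features(reduced, target)[: top_k - n_remove]
    # Stable if the reselected features are the old ranks 3..top_k, in order.
    return full, reselected, reselected == full[n_remove:]


def make_toy_data(n=500, seed=0):
    """Hypothetical toy data (NOT the allergy dataset): six noisy copies of y."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n)]
    columns = {}
    for idx, noise in enumerate([0.1, 0.5, 1.0, 2.0, 4.0, 8.0]):
        columns[f"f{idx}"] = [t + rng.gauss(0, noise) for t in target]
    return columns, target


columns, target = make_toy_data()
full, reselected, stable = stability_check(columns, target)
print("top five:", full)
print("reselected top three:", reselected)
print("ranking stable:", stable)
```

Because each Spearman score depends only on one feature and the target, removing other features cannot reorder the remaining ones, so this check passes by construction. A model-based importance such as RF is recomputed on the reduced feature set and can reorder, which is the kind of instability the abstract reports.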


Source journal

Allergy (Medicine: Allergy)

CiteScore

26.10

Self-citation rate

9.70%

Articles published

393

Review time

2 months

About the journal: Allergy is an international and multidisciplinary journal that aims to advance, impact, and communicate all aspects of the discipline of Allergy/Immunology. It publishes original articles, reviews, position papers, guidelines, editorials, news and commentaries, letters to the editors, and correspondences. The journal accepts articles based on their scientific merit and quality. Allergy seeks to maintain contact between basic and clinical Allergy/Immunology and encourages contributions from contributors and readers from all countries. In addition to its publication, Allergy also provides abstracting and indexing information. Some of the databases that include Allergy abstracts are Abstracts on Hygiene & Communicable Disease, Academic Search Alumni Edition, AgBiotech News & Information, AGRICOLA Database, Biological Abstracts, PubMed Dietary Supplement Subset, and Global Health, among others.