Correspondence: Accuracy Is Not Enough: Stability-Aware Feature Selection for Reproducible Biomarker Discovery

IF 12 | Zone 1 (Medicine) | JCR Q1 (Allergy)
Allergy Pub Date : 2025-09-23 DOI:10.1111/all.70075
Yoshiyasu Takefuji
Citations: 0

Abstract

Random forest (RF) models can achieve high predictive accuracy, yet their model-specific feature importances may be unstable and misleading. Using an allergy benchmark dataset (10,000 instances, 11 features), we compared five selection strategies: RF, logistic regression, feature agglomeration (FA), highly variable gene selection (HVGS), and Spearman correlation. For each strategy we evaluated cross-validated accuracy with the top five features, and again after removing the top two and reselecting the top three. RF attained 0.9999 accuracy with the top five but fell to 0.8836 after removal and showed unstable rankings; logistic regression maintained 0.9116 but was also unstable. FA, HVGS, and Spearman achieved near-perfect accuracy (0.9999) with the top five, declined only modestly after removal (0.9076-0.9116), and kept stable rankings. The results underscore that accuracy does not imply reliable importance; stability-aware, model-agnostic, or unsupervised methods better support reproducible biomarker discovery.
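The remove-and-reselect check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: it uses synthetic data rather than the allergy benchmark, covers only the Spearman strategy, and omits the cross-validated accuracy step. All names (`spearman_scores`, `removal_stability`, the `f0`...`f10` columns) are hypothetical.

```python
import random

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def spearman_scores(columns, y):
    """Score each named feature column by |Spearman rho| with the target."""
    ry = average_ranks(y)
    return {name: abs(pearson(average_ranks(col), ry))
            for name, col in columns.items()}

def top_k(scores, k):
    """Names of the k highest-scoring features, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def removal_stability(columns, y, score_fn, k_top=5, n_remove=2, k_reselect=3):
    """Select the top k, drop the top n_remove, reselect, compare orderings."""
    first = top_k(score_fn(columns, y), k_top)
    dropped = set(first[:n_remove])
    reduced = {n: c for n, c in columns.items() if n not in dropped}
    second = top_k(score_fn(reduced, y), k_reselect)
    expected = first[n_remove:n_remove + k_reselect]
    return first, second, second == expected

# Synthetic data (an assumption for illustration): 11 noisy copies of one
# latent signal, with noise growing with the feature index.
rng = random.Random(0)
n_samples = 500
signal = [rng.gauss(0, 1) for _ in range(n_samples)]
y = [1 if s > 0 else 0 for s in signal]
columns = {f"f{j}": [s + rng.gauss(0, 0.5 * (j + 1)) for s in signal]
           for j in range(11)}

first, second, stable = removal_stability(columns, y, spearman_scores)
print("top five:", first)
print("reselected top three:", second)
print("ranking preserved:", stable)
```

Because a Spearman score depends only on one feature and the target, removing other features cannot change the remaining order, so the check passes by construction. A model-based scorer such as RF importance could be passed as `score_fn` instead; its scores depend on which features are present, which is where the instability reported in the abstract can arise.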
Source journal: Allergy (Medicine: Allergy)
CiteScore: 26.10
Self-citation rate: 9.70%
Articles published: 393
Review time: 2 months
Journal description: Allergy is an international and multidisciplinary journal that aims to advance, impact, and communicate all aspects of the discipline of Allergy/Immunology. It publishes original articles, reviews, position papers, guidelines, editorials, news and commentaries, letters to the editors, and correspondences. The journal accepts articles based on their scientific merit and quality. Allergy seeks to maintain contact between basic and clinical Allergy/Immunology and encourages contributions from authors and readers in all countries. Allergy is also covered by abstracting and indexing services, including Abstracts on Hygiene & Communicable Disease, Academic Search Alumni Edition, AgBiotech News & Information, AGRICOLA Database, Biological Abstracts, PubMed Dietary Supplement Subset, and Global Health, among others.