Reproducible feature selection in heterogeneous multicenter datasets via sign-consistency criteria.

IF 1.6 | Zone 3 (Medicine) | JCR Q3, HEALTH CARE SCIENCES & SERVICES
Xun Zhao, Yalu Ping
DOI: 10.1177/09622802251338375 (https://doi.org/10.1177/09622802251338375)
Journal: Statistical Methods in Medical Research (journal article, published 2025-05-14)
Citations: 0

Abstract

The identification of risk features associated with disease plays a crucial role in biomedical fields, where these features often provide evidence for clinical decision-making. However, in the presence of between-center heterogeneity, covariate effects may point in inconsistent directions across data centers, which makes feature selection challenging. In this work, we propose a novel framework to select reproducible risk features whose underlying effects are consistent across centers. We quantify feature reproducibility with a sign-consistency criterion, which tolerates an acceptable level of heterogeneity in effect sizes while ensuring reasonable similarity among reproducible signals. Compared with existing feature selection methods, the proposed method protects data privacy and does not rely on the assumption of data homogeneity. Extensive simulations demonstrate that the proposed method has greater power than existing methods. We apply the proposed approach to data from the China Health and Retirement Longitudinal Study (CHARLS) and identify nine important risk factors that show reproducible associations with depression.
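The abstract does not spell out the selection procedure, so the following is only a rough illustration of the sign-consistency idea and not the authors' method. It assumes each center fits its own least-squares model and shares only its coefficient vector, and it flags a feature as reproducible when a chosen fraction of centers report a non-negligible estimate with the same sign. The helper names `local_estimates` and `sign_consistent_features`, the thresholds `min_abs` and `min_agree`, and the example numbers are all hypothetical; a real method would also need to account for sampling uncertainty and error control.

```python
import numpy as np


def local_estimates(X, y):
    """Hypothetical helper run inside one center: fit ordinary least squares
    and return only the coefficient vector, so raw patient data never leave
    the center."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def sign_consistent_features(betas, min_abs=0.0, min_agree=1.0):
    """Hypothetical sign-consistency rule, not the paper's procedure.

    betas     : array of shape (K, p), one row of estimates per center.
    min_abs   : estimates with |beta| <= min_abs count as "no signal" (sign 0).
    min_agree : fraction of the K centers that must share the same nonzero sign.

    Returns the column indices of features that pass the rule.
    """
    betas = np.asarray(betas, dtype=float)
    k = betas.shape[0]
    # Sign of each estimate, with negligible estimates mapped to 0.
    signs = np.where(np.abs(betas) > min_abs, np.sign(betas), 0.0)
    n_pos = (signs > 0).sum(axis=0)
    n_neg = (signs < 0).sum(axis=0)
    # Share of centers agreeing with the majority nonzero direction.
    agreement = np.maximum(n_pos, n_neg) / k
    return np.flatnonzero(agreement >= min_agree)


# Made-up estimates from three centers for four features.
betas = np.array([
    [0.9, -0.7, 0.4, 0.03],
    [0.5, -1.1, -0.3, -0.02],
    [1.3, -0.4, 0.5, 0.05],
])
print(sign_consistent_features(betas, min_abs=0.1))  # [0 1]
```

Only the `(K, p)` matrix of coefficient estimates is pooled, which mirrors the privacy point in the abstract. Features 0 and 1 keep their direction in every center despite different magnitudes, feature 2 is dropped because one center reports the opposite sign, and feature 3 is dropped because every estimate falls below `min_abs`.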

Source journal: Statistical Methods in Medical Research
Subject category: Medicine, Mathematical & Computational Biology
CiteScore: 4.10
Self-citation rate: 4.30%
Articles published: 127
Review time: >12 weeks
About the journal: Statistical Methods in Medical Research is a peer-reviewed scholarly journal and the leading vehicle for articles in all the main areas of medical statistics, as well as an essential reference for all medical statisticians. This unique journal is devoted solely to statistics and medicine and aims to keep professionals abreast of the many powerful statistical techniques now available to the medical profession. This journal is a member of the Committee on Publication Ethics (COPE).