LACE-UP: An ensemble machine-learning method for health subtype classification on multidimensional binary data

IF 9.4 | Zone 1, multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES
Rebecca Danning, Frank B. Hu, Xihong Lin
Journal: Proceedings of the National Academy of Sciences of the United States of America
Publication date: 2025-04-23
Publication type: Journal Article
DOI: 10.1073/pnas.2423341122 (https://doi.org/10.1073/pnas.2423341122)

Abstract

Disease and behavior subtype identification is of significant interest in biomedical research. However, in many settings, subtype discovery is limited by a lack of robust statistical clustering methods appropriate for binary data. Here, we introduce LACE-UP [latent class analysis ensembled with UMAP (uniform manifold approximation and projection) and PCA (principal components analysis)], an ensemble machine-learning method for clustering multidimensional binary data that does not require prespecifying the number of clusters and is robust to realistic data settings, such as the correlation of variables observed from the same individual and the inclusion of variables unrelated to the underlying subtype. The method ensembles latent class analysis, a model-based clustering method; principal components analysis, a spectral signal processing method; and UMAP, a cutting-edge model-free dimensionality reduction algorithm. In simulations, LACE-UP outperforms gold-standard techniques across a variety of realistic scenarios, including in the presence of correlated and extraneous data. We apply LACE-UP to dietary behavior data from the UK Biobank to demonstrate its power to uncover interpretable dietary subtypes that are associated with lipids and cardiovascular risk.
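
As a rough illustration of one of the three components named in the abstract, the sketch below fits a latent class model (a mixture of independent Bernoulli items) to binary data with the EM algorithm. It is not the authors' implementation: it covers only the latent class analysis step, fixes the number of classes in advance, and omits the PCA and UMAP components and the ensembling itself. The names `fit_lca` and `simulate`, the item probabilities, and the sample sizes are all assumptions chosen for the example.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model (mixture of independent Bernoullis) by EM.

    Hypothetical helper for illustration. Returns (weights, probs, resp):
    class weights, per-class item probabilities, and per-row posteriors.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random, asymmetric start so the classes can separate.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed in log space.
        resp = []
        for row in data:
            logp = []
            for k in range(n_classes):
                ll = math.log(weights[k])
                for x, p in zip(row, probs[k]):
                    ll += math.log(p if x else 1.0 - p)
                logp.append(ll)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update weights and item probabilities, clipped away from 0/1.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / len(data), 1e-6)
            for j in range(n_items):
                num = sum(r[k] * row[j] for r, row in zip(resp, data))
                probs[k][j] = min(max(num / max(nk, 1e-12), 1e-6), 1.0 - 1e-6)
    return weights, probs, resp


def simulate(n_per_class, seed=1):
    """Draw binary rows from two made-up subtype profiles (assumed values)."""
    rng = random.Random(seed)
    profiles = [[0.9] * 4 + [0.1] * 4, [0.1] * 4 + [0.9] * 4]
    data, labels = [], []
    for k, profile in enumerate(profiles):
        for _ in range(n_per_class):
            data.append([1 if rng.random() < p else 0 for p in profile])
            labels.append(k)
    return data, labels


data, labels = simulate(100)
weights, probs, resp = fit_lca(data, n_classes=2)
# Assign each row to its most probable class.
assigned = [max(range(2), key=lambda k: r[k]) for r in resp]
agree = sum(a == b for a, b in zip(assigned, labels)) / len(labels)
accuracy = max(agree, 1.0 - agree)  # class labels are arbitrary
print("agreement with simulated subtypes:", round(accuracy, 3))
```

The simulated items here are independent within each class and all carry signal, which is the easy case for latent class analysis. The abstract's point is that real data often violate both conditions (correlated variables, variables unrelated to the subtype), which is the motivation it gives for combining this model with PCA and UMAP.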
Source journal
CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months
Journal description: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.