PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection.

Journal impact factor: 5.4
Guannan Yang, Ellen Menkhorst, Evdokia Dimitriadis, Kim-Anh Lê Cao
Journal: Bioinformatics (Oxford, England)
Publication type: Journal Article
Publication date: 2025-09-01
DOI: https://doi.org/10.1093/bioinformatics/btaf475
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449248/pdf/

Abstract

Motivation: Integrating the knockoff framework with any variable-selection method delivers stringent false discovery rate (FDR) control without recourse to p-values, offering a powerful alternative for differential expression analysis of high-throughput omics datasets. However, existing knockoff generators rely on restrictive modelling assumptions or coarse approximations that often inflate the FDR when applied to real-world data.

Results: We introduce Partial Least Squares Knockoff (PLSKO), an efficient, assumption-free generator that remains robust across diverse omics platforms. Our extensive simulations show that PLSKO is the only method to maintain FDR control with sufficient power in complex non-linear settings. Our semi-simulation studies drawn from RNA-seq, proteomics, metabolomics, and microbiome experiments confirm PLSKO generates valid knockoff variables. In pre-eclampsia multi-omics case studies, we combine PLSKO with Aggregation Knockoff to address the randomness of knockoffs and improve power, and demonstrate the method's ability to recover biologically meaningful features.

Availability and implementation: Our proposed algorithm is available on GitHub (https://github.com/guannan-yang/PLSKO) and Zenodo (https://doi.org/10.5281/zenodo.16879594).
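
Illustrative note: the Motivation states that the knockoff framework controls the FDR without p-values. The sketch below shows the selection step that makes this possible, the standard knockoff+ threshold of Barber and Candès, and is not the PLSKO generator itself. It assumes a vector W of feature statistics has already been computed from the original variables and their knockoffs (for example, the difference between the absolute lasso coefficients of each variable and its knockoff). The function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical and do not come from the PLSKO package.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Return the knockoff+ threshold for statistics W at target FDR q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Large negative W_j act as a proxy count of false positives.
        n_negative = np.sum(W <= -t)
        n_positive = np.sum(W >= t)
        fdp_estimate = (1 + n_negative) / max(1, n_positive)
        if fdp_estimate <= q:
            return float(t)
    return np.inf


def knockoff_select(W, q=0.1):
    """Return indices of variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


if __name__ == "__main__":
    # Made-up statistics for illustration only (not data from the paper).
    W_example = [5, 4, 3.5, 3, 2.5, 2, -1, 0.5, -0.3, 0.2, 1.5, 6]
    print(knockoff_select(W_example, q=0.2))
```

Because the numerator always includes the +1 offset, at least 1/q variables must pass the threshold before anything is selected. With few candidate variables and a strict q the filter can therefore return an empty set.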
