Guannan Yang, Ellen Menkhorst, Evdokia Dimitriadis, Kim-Anh Lê Cao
{"title":"PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection.","authors":"Guannan Yang, Ellen Menkhorst, Evdokia Dimitriadis, Kim-Anh Lê Cao","doi":"10.1093/bioinformatics/btaf475","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Integrating the knockoff framework with any variable-selection method delivers stringent false discovery rate (FDR) control without recourse to p-values, offering a powerful alternative for differential expression analysis of high-throughput omics datasets. However, existing knockoff generators rely on restrictive modelling assumptions or coarse approximations that often inflate the FDR when applied to real-world data.</p><p><strong>Results: </strong>We introduce Partial Least Squares Knockoff (PLSKO), an efficient, assumption-free generator that remains robust across diverse omics platforms. Our extensive simulations show that PLSKO is the only method to maintain FDR control with sufficient power in complex non-linear settings. Our semi-simulation studies drawn from RNA-seq, proteomics, metabolomics, and microbiome experiments confirm PLSKO generates valid knockoff variables. In pre-eclampsia multi-omics case studies, we combine PLSKO with Aggregation Knockoff to address the randomness of knockoffs and improve power, and demonstrate the method's ability to recover biologically meaningful features.</p><p><strong>Availability and implementation: </strong>Our proposed algorithm is available on Github (https://github.com/guannan-yang/PLSKO) and Zenodo (https://doi.org/10.5281/zenodo.16879594).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449248/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btaf475","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Motivation: Integrating the knockoff framework with any variable-selection method delivers stringent false discovery rate (FDR) control without recourse to p-values, offering a powerful alternative for differential expression analysis of high-throughput omics datasets. However, existing knockoff generators rely on restrictive modelling assumptions or coarse approximations that often inflate the FDR when applied to real-world data.
Results: We introduce Partial Least Squares Knockoff (PLSKO), an efficient, assumption-free generator that remains robust across diverse omics platforms. Our extensive simulations show that PLSKO is the only method to maintain FDR control with sufficient power in complex non-linear settings. Our semi-simulation studies drawn from RNA-seq, proteomics, metabolomics, and microbiome experiments confirm PLSKO generates valid knockoff variables. In pre-eclampsia multi-omics case studies, we combine PLSKO with Aggregation Knockoff to address the randomness of knockoffs and improve power, and demonstrate the method's ability to recover biologically meaningful features.
Availability and implementation: Our proposed algorithm is available on Github (https://github.com/guannan-yang/PLSKO) and Zenodo (https://doi.org/10.5281/zenodo.16879594).