Large-scale composite hypothesis testing procedure for omics data analyses.

IF 2.8 Q1 GENETICS & HEREDITY

NAR Genomics and Bioinformatics. Published: 2025-09-05 (eCollection: 2025-09-01). DOI: 10.1093/nargab/lqaf118

Annaïg De Walsche, Franck Gauthier, Nathalie Boissot, Alain Charcosset, Tristan Mary-Huard

NAR Genomics and Bioinformatics, volume 7, issue 3, article lqaf118. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412788/pdf/


Abstract

Composite hypothesis testing using summary statistics is a well-established approach for assessing the effect of a single marker or gene across multiple traits or omics levels. Numerous procedures have been developed for this task and have been successfully applied to identify complex patterns of association between traits, conditions, or phenotypes. However, existing methods often struggle with scalability in large datasets or fail to account for dependencies between traits or omics levels, limiting their ability to control false positives effectively. To overcome these challenges, we present the qch_copula approach, which integrates mixture models with a copula function to capture dependencies between traits or omics and provides rigorously defined P-values for any composite hypothesis. Through a comprehensive benchmark against eight state-of-the-art methods, we demonstrate that qch_copula controls Type I error rates effectively while enhancing the detection of joint association patterns. Compared to other mixture model-based approaches, our method notably reduces memory usage during the EM algorithm, allowing the analysis of up to 20 traits and 10⁵-10⁶ markers. The effectiveness of qch_copula is further validated through two application cases in human and plant genetics. The method is available in the R package qch, accessible on CRAN.
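To make the scalability point concrete, the sketch below enumerates the null/alternative configurations that arise when one marker is tested across several traits, and computes the memory a dense marker-by-configuration table would need. It is an illustration under stated assumptions, not the qch implementation: the function names are hypothetical, "associated with at least q traits" is used only as an example of a composite hypothesis, and dense float64 storage is assumed for the arithmetic.

```python
from itertools import product


def configurations(n_traits):
    """All 2**n_traits binary configurations: 0 = null (H0), 1 = associated (H1)."""
    return list(product((0, 1), repeat=n_traits))


def at_least(n_traits, q):
    """Configurations in the example composite alternative 'associated with >= q traits'."""
    return [c for c in configurations(n_traits) if sum(c) >= q]


def dense_posterior_bytes(n_markers, n_traits, bytes_per_value=8):
    """Bytes for a dense n_markers x 2**n_traits table (float64 assumed, hypothetical layout)."""
    return n_markers * 2 ** n_traits * bytes_per_value


if __name__ == "__main__":
    print(len(configurations(3)))   # number of configurations for 3 traits
    print(at_least(3, 2))           # configurations with at least 2 associated traits
    print(dense_posterior_bytes(10**6, 20) / 1e12, "TB")  # dense table, 20 traits, 10^6 markers
```

With 20 traits there are 2^20 = 1,048,576 configurations, so a dense table over 10⁶ markers would take roughly 8.4 TB under these assumptions, which suggests why memory use during the EM algorithm matters at this scale.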


