Large-scale composite hypothesis testing procedure for omics data analyses.

NAR Genomics and Bioinformatics (IF 2.8, JCR Q1, Genetics & Heredity)
Pub Date: 2025-09-05 | eCollection Date: 2025-09-01 | DOI: 10.1093/nargab/lqaf118
Annaïg De Walsche, Franck Gauthier, Nathalie Boissot, Alain Charcosset, Tristan Mary-Huard
Volume 7, Issue 3, lqaf118. Journal Article. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412788/pdf/

Abstract

Composite hypothesis testing using summary statistics is a well-established approach for assessing the effect of a single marker or gene across multiple traits or omics levels. Numerous procedures have been developed for this task and have been successfully applied to identify complex patterns of association between traits, conditions, or phenotypes. However, existing methods often struggle with scalability in large datasets or fail to account for dependencies between traits or omics levels, limiting their ability to control false positives effectively. To overcome these challenges, we present the qch_copula approach, which integrates mixture models with a copula function to capture dependencies between traits or omics levels and provides rigorously defined P-values for any composite hypothesis. Through a comprehensive benchmark against eight state-of-the-art methods, we demonstrate that qch_copula controls Type I error rates effectively while enhancing the detection of joint association patterns. Compared to other mixture model-based approaches, our method notably reduces memory usage during the EM algorithm, allowing the analysis of up to 20 traits and 10^5 to 10^6 markers. The effectiveness of qch_copula is further validated through two application cases in human and plant genetics. The method is available in the R package qch, accessible on CRAN.
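
To make the notion of a composite hypothesis concrete, the following is a minimal Python sketch, not the qch_copula procedure and not the API of the qch R package. It assumes that a per-trait posterior probability of association is already available for one marker and, as a deliberate simplification, that traits are independent (the dependency that the copula in the abstract is meant to capture is ignored here). The function names and the input values are hypothetical. It enumerates all 2^Q association configurations and sums those that satisfy "associated with at least k traits".

```python
from itertools import product
from math import prod


def config_posteriors(tau):
    """Posterior probability of every 0/1 association configuration.

    tau[q] is an ASSUMED posterior probability that the marker is
    associated with trait q. Traits are treated as independent here,
    which is a simplification and not the copula-based model.
    """
    post = {}
    for config in product((0, 1), repeat=len(tau)):
        # Multiply tau[q] where the trait is "associated", 1 - tau[q] otherwise.
        post[config] = prod(t if c else 1.0 - t for c, t in zip(config, tau))
    return post


def composite_posterior(tau, min_traits):
    """Posterior probability that at least `min_traits` traits are associated."""
    return sum(
        p for config, p in config_posteriors(tau).items()
        if sum(config) >= min_traits
    )


# Hypothetical per-trait posteriors for a single marker and Q = 3 traits.
tau = [0.9, 0.8, 0.1]
all_configs = config_posteriors(tau)
p_at_least_two = composite_posterior(tau, 2)
```

With these illustrative inputs the composite posterior is about 0.746. The enumeration visits 2^Q configurations per marker, so with Q = 20 traits there are roughly 10^6 configurations, which gives a sense of why memory usage matters at the scale mentioned in the abstract.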

Source journal metrics: CiteScore 8.00; self-citation rate 2.20%; articles published: 95; review time: 15 weeks.