Multiple testing in genome-wide association studies via hierarchical hidden Markov models

Impact factor: 0.8 · Zone 4 (Mathematics) · JCR Q3 (Statistics & Probability)
Pengfei Wang, Zhaofeng Tian
Journal of Statistical Planning and Inference, Volume 232, Article 106161
Published: 2024-02-29
DOI: 10.1016/j.jspi.2024.106161
URL: https://www.sciencedirect.com/science/article/pii/S0378375824000181
Citations: 0

Abstract

Large-scale multiple testing problems are common in modern scientific research. Conventional multiple testing procedures usually lose considerable testing efficiency when correlations among tests are ignored. Appropriate use of correlation information not only improves the efficiency of the testing procedure but also makes the results easier to interpret. Because disease- or trait-related single nucleotide polymorphisms (SNPs) tend to cluster and exhibit serial correlation, multiple testing procedures based on hidden Markov models (HMMs) have been applied successfully in genome-wide association studies (GWAS). However, modeling an entire chromosome with a single HMM is somewhat crude. To overcome this issue, this paper employs the hierarchical hidden Markov model (HHMM) to describe local correlations among tests, and develops a multiple testing procedure that automatically distinguishes different classes of chromosome regions while taking those local correlations into account. We first propose an oracle procedure that is shown theoretically to be valid, and in fact optimal in some sense. We then develop a data-driven procedure to mimic the oracle version. Extensive simulations and a real data example show that the new multiple testing procedure outperforms its competitors.
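The abstract does not spell out the procedure itself. As a rough illustration of the single-HMM baseline it refers to, the sketch below computes, for each test, the posterior probability of being null under a two-state HMM and then applies a step-up rule that rejects the hypotheses with the smallest posterior null probabilities while their running average stays below a target level. Everything here is an assumption made for illustration: the two-state chain, the Gaussian emissions with known parameters, the function names `posterior_null_prob` and `step_up_reject`, and the example numbers. It is not the authors' HHMM procedure, which adds a region-level layer on top of this idea.

```python
import numpy as np


def posterior_null_prob(z, trans, init, mu1=2.0, sigma1=1.0):
    """Forward-backward pass for a two-state HMM (illustrative assumptions).

    State 0 (null) emits N(0, 1); state 1 (non-null) emits N(mu1, sigma1^2).
    Returns P(state_i = 0 | z_1, ..., z_m) for every position i.
    """
    z = np.asarray(z, dtype=float)
    trans = np.asarray(trans, dtype=float)
    init = np.asarray(init, dtype=float)
    m = len(z)

    # Emission densities under each state, shape (m, 2).
    f0 = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    f1 = np.exp(-0.5 * ((z - mu1) / sigma1) ** 2) / (sigma1 * np.sqrt(2.0 * np.pi))
    emis = np.column_stack([f0, f1])

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha = np.zeros((m, 2))
    alpha[0] = init * emis[0]
    alpha[0] /= alpha[0].sum()
    for i in range(1, m):
        alpha[i] = (alpha[i - 1] @ trans) * emis[i]
        alpha[i] /= alpha[i].sum()

    # Backward pass, rescaled the same way.
    beta = np.ones((m, 2))
    for i in range(m - 2, -1, -1):
        beta[i] = trans @ (emis[i + 1] * beta[i + 1])
        beta[i] /= beta[i].sum()

    # Per-position scaling constants cancel when each row is renormalized.
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]


def step_up_reject(null_prob, level=0.1):
    """Reject the k smallest posterior null probabilities, with k the largest
    value whose running average does not exceed `level`."""
    null_prob = np.asarray(null_prob, dtype=float)
    order = np.argsort(null_prob)
    running_mean = np.cumsum(null_prob[order]) / np.arange(1, len(null_prob) + 1)
    passing = np.nonzero(running_mean <= level)[0]
    reject = np.zeros(len(null_prob), dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject


# Hypothetical test statistics with a short cluster of large values.
z_scores = [0.0, 0.0, 5.0, 5.0, 0.0]
transition = [[0.9, 0.1], [0.2, 0.8]]
initial = [0.9, 0.1]
probs = posterior_null_prob(z_scores, transition, initial)
decisions = step_up_reject(probs, level=0.1)
```

Ranking by posterior null probability rather than by individual p-values is what lets neighbouring tests borrow strength from each other: a moderate statistic next to a strong signal receives a lower null probability than the same statistic in a quiet region.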

Source journal
Journal of Statistical Planning and Inference (Mathematics: Statistics and Probability)
CiteScore: 2.10
Self-citation rate: 11.10%
Articles published: 78
Review time: 3-6 weeks
Journal description: The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists. We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.