{"title":"simmrd: An open-source tool to perform simulations in Mendelian randomization","authors":"Noah Lorincz-Comi, Yihe Yang, Xiaofeng Zhu","doi":"10.1002/gepi.22544","DOIUrl":"10.1002/gepi.22544","url":null,"abstract":"<p>Mendelian randomization (MR) has become a popular tool for inferring causality of risk factors on disease. There are currently over 45 different methods available to perform MR, reflecting this extremely active research area. It would be desirable to have a standard simulation environment to objectively evaluate the existing and future methods. We present <span>simmrd</span>, an open-source software for performing simulations to evaluate the performance of MR methods in a range of scenarios encountered in practice. Researchers can directly modify the <span>simmrd</span> source code so that the research community may arrive at a widely accepted framework for researchers to evaluate the performance of different MR methods.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 2","pages":"59-73"},"PeriodicalIF":2.1,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22544","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139542107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DYNATE: Localizing rare-variant association regions via multiple testing embedded in an aggregation tree","authors":"Xuechan Li, John Pura, Andrew Allen, Kouros Owzar, Jianfeng Lu, Matthew Harms, Jichun Xie","doi":"10.1002/gepi.22542","DOIUrl":"10.1002/gepi.22542","url":null,"abstract":"<p>Rare-variants (RVs) genetic association studies enable researchers to uncover the variation in phenotypic traits left unexplained by common variation. Traditional single-variant analysis lacks power; thus, researchers have developed various methods to aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods utilize a static delineation of genomic regions, often resulting in suboptimal effect aggregation, as neutral subregions within the test region will result in an attenuation of signal. Other methods use varying windows to search for signals but often result in long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations with a controlled weighted false discovery rate. DYNATE's main advantage lies in its strong ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate the superior performance of DYNATE under various scenarios compared with existing methods. 
We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, <i>EPG5</i>, harboring possibly pathogenic mutations.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 1","pages":"42-55"},"PeriodicalIF":2.1,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138444394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bias and mean squared error in Mendelian randomization with invalid instrumental variables","authors":"Lu Deng, Sheng Fu, Kai Yu","doi":"10.1002/gepi.22541","DOIUrl":"10.1002/gepi.22541","url":null,"abstract":"<p>Mendelian randomization (MR) is a statistical method that utilizes genetic variants as instrumental variables (IVs) to investigate causal relationships between risk factors and outcomes. Although MR has gained popularity in recent years due to its ability to analyze summary statistics from genome-wide association studies (GWAS), it requires a substantial number of single nucleotide polymorphisms (SNPs) as IVs to ensure sufficient power for detecting causal effects. Unfortunately, the complex genetic heritability of many traits can lead to the use of invalid IVs that affect both the risk factor and the outcome directly or through an unobserved confounder. This can result in biased and imprecise estimates, as reflected by a larger mean squared error (MSE). In this study, we focus on the widely used two-stage least squares (2SLS) method and derive formulas for its bias and MSE when estimating causal effects using invalid IVs. Using those formulas, we identify conditions under which the 2SLS estimate is unbiased and reveal how the independent or correlated pleiotropic effects influence the accuracy and precision of the 2SLS estimate. We validate these formulas through extensive simulation studies and demonstrate the application of those formulas in an MR study to evaluate the causal effect of the waist-to-hip ratio on various sleeping patterns. 
Our results can aid in designing future MR studies and serve as benchmarks for assessing more sophisticated MR methods.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 1","pages":"27-41"},"PeriodicalIF":2.1,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136397137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limitation of permutation-based differential correlation analysis","authors":"Hoseung Song, Michael C. Wu","doi":"10.1002/gepi.22540","DOIUrl":"10.1002/gepi.22540","url":null,"abstract":"<p>The comparison of biological systems, through the analysis of molecular changes under different conditions, has played a crucial role in the progress of modern biological science. Specifically, differential correlation analysis (DCA) has been employed to determine whether relationships between genomic features differ across conditions or outcomes. Because ascertaining the null distribution of test statistics to capture variations in correlation is challenging, several DCA methods utilize permutation which can loosen parametric (e.g., normality) assumptions. However, permutation is often problematic for DCA due to violating the assumption that samples are exchangeable under the null. Here, we examine the limitations of permutation-based DCA and investigate instances where the permutation-based DCA exhibits poor performance. Experimental results show that the permutation-based DCA often fails to control the type I error under the null hypothesis of equal correlation structures.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"637-641"},"PeriodicalIF":2.1,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72014121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to “Abstracts”","authors":"","doi":"10.1002/gepi.22543","DOIUrl":"10.1002/gepi.22543","url":null,"abstract":"<p>(2023), Abstracts. Genetic Epidemiology, 47: 520–581. https://doi.org/10.1002/gepi.22539</p><p>In the originally published Abstracts, there were authors missing for “Two-sample Mendelian Randomization Study of Circulating Metabolites and Prostate Cancer Risk in Hispanic Populations” (abstract 49). The correct authors and affiliations appear below and have been updated on the online version of the abstracts.</p><p>Harriett Fuller<sup>1</sup>, Rebecca Rohde<sup>2</sup>, Heather Highland<sup>2</sup>, Jiayi Shen<sup>3</sup>, Bing Yu<sup>4</sup>, Eric Boerwinkle<sup>4</sup>, Megan Grove<sup>4</sup>, Kari E. North<sup>2</sup>, David V. Conti<sup>3</sup>, Christopher A. Haiman<sup>3</sup>, Kristin Young<sup>2</sup>, Burcu F. Darst<sup>1</sup></p><p><sup>1</sup>Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA</p><p><sup>2</sup>Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA</p><p><sup>3</sup>Department of Population and Public Health Sciences, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, California, USA</p><p><sup>4</sup>School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA</p><p>We apologize for this error.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"642"},"PeriodicalIF":2.1,"publicationDate":"2023-11-09","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22543","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135241247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Haplotype reconstruction for genetically complex regions with ambiguous genotype calls: Illustration by the KIR gene region","authors":"Lars L. J. van der Burg, Liesbeth C. de Wreede, Henning Baldauf, Jürgen Sauter, Johannes Schetelig, Hein Putter, Stefan Böhringer","doi":"10.1002/gepi.22538","DOIUrl":"10.1002/gepi.22538","url":null,"abstract":"<p>Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a gene region is the killer-cell immunoglobulin-like receptor (<i>KIR</i>) genes. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible as they cannot cope with the complexity of the data. We present an expectation–maximization (EM)-algorithm to estimate haplotype frequencies (HTFs) which deals with the missing data components, and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed in a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information of multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM-algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. 
Linear regression models based on this EM, show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of <i>KIR</i> genotypes, we compare HTFs to those obtained in an independent study. Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the <i>KIR</i> gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 1","pages":"3-26"},"PeriodicalIF":2.1,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22538","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41198906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers","authors":"Zhongyuan Chen, Han Liang, Peng Wei","doi":"10.1002/gepi.22537","DOIUrl":"10.1002/gepi.22537","url":null,"abstract":"<p>Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple, low statistical power tests such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests based on the score statistic for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, adaptive sum of powered score (aSPU) and data-adaptive pathway-based (aSPUpath) tests, increase the power in genome-wide association studies (GWASs) with a single disease trait in a case–control study. We extend aSPU and aSPUpath to multi-traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both SNP and gene levels. <i>p</i>-values from different parameters assuming varying genetic architecture are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, in comparison with some commonly used methods, our data-adaptive somatic mutations/germline variations tests can be applied to multiple germline SNPs/genes/pathways, and generally have much higher statistical powers while maintaining the appropriate type I error. 
The proposed tests are applied to a large-scale real-world International Cancer Genome Consortium whole genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations compared with the other existing methods on both gene and pathway levels. Our study has systematically identified the associations between various germline variations and somatic mutations across different cancer types, which potentially provides valuable utility for cancer risk prediction, prognosis, and therapeutics.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"617-636"},"PeriodicalIF":2.1,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41198905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets","authors":"Sarmistha Das, Deo Kumar Srivastava","doi":"10.1002/gepi.22536","DOIUrl":"10.1002/gepi.22536","url":null,"abstract":"<p>Identification of biomarkers by integrating multiple omics together is important because complex diseases occur due to an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals due to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, data diversity inherent with different omics features, high-dimensionality, and so forth. Most of the available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue but interacting multiomics biomarker identification methods are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression. Then, selects disease-associated omics using sparse principal components which explores the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information that subsequently reduces the multiple-testing burden. Further, inference in terms of <i>p</i> values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation demonstrates ioSearch as statistically powerful with a controlled Type-I error rate. 
Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"600-616"},"PeriodicalIF":2.1,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41108946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical methods to detect mother–father genetic interaction effects on risk of infertility: A genome-wide approach","authors":"Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Christian M. Page, Elizabeth C. Corfield, Yunsung Lee, Siri E. Håberg, Miriam Gjerdevik","doi":"10.1002/gepi.22534","DOIUrl":"10.1002/gepi.22534","url":null,"abstract":"<p>Infertility is a heterogeneous phenotype, and for many couples, the causes of fertility problems remain unknown. One understudied hypothesis is that allelic interactions between the genotypes of the two parents may influence the risk of infertility. Our aim was, therefore, to investigate how allelic interactions can be modeled using parental genotype data linked to 15,789 pregnancies selected from the Norwegian Mother, Father, and Child Cohort Study. The newborns in 1304 of these pregnancies were conceived using assisted reproductive technologies (ART), and the remainder were conceived naturally. Treating the use of ART as a proxy for infertility, different parameterizations were implemented in a genome-wide screen for interaction effects between maternal and paternal alleles at the same locus. Some of the models were more similar in the way they were parameterized, and some produced similar results when implemented on a genome-wide scale. The results showed near-significant interaction effects in genes relevant to the phenotype under study, such as Dynein axonemal heavy chain 17 (<i>DNAH17</i>) with a recognized role in male infertility. 
More generally, the interaction models presented here are readily adaptable to the study of other phenotypes in which maternal and paternal allelic interactions are likely to be involved.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"503-519"},"PeriodicalIF":2.1,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22534","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10084980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data","authors":"Siyi Chen, Zhaotong Lin, Xiaotong Shen, Ling Li, Wei Pan","doi":"10.1002/gepi.22535","DOIUrl":"10.1002/gepi.22535","url":null,"abstract":"<p>We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. 
For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"585-599"},"PeriodicalIF":2.1,"publicationDate":"2023-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22535","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10158155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}