{"title":"Sensitivity analyses gain relevance by fixing parameters observable during the empirical analyses","authors":"Gibran Hemani, Apostolos Gkatzionis, Kate Tilling, George Davey Smith","doi":"10.1002/gepi.22530","DOIUrl":"10.1002/gepi.22530","url":null,"abstract":"<p>In 2017 we presented the MR Steiger method, a sensitivity analysis in Mendelian randomization (MR) for inferring causal directions between variables (Hemani et al., <span>2017</span>). We discussed many of its potential limitations including that unmeasured confounding under certain extreme circumstances could lead to the wrong inferred causal direction. Lutz et al. (<span>2022</span>) propose an R package (UCRMS) for performing sensitivity analysis of the MR Steiger method, and use it in an illustration to suggest that the MR Steiger method has a ~90% chance of giving the wrong answer due to unmeasured confounding. In this note we will show that an error in their approach to sensitivity analysis leads to the wrong conclusion about the validity of the MR Steiger test. We provide a valid alternative which uses the observed data to investigate sensitivity to unmeasured confounding.</p><p>A sensitivity analysis aims to understand the degree to which a result can change due to uncertainties in the inputs (Saltelli, <span>2002</span>). In this case for the MR Steiger test, we need to ask how sensitive is the inference of the causal direction between X and Y to possible values of unmeasured confounders influencing X and Y. Importantly, there is relative certainty in many of the parameters of this system because they are easily observed, for example, the variances of X, Y and the instrumental variables (IVs), the estimated effect of the IVs on X and Y, and therefore the IV estimate of the effect of X on Y. 
Often the ordinary least squares (OLS) association between X and Y is also available either due to the analysis being performed using individual level data, or by sourcing the estimate from other published results. Therefore, an appropriate sensitivity analysis must explore the extent to which the inferred causal direction between X and Y can change due to unmeasured confounding, without causing these observed parameters to change.</p><p>Lutz et al.'s proposed method does not attempt to fix all observable parameters. In the simple example provided by Lutz et al. the variance of Y varies between 28 and 39, and the OLS estimate varies between 1 and −1 across the parameter values used for the sensitivity analysis. This arises because the residual variance—which is unobserved—is fixed in their approach. Instead the phenotypic variance—which is observed—should be fixed. If they were presenting a simulation of the general performance of MR Steiger under unmeasured confounding then it would not matter that the simulated parameters are not tied to those observed in a particular empirical analysis. However in a sensitivity analysis, allowing observed parameters to vary provides no value to the analyst. To say that unmeasured confounding could reverse the causal direction, provided that the variance of Y also changes drastically, is of little use to the researcher who has a data set with an observed variance of Y. If some quantities are observed (i.e. 
the re","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"461-462"},"PeriodicalIF":2.1,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22530","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10001501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of regmed and BayesNetty for exploring causal models with many variables","authors":"Richard Howey, Heather J. Cordell","doi":"10.1002/gepi.22532","DOIUrl":"10.1002/gepi.22532","url":null,"abstract":"<p>Here we compare a recently proposed method and software package, <span>regmed</span>, with our own previously developed package, BayesNetty, designed to allow exploratory analysis of complex causal relationships between biological variables. We find that <span>regmed</span> generally has poorer recall but much better precision than BayesNetty. This is perhaps not too surprising as <span>regmed</span> is specifically designed for use with high-dimensional data. BayesNetty is found to be more sensitive to the resulting multiple testing problem encountered in these circumstances. However, as <span>regmed</span> is not designed to handle missing data, its performance is severely affected when missing data is present, whereas the performance of BayesNetty is only slightly affected. The performance of <span>regmed</span> can be rescued in this situation by first using BayesNetty to impute the missing data, and then applying <span>regmed</span> to the resulting “filled-in” data set.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"496-502"},"PeriodicalIF":2.1,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22532","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9689871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A gene-based association test of interactions for maternal–fetal genotypes identifies genes associated with nonsyndromic congenital heart defects","authors":"Manyan Huang, Chen Lyu, Nianjun Liu, Wendy N. Nembhard, John S. Witte, Charlotte A. Hobbs, Ming Li, the National Birth Defects Prevention Study","doi":"10.1002/gepi.22533","DOIUrl":"10.1002/gepi.22533","url":null,"abstract":"<p>The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal–fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother–infant pairs and 1306 control mother–infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, <i>TMEM107</i> (<i>p</i> = 1.64e−06) and <i>CTC1</i> (<i>p</i> = 2.0e−06), were identified for significant association with CHD in common variants analysis. Gene <i>TMEM107</i> regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. 
Gene <i>CTC1</i> plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of <i>TMEM107</i> and <i>CTC1</i> with CHDs.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"475-495"},"PeriodicalIF":2.1,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22533","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9669966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phenotypic variance partitioning by transcriptomic gene expression levels and environmental variables for anthropometric traits using GTEx data","authors":"Pastor Jullian Fabres, S. Hong Lee","doi":"10.1002/gepi.22531","DOIUrl":"10.1002/gepi.22531","url":null,"abstract":"<p>Phenotypic variation in humans is the result of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only a part of the whole biological process that shapes the phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits, using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues that are deemed relevant for the anthropometric traits (two adipose tissues, skeletal muscle tissue and blood tissue). Additionally, we estimate the transcriptome–environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and alcohol drinking status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome–environment correlation = −0.54, SE = 0.14), suggesting an antagonistic relationship. 
This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues, e.g., the gene expression levels of whole blood tissue and environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05 and 0.04, SE = 0.004 respectively). We observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23) for this tissue. In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (<i>n</i> = 838 from GTEx data), which can provide insights into how the transcriptomic and environmental effects contribute to the phenotypes of the anthropometric traits.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 7","pages":"465-474"},"PeriodicalIF":2.1,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22531","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9687294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes","authors":"Ozvan Bocher, Gaëlle Marenne, Emmanuelle Génin, Hervé Perdry","doi":"10.1002/gepi.22529","DOIUrl":"10.1002/gepi.22529","url":null,"abstract":"<p>Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"450-460"},"PeriodicalIF":2.1,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22529","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10385156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Brief History behind the journal Genetic Epidemiology and the International Genetic Epidemiology Society","authors":"Dabeeru C. Rao","doi":"10.1002/gepi.22528","DOIUrl":"10.1002/gepi.22528","url":null,"abstract":"<p>This commentary briefly describes the process and steps that underlie the launching of the journal Genetic Epidemiology in 1984 and the International Genetic Epidemiology Society (IGES, to be pronounced as “I guess”) in 1992.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 5","pages":"361-364"},"PeriodicalIF":2.1,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22528","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9673429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene-level association analysis of bivariate ordinal traits with functional regressions","authors":"Shuqi Wang, Chi-Yang Chiu, Alexander F. Wilson, Joan E. Bailey-Wilson, Elvira Agron, Emily Y. Chew, Jaeil Ahn, Momiao Xiong, Ruzong Fan","doi":"10.1002/gepi.22524","DOIUrl":"10.1002/gepi.22524","url":null,"abstract":"<p>In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. 
The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"409-431"},"PeriodicalIF":2.1,"publicationDate":"2023-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10065139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The sequence kernel association test for multicategorical outcomes","authors":"Zhiwen Jiang, Haoyu Zhang, Thomas U. Ahearn, Montserrat Garcia-Closas, Nilanjan Chatterjee, Hongtu Zhu, Xiang Zhan, Ni Zhao","doi":"10.1002/gepi.22527","DOIUrl":"10.1002/gepi.22527","url":null,"abstract":"<p>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set-based association analysis method, sequence kernel association test (SKAT)-MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS), and identified that the gene <i>FGFR2</i> was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data (<i>N</i> = 127,127) with SKAT-MC, and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. 
A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 6","pages":"432-449"},"PeriodicalIF":2.1,"publicationDate":"2023-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22527","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9985331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and accurate recurrent event analysis for genome-wide association studies","authors":"Jasper P. Hof, Sita H. Vermeulen, Anthony C. C. Coolen, Tessel E. Galesloot","doi":"10.1002/gepi.22525","DOIUrl":"10.1002/gepi.22525","url":null,"abstract":"<p>Many diseases recur after recovery, for example, recurrences in cancer and infections. However, research is often focused on analysing only time-to-first recurrence, thereby ignoring any subsequent recurrences that may occur after the first. Statistical models for the analysis of recurrent events are available, of which the extended Cox proportional hazards frailty model is the current state-of-the-art. However, this model is too statistically complex for computationally efficient application in high-dimensional data sets, including genome-wide association studies (GWAS). Here, we develop an application for fast and accurate recurrent event analysis in GWAS, called SPARE (SaddlePoint Approximation for Recurrent Event analysis). In SPARE, every DNA variant is tested for association with recurrence risk using a modified score statistic. A saddlepoint approximation is implemented to achieve statistical accuracy. SPARE controls the Type I error, and its statistical power is similar to existing recurrent event models, yet SPARE is significantly faster. 
An application of SPARE in a recurrent event GWAS on bladder cancer for 6.2 million DNA variants in 1,443 individuals required less than 15 min, whereas existing recurrent event methods would require several weeks.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 5","pages":"365-378"},"PeriodicalIF":2.1,"publicationDate":"2023-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22525","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9666885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoPE: A robust profile likelihood method for differential gene expression analysis","authors":"Lehang Zhong, Lisa J. Strug","doi":"10.1002/gepi.22526","DOIUrl":"10.1002/gepi.22526","url":null,"abstract":"<p>Variation in RNA-Seq data creates modeling challenges for differential gene expression (DE) analysis. Statistical approaches address conventional small sample sizes and implement empirical Bayes or non-parametric tests, but frequently produce different conclusions. Increasing sample sizes enable proposal of alternative DE paradigms. Here we develop RoPE, which uses a data-driven adjustment for variation and a robust profile likelihood ratio DE test. Simulation studies show RoPE can have improved performance over existing tools as sample size increases and has the most reliable control of error rates. Application of RoPE demonstrates that an active <i>Pseudomonas aeruginosa</i> infection downregulates the <i>SLC9A3</i> Cystic Fibrosis modifier gene.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 5","pages":"379-393"},"PeriodicalIF":2.1,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22526","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9657680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}