Christina Nieuwoudt, Fabiha Binte Farooq, Angela Brooks-Wilson, Alexandre Bureau, Jinko Graham
{"title":"Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes","authors":"Christina Nieuwoudt, Fabiha Binte Farooq, Angela Brooks-Wilson, Alexandre Bureau, Jinko Graham","doi":"10.1002/gepi.22579","DOIUrl":"10.1002/gepi.22579","url":null,"abstract":"<p>Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent–child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. 
Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"324-343"},"PeriodicalIF":1.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22579","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141467490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield
{"title":"Proteome-wide association study using cis and trans variants and applied to blood cell and lipid-related traits in the Women's Health Initiative study","authors":"Brian D. Chen, Chanhwa Lee, Amanda L. Tapia, Alexander P. Reiner, Hua Tang, Charles Kooperberg, JoAnn E. Manson, Yun Li, Laura M. Raffield","doi":"10.1002/gepi.22578","DOIUrl":"10.1002/gepi.22578","url":null,"abstract":"<p>In most Proteome-Wide Association Studies (PWAS), variants near the protein-coding gene (±1 Mb), also known as <i>cis</i> single nucleotide polymorphisms (SNPs), are used to predict protein levels, which are then tested for association with phenotypes. However, proteins can be regulated through variants outside of the cis region. An intermediate GWAS step to identify protein quantitative trait loci (pQTL) allows for the inclusion of trans SNPs outside the cis region in protein-level prediction models. Here, we assess the prediction of 540 proteins in 1002 individuals from the Women's Health Initiative (WHI), split equally into a GWAS set, an elastic net training set, and a testing set. We compared the testing <i>r</i><sup>2</sup> between measured and predicted protein levels under this proposed approach with the testing <i>r</i><sup>2</sup> obtained using only cis SNPs. The two methods usually resulted in similar testing <i>r</i><sup>2</sup>, but some proteins showed a significant increase in testing <i>r</i><sup>2</sup> with our method. For example, for cartilage acidic protein 1, the testing <i>r</i><sup>2</sup> increased from 0.101 to 0.351. 
We also demonstrate reproducible findings for predicted protein association with lipid and blood cell traits in WHI participants without proteomics data and in UK Biobank utilizing our PWAS weights.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"310-323"},"PeriodicalIF":1.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141467489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti
{"title":"Hierarchical joint analysis of marginal summary statistics—Part II: High-dimensional instrumental analysis of omics data","authors":"Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti","doi":"10.1002/gepi.22577","DOIUrl":"10.1002/gepi.22577","url":null,"abstract":"<p>Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships using observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome-wide association studies. However, most multivariate IV approaches cannot scale to high-throughput experimental data. Here, we extend our previous work, a hierarchical model that jointly analyzes marginal summary statistics (hJAM), to a scalable framework (SHA-JAM) that can be applied to a large number of intermediates and a large number of correlated genetic variants—situations often encountered in modern experiments leveraging omic technologies. SHA-JAM aims to estimate the conditional effect for high-dimensional risk factors on an outcome by incorporating estimates from association analyses of single-nucleotide polymorphism (SNP)-intermediate or SNP-gene expression as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that SHA-JAM yields a higher area under the receiver operating characteristic curve (AUC), a lower mean-squared error of the estimates, and a much faster computation speed, compared to an existing approach for similar analyses. 
In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively, using summary statistics from a GWAS for prostate cancer with more than 140,000 men and high dimensional publicly available summary data for metabolites and transcriptomes.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"291-309"},"PeriodicalIF":1.7,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22577","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141418544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpreting disease genome-wide association studies and polygenetic risk scores given eligibility and study design considerations","authors":"Catherine Mary Schooling, Mary Beth Terry","doi":"10.1002/gepi.22567","DOIUrl":"10.1002/gepi.22567","url":null,"abstract":"<p>Genome-wide association studies (GWAS) have been helpful in identifying genetic variants predicting cancer risk and providing new insights into cancer biology. Increasing use of genetically informed care, as well as genetically informed prevention and treatment strategies, has also drawn attention to some of the inherent limitations of cancer genetic data. Specifically, genetic endowment is lifelong. However, those recruited into cancer studies tend to be middle-aged or older people, meaning the exposure most likely starts before recruitment, as opposed to exposure and recruitment aligning, as in a trial or a target trial. Studies in survivors can be biased as a result of depletion of the susceptibles, here specifically due to genetic vulnerability and the cancer of interest or a competing risk. In addition, including prevalent cases in a case-control study will make the genetics of survival with cancer look harmful (Neyman bias). 
Here, we describe ways of designing GWAS to maximize explanatory power and predictive utility by reducing selection bias due to recruiting only survivors and Neyman bias due to including prevalent cases, alongside other techniques, such as selection diagrams, age-stratification, and Mendelian randomization, to facilitate GWAS interpretability and utility.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"468-472"},"PeriodicalIF":1.7,"publicationDate":"2024-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141155102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma","authors":"Diptavo Dutta, Ananda Sen, Jaya M. Satagopan","doi":"10.1002/gepi.22566","DOIUrl":"10.1002/gepi.22566","url":null,"abstract":"<p>Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating gene expression that drives critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites, which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing “gene component scores” and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of <i>ASAH1</i> gene trans-regulated by methylation of several genes including <i>SIX5</i> and by CNAs in the 10q25 region including <i>TCF7L2</i>. 
Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data, especially in cancer genomics.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"414-432"},"PeriodicalIF":1.7,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22566","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140943393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg
{"title":"Identifying somatic fingerprints of cancers defined by germline and environmental risk factors","authors":"Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg","doi":"10.1002/gepi.22565","DOIUrl":"10.1002/gepi.22565","url":null,"abstract":"<p>Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is hugely challenged by the large numbers of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has used this technique previously through a multilevel regression model to diagnose tumor site of origin with high accuracy. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. 
We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline <i>BRCA1/2</i> mutations and in head and neck cancer patients exposed to human papillomavirus.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"455-467"},"PeriodicalIF":1.7,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thanthirige L. M. Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas
{"title":"Meta-analysis of breast cancer risk for individuals with PALB2 pathogenic variants","authors":"Thanthirige L. M. Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas","doi":"10.1002/gepi.22561","DOIUrl":"10.1002/gepi.22561","url":null,"abstract":"<p>Multigene panel testing now allows efficient testing of many cancer susceptibility genes, leading to a larger number of mutation carriers being identified. They need to be counseled about their cancer risk conferred by the specific gene mutation. An important cancer susceptibility gene is PALB2. Multiple studies reported risk estimates for breast cancer (BC) conferred by pathogenic variants in PALB2. Due to the diverse modalities of reported risk estimates (age-specific risk, odds ratio, relative risk, and standardized incidence ratio) and effect sizes, a meta-analysis combining these estimates is necessary to accurately counsel patients with this mutation. However, this is not trivial due to heterogeneity of studies in terms of study design and risk measure. We utilized a recently proposed Bayesian random-effects meta-analysis method that can synthesize estimates from such heterogeneous studies. We applied this method to combine estimates from 12 studies on BC risk for carriers of pathogenic PALB2 mutations. The estimated overall (meta-analysis-based) risk of BC is 12.80% (6.11%−22.59%) by age 50 and 48.47% (36.05%−61.74%) by age 80. Pathogenic mutations in PALB2 make women more susceptible to BC. 
Our risk estimates can help clinically manage patients carrying pathogenic variants in PALB2.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 8","pages":"448-454"},"PeriodicalIF":1.7,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140666320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin
{"title":"A novel application of data-consistent inversion to overcome spurious inference in genome-wide association studies","authors":"Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin","doi":"10.1002/gepi.22563","DOIUrl":"10.1002/gepi.22563","url":null,"abstract":"<p>Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, the use of regression with the additive assumption has potential limitations. First, the normality assumption on residuals is rarely met in practice, and deviation from normality increases the Type-I error rate. Second, building a model based on such an assumption ignores genetic structures such as dominant, recessive, and protective-risk cases. Ignoring genetic variants may result in spurious conclusions about the associations between a variant and a trait. We propose an assumption-free model built upon data-consistent inversion (DCI), which is a recently developed measure-theoretic framework utilized for uncertainty quantification. This proposed DCI-derived model builds a nonparametric distribution on model inputs that propagates to the distribution of observed data without the required normality assumption of residuals in the regression model. This characteristic enables the proposed DCI-derived model to cover all genetic variants without emphasizing the additivity of the classic-GWAS model. 
Simulations and a replication GWAS with data from the COPDGene demonstrate the ability of this model to control the Type-I error rate at least as well as the classic-GWAS (additive linear model) approach while having similar or greater power to discover variants in different genetic modes of transmission.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 6","pages":"270-288"},"PeriodicalIF":1.7,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140636418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Azeez Alade, Tabitha Peter, Tamara Busch, Waheed Awotoye, Deepti Anand, Oladayo Abimbola, Emmanuel Aladenika, Mojisola Olujitan, Oscar Rysavy, Phuong Fawng Nguyen, Thirona Naicker, Peter A. Mossey, Lord J. J. Gowans, Mekonen A. Eshete, Wasiu L. Adeyemo, Erliang Zeng, Eric Van Otterloo, Michael O'Rorke, Adebowale Adeyemo, Jeffrey C. Murray, Salil A. Lachke, Paul A. Romitti, Azeez Butali
{"title":"Shared genetic risk between major orofacial cleft phenotypes in an African population","authors":"Azeez Alade, Tabitha Peter, Tamara Busch, Waheed Awotoye, Deepti Anand, Oladayo Abimbola, Emmanuel Aladenika, Mojisola Olujitan, Oscar Rysavy, Phuong Fawng Nguyen, Thirona Naicker, Peter A. Mossey, Lord J. J. Gowans, Mekonen A. Eshete, Wasiu L. Adeyemo, Erliang Zeng, Eric Van Otterloo, Michael O'Rorke, Adebowale Adeyemo, Jeffrey C. Murray, Salil A. Lachke, Paul A. Romitti, Azeez Butali","doi":"10.1002/gepi.22564","DOIUrl":"10.1002/gepi.22564","url":null,"abstract":"<p>Nonsyndromic orofacial clefts (NSOFCs) represent a large proportion (70%–80%) of all OFCs. They can be broadly categorized into nonsyndromic cleft lip with or without cleft palate (NSCL/P) and nonsyndromic cleft palate only (NSCPO). Although NSCL/P and NSCPO are considered etiologically distinct, recent evidence suggests the presence of shared genetic risks. Thus, we investigated the genetic overlap between NSCL/P and NSCPO using African genome-wide association study (GWAS) data on NSOFCs. These data consist of 814 NSCL/P cases, 205 NSCPO cases, and 2159 unrelated controls. We generated common single-nucleotide variants (SNVs) association summary statistics separately for each phenotype (NSCL/P and NSCPO) under an additive genetic model. Subsequently, we employed the pleiotropic analysis under the composite null (PLACO) method to test for genetic overlap. Our analysis identified two loci with genome-wide significance (rs181737795 [<i>p</i> = 2.58E−08] and rs2221169 [<i>p</i> = 4.5E−08]) and one locus with marginal significance (rs187523265 [<i>p</i> = 5.22E−08]). Using mouse transcriptomics data and information from genetic phenotype databases, we identified <i>MDN1, MAP3K7, KMT2A, ARCN1</i>, and <i>VADC2</i> as top candidate genes for the associated SNVs. 
These findings enhance our understanding of genetic variants associated with NSOFCs and identify potential candidate genes for further exploration.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 6","pages":"258-269"},"PeriodicalIF":1.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22564","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meiling Liu, Yu-Ru Su, Yang Liu, Li Hsu, Qianchuan He
{"title":"Structured testing of genetic association with mixed clinical outcomes","authors":"Meiling Liu, Yu-Ru Su, Yang Liu, Li Hsu, Qianchuan He","doi":"10.1002/gepi.22560","DOIUrl":"10.1002/gepi.22560","url":null,"abstract":"<p>Genetic factors play a fundamental role in disease development. Studying the genetic association with clinical outcomes is critical for understanding disease biology and devising novel treatment targets. However, the frequencies of genetic variations are often low, making it difficult to examine the variants one-by-one. Moreover, the clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrences and lymph node count, and how to effectively analyze genetic association with these outcomes remains unclear. In this article, we proposed a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured testing incorporates known biological information of variants while allowing for their heterogeneous effects and is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic has correct type I error and is highly effective in detecting significant genetic variants. 
We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 5","pages":"226-237"},"PeriodicalIF":1.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140596715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}