{"title":"A Mixed-Effect Kernel Machine Regression Model for Integrative Analysis of Alpha Diversity in Microbiome Studies","authors":"Runzhe Li, Mo Li, Ni Zhao","doi":"10.1002/gepi.22596","DOIUrl":"10.1002/gepi.22596","url":null,"abstract":"<div>\u0000 \u0000 <p>Increasing evidence suggests that human microbiota plays a crucial role in many diseases. Alpha diversity, a commonly used summary statistic that captures the richness and/or evenness of the microbial community, has been associated with many clinical conditions. However, individual studies that assess the association between alpha diversity and clinical conditions often provide inconsistent results due to insufficient sample size, heterogeneous study populations and technical variability. In practice, meta-analysis tools have been applied to integrate data from multiple studies. However, these methods do not consider the heterogeneity caused by sequencing protocols, and the contribution of each study to the final model depends mainly on its sample size (or variance estimate). To combine studies with distinct sequencing protocols, a robust statistical framework for integrative analysis of microbiome datasets is needed. Here, we propose a mixed-effect kernel machine regression model to assess the association of alpha diversity with a phenotype of interest. Our approach readily incorporates the study-specific characteristics (including sequencing protocols) to allow for flexible modeling of microbiome effect via a kernel similarity matrix. Within the proposed framework, we provide three hypothesis testing approaches to answer different questions that are of interest to researchers. We evaluate the model performance through extensive simulations based on two distinct data generation mechanisms. We also apply our framework to data from HIV reanalysis consortium to investigate gut dysbiosis in HIV infection.</p>\u0000 </div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Gene Expression Predictions Using Deep Learning and Functional Annotations","authors":"Pratik Ramprasad, Jingchen Ren, Wei Pan","doi":"10.1002/gepi.22595","DOIUrl":"10.1002/gepi.22595","url":null,"abstract":"<p>Transcriptome-wide association studies (TWAS) aim to uncover genotype–phenotype relationships through a two-stage procedure: predicting gene expression from genotypes using an expression quantitative trait locus (eQTL) data set, then testing the predicted expression for trait associations. Accurate gene expression prediction in stage 1 is crucial, as it directly impacts the power to identify associations in stage 2. Currently, the first stage of such studies is primarily conducted using linear models like elastic net regression, which fail to capture the nonlinear relationships inherent in biological systems. Deep learning methods have the potential to model such nonlinear effects, but have yet to demonstrably outperform linear methods at this task. To address this gap, we propose a new deep learning architecture to predict gene expression from genotypic variation across individuals. Our method utilizes a learnable input scaling layer in conjunction with a convolutional encoder to capture nonlinear effects and higher-order interactions without compromising on interpretability. We further augment this approach to allow for parameter sharing across multiple networks, enabling us to utilize prior information for individual variants in the form of functional annotations. Evaluations on real-world genomic data show that our method consistently outperforms elastic net regression across a large set of heritable genes. Furthermore, our model statistically significantly improved predictive performance by leveraging functional annotations, whereas elastic net regression failed to show equivalent gains when using the same information, suggesting that our method can capture nonlinear functional information beyond the capability of linear models.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22595","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Powerful Rare-Variant Association Analysis of Secondary Phenotypes","authors":"Hanyun Liu, Hong Zhang","doi":"10.1002/gepi.22589","DOIUrl":"10.1002/gepi.22589","url":null,"abstract":"<div>\u0000 \u0000 <p>Most genome-wide association studies are based on case-control designs, which provide abundant resources for secondary phenotype analyses. However, such studies suffer from biased sampling of primary phenotypes, and the traditional statistical methods can lead to seriously distorted analysis results when they are applied to secondary phenotypes without accounting for the biased sampling mechanism. To our knowledge, there are no statistical methods specifically tailored for rare variant association analysis with secondary phenotypes. In this article, we proposed two novel joint test statistics for identifying secondary-phenotype-associated rare variants based on prospective likelihood and retrospective likelihood, respectively. We also exploit the assumption of gene-environment independence in retrospective likelihood to improve the statistical power and adopt a two-step strategy to balance statistical power and robustness. Simulations and a real-data application are conducted to demonstrate the superior performance of our proposed methods.</p>\u0000 </div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stephanie Calluori, Kaitlin Kirkpatrick Heimke, Charlisse Caga-anan, David Kaufman, Leah E. Mechanic, Kimberly A. McAllister
{"title":"Ethical, Legal, and Social Implications of Gene-Environment Interaction Research","authors":"Stephanie Calluori, Kaitlin Kirkpatrick Heimke, Charlisse Caga-anan, David Kaufman, Leah E. Mechanic, Kimberly A. McAllister","doi":"10.1002/gepi.22591","DOIUrl":"10.1002/gepi.22591","url":null,"abstract":"<p>Many complex disorders are impacted by the interplay of genetic and environmental factors. In gene-environment interactions (GxE), an individual's genetic and epigenetic makeup impacts the response to environmental exposures. Understanding GxE can impact health at the individual, community, and population levels. The rapid expansion of GxE research in biomedical studies for complex diseases raises many unique ethical, legal, and social implications (ELSIs) that have not been extensively explored and addressed. This review article builds on discussions originating from a workshop held by the National Institute of Environmental Health Sciences (NIEHS) and the National Human Genome Research Institute (NHGRI) in January 2022, entitled: “Ethical, Legal, and Social Implications of Gene-Environment Interaction Research.” We expand upon multiple key themes to inform broad recommendations and general guidance for addressing some of the most unique and challenging ELSI in GxE research. Key takeaways include strategies and approaches for establishing sustainable community partnerships, incorporating social determinants of health and environmental justice considerations into GxE research, effectively communicating and translating GxE findings, and addressing privacy and discrimination concerns in all GxE research going forward. Additional guidelines, resources, approaches, training, and capacity building are required to further support innovative GxE research and multidisciplinary GxE research teams.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22591","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142307513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marie-Sophie C. Ogloblinsky, Ozvan Bocher, Chaker Aloui, Anne-Louise Leutenegger, Ozan Ozisik, Anaïs Baudot, Elisabeth Tournier-Lasserve, Helen Castillo-Madeen, Daniel Lewinsohn, Donald F. Conrad, Emmanuelle Génin, Gaëlle Marenne
{"title":"PSAP-Genomic-Regions: A Method Leveraging Population Data to Prioritize Coding and Non-Coding Variants in Whole Genome Sequencing for Rare Disease Diagnosis","authors":"Marie-Sophie C. Ogloblinsky, Ozvan Bocher, Chaker Aloui, Anne-Louise Leutenegger, Ozan Ozisik, Anaïs Baudot, Elisabeth Tournier-Lasserve, Helen Castillo-Madeen, Daniel Lewinsohn, Donald F. Conrad, Emmanuelle Génin, Gaëlle Marenne","doi":"10.1002/gepi.22593","DOIUrl":"10.1002/gepi.22593","url":null,"abstract":"<div>\u0000 \u0000 <p>The introduction of Next-Generation Sequencing technologies in the clinics has improved rare disease diagnosis. Nonetheless, for very heterogeneous or very rare diseases, more than half of cases still lack molecular diagnosis. Novel strategies are needed to prioritize variants within a single individual. The Population Sampling Probability (PSAP) method was developed to meet this aim but only for coding variants in exome data. Here, we propose an extension of the PSAP method to the non-coding genome called PSAP-genomic-regions. In this extension, instead of considering genes as testing units (PSAP-genes strategy), we use genomic regions defined over the whole genome that pinpoint potential functional constraints. We conceived an evaluation protocol for our method using artificially generated disease exomes and genomes, by inserting coding and non-coding pathogenic ClinVar variants in large data sets of exomes and genomes from the general population. PSAP-genomic-regions significantly improves the ranking of these variants compared to using a pathogenicity score alone. Using PSAP-genomic-regions, more than 50% of non-coding ClinVar variants were among the top 10 variants of the genome. On real sequencing data from six patients with Cerebral Small Vessel Disease and nine patients with male infertility, all causal variants were ranked in the top 100 variants with PSAP-genomic-regions. By revisiting the testing units used in the PSAP method to include non-coding variants, we have developed PSAP-genomic-regions, an efficient whole-genome prioritization tool which offers promising results for the diagnosis of unresolved rare diseases.</p></div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elisabeth A. Rosenthal, Li Hsu, Minta Thomas, Ulrike Peters, Christopher Kachulis, Karynne Patterson, Gail P. Jarvik
{"title":"Comparing Ancestry Standardization Approaches for a Transancestry Colorectal Cancer Polygenic Risk Score","authors":"Elisabeth A. Rosenthal, Li Hsu, Minta Thomas, Ulrike Peters, Christopher Kachulis, Karynne Patterson, Gail P. Jarvik","doi":"10.1002/gepi.22590","DOIUrl":"10.1002/gepi.22590","url":null,"abstract":"<div>\u0000 \u0000 <p>Colorectal cancer (CRC) is a complex disease with monogenic, polygenic and environmental risk factors. Polygenic risk scores (PRSs) aim to identify high polygenic risk individuals. Due to differences in genetic background, PRS distributions vary by ancestry, necessitating standardization. We compared four <i>post-hoc</i> methods using the All of Us Research Program Whole Genome Sequence data for a transancestry CRC PRS. We contrasted results from linear models trained on A. the entire data or an ancestrally diverse subset AND B. covariates including principal components of ancestry or admixture. Standardization with the training subset also adjusted the variance. All methods performed similarly within ancestry, OR (95% C.I.) per s.d. change in PRS: African 1.5 (1.02, 2.08), Admixed American 2.2 (1.27, 3.85), European 1.6 (1.43, 1.89), and Middle Eastern 1.1 (0.71, 1.63). Using admixture and an ancestrally diverse training set provided distributions closest to standard Normal. Training a model on ancestrally diverse participants, adjusting both the mean and variance using admixture as covariates, created standard Normal <i>z</i>-scores, which can be used to identify patients at high polygenic risk. These scores can be incorporated into comprehensive risk calculation including other known risk factors, allowing for more precise risk estimates.</p>\u0000 </div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142307512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The 2024 Annual Meeting of the International Genetic Epidemiology Society","authors":"","doi":"10.1002/gepi.22598","DOIUrl":"10.1002/gepi.22598","url":null,"abstract":"","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"48 7","pages":"344-398"},"PeriodicalIF":1.7,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142307514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juyeon Kim, Young Sik Park, Jin Hee Kim, Yun-Chul Hong, Young-Chul Kim, In-Jae Oh, Sun Ha Jee, Myung-Ju Ahn, Jong-Won Kim, Jae-Joon Yim, Sungho Won
{"title":"Predicting Lung Cancer in Korean Never-Smokers With Polygenic Risk Scores","authors":"Juyeon Kim, Young Sik Park, Jin Hee Kim, Yun-Chul Hong, Young-Chul Kim, In-Jae Oh, Sun Ha Jee, Myung-Ju Ahn, Jong-Won Kim, Jae-Joon Yim, Sungho Won","doi":"10.1002/gepi.22586","DOIUrl":"10.1002/gepi.22586","url":null,"abstract":"<div>\u0000 \u0000 <p>In the last few decades, genome-wide association studies (GWAS) with more than 10,000 subjects have identified several loci associated with lung cancer and these loci have been used to develop novel risk prediction tools for cancer. The present study aimed to establish a lung cancer prediction model for Korean never-smokers using polygenic risk scores (PRSs); PRSs were calculated using a pruning-thresholding-based approach based on 11 genome-wide significant single nucleotide polymorphisms (SNPs). Overall, the odds ratios tended to increase as PRSs were larger, with the odds ratio of the top 5% PRSs being 1.71 (95% confidence interval: 1.31–2.23) using the 40%–60% percentile group as the reference, and the area under the curve (AUC) of the prediction model being of 0.76 (95% confidence interval: 0.747–0.774). The receiver operating characteristic (ROC) curves of the prediction model with and without PRSs as covariates were compared using DeLong's test, and a significant difference was observed. Our results suggest that PRSs can be valuable tools for predicting the risk of lung cancer.</p>\u0000 </div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142284342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annika Jaitner, Krasimira Tsaneva-Atanasova, Rachel M. Freathy, Jack Bowden
{"title":"Exploring and Accounting for Genetically Driven Effect Heterogeneity in Mendelian Randomization","authors":"Annika Jaitner, Krasimira Tsaneva-Atanasova, Rachel M. Freathy, Jack Bowden","doi":"10.1002/gepi.22587","DOIUrl":"10.1002/gepi.22587","url":null,"abstract":"<p>Mendelian randomization (MR) is a framework to estimate the causal effect of a modifiable health exposure, drug target or pharmaceutical intervention on a downstream outcome by using genetic variants as instrumental variables. A crucial assumption allowing estimation of the average causal effect in MR, termed <i>homogeneity</i>, is that the causal effect does not vary across levels of any instrument used in the analysis. In contrast, the science of pharmacogenetics seeks to actively uncover and exploit genetically driven effect heterogeneity for the purposes of precision medicine. In this study, we consider a recently proposed method for performing pharmacogenetic analysis on observational data—the Triangulation WIthin a STudy (TWIST) framework—and explore how it can be combined with traditional MR approaches to properly characterise average causal effects and genetically driven effect heterogeneity. We propose two new methods which not only estimate the genetically driven effect heterogeneity but also enable the estimation of a causal effect in the genetic group with and without the risk allele separately. Both methods utilise homogeneity-respecting and homogeneity-violating genetic variants and rely on a different set of assumptions. Using data from the ALSPAC study, we apply our new methods to estimate the causal effect of smoking before and during pregnancy on offspring birth weight in mothers whose genetics mean they find it (relatively) easier or harder to quit smoking.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22587","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142284341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoran Liang, Ninon Mounier, Nicolas Apfel, Sara Khalid, Timothy M. Frayling, Jack Bowden
{"title":"Using clustering of genetic variants in Mendelian randomization to interrogate the causal pathways underlying multimorbidity from a common risk factor","authors":"Xiaoran Liang, Ninon Mounier, Nicolas Apfel, Sara Khalid, Timothy M. Frayling, Jack Bowden","doi":"10.1002/gepi.22582","DOIUrl":"10.1002/gepi.22582","url":null,"abstract":"<p>Mendelian randomization (MR) is an epidemiological approach that utilizes genetic variants as instrumental variables to estimate the causal effect of an exposure on a health outcome. This paper investigates an MR scenario in which genetic variants aggregate into clusters that identify heterogeneous causal effects. Such variant clusters are likely to emerge if they affect the exposure and outcome via distinct biological pathways. In the multi-outcome MR framework, where a shared exposure causally impacts several disease outcomes simultaneously, these variant clusters can provide insights into the common disease-causing mechanisms underpinning the co-occurrence of multiple long-term conditions, a phenomenon known as multimorbidity. To identify such variant clusters, we adapt the general method of agglomerative hierarchical clustering to multi-sample summary-data MR setup, enabling cluster detection based on variant-specific ratio estimates. Particularly, we tailor the method for multi-outcome MR to aid in elucidating the causal pathways through which a common risk factor contributes to multiple morbidities. We show in simulations that our “MR-AHC” method detects clusters with high accuracy, outperforming the existing methods. We apply the method to investigate the causal effects of high body fat percentage on type 2 diabetes and osteoarthritis, uncovering interconnected cellular processes underlying this multimorbid disease pair.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22582","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141975524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}