{"title":"Identity-By-Descent Mapping Using Multi-Individual IBD With Genome-Wide Multiple Testing Adjustment","authors":"Ruoyi Cai, Sharon R. Browning","doi":"10.1002/gepi.70015","DOIUrl":"https://doi.org/10.1002/gepi.70015","url":null,"abstract":"<div><p>We present an identity-by-descent mapping approach to test the association between genome-wide loci and complex traits. Our method evaluates whether levels of genetic similarities at specific genomic locations, captured by local relatedness matrices derived from multi-individual IBD sharing, are associated with phenotypic variation in complex traits. In addition, we propose an approach to adjust for multiple testing in genome-wide IBD mapping scans based on the correlation structure between test statistics across the genome. Through simulation studies, we demonstrate that our test has a well-controlled genome-wide type I error rate and superior power to detect rare and untyped variants compared to standard single-variant tests. We applied our method to systolic blood pressure data from White British individuals in the UK Biobank.</p></div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 6","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144714682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Similarities and Differences Between Methods That Exploit Patterns of Local Genetic Correlation to Identify Shared Causal Loci Through Application to Genome-Wide Association Studies of Multiple Long Term Conditions","authors":"Rebecca Darlay, Rupal L. Shah, Richard M. Dodds, Anand T. N. Nair, Ewan R. Pearson, Miles D. Witham, Heather J. Cordell, ADMISSION Research Collaborative","doi":"10.1002/gepi.70012","DOIUrl":"https://doi.org/10.1002/gepi.70012","url":null,"abstract":"<p>Genetic correlation analysis can provide useful insight into the shared genetic basis between traits or conditions of interest. However, most genome-wide analyses only inform about the degree of global (overall) genetic similarity and do not identify the specific genomic regions that give rise to this similarity. Identification of the key genomic regions contributing to shared genetic correlation between traits could allow the genes in these regions to be prioritised for investigation of potential shared biological mechanisms. In recent years, several statistical tools (e.g. LAVA, ρ-HESS, SUPERGNOVA and LOGODetect) have been developed to investigate local (in contrast to global) genetic correlation. These tools partition the genome into multiple segments and provide estimates of the genetic correlation captured by each individual segment. We applied these tools to publicly available European ancestry genome-wide association study (GWAS) summary statistics for three pairs of commonly occurring conditions: hypertension with atrial fibrillation and flutter, hypertension with chronic kidney disease, and hypertension with type 2 diabetes. Despite each of the methods aiming to address the same question, the results were found to be inconsistent across tools, with some identified regions overlapping and others implicated only by a single tool. 
Computer simulations using genetic data from UK Biobank, carried out under known generating conditions, suggest that LAVA and, to a lesser extent, ρ-HESS, provide the most reliable identification of genuine shared genetic factors. A newly-developed tool, HDL-L, also performed highly competitively. Here we highlight the similarities and differences between the results obtained from these methods and discuss some potential reasons underlying these differences.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.70012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144323481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Association Test Leveraging Unknown Genetic Interactions: Application to Cystic Fibrosis Lung Disease","authors":"Sangook Kim, Yu-Chung Lin, Lisa J. Strug","doi":"10.1002/gepi.70013","DOIUrl":"https://doi.org/10.1002/gepi.70013","url":null,"abstract":"<p>For complex traits such as lung disease in Cystic Fibrosis (CF), Gene x Gene or Gene x Environment interactions can impact disease severity but these remain largely unknown. Unaccounted-for genetic interactions introduce a distributional shift in the quantitative trait across the genotypic groups. Joint location and scale tests, or full distributional differences across genotype groups can account for unknown genetic interactions and increase power for gene identification compared with the conventional association test. Here we propose a new joint location and scale test (JLS), a quantile regression-based JLS (qJLS), that addresses previous limitations. Specifically, qJLS is free of distributional assumptions, thus applies to non-Gaussian traits; is as powerful as the existing JLS tests under Gaussian traits; and is computationally efficient for genome-wide association studies (GWAS). Our simulation studies, which model unknown genetic interactions, demonstrate that qJLS is robust to skewed and heavy-tailed error distributions and is as powerful as other JLS tests in the literature under normality. Without any unknown genetic interaction, qJLS shows a large increase in power with non-Gaussian traits over conventional association tests and is slightly less powerful under normality. We applied the qJLS method to the Canadian CF Gene Modifier Study (n = 1,997) and identified a genome-wide significant variant, rs9513900 on chromosome 13, that had not previously been reported to contribute to CF lung disease. 
qJLS provides a powerful alternative to conventional genetic association tests, where interactions may contribute to a quantitative trait.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.70013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144300332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering Ethnicity-Specific Recessive Loci for Alzheimer's Disease in 89 Dominican Families Using Family-Based WGS Analysis","authors":"Sanghun Lee, Julian Hecker, Badri N. Vardarajan, Rachel S. Kelly, Nicole Prince, Kristina Mullin, Sharon M. Lutz, Georg Hahn, Jessica Lasky-Su, Richard P. Mayeux, Rudolph E. Tanzi, Christoph Lange, Dmitry Prokopenko","doi":"10.1002/gepi.70014","DOIUrl":"https://doi.org/10.1002/gepi.70014","url":null,"abstract":"<div><p>In a sample of 89 Dominican families from the National Institute on Aging's Alzheimer's Disease Sequencing Project (ADSP), where at least one family member had a confirmed Alzheimer's disease (AD) diagnosis, we conducted an exploratory recessive whole-genome sequencing (WGS) analysis using family-based association testing (FBAT-GEE). This method tests jointly for affection status and age-at-onset under a recessive inheritance mode. Our analysis identified a genome-wide significant association for rs847697 in the <i>PDK2</i> gene on chromosome 17, near the <i>MAPT</i> gene previously implicated in AD through linkage studies. Additionally, we detected four suggestive loci (<i>p</i>-value < 1 × 10<sup>−6</sup>). Given the unexpected strength of these associations in a modest sample size, we rigorously reviewed data quality, ruling out technical artifacts. The <i>PDK2</i> association was driven by a small subset of families, aligning with recessive inheritance expectations. However, it could not be replicated in other AD datasets including Estudio Familiar de Influencia Genetica en Alzheimer (EFIGA), the National Institute of Mental Health (NIMH), and European Americans from NIA ADSP, suggesting a possible population-specific or ancestry-related effect. 
This study highlights the effectiveness of the FBAT approach in detecting unique genetic associations in smaller, isolated populations, findings that might be diluted in larger biobank studies where these populations are underrepresented.</p></div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144244417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome-Wide Association Analyses in Family Triads and Dyads Following Assisted Reproductive Technology","authors":"Siri N. Skodvin, Håkon K. Gjessing, Astanand Jugessur, Julia Romanowska, Alexandra Havdahl, Siri E. Håberg, Hans Ivar Hanevik, Robert Lyle, Rolv Terje Lie, Miriam Gjerdevik","doi":"10.1002/gepi.70011","DOIUrl":"https://doi.org/10.1002/gepi.70011","url":null,"abstract":"<p>Genetic selection occurs at different stages before a successful birth. The genetic makeup of a couple may influence the likelihood of needing assisted reproductive technology (ART) to achieve conception. However, frequent early fetal losses may also be perceived as reduced couple fertility and may thus be a contributing factor to the need for ART treatment. As ART procedures may enhance early fetal survival, genes that impact fetal viability may have a different allele distribution in ART offspring than expected under Mendelian transmission, as well as compared with the general population. With genetic data available from the Norwegian Mother, Father, and Child Cohort Study, we defined fetal survival as the study outcome and analyzed 1336 case-parent triads and dyads where the offspring were conceived by ART. Using log-linear models implemented in the R package Haplin, we conducted genome-wide scans to estimate fetal, maternal, and parent-of-origin effects and provided a detailed discussion on how these effects are estimated and interpreted. We detected fetal effects for single-nucleotide polymorphisms (SNPs) located in <i>CXXC4-AS1</i>, <i>OPCML</i>, and <i>DYNLRB2-AS1</i>. 
Since these effects were not observed in a limited follow-up analysis of non-ART triads, the identified effects are unlikely caused by genetic selection before fertilization.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 5","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.70011","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144197458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome-Wide Association Studies of Down Syndrome Associated Congenital Heart Defects Suggests a Genetically Heterogeneous Risk for CHD in DS","authors":"Elizabeth R. Feldman, Yunqi Li, David J. Cutler, Tracie C. Rosser, Stephanie B. Wechsler, Lauren Sanclemente, Angela L. Rachubinski, Natalina Elliott, Paresh Vyas, Irene Roberts, Karen R. Rabin, Michael Wagner, Bruce D. Gelb, Joaquin M. Espinosa, Philip J. Lupo, Adam J. de Smith, Stephanie L. Sherman, Elizabeth J. Leslie-Clarkson","doi":"10.1002/gepi.70010","DOIUrl":"https://doi.org/10.1002/gepi.70010","url":null,"abstract":"<div><p>Congenital heart defects (CHDs) are the most common structural birth defect and are present in 40%–50% of children born with Down syndrome (DS). To characterize the genetic architecture of DS-associated CHD, we sequenced genomes of a multiethnic group of children with DS and a CHD (<i>n</i> = 886: atrioventricular septal defects (AVSD), <i>n</i> = 438; atrial septal defects (ASD), <i>n</i> = 122; ventricular septal defects (VSD), <i>n</i> = 170; other types of CHD, <i>n</i> = 156) and DS with a structurally normal heart (DS + NH, <i>n</i> = 572). We performed four genome-wide association studies (GWASs) for common variants (MAF > 0.05) comparing DS with CHD, stratified by CHD-subtype, to DS + NH controls. Although no SNP achieved genome-wide significance, multiple loci in each analysis achieved suggestive significance (<i>p</i> < 2 × 10<sup>−6</sup>). Of these, the 1p35.1 locus (near <i>RBBP4</i>) was specifically associated with ASD risk, and the 5q35.2 locus (near <i>MSX2</i>) was associated with any type of CHD. Each of the suggestive loci contained one or more plausible candidate genes expressed in the developing heart. 
While no SNP replicated (<i>p</i> < 2 × 10<sup>−6</sup>) in an independent cohort of DS + CHD (DS + CHD: <i>n</i> = 229; DS + NH: <i>n</i> = 197), most SNPs that were suggestive in our GWASs remained suggestive when meta-analyzed with the GWASs from the replication cohort. These results build on previous work to identify genetic modifiers of DS-associated CHD.</p></div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144118194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three Loci Affecting Variance of Body Mass Index in African Americans and Sub-Saharan Africans","authors":"Daniel Shriner, Amy R. Bentley, Ayo P. Doumatey, Jie Zhou, Guanjie Chen, Charles N. Rotimi, Adebowale A. Adeyemo","doi":"10.1002/gepi.70009","DOIUrl":"https://doi.org/10.1002/gepi.70009","url":null,"abstract":"<p>Conventional genome-wide association studies (GWAS) are designed to assess the effect of a genetic locus on phenotypic mean by genotype. Such loci explain a proportion of phenotypic variance known as narrow-sense heritability. In contrast, variance quantitative trait loci (vQTL) are associated with the phenotypic variance by genotype. These loci explain an additional proportion of phenotypic variance and contribute to broad-sense heritability but not to narrow-sense heritability. Here, a genome-wide vQTL analysis in 22,805 African Americans yielded eight loci for body mass index (BMI). Of these loci, three were replicated in 6002 sub-Saharan Africans. No locus reached genome-wide significance using the standard additive model. Furthermore, no locus showed evidence for natural selection, haplotype effects, or gene × sex or gene × study interactions. Two loci showed evidence for an effect of locus-specific ancestry resulting from admixture and for a gene × gene interaction. One locus showed evidence for interaction with diastolic blood pressure, consistent with this vQTL capturing an unmodeled gene × covariate interaction. 
These analyses demonstrate that relevant BMI loci can be detected by evaluating vQTL and that these loci contribute to the underexplored broad-sense heritability for this trait.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.70009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143905286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty Quantification in Epigenetic Clocks via Conformalized Quantile Regression","authors":"Yanping Li, Jaclyn M. Goodrich, Karen E. Peterson, Peter X.-K. Song, Lan Luo","doi":"10.1002/gepi.70008","DOIUrl":"https://doi.org/10.1002/gepi.70008","url":null,"abstract":"<p>DNA methylation (DNAm) is a chemical modification of DNA that can be influenced by various factors, including age, the environment, and lifestyle. An epigenetic clock is a predictive tool that measures biological age based on DNAm levels. It can provide insights into an individual's biological age, which may differ from their chronological age. This difference, known as the epigenetic age acceleration, may reflect health status and the risk for age-related diseases. Moreover, epigenetic clocks are used in studies of aging to assess the effectiveness of antiaging interventions and to understand the underlying mechanisms of aging and disease. Various epigenetic clocks have been developed using samples from different populations, tissues, and cell types, typically by training high-dimensional linear regression models with an elastic net penalty. While these models can predict mean biological age based on DNAm with high precision, there is a lack of uncertainty quantification which is important for interpreting the precision of age estimations and for clinical decision-making. To understand the distribution of a biological age clock beyond its mean, we propose a general pipeline for training epigenetic clocks, based on an integration of high-dimensional quantile regression and conformal prediction, to effectively reveal population heterogeneity and construct prediction intervals. Our approach produces adaptive prediction intervals not only achieving nominal coverage but also accounting for the inherent variability across individuals. 
By using the data collected from 728 blood samples in 11 DNAm data sets from children, we find that our quantile regression-based prediction intervals are narrower than those derived from conventional mean regression-based epigenetic clocks. This observation demonstrates an improved statistical efficiency over the existing pipeline for training epigenetic clocks. In addition, the resulting intervals have a synchronized varying pattern to age acceleration, effectively revealing cellular evolutionary heterogeneity in age patterns in different developmental stages during individual childhoods and adolescent cohort. Our findings suggest that conformalized high-dimensional quantile regression can produce valid prediction intervals and uncover underlying population heterogeneity. Although our methodology focuses on the distribution of measures of biological aging in children, it is applicable to a broader range of age groups to improve understanding of epigenetic age beyond the mean. This inference-based toolbox could provide valuable insights for future applications of epigenetic interventions for age-related diseases.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.70008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143707357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parent-of-Origin Effects in Childhood Asthma at Seven Years of Age","authors":"Yunsung Lee, Miriam Gjerdevik, Astanand Jugessur, Håkon Kristian Gjessing, Elizabeth Corfield, Alexandra Havdahl, Jennifer Ruth Harris, Maria Christine Magnus, Siri Eldevik Håberg, Per Magnus","doi":"10.1002/gepi.70007","DOIUrl":"https://doi.org/10.1002/gepi.70007","url":null,"abstract":"<p>Childhood asthma is more common among children whose mothers have asthma than among those whose fathers have asthma. The reasons for this are unknown, and we hypothesize that genomic imprinting may partly explain this observation. Our aim is to assess parent-of-origin (PoO) effects on childhood asthma by analyzing SNP array genotype data from a large population-based cohort. To estimate PoO effects in parent-reported childhood asthma at 7 years of age, we fit a log-linear model implemented in the HAPLIN R package to SNP array genotype data from 915 mother–father–child case triads, 603 mother–child case dyads, and 113 father–child case dyads participating in the Norwegian Mother, Father, and Child Cohort Study (MoBa). We found that alleles at two SNPs—rs3003214 and rs3003211—near the adenylosuccinate synthase 2 gene (<i>ADSS2</i> on chromosome 1q44) showed significant PoO effects at a false positive rate ≤ 0.05. The ratio of the effect of the maternally and paternally inherited G-allele at rs3003214 was 1.68 (95% CI: 1.41–2.03, <i>p</i> value = 1.13E−08). 
Our results suggest PoO effects at the <i>ADSS2</i> gene, particularly the maternally inherited G-allele at rs3003214, may contribute to the maternal effect in childhood asthma.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.70007","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dimension Reduction Using Local Principal Components for Regression-Based Multi-SNP Analysis in 1000 Genomes and the Canadian Longitudinal Study on Aging (CLSA)","authors":"Fatemeh Yavartanoo, Myriam Brossard, Shelley B. Bull, Andrew D. Paterson, Yun Joo Yoo","doi":"10.1002/gepi.70005","DOIUrl":"https://doi.org/10.1002/gepi.70005","url":null,"abstract":"<div><p>For genetic association analysis based on multiple SNP regression of genotypes obtained by dense DNA sequencing or array data imputation, multi-collinearity can be a severe issue causing failure to fit the regression model. In this study, we propose a method of Dimension Reduction using Local Principal Components (DRLPC) which aims to resolve multi-collinearity by removing SNPs under the assumption that the remaining SNPs can capture the effect of a removed SNP due to high linear dependency. This approach to dimension reduction is expected to improve the power of regression-based statistical tests. We apply DRLPC to chromosome 22 SNPs of two data sets, the 1000 Genomes Project (phase 3) and the Canadian Longitudinal Study on Aging (CLSA), and calculate variance inflation factors (VIF) in various SNP-sets before and after implementing DRLPC as a metric of collinearity. Notably, DRLPC addresses multi-collinearity by excluding variables with a VIF exceeding a predetermined threshold (VIF = 20), thereby improving applicability for subsequent regression analyses. The number of variables in a final set for regression analysis is reduced to around 20% on average for larger-sized genes, whereas for smaller ones, the proportion is around 48%; suggesting that DRLPC is particularly effective for larger genes. We also compare the power of several multi-SNP statistics constructed for gene-specific analysis to evaluate power gains achieved by DRLPC. 
In simulation studies based on 100 genes with ≤ 500 SNPs per gene, DRLPC increases the power of the multiple regression Wald test from 60% to around 80%.</p></div>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"49 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}