A fast linkage method for population GWAS cohorts with related individuals

IF 3.8 · CAS Zone 4 (Medicine) · JCR Q3 Genetics & Heredity

Genetic Epidemiology. Pub Date: 2023-02-05. DOI: 10.1002/gepi.22516

Gregory J. M. Zajac, Sarah A. Gagliano Taliun, Carlo Sidore, Sarah E. Graham, Bjørn O. Åsvold, Ben Brumpton, Jonas B. Nielsen, Wei Zhou, Maiken Gabrielsen, Anne H. Skogholt, Lars G. Fritsche, David Schlessinger, Francesco Cucca, Kristian Hveem, Cristen J. Willer, Gonçalo R. Abecasis

Genetic Epidemiology, volume 47, issue 3, pages 231-248. Full text: https://onlinelibrary.wiley.com/doi/10.1002/gepi.22516

Citations: 0

Abstract

Linkage analysis, a class of methods for detecting co-segregation of genomic segments and traits in families, was used to map disease-causing genes for decades before genotyping arrays and dense SNP genotyping enabled genome-wide association studies in population samples. Population samples often contain related individuals, but the segregation of alleles within families is rarely used because traditional linkage methods are computationally inefficient for larger datasets. Here, we describe Population Linkage, a novel application of Haseman–Elston regression as a method-of-moments estimator of variance components and their standard errors. We achieve additional computational efficiency by using modern methods for detection of IBD segments and variance component estimation, preprocessing input data efficiently, and minimizing redundant numerical calculations. We also refined variance component models to account for the biases in population-scale methods for IBD segment detection. We ran Population Linkage on four blood lipid traits in over 70,000 individuals from the HUNT and SardiNIA studies, successfully detecting 25 known genetic signals. One notable linkage signal that appeared in both studies was for low-density lipoprotein (LDL) cholesterol levels in the region near the gene APOE (LOD = 29.3, variance explained = 4.1%). This is the region where the missense variants rs7412 and rs429358, which together make up the ε2, ε3, and ε4 alleles, account for 2.4% and 0.8% of variation in circulating LDL cholesterol, respectively. Our results show the potential for linkage analysis and other large-scale applications of method-of-moments variance components estimation.
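To make the core idea concrete, the following is a minimal sketch of cross-product Haseman–Elston regression used as a method-of-moments variance component estimator. It is not the authors' Population Linkage implementation: the function names (`fit_moments`, `haseman_elston`), the two-component model (one locus-specific component plus one polygenic component), and the dense pairwise matrices are assumptions made for illustration. The sketch omits the standard errors, LOD scores, and IBD-bias corrections described in the abstract.

```python
import numpy as np


def fit_moments(products, local_ibd_pairs, kinship_pairs):
    """Least-squares fit of pairwise phenotype products on IBD sharing.

    Hypothetical helper: solves
        products ~ s2_locus * local_ibd + s2_poly * kinship
    with no intercept, and returns (s2_locus, s2_poly).
    """
    X = np.column_stack([local_ibd_pairs, kinship_pairs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(products, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])


def haseman_elston(y, local_ibd, kinship):
    """Illustrative cross-product Haseman-Elston regression.

    y         : (n,) phenotype values
    local_ibd : (n, n) symmetric matrix, IBD sharing at one locus
    kinship   : (n, n) symmetric matrix, genome-wide expected sharing
    Returns moment estimates (locus variance, polygenic variance) as
    fractions of the phenotypic variance.
    """
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()            # standardize the phenotype
    i, j = np.triu_indices(len(z), k=1)     # each unordered pair once
    products = z[i] * z[j]                  # sample analogue of Cov(z_i, z_j)
    local_ibd = np.asarray(local_ibd, dtype=float)
    kinship = np.asarray(kinship, dtype=float)
    return fit_moments(products, local_ibd[i, j], kinship[i, j])
```

Because the estimator is an ordinary least-squares fit on pairwise products rather than an iterative likelihood maximization, each locus costs one small linear solve, which is the kind of property that makes a method-of-moments approach attractive at population scale. A practical implementation would restrict the regression to pairs with detected IBD sharing instead of building dense n × n matrices.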

Source journal

Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 4.40
Self-citation rate: 9.50%
Review time: 6-12 weeks

About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.