Monica Isgut, Kijoung Song, Margaret G. Ehm, May Dongmei Wang, Jonathan Davitte
Genetic Epidemiology, volume 47, issue 5, pages 394-406. Published 2023-04-06. DOI: 10.1002/gepi.22523
https://onlinelibrary.wiley.com/doi/10.1002/gepi.22523
Effect of case and control definitions on genome-wide association study (GWAS) findings
Genome-wide association studies (GWAS) have significantly advanced our understanding of the genetic underpinnings of diseases, but case and control cohort definitions for a given disease can vary between published studies. For example, two GWAS for the same disease using the UK Biobank data set might use different data sources (e.g., self-reported questionnaires or hospital records) or different levels of granularity (i.e., specificity of inclusion criteria) to define cases and controls. The extent to which this variability in cohort definitions affects the end results of a GWAS is unclear. In this study, we systematically evaluated the effect of the data sources used for case and control definitions on GWAS findings. Using the UK Biobank, we selected three diseases: glaucoma, migraine, and iron-deficiency anemia. For each disease, we designed 13 GWAS, each using a different combination of data sources to define cases and controls, and then calculated the pairwise genetic correlations between all GWAS for that disease. We found that the data sources used to define cases for a given disease can have a significant impact on GWAS end results, but the extent of this impact depends heavily on the disease in question. This suggests the need for greater scrutiny of how case cohorts are defined for GWAS.
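As a rough illustration of the design described in the abstract, the sketch below builds case sets from different combinations of data sources and enumerates the pairs of GWAS that would be compared. Everything in it is hypothetical: the toy participant records, the source names (`self_report`, `hospital`, `primary_care`), the three example definitions, and the rule that a case is anyone flagged by at least one listed source. The study's 13 actual definitions are not given here, and the genetic correlation itself is not computed; only the bookkeeping is shown.

```python
from itertools import combinations

# Hypothetical toy records: participant ID -> data sources that flag the disease.
records = {
    "p1": {"self_report"},
    "p2": {"hospital"},
    "p3": {"self_report", "hospital"},
    "p4": set(),
    "p5": {"primary_care"},
}

# Hypothetical case definitions, each a combination of data sources.
definitions = {
    "self_report_only": ["self_report"],
    "hospital_only": ["hospital"],
    "any_source": ["self_report", "hospital", "primary_care"],
}


def define_cases(records, sources):
    """Return IDs flagged by at least one of the given sources (assumed rule)."""
    wanted = set(sources)
    return {pid for pid, flagged in records.items() if flagged & wanted}


def pairwise_comparisons(names):
    """Return every unordered pair of GWAS names to be compared."""
    return list(combinations(names, 2))


# One case set per definition; each would feed a separate GWAS.
case_sets = {name: define_cases(records, srcs) for name, srcs in definitions.items()}

# Pairs of GWAS for which a genetic correlation would be estimated.
pairs = pairwise_comparisons(sorted(case_sets))

# With 13 GWAS per disease, the number of pairwise comparisons is 13 * 12 / 2.
n_pairs_13 = len(pairwise_comparisons(range(13)))
```

With 13 GWAS per disease, `n_pairs_13` evaluates to 78, so each disease yields 78 pairwise genetic correlations to inspect.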
About the journal:
Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations.
Genetic Epidemiology primarily publishes papers in statistical genetics, a research field concerned mainly with the development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.