Soutrik Mandal, Do Hyun Kim, Xing Hua, Shilan Li, Jianxin Shi
Title: Estimating the overall fraction of phenotypic variance attributed to high-dimensional predictors measured with error.
Journal: Biostatistics, pp. 486-503, published 2024-04-15
DOI: 10.1093/biostatistics/kxad001 (https://doi.org/10.1093/biostatistics/kxad001)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017132/pdf/
Abstract
In prospective genomic studies (e.g., DNA methylation, metagenomics, and transcriptomics), it is crucial to estimate the overall fraction of phenotypic variance (OFPV) attributed to the high-dimensional genomic variables, a concept similar to heritability analyses in genome-wide association studies (GWAS). Unlike genetic variants in GWAS, these genomic variables are typically measured with error due to technical limitations and temporal instability. While the existing methods developed for GWAS can be used, ignoring measurement error may lead to severe underestimation of OFPV and mislead the design of future studies. Assuming that measurement error variances are distributed similarly between causal and noncausal variables, we show that the asymptotic attenuation factor equals the average of the intraclass correlation coefficients of all genomic variables, which can be estimated from a pilot study with repeated measurements. We illustrate the method by estimating the contribution of microbiome taxa to body mass index and multiple allergy traits in the American Gut Project. Finally, we show that measurement error does not cause meaningful bias when estimating the correlation of effect sizes for two traits.
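As a rough illustration of the correction the abstract describes, the sketch below estimates an intraclass correlation coefficient (ICC) for each variable from pilot data with repeated measurements, averages them, and divides a naive OFPV estimate by that average. The one-way ANOVA ICC estimator, the function names `icc_oneway` and `corrected_ofpv`, and the array layout are assumptions made for this example; the paper's own estimator and software may differ.

```python
import numpy as np

def icc_oneway(x):
    """One-way random-effects ICC for a single variable (an assumed estimator).

    x: array of shape (n_subjects, k_repeats), with k_repeats >= 2.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def corrected_ofpv(naive_ofpv, pilot):
    """Divide a naive OFPV estimate by the average ICC across variables.

    naive_ofpv: estimate obtained while ignoring measurement error.
    pilot: array of shape (n_subjects, k_repeats, p_variables).
    Returns (corrected estimate, average ICC).
    """
    pilot = np.asarray(pilot, dtype=float)
    iccs = np.array([icc_oneway(pilot[:, :, j]) for j in range(pilot.shape[2])])
    mean_icc = iccs.mean()
    return naive_ofpv / mean_icc, mean_icc

# Toy pilot data: 2 subjects, 2 repeats, 2 variables (hypothetical values).
v0 = np.array([[1.0, 1.0], [2.0, 2.0]])  # perfectly repeatable variable
v1 = np.array([[1.0, 3.0], [5.0, 7.0]])  # variable with within-subject noise
pilot = np.stack([v0, v1], axis=2)
estimate, mean_icc = corrected_ofpv(0.4, pilot)
```

The division only makes sense when the average ICC is clearly positive; with a small pilot study the ICC estimates are noisy, so the corrected value should be read with that uncertainty in mind.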
About the journal:
Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public's health.