{"title":"ON BLOCKWISE AND REFERENCE PANEL-BASED ESTIMATORS FOR GENETIC DATA PREDICTION IN HIGH DIMENSIONS.","authors":"Bingxin Zhao, Shurong Zheng, Hongtu Zhu","doi":"10.1214/24-aos2378","DOIUrl":null,"url":null,"abstract":"<p><p>Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrix. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11391480/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/24-aos2378","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/11 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0
Abstract
Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrix. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
期刊介绍:
The Annals of Statistics aim to publish research papers of highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied & interdisciplinary statistics. Of course many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics, and in substantive scientific fields.