{"title":"Estimation of the covariance structure from SNP allele frequencies","authors":"J. van Waaij, Zilong Li, C. Wiuf","doi":"10.1515/sagmb-2022-0005","DOIUrl":null,"url":null,"abstract":"Abstract We propose two new statistics, V ̂ $\\hat{V}$ and S ̂ $\\hat{S}$ , to disentangle the population history of related populations from SNP frequency data. If the populations are related by a tree, we show by theoretical means as well as by simulation that the new statistics are able to identify the root of a tree correctly, in contrast to standard statistics, such as the observed matrix of F 2-statistics (distances between pairs of populations). The statistic V ̂ $\\hat{V}$ is obtained by averaging over all SNPs (similar to standard statistics). Its expectation is the true covariance matrix of the observed population SNP frequencies, offset by a matrix with identical entries. In contrast, the statistic S ̂ $\\hat{S}$ is put in a Bayesian context and is obtained by averaging over pairs of SNPs, such that each SNP is only used once. It thus makes use of the joint distribution of pairs of SNPs. In addition, we provide a number of novel mathematical results about old and new statistics, and their mutual relationship.","PeriodicalId":49477,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":null,"pages":null},"PeriodicalIF":0.9000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/sagmb-2022-0005","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 1
Abstract
Abstract We propose two new statistics, V ̂ $\hat{V}$ and S ̂ $\hat{S}$ , to disentangle the population history of related populations from SNP frequency data. If the populations are related by a tree, we show by theoretical means as well as by simulation that the new statistics are able to identify the root of a tree correctly, in contrast to standard statistics, such as the observed matrix of F 2-statistics (distances between pairs of populations). The statistic V ̂ $\hat{V}$ is obtained by averaging over all SNPs (similar to standard statistics). Its expectation is the true covariance matrix of the observed population SNP frequencies, offset by a matrix with identical entries. In contrast, the statistic S ̂ $\hat{S}$ is put in a Bayesian context and is obtained by averaging over pairs of SNPs, such that each SNP is only used once. It thus makes use of the joint distribution of pairs of SNPs. In addition, we provide a number of novel mathematical results about old and new statistics, and their mutual relationship.
期刊介绍:
Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.