Theoretical analysis of principal components in an umbrella model of intraspecific evolution

IF 1.2 · Region 4 (Biology) · JCR Q4 (Ecology)
Maxime Estavoyer, Olivier François
Journal: Theoretical Population Biology, volume 148, pages 11-21
Publication date: 2022-12-01 (journal article; not open access)
DOI: 10.1016/j.tpb.2022.08.002
Source: https://www.sciencedirect.com/science/article/pii/S0040580922000521
Citation count: 0

Abstract

Principal component analysis (PCA) is one of the most frequently used approaches to describe population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, interpretations of PCA have, however, been questioned, as there is uncertainty about the wave-like patterns that have been observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model, the umbrella model, for the diffusion of genetic variants. The model is based on genetic drift without any particular geographical structure. In the umbrella model, splits from an ancestral population occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of eigenvalues and eigenvectors for the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample are removed, the PCA eigenvectors are defined as cosine functions of increasing periodicity, reproducing wave-like patterns observed in equilibrium isolation-by-distance models. When singleton variants are included in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift only, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.
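
The cosine-shaped eigenvectors mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's derivation. It assumes, purely for illustration, that regularly paced sequential splits can be caricatured by a covariance proportional to shared drift time, `K[i, j] = min(i, j)`, between the i-th and j-th sampled populations. The function names and the choice `n = 50` are hypothetical. Under that assumption, the double-centered covariance used by PCA equals the pseudo-inverse of a path-graph Laplacian, whose eigenvectors are discrete cosines of increasing frequency.

```python
import numpy as np


def centered_min_covariance(n):
    """Double-centered toy covariance G = H K H with K[i, j] = min(i, j).

    ASSUMPTION: min(i, j) stands in for shared drift time between samples
    i and j; it is an illustrative caricature, not the paper's model.
    """
    idx = np.arange(1, n + 1)
    K = np.minimum.outer(idx, idx).astype(float)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix used by PCA
    return H @ K @ H


def leading_components(G, k):
    """Return the k largest eigenvalues of G and their eigenvectors."""
    vals, vecs = np.linalg.eigh(G)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


def cosine_vector(n, k):
    """Unit-norm discrete cosine of frequency k on n sample positions."""
    i = np.arange(1, n + 1)
    v = np.cos(k * np.pi * (i - 0.5) / n)
    return v / np.linalg.norm(v)


n = 50  # hypothetical number of sampled populations
G = centered_min_covariance(n)
vals, vecs = leading_components(G, 3)

# Absolute overlap between each leading eigenvector and the matching cosine
# (the sign of an eigenvector is arbitrary, hence the absolute value).
overlaps = [abs(vecs[:, k - 1] @ cosine_vector(n, k)) for k in (1, 2, 3)]

# Eigenvalues predicted from the path-graph Laplacian spectrum.
predicted = [1.0 / (4.0 * np.sin(k * np.pi / (2 * n)) ** 2) for k in (1, 2, 3)]

print("overlaps with cosines:", overlaps)
print("eigenvalues:", vals, "predicted:", predicted)
```

In this toy setting the k-th principal component is a cosine with k half-periods across the ordered samples, and the eigenvalues decay roughly as 1/k². The sketch ignores the distinction the abstract draws between analyses with and without singleton variants.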

Source journal
Theoretical Population Biology (Biology: Evolutionary Biology)
CiteScore: 2.50
Self-citation rate: 14.30%
Articles published: 43
Review time: 6-12 weeks
About the journal: An interdisciplinary journal, Theoretical Population Biology presents articles on theoretical aspects of the biology of populations, particularly in the areas of demography, ecology, epidemiology, evolution, and genetics. Emphasis is on the development of mathematical theory and models that enhance the understanding of biological phenomena. Articles highlight the motivation and significance of the work for advancing progress in biology, relying on a substantial mathematical effort to obtain biological insight. The journal also presents empirical results and computational and statistical methods directly impinging on theoretical problems in population biology.