The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model.

IF 5.3 · Zone 1 (Biology) · JCR Q1 (BIOCHEMISTRY & MOLECULAR BIOLOGY)
Jiayi Ji, Paschalia Kapli, Tomáš Flouri, Ziheng Yang
Molecular Biology and Evolution, 42(8), published 2025-07-30. DOI: 10.1093/molbev/msaf184 (https://doi.org/10.1093/molbev/msaf184)
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359030/pdf/
Citations: 0

Abstract

The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow. The base-calling error rate is extremely influential. At the low rate of e = 0.001 (Phred score of 30), estimation of species trees and population parameters is little affected by genotyping errors, even at the low depth of ∼3×. At high error rates (e = 0.005 or 0.01) and low depths (less than 10×), genotyping errors can reduce the power of species tree estimation and introduce biases in estimates of population sizes, species divergence times, and the rate of gene flow. Treating heterozygotes in the sequences as missing data (ambiguities) may reduce the impact of genotyping errors. Our simulation suggests that it is preferable in terms of inference precision and accuracy to sequence a few samples at high depths rather than many samples at low depths.
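The abstract ties the base-calling error rate e to a Phred score (Q = -10 log10 e, so e = 0.001 gives Q30, e = 0.005 gives roughly Q23, and e = 0.01 gives Q20) and attributes the damage to low read depth. The sketch below is a toy illustration of that mechanism only. It is not the authors' simulation pipeline or genotype caller. It assumes a biallelic diploid site, reads that sample either chromosome with equal probability, errors that flip one allele into the other (a pessimistic simplification, since real errors spread over three alternative bases), and a maximum-likelihood caller with a flat prior. The function names `read_likelihood`, `call_genotype`, and `miscall_prob` are hypothetical.

```python
import math
from math import comb

# Hypothetical genotype codes at a biallelic site: copies of the alternative allele.
HOM_REF, HET, HOM_ALT = 0, 1, 2


def phred_from_error(e: float) -> float:
    """Phred quality score Q = -10 * log10(e) for a per-base error rate e."""
    return -10.0 * math.log10(e)


def read_likelihood(k_alt: int, depth: int, genotype: int, e: float) -> float:
    """Binomial probability of k_alt alternative-allele reads out of `depth` reads.

    Toy assumptions: each read samples one of the two chromosomes with equal
    probability, and a base-calling error (probability e) turns the sampled
    allele into the other allele.
    """
    frac_alt = genotype / 2.0
    p_alt_read = frac_alt * (1.0 - e) + (1.0 - frac_alt) * e
    return comb(depth, k_alt) * p_alt_read**k_alt * (1.0 - p_alt_read) ** (depth - k_alt)


def call_genotype(k_alt: int, depth: int, e: float) -> int:
    """Maximum-likelihood genotype call with a flat prior over the three genotypes."""
    return max((HOM_REF, HET, HOM_ALT), key=lambda g: read_likelihood(k_alt, depth, g, e))


def miscall_prob(true_genotype: int, depth: int, e: float) -> float:
    """Probability that the toy caller returns a genotype other than the true one."""
    return sum(
        read_likelihood(k, depth, true_genotype, e)
        for k in range(depth + 1)
        if call_genotype(k, depth, e) != true_genotype
    )


if __name__ == "__main__":
    for e in (0.001, 0.005, 0.01):
        print(f"e = {e}: Phred Q = {phred_from_error(e):.1f}")
        for depth in (3, 10, 20):
            print(
                f"  depth {depth:>2}x: "
                f"P(het miscalled) = {miscall_prob(HET, depth, e):.4f}, "
                f"P(hom-ref miscalled) = {miscall_prob(HOM_REF, depth, e):.4f}"
            )
```

Under these assumptions, a true heterozygote at 3× is called homozygous whenever all three reads come from the same chromosome (probability 0.25 for each of the three error rates above), whereas a true homozygote is miscalled mainly through base-calling errors, at a rate that grows with e and falls as depth increases. This mirrors, in a much simplified form, why both the error rate and the depth matter.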

Source journal
Molecular Biology and Evolution (Biology: Evolutionary Biology)
CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month
Journal overview: Molecular Biology and Evolution publishes research at the interface of molecular (including genomic) and evolutionary biology. It considers manuscripts addressing patterns, processes, and predictions at all levels of organization (population, taxonomic, functional, and phenotypic), and is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories that advance evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.