The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model

IF 5.3 | Zone 1, Biology | JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Biology and Evolution, Vol. 42, Issue 8 | Pub date: 2025-07-30 | DOI: 10.1093/molbev/msaf184

Jiayi Ji, Paschalia Kapli, Tomáš Flouri, Ziheng Yang


Citations: 0

Abstract

The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow. The base-calling error rate is extremely influential. At the low rate of e = 0.001 (Phred score of 30), estimation of species trees and population parameters is little affected by genotyping errors even at the low depth of ∼3×. At high error rates (e = 0.005 or 0.01) and low depths (less than 10×), genotyping errors can reduce the power of species tree estimation, and introduce biases in estimates of population sizes, species divergence times, and the rate of gene flow. Treating heterozygotes in the sequences as missing data (ambiguities) may reduce the impact of genotyping errors. Our simulation suggests that it is preferable in terms of inference precision and accuracy to sequence a few samples at high depths rather than many samples at low depths.
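The abstract pairs e = 0.001 with a Phred score of 30, which follows from the standard relation Q = -10 log10(e). The sketch below shows that conversion for the three error rates studied, plus one simple way to see why the error rate interacts with depth: the probability that at least one of d independent reads covering a site carries a base-calling error, 1 - (1 - e)^d. The function names are hypothetical, and the "naive caller" reading of that probability is an assumption for illustration only; it is not the genotype-calling procedure used in the paper.

```python
import math


def error_to_phred(e: float) -> float:
    """Phred quality score Q = -10 * log10(e) for a base-calling error rate e."""
    return -10.0 * math.log10(e)


def phred_to_error(q: float) -> float:
    """Inverse transform: e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def prob_any_error_read(e: float, depth: int) -> float:
    """Probability that at least one of `depth` independent reads at a site
    carries a base-calling error: 1 - (1 - e) ** depth.

    Hypothetical illustration: under a naive caller that flags any site with a
    mismatching read as heterozygous, this is the chance that a truly
    homozygous site is miscalled. Real genotype callers are more robust.
    """
    return 1.0 - (1.0 - e) ** depth


if __name__ == "__main__":
    # Error rates examined in the abstract, at a few read depths.
    for e in (0.001, 0.005, 0.01):
        q = error_to_phred(e)
        row = ", ".join(
            f"{d}x: {prob_any_error_read(e, d):.4f}" for d in (3, 10, 30)
        )
        print(f"e = {e:<6} Q = {q:5.1f}  P(>=1 error read) {row}")
```

For example, e = 0.005 corresponds to Q ≈ 23 and e = 0.01 to Q = 20. This per-site quantity grows with both e and depth, which is why more reads alone do not remove errors; turning reads into genotypes is the job of a proper caller, and treating ambiguous heterozygous calls as missing data is the mitigation the abstract mentions.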


Source journal

Molecular Biology and Evolution (Biology: Evolutionary Biology)

CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month

Journal description: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts on patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories that advance evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.