Jiayi Ji, Paschalia Kapli, Tomáš Flouri, Ziheng Yang
Molecular Biology and Evolution, 42(8). Published 2025-07-30. DOI: https://doi.org/10.1093/molbev/msaf184. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359030/pdf/
The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model.
The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow. The base-calling error rate is extremely influential. At the low rate of e = 0.001 (Phred score of 30), estimation of species trees and population parameters is little affected by genotyping errors, even at the low depth of ∼3×. At high error rates (e = 0.005 or 0.01) and low depths (less than 10×), genotyping errors can reduce the power of species tree estimation and introduce biases in estimates of population sizes, species divergence times, and the rate of gene flow. Treating heterozygotes in the sequences as missing data (ambiguities) may reduce the impact of genotyping errors. Our simulation suggests that, in terms of inference precision and accuracy, it is preferable to sequence a few samples at high depths rather than many samples at low depths.
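The abstract pairs the error rate e = 0.001 with a Phred score of 30, which follows from the standard Phred definition Q = -10 log10(e). The short Python sketch below converts between the two scales and adds a rough illustration of why low depth matters for heterozygotes. The function names are illustrative, and the heterozygote calculation rests on simplifying assumptions (each read samples either allele with probability 1/2, no base-calling errors); it is not the simulation model used in the paper.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability e."""
    return 10 ** (-q / 10)


def error_to_phred(e: float) -> float:
    """Convert a per-base error probability e to a Phred quality score Q."""
    if not 0 < e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(e)


def prob_het_reads_all_one_allele(depth: int) -> float:
    """Probability that every read at a heterozygous site shows the same allele.

    Assumption (for illustration only): reads sample the two alleles
    independently with probability 1/2 each, and base calls are error-free.
    Either allele can be the one observed, so the result is
    2 * (1/2)**depth = (1/2)**(depth - 1).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)


if __name__ == "__main__":
    # Error rates mentioned in the abstract, expressed as Phred scores.
    for e in (0.001, 0.005, 0.01):
        print(f"e = {e}: Phred score = {error_to_phred(e):.1f}")

    # Chance that a true heterozygote looks homozygous from the reads alone.
    for depth in (3, 10, 30):
        p = prob_het_reads_all_one_allele(depth)
        print(f"depth {depth}x: P(all reads from one allele) = {p:.3g}")
```

Under these assumptions, a quarter of heterozygous sites at 3× depth would show reads from only one allele, whereas at 10× the fraction falls below 0.2%. This is consistent with the abstract's observation that problems concentrate at depths below 10×.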
About the journal:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.