DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies.

IF 6.1; Tier 1 (Biology); JCR Q1 (Evolutionary Biology)

Systematic Biology Pub Date : 2023-11-01 DOI:10.1093/sysbio/syad036

Paschalia Kapli, Ioanna Kotari, Maximilian J Telford, Nick Goldman, Ziheng Yang


Citations: 0

Abstract


Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.
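The abstract mentions analyzing DNA under nucleotide models "possibly with the third codon positions excluded". As a rough illustration of that preprocessing step only, the sketch below drops every third nucleotide from in-frame coding sequences. The function names and the toy two-taxon alignment are hypothetical and are not taken from the paper; it assumes each sequence starts at codon position 1 and has a length divisible by three.

```python
def drop_third_codon_positions(seq: str) -> str:
    """Keep codon positions 1 and 2; remove position 3 from an in-frame sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Zero-based index i is a third codon position when i % 3 == 2.
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def strip_alignment(alignment: dict[str, str]) -> dict[str, str]:
    """Apply drop_third_codon_positions to every sequence in an alignment."""
    return {name: drop_third_codon_positions(seq) for name, seq in alignment.items()}


# Hypothetical toy alignment: the two taxa differ only at third codon positions.
toy_alignment = {"taxonA": "ATGGCCAAA", "taxonB": "ATGGCTAAG"}
print(strip_alignment(toy_alignment))
```

In this toy case the two sequences become identical once third positions are removed, because their only differences are synonymous changes at those positions. Real analyses would apply the same filter column-wise to a full codon alignment before fitting a nucleotide model.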


Source journal

Systematic Biology (Biology: Evolutionary Biology)

CiteScore: 13.00

Self-citation rate: 7.70%

Articles published: (no value given)

Review time: 6-12 weeks

About the journal: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.