Ricardo Muñiz-Trejo, Yeonwoo Park, Joseph W Thornton
Molecular Biology and Evolution, volume 42, issue 4. Published 2025-04-01. DOI: 10.1093/molbev/msaf084 (https://doi.org/10.1093/molbev/msaf084). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12046983/pdf/
Robustness of Ancestral Sequence Reconstruction to Among-site and Among-lineage Evolutionary Heterogeneity.
Ancestral sequence reconstruction is typically performed using homogeneous evolutionary models, which assume that the same substitution propensities affect all sites and lineages. These assumptions are routinely violated: heterogeneous structural and functional constraints favor different amino acids at different sites, and these constraints often change among lineages as epistatic substitutions accrue at other sites. To evaluate how violations of the homogeneity assumption affect ancestral sequence reconstruction under realistic conditions, we developed site-specific substitution models and parameterized them using data from deep mutational scanning experiments on three protein families; we then used these models to perform ancestral sequence reconstruction on the empirical alignments and on alignments simulated under heterogeneous conditions derived from the experiments. Extensive among-site and among-lineage heterogeneity is present in these datasets, but the sequences reconstructed from empirical alignments are almost identical when heterogeneous or homogeneous models are used for ancestral sequence reconstruction. Using models fit to deep mutational scanning data from distantly related proteins, in which mutational effects are very different, also has a minimal impact on ancestral sequence reconstruction. The rare differences occur primarily where phylogenetic signal is weak: at fast-evolving sites and at nodes connected by long branches. When ancestral sequence reconstruction is performed on simulated data, errors in the reconstructed sequences become more likely as branch lengths increase, but incorporating heterogeneity into the model does not improve accuracy. These data establish that ancestral sequence reconstruction is robust to unincorporated realistic forms of evolutionary heterogeneity, because the primary determinant of ancestral sequence reconstruction is phylogenetic signal, not the substitution model. The best way to improve accuracy is therefore not to develop more elaborate models but to apply ancestral sequence reconstruction to densely sampled alignments that maximize phylogenetic signal at the nodes of interest.
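The claim that phylogenetic signal, rather than the substitution model, drives the reconstruction can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes a four-state alphabet, a star tree with equal branch lengths, and an F81-style model whose equilibrium frequencies stand in for site-specific preferences. All names (`f81_transition`, `root_posterior`, `map_state`) and all numbers are hypothetical choices made for illustration.

```python
import math


def f81_transition(pi, t):
    """Transition matrix P[i][j] for an F81-style model (an assumed toy model).

    pi holds the equilibrium frequencies; t is the branch length in
    expected substitutions per site.
    """
    # Scale the rate so that t is measured in expected substitutions per site.
    mu = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-mu * t)
    n = len(pi)
    return [
        [(decay if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(n)]
        for i in range(n)
    ]


def root_posterior(pi, leaf_states, t):
    """Marginal posterior over the root state of a star tree.

    Every leaf hangs off the root by a branch of length t.
    """
    P = f81_transition(pi, t)
    joint = []
    for a in range(len(pi)):
        likelihood = pi[a]  # prior on the ancestral state
        for x in leaf_states:
            likelihood *= P[a][x]  # probability of each observed leaf state
        joint.append(likelihood)
    total = sum(joint)
    return [v / total for v in joint]


def map_state(posterior):
    """Index of the maximum a posteriori state."""
    return max(range(len(posterior)), key=posterior.__getitem__)


# Homogeneous model: no state is preferred.
uniform = [0.25, 0.25, 0.25, 0.25]
# Hypothetical site-specific model: the site strongly prefers state 1.
skewed = [0.05, 0.85, 0.05, 0.05]
# Observed states at four leaves: three carry state 0, one carries state 1.
leaves = [0, 0, 0, 1]

for t in (0.05, 3.0):
    for name, pi in (("uniform", uniform), ("skewed", skewed)):
        post = root_posterior(pi, leaves, t)
        print(f"t={t} model={name} MAP={map_state(post)} "
              f"posterior={[round(p, 3) for p in post]}")
```

In this toy setup both models infer state 0 at the root when branches are short (t = 0.05), because the leaves carry strong signal. With long branches (t = 3.0) the signal decays, the posterior under the skewed model collapses toward its equilibrium frequencies, and the two models disagree. This mirrors the pattern described in the abstract only qualitatively; it says nothing about how often such cases arise in real alignments.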
About the journal:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.