Ben Bettisworth, Nikolaos Psonis, Nikos Poulakakis, Pavlos Pavlidis, Alexandros Stamatakis
Journal: Molecular Biology and Evolution (Impact Factor 11.0; JCR Q1, Biochemistry & Molecular Biology)
Publication date: 2025-02-03
DOI: 10.1093/molbev/msaf006 (https://doi.org/10.1093/molbev/msaf006)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11839404/pdf/
Read Length Dominates Phylogenetic Placement Accuracy of Ancient DNA Reads.
A common problem when analyzing ancient DNA (aDNA) data is to identify the species that corresponds to the recovered aDNA sequence(s). The standard approach is to deploy sequence similarity-based tools, such as BLAST. However, as aDNA reads may frequently stem from unsampled taxa due to extinction, it is likely that there is no exact match in any database. As a consequence, these tools may not be able to accurately place such reads in a phylogenetic context. Phylogenetic placement is a technique where a read is placed onto a specific branch of a phylogenetic reference tree, which allows for a substantially finer resolution when identifying reads. Prior applications of phylogenetic placement have been deployed only on data from extant sources. Therefore, it is unclear how aDNA damage affects the applicability of phylogenetic placement to aDNA data. To investigate how aDNA damage affects placement accuracy, we re-implemented a statistical model of aDNA damage. We deploy this model, along with a modified version of the existing assessment pipeline PEWO, to 7 empirical datasets with 4 leading tools: APPLES, EPA-ng, pplacer, and RAPPAS. We explore the aDNA damage parameter space via a grid search in order to identify the aDNA damage factors that exhibit the largest impact on placement accuracy. We find that the frequency of DNA backbone nicks (and consequently read length) has by far the largest impact on aDNA read placement accuracy, and that other factors, such as misincorporations, have a negligible effect on overall placement accuracy.
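To make the link between nick frequency and read length concrete, the following is a toy sketch only, not the statistical damage model the authors re-implemented and not part of PEWO. It assumes that each bond between adjacent bases is nicked independently with a fixed probability, and that misincorporations appear as C-to-T changes near the 5' end and G-to-A changes near the 3' end of a read. The function names (`fragment_by_nicks`, `damage_read`, `grid_search`) and all parameter names are hypothetical.

```python
import itertools
import random


def fragment_by_nicks(seq, nick_rate, rng):
    """Split seq at every nicked bond; each bond is nicked with probability nick_rate."""
    fragments, start = [], 0
    for i in range(1, len(seq)):
        if rng.random() < nick_rate:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


def damage_read(read, deam_rate, overhang, rng):
    """Apply toy misincorporations: C->T in the first `overhang` bases,
    G->A in the last `overhang` bases, each with probability deam_rate."""
    n = len(read)
    out = []
    for i, base in enumerate(read):
        if base == "C" and i < overhang and rng.random() < deam_rate:
            out.append("T")
        elif base == "G" and i >= n - overhang and rng.random() < deam_rate:
            out.append("A")
        else:
            out.append(base)
    return "".join(out)


def grid_search(seq, nick_rates, deam_rates, overhang=3, seed=0):
    """Simulate damage for every (nick_rate, deam_rate) pair and
    return rows of (nick_rate, deam_rate, mean read length)."""
    rows = []
    for nick_rate, deam_rate in itertools.product(nick_rates, deam_rates):
        rng = random.Random(seed)  # same seed per cell so cells are comparable
        reads = [
            damage_read(fragment, deam_rate, overhang, rng)
            for fragment in fragment_by_nicks(seq, nick_rate, rng)
        ]
        mean_length = sum(len(r) for r in reads) / len(reads)
        rows.append((nick_rate, deam_rate, mean_length))
    return rows


if __name__ == "__main__":
    reference = "ACGT" * 250  # placeholder sequence, not real data
    for row in grid_search(reference, [0.01, 0.05, 0.2], [0.0, 0.3]):
        print(row)
```

In this sketch only the nick rate changes read length; the misincorporation rate alters bases but leaves lengths untouched. A real evaluation would pass the simulated reads to a placement tool and score the resulting placements, which is outside the scope of this example.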
About the journal:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.