Ben Bettisworth, Nikolaos Psonis, Nikos Poulakakis, Pavlos Pavlidis, Alexandros Stamatakis
{"title":"Read Length Dominates Phylogenetic Placement Accuracy of Ancient DNA Reads.","authors":"Ben Bettisworth, Nikolaos Psonis, Nikos Poulakakis, Pavlos Pavlidis, Alexandros Stamatakis","doi":"10.1093/molbev/msaf006","DOIUrl":null,"url":null,"abstract":"<p><p>A common problem when analyzing ancient DNA (aDNA) data is to identify the species which corresponds to the recovered aDNA sequence(s). The standard approach is to deploy sequence similarity based tools, such as BLAST. However, as aDNA reads may frequently stem from unsampled taxa due to extinction, it is likely that there is no exact match in any database. As a consequence, these tools may not be able to accurately place such reads in a phylogenetic context. Phylogenetic placement is a technique where a read is placed onto a specific branch of a phylogenetic reference tree, which allows for a substantially finer resolution when identifying reads. Prior applications of phylogenetic placement has deployed only on data from extant sources. Therefore, it is unclear how the aDNA damage affects phylogenetic placement's applicability to aDNA data. To investigate how aDNA damage affects placement accuracy, we re-implemented a statistical model of aDNA damage. We deploy this model, along with a modified version of the existing assessment pipeline PEWO, to seven empirical datasets with four leading tools: APPLES, EPA-NG, pplacer, and RAPPAS. We explore the aDNA damage parameter space via a grid search in order to identify the aDNA damage factors that exhibit the largest impact on placement accuracy. 
We find that the frequency of DNA backbone nicks (and consequently read length) has the, by far, largest impact on aDNA read placement accuracy, and that other factors, such as misincorporations, have a negligible effect on overall placement accuracy.</p>","PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":11.0000,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular biology and evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/molbev/msaf006","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
Abstract
A common problem when analyzing ancient DNA (aDNA) data is to identify the species that corresponds to the recovered aDNA sequence(s). The standard approach is to deploy sequence-similarity-based tools, such as BLAST. However, as aDNA reads may frequently stem from taxa that are unsampled due to extinction, it is likely that there is no exact match in any database. As a consequence, these tools may not be able to accurately place such reads in a phylogenetic context. Phylogenetic placement is a technique where a read is placed onto a specific branch of a phylogenetic reference tree, which allows for a substantially finer resolution when identifying reads. Prior applications of phylogenetic placement have used only data from extant sources. It is therefore unclear how aDNA damage affects the applicability of phylogenetic placement to aDNA data. To investigate how aDNA damage affects placement accuracy, we re-implemented a statistical model of aDNA damage. We apply this model, along with a modified version of the existing assessment pipeline PEWO, to seven empirical datasets with four leading tools: APPLES, EPA-NG, pplacer, and RAPPAS. We explore the aDNA damage parameter space via a grid search in order to identify the aDNA damage factors that exhibit the largest impact on placement accuracy. We find that the frequency of DNA backbone nicks (and consequently read length) has by far the largest impact on aDNA read placement accuracy, and that other factors, such as misincorporations, have a negligible effect on overall placement accuracy.
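To make the two damage factors named in the abstract concrete, the following is a minimal toy sketch of a damage grid search. It is not the statistical model the authors re-implemented, and it does not call PEWO or any placement tool. It assumes, purely for illustration, that nicks occur independently at each backbone position with a fixed probability and that misincorporations are uniform C-to-T substitutions. All function and parameter names (`fragment_by_nicks`, `deaminate`, `damage_grid`, `nick_rate`, `deam_rate`) are hypothetical.

```python
import random


def fragment_by_nicks(seq, nick_rate, rng):
    """Split seq into fragments, cutting between adjacent bases.

    Assumption: each of the len(seq) - 1 backbone positions is nicked
    independently with probability nick_rate.
    """
    fragments, start = [], 0
    for i in range(1, len(seq)):
        if rng.random() < nick_rate:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


def deaminate(read, deam_rate, rng):
    """Turn each C into T with probability deam_rate.

    Assumption: a uniform rate along the read, which is a deliberate
    simplification of real misincorporation patterns.
    """
    return "".join(
        "T" if base == "C" and rng.random() < deam_rate else base
        for base in read
    )


def damage_grid(seq, nick_rates, deam_rates, seed=0):
    """Simulate damaged reads for every (nick_rate, deam_rate) pair."""
    results = {}
    for nick_rate in nick_rates:
        for deam_rate in deam_rates:
            rng = random.Random(seed)  # same seed per cell for comparability
            fragments = fragment_by_nicks(seq, nick_rate, rng)
            results[(nick_rate, deam_rate)] = [
                deaminate(fragment, deam_rate, rng) for fragment in fragments
            ]
    return results


if __name__ == "__main__":
    reference = "ACGTCCGATTACAGGCTTACCGA" * 20
    grid = damage_grid(reference, [0.01, 0.05, 0.2], [0.0, 0.1])
    for (nick_rate, deam_rate), reads in sorted(grid.items()):
        mean_length = sum(len(r) for r in reads) / len(reads)
        print(nick_rate, deam_rate, len(reads), round(mean_length, 1))
```

In this sketch a higher nick probability yields more and shorter reads, which is the mechanism by which nick frequency controls read length. In an actual evaluation, each grid cell's reads would be passed to a placement tool and scored against the known origin of the reads.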
About the journal:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.