Read Length Dominates Phylogenetic Placement Accuracy of Ancient DNA Reads.

Impact factor 11 | Region 1: Biology | Q1: Biochemistry & Molecular Biology

Molecular Biology and Evolution | Pub Date: 2025-01-17 | DOI: 10.1093/molbev/msaf006

Ben Bettisworth, Nikolaos Psonis, Nikos Poulakakis, Pavlos Pavlidis, Alexandros Stamatakis

Citations: 0

Abstract

A common problem when analyzing ancient DNA (aDNA) data is to identify the species that corresponds to the recovered aDNA sequence(s). The standard approach is to deploy sequence-similarity-based tools, such as BLAST. However, because aDNA reads frequently stem from taxa that are unsampled due to extinction, it is likely that there is no exact match in any database. As a consequence, these tools may not be able to accurately place such reads in a phylogenetic context. Phylogenetic placement is a technique in which a read is placed onto a specific branch of a phylogenetic reference tree, which allows for a substantially finer resolution when identifying reads. Prior applications of phylogenetic placement have used only data from extant sources. Therefore, it is unclear how aDNA damage affects the applicability of phylogenetic placement to aDNA data. To investigate how aDNA damage affects placement accuracy, we re-implemented a statistical model of aDNA damage. We apply this model, along with a modified version of the existing assessment pipeline PEWO, to seven empirical datasets with four leading tools: APPLES, EPA-NG, pplacer, and RAPPAS. We explore the aDNA damage parameter space via a grid search in order to identify the aDNA damage factors that exhibit the largest impact on placement accuracy. We find that the frequency of DNA backbone nicks (and consequently read length) has by far the largest impact on aDNA read placement accuracy, and that other factors, such as misincorporations, have a negligible effect on overall placement accuracy.
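To make the two damage factors named in the abstract concrete, the following is a minimal toy sketch in Python. It is not the authors' re-implemented model and not part of PEWO. It assumes, purely for illustration, that nicks occur independently after each base with a fixed probability (which shortens reads) and that misincorporations are independent C-to-T substitutions. The names `fragment`, `deaminate`, `simulate_damage`, and `damage_grid`, and all parameter values, are hypothetical.

```python
import itertools
import random


def fragment(seq, nick_rate, rng):
    """Cut seq after each base with probability nick_rate (toy nick model)."""
    pieces, start = [], 0
    for i in range(1, len(seq)):
        if rng.random() < nick_rate:
            pieces.append(seq[start:i])
            start = i
    pieces.append(seq[start:])
    return pieces


def deaminate(read, rate, rng):
    """Turn each C into T independently with probability rate (toy misincorporation)."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base for base in read
    )


def simulate_damage(seq, nick_rate, deam_rate, seed=0):
    """Fragment a sequence, then apply misincorporations to every fragment."""
    rng = random.Random(seed)
    return [deaminate(piece, deam_rate, rng) for piece in fragment(seq, nick_rate, rng)]


def damage_grid(nick_rates, deam_rates):
    """Enumerate every (nick_rate, deam_rate) pair for a grid search."""
    return list(itertools.product(nick_rates, deam_rates))


if __name__ == "__main__":
    reference = "ACGTCCGATACCGGTTACGC" * 10  # hypothetical 200 bp sequence
    for nick_rate, deam_rate in damage_grid([0.01, 0.05], [0.0, 0.1]):
        reads = simulate_damage(reference, nick_rate, deam_rate, seed=1)
        mean_len = sum(map(len, reads)) / len(reads)
        print(f"nick={nick_rate} deam={deam_rate} reads={len(reads)} mean_len={mean_len:.1f}")
```

In this sketch a higher nick rate yields more and shorter reads, whereas the substitution rate alters bases without changing read length. Each grid point would then be passed to a placement tool and scored, a step that is deliberately left out here.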

Source journal

Molecular Biology and Evolution (Biology: Evolutionary Biology)

CiteScore: 19.70

Self-citation rate: 3.70%

Articles published: 257

Review time: 1 month

Journal description: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.