UMI-nea: a fast, robust tool for reference-free UMI deduplication and accurate quantification.

IF 5.4
Jixin Deng, Jingxiao Zhang, Song Tian, John DiCarlo, Hong Xu, Samuel J Rulli, Jonathan M Shaffer, Vikas Gupta, Toeresin Karakoyun
Journal: Bioinformatics (Oxford, England). Publication date: 2025-09-01. Publication type: Journal Article.
DOI: https://doi.org/10.1093/bioinformatics/btaf514
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453673/pdf/

Abstract

Motivation: One of the key applications of Unique Molecular Identifiers (UMIs) in high-throughput sequencing is to correct for PCR amplification bias and remove PCR duplicates, thereby improving quantification in DNA-seq and RNA-seq applications. A critical step in this process is UMI deduplication: accurately grouping error-bearing UMIs that originate from the same input molecule. However, many existing UMI deduplication tools rely on simple Hamming distance comparisons or suboptimal clustering algorithms, which often produce erroneous UMI groupings, particularly in error-prone long-read sequencing or ultra-high-depth short-read sequencing.
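
To illustrate why Hamming-only comparison can misgroup UMIs, the following minimal Python sketch compares Hamming and Levenshtein distance on a hypothetical UMI pair. The sequences and function names are assumptions made for illustration and are not taken from UMI-nea. A single deletion shifts every downstream base, so the Hamming distance is large even though only two edits separate the sequences.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance allowing substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical example: the first base is lost, and a fixed-length UMI
# extraction then picks up one extra downstream base at the end.
true_umi = "ACGTACGTAC"
observed = "CGTACGTACA"

print(hamming(true_umi, observed))      # every position is shifted
print(levenshtein(true_umi, observed))  # one deletion plus one insertion
```

Under a typical threshold of one or two mismatches, the Hamming comparison would treat these two reads as different molecules, while an indel-aware distance keeps them close.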

Results: We introduce UMI-nea, a tool that utilizes Levenshtein distance comparisons and a novel clustering approach to optimize multithreading workflows. Compared against three other indel-aware UMI deduplication tools, UMI-nea achieves more accurate UMI groupings with efficient run time. It demonstrates robust performance across diverse sequencing platforms, depths, and UMI lengths. Additionally, UMI-nea incorporates a data-guided adaptive UMI filter, further enhancing quantification accuracy.
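
The abstract does not describe UMI-nea's clustering algorithm, so the sketch below is not that method. It is a generic, single-threaded greedy scheme, shown only to make concrete what grouping UMIs by Levenshtein distance means. The function names, the `max_dist` threshold, and the read data are all hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance allowing substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def greedy_cluster(umi_counts: dict, max_dist: int = 1) -> dict:
    """Assign each UMI to the first more-abundant centroid within max_dist.

    Generic illustration only (NOT UMI-nea's algorithm): UMIs are visited
    from most to least abundant, on the assumption that error-bearing
    copies are rarer than the true UMI they derive from.
    """
    clusters = {}  # centroid UMI -> total read count
    ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, count in ordered:
        for centroid in clusters:
            if levenshtein(umi, centroid) <= max_dist:
                clusters[centroid] += count
                break
        else:
            clusters[umi] = count  # no centroid close enough: new molecule
    return clusters


# Hypothetical reads: two true molecules plus a few error-bearing copies.
reads = (["ACGTACGT"] * 50 + ["ACGTACG"] * 2 + ["ACGAACGT"] * 1
         + ["TTGCAAGC"] * 30 + ["TTGCAAGCA"] * 1)
clusters = greedy_cluster(Counter(reads))
print(clusters)
```

The number of keys in `clusters` is the deduplicated molecule count. A Hamming-based tool could not compare `ACGTACG` or `TTGCAAGCA` with their parent UMIs at all, because the lengths differ.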

Availability and implementation: UMI-nea is available on GitHub (https://github.com/Qiaseq-research/UMI-nea.git) and on Zenodo (https://doi.org/10.5281/zenodo.16745758). Sequencing data are stored at https://qiagenpublic.blob.core.windows.net/umi-nea-datasets/.

Supplementary information: Supplementary data are available at Bioinformatics online.