NanoMnT: an STR analysis tool for Oxford Nanopore sequencing data driven by a comprehensive analysis of error profile in STR regions.

Impact factor 11.8 · Zone 2 (Biology) · JCR Q1 (Multidisciplinary Sciences)
Gyumin Park, Hyunsu An, Han Luo, Jihwan Park
GigaScience, vol. 14 (2025). Published 2025-01-06. DOI: 10.1093/gigascience/giaf013. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11912559/pdf/

Abstract

Oxford Nanopore Technology (ONT) sequencing is a third-generation sequencing technology that enables cost-effective long-read sequencing and has broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its application to short tandem repeat (STR) research. To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that some STR loci are more suitable for Nanopore sequencing than others. Base quality scores of substitution errors within STR regions were lower than those of correctly sequenced bases, but no such pattern was observed for indel errors. Furthermore, using the most recent basecaller version and the super-accuracy model significantly improved STR sequencing accuracy. Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele sizes and outperforms existing tools on STRs with 1-bp and 2-bp repeat units. By integrating our findings, we improved STR allele estimation accuracy for A×10 repeats from 55% to 78%, and to 85% when loci with unfavorable flanking sequences were excluded. We further demonstrate the utility of our findings by using NanoMnT to determine microsatellite instability status in cancer sequencing data. NanoMnT is publicly available at https://github.com/18parkky/NanoMnT.
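The abstract does not describe how NanoMnT estimates allele sizes. The Python sketch below illustrates one generic flank-anchored approach and is not NanoMnT's algorithm or API. The approach works in three steps:

1. Locate the left and right flanks of a locus in each read.
2. Convert the length of the segment between them into a repeat-unit count.
3. Take the most common count across reads as the allele estimate.

The function names, flank sequences, and reads are hypothetical. Real data would also need alignment, strand handling, and tolerance for errors inside the flanks. Because the count is derived from segment length, substitutions inside the repeat leave it unchanged, while insertions and deletions shift it. This is why the estimate uses the distribution over many reads rather than a single read.

```python
from collections import Counter
from typing import List, Optional, Tuple


def repeat_count(read: str, left_flank: str, right_flank: str, unit: str) -> Optional[int]:
    """Hypothetical helper: repeat-unit count between two flanks, or None if not spanned.

    Assumes the read is on the same strand as the flanks and that each flank occurs
    once. The segment length is divided by the unit length and rounded with Python's
    round(), so only indels (not substitutions) change the result.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)  # search only after the left flank
    if end == -1:
        return None
    return round((end - start) / len(unit))


def estimate_allele(reads: List[str], left_flank: str, right_flank: str,
                    unit: str) -> Tuple[Optional[int], List[int]]:
    """Hypothetical helper: (modal repeat count, per-read counts) for one locus."""
    counts: List[int] = []
    for read in reads:
        c = repeat_count(read, left_flank, right_flank, unit)
        if c is not None:  # skip reads that do not span both flanks
            counts.append(c)
    if not counts:
        return None, counts
    # Counter.most_common breaks ties by first occurrence.
    return Counter(counts).most_common(1)[0][0], counts


# Illustrative reads around a hypothetical A x 10 locus.
reads = [
    "TTGATTC" + "A" * 10 + "CTGAGTT",  # correct length
    "GATTC" + "A" * 9 + "CTGAG",       # one-base deletion in the homopolymer
    "GATTC" + "A" * 10 + "CTGAG",      # correct length
    "CCCCCCCC",                        # does not span the locus; skipped
]
allele, counts = estimate_allele(reads, "GATTC", "CTGAG", "A")
print(allele, counts)  # 10 [10, 9, 10]
```

A single modal call assumes one allele. A heterozygous locus, or a tumor sample with microsatellite instability, may show more than one peak in `counts`.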

Source journal: GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
Journal description: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.