HQAlign: Aligning nanopore reads for SV detection using current-level modeling

Dhaivat Joshi, Suhas Diggavi, Mark J. P. Chaisson, Sreeram Kannan
arXiv: 2301.03834 (https://doi.org/arxiv-2301.03834)
Source: arXiv - CS - Other Computer Science
Published: 2023-01-10

Abstract

Motivation: Detecting structural variants (SVs) from the alignment of sample DNA reads to a reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, together with accurate alignment of those reads, play an important role in identifying novel SVs. Long-read technologies such as nanopore sequencing address this problem by providing very long reads, but their high error rates make accurate alignment challenging. Many of the errors induced by nanopore sequencing are biased by the physics of the sequencing process, and properly exploiting these error characteristics can play an important role in designing a robust aligner for SV detection. In this paper, we design and evaluate HQAlign, an aligner for SV detection using nanopore-sequenced reads. The key ideas of HQAlign are (i) using basecalled nanopore reads together with the nanopore physics to improve alignments for SVs, (ii) incorporating SV-specific changes into the alignment pipeline, and (iii) adapting these into an existing state-of-the-art long-read aligner pipeline, minimap2 (v2.24), for efficient alignment.

Results: We show that, across different datasets, HQAlign captures about 4%-6% complementary SVs that are missed by minimap2 alignments, while its standalone performance is on par with minimap2 on real nanopore read data. For the SV calls common to HQAlign and minimap2, HQAlign improves the start and end breakpoint accuracy for about 10%-50% of SVs across different datasets. Moreover, HQAlign improves the alignment rate to 89.35% from minimap2's 85.64% when aligning nanopore reads to the recent telomere-to-telomere CHM13 assembly, and to 86.65% from 83.48% when aligning nanopore reads to the GRCh37 human genome.
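The alignment-rate figures quoted above can be put side by side with a few lines of arithmetic. The following is a minimal sketch, not part of HQAlign: the dictionary layout and both helper functions are hypothetical names introduced here for illustration, and the second helper assumes that both aligners were run on the same read set, which the abstract implies but does not state in detail.

```python
# Alignment rates (%) as quoted in the abstract.
# The dictionary layout and helper names are illustrative assumptions.
RATES = {
    "T2T CHM13": {"minimap2": 85.64, "HQAlign": 89.35},
    "GRCh37": {"minimap2": 83.48, "HQAlign": 86.65},
}


def gain_in_points(baseline: float, improved: float) -> float:
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(improved - baseline, 2)


def unaligned_reduction(baseline: float, improved: float) -> float:
    """Share of the baseline's unaligned fraction that is recovered.

    Assumes both rates were measured on the same set of reads.
    """
    return (improved - baseline) / (100.0 - baseline)


if __name__ == "__main__":
    for reference, rates in RATES.items():
        base, new = rates["minimap2"], rates["HQAlign"]
        print(
            f"{reference}: +{gain_in_points(base, new)} points, "
            f"{unaligned_reduction(base, new):.1%} of unaligned share recovered"
        )
```

Reading the gain as a share of the previously unaligned reads, rather than only in percentage points, makes the two reference genomes easier to compare because their baselines differ.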