HQAlign: Aligning nanopore reads for SV detection using current-level modeling
Dhaivat Joshi, Suhas Diggavi, Mark J. P. Chaisson, Sreeram Kannan
arXiv - CS - Other Computer Science, 2023-01-10. DOI: arxiv-2301.03834
Abstract
Motivation: Detection of structural variants (SV) from the alignment of
sample DNA reads to the reference genome is an important problem in
understanding human diseases. Long reads that can span repeat regions, along
with an accurate alignment of these long reads play an important role in
identifying novel SVs. Long-read technologies such as nanopore sequencing
address this problem by providing very long reads, but their high error rates
make accurate alignment challenging. Many errors induced by nanopore
sequencing are biased by the physics of the sequencing process, and
proper use of these error characteristics can play an important role in
designing a robust aligner for SV detection. In this paper, we design
and evaluate HQAlign, an aligner for SV detection using nanopore sequenced
reads. The key ideas of HQAlign are (i) using basecalled nanopore reads
along with the nanopore physics to improve alignments for SVs, (ii)
incorporating SV-specific changes into the alignment pipeline, and (iii)
adapting these into an existing state-of-the-art long-read aligner pipeline,
minimap2 (v2.24), for efficient alignment.

Results: We show that HQAlign captures about 4%-6% complementary SVs across
different datasets that are missed by minimap2 alignments, while its
standalone performance is on par with minimap2 on real nanopore read data. For
the common SV calls between HQAlign and minimap2, HQAlign improves the start
and the end breakpoint accuracy for about 10%-50% of SVs across different
datasets. Moreover, HQAlign improves the alignment rate from minimap2's
85.64% to 89.35% for nanopore reads aligned to the recent
telomere-to-telomere CHM13 assembly, and from 83.48% to 86.65% for nanopore
reads aligned to the GRCh37 human genome.
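
To put the alignment-rate figures above side by side, the following minimal sketch computes the absolute gain (in percentage points) and the relative gain for each reference. The four rates are taken from the abstract; the `rates` dictionary and the `gains` helper are illustrative names, not part of HQAlign or minimap2.

```python
# Alignment rates (percent of reads aligned) as quoted in the abstract.
rates = {
    "T2T CHM13": {"minimap2": 85.64, "HQAlign": 89.35},
    "GRCh37": {"minimap2": 83.48, "HQAlign": 86.65},
}


def gains(minimap2: float, hqalign: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration only.
    """
    absolute = hqalign - minimap2
    relative = 100.0 * absolute / minimap2
    return round(absolute, 2), round(relative, 2)


summary = {ref: gains(r["minimap2"], r["HQAlign"]) for ref, r in rates.items()}

for ref, (abs_pp, rel_pct) in summary.items():
    print(f"{ref}: +{abs_pp} percentage points ({rel_pct}% relative)")
```

The absolute gain is the difference the abstract implies directly; the relative gain is an additional derived quantity and is not reported by the authors.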