Title: An Error Correction Algorithm for NGS Data
Authors: M. Kchouk, J. Gibrat, M. Elloumi
Venue: 2017 28th International Workshop on Database and Expert Systems Applications (DEXA)
Publication date: 2017-08-01
DOI: 10.1109/DEXA.2017.33 (https://doi.org/10.1109/DEXA.2017.33)
Citations: 1
Abstract
The Oxford Nanopore and PacBio SMRT sequencing technologies have revolutionized the Next-Generation Sequencing (NGS) landscape by producing long reads that exceed 60 kbp, and they have contributed to the completion of many biological projects. However, long reads are characterized by a high error rate, which increases the difficulty of biological problems such as genome assembly. Error correction of long reads has therefore become a challenge for bioinformaticians, motivating the development of new error correction approaches adapted to NGS technologies. In this paper, we present a new de novo self-correction algorithm that uses only long reads. Our algorithm operates in two steps. First, we use a fast hashing method to find alignments between the longest reads and the other reads in a set of long reads. Next, we use the longest reads as seeds to obtain the final alignment of the long reads, using a dynamic programming algorithm restricted to a band of width w. Unlike existing hybrid error correction algorithms, ours does not require high-quality reads.
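The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the hashing scheme, the scoring function, or how the band is placed, so everything below is an assumption made for illustration: a plain k-mer index stands in for the "fast hashing method", a vote over diagonals picks a candidate placement of a read on a seed read, and a unit-cost edit distance stands in for the dynamic programming step, computed only for cells within w of the main diagonal. The function names `kmer_index`, `best_diagonal`, and `banded_edit_distance` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def kmer_index(seed, k):
    """Map each k-mer of the seed read to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seed) - k + 1):
        index[seed[i:i + k]].append(i)
    return index


def best_diagonal(index, read, k):
    """Return (offset, votes) for the seed-minus-read offset shared by the
    most k-mers, or None if the read shares no k-mer with the seed."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1
    if not votes:
        return None
    return max(votes.items(), key=lambda kv: kv[1])


def banded_edit_distance(a, b, w):
    """Unit-cost edit distance between a and b, filling only the cells
    with |i - j| <= w. Returns None if the end cell lies outside the band."""
    n, m = len(a), len(b)
    if abs(n - m) > w:
        return None
    inf = float("inf")
    # Row 0: aligning an empty prefix of a against b[:j] costs j insertions.
    prev = {j: j for j in range(min(m, w) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - w), min(m, i + w) + 1):
            if j == 0:
                cur[j] = i  # a[:i] against an empty prefix: i deletions
                continue
            sub = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])
            dele = prev.get(j, inf) + 1      # skip a[i - 1]
            ins = cur.get(j - 1, inf) + 1    # skip b[j - 1]
            cur[j] = min(sub, dele, ins)
        prev = cur
    return prev[m]


# Toy data (assumed, not from the paper): one seed read and one noisy read.
seed = "TTTTACGTACGGA"
read = "ACGTTCGG"  # copy of seed[4:12] with one substitution
offset, votes = best_diagonal(kmer_index(seed, 4), read, 4)
window = seed[offset:offset + len(read)]
distance = banded_edit_distance(window, read, 2)
```

Restricting the table to a band reduces the work per read pair from roughly len(a) × len(b) cells to roughly len(a) × (2w + 1) cells, at the cost of missing alignments that drift more than w positions from the chosen diagonal.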