An Error Correction Algorithm for NGS Data

2017 28th International Workshop on Database and Expert Systems Applications (DEXA). Pub Date: 2017-08-01. DOI: 10.1109/DEXA.2017.33

M. Kchouk, J. Gibrat, M. Elloumi

Citations: 1

Abstract

The Oxford Nanopore and PacBio SMRT sequencing technologies have revolutionized the Next-Generation Sequencing (NGS) landscape by producing long reads that exceed 60 kbp, and they have helped complete many biological projects. However, long reads are characterized by a high error rate, which makes biological problems such as genome assembly more difficult. Error correction of long reads has therefore become a challenge for bioinformaticians, and it motivates the development of new error correction approaches adapted to NGS technologies. In this paper, we present a new de novo self-error-correction algorithm that uses only long reads. Our algorithm operates in two steps. First, we use a fast hashing method to find alignments between the longest reads and the other reads in a set of long reads. Next, we use the longest reads as seeds and obtain the final alignment of the long reads with a dynamic programming algorithm restricted to a band of width w. In contrast to existing hybrid error correction algorithms, our algorithm does not require high-quality reads.
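The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hashing scheme or the scoring, so the sketch assumes a plain k-mer index for the first step and a unit-cost edit distance for the banded dynamic programming in the second. All function names and the parameters `k`, `min_hits`, and `w` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map every k-mer to the (read_id, offset) positions where it occurs."""
    index = defaultdict(list)
    for rid, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((rid, i))
    return index


def candidate_reads(seed_id, reads, index, k, min_hits=2):
    """Step 1 (assumed form): count k-mer hits shared by the seed and each other read."""
    hits = defaultdict(int)
    seed = reads[seed_id]
    for i in range(len(seed) - k + 1):
        for rid, _offset in index.get(seed[i:i + k], ()):
            if rid != seed_id:
                hits[rid] += 1
    # Keep only reads that share enough k-mers to be worth aligning.
    return {rid: n for rid, n in hits.items() if n >= min_hits}


def banded_edit_distance(a, b, w):
    """Step 2 (assumed scoring): edit distance using only cells with |i - j| <= w.

    Returns None when the final cell lies outside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > w:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = {j: j for j in range(min(m, w) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - w), min(m, i + w) + 1):
            if j == 0:
                cur[j] = i
                continue
            substitute = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])
            delete = prev.get(j, inf) + 1
            insert = cur.get(j - 1, inf) + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "CGTACGTT", "GGGGGGGG"]
    seed_id = max(range(len(reads)), key=lambda r: len(reads[r]))  # longest read as seed
    index = build_kmer_index(reads, k=4)
    print(candidate_reads(seed_id, reads, index, k=4))
    print(banded_edit_distance("ACGTACGT", "ACGTTCGT", w=2))
```

Restricting the dynamic programming to a band reduces the work per alignment from O(nm) cells to roughly O(nw), which matters when reads are tens of kilobases long. The band is only safe when the hashing step has already placed the two reads close to the correct diagonal.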


