An Error Correction Algorithm for NGS Data

2017 28th International Workshop on Database and Expert Systems Applications (DEXA). Published 2017-08-01. DOI: 10.1109/DEXA.2017.33

M. Kchouk, J. Gibrat, M. Elloumi


Citations: 1

Abstract


The Oxford Nanopore and PacBio SMRT sequencing technologies have revolutionized the Next-Generation Sequencing (NGS) landscape by producing long reads that exceed 60 kbp, and they have helped complete many biological projects. However, long reads have a high error rate, which makes biological problems such as genome assembly harder. Error correction of long reads has therefore become a challenge for bioinformaticians and motivates the development of new error correction approaches adapted to NGS technologies. In this paper, we present a new de novo self-correction algorithm that uses only long reads. Our algorithm operates in two steps. First, we use a fast hashing method to find alignments between the longest reads and the other reads in a set of long reads. Next, we use the longest reads as seeds and obtain the final alignment of the long reads with a dynamic programming algorithm restricted to a band of width w. Unlike existing hybrid error correction algorithms, ours does not require high-quality reads.
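The abstract gives only the outline of the two steps. As a rough illustration, and not the authors' implementation, the Python sketch below assumes that the "fast hashing method" is plain exact k-mer hashing with diagonal voting, and that the banded dynamic programming step is unit-cost edit distance restricted to cells with |i - j| <= w. The function names (`pick_seeds`, `build_kmer_index`, `best_diagonals`, `banded_edit_distance`, `align_read_to_seeds`) and the parameters `k`, `min_shared` and `n_seeds` are hypothetical; only the band width `w` comes from the abstract.

```python
from collections import Counter, defaultdict

def pick_seeds(reads, n_seeds):
    """Take the n_seeds longest reads as seeds (hypothetical selection rule)."""
    return sorted(reads, key=len, reverse=True)[:n_seeds]

def build_kmer_index(seeds, k):
    """Step 1 (assumed form): hash every k-mer of every seed to (seed_id, offset)."""
    index = defaultdict(list)
    for sid, seq in enumerate(seeds):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((sid, i))
    return index

def best_diagonals(read, index, k, min_shared=2):
    """For each seed sharing k-mers with `read`, vote on the diagonal (seed offset
    minus read offset) and keep the winner if it has at least `min_shared` votes."""
    votes = defaultdict(Counter)
    for j in range(len(read) - k + 1):
        for sid, i in index.get(read[j:j + k], ()):
            votes[sid][i - j] += 1
    hits = {}
    for sid, counter in votes.items():
        diag, n = counter.most_common(1)[0]
        if n >= min_shared:
            hits[sid] = diag
    return hits

def banded_edit_distance(a, b, w):
    """Step 2 (assumed form): unit-cost edit distance filling only cells with
    |i - j| <= w. Exact whenever the true distance is <= w; None if the
    corner cell (len(a), len(b)) lies outside the band."""
    n, m = len(a), len(b)
    if abs(n - m) > w:
        return None
    INF = float("inf")
    prev = [j if j <= w else INF for j in range(m + 1)]  # row 0, out-of-band = INF
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= w:
            cur[0] = i
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # a[i-1] aligned to a gap
                         cur[j - 1] + 1)      # b[j-1] aligned to a gap
        prev = cur
    return int(prev[m])

def align_read_to_seeds(read, seeds, k=4, w=3, min_shared=2):
    """Map seed_id -> banded edit distance over the overlap suggested by hashing."""
    index = build_kmer_index(seeds, k)
    result = {}
    for sid, diag in best_diagonals(read, index, k, min_shared).items():
        r0, s0 = max(0, -diag), max(0, diag)  # overlap start in read / in seed
        span = min(len(read) - r0, len(seeds[sid]) - s0)
        # Equal-length windows keep this a global alignment; a semi-global DP
        # would avoid penalizing overhangs caused by indels near the ends.
        result[sid] = banded_edit_distance(read[r0:r0 + span],
                                           seeds[sid][s0:s0 + span], w)
    return result

seeds = pick_seeds(["ACGTACGGTCAGTTGCA", "TTTTGGGGCCCCAAAA", "ACG"], 2)
print(align_read_to_seeds("ACGGTCTGTT", seeds))  # {0: 1}
```

In the usage line, the read matches positions 4 to 13 of the first seed except for one substitution, so hashing places it on diagonal 4 and the banded alignment reports one edit; the second seed shares no 4-mers with the read and is skipped. The band keeps the cost of each alignment at O(n·w) instead of O(n·m), and it loses nothing when the reads are similar enough: an alignment with d edits never drifts more than d cells from the main diagonal, so any pair whose true distance is at most w is scored exactly.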
