SeqyClean: A Pipeline for High-throughput Sequence Data Preprocessing

I. Zhbannikov, Samuel S. Hunter, J. Foster, M. Settles
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Published: 2017-08-20
DOI: 10.1145/3107411.3107446 (https://doi.org/10.1145/3107411.3107446)
Citations: 58

Abstract

Modern high-throughput sequencing instruments produce massive amounts of data, which often contain noise in the form of sequencing errors, sequencing adaptors, and contaminating reads. This noise complicates genomics studies. Although many preprocessing tools have been developed to reduce sequence noise, many of them cannot handle data from multiple technologies, and few address more than one type of noise. We present SeqyClean, a comprehensive preprocessing software pipeline. SeqyClean effectively removes multiple sources of noise in high-throughput sequence data and, according to our tests, outperforms other available preprocessing tools. We show that preprocessing data with SeqyClean first improves both de novo genome assembly and genome mapping. We have used SeqyClean extensively in the genomics core at the Institute for Bioinformatics and Evolutionary STudies (IBEST) at the University of Idaho, so it has been validated with both test and production data. SeqyClean is available as open source software under the MIT License at http://github.com/ibest/seqyclean
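To make two of the noise types named above concrete, here is a minimal Python sketch of adaptor clipping and low-quality tail trimming on a single read. It is not SeqyClean's algorithm or interface: the function names, the exact-match adaptor search, the Phred+33 encoding, and the thresholds (`min_q`, `min_len`) are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def clip_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adaptor (hypothetical helper).

    Real tools usually tolerate mismatches and partial overlaps; this does not.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(
    seq: str, qual: str, adapter: str, min_q: int = 20, min_len: int = 5
) -> Optional[Tuple[str, str]]:
    """Clip the adaptor, trim the tail, and discard reads shorter than min_len."""
    seq, qual = clip_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual, min_q)
    if len(seq) < min_len:
        return None  # read is too short to be useful after cleaning
    return seq, qual


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"          # illustrative adaptor sequence
    read = "ACGTACGTAC" + adapter      # 10 genomic bases followed by adaptor
    quality = "IIIIIIII##" + "I" * 13  # last two genomic bases are low quality
    print(clean_read(read, quality, adapter))
```

Contaminant screening, the third noise type mentioned, would require comparing reads against reference sequences and is left out of this sketch.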