Next-generation sequencing revolution through big data analytics

Q1 Biochemistry, Genetics and Molecular Biology
R. Tripathi, Pawan Sharma, P. Chakraborty, P. Varadwaj
Journal: Frontiers in Life Science
Published: 2016-04-02
DOI: 10.1080/21553769.2016.1178180 (https://doi.org/10.1080/21553769.2016.1178180)
Citations: 34

Abstract

Next-generation sequencing (NGS) technology has led to an unrivaled explosion in the amount of genomic data, and this growth has in turn raised the challenges of sharing, archiving, integrating and analyzing these data. The scale and efficiency of NGS make it difficult to analyze these vast genomic data sets, along with the associated gene interactions, annotations and expression studies. These difficulties can be overcome by tools and algorithms built on big data frameworks. Here we review the current state of knowledge of big data algorithms for NGS that reveal hidden patterns in sequencing, analysis, annotation, and so on. The Apache Hadoop framework provides an on-demand, adaptable environment for large-scale data analysis. It has several components for partitioning large-scale data across clusters of commodity hardware in a fault-tolerant manner. Packages such as MapReduce, CloudBurst, Crossbow, Myrna, Eoulsan, DistMap, Seal and Contrail support various NGS applications, such as adapter trimming, quality checking, read mapping, de novo assembly, quantification, expression analysis, variant analysis and annotation. This review covers current applications of Hadoop technology in NGS, together with their usage and limitations.
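The partition, map and reduce pattern that the abstract attributes to Hadoop can be illustrated with a small k-mer counting sketch. This is not code from the paper or from any of the tools it names. It runs in a single Python process, the in-memory list of reads stands in for data blocks spread over a cluster, and the function names (`partition`, `map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) are hypothetical ones chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def partition(reads, n_parts):
    """Split the reads into n_parts chunks (a stand-in for distributed data blocks)."""
    return [reads[i::n_parts] for i in range(n_parts)]


def map_kmers(chunk, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in every read of a chunk."""
    for read in chunk:
        for i in range(len(read) - k + 1):
            yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by their k-mer key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads, k, n_parts=2):
    """Run partition, map, shuffle and reduce in sequence on one machine."""
    chunks = partition(reads, n_parts)
    mapped = chain.from_iterable(map_kmers(chunk, k) for chunk in chunks)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT", "TTACGA"]  # toy reads, not real data
    print(count_kmers(reads, k=3))
```

Because each chunk is mapped independently and the reduce step only needs the values grouped under one key, the result does not depend on how the reads are partitioned. That independence is what allows a framework to spread the work over many machines and to rerun a failed chunk without redoing the rest.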
Source journal: Frontiers in Life Science (Multidisciplinary Sciences)
CiteScore: 5.50
Self-citation rate: 0.00%
Articles published: 0
About the journal: Frontiers in Life Science publishes high quality and innovative research at the frontier of biology with an emphasis on interdisciplinary research. We particularly encourage manuscripts that lie at the interface of the life sciences and either the more quantitative sciences (including chemistry, physics, mathematics, and informatics) or the social sciences (philosophy, anthropology, sociology and epistemology). We believe that these various disciplines can all contribute to biological research and provide original insights to the most recurrent questions.