Scaling up genome annotation using MAKER and work queue.

JCR quartile: Q4 (Health Professions)
Andrew Thrasher, Zachary Musgrave, Brian Kachmarck, Douglas Thain, Scott Emrich
Journal: International Journal of Bioinformatics Research and Applications
Published: 2014-01-01 (Journal Article)
DOI: 10.1504/IJBRA.2014.062994 (https://doi.org/10.1504/IJBRA.2014.062994)
Citations: 9

Abstract

Next-generation sequencing technologies have made it possible to sequence many genomes. Because demand keeps increasing and many of the required analyses are inherently parallel, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a 45x speed-up with 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation tool (MAKER) is parallelised as an MPI application. Our framework enables it to run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy, explicit data transfer, which helps overcome a major limitation of bioinformatics tools that often rely on shared file systems. Combined, our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds.
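
To put the reported speed-up in context, the short sketch below computes two standard derived quantities from the figures in the abstract (45x with 50 workers): the parallel efficiency and the Karp-Flatt experimentally determined serial fraction. Neither quantity is reported in the abstract itself; the function names are illustrative, and the calculation assumes the 45x figure is measured relative to a single-worker run.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / workers


def karp_flatt_serial_fraction(speedup: float, workers: int) -> float:
    """Karp-Flatt metric: the serial fraction implied by a measured speed-up."""
    if workers < 2:
        raise ValueError("the metric needs at least two workers")
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Figures taken from the abstract; the baseline is assumed to be one worker.
    speedup, workers = 45.0, 50
    print(f"efficiency: {parallel_efficiency(speedup, workers):.2f}")
    print(f"serial fraction: {karp_flatt_serial_fraction(speedup, workers):.4f}")
```

Under these assumptions the efficiency is 0.90, and the implied serial fraction is roughly 0.2%, which is consistent with a workload that splits into largely independent annotation tasks.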

Source journal
International Journal of Bioinformatics Research and Applications (Health Professions: Health Information Management)
CiteScore: 0.60
Self-citation rate: 0.00%
Articles published: 26
About the journal: Bioinformatics is an interdisciplinary research field that combines biology, computer science, mathematics and statistics into a broad-based field that will have profound impacts on all fields of biology. The emphasis of IJBRA is on basic bioinformatics research methods, tool development, performance evaluation and their applications in biology. IJBRA addresses the most innovative developments, research issues and solutions in bioinformatics and computational biology and their applications.
Topics covered include: Databases, bio-grid, system biology; Biomedical image processing, modelling and simulation; Bio-ontology and data mining, DNA assembly, clustering, mapping; Computational genomics/proteomics; Silico technology: computational intelligence, high performance computing; E-health, telemedicine; Gene expression, microarrays, identification, annotation; Genetic algorithms, fuzzy logic, neural networks, data visualisation; Hidden Markov models, machine learning, support vector machines; Molecular evolution, phylogeny, modelling, simulation, sequence analysis; Parallel algorithms/architectures, computational structural biology; Phylogeny reconstruction algorithms, physiome, protein structure prediction; Sequence assembly, search, alignment; Signalling/computational biomedical data engineering; Simulated annealing, statistical analysis, stochastic grammars.