Closha 2.0: a bio-workflow design system for massive genome data analysis on high performance cluster infrastructure.

IF 2.9 · Zone 3 (Biology) · Q2 Biochemical Research Methods

BMC Bioinformatics 25(1):353 · Pub Date: 2024-11-12 · DOI: 10.1186/s12859-024-05963-8

Gunhwan Ko, Pan-Gyu Kim, Byung-Ha Yoon, JaeHee Kim, Wangho Song, IkSu Byeon, JongCheol Yoon, Byungwook Lee, Young-Kuk Kim

Abstract

Background: The explosive growth of next-generation sequencing (NGS) data has produced ultra-large-scale datasets and significant computational challenges. As the cost of NGS has decreased, the amount of genomic data has surged globally. However, the cost and complexity of the required computational resources remain substantial barriers to leveraging big data. A promising solution to these computational challenges is cloud computing, which provides researchers with the necessary CPUs, memory, storage, and software tools.

Results: Here, we present Closha 2.0, a cloud computing service that offers a user-friendly platform for analyzing massive genomic datasets. Closha 2.0 is designed to provide a cloud-based environment that enables all genomic researchers, including those with limited or no programming experience, to easily analyze their genomic data. Version 2.0 of Closha has more user-friendly features than version 1.0. First, the workbench features a script editor that supports Python, R, and shell script programming, enabling users to write scripts and integrate them into their pipelines. This functionality is particularly useful for downstream analysis. Second, Closha 2.0 runs on containers, which execute each tool in an independent environment. This provides a stable environment and prevents dependency issues and version conflicts among tools. Third, users can execute each step of a pipeline individually, allowing them to test applications at each stage and adjust parameters to achieve the desired results. We also updated GBox, a high-speed data transmission tool that facilitates the rapid transfer of large datasets.

Conclusions: The analysis pipelines on Closha 2.0 are reproducible, with all analysis parameters and inputs being permanently recorded. Closha 2.0 simplifies multi-step analysis with drag-and-drop functionality and provides a user-friendly interface for genomic scientists to obtain accurate results from NGS data. Closha 2.0 is freely available at https://www.kobic.re.kr/closha2 .
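As an illustration only, the two ideas in the abstract (running each pipeline step individually while adjusting parameters, and recording every parameter and input so a run can be reproduced) can be sketched in a few lines of Python. This is not Closha 2.0 code and does not use its API: the `Step` and `Pipeline` classes, the `trim_reads` and `gc_content` functions, and the record layout are all hypothetical names chosen for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One pipeline stage: a function plus its keyword parameters (hypothetical)."""
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    """Toy pipeline that logs parameters and an input digest for every run."""
    steps: list
    records: list = field(default_factory=list)

    def run_step(self, index: int, data: Any) -> Any:
        """Run a single step on its own and record what it was run with."""
        step = self.steps[index]
        digest = hashlib.sha256(
            json.dumps(data, sort_keys=True).encode("utf-8")
        ).hexdigest()
        result = step.func(data, **step.params)
        self.records.append(
            {"step": step.name, "params": dict(step.params), "input_sha256": digest}
        )
        return result

    def run_all(self, data: Any) -> Any:
        """Run every step in order, feeding each output to the next step."""
        for index in range(len(self.steps)):
            data = self.run_step(index, data)
        return data


def trim_reads(reads, min_length):
    """Keep only reads that are at least min_length bases long."""
    return [r for r in reads if len(r) >= min_length]


def gc_content(reads, precision):
    """Fraction of G and C bases in each read, rounded to `precision` digits."""
    return [round((r.count("G") + r.count("C")) / len(r), precision) for r in reads]


pipeline = Pipeline(
    steps=[
        Step("trim_reads", trim_reads, {"min_length": 4}),
        Step("gc_content", gc_content, {"precision": 2}),
    ]
)

reads = ["ACGT", "GG", "GGCCAT"]
trimmed = pipeline.run_step(0, reads)  # test the first step in isolation
result = pipeline.run_all(reads)       # then run the whole pipeline
print(trimmed, result)
print(json.dumps(pipeline.records, indent=2))
```

Each entry in `pipeline.records` stores the step name, the parameters used, and a SHA-256 digest of the input, which is one simple way to make a run auditable. A real system would persist these records and isolate each tool in its own container, which this sketch does not attempt.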

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.