{"title":"CMK: Enhancing Resource Usage Monitoring across Diverse Bioinformatics Workflow Management Systems","authors":"Robert Nica, Stefan Götz, Germán Moltó","doi":"10.1007/s10723-024-09777-z","DOIUrl":null,"url":null,"abstract":"<p>The increasing use of multiple Workflow Management Systems (WMS) employing various workflow languages and shared workflow repositories enhances the open-source bioinformatics ecosystem. Efficient resource utilization in these systems is crucial for keeping costs low and improving processing times, especially for large-scale bioinformatics workflows running in cloud environments. Recognizing this, our study introduces a novel reference architecture, Cloud Monitoring Kit (CMK), for a multi-platform monitoring system. Our solution is designed to generate uniform, aggregated metrics from containerized workflow tasks scheduled by different WMS. Central to the proposed solution is the use of task labeling methods, which enable convenient grouping and aggregating of metrics independent of the WMS employed. This approach builds upon existing technology, providing additional benefits of modularity and capacity to seamlessly integrate with other data processing or collection systems. We have developed and released an open-source implementation of our system, which we evaluated on Amazon Web Services (AWS) using a transcriptomics data analysis workflow executed on two scientific WMS. The findings of this study indicate that CMK provides valuable insights into resource utilization. In doing so, it paves the way for more efficient management of resources in containerized scientific workflows running in public cloud environments, and it provides a foundation for optimizing task configurations, reducing costs, and enhancing scheduling decisions. Overall, our solution addresses the immediate needs of bioinformatics workflows and offers a scalable and adaptable framework for future advancements in cloud-based scientific computing.</p>","PeriodicalId":54817,"journal":{"name":"Journal of Grid Computing","volume":"13 1","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Grid Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10723-024-09777-z","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The increasing use of multiple Workflow Management Systems (WMS) employing various workflow languages and shared workflow repositories enhances the open-source bioinformatics ecosystem. Efficient resource utilization in these systems is crucial for keeping costs low and improving processing times, especially for large-scale bioinformatics workflows running in cloud environments. Recognizing this, our study introduces a novel reference architecture, Cloud Monitoring Kit (CMK), for a multi-platform monitoring system. Our solution is designed to generate uniform, aggregated metrics from containerized workflow tasks scheduled by different WMS. Central to the proposed solution is the use of task labeling methods, which enable convenient grouping and aggregating of metrics independent of the WMS employed. This approach builds upon existing technology, providing additional benefits of modularity and capacity to seamlessly integrate with other data processing or collection systems. We have developed and released an open-source implementation of our system, which we evaluated on Amazon Web Services (AWS) using a transcriptomics data analysis workflow executed on two scientific WMS. The findings of this study indicate that CMK provides valuable insights into resource utilization. In doing so, it paves the way for more efficient management of resources in containerized scientific workflows running in public cloud environments, and it provides a foundation for optimizing task configurations, reducing costs, and enhancing scheduling decisions. Overall, our solution addresses the immediate needs of bioinformatics workflows and offers a scalable and adaptable framework for future advancements in cloud-based scientific computing.
期刊介绍:
Grid Computing is an emerging technology that enables large-scale resource sharing and coordinated problem solving within distributed, often loosely coordinated groups-what are sometimes termed "virtual organizations. By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, Grid technologies promise to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible. Similar technologies are being adopted within industry, where they serve as important building blocks for emerging service provider infrastructures.
Even though the advantages of this technology for classes of applications have been acknowledged, research in a variety of disciplines, including not only multiple domains of computer science (networking, middleware, programming, algorithms) but also application disciplines themselves, as well as such areas as sociology and economics, is needed to broaden the applicability and scope of the current body of knowledge.