Efficient Integration of Containers into Scientific Workflows

Proceedings of the 9th Workshop on Scientific Cloud Computing. Pub Date: 2018-06-11. DOI: 10.1145/3217880.3217887

Kyle M. D. Sweeney, D. Thain

Citations: 12

Abstract

Containers offer a powerful way to make scientific applications portable. However, incorporating them into workflows requires careful consideration, as straightforward approaches can increase network usage and runtime. We identified three issues in this process: container composition, containerizing workers or jobs, and container image translation. To tackle composition, we classify data into three types: OS data, Read-Only data, and Working data, and define dynamic and static composition. Static composition (creating a single container for each job) leads to massive waste in sending duplicate data over the network. Dynamic composition (sending the data types separately) enables caching on worker nodes. To decide whether to run workers or jobs inside a container, we examined the costs of running inside a container. Finally, when using different container technologies simultaneously, we found it is better to convert images to the target image type before sending them than to repeat the same conversion at each job node, which wastes more time.
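The difference between static and dynamic composition can be illustrated with a small back-of-the-envelope model. This is only a sketch, not the authors' implementation: the `Job` type, the function names, and the sizes (in MB) are hypothetical, and it assumes a single worker node, that OS and Read-Only data can be cached on that node, and that Working data is unique to each job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    os_image: str   # name of the OS data this job needs
    read_only: str  # name of the shared Read-Only data
    working: str    # name of the per-job Working data


def static_transfer(jobs, sizes):
    """Static composition: every job ships one container holding all three data types."""
    return sum(sizes[j.os_image] + sizes[j.read_only] + sizes[j.working] for j in jobs)


def dynamic_transfer(jobs, sizes):
    """Dynamic composition: data types are sent separately and cached on the worker."""
    cache = set()
    total = 0
    for j in jobs:
        for item in (j.os_image, j.read_only):
            if item not in cache:  # only transfer what the worker has not seen yet
                cache.add(item)
                total += sizes[item]
        total += sizes[j.working]  # assumed unique per job, so always transferred
    return total


# Hypothetical sizes in MB; not taken from the paper.
sizes = {"os": 500, "refdb": 2000, "in0": 10, "in1": 10, "in2": 10}
jobs = [Job("os", "refdb", f"in{i}") for i in range(3)]

static_mb = static_transfer(jobs, sizes)
dynamic_mb = dynamic_transfer(jobs, sizes)
print(f"static: {static_mb} MB, dynamic: {dynamic_mb} MB")
```

Under these assumed numbers, the static approach resends the OS and Read-Only data with every job, while the dynamic approach sends each of them once and then only the per-job Working data.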


