Design of an active storage cluster file system for DAG workflows

International Symposium on Design and Implementation of Symbolic Computation Systems. Pub Date: 2013-11-18. DOI: 10.1145/2534645.2534656

P. Donnelly, D. Thain


Citations: 4

Abstract

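We present the conceptual design of Confuga, a cluster file system designed to meet the needs of DAG-structured workflows. Today's premier cluster file system, Hadoop, is commonly used to support large petascale data sets on commodity hardware and to exploit active storage through Map-Reduce, a specific workflow pattern. Unfortunately, DAG-structured workflows have very different requirements from Map-Reduce workflows: whole-file access is standard and multiple dependencies are common. Confuga will meet these new requirements by replicating files rather than striping them as Hadoop does, by exploiting DAG-structured workflow consistency semantics, and by permitting multiple dependencies in job descriptions. To the end user, Confuga will appear as a drop-in replacement for a batch system and a file system, combined into a single entity that can be invoked by existing workflow managers. In this paper, we describe the design philosophy of Confuga, sketch the major components of the system, and explain how the system will behave under expected workloads.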
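The two requirements the abstract singles out, whole-file access and jobs with multiple dependencies, can be illustrated with a small sketch. It is not Confuga's interface or scheduler, which the abstract does not specify. The job table, the replica map, the file names, the node names, and the "most inputs already local" placement rule are all hypothetical, chosen only to show how a job with several whole-file inputs might be placed next to replicated data.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job lists the whole files it reads and writes.
jobs = {
    "align":  {"inputs": ["reads.fq", "ref.fa"], "outputs": ["aligned.bam"]},
    "index":  {"inputs": ["ref.fa"], "outputs": ["ref.idx"]},
    "report": {"inputs": ["aligned.bam", "ref.idx"], "outputs": ["report.txt"]},
}

# Hypothetical replica map: file name -> nodes holding a complete copy.
initial_replicas = {
    "reads.fq": {"node2"},
    "ref.fa": {"node1", "node2"},
}

def job_order(jobs):
    """Order jobs so that every producer runs before its consumers."""
    producer = {f: name for name, job in jobs.items() for f in job["outputs"]}
    graph = {
        name: {producer[f] for f in job["inputs"] if f in producer}
        for name, job in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())

def place(job, replicas, nodes):
    """Pick the node already holding the most inputs (ties go to the first name)."""
    def local_inputs(node):
        return sum(node in replicas.get(f, ()) for f in job["inputs"])
    return max(sorted(nodes), key=local_inputs)

def schedule(jobs, replicas, nodes):
    """Return (job, node) pairs, replicating missing whole files to the chosen node."""
    replicas = {f: set(ns) for f, ns in replicas.items()}  # work on a copy
    plan = []
    for name in job_order(jobs):
        job = jobs[name]
        node = place(job, replicas, nodes)
        for f in job["inputs"] + job["outputs"]:
            # Inputs not yet local are copied whole; outputs are created locally.
            replicas.setdefault(f, set()).add(node)
        plan.append((name, node))
    return plan

if __name__ == "__main__":
    for name, node in schedule(jobs, initial_replicas, ["node1", "node2"]):
        print(name, "->", node)
```

Because files are kept whole on each node rather than striped, a job such as `report` that depends on two files needs only complete copies of both on one node before it can run there.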



