MadLINQ: large-scale distributed matrix computation for the cloud

Proceedings of the Eleventh European Conference on Computer Systems. Pub Date: 2012-04-10. DOI: 10.1145/2168836.2168857

Zhengping Qian, Xiuwei Chen, Nanxi Kang, Mingcheng Chen, Yuan Yu, T. Moscibroda, Zheng Zhang

Pages: 197-210

Citations: 69

Abstract

The computation core of many data-intensive applications can be best expressed as matrix computations. The MadLINQ project addresses the following two important research problems: the need for a highly scalable, efficient and fault-tolerant matrix computation system that is also easy to program, and the seamless integration of such specialized execution engines in a general purpose data-parallel computing system. MadLINQ exposes a unified programming model to both matrix algorithm and application developers. Matrix algorithms are expressed as sequential programs operating on tiles (i.e., sub-matrices). For application developers, MadLINQ provides a distributed matrix computation library for .NET languages. Via the LINQ technology, MadLINQ also seamlessly integrates with DryadLINQ, a data-parallel computing system focusing on relational algebra. The system automatically handles the parallelization and distributed execution of programs on a large cluster. It outperforms current state-of-the-art systems by employing two key techniques, both of which are enabled by the matrix abstraction: exploiting extra parallelism using fine-grained pipelining and efficient on-demand failure recovery using a distributed fault-tolerant execution engine. We describe the design and implementation of MadLINQ and evaluate system performance using several real-world applications.
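To make the idea of "sequential programs operating on tiles" concrete, here is a minimal sketch of a tiled matrix multiplication in plain Python. It is an illustration only: the function names (`split_into_tiles`, `tile_multiply_add`, `tiled_matmul`, `assemble`) are hypothetical and are not MadLINQ APIs, the matrices are assumed to be square with a size divisible by the tile size, and nothing here is distributed. The point is only that the outer loops are written sequentially over tiles, so each tile-level operation is a natural candidate unit for a runtime to schedule in parallel.

```python
# Illustrative sketch only: these names are hypothetical, not MadLINQ APIs.
# Assumes square matrices whose size is divisible by the tile size.

def split_into_tiles(matrix, tile):
    """Split a square matrix (list of rows) into a grid of tile x tile blocks."""
    grid = len(matrix) // tile
    return [
        [
            [row[j * tile:(j + 1) * tile] for row in matrix[i * tile:(i + 1) * tile]]
            for j in range(grid)
        ]
        for i in range(grid)
    ]


def tile_multiply_add(c, a, b):
    """Update one output tile in place: c += a * b (dense tile product)."""
    size = len(a)
    for i in range(size):
        for k in range(size):
            a_ik = a[i][k]
            for j in range(size):
                c[i][j] += a_ik * b[k][j]


def tiled_matmul(a_tiles, b_tiles, tile):
    """Sequential program over tiles: C[i][j] += A[i][k] * B[k][j]."""
    grid = len(a_tiles)
    c_tiles = [
        [[[0] * tile for _ in range(tile)] for _ in range(grid)]
        for _ in range(grid)
    ]
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                # Each iteration touches three tiles; the dependencies between
                # iterations are what a scheduler would have to respect.
                tile_multiply_add(c_tiles[i][j], a_tiles[i][k], b_tiles[k][j])
    return c_tiles


def assemble(tiles):
    """Stitch a grid of tiles back into a single matrix (list of rows)."""
    rows = []
    for tile_row in tiles:
        for r in range(len(tile_row[0])):
            rows.append([value for t in tile_row for value in t[r]])
    return rows
```

A caller would split both operands, multiply at the tile level, and reassemble, for example `assemble(tiled_matmul(split_into_tiles(a, 2), split_into_tiles(b, 2), 2))` for 4x4 inputs `a` and `b`.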
