Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning

Matthias Boehm, A. Evfimievski, B. Reinwald
{"title":"Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning","authors":"Matthias Boehm, A. Evfimievski, B. Reinwald","doi":"10.18420/btw2019-17","DOIUrl":null,"url":null,"abstract":"Cumulative aggregates are often overlooked yet important operations in large-scale machine learning (ML) systems. Examples are prefix sums and more complex aggregates, but also preprocessing techniques such as the removal of empty rows or columns. These operations are challenging to parallelize over distributed, blocked matrices—as commonly used in ML systems—due to recursive data dependencies. However, computing prefix sums is a classic example of a presumably sequential operation that can be efficiently parallelized via aggregation trees. In this paper, we describe an efficient framework for data-parallel cumulative aggregates over distributed, blocked matrices. The basic idea is a self-similar operator composed of a forward cascade that reduces the data size by orders of magnitude per iteration until the data fits in local memory, a local cumulative aggregate over the partial aggregates, and a backward cascade to produce the final result. We also generalize this framework for complex cumulative aggregates of sum-product expressions, and characterize the class of supported operations. Finally, we describe the end-to-end compiler and runtime integration into SystemML, and the use of cumulative aggregates in other operations. Our experiments show that this framework achieves both high performance for moderate data sizes and good scalability.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Datenbanksysteme für Business, Technologie und Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18420/btw2019-17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Cumulative aggregates are often overlooked yet important operations in large-scale machine learning (ML) systems. Examples include prefix sums and more complex aggregates, as well as preprocessing techniques such as the removal of empty rows or columns. These operations are challenging to parallelize over distributed, blocked matrices—as commonly used in ML systems—due to recursive data dependencies. However, computing prefix sums is a classic example of a presumably sequential operation that can be efficiently parallelized via aggregation trees. In this paper, we describe an efficient framework for data-parallel cumulative aggregates over distributed, blocked matrices. The basic idea is a self-similar operator composed of a forward cascade that reduces the data size by orders of magnitude per iteration until the data fits in local memory, a local cumulative aggregate over the partial aggregates, and a backward cascade to produce the final result. We also generalize this framework for complex cumulative aggregates of sum-product expressions, and characterize the class of supported operations. Finally, we describe the end-to-end compiler and runtime integration into SystemML, and the use of cumulative aggregates in other operations. Our experiments show that this framework achieves both high performance for moderate data sizes and good scalability.
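The three-phase operator described in the abstract (forward cascade, local cumulative aggregate, backward cascade) follows the classic block-wise prefix-sum scheme. Below is a minimal Python/NumPy sketch of one cascade level for a column-wise cumulative sum over row blocks. The list-of-arrays block representation and the function name `blocked_cumsum` are illustrative assumptions, not SystemML's actual runtime API; the paper's self-similar operator applies the forward step recursively until the per-block aggregates fit in local memory, whereas this sketch assumes one level suffices.

```python
import numpy as np

def blocked_cumsum(blocks):
    """Column-wise prefix sums over a matrix stored as a list of row blocks.

    Sketch of one cascade level, assuming the per-block aggregates fit in
    local memory; the paper's operator recurses on the aggregates otherwise.
    """
    # Forward cascade: reduce each block to a single row of column sums,
    # shrinking the data by the block size per level.
    agg = np.array([b.sum(axis=0) for b in blocks])

    # Local cumulative aggregate: an exclusive cumsum over the block
    # aggregates yields, per block, the total of all preceding rows.
    offsets = np.cumsum(agg, axis=0) - agg

    # Backward cascade: each block adds its carried-in offset to its
    # local cumsum to produce the final result.
    return [np.cumsum(b, axis=0) + off for b, off in zip(blocks, offsets)]

# Example: a 6x2 matrix in three 2-row blocks matches the sequential cumsum.
X = np.arange(12, dtype=float).reshape(6, 2)
blocks = [X[0:2], X[2:4], X[4:6]]
assert np.allclose(np.vstack(blocked_cumsum(blocks)), np.cumsum(X, axis=0))
```

Note how the forward and backward phases touch each block exactly once and are embarrassingly parallel; only the small aggregate matrix requires the sequential (or recursively cascaded) prefix-sum step.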