Comparative performance analysis of a Big Data NORA problem on a variety of architectures

2013 International Conference on Collaboration Technologies and Systems (CTS), Pub Date: 2013-05-20, DOI: 10.1109/CTS.2013.6567199

P. Kogge, D. Bayliss

Citations: 8

Abstract

Non Obvious Relationship Analysis (NORA) is one of the most stressing classes of Big Data Analytics problems. This paper proposes a reference NORA problem that is representative of real problems and can rationally scale to very large sizes. It then develops a highly concurrent implementation that can run on large systems. Each step of this implementation is sized in terms of how much of four different resources (CPU, memory, disk, and network) it might use. From this, a parameterized model projecting both execution time and resource utilization is used to identify the "tall poles" in performance. The parameters are then modified to represent several different target systems, ranging from a large cluster typical of today to variations on an advanced architecture in which processing has been moved into memory. A "thought experiment" then uses this model to discover the parameters of a system that would provide both a near-100X speedup and a balanced design in which no resource is badly over- or under-utilized.
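The abstract does not give the model's equations, so the following is only a minimal sketch of what a parameterized time-and-utilization model of this kind might look like. It assumes (1) each step's demand on CPU, memory, disk, and network is divided by that resource's rate, (2) resources within a step overlap perfectly, so the step takes as long as its slowest resource (its "tall pole"), and (3) steps run sequentially. The `Step` class, the `project` and `speedup` functions, and all numbers are hypothetical illustrations, not the paper's model or data.

```python
from dataclasses import dataclass

# The four resources the abstract says each step is sized against.
RESOURCES = ("cpu", "memory", "disk", "network")


@dataclass
class Step:
    name: str
    # Hypothetical work per resource, in units matching `rates`
    # (e.g. operations for cpu, bytes for memory/disk/network).
    demand: dict[str, float]


def project(steps, rates):
    """Project run time, per-step bottleneck, and per-resource utilization.

    Assumes the resources within a step overlap perfectly, so a step takes
    as long as its slowest resource, and steps run one after another.
    """
    total = 0.0
    busy = {r: 0.0 for r in RESOURCES}
    per_step = []
    for step in steps:
        times = {r: step.demand.get(r, 0.0) / rates[r] for r in RESOURCES}
        pole = max(times, key=times.get)  # the "tall pole" for this step
        per_step.append((step.name, times[pole], pole))
        total += times[pole]
        for r in RESOURCES:
            busy[r] += times[r]
    # Fraction of total run time each resource spends busy.
    util = {r: (busy[r] / total if total else 0.0) for r in RESOURCES}
    return total, per_step, util


def speedup(steps, base_rates, new_rates):
    """Ratio of projected run time on a baseline system to a modified one."""
    return project(steps, base_rates)[0] / project(steps, new_rates)[0]


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the paper.
    steps = [
        Step("scan", {"disk": 1e12, "cpu": 1e12}),
        Step("join", {"network": 5e11, "memory": 1e12, "cpu": 2e12}),
    ]
    base = {"cpu": 1e11, "memory": 1e11, "disk": 1e9, "network": 1e9}
    total, per_step, util = project(steps, base)
    print(total, per_step)  # disk, then network, dominate
    faster_io = dict(base, disk=1e10, network=1e10)
    print(speedup(steps, base, faster_io))
```

In this toy setup, raising the rate of the tall-pole resources shortens the projected run time until some other resource becomes the bottleneck; low utilization on the remaining resources signals an unbalanced design, which is the kind of trade-off the abstract's thought experiment explores.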


