Big data, deep data, and the effect of system architectures on performance

2013 International Conference on Collaboration Technologies and Systems (CTS) | Pub Date: 2013-05-20 | DOI: 10.1109/CTS.2013.6567201

P. Kogge


Citations: 9

Abstract



Summary form only given. “Big Data” traditionally refers to some combination of high data volume, high velocity of change, and/or wide variety and complexity of the underlying data. Solving such problems has evolved toward paradigms like MapReduce running on large clusters of compute nodes. More recently, a growing number of “Deep Data” problems have arisen in which the relationships between objects, and not necessarily the collections of objects, are what matter, and for which the traditional implementation techniques are unsatisfactory. This talk addresses a study of a class of such “challenge problems,” first formulated by David Bayliss of LexisNexis, and their execution characteristics on both current and future architectures. The goal is to discover, to at least a first-order approximation, the tall poles preventing a speedup of their solution. A variety of architectures are considered, ranging from standard server blades in large-scale configurations, to emerging variations that leverage simpler and more energy-efficient chip sets, through systems built on 3D chip stacks, and on to new architectures designed from the ground up to “follow the links.” These architectures are considered for two variants of the problems: a traditional partitioned-data approach in which data is “pre-boiled” to provide fast response, and one that uses very large graphs in very large shared memories. The results are not necessarily intuitive: the bottlenecks in such problems are not where current systems have the bulk of their capabilities or costs, nor where obvious near-term upgrades will have major effects. Instead, it appears that only highly scalable, memory-intensive architectures offer the potential for truly major gains in application performance.
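
As a rough illustration of the distinction the abstract draws between collections of objects and relationships between objects, the toy Python sketch below contrasts a map-then-reduce aggregation over independent records with a link-following traversal from a single object. The data, the function names (`count_by_region`, `reachable`), and the hop limit are all hypothetical and are not taken from the talk. The sketch is not the challenge problems themselves and makes no claim about the architectures studied.

```python
from collections import Counter, deque

# Hypothetical toy data: a collection of records and the links between them.
records = [
    {"id": "a", "region": "north"},
    {"id": "b", "region": "north"},
    {"id": "c", "region": "south"},
    {"id": "d", "region": "south"},
]
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}


def count_by_region(recs):
    """Collection-oriented work: map each record to a key, then reduce by key."""
    mapped = (r["region"] for r in recs)  # map step: each record handled independently
    return Counter(mapped)                # reduce step: aggregate counts per key


def reachable(graph, start, max_hops):
    """Relationship-oriented work: breadth-first search along links from one object."""
    hops = {start: 0}                     # node -> number of links followed to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:        # do not expand beyond the hop limit
            continue
        for nxt in graph.get(node, []):   # each hop depends on the previous lookup
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops
```

In the first function every record can be processed independently, which is why such work partitions easily across many nodes. In the second, each step depends on the result of the previous lookup, so the access pattern is driven by the data's links. This is only meant to make the abstract's contrast concrete.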
