An experimental comparison of complex object implementations for big data systems

Proceedings of the 2017 Symposium on Cloud Computing. Pub Date: 2017-09-24. DOI: 10.1145/3127479.3129248

Sourav Sikdar, Kia Teymourian, C. Jermaine


Citations: 5

Abstract

Many cloud-based data management and analytics systems support complex objects. Dataflow platforms such as Spark and Flink allow programmers to manipulate sets consisting of objects from a host programming language (often Java). Document databases such as MongoDB make use of hierarchical interchange formats, most popularly JSON, which embody a data model where individual records can themselves contain sets of records. Systems such as Dremel and AsterixDB allow complex nesting of data structures. Clearly, no system designer would expect a system that stores JSON objects as text to perform at the same level as a system based upon a custom-built physical data model. The question we ask is: How significant is the performance hit associated with choosing a particular physical implementation? Is the choice going to result in a negligible performance cost, or one that is debilitating? Unfortunately, the literature contains no scientific study of how the physical implementation of a complex object model affects system performance. Hence it is difficult for a system designer to fully understand the performance implications of such choices. This paper is an attempt to remedy that.

