Blending SQL and NewSQL Approaches: Reference Architectures for Enterprise Big Data Challenges

2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery. Pub Date: 2013-10-10. DOI: 10.1109/CyberC.2013.34

K. Doshi, Tao Zhong, Zhongyan Lu, Xi Tang, Ting Lou, Gang Deng


Citations: 19

Abstract

As it becomes ever more pervasively engaged in data-driven commerce, a modern enterprise becomes increasingly dependent upon reliable, high-speed transaction services. At the same time, it aspires to capitalize upon large inflows of information to draw timely business insights and improve business results. These two imperatives are frequently in conflict because they call for widely divergent strategies. The need to bolster on-line transaction processing generally drives a business towards a small cluster of high-end servers running a mature, ACID-compliant SQL relational database. High-throughput analytics on massive and growing volumes of data instead favors very large clusters running non-traditional (NoSQL/NewSQL) databases that employ softer consistency protocols for performance and availability. This paper describes an approach that addresses both imperatives by blending the two types of data processing (scale-up and scale-out). It breaks down the data growth that enterprises experience into three classes (Chronological, Horizontal, and Vertical) and picks out a different approach to blending SQL and NewSQL platforms for each class. To simplify application logic that must work with both types of data platforms, the paper describes two new capabilities: (a) a data integrator that quickly sifts out updates that happen in an RDBMS and funnels them into a NewSQL database, and (b) extensions to the Hibernate-OGM framework that reduce the programming sophistication required to integrate HBase and Hive back ends with application logic designed for relational front ends. Finally, the paper details several instances in which these approaches have been applied in real-world settings, at a number of software vendors with whom the authors have collaborated on the design, implementation, and deployment of blended solutions.
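The abstract does not say how the data integrator detects changes, so the following is only a minimal sketch of one common way to "sift out" updates from a relational store and "funnel" them elsewhere: polling against a last-seen timestamp (a watermark). Everything named here is an assumption made for illustration: the `orders` table, its `updated_at` column, the `sift_updates` and `funnel` helpers, the in-memory SQLite database standing in for the RDBMS, and the plain dictionary standing in for the NewSQL target. None of it should be read as the authors' design.

```python
import sqlite3


def sift_updates(conn, watermark):
    """Return rows changed after `watermark`, plus the advanced watermark.

    Assumes (hypothetically) that every write bumps `updated_at`.
    A strict '>' comparison can miss rows sharing the watermark value;
    a real integrator would need a tie-breaker or a change log.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


def funnel(rows, store):
    """Upsert changed rows into the stand-in scale-out store, keyed by id."""
    for row_id, amount, updated_at in rows:
        store[row_id] = {"amount": amount, "updated_at": updated_at}


def demo():
    # In-memory SQLite stands in for the transactional RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, 100), (2, 25.5, 101)],
    )

    store = {}      # stand-in for the NewSQL / scale-out database
    watermark = 0   # nothing has been copied yet

    # First pass copies both rows.
    rows, watermark = sift_updates(conn, watermark)
    funnel(rows, store)

    # A later transactional update touches only row 1.
    conn.execute(
        "UPDATE orders SET amount = 12.0, updated_at = 102 WHERE id = 1"
    )

    # Second pass picks up only the changed row.
    rows, watermark = sift_updates(conn, watermark)
    funnel(rows, store)
    return store, watermark, len(rows)
```

Calling `demo()` runs two passes. The second pass moves only the single row that changed after the first one, which is the property that keeps such an integrator cheap compared with a full reload.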



