K. Doshi, Tao Zhong, Zhongyan Lu, Xi Tang, Ting Lou, Gang Deng
{"title":"Blending SQL and NewSQL Approaches: Reference Architectures for Enterprise Big Data Challenges","authors":"K. Doshi, Tao Zhong, Zhongyan Lu, Xi Tang, Ting Lou, Gang Deng","doi":"10.1109/CyberC.2013.34","DOIUrl":null,"url":null,"abstract":"As it becomes ever more pervasively engaged in data driven commerce, a modern enterprise becomes increasingly dependent upon reliable and high speed transaction services. At the same time it aspires to capitalize upon large inflows of information to draw timely business insights and improve business results. These two imperatives are frequently in conflict because of the widely divergent strategies that must be pursued: the need to bolster on-line transactional processing generally drives a business towards a small cluster of high-end servers running a mature, ACID compliant, SQL relational database, while high throughput analytics on massive and growing volumes of data favor the selection of very large clusters running non-traditional (NoSQL/NewSQL) databases that employ softer consistency protocols for performance and availability. This paper describes an approach in which the two imperatives are addressed by blending the two types (scale-up and scale-out) of data processing. It breaks down data growth that enterprises experience into three classes-Chronological, Horizontal, and Vertical, and picks out different approaches for blending SQL and NewSQL platforms for each class. To simplify application logic that must comprehend both types of data platforms, the paper describes two new capabilities: (a) a data integrator to quickly sift out updates that happen in an RDBMS and funnel them into a NewSQL database, and (b) extensions to the Hibernate-OGM framework that reduce the programming sophistication required for integrating HBase and Hive back ends with application logic designed for relational front ends. Finally the paper details several instances in which these approaches have been applied in real-world, at a number of software vendors with whom the authors have collaborated on design, implementation and deployment of blended solutions.","PeriodicalId":133756,"journal":{"name":"2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CyberC.2013.34","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19
Abstract
As it becomes ever more pervasively engaged in data driven commerce, a modern enterprise becomes increasingly dependent upon reliable and high speed transaction services. At the same time it aspires to capitalize upon large inflows of information to draw timely business insights and improve business results. These two imperatives are frequently in conflict because of the widely divergent strategies that must be pursued: the need to bolster on-line transactional processing generally drives a business towards a small cluster of high-end servers running a mature, ACID compliant, SQL relational database, while high throughput analytics on massive and growing volumes of data favor the selection of very large clusters running non-traditional (NoSQL/NewSQL) databases that employ softer consistency protocols for performance and availability. This paper describes an approach in which the two imperatives are addressed by blending the two types (scale-up and scale-out) of data processing. It breaks down data growth that enterprises experience into three classes-Chronological, Horizontal, and Vertical, and picks out different approaches for blending SQL and NewSQL platforms for each class. To simplify application logic that must comprehend both types of data platforms, the paper describes two new capabilities: (a) a data integrator to quickly sift out updates that happen in an RDBMS and funnel them into a NewSQL database, and (b) extensions to the Hibernate-OGM framework that reduce the programming sophistication required for integrating HBase and Hive back ends with application logic designed for relational front ends. Finally the paper details several instances in which these approaches have been applied in real-world, at a number of software vendors with whom the authors have collaborated on design, implementation and deployment of blended solutions.