Niloy J. Mukherjee, S. Chavan, Maria Colgan, M. Gleeson, Xiaoming He, Allison L. Holloway, J. Kamp, Kartik Kulkarni, T. Lahiri, Juan R. Loaiza, N. MacNaughton, Atrayee Mullick, S. Muthulingam, V. Raja, Raunak Rungta
DOI: 10.1109/ICDE.2016.7498333 (https://doi.org/10.1109/ICDE.2016.7498333)
Published in: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 1298-1309
Publication date: 2016-05-16
Platform: Semanticscholar
Citations: 8
Fault-tolerant real-time analytics with distributed Oracle Database In-memory
Modern data management systems must support a new breed of OLTAP applications. These applications demand real-time analytical insight over massive data volumes, not only in dedicated data warehouses but also in live mainstream production environments where data is continuously ingested and modified. Oracle introduced the Database In-memory Option (DBIM) in 2014 as a unique dual row-and-column format architecture aimed at the emerging space of mixed OLTAP applications as well as traditional OLAP workloads. The architecture maintains the row format and the column format simultaneously, with strict transactional consistency. The row format is persisted in underlying storage, whereas the column format is maintained purely in memory and incurs no additional logging overhead for OLTP. Maintaining columnar data purely in memory creates the need for a distributed data management architecture. On a single-server architecture, analytic performance regresses severely during server failures, because recovering and rebuilding terabytes of in-memory columnar data takes non-trivial time. A distributed, distribution-aware architecture is therefore necessary to keep the columnar format highly available in real time, so that in-memory analytic queries execute without glitches across server failures and additions; it also scales out capacity and compute to meet real-time throughput requirements over large volumes of in-memory data. In this paper, we present the high-availability aspects of the distributed architecture of Oracle DBIM: an extremely scaled-out, application-transparent column format duplication mechanism, distributed query execution on the duplicated in-memory columnar format, and several scenarios of fault-tolerant analytic query execution over the in-memory column format at various stages of columnar data redistribution during cluster topology changes.
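The idea of duplicating in-memory columnar data across servers so that analytic scans survive a server failure can be illustrated with a toy model. The sketch below is not Oracle's mechanism: the `Cluster` class, the round-robin placement, the `copies` parameter, and the node names are all hypothetical, chosen only to show how a scan plan can be rerouted to a surviving copy.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster (hypothetical): each columnar chunk lives in memory on `copies` nodes."""
    nodes: list[str]
    copies: int = 2
    placement: dict[int, list[str]] = field(default_factory=dict)

    def populate(self, chunk_ids):
        # Assumed policy: chunk i is placed on `copies` consecutive nodes, round-robin.
        n = len(self.nodes)
        for i in chunk_ids:
            self.placement[i] = [self.nodes[(i + k) % n] for k in range(self.copies)]

    def plan_scan(self, failed=frozenset()):
        """Map each chunk to a live node that holds it.

        None means no in-memory copy survives; a real system would then have
        to read that data from the persisted row format instead.
        """
        plan = {}
        for chunk, holders in self.placement.items():
            live = [h for h in holders if h not in failed]
            plan[chunk] = live[0] if live else None
        return plan


cluster = Cluster(nodes=["n0", "n1", "n2"], copies=2)
cluster.populate(range(6))

healthy = cluster.plan_scan()                 # every chunk served by its first holder
degraded = cluster.plan_scan(failed={"n1"})   # chunks on n1 fall over to their second holder
```

With two copies per chunk, losing any single node still leaves every chunk scannable from memory; losing two nodes that hold the same chunk leaves that chunk without an in-memory copy in this model.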