ICE: Managing cold state for big data applications

2016 IEEE 32nd International Conference on Data Engineering (ICDE). Publication date: 2016-05-16. DOI: 10.1109/ICDE.2016.7498262

B. Chandramouli, Justin J. Levandoski, Eli Cortez C. Vilarinho

Pages: 457-468


Abstract

The use of big data in a business revolves around a monitor-mine-manage (M3) loop: data is monitored in real time, while mined insights are used to manage the business and derive value. While mining has traditionally been performed offline, recent years have seen an increasing need to perform all phases of M3 in real time. A stream processing engine (SPE) enables such a seamless M3 loop for applications such as targeted advertising, recommender systems, risk analysis, and call-center analytics. However, these M3 applications require the SPE to maintain massive amounts of state in memory, leading to resource usage skew: memory is scarce and over-utilized, whereas CPU and I/O are under-utilized. In this paper, we propose a novel solution to scaling SPEs for memory-bound M3 applications that leverages natural access skew in data-parallel subqueries, where a small fraction of the state is hot (frequently accessed) and most state is cold (infrequently accessed). We present ICE (incremental cold-state engine), a framework that allows an SPE to seamlessly migrate cold state to secondary storage (disk or flash). ICE uses a novel architecture that exploits the semantics of individual stream operators to efficiently manage cold state in an SPE using an incremental log-structured store. We implemented ICE inside an SPE. Experiments using real data show that ICE can reduce memory usage significantly without sacrificing performance, and can sometimes even improve performance.
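The hot/cold split described in the abstract can be illustrated with a small sketch. This is not ICE's actual design, which the abstract does not detail; it is a generic illustration that assumes least-recently-used eviction as the coldness signal and a JSON-lines file as the append-only log. The class name `HotColdStore` and its parameters are hypothetical.

```python
import json
import os
import tempfile
from collections import OrderedDict


class HotColdStore:
    """Illustrative only: keeps up to `hot_capacity` entries in memory and
    spills the least recently used ones to an append-only log on disk."""

    def __init__(self, log_path, hot_capacity):
        self.hot = OrderedDict()   # key -> value, least recently used first
        self.cold_index = {}       # key -> byte offset of its record in the log
        self.hot_capacity = hot_capacity
        self.log = open(log_path, "a+b")  # writes always go to the end

    def put(self, key, value):
        # A fresh write makes any older on-disk copy of this key stale.
        self.cold_index.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as recently used
            return self.hot[key]
        if key in self.cold_index:
            # Cold hit: read the record back and promote it to memory.
            self.log.seek(self.cold_index.pop(key))
            record = json.loads(self.log.readline())
            self.put(key, record["value"])
            return record["value"]
        raise KeyError(key)

    def _evict_if_needed(self):
        while len(self.hot) > self.hot_capacity:
            key, value = self.hot.popitem(last=False)  # coldest entry
            self.log.seek(0, os.SEEK_END)
            offset = self.log.tell()
            line = json.dumps({"key": key, "value": value}).encode() + b"\n"
            self.log.write(line)
            self.log.flush()
            self.cold_index[key] = offset

    def close(self):
        self.log.close()
```

In this sketch, stale records accumulate in the log because entries are never overwritten in place; a real log-structured store would also need compaction, which is omitted here. Keys and values are assumed to be JSON-serializable.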


