2016 IEEE 32nd International Conference on Data Engineering (ICDE): latest papers, page 10

Ranking support for matched patterns over complex event streams: The CEPR system

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498343

Jiaqi Gu, Jin Wang, C. Zaniolo

Citations: 7

HAWK: Hardware support for unstructured log processing

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498263

Prateek Tandon, Faissal M. Sleiman, Michael J. Cafarella, T. Wenisch

Pages: 469-480

Abstract: Rapidly processing high-velocity text data is critical for many technical and business applications. Widely used software solutions for processing these large text corpora target disk-resident data and rely on pre-computed indexes and large clusters to achieve high performance. However, greater capacity and falling costs are enabling a shift to RAM-resident data sets. The enormous bandwidth of RAM can facilitate scan operations that are competitive with pre-computed indexes for interactive, ad hoc queries. However, software approaches for processing these large text corpora fall far short of saturating available bandwidth and meeting peak scan rates possible on modern memory systems. In this paper, we present HAWK, a hardware accelerator for ad hoc queries against large in-memory logs. HAWK comprises a stall-free hardware pipeline that scans input data at a constant rate, examining multiple input characters in parallel during a single accelerator clock cycle. We describe a 1 GHz, 32-character-wide HAWK design targeting ASIC implementation, designed to process data at 32 GB/s (up to two orders of magnitude faster than software solutions), and demonstrate a scaled-down FPGA prototype that operates at 100 MHz with 4-wide parallelism, which processes at 400 MB/s (13× faster than software grep for large multi-pattern scans).

Citations: 22
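
The throughput figures in the HAWK abstract follow from simple arithmetic: a pipeline that never stalls consumes a fixed number of characters on every clock cycle. The sketch below reproduces the two quoted rates. It assumes one byte per character, which the abstract does not state explicitly, and the function name `scan_throughput` is made up for illustration.

```python
def scan_throughput(clock_hz: int, chars_per_cycle: int, bytes_per_char: int = 1) -> int:
    """Bytes scanned per second by a pipeline that never stalls.

    Assumption: each input character occupies `bytes_per_char` bytes
    (1 by default); the abstract does not spell this out.
    """
    return clock_hz * chars_per_cycle * bytes_per_char


# Figures quoted in the abstract.
asic_rate = scan_throughput(1_000_000_000, 32)  # 1 GHz, 32 characters wide
fpga_rate = scan_throughput(100_000_000, 4)     # 100 MHz, 4 characters wide

print(asic_rate / 1e9, "GB/s")  # 32.0 GB/s
print(fpga_rate / 1e6, "MB/s")  # 400.0 MB/s
```

Under these assumptions the ASIC design and the FPGA prototype differ by a factor of 80 in raw scan rate: 10× from the clock and 8× from the pipeline width.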

Virtual lightweight snapshots for consistent analytics in NoSQL stores

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498334

F. Chirigati, Jérôme Siméon, Martin Hirzel, J. Freire

Citations: 7

FastFunction: Replacing a herd of lemmings with a cheetah a ruby framework for interaction with PostgreSQL databases

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498331

Henrietta Dombrovskaya, Srivathsava Rangarajan, Jonathan Marks

Citations: 1

A model-based approach for text clustering with outlier detection

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498276

Jianhua Yin, Jianyong Wang

Citations: 59

Fault-tolerant real-time analytics with distributed Oracle Database In-memory

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498333

Niloy J. Mukherjee, S. Chavan, Maria Colgan, M. Gleeson, Xiaoming He, Allison L. Holloway, J. Kamp, Kartik Kulkarni, T. Lahiri, Juan R. Loaiza, N. MacNaughton, Atrayee Mullick, S. Muthulingam, V. Raja, Raunak Rungta

Pages: 1298-1309

Abstract: Modern data management systems are required to address new breeds of OLTAP applications. These applications demand real time analytical insights over massive data volumes not only on dedicated data warehouses but also on live mainstream production environments where data gets continuously ingested and modified. Oracle introduced the Database In-memory Option (DBIM) in 2014 as a unique dual row and column format architecture aimed to address the emerging space of mixed OLTAP applications along with traditional OLAP workloads. The architecture allows both the row format and the column format to be maintained simultaneously with strict transactional consistency. While the row format is persisted in underlying storage, the column format is maintained purely in-memory without incurring additional logging overheads in OLTP. Maintenance of columnar data purely in memory creates the need for distributed data management architectures. Performance of analytics incurs severe regressions in single server architectures during server failures as it takes non-trivial time to recover and rebuild terabytes of in-memory columnar format. A distributed and distribution aware architecture therefore becomes necessary to provide real time high availability of the columnar format for glitch-free in-memory analytic query execution across server failures and additions, besides providing scale out of capacity and compute to address real time throughput requirements over large volumes of in-memory data. In this paper, we will present the high availability aspects of the distributed architecture of Oracle DBIM that includes extremely scaled out application transparent column format duplication mechanism, distributed query execution on duplicated in-memory columnar format, and several scenarios of fault tolerant analytic query execution across the in-memory column format at various stages of redistribution of columnar data during cluster topology changes.

Citations: 8

“Told you I didn't like it”: Exploiting uninteresting items for effective collaborative filtering

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498253

Won-Seok Hwang, J. Parc, Sang-Wook Kim, Jongwuk Lee, Dongwon Lee

Pages: 349-360

Abstract: We study how to improve the accuracy and running time of top-N recommendation with collaborative filtering (CF). Unlike existing works that use mostly rated items (which are only a small fraction of a rating matrix), we propose the notion of pre-use preferences of users toward a vast amount of unrated items. Using this novel notion, we effectively identify uninteresting items that were not rated yet but are likely to receive very low ratings from users, and impute them as zero. This simple-yet-novel zero-injection method applied to a set of carefully chosen uninteresting items not only addresses the sparsity problem by enriching a rating matrix but also completely prevents uninteresting items from being recommended as top-N items, thereby improving accuracy greatly. As our proposed idea is method-agnostic, it can be easily applied to a wide variety of popular CF methods. Through comprehensive experiments using the MovieLens dataset and MyMediaLite implementation, we successfully demonstrate that our solution consistently and universally improves the accuracies of popular CF methods (e.g., item-based CF, SVD-based CF, and SVD++) by 2.5 to 5 times on average. Furthermore, our approach reduces the running time of those CF methods by 1.2 to 2.3 times when its setting produces the best accuracy. The datasets and codes that we used in experiments are available at: https://goo.gl/KUrmip.

Citations: 55
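
The zero-injection idea in the abstract above can be illustrated with a small sketch: for each user, the unrated items with the lowest pre-use preference scores are treated as uninteresting and filled in with a rating of 0 before any CF method runs. The abstract does not say how pre-use preferences are inferred, so the scores below are hypothetical inputs. The `fraction` cutoff and the function name `zero_inject` are also assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Optional

Row = List[Optional[float]]  # None marks an unrated item


def zero_inject(ratings: List[Row], pre_use: List[List[float]], fraction: float) -> List[Row]:
    """Return a copy of `ratings` with uninteresting items imputed as 0.0.

    For each user, the unrated items are ranked by (hypothetical) pre-use
    preference score, and the lowest-scoring `fraction` of them are set to 0.0.
    Existing ratings are never changed.
    """
    result = [row[:] for row in ratings]
    for user, row in enumerate(ratings):
        unrated = [item for item, rating in enumerate(row) if rating is None]
        unrated.sort(key=lambda item: pre_use[user][item])  # least preferred first
        cutoff = int(len(unrated) * fraction)
        for item in unrated[:cutoff]:
            result[user][item] = 0.0
    return result


# Toy data: 2 users x 6 items; all values are invented for the example.
ratings = [
    [5.0, None, None, 3.0, None, None],
    [None, 4.0, None, None, 1.0, None],
]
pre_use = [
    [0.9, 0.1, 0.8, 0.7, 0.2, 0.6],
    [0.3, 0.9, 0.05, 0.5, 0.4, 0.95],
]
enriched = zero_inject(ratings, pre_use, fraction=0.5)
print(enriched[0])  # [5.0, 0.0, None, 3.0, 0.0, None]
print(enriched[1])  # [0.0, 4.0, 0.0, None, 1.0, None]
```

The enriched matrix is denser than the original, and an item imputed as 0 is unlikely to surface in a top-N list, which is the effect the abstract describes.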

Towards Virtual Private NoSQL datastores

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498240

Pietro Colombo, E. Ferrari

Citations: 19

Keyword-aware continuous kNN query on road networks

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498297

Bolong Zheng, Kai Zheng, Xiaokui Xiao, Han Su, Hongzhi Yin, Xiaofang Zhou, Guohui Li

Pages: 871-882

Abstract: It is nowadays quite common for road networks to have textual content on the vertices, which describes auxiliary information (e.g., business, traffic, etc.) associated with the vertex. In such road networks, which are modelled as weighted undirected graphs, each vertex is associated with one or more keywords, and each edge is assigned a weight, which can be its physical length or travelling time. In this paper, we study the problem of keyword-aware continuous k nearest neighbour (KCkNN) search on road networks, which computes the k nearest vertices that contain the query keywords issued by a moving object and maintains the results continuously as the object moves on the road network. Reducing the query processing costs in terms of computation and communication has attracted considerable attention in the database community, with interesting techniques proposed. This paper proposes a framework, called a Labelling AppRoach for Continuous kNN query (LARC), on road networks to cope with KCkNN queries efficiently. First, we build a pivot-based reverse label index and a keyword-based pivot tree index to improve the efficiency of keyword-aware k nearest neighbour (KkNN) search by avoiding massive network traversals and sequential probes of keywords. To reduce the frequency of unnecessary result updates, we develop the concepts of dominance interval and dominance region on road networks, which share a similar intuition with the safe region for processing continuous queries in Euclidean space but are more complicated and thus require more dedicated design. For high-frequency keywords, we resolve the dominance interval when the query results change. In addition, a path-based dominance updating approach is proposed to compute the dominance region efficiently when the query keywords are of low frequency. We conduct extensive experiments comparing our algorithms with state-of-the-art methods on real data sets. The empirical observations verify the superiority of our proposed solution in index size, communication cost and computation time.

Citations: 60
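
To make the KkNN problem in the abstract above concrete, the sketch below answers a single snapshot query by brute force: it runs Dijkstra's algorithm from the query vertex over a weighted undirected graph and collects the first k vertices whose keyword sets contain every query keyword. This is the kind of plain network traversal that LARC's indexes are meant to avoid, not the LARC framework itself. The graph, the keywords, and the name `keyword_knn` are invented for illustration, and requiring all query keywords to match is an assumption about the query semantics.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, int]]]  # vertex -> [(neighbour, edge weight)]


def keyword_knn(graph: Graph, keywords_of: Dict[str, Set[str]],
                source: str, query: Set[str], k: int) -> List[Tuple[str, int]]:
    """Brute-force baseline: the k nearest vertices containing all query keywords."""
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    results = []
    while heap and len(results) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        if query <= keywords_of.get(v, set()):
            results.append((v, d))  # vertices are settled in distance order
        for w, weight in graph.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return results


# Toy road network; each undirected edge is stored in both directions.
edges = [("a", "b", 2), ("a", "c", 5), ("b", "c", 1),
         ("b", "d", 4), ("c", "e", 3), ("d", "e", 1)]
graph: Graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

keywords_of = {"b": {"cafe"}, "c": {"cafe", "fuel"},
               "d": {"fuel"}, "e": {"cafe", "fuel"}}

print(keyword_knn(graph, keywords_of, "a", {"cafe"}, 2))          # [('b', 2), ('c', 3)]
print(keyword_knn(graph, keywords_of, "a", {"cafe", "fuel"}, 2))  # [('c', 3), ('e', 6)]
```

A continuous query would have to repeat this traversal as the object moves, which is the cost the dominance interval and dominance region in the abstract aim to reduce.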

Incremental updates on compressed XML

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498310

S. Böttcher, Rita Hartel, T. Jacobs, S. Maneth

Citations: 4