2016 IEEE 32nd International Conference on Data Engineering (ICDE)最新文献

筛选
英文 中文
Leveraging non-volatile memory for instant restarts of in-memory database systems 利用非易失性内存立即重新启动内存中的数据库系统
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498351
David Schwalb, Martin Faust, Markus Dreseler, Pedro Flemming, H. Plattner
{"title":"Leveraging non-volatile memory for instant restarts of in-memory database systems","authors":"David Schwalb, Martin Faust, Markus Dreseler, Pedro Flemming, H. Plattner","doi":"10.1109/ICDE.2016.7498351","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498351","url":null,"abstract":"Emerging non-volatile memory technologies (NVM) offer fast and byte-addressable access, allowing to rethink the durability mechanisms of in-memory databases. Hyrise-NV is a database storage engine that maintains table and index structures on NVM. Our architecture updates the database state and index structures transactionally consistent on NVM using multi-version data structures, allowing to instantly recover data-bases independent of their size. In this paper, we demonstrate the instant restart capabilities of Hyrise-NV, storing all data on non-volatile memory. Recovering a dataset of size 92.2 GB takes about 53 seconds using our log-based approach, whereas Hyrise-NV recovers in under one second.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"1386-1389"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88599417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Joint repairs for web wrappers 织网机的接缝修补
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498320
Stefano Ortona, G. Orsi, Tim Furche, Marcello Buoncristiano
{"title":"Joint repairs for web wrappers","authors":"Stefano Ortona, G. Orsi, Tim Furche, Marcello Buoncristiano","doi":"10.1109/ICDE.2016.7498320","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498320","url":null,"abstract":"Automated web scraping is a popular means for acquiring data from the web. Scrapers (or wrappers) are derived from either manually or automatically annotated examples, often resulting in under/over segmented data, together with missing or spurious content. Automatic repair and maintenance of the extracted data is thus a necessary complement to automatic wrapper generation. Moreover, the extracted data is often the result of a long-term data acquisition effort and thus jointly repairing wrappers together with the generated data reduces future needs for data cleaning. We study the problem of computing joint repairs for XPath-based wrappers and their extracted data. We show that the problem is NP-complete in general but becomes tractable under a few natural assumptions. Even tractable solutions to the problem are still impractical on very large datasets, but we propose an optimal approximation that proves effective across a wide variety of domains and sources. Our approach relies on encoded domain knowledge, but require no per-source supervision. An evaluation spanning more than 100k web pages from 100 different sites of a wide variety of application domains, shows that joint repairs are able to increase the quality of wrappers between 15% and 60% independently of the wrapper generation system, eliminating all errors in more than 50% of the cases.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"191 1","pages":"1146-1157"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74461790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
pSCAN: Fast and exact structural graph clustering pSCAN:快速和精确的结构图聚类
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498245
Lijun Chang, Wei Li, Xuemin Lin, Lu Qin, W. Zhang
{"title":"pSCAN: Fast and exact structural graph clustering","authors":"Lijun Chang, Wei Li, Xuemin Lin, Lu Qin, W. Zhang","doi":"10.1109/ICDE.2016.7498245","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498245","url":null,"abstract":"In this paper, we study the problem of structural graph clustering, a fundamental problem in managing and analyzing graph data. Given a large graph G = (V, E), structural graph clustering is to assign vertices in V to clusters and to identify the sets of hub vertices and outlier vertices as well, such that vertices in the same cluster are densely connected to each other while vertices in different clusters are loosely connected to each other. Firstly, we prove that the existing SCAN approach is worst-case optimal. Nevertheless, it is still not scalable to large graphs due to exhaustively computing structural similarity for every pair of adjacent vertices. Secondly, we make three observations about structural graph clustering, which present opportunities for further optimization. Based on these observations, in this paper we develop a new two-step paradigm for scalable structural graph clustering. Thirdly, following this paradigm, we present a new approach aiming to reduce the number of structural similarity computations. Moreover, we propose optimization techniques to speed up checking whether two vertices are structure-similar to each other. Finally, we conduct extensive performance studies on large real and synthetic graphs, which demonstrate that our new approach outperforms the state-of-the-art approaches by over one order of magnitude. Noticeably, for the twitter graph with 1 billion edges, our approach takes 25 minutes while the state-of-the-art approach cannot finish even after 24 hours.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"12 1","pages":"253-264"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77053800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 72
A column store engine for real-time streaming analytics 用于实时流分析的列存储引擎
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498332
Alex Skidanov, Anders J. Papito, A. Prout
{"title":"A column store engine for real-time streaming analytics","authors":"Alex Skidanov, Anders J. Papito, A. Prout","doi":"10.1109/ICDE.2016.7498332","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498332","url":null,"abstract":"This paper describes novel aspects of the column store implemented in the MemSQL database engine and describes the design choices made to support real-time streaming workloads. Column stores have traditionally been restricted to data warehouse scenarios where low latency queries are a secondary goal, and where restricting data ingestion to be offline, batched, append-only, or some combination thereof is acceptable. In contrast, the MemSQL column store implementation treats low latency queries and ongoing writes as first class citizens, with a focus on avoiding interference between read, ingest, update, and storage optimization workloads through the use of fragmented snapshot transactions and optimistic storage reordering. This implementation broadens the range of serviceable column store workloads to include those with more stringent demands on query and data latency, such as those backing operational systems used by adtech, financial services, fraud detection and other real-time or data streaming applications.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"139 1","pages":"1287-1297"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79913183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings 动态生成有效的近似子序列映射的变长签名
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498238
Jongik Kim, Chen Li, Xiaohui Xie
{"title":"Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings","authors":"Jongik Kim, Chen Li, Xiaohui Xie","doi":"10.1109/ICDE.2016.7498238","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498238","url":null,"abstract":"Recent advances in DNA sequencing have enabled a flood of sequencing-based applications for studying biology and medicine. A key requirement of these applications is to rapidly and accurately map DNA subsequences to a reference genome. This DNA subsequence mapping problem shares core technical challenges with the similarity query processing problem studied in the database research literature. To solve this problem, existing techniques first extract signatures from a query, then retrieve candidate mapping positions from an index using the extracted signatures, and finally verify the candidate positions. The efficiency of these techniques depends critically on signatures selected from queries, while signature selection relies on an indexing scheme of a reference genome. The q-gram inverted indexing, one of the most widely used indexing schemes, can discover candidate positions quickly, but has the limitation that signatures of queries are restricted to fixed-length q-grams. To address the problem, we propose a flexible way to generate variable-length signatures using a fixed-length q-gram index. The proposed technique groups a few q-grams into a variable-length signature, and generates candidate positions for the variable-length signature using the inverted lists of the q-grams. We also propose a novel dynamic programming algorithm to balance between the filtering power of signatures and the overhead of generating candidate positions for the signatures. Through extensive experiments on both simulated and real genomic data, we show that our technique substantially improves the performance of read mapping in terms of both mapping speed and accuracy.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"26 1","pages":"169-180"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81746037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Context-aware advertisement recommendation for high-speed social news feeding 基于上下文感知的广告推荐,支持高速社交新闻推送
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498266
Yuchen Li, Dongxiang Zhang, Ziquan Lan, K. Tan
{"title":"Context-aware advertisement recommendation for high-speed social news feeding","authors":"Yuchen Li, Dongxiang Zhang, Ziquan Lan, K. Tan","doi":"10.1109/ICDE.2016.7498266","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498266","url":null,"abstract":"Social media advertising is a multi-billion dollar market and has become the major revenue source for Facebook and Twitter. To deliver ads to potentially interested users, these social network platforms learn a prediction model for each user based on their personal interests. However, as user interests often evolve slowly, the user may end up receiving repetitive ads. In this paper, we propose a context-aware advertising framework that takes into account the relatively static personal interests as well as the dynamic news feed from friends to drive growth in the ad click-through rate. To meet the real-time requirement, we first propose an online retrieval strategy that finds k most relevant ads matching the dynamic context when a read operation is triggered. To avoid frequent retrieval when the context varies little, we propose a safe region method to quickly determine whether the top-k ads of a user are changed. Finally, we propose a hybrid model to combine the merits of both methods by analyzing the dynamism of news feed to determine an appropriate retrieval strategy. Extensive experiments conducted on multiple real social networks and ad datasets verified the efficiency and robustness of our hybrid model.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"43 1","pages":"505-516"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86511555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 35
Accelerating database workloads by software-hardware-system co-design 通过软件-硬件系统协同设计加速数据库工作负载
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498362
R. Bordawekar, Mohammad Sadoghi
{"title":"Accelerating database workloads by software-hardware-system co-design","authors":"R. Bordawekar, Mohammad Sadoghi","doi":"10.1109/ICDE.2016.7498362","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498362","url":null,"abstract":"The key objective of this tutorial is to provide a broad, yet an in-depth survey of the emerging field of co-designing software, hardware, and systems components for accelerating enterprise data management workloads. The overall goal of this tutorial is two-fold. First, we provide a concise system-level characterization of different types of data management technologies, namely, the relational and NoSQL databases and data stream management systems from the perspective of analytical workloads. Using the characterization, we discuss opportunities for accelerating key data management workloads using software and hardware approaches. Second, we dive deeper into the hardware acceleration opportunities using Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs) for the query execution pipeline. Furthermore, we explore other hardware acceleration mechanisms such as single-instruction multiple-data (SIMD) that enables short-vector data parallelism.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"17 1","pages":"1428-1431"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87190999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Answering why-not questions on metric probabilistic range queries 回答关于度量概率范围查询的why-not问题
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498288
Lu Chen, Yunjun Gao, Kai Wang, Christian S. Jensen, Gang Chen
{"title":"Answering why-not questions on metric probabilistic range queries","authors":"Lu Chen, Yunjun Gao, Kai Wang, Christian S. Jensen, Gang Chen","doi":"10.1109/ICDE.2016.7498288","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498288","url":null,"abstract":"Metric probabilistic range queries (MPRQ) have received substantial attention due to their utility in multimedia and text retrieval, decision making, etc. Existing MPRQ studies generally aim to improve query efficiency and resource usage. In contrast, we define and offer solutions to why-not questions on MPRQ. Given an original metric probabilistic range query and a why-not set W of uncertain objects that are absent from the query result, a why-not question on MPRQ explains why the uncertain objects in W do not appear in the query result, and provides refinements of the original query and/or W with the minimal penalty, so that the uncertain objects in W appear in the result of the refined query. Specifically, we propose a framework that consists of three efficient solutions, one that modifies the original query, one that modifies the why-not set, and one that modifies both the original query and the why-not set. Extensive experiments using both real and synthetic data sets offer insights into the properties of the proposed algorithms, and show that they are effective and efficient.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"57 1","pages":"767-778"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88035175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
Edge classification in networks 网络中的边缘分类
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498311
C. Aggarwal, Gewen He, Peixiang Zhao
{"title":"Edge classification in networks","authors":"C. Aggarwal, Gewen He, Peixiang Zhao","doi":"10.1109/ICDE.2016.7498311","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498311","url":null,"abstract":"We consider in this paper the edge classification problem in networks, which is defined as follows. Given a graph-structured network G(N, A), where N is a set of vertices and A ⊆ N ×N is a set of edges, in which a subset Al ⊆ A of edges are properly labeled a priori, determine for those edges in Au = AAl the edge labels which are unknown. The edge classification problem has numerous applications in graph mining and social network analysis, such as relationship discovery, categorization, and recommendation. Although the vertex classification problem has been well known and extensively explored in networks, edge classification is relatively unknown and in an urgent need for careful studies. In this paper, we present a series of efficient, neighborhood-based algorithms to perform edge classification in networks. To make the proposed algorithms scalable in large-scale networks, which can be either disk-resident or streamlike, we further devise efficient, cost-effective probabilistic edge classification methods without a significant compromise to the classification accuracy. We carry out experimental studies in a series of real-world networks, and the experimental results demonstrate both the effectiveness and efficiency of the proposed methods for edge classification in large networks.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"5 1","pages":"1038-1049"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85067625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
An interval join optimized for modern hardware 为现代硬件优化的间隔连接
2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498316
Danila Piatov, S. Helmer, Anton Dignös
{"title":"An interval join optimized for modern hardware","authors":"Danila Piatov, S. Helmer, Anton Dignös","doi":"10.1109/ICDE.2016.7498316","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498316","url":null,"abstract":"We develop an algorithm for efficiently joining relations on interval-based attributes with overlap predicates, which, for example, are commonly found in temporal databases. Using a new data structure and a lazy evaluation technique, we are able to achieve impressive performance gains by optimizing memory accesses exploiting features of modern CPU architectures. In an experimental evaluation with real-world datasets our algorithm is able to outperform the state-of-the-art by an order of magnitude.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"140 1","pages":"1098-1109"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85191273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 51
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信