2014 IEEE 30th International Conference on Data Engineering — Latest Publications

Rethinking main memory OLTP recovery
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816685
Nirmesh Malviya, Ariel Weisberg, S. Madden, M. Stonebraker
Abstract: Fine-grained, record-oriented write-ahead logging, as exemplified by systems like ARIES, has been the gold standard for relational database recovery. In this paper, we show that in modern high-throughput transaction processing systems, this is no longer the optimal way to recover a database system. In particular, as transaction throughputs get higher, ARIES-style logging starts to represent a non-trivial fraction of the overall transaction execution time. We propose a lighter-weight, coarse-grained command logging technique which only records the transactions that were executed on the database. It then does recovery by starting from a transactionally consistent checkpoint and replaying the commands in the log as if they were new transactions. By avoiding the overhead of fine-grained logging of before and after images (both the CPU cost and the substantial associated I/O), command logging can yield significantly higher throughput at run-time. Recovery times for command logging are higher compared to an ARIES-style physiological logging approach, but with the advent of high-availability techniques that can mask the outage of a recovering node, recovery speed has become secondary in importance to run-time performance for most applications. We evaluated our approach on an implementation of TPC-C in a main-memory database system (VoltDB), and found that command logging can offer 1.5× higher throughput than a main-memory-optimized implementation of ARIES-style physiological logging.
Citations: 135
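The contrast between record-level logging and command logging can be made concrete with a small sketch. The class below is a hypothetical, in-memory illustration of the idea described in the abstract (log only the transaction commands, recover by replaying them from a transactionally consistent checkpoint); it is not VoltDB's actual implementation, and all names are invented for illustration.

```python
import copy

class CommandLoggedDB:
    """Sketch of coarse-grained command logging: instead of logging
    per-record before/after images (ARIES-style), only the transaction
    commands themselves are appended to the log."""

    def __init__(self):
        self.state = {}        # in-memory table: key -> value
        self.log = []          # command log: (proc_name, args) per transaction
        self.checkpoint = {}   # last transactionally consistent snapshot

    def execute(self, proc_name, *args):
        self._apply(proc_name, args)
        self.log.append((proc_name, args))   # one small log record per txn

    def _apply(self, proc_name, args):
        if proc_name == "put":
            key, value = args
            self.state[key] = value
        elif proc_name == "incr":
            key, delta = args
            self.state[key] = self.state.get(key, 0) + delta

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []          # records before the checkpoint can be truncated

    def recover(self):
        # Start from the checkpoint and replay logged commands
        # as if they were new transactions.
        self.state = copy.deepcopy(self.checkpoint)
        for proc_name, args in self.log:
            self._apply(proc_name, args)
```

Note that each log record is tiny (a procedure name and its arguments) regardless of how many records the transaction touches, which is the source of the run-time savings; the price is that recovery must re-execute the transaction logic.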
Contract & Expand: I/O Efficient SCCs Computing
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816652
Zhiwei Zhang, Lu Qin, J. Yu
Abstract: As an important branch of big data processing, big graph processing has become increasingly popular in recent years. Strongly connected component (SCC) computation is a fundamental graph operation on directed graphs, where an SCC is a maximal subgraph S of a directed graph G in which every pair of nodes is reachable from each other in S. By contracting each SCC into a node, a large general directed graph can be represented by a small directed acyclic graph (DAG). In the literature, there are I/O efficient semi-external algorithms to compute all SCCs of a graph G, assuming that all nodes of G can fit in main memory. However, many real graphs are so large that even their nodes cannot reside entirely in main memory. In this paper, we study new I/O efficient external algorithms to find all SCCs for a directed graph G whose nodes cannot fit entirely in main memory. To overcome the deficiencies of the existing external contraction-based approach, which usually cannot stop in a finite number of iterations, and the external DFS-based approach, which generates a large number of random I/Os, we explore a new contraction-expansion based approach. In the graph contraction phase, instead of contracting the whole graph as the contraction-based approach does, we contract only selected nodes of the graph. The contraction phase stops when all nodes of the graph fit in main memory, at which point the semi-external algorithm can be used for SCC computation. In the graph expansion phase, as the graph is expanded in the reverse order of contraction, the SCCs of all nodes in the graph are computed. Both the contraction phase and the expansion phase use only I/O efficient sequential scans and external sorts of nodes/edges in the graph. Our algorithm leverages the efficiency of the semi-external SCC computation algorithm and usually stops in a small number of iterations. We further optimize our approach by reducing the number of nodes and edges of the contracted graph in each iteration. We conduct extensive experimental studies using both real and synthetic web-scale graphs to confirm the I/O efficiency of our approaches.
Citations: 5
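The SCC-and-contract primitive the paper builds on can be illustrated in memory. The sketch below uses Kosaraju's classical two-pass algorithm to find all SCCs of a small directed graph; it is a fully in-memory stand-in for the operation that Contract & Expand computes I/O-efficiently when even the node set does not fit in memory, not the paper's external algorithm.

```python
def sccs(graph):
    """Kosaraju's two-pass algorithm: return the strongly connected
    components of a directed graph given as {node: [successors]}."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}

    # Pass 1: record nodes in order of DFS finish time (iterative DFS).
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:                      # all successors explored
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, in reverse finish order;
    # each tree found is one SCC.
    rev = {}
    for u, succs in graph.items():
        for v in succs:
            rev.setdefault(v, []).append(u)
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, stack = [], [root]
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in rev.get(u, ()):
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(comp)
    return components
```

Contracting each returned component into a single node then yields the DAG representation the abstract mentions.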
Query optimization of distributed pattern matching
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816640
Jiewen Huang, K. Venkatraman, D. Abadi
Abstract: Greedy algorithms for subgraph pattern matching operations are often sufficient when the graph data set can be held in memory on a single machine. However, as graph data sets increasingly expand and require external storage and partitioning across a cluster of machines, more sophisticated query optimization techniques become critical to avoid explosions in query latency. In this paper, we introduce several query optimization techniques for distributed graph pattern matching. These techniques include (1) a System-R style dynamic programming-based optimization algorithm that considers both linear and bushy plans, (2) a cycle detection-based algorithm that leverages cycles to reduce intermediate result set sizes, and (3) a computation reusing technique that eliminates redundant query execution and data transfer over the network. Experimental results show that these algorithms can lead to an order of magnitude improvement in query performance.
Citations: 38
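A System-R style dynamic program over relation subsets, including bushy plans, can be sketched as follows. This is a toy illustration under an assumed cost model (minimize the total cardinality of intermediate results, estimated from per-pair join selectivities), not the paper's optimizer; all names are invented.

```python
from itertools import combinations

def best_plan(sizes, selectivity):
    """System-R style DP over relation subsets, considering bushy plans:
    the best plan for a set of relations is the cheapest way to combine
    the best plans for two disjoint subsets.
    sizes: {rel: cardinality}; selectivity: {(a, b): factor}, symmetric,
    defaulting to 1.0 (cross product) for unlisted pairs."""
    rels = sorted(sizes)

    def sel(a, b):
        return selectivity.get((a, b), selectivity.get((b, a), 1.0))

    def card(subset):
        # Estimated cardinality: product of sizes and pairwise selectivities.
        c = 1.0
        for r in subset:
            c *= sizes[r]
        for a, b in combinations(sorted(subset), 2):
            c *= sel(a, b)
        return c

    best = {frozenset([r]): (0.0, r) for r in rels}   # base scans cost nothing here
    for n in range(2, len(rels) + 1):
        for combo in combinations(rels, n):
            s = frozenset(combo)
            out = card(s)
            for k in range(1, n // 2 + 1):            # all splits, up to symmetry
                for left in combinations(combo, k):
                    l = frozenset(left)
                    r = s - l
                    cost = best[l][0] + best[r][0] + out
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (best[l][1], best[r][1]))
    return best[frozenset(rels)]
```

Because every subset keeps only its cheapest plan, the search is exponential in the number of relations but far smaller than enumerating all plan trees.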
Leveraging metadata for identifying local, robust multi-variate temporal (RMT) features
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816667
Xiaolan Wang, K. Candan, M. Sapino
Abstract: Many applications generate and/or consume multi-variate temporal data, yet experts often lack the means to adequately and systematically search for and interpret multi-variate observations. In this paper, we first observe that multi-variate time series often carry localized multi-variate temporal features that are robust against noise. We then argue that these multi-variate temporal features can be extracted by simultaneously considering, at multiple scales, temporal characteristics of the time series along with external knowledge, including variate relationships, known a priori. Relying on these observations, we develop algorithms to detect robust multi-variate temporal (RMT) features which can be indexed for efficient and accurate retrieval and can be used for supporting analysis tasks, such as classification. Experiments confirm that the proposed RMT algorithm is highly effective and efficient in identifying robust multi-scale temporal features of multi-variate time series.
Citations: 11
DBDesigner: A customizable physical design tool for Vertica Analytic Database
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816725
R. Varadarajan, V. Bharathan, A. Cary, J. Dave, Sreenath Bodagala
Abstract: In this paper, we present Vertica's customizable physical design tool, called the DBDesigner (DBD), that produces designs optimized for various scenarios and applications. For a given workload and space budget, DBD automatically recommends a physical design that optimizes query performance, storage footprint, fault tolerance and recovery to meet different customer requirements. Vertica is a distributed, massively parallel columnar database that physically organizes data into projections. Projections are attribute subsets from one or more tables, with tuples sorted by one or more attributes, that are replicated or segmented (distributed) on cluster nodes. The key challenges involved in projection design are picking appropriate column sets, sort orders, cluster data distributions and column encodings. To achieve the desired trade-off between query performance and storage footprint, DBD operates under three different design policies: (a) load-optimized, (b) query-optimized or (c) balanced. These policies indirectly control the number of projections proposed and queries optimized to achieve the desired balance. To cater to query workloads that evolve over time, DBD also operates in comprehensive and incremental design modes. In addition, DBD lets users override specific features of projection design based on their intimate knowledge about the data and query workloads. We present the complete physical design algorithm, describing in detail how projection candidates are efficiently explored and evaluated using the optimizer's cost and benefit model. Our experimental results show that DBD produces good physical designs that satisfy a variety of customer use cases.
Citations: 20
SLICE: Reviving regions-based pruning for reverse k nearest neighbors queries
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816698
Shiyu Yang, M. A. Cheema, Xuemin Lin, Ying Zhang
Abstract: Given a set of facilities and a set of users, a reverse k nearest neighbors (RkNN) query q returns every user for which the query facility is one of the k closest facilities. Due to its importance, the RkNN query has received significant research attention in the past few years. Almost all of the existing techniques adopt a pruning-and-verification framework. Regions-based pruning and half-space pruning are the two most notable pruning strategies. The half-space based approach prunes a larger area and is generally believed to be superior. Influenced by this perception, almost all existing RkNN algorithms utilize and improve the half-space pruning strategy. We observe the weaknesses and strengths of both strategies and discover that regions-based pruning has certain strengths that have not been exploited in the past. Motivated by this, we present a new RkNN algorithm called SLICE that utilizes the strengths of regions-based pruning and overcomes its limitations. Our extensive experimental study on synthetic and real data sets demonstrates that SLICE is significantly more efficient than the existing algorithms. We also provide a detailed theoretical analysis of various aspects of our algorithm, such as the I/O cost, the unpruned area, and the cost of its verification phase. The experimental study validates our theoretical analysis.
Citations: 37
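The RkNN definition itself has a straightforward brute-force implementation, which is useful as a correctness oracle when testing pruning-based algorithms such as SLICE. The sketch below assumes 2D points and that the query facility is passed separately from the remaining facilities; it performs no pruning at all.

```python
import math

def rknn(query, facilities, users, k):
    """Brute-force reverse k-nearest-neighbors: return every user for whom
    `query` is among its k closest facilities.  `facilities` holds the
    facilities other than the query facility; all points are 2D tuples.
    O(|users| * |facilities|) distance computations, no pruning."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    result = []
    for u in users:
        d_query = dist(u, query)
        # Count facilities strictly closer to u than the query facility is.
        closer = sum(1 for f in facilities if dist(u, f) < d_query)
        if closer < k:   # fewer than k facilities beat the query: u qualifies
            result.append(u)
    return result
```

The point of SLICE and its predecessors is precisely to avoid this quadratic work by pruning regions of space that cannot contain any answer.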
Keyword-based correlated network computation over large social media
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816657
Jianxin Li, Chengfei Liu, Md. Saiful Islam
Abstract: Recent years have witnessed an unprecedented proliferation of social media, e.g., millions of blog posts, micro-blog posts, and social networks on the Internet. This kind of social media data can be modeled as a large graph where nodes represent entities and edges represent relationships between entities of the social media. Discovering keyword-based correlated networks in these large graphs is an important primitive in data analysis, allowing users to focus on the information they care about in the large graph. In this paper, we propose and define the problem of keyword-based correlated network computation over a massive graph. To do this, we first present a novel tree data structure that maintains only the shortest path between any two graph nodes, by which the massive graph can be equivalently transformed into a tree for addressing our proposed problem. After that, we design efficient algorithms to build the transformed tree data structure from a graph offline and to compute the γ-bounded keyword matched subgraphs based on the pre-built tree data structure on the fly. To further improve efficiency, we propose weighted shingle-based approximation approaches to measure the correlation among a large number of γ-bounded keyword matched subgraphs. Finally, we develop a merge-sort based approach to efficiently generate the correlated networks. Our extensive experiments demonstrate the efficiency of our algorithms in reducing time and space cost. The experimental results also justify the effectiveness of our method in discovering correlated networks from three real datasets.
Citations: 16
Finding common ground among experts' opinions on data clustering: With applications in malware analysis
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816636
Guanhua Yan
Abstract: Data clustering is a basic technique for knowledge discovery and data mining. As the volume of data grows significantly, data clustering becomes computationally prohibitive and resource demanding, and sometimes it is necessary to outsource these tasks to third-party experts who specialize in data clustering. The goal of this work is to develop techniques that find common ground among experts' opinions on data clustering, which may be biased due to the features or algorithms used in clustering. Our work differs from the large body of existing approaches to consensus clustering, as we do not require all data objects to be grouped into clusters. Rather, our work is motivated by real-world applications that demand high confidence in how data objects - if they are selected - are grouped together. We formulate the problem rigorously and show that it is NP-complete. We further develop a lightweight technique based on finding a maximum independent set in a 3-uniform hypergraph to select data objects that do not form conflicts among experts' opinions. We apply our proposed method to a real-world malware dataset with hundreds of thousands of instances to find malware clusters based on how multiple major AV (Anti-Virus) products classify these samples. Our work offers a new direction for consensus clustering by striking a balance between clustering quality and the number of data objects chosen to be clustered.
Citations: 5
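Since maximum independent set in a 3-uniform hypergraph is NP-hard, a simple greedy heuristic conveys the flavor of the selection step: repeatedly discard the object involved in the most unresolved conflict triples until no conflict survives intact. This is a hypothetical illustration of the hypergraph formulation, not the paper's own lightweight technique, and the function and argument names are invented.

```python
def conflict_free_objects(objects, conflicts):
    """Greedy heuristic for independent set in a conflict hypergraph.
    Each conflict is a set of objects (a hyperedge, e.g. a triple of
    objects on which the experts' clusterings disagree irreconcilably).
    Repeatedly drop the object covering the most remaining conflicts until
    every conflict has lost a member; the survivors are an independent set."""
    remaining = [frozenset(c) for c in conflicts]
    kept = set(objects)
    while remaining:
        # Count how many live conflicts each object participates in.
        counts = {}
        for conflict in remaining:
            for obj in conflict:
                counts[obj] = counts.get(obj, 0) + 1
        # Deterministic tie-break on the string form of the object.
        victim = max(counts, key=lambda o: (counts[o], str(o)))
        kept.discard(victim)
        remaining = [c for c in remaining if victim not in c]
    return kept
```

The trade-off the abstract describes is visible here: dropping objects buys conflict-freedom (clustering quality) at the cost of clustering fewer objects.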
Pay-as-you-go reconciliation in schema matching networks
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816653
Nguyen Quoc Viet Hung, T. Nguyen, Z. Miklós, K. Aberer, A. Gal, M. Weidlich
Abstract: Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, a human expert is usually required to validate the generated correspondences. We analyze this reconciliation process in a setting where a number of schemas need to be matched, in the presence of consistency expectations about the network of attribute correspondences. We develop a probabilistic model that helps to identify the most uncertain correspondences, thus allowing us to guide the expert's work and collect input about the most problematic cases. As the availability of such experts is often limited, we develop techniques that can construct a set of good-quality correspondences with high probability, even if the expert does not validate all the necessary correspondences. We demonstrate the efficiency of our techniques through extensive experimentation using real-world datasets.
Citations: 48
ADaPT: Automatic Data Personalization based on contextual preferences
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816749
A. Miele, E. Quintarelli, Emanuele Rabosio, L. Tanca
Abstract: This demo presents a framework for personalizing data access on the basis of the users' context and the preferences they show while in that context. The system is composed of (i) a server application, which "tailors" a view over the available data on the basis of the user's contextual preferences, previously inferred from log data, and (ii) a client application running on the user's mobile device, which allows the user to query the data view and collects the activity log for later mining. At each change of context detected by the system, the corresponding tailored view is loaded onto the client device; accordingly, the most relevant data is available to the user even when the connection is unstable or absent. The demo features a movie database, where users can browse data in different contexts and appreciate the personalization of the data views according to the inferred contextual preferences.
Citations: 6