2014 IEEE 30th International Conference on Data Engineering: Latest Papers, Page 7

Fast incremental SimRank on link-evolving graphs

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816660

Weiren Yu, Xuemin Lin, W. Zhang

Abstract: SimRank is an arresting measure of node-pair similarity based on hyperlinks. It iteratively follows the concept that 2 nodes are similar if they are referenced by similar nodes. Real graphs are often large, and links constantly evolve with small changes over time. This paper considers fast incremental computations of SimRank on link-evolving graphs. The prior approach [12] to this issue factorizes the graph via a singular value decomposition (SVD) first, and then incrementally maintains this factorization for link updates at the expense of exactness. Consequently, all node-pair similarities are estimated in $O(r^4 n^2)$ time on a graph of $n$ nodes, where $r$ is the target rank of the low-rank approximation, which is not negligibly small in practice. In this paper, we propose a novel fast incremental paradigm. (1) We characterize the SimRank update matrix $\Delta S$, in response to every link update, via a rank-one Sylvester matrix equation. By virtue of this, we devise a fast incremental algorithm computing similarities of $n^2$ node-pairs in $O(K n^2)$ time for $K$ iterations. (2) We also propose an effective pruning technique capturing the "affected areas" of $\Delta S$ to skip unnecessary computations, without loss of exactness. This can further accelerate the incremental SimRank computation to $O(K(nd + |AFF|))$ time, where $d$ is the average in-degree of the old graph, and $|AFF|$ ($\le n^2$) is the size of "affected areas" in $\Delta S$, and in practice, $|AFF| \ll n^2$. Our empirical evaluations verify that our algorithm (a) outperforms the best known link-update algorithm [12], and (b) runs much faster than its batch counterpart when link updates are small.
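For orientation, the recursive idea stated in the abstract (two nodes are similar if they are referenced by similar nodes) can be sketched as a naive batch iteration. This is only the classic Jeh and Widom style baseline, not the incremental algorithm of the paper. The function name `simrank`, the decay factor 0.6, and the toy graph are assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.6, iterations=10):
    """Naive batch SimRank (illustrative baseline, not the paper's method).

    in_neighbors maps each node to the list of nodes that link to it.
    Returns a dict keyed by (a, b) node pairs with similarity scores.
    """
    nodes = list(in_neighbors)
    # Start from the identity: every node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-links cannot be "referenced by similar nodes".
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then damp it.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Toy graph (assumed): u links to both a and b, so a and b share a referrer.
toy_graph = {"u": [], "a": ["u"], "b": ["u"]}
toy_scores = simrank(toy_graph)
```

Every iteration of this baseline touches all node pairs, which is the batch cost that an incremental method tries to avoid when only a few links change.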

Citations: 40

History-aware query optimization with materialized intermediate views

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816678

L. Perez, C. Jermaine

Citations: 39

Crowd-powered find algorithms

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816715

A. Sarma, Aditya G. Parameswaran, H. Garcia-Molina, A. Halevy

Citations: 60

Scalable top-k spatio-temporal term querying

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816647

Anders Skovsgaard, Darius Sidlauskas, Christian S. Jensen

Citations: 76

Managing uncertainty in spatial and spatio-temporal data

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816766

Reynold Cheng, Tobias Emrich, H. Kriegel, N. Mamoulis, M. Renz, Goce Trajcevski, Andreas Züfle

Citations: 28

Cloud service placement via subgraph matching

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816704

Bo Zong, R. Raghavendra, M. Srivatsa, Xifeng Yan, Ambuj K. Singh, Kang-Won Lee

Abstract: Fast service placement, finding a set of nodes with enough free capacity of computation, storage, and network connectivity, is a routine task in daily cloud administration. In this work, we formulate this as a subgraph matching problem. Different from the traditional setting, including approximate and probabilistic graphs, subgraph matching on data-center networks has two unique properties. (1) Node/edge labels representing vacant CPU cycles and network bandwidth change rapidly, while the network topology varies little. (2) There is a partial order on node/edge labels. Basically, one needs to place a service on nodes with enough free capacity. Existing graph indexing techniques have not considered very frequent label updates, and none of them supports partial order on numeric labels. Therefore, we resort to a new graph index framework, Gradin, to address both challenges. Gradin encodes subgraphs into multi-dimensional vectors and organizes them with indices such that it can efficiently search the matches of a query's subgraphs and combine them to form a full match. In particular, we analyze how the index parameters affect update and search performance with theoretical results. Moreover, a revised pruning algorithm is introduced to reduce unnecessary search during the combination of partial matches. Using both real and synthetic datasets, we demonstrate that Gradin outperforms the baseline approaches by up to 10 times.
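The partial order on labels described in the abstract means a host node or link matches a query node or link whenever its free capacity is at least the requested amount, rather than equal to it. The brute-force sketch below illustrates only that matching condition. It is not Gradin and uses no index. The function name `place_service`, the dictionary layout, and the example capacities are all hypothetical.

```python
from itertools import permutations


def place_service(query_nodes, query_edges, host_nodes, host_edges):
    """Brute-force placement under a 'free capacity >= demand' partial order.

    query_nodes: {service_part: required_cpu}
    query_edges: {(part1, part2): required_bandwidth}
    host_nodes:  {host: free_cpu}
    host_edges:  {(host1, host2): free_bandwidth}, treated as undirected
    Returns the first feasible {service_part: host} mapping, or None.
    """
    def free_bandwidth(h1, h2):
        # Look the link up in either direction; None means "no link".
        return host_edges.get((h1, h2), host_edges.get((h2, h1)))

    parts = list(query_nodes)
    # Try every injective assignment of service parts to hosts.
    for hosts in permutations(host_nodes, len(parts)):
        mapping = dict(zip(parts, hosts))
        # Node check: each host must have at least the required CPU.
        if any(host_nodes[mapping[p]] < query_nodes[p] for p in parts):
            continue
        # Edge check: each required link must exist with enough bandwidth.
        feasible = True
        for (p1, p2), needed in query_edges.items():
            free = free_bandwidth(mapping[p1], mapping[p2])
            if free is None or free < needed:
                feasible = False
                break
        if feasible:
            return mapping
    return None


# Hypothetical data center: three hosts and two links.
hosts = {"h1": 4, "h2": 8, "h3": 2}
links = {("h1", "h2"): 10, ("h2", "h3"): 1}
# Hypothetical service: a web tier and a database tier that must talk.
placement = place_service({"web": 3, "db": 6}, {("web", "db"): 5}, hosts, links)
# placement == {"web": "h1", "db": "h2"}
```

The exhaustive loop is exponential in the query size, which is why an index that survives frequent label updates matters in practice.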

Citations: 18

Effective location identification from microblogs

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816708

Guoliang Li, Jun Hu, Jianhua Feng, K. Tan

Abstract: The rapid development of social networks has resulted in a proliferation of user-generated content (UGC). The UGC data, when properly analyzed, can be beneficial to many applications. For example, identifying a user's locations from microblogs is very important for effective location-based advertisement and recommendation. In this paper, we study the problem of identifying a user's locations from microblogs. This problem is rather challenging because the location information in a microblog is incomplete and we cannot get an accurate location from a local microblog. To address this challenge, we propose a global location identification method, called Glitter. Glitter combines multiple microblogs of a user and utilizes them to identify the user's locations. Glitter not only improves the quality of identifying a user's location but also supplements the location of a microblog so as to obtain an accurate location of a microblog. To facilitate location identification, Glitter organizes points of interest (POIs) into a tree structure where leaf nodes are POIs and non-leaf nodes are segments of POIs, e.g., countries, states, cities, districts, and streets. Using the tree structure, Glitter first extracts candidate locations from each microblog of a user, which correspond to some tree nodes. Then Glitter aggregates these candidate locations and identifies top-k locations of the user. Using the identified top-k user locations, Glitter refines the candidate locations and computes top-k locations of each microblog. To achieve high recall, we enable fuzzy matching between locations and microblogs. We propose an incremental algorithm to support dynamic updates of microblogs. Experimental results on real-world datasets show that our method achieves high quality and good performance, and scales very well.
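The aggregate-then-refine flow in the abstract can be pictured with a small sketch. It assumes candidate locations have already been extracted and are written as root-to-node paths in the POI tree. The vote counting and the shared-prefix scoring below are simplifications chosen for illustration, not the scoring used by Glitter, and all function names and sample data are hypothetical.

```python
from collections import Counter


def common_prefix_len(a, b):
    """Number of leading tree levels two location paths share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def top_k_user_locations(candidates_per_post, k):
    """Aggregate step: one vote per post for each candidate it mentions."""
    votes = Counter()
    for candidates in candidates_per_post:
        votes.update(set(candidates))
    return [loc for loc, _ in votes.most_common(k)]


def refine_post(candidates, user_locations):
    """Refine step: prefer candidates closest (in the tree) to the user's
    top locations. Ties keep their original order because sorted is stable."""
    def agreement(candidate):
        return max((common_prefix_len(candidate, u) for u in user_locations),
                   default=0)
    return sorted(candidates, key=agreement, reverse=True)


# Hypothetical user: two posts clearly about Boston, one ambiguous "Cambridge".
posts = [
    [("US", "MA", "Boston")],
    [("US", "MA", "Boston")],
    [("UK", "Cambridge"), ("US", "MA", "Cambridge")],
]
user_top = top_k_user_locations(posts, k=1)
ranked = refine_post(posts[2], user_top)
# user_top == [("US", "MA", "Boston")]
# ranked   == [("US", "MA", "Cambridge"), ("UK", "Cambridge")]
```

The ambiguous post cannot be resolved from its own text, but the user's other posts pull it toward the Massachusetts reading, which is the global intuition the abstract describes.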

Citations: 51

Continuous pattern detection over billion-edge graph using distributed framework

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816681

Jun Gao, Chang Zhou, Jiashuai Zhou, J. Yu

Abstract: Continuous pattern detection plays an important role in monitoring-related applications. The large size and dynamic update of graphs, along with the massive search space, pose huge challenges in developing an efficient continuous pattern detection system. In this paper, we leverage a distributed graph processing framework to approximately detect a given pattern over a large dynamic graph. We aim to improve the scalability and precision, and reduce the response time and message cost in the detection. We convert a given query pattern into a Single-Sink DAG (Directed Acyclic Graph), and propose an evaluation plan with message transitions on the DAG, abbreviated as the SSD plan, to detect the pattern in a large dynamic graph. The SSD plan can guide the data graph exploration via messages, and the messages will converge at data sink vertices, which then detect the existence of the query pattern. We also conduct join operations over partial vertices during the graph exploration to improve the precision of pattern detection. In addition, we show that the SSD plan can support the continuous query over dynamic graphs with slight extensions. We further design various sink vertex selection strategies and neighborhood-based transition rule attachment to lower the evaluation cost. The experiments on billion-edge real-life graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method.

Citations: 55

Private search on key-value stores with hierarchical indexes

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816687

Haibo Hu, Jianliang Xu, Xizhong Xu, Kexin Pei, Byron Choi, Shuigeng Zhou

Abstract: Query processing that preserves both the query privacy at the client and the data privacy at the server is a new research problem. It has many practical applications, especially when the queries are about the sensitive attributes of records. However, most existing studies, including those originating from data outsourcing, address the data privacy and query privacy separately. Although secure multiparty computation (SMC) is a suitable computing paradigm for this problem, it has significant computation and communication overheads and is thus unable to scale up to large datasets. Fortunately, recent advances in cryptography bring us two relevant tools: conditional oblivious transfer and homomorphic encryption. In this paper, we integrate database indexing techniques with these tools in the context of private search on key-value stores. We first present an oblivious index traversal framework, in which the server cannot trace the index traversal path of a query during evaluation. The framework is generic and can support a wide range of query types with a suitable homomorphic encryption algorithm in place. Based on this framework, we devise secure protocols for classic key search queries on B+-tree and R-tree indexes. Our approach is verified by both security analysis and performance study.

Citations: 18

MELODY-JOIN: Efficient Earth Mover's Distance similarity joins using MapReduce

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816702

Jin Huang, Rui Zhang, R. Buyya, Jian Chen

Abstract: The Earth Mover's Distance (EMD) similarity join retrieves pairs of records with EMD below a given threshold. It has a number of important applications such as near-duplicate image retrieval and pattern analysis in probabilistic datasets. However, the computational cost of EMD is super-cubic in the number of bins in the histograms used to represent the data objects. Consequently, the EMD similarity join operation is prohibitive for large datasets. This is the first paper that specifically addresses the EMD similarity join, and we propose to use MapReduce to approach this problem. The MapReduce algorithms designed for generic metric distance similarity joins are inefficient for the EMD similarity join because they involve a large number of distance computations and have unbalanced workloads on reducers when dealing with skewed datasets. We propose a novel framework, named MELODY-JOIN, which transforms data into the space of EMD lower bounds and performs pruning and partitioning at a low cost because computing these EMD lower bounds has a constant complexity. Furthermore, we address two key problems, the limited pruning power and the unbalanced workloads, by enhancing each phase in the MELODY-JOIN framework. We conduct extensive experiments on real datasets. The results show that MELODY-JOIN outperforms the state-of-the-art technique by an order of magnitude, scales up better on large datasets than the state-of-the-art technique, and scales out well on distributed machines.
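The lower-bound pruning idea in the abstract can be sketched on a single machine as a filter-and-refine join. The sketch assumes 1-D histograms of equal total mass with ground distance |i - j| between bins, where EMD reduces to the summed difference of cumulative sums. The centroid bound used as the filter is a classic EMD lower bound and stands in for the bounds used by MELODY-JOIN, which the abstract does not specify. There is no MapReduce here, and all names and sample histograms are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Exact EMD for equal-mass 1-D histograms with ground distance |i - j|:
    the sum of absolute differences between the cumulative sums."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def centroid_lower_bound(p, q):
    """Distance between mass-weighted bin positions. For equal-mass
    histograms this never exceeds the EMD, so it is safe for pruning."""
    cp = sum(i * w for i, w in enumerate(p))
    cq = sum(i * w for i, w in enumerate(q))
    return abs(cp - cq)


def emd_join(records, threshold):
    """Return index pairs with EMD below threshold, plus the number of
    exact EMD computations that survived the lower-bound filter."""
    results = []
    exact_calls = 0
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            # Filter: if even the lower bound reaches the threshold, skip.
            if centroid_lower_bound(records[i], records[j]) >= threshold:
                continue
            # Refine: only now pay for the exact distance.
            exact_calls += 1
            if emd_1d(records[i], records[j]) < threshold:
                results.append((i, j))
    return results, exact_calls


# Hypothetical histograms, each with total mass 4 over four bins.
histograms = [[4, 0, 0, 0], [3, 1, 0, 0], [0, 0, 0, 4]]
pairs, exact_calls = emd_join(histograms, threshold=2)
# pairs == [(0, 1)] and exact_calls == 1
```

In this toy run two of the three candidate pairs are discarded by the cheap bound, so the exact distance is computed only once. In higher dimensions the exact computation is far more expensive than in this 1-D case, which makes such pruning more valuable.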

Citations: 21