2014 IEEE 30th International Conference on Data Engineering: Latest Papers, Page 2

MassJoin: A MapReduce-based method for scalable string similarity joins

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816663

Dong Deng, Guoliang Li, Shuang Hao, Jiannan Wang, Jianhua Feng

Citations: 112

LinkSCAN*: Overlapping community detection using the link-space transformation

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816659

Sungsu Lim, Seungwoo Ryu, Sejeong Kwon, Kyomin Jung, Jae-Gil Lee

Abstract: In this paper, for overlapping community detection, we propose a novel framework of the link-space transformation that transforms a given original graph into a link-space graph. Its unique idea is to consider topological structure and link similarity separately using two distinct types of graphs: the line graph and the original graph. For topological structure, each link of the original graph is mapped to a node of the link-space graph, which enables us to discover overlapping communities using non-overlapping community detection algorithms as in the line graph. For link similarity, it is calculated on the original graph and carried over into the link-space graph, which enables us to keep the original structure on the transformed graph. Thus, our transformation, by combining these two advantages, facilitates overlapping community detection as well as improves the resulting quality. Based on this framework, we develop the algorithm LinkSCAN that performs structural clustering on the link-space graph. Moreover, we propose the algorithm LinkSCAN* that enhances the efficiency of LinkSCAN by sampling. Extensive experiments were conducted using the LFR benchmark networks as well as some real-world networks. The results show that our algorithms achieve higher accuracy, quality, and coverage than the state-of-the-art algorithms.
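
The transformation described in the LinkSCAN* abstract above can be illustrated with a minimal Python sketch. It maps every link to a node of a weighted line graph and computes the weight on the original graph. The similarity used here (Jaccard overlap of the closed neighborhoods of the two non-shared endpoints) is an assumption borrowed from common link-clustering practice; the abstract does not state the paper's own measure, and the function name `link_space_graph` is hypothetical.

```python
from itertools import combinations

def link_space_graph(edges):
    """Map each link of a simple undirected graph to a node of a weighted line graph."""
    # Closed neighborhood (node plus its neighbors) in the ORIGINAL graph.
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)

    links = [tuple(sorted(e)) for e in edges]
    weights = {}
    for e1, e2 in combinations(links, 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # links without a common endpoint are not adjacent in the line graph
        (i,) = set(e1) - shared
        (j,) = set(e2) - shared
        # Assumed similarity: Jaccard overlap, computed on the original graph
        # and carried over as the weight of the link-space edge.
        weights[(e1, e2)] = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
    return weights

# A triangle 1-2-3 with a pendant node 4 attached to node 3.
w = link_space_graph([(1, 2), (2, 3), (1, 3), (3, 4)])
print(w[((2, 3), (1, 3))])  # 1.0: both links lie inside the triangle
print(w[((2, 3), (3, 4))])  # 0.25: the pendant link is only weakly similar
```

Any non-overlapping clustering of the link-space nodes then induces overlapping communities on the original nodes, because a node belongs to every cluster that contains one of its links.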

Citations: 72

R-Store: A scalable distributed system for supporting real-time analytics

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816638

Feng Li, M. Tamer Özsu, Gang Chen, B. Ooi

Abstract: It is widely recognized that OLTP and OLAP queries have different data access patterns, processing needs and requirements. Hence, OLTP queries and OLAP queries are typically handled by two different systems, and the data are periodically extracted from the OLTP system, transformed and loaded into the OLAP system for data analysis. With the awareness of the ability of big data to provide enterprises with useful insights from vast amounts of data, effective and timely decisions derived from real-time analytics are important. It is therefore desirable to provide real-time OLAP querying support, where OLAP queries read the latest data while OLTP queries create the new versions. In this paper, we propose R-Store, a scalable distributed system for supporting real-time OLAP by extending the MapReduce framework. We extend an open source distributed key/value system, HBase, as the underlying storage system that stores the data cube and real-time data. When real-time data are updated, they are streamed to a streaming MapReduce, namely Hstreaming, for updating the cube on an incremental basis. Based on the metadata stored in the storage system, either the data cube, the OLTP database, or both are used by the MapReduce jobs for OLAP queries. We propose techniques to efficiently scan the real-time data in the storage system, and design an adaptive algorithm to process the real-time query based on our proposed cost model. The main objectives are to ensure the freshness of answers and low processing latency. The experiments conducted on the TPC-H data set demonstrate the effectiveness and efficiency of our approach.

Citations: 19

Outsourcing multi-version key-value stores with verifiable data freshness

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816744

Y. Tang, Ling Liu, Ting Wang, Xin Hu, R. Sailer, P. Pietzuch

Citations: 14

Efficient instant-fuzzy search with proximity ranking

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816662

Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim, Chen Li

Abstract: Instant search is an emerging information-retrieval paradigm in which a system finds answers to a query instantly while a user types in keywords character-by-character. Fuzzy search further improves user search experiences by finding relevant answers with keywords similar to query keywords. A main computational challenge in this paradigm is the high-speed requirement, i.e., each query needs to be answered within milliseconds to achieve an instant response and a high query throughput. At the same time, we also need good ranking functions that consider the proximity of keywords to compute relevance scores. In this paper, we study how to integrate proximity information into ranking in instant-fuzzy search while achieving efficient time and space complexities. We adapt existing solutions on proximity ranking to instant-fuzzy search. A naïve solution is computing all answers then ranking them, but it cannot meet this high-speed requirement on large data sets when there are too many answers, so there are studies of early-termination techniques to efficiently compute relevant answers. To overcome the space and time limitations of these solutions, we propose an approach that focuses on common phrases in the data and queries, assuming records with these phrases are ranked higher. We study how to index these phrases and develop an incremental-computation algorithm for efficiently segmenting a query into phrases and computing relevant answers. We conducted a thorough experimental study on real data sets to show the tradeoffs between time, space, and quality of these solutions.
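
To make the notion of instant-fuzzy matching in the abstract above concrete, here is a small Python sketch of one standard way to test whether a partially typed keyword is "similar" to an indexed keyword: the edit distance between the typed string and the closest prefix of the keyword. This is an illustration under assumptions, not the paper's index or ranking algorithm; the function names and the `max_errors` threshold are hypothetical.

```python
def prefix_edit_distance(query, keyword):
    """Smallest edit distance between `query` and any prefix of `keyword`."""
    # prev[j] = distance between the query typed so far and keyword[:j]
    prev = list(range(len(keyword) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, kc in enumerate(keyword, start=1):
            cur.append(min(prev[j] + 1,                 # drop the typed character
                           cur[j - 1] + 1,              # insert the keyword character
                           prev[j - 1] + (qc != kc)))   # match or substitute
        prev = cur
    return min(prev)  # best alignment over all keyword prefixes

def fuzzy_matches(query, keywords, max_errors=1):
    """Keywords that have some prefix within `max_errors` edits of the typed query."""
    return [k for k in keywords if prefix_edit_distance(query, k) <= max_errors]

print(fuzzy_matches("vamc", ["vancouver", "vampire", "victoria"]))
# ['vancouver', 'vampire']
```

The quadratic scan per keyword is only for clarity. A system that must answer within milliseconds would rely on an index and incremental computation across keystrokes, as the abstract indicates.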

Citations: 31

Kondenzer: Exploration and visualization of archived social media

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816741

Omar Alonso, Kartikay Khandelwal

Citations: 3

Exploration of the effect of Category Match Score in search advertising

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816731

Youngchul Cha, Junghoo Cho, Jian Yuan, Tak W. Yan

Citations: 0

C-DMr: Crowd-powered Decision Maker for real world Knapsack Problems

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816734

Leihao Xia, Caleb Chen Cao, Lei Chen, Zhao Chen

Abstract: Knapsack problems range over a large sphere of real world challenges [?]. For example, every year a professor has to decide her new “squad” of students/staff from possibly hundreds of candidates, while keeping a restricted funding budget in consideration. Moreover, in many cases, she has to resort to her colleagues and senior students to make comparisons among the candidates. The difficulties of such tasks are mainly three-fold: 1) the knowledge about the candidates is distributed among a crowd; 2) the underlying factors are human-intrinsic and hard to be formatted; 3) the number of candidates exceeds the capacity of a human for a one-shot decision. Other examples in this category include gear set preparation for a venture trip, syllabus design for a popular course and inventory design for goods shelves, where these difficulties are commonly observed. Consequently, a person may struggle to work out a final decision, which may even be inaccurate. Driven by this demand, in this demo, we present C-DMr, a Crowd-powered Decision Maker that incorporates the wisdom of the informed crowds to solve such real world Knapsack Problems. The core module of this web-based system is a set of algorithms along with a novel interactive interface. The interface incrementally presents comparison jobs and motivates the crowd to participate with a rewarding mechanism, and the set of algorithms solves the Knapsack Problem given only pairwise preferences among candidates. We demonstrate the novelty and usefulness of C-DMr by forming the aforementioned “squad” for a recruiting professor. Specifically, four functionalities are shown: 1) a Candidates Entrance that collects the information about all candidates; 2) a Jury Trial that facilitates informed crowds to contribute preferences; 3) a Knapsack Analyzer that measures the on-going “squad”; and 4) a Consultant that recommends a final set of candidates to the professor.
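
The abstract above does not say how C-DMr turns pairwise preferences into a knapsack solution. The Python sketch below shows one simple possibility, stated purely as an assumption: estimate each candidate's value by counting pairwise wins, then run the classic 0/1 knapsack dynamic program over an integer budget. The function names, the win-count heuristic, and the sample data are all hypothetical.

```python
def scores_from_preferences(candidates, preferences):
    """Assumed value estimate: the number of pairwise comparisons each candidate wins."""
    wins = {c: 0 for c in candidates}
    for winner, _loser in preferences:
        wins[winner] += 1
    return wins

def knapsack(costs, values, budget):
    """Classic 0/1 knapsack over integer costs; returns (total value, chosen items)."""
    # best[b] holds the best (value, items) found so far for a budget of b.
    best = [(0, ())] * (budget + 1)
    for item, cost in costs.items():
        # Iterate budgets downward so each item is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + values[item]
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + (item,))
    return best[budget]

costs = {"A": 3, "B": 2, "C": 2, "D": 1}   # hypothetical funding cost per candidate
prefs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
values = scores_from_preferences(costs, prefs)
print(knapsack(costs, values, budget=5))    # (5, ('A', 'B'))
```

Win counting ignores who was compared with whom and how often, so a real system would likely need a more robust ranking step when the crowd supplies only a sparse or noisy subset of comparisons.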

Citations: 2

Practical k nearest neighbor queries with location privacy

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816688

X. Yi, Russell Paulet, E. Bertino, V. Varadharajan

Abstract: In mobile communication, spatial queries pose a serious threat to user location privacy because the location of a query may reveal sensitive information about the mobile user. In this paper, we study k nearest neighbor (kNN) queries where the mobile user queries the location-based service (LBS) provider about k nearest points of interest (POIs) on the basis of his current location. We propose a solution for the mobile user to preserve his location privacy in kNN queries. The proposed solution is built on the Paillier public-key cryptosystem and can provide both location privacy and data privacy. In particular, our solution allows the mobile user to retrieve one type of POIs, for example, k nearest car parks, without revealing to the LBS provider what type of points is retrieved. For a cloaking region with n×n cells and m types of points, the total communication complexity for the mobile user to retrieve a type of k nearest POIs is O(n + m) while the computation complexities of the mobile user and the LBS provider are O(n + m) and O(n^2 m), respectively. Compared with existing solutions for kNN queries with location privacy, our solutions are more efficient. Experiments have shown that our solutions are practical for kNN queries.
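
The abstract above names the Paillier cryptosystem as the building block but does not describe the protocol. As background only, the following toy Python sketch shows Paillier's additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is the kind of property that lets a server compute on values it cannot read. The primes are deliberately tiny and insecure, the generator is fixed to g = n + 1, and none of this reproduces the paper's kNN protocol. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key pair from two primes (far too small to be secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this short form relies on g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt 0 <= m < n under public key n with generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:   # the random mask must be invertible mod n
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

n, sk = keygen(1009, 1013)
c_sum = encrypt(n, 17) * encrypt(n, 25) % (n * n)  # multiply ciphertexts
print(decrypt(n, sk, c_sum))                        # 42, the sum of the plaintexts
```

Raising a ciphertext to a constant power likewise multiplies the hidden plaintext by that constant, so a party holding only the public key can evaluate linear combinations of encrypted inputs.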

Citations: 84

Exploiting group recommendation functions for flexible preferences

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816669

Senjuti Basu Roy, Saravanan Thirumuruganathan, S. Amer-Yahia, Gautam Das, Cong Yu

Abstract: We examine the problem of enabling the flexibility of updating one's preferences in group recommendation. In our setting, any group member can provide a vector of preferences that, in addition to past preferences and other group members' preferences, will be accounted for in computing group recommendation. This functionality is essential in many group recommendation applications, such as travel planning, online games, book clubs, or strategic voting, as it has been previously shown that user preferences may vary depending on mood, context, and company (i.e., other people in the group). Preferences are enforced in a feedback box that replaces preferences provided by the users by a potentially different feedback vector that is better suited for maximizing the individual satisfaction when computing the group recommendation. The feedback box interacts with a traditional recommendation box that implements a group consensus semantics in the form of Aggregated Voting or Least Misery, two popular aggregation functions for group recommendation. We develop efficient algorithms to compute robust group recommendations that are appropriate in situations where users have changing preferences. Our extensive empirical study on real-world data sets validates our findings.
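
The two consensus functions named in the abstract above are easy to contrast in code. The Python sketch below uses their usual definitions, which are assumed here rather than taken from the paper: Aggregated Voting averages the members' scores for an item, while Least Misery takes the minimum. It covers only the recommendation box. The feedback box is not modeled, and the ratings and function names are made up for illustration.

```python
def aggregated_voting(scores):
    """Group score = average of the members' individual scores."""
    return sum(scores) / len(scores)

def least_misery(scores):
    """Group score = score of the least satisfied member."""
    return min(scores)

def recommend(ratings, consensus):
    """ratings: {member: {item: score}}; returns the item with the best group score."""
    # Only items rated by every member can be scored for the whole group.
    items = set.intersection(*(set(r) for r in ratings.values()))
    group_score = {i: consensus([r[i] for r in ratings.values()]) for i in items}
    return max(group_score, key=group_score.get)

ratings = {
    "ann": {"museum": 5, "beach": 1, "hike": 3},
    "bob": {"museum": 4, "beach": 5, "hike": 3},
    "cat": {"museum": 1, "beach": 5, "hike": 3},
}
print(recommend(ratings, aggregated_voting))  # beach: highest average score
print(recommend(ratings, least_misery))       # hike: nobody rates it below 3
```

The two functions disagree on this data because one enthusiastic majority can outvote an unhappy member under averaging, whereas Least Misery lets any single low score veto an item. This sensitivity to individual scores is why changing one member's preferences can change the group outcome.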

Citations: 31