2014 IEEE 30th International Conference on Data Engineering: latest papers, page 10

Trendspedia: An Internet observatory for analyzing and visualizing the evolving web

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816742

W. Kang, A. Tung, Wei Chen, Xinyu Li, Qiyue Song, Chao Zhang, Feng Zhao, Xiajuan Zhou

Abstract: The popularity of social media services has been changing the way information is acquired in modern society. Meanwhile, a mass of information is generated every single day. To extract useful knowledge, much effort has been invested in analyzing social media contents, e.g., (emerging) topic discovery. Even with these findings, however, users may still find it hard to obtain the knowledge that interests them most and matches their preferences. In this paper, we present a novel system which brings proper context to continuously incoming social media contents, such that mass information can be indexed, organized and analyzed around Wikipedia entities. Four data analytics tools are employed in the system. Three of them aim to enrich each Wikipedia entity by analyzing the relevant contents, while the other one builds an information network among the most relevant Wikipedia entities. With our system, users can easily pinpoint valuable information and knowledge they are interested in, as well as navigate to other closely related entities through the information network for further exploration.

Citations: 14

Declarative cartography: In-database map generalization of geospatial datasets

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816720

Pimin Konstantin Kefaloukos, M. V. Salles, Martin Zachariasen

Abstract: Creating good maps is the challenge of map generalization. An important generalization method is selecting subsets of the data to be shown at different zoom-levels of a zoomable map, subject to a set of spatial constraints. Applying these constraints serves the dual purpose of increasing the information quality of the map and improving the performance of data transfer and rendering. Unfortunately, with current tools, users must explicitly specify which objects to show at each zoom level of their map, while keeping their application constraints implicit. This paper introduces a novel declarative approach to map generalization based on a language called CVL, the Cartographic Visualization Language. In contrast to current tools, users declare application constraints and object importance in CVL, while leaving the selection of objects implicit. In order to compute an explicit selection of objects, CVL scripts are translated into an algorithmic search task. We show how this translation allows for reuse of existing algorithms from the optimization literature, while at the same time supporting fully pluggable, user-defined constraints and object weight functions. In addition, we show how to evaluate CVL entirely inside a relational database. The latter allows users to seamlessly integrate storage of geospatial data with its transformation into map visualizations. In a set of experiments with a variety of real-world data sets, we find that CVL produces generalizations in reasonable time for off-line processing; furthermore, the quality of the generalizations is high with respect to the chosen objective function.

Citations: 9

In-RDBMS inverted indexes revisited

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816664

Ian Rae, A. Halverson, J. Naughton

Abstract: Every major open-source and commercial RDBMS offers some form of support for full-text search using inverted indexes. When providing this support, some developers have implemented specialized indexes that adapt techniques from the Information Retrieval (IR) community to work in a database setting, while others have opted to rely on the standard relational query engine to process inverted index lookups. This choice is an important one, since the storage formats and algorithms used can vary greatly between a specialized index and a relational index, but these alternatives have not been thoroughly compared in the same system. Our work explores the differences in implementation and performance of three representative environments for an in-RDBMS inverted index: an in-RDBMS IR engine, a row-oriented relational query engine, and a column-oriented relational query engine. We found that a specialized IR engine integrated into the RDBMS can provide more than an order of magnitude speedup over both the row- and column-oriented relational query engines for conjunctive and phrase queries. For warm queries, this advantage is largely algorithmic, and we show that by using ZigZag merge join to accelerate conjunctive and phrase query processing, relational inverted indexes can provide performance comparable to a specialized in-RDBMS IR engine with no change to the underlying storage format. Compression and index format, in contrast, have more impact on cold queries, where the IR and column-oriented engines are able to outperform the row-oriented engine, even with ZigZag merge join.
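The abstract names ZigZag merge join but does not spell it out. As a rough illustration only, the sketch below shows one common reading of a zigzag-style intersection of two sorted posting lists for a conjunctive (AND) query: whenever the two cursors disagree, the side that is behind seeks forward to the other side's current document id instead of advancing one entry at a time. The function name, the in-memory lists, and the use of binary search as the "seek" are assumptions for illustration, not the paper's implementation.

```python
from bisect import bisect_left


def zigzag_intersect(a, b):
    """Intersect two sorted lists of document ids.

    Hypothetical helper: when the cursors disagree, the lagging side
    seeks (via binary search) to the first id >= the other side's
    current id, rather than scanning entry by entry.
    """
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a is behind: seek forward in a to the first id >= b[j]
            i = bisect_left(a, b[j], i + 1)
        else:
            # b is behind: seek forward in b to the first id >= a[i]
            j = bisect_left(b, a[i], j + 1)
    return out


postings_db = [1, 3, 5, 7, 9, 11]
postings_index = [3, 4, 5, 11, 20]
print(zigzag_intersect(postings_db, postings_index))
```

In a relational engine the seek would typically be an index lookup on the posting table rather than a binary search over a Python list; the skipping behavior is the point of the sketch.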

Citations: 10

VoidWiz: Resolving incompleteness using network effects

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816748

Christina Christodoulakis, C. Faloutsos, Renée J. Miller

Citations: 2

Personalized Query Suggestion With Diversity Awareness

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816668

Di Jiang, K. Leung, Jan Vosecky, Wilfred Ng

Abstract: Query suggestion is an important functionality provided by the search engine to facilitate information seeking of the users. Existing query suggestion methods usually focus on recommending queries that are the most relevant to the input query. However, such a relevance-oriented strategy cannot effectively handle query uncertainty, a common scenario in which the input query can be interpreted as having multiple different meanings. To alleviate this problem, the concepts of diversification and personalization have been individually introduced to query suggestion systems. These two concepts are often seen as incompatible alternatives, because diversification considers multiple aspects of the input query to maximize the probability that some query aspect is relevant to the user, while personalization aims to adapt the suggestions to a specific aspect that aligns with the preference of a specific user. In this paper, we refute this antagonistic view and propose a new query suggestion paradigm, Personalized Query Suggestion With Diversity Awareness (PQS-DA), to effectively combine diversification and personalization into one unified framework. In PQS-DA, the suggested queries are effectively diversified to cover different potential facets of the input query, while the ranking of suggested queries is personalized to ensure that the top ones are those that align with a user's personal preference. We evaluate PQS-DA on a real-life search engine query log against several state-of-the-art methods with respect to a variety of metrics. The experimental results verify our hypothesis that diversification and personalization can be effectively integrated and that they are able to enhance each other within the PQS-DA framework, which significantly outperforms several strong baselines with respect to a series of metrics.

Citations: 23

Stochastic skyline route planning under time-varying uncertainty

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816646

B. Yang, Chenjuan Guo, Christian S. Jensen, Manohar Kaul, Shuo Shang

Citations: 119

Scalable distance-based outlier detection over high-volume data streams

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816641

Lei Cao, Di Yang, Qingyang Wang, Yanwei Yu, Jiayuan Wang, Elke A. Rundensteiner

Abstract: The discovery of distance-based outliers from huge volumes of streaming data is critical for modern applications ranging from credit card fraud detection to moving object monitoring. In this work, we propose the first general framework to handle the three major classes of distance-based outliers in streaming environments, including the traditional distance-threshold based and the nearest-neighbor-based definitions. Our LEAP framework encompasses two general optimization principles applicable across all three outlier types. First, our “minimal probing” principle uses a lightweight probing operation to gather minimal yet sufficient evidence for outlier detection. This principle overturns the state-of-the-art methodology that requires routinely conducting expensive complete neighborhood searches to identify outliers. Second, our “lifespan-aware prioritization” principle leverages the temporal relationships among stream data points to prioritize the processing order among them during the probing process. Guided by these two principles, we design an outlier detection strategy which is proven to be optimal in CPU costs needed to determine the outlier status of any data point during its entire life. Our comprehensive experimental studies, using both synthetic and real streaming data, demonstrate that our methods are 3 orders of magnitude faster than state-of-the-art methods for a rich diversity of scenarios tested, yet scale to high dimensional streaming data.
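To make the two principles in the abstract concrete, here is a minimal sketch for the distance-threshold definition only, under the usual assumption that a point is an outlier when fewer than k other points in the current window lie within radius r. The function name, the window layout (oldest to newest), and the parameters r and k are illustrative assumptions; this is not the LEAP framework itself. Probing stops as soon as k neighbors are found ("minimal probing"), and newer points are probed first because they stay in the window longest ("lifespan-aware prioritization").

```python
import math


def is_outlier(point, window, r, k):
    """Return True if `point` has fewer than k neighbors within radius r.

    `window` is assumed to hold the other points of the current sliding
    window, ordered oldest to newest (a hypothetical layout).
    """
    found = 0
    # Probe newest points first: their evidence stays valid longest.
    for other in reversed(window):
        if math.dist(point, other) <= r:
            found += 1
            if found >= k:
                # Enough evidence gathered; skip the rest of the window.
                return False
    return True


window = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.0, 0.5)]
print(is_outlier((0.1, 0.1), window, r=1.0, k=2))    # inlier
print(is_outlier((20.0, 20.0), window, r=1.0, k=1))  # outlier
```

A complete neighborhood search would always scan the whole window; the early return is what keeps the probe lightweight for points in dense regions.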

Citations: 119

Geometry approach for k-regret query

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816699

Peng Peng, R. C. Wong

Abstract: Returning tuples that users may be interested in is one of the most important goals for multi-criteria decision making. Top-k queries and skyline queries are two representative queries. A top-k query has the merit of returning a limited number of tuples to users but requires users to give their exact utility functions. A skyline query has the merit that users do not need to give their exact utility functions but has no control over the number of tuples to be returned. In this paper, we study a k-regret query, a recently proposed query, which integrates the merits of the two representative queries. We first identify some interesting geometry properties for the k-regret query. Based on these properties, we define a set of candidate points called happy points for the k-regret query, which has not been studied in the literature. This result is very fundamental and beneficial to not only all existing algorithms but also all new algorithms to be developed for the k-regret query. Since it is found that the number of happy points is very small, the efficiency of all existing algorithms can be improved significantly. Furthermore, based on other geometry properties, we propose two efficient algorithms, each of which performs more efficiently than the best-known fastest algorithm. Our experimental results show that our proposed algorithms run faster than the best-known method on both synthetic and real datasets. In particular, in our experiments on real datasets, the best-known method took more than 3 hours to answer a k-regret query, but one of our proposed methods took a few minutes and the other took less than a second.
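The abstract does not define the quantity a k-regret query minimizes. The sketch below assumes the standard definition from the k-regret literature: for a linear utility function with weight vector w, the regret ratio of a subset S of the dataset D is (max over D of w·p minus max over S of w·p) divided by max over D of w·p, and the query seeks k tuples that keep the worst-case ratio small. The helper names are hypothetical, and evaluating the maximum over a finite sample of weight vectors is an approximation chosen for brevity; it does not use happy points or the paper's algorithms.

```python
def regret_ratio(data, subset, weights):
    """Regret ratio of `subset` w.r.t. `data` for one linear utility.

    Assumes the best utility over `data` is strictly positive.
    """
    def best(points):
        return max(sum(w * x for w, x in zip(weights, p)) for p in points)

    top_all = best(data)
    return (top_all - best(subset)) / top_all


def max_regret_ratio(data, subset, weight_samples):
    """Worst regret ratio over a finite sample of weight vectors."""
    return max(regret_ratio(data, subset, w) for w in weight_samples)


data = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
subset = [(0.6, 0.6)]
samples = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
print(max_regret_ratio(data, subset, samples))
```

Here a user who cares only about the first attribute would prefer (1.0, 0.0) and loses 40% of their best utility by seeing only (0.6, 0.6), while a user with equal weights loses nothing.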

Citations: 35

Near neighbor join

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816728

H. Kllapi, Boulos Harb, Cong Yu

Abstract: An increasing number of Web applications such as friends recommendation depend on the ability to join objects at scale. The traditional approach taken is nearest neighbor join (also called similarity join), whose goal is to find, based on a given join function, the closest set of objects or all the objects within a distance threshold to each object in the input. The scalability of techniques utilizing this approach often depends on the characteristics of the objects and the join function. However, many real-world join functions are intricately engineered and constantly evolving, which makes the design of white-box methods that rely on understanding the join function impractical. Finding a technique that can join an extremely large number of objects with complex join functions has always been a tough challenge. In this paper, we propose a practical alternative approach called near neighbor join that, although it does not find the closest neighbors, finds close neighbors, and can do so at extremely large scale when the join functions are complex. In particular, we design and implement a super-scalable system we name SAJ that is capable of best-effort joining of billions of objects for complex functions. Extensive experimental analysis over real-world large datasets shows that SAJ is scalable and generates good results.

Citations: 11

Distributed execution of continuous queries

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816767

Rajeev Gupta, K. Ramamritham

Abstract: Data delivered over the internet is increasingly being used for providing dynamic and personalized user experiences. To achieve this, queries are executed over fast changing data from distributed sources. As these queries require data from multiple sources, they are executed at an intermediate proxy or data aggregator. Typically, users of these queries are not interested in all the data updates. Query results may be associated with an imprecision bound or threshold which can be used to limit the number of refresh messages. These queries can be categorized based on the types of results required: in an entity based query the user is just interested in knowing the ids of the data items (or entities) satisfying a certain selection condition; in a value based query the user is interested in the value of some aggregation over distributed data items; and in a threshold query the user wants to know whether a Boolean condition, expressed as a threshold over an aggregation of data items, is true. We methodically present techniques for executing all these categories of continuous aggregation queries over distributed data so that the number of message exchanges between data sources, aggregators, and users is minimized. The value of individual data items can be uncertain with an associated probability. A data aggregator can execute the query either by getting all the required data or by sending appropriate sub-queries to the distributed data sources. For getting the data, the aggregator can use either push or pull based mechanisms. Each of these methods has different ways of minimizing the number of message exchanges. We present algorithms for each of them.
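The abstract's remark that an imprecision bound can limit the number of refresh messages can be illustrated with a simple push-side filter. The sketch below is an assumption-laden toy, not one of the paper's algorithms: a source pushes a new value to the aggregator only when it differs from the last pushed value by more than the bound, so the aggregator's copy is never off by more than that bound. The function name and the single-source, single-item setting are hypothetical.

```python
def refreshes(values, bound):
    """Return the subset of `values` a source would push to the aggregator.

    A value is pushed only if it deviates from the last pushed value by
    more than `bound` (the first value is always pushed).
    """
    sent = []
    last = None
    for v in values:
        if last is None or abs(v - last) > bound:
            sent.append(v)
            last = v
    return sent


updates = [100, 101, 103, 104, 99, 100]
print(refreshes(updates, bound=2))
```

With a bound of 2, only three of the six updates are sent; with a bound of 0 every change would be pushed, which shows the trade-off between precision and message count.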

Citations: 1