2011 IEEE 27th International Conference on Data Engineering: Latest Papers, Page 8

Semantic stream query optimization exploiting dynamic metadata

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767840

L. Ding, Karen Works, Elke A. Rundensteiner

Abstract: Data stream management systems (DSMS) processing long-running queries over large volumes of stream data must typically deliver time-critical responses. We propose the first semantic query optimization (SQO) approach that utilizes dynamic substream metadata at runtime to find a more efficient query plan than the one selected at compilation time. We identify four SQO techniques guaranteed to result in performance gains. Based on classic satisfiability theory, we then design a lightweight query optimization algorithm that efficiently detects SQO opportunities at runtime. At the logical level, our algorithm instantiates multiple concurrent SQO plans, each processing different, partially overlapping substreams. Our novel execution paradigm employs multi-modal operators to support the execution of these concurrent SQO logical plans in a single physical plan. This highly agile execution strategy reduces resource utilization while supporting lightweight adaptivity. Our extensive experimental study in the CAPE stream processing system, using both synthetic and real data, confirms that our optimization techniques significantly reduce query execution times, by up to 60%, compared to the traditional approach.

Citations: 16
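The SQO abstract says the optimizer applies satisfiability reasoning to substream metadata at runtime. As a rough illustration only, the Python sketch below assumes that both the query predicate and the metadata of the current substream are conjunctions of closed numeric ranges. The `Range` type and `classify` function are hypothetical, and this is not the paper's algorithm or any of its four techniques. A predicate that every tuple of the substream must satisfy can be dropped, and a predicate that no tuple can satisfy lets that substream be short-circuited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed numeric interval [lo, hi] (hypothetical representation)."""
    lo: float
    hi: float

    def intersects(self, other: "Range") -> bool:
        return self.lo <= other.hi and other.lo <= self.hi

    def within(self, other: "Range") -> bool:
        return other.lo <= self.lo and self.hi <= other.hi


def classify(predicate: dict, metadata: dict) -> str:
    """Compare a conjunctive range predicate with substream metadata.

    Returns "unsatisfiable" if no tuple of the substream can match,
    "always_true" if every tuple must match, and "evaluate" otherwise.
    """
    always_true = True
    for attr, wanted in predicate.items():
        known = metadata.get(attr)
        if known is None:
            always_true = False  # no metadata for attr: tuples must be tested
            continue
        if not known.intersects(wanted):
            return "unsatisfiable"  # metadata range and predicate are disjoint
        if not known.within(wanted):
            always_true = False  # ranges only partially overlap
    return "always_true" if always_true else "evaluate"


pred = {"temp": Range(30, 50)}
print(classify(pred, {"temp": Range(35, 40)}))  # always_true
print(classify(pred, {"temp": Range(0, 10)}))   # unsatisfiable
```

Interval containment and disjointness are the simplest decidable cases of the implication and contradiction checks that satisfiability-based SQO relies on. Richer predicates would need a general constraint solver.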

Spatio-temporal joins on symbolic indoor tracking data

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767902

Hua Lu, B. Yang, Christian S. Jensen

Abstract: To facilitate a variety of applications, positioning systems are deployed in indoor settings. For example, Bluetooth and RFID positioning are deployed in airports to support real-time monitoring of delays as well as off-line flow and space usage analyses. Such deployments generate large collections of tracking data. As in other data management applications, joins are indispensable in this setting. However, joins on indoor tracking data call for novel techniques that take into account the limited capabilities of the positioning systems as well as the specifics of indoor spaces. This paper proposes and studies probabilistic, spatio-temporal joins on historical indoor tracking data. Two meaningful types of join are defined. They return object pairs that satisfy spatial join predicates either at a time point or during a time interval. The predicates considered include “same X,” where X is a semantic region such as a room or hallway. Based on an analysis of the uncertainty inherent to indoor tracking data, effective join probabilities are formalized and evaluated for object pairs. Efficient two-phase hash-based algorithms are proposed for the point and interval joins. In a filter-and-refine framework, an R-tree variant is proposed that facilitates the retrieval of join candidates, and pruning rules are supplied that eliminate candidate pairs that do not qualify. An empirical study on both synthetic and real data shows that the proposed techniques are efficient and scalable.

Citations: 45
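To make the “same X” predicate concrete, the sketch below buckets objects by candidate room at a single time point and scores each pair by the probability that both are in the same room. It assumes each object's location is given as a hypothetical room-to-probability dictionary and that different objects' location estimates are independent. The paper's own join-probability definition, its two-phase algorithms, its R-tree variant, and its pruning rules are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def same_room_join(locations, threshold):
    """Probabilistic "same room" join at a single time point.

    locations: object id -> {room: probability}; assumes the location
    estimates of different objects are independent (illustrative model).
    Returns {(a, b): probability} for pairs at or above the threshold.
    """
    # Bucket each object under every room it may occupy.
    buckets = defaultdict(list)
    for obj, dist in locations.items():
        for room, p in dist.items():
            if p > 0:
                buckets[room].append((obj, p))

    # Only objects that share a bucket can be in the same room.
    scores = defaultdict(float)
    for members in buckets.values():
        for (a, pa), (b, pb) in combinations(members, 2):
            pair = (a, b) if a < b else (b, a)
            scores[pair] += pa * pb  # P(both in this room), summed over rooms
    return {pair: s for pair, s in scores.items() if s >= threshold}


locations = {
    "o1": {"R1": 1.0},
    "o2": {"R1": 0.5, "R2": 0.5},
    "o3": {"R2": 1.0},
}
print(same_room_join(locations, 0.4))  # {('o1', 'o2'): 0.5, ('o2', 'o3'): 0.5}
```

Hashing on the region means pairs with no room in common (here `o1` and `o3`) are never compared at all.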

SystemML: Declarative machine learning on MapReduce

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767930

A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, Vikas Sindhwani, S. Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan

Abstract: MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend, combined with the growing need to run machine learning (ML) algorithms on massive datasets, has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML, in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs, including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation of three ML algorithms on varying data and cluster sizes.

Citations: 316
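SystemML compiles linear algebra primitives into MapReduce jobs. The single-process Python simulation below shows one textbook way to express sparse matrix multiplication as two MapReduce rounds: the first joins A[i,k] with B[k,j] on k and emits partial products, and the second sums them per output cell. The `map_reduce` helper and the dictionary matrix format are assumptions for illustration. This is neither SystemML's language nor its Hadoop runtime, and the abstract notes that SystemML evaluates several optimization strategies rather than a single fixed plan.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one simulated MapReduce round in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map, then shuffle by key
            groups[key].append(value)
    output = []
    for key, values in groups.items():  # reduce each key group
        output.extend(reducer(key, values))
    return output


def matmul(A, B):
    """Multiply sparse matrices stored as {(row, col): value} dicts."""
    records = [("A", i, k, v) for (i, k), v in A.items()]
    records += [("B", k, j, v) for (k, j), v in B.items()]

    def join_map(record):
        tag, row, col, v = record
        if tag == "A":
            yield col, ("A", row, v)  # A[i, k] keyed by k
        else:
            yield row, ("B", col, v)  # B[k, j] keyed by k

    def join_reduce(k, values):
        a_side = [(i, v) for tag, i, v in values if tag == "A"]
        b_side = [(j, v) for tag, j, v in values if tag == "B"]
        return [((i, j), av * bv) for i, av in a_side for j, bv in b_side]

    # Round 1: join on the shared index k and emit partial products.
    partials = map_reduce(records, join_map, join_reduce)
    # Round 2: sum the partial products for each output cell (i, j).
    sums = map_reduce(partials, lambda kv: [kv], lambda key, vals: [(key, sum(vals))])
    return dict(sums)


A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
print(matmul(A, B))  # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```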

Fast-join: An efficient method for fuzzy token matching based string similarity join

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767865

Jiannan Wang, Guoliang Li, Jianhua Feng

Citations: 141

Preventing equivalence attacks in updated, anonymized data

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767924

Yeye He, Siddharth Barman, J. Naughton

Citations: 49

Social networking on top of the WebdamExchange system

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767939

Émilien Antoine, A. Galland, K. Lyngbaek, A. Marian, N. Polyzotis

Citations: 11

Join queries on uncertain data: Semantics and efficient processing

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767888

Tingjian Ge

Abstract: Uncertain data is common in a variety of modern database applications. At the same time, the join is one of the most important but also most expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join, each of which is useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds based on the moments of random variables to either early accept or early reject a candidate v-join result tuple. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an efficient algorithm called condensed d-join and an optimal condensation scheme based on dynamic programming. Finally, we perform a comprehensive empirical study using both real and synthetic datasets.

Citations: 14
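The join abstract mentions moment-based probability bounds for early accepting or rejecting v-join candidates but does not define the v-join predicate. Purely for illustration, the sketch below assumes that a pair qualifies when P(|X - Y| ≤ c) ≥ τ for independent uncertain attributes X and Y with known means and variances. It uses Chebyshev's inequality to accept and Cantelli's one-sided inequality to reject. These are example bounds, not necessarily the paper's, and the function name is hypothetical. Undecided candidates would still need exact evaluation.

```python
def classify_candidate(mu_x, var_x, mu_y, var_y, c, tau):
    """Early accept/reject of a candidate pair for P(|X - Y| <= c) >= tau.

    Uses only the means and variances of X and Y, assumed independent, so
    D = X - Y has mean mu_x - mu_y and variance var_x + var_y.
    """
    mu = mu_x - mu_y
    var = var_x + var_y
    margin = c - abs(mu)  # distance from mu to the nearer edge of [-c, c]
    if margin > 0:
        # Chebyshev: P(|D - mu| >= margin) <= var / margin**2, and
        # |D - mu| < margin implies |D| < c, which gives a lower bound.
        lower = 1.0 - var / margin ** 2
        if lower >= tau:
            return "accept"
    elif margin < 0:
        # Cantelli (one-sided): with t = |mu| - c, landing in [-c, c]
        # requires D to deviate from mu by at least t toward zero.
        t = -margin
        upper = var / (var + t ** 2)
        if upper < tau:
            return "reject"
    return "undecided"  # bounds inconclusive: evaluate exactly


print(classify_candidate(10, 1, 10, 1, c=5, tau=0.9))  # accept (lower bound 0.92)
print(classify_candidate(20, 1, 0, 1, c=5, tau=0.5))   # reject (upper bound 2/227)
```

Both bounds need only two numbers per attribute, so they can be checked before any costly integration over the full distributions.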

Algorithms for local sensor synchronization

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767841

Lixing Wang, Y. Yang, Xin Miao, D. Papadias, Yunhao Liu

Abstract: In a wireless sensor network (WSN), each sensor monitors environmental parameters and reports its readings to a base station, possibly through other nodes. A sensor works in cycles; in each cycle it stays active for a fixed duration and then sleeps until the next cycle. The frequency of these cycles determines the portion of time that a sensor is active and is the dominant factor in its battery life. The majority of existing work assumes a globally synchronized WSN, in which all sensors have the same frequency. This wastes battery power in applications that require different measurement accuracies, or in environments where sensor readings vary widely. To overcome this problem, we propose LS, a query processing framework for locally synchronized WSNs. We assume that each sensor n_i has its own sampling frequency f_i, determined by the application or environment requirements. The complication in LS is that n_i has to wake up with a network frequency F_i ≥ f_i in order to forward the messages of other sensors. Our goal is to minimize the sum of the F_i without delaying packet transmissions. Specifically, given a routing tree, we first present a dynamic programming algorithm that computes the optimal network frequency of each sensor; we then develop a heuristic for finding the best tree topology if it is not fixed in advance.

Citations: 0
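The sensor abstract gives the constraint F_i ≥ f_i and notes that a sensor must also wake often enough to relay other sensors' messages. The sketch below uses a deliberately simplified model in which a relay needs a network frequency at least as high as that of each of its children in the routing tree. Under that model, one post-order pass reaches the minimum total frequency. This is not the paper's dynamic programming algorithm, which handles constraints the abstract does not spell out. The function name and input format are hypothetical.

```python
def network_frequencies(children, f, root):
    """Assign network frequencies F on a routing tree (simplified model).

    children: node -> list of child nodes (toward the leaves)
    f: node -> sampling frequency required by the application
    Assumes F[n] >= f[n] and F[n] >= F[c] for every child c that n relays for.
    """
    F = {}
    # Iterative post-order traversal: children are finalized before parents.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not expanded:
            stack.append((node, True))
            stack.extend((c, False) for c in kids)
        else:
            F[node] = max([f[node]] + [F[c] for c in kids])
    return F


children = {"base": ["a", "b"], "a": ["c"]}
f = {"base": 0, "a": 1, "b": 2, "c": 4}
F = network_frequencies(children, f, "base")
print(F["a"], sum(F.values()))  # 4 14
```

Each F_i is set exactly to its lower bound, and a smaller child value can only lower the parent's bound, so no other assignment satisfying this simplified constraint has a smaller sum.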

Interactive SQL query suggestion: Making databases user-friendly

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767843

Ju Fan, Guoliang Li, Lizhu Zhou

Citations: 61

Outlier detection in graph streams

2011 IEEE 27th International Conference on Data Engineering · Pub Date: 2011-04-11 · DOI: 10.1109/ICDE.2011.5767885

C. Aggarwal, Yuchen Zhao, Philip S. Yu

Abstract: A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities that differ from the “typical” behavior of the underlying network. In this paper, we provide the first results on the problem of structural outlier detection in massive network streams. Such problems are inherently challenging because of the high volume of the underlying network stream, and the stream scenario further increases the computational challenges. We use a structural connectivity model to define outliers in graph streams. To handle the sparsity problem of massive networks, we dynamically partition the network to construct statistically robust models of the connectivity behavior. We design a reservoir sampling method to maintain structural summaries of the underlying network. These structural summaries are designed to create robust, dynamic, and efficient models for outlier detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach.

Citations: 202
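The outlier-detection abstract uses reservoir sampling to maintain structural summaries of the graph stream. The sketch below applies the standard reservoir sampling algorithm (Algorithm R) to a stream of edges, keeping a uniform random sample of k edges in O(k) memory. It does not show the dynamic partitioning or the outlier scoring that the paper builds on its samples, and the function name is illustrative.

```python
import random


def reservoir_sample(edge_stream, k, rng=None):
    """Keep a uniform random sample of k edges from a stream (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for n, edge in enumerate(edge_stream, start=1):
        if n <= k:
            reservoir.append(edge)  # fill the reservoir with the first k edges
        else:
            j = rng.randrange(n)  # uniform index in [0, n)
            if j < k:  # happens with probability k / n
                reservoir[j] = edge  # evict a uniformly chosen sampled edge
    return reservoir


edges = ((u, u + 1) for u in range(10_000))  # a toy edge stream
sample = reservoir_sample(edges, k=5, rng=random.Random(7))
print(len(sample))  # 5
```

After n ≥ k edges have arrived, each of them is in the reservoir with probability k/n, and the stream is read exactly once.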