2011 IEEE 27th International Conference on Data Engineering: Latest Papers

Semantic stream query optimization exploiting dynamic metadata
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767840
L. Ding, Karen Works, Elke A. Rundensteiner
Abstract: Data stream management systems (DSMS) processing long-running queries over large volumes of stream data must typically deliver time-critical responses. We propose the first semantic query optimization (SQO) approach that utilizes dynamic substream metadata at runtime to find a more efficient query plan than the one selected at compilation time. We identify four SQO techniques guaranteed to result in performance gains. Based on classic satisfiability theory we then design a lightweight query optimization algorithm that efficiently detects SQO opportunities at runtime. At the logical level, our algorithm instantiates multiple concurrent SQO plans, each processing different partially overlapping substreams. Our novel execution paradigm employs multi-modal operators to support the execution of these concurrent SQO logical plans in a single physical plan. This highly agile execution strategy reduces resource utilization while supporting lightweight adaptivity. Our extensive experimental study in the CAPE stream processing system using both synthetic and real data confirms that our optimization techniques significantly reduce query execution times, by up to 60%, compared to the traditional approach.
Citations: 16
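As a rough illustration of how substream metadata could expose an optimization opportunity in the entry above, the sketch below checks a range filter against a value range advertised for a substream. The abstract does not name its four SQO techniques, so the three outcomes used here (skip the filter, drop the substream, evaluate as usual) and the names `Interval` and `classify_filter` are assumptions made for illustration, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed numeric range [low, high]; a hypothetical helper type."""
    low: float
    high: float


def classify_filter(metadata: Interval, predicate: Interval) -> str:
    """Compare the value range a substream is known to carry (metadata)
    with the range accepted by a filter predicate."""
    # The two ranges are disjoint: no tuple in the substream can pass.
    if metadata.high < predicate.low or metadata.low > predicate.high:
        return "drop_substream"
    # The metadata range lies inside the predicate range: every tuple passes.
    if predicate.low <= metadata.low and metadata.high <= predicate.high:
        return "skip_filter"
    # Partial overlap: the filter still has to run on each tuple.
    return "evaluate_filter"
```

A runtime optimizer along these lines would call `classify_filter` whenever new metadata arrives for a substream and switch that substream to the cheaper plan when the answer is not `"evaluate_filter"`.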
Spatio-temporal joins on symbolic indoor tracking data
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767902
Hua Lu, B. Yang, Christian S. Jensen
Abstract: To facilitate a variety of applications, positioning systems are deployed in indoor settings. For example, Bluetooth and RFID positioning are deployed in airports to support real-time monitoring of delays as well as off-line flow and space usage analyses. Such deployments generate large collections of tracking data. As in other data management applications, joins are indispensable in this setting. However, joins on indoor tracking data call for novel techniques that take into account the limited capabilities of the positioning systems as well as the specifics of indoor spaces. This paper proposes and studies probabilistic, spatio-temporal joins on historical indoor tracking data. Two meaningful types of join are defined. They return object pairs that satisfy spatial join predicates either at a time point or during a time interval. The predicates considered include “same X,” where X is a semantic region such as a room or hallway. Based on an analysis of the uncertainty inherent in indoor tracking data, effective join probabilities are formalized and evaluated for object pairs. Efficient two-phase hash-based algorithms are proposed for the point and interval joins. In a filter-and-refine framework, an R-tree variant is proposed that facilitates the retrieval of join candidates, and pruning rules are supplied that eliminate candidate pairs that do not qualify. An empirical study on both synthetic and real data shows that the proposed techniques are efficient and scalable.
Citations: 45
SystemML: Declarative machine learning on MapReduce
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767930
A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, Vikas Sindhwani, S. Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
Abstract: MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend, combined with the growing need to run machine learning (ML) algorithms on massive datasets, has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML, in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs, including linear algebra primitives, that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation on three ML algorithms on varying data and cluster sizes.
Citations: 316
Fast-join: An efficient method for fuzzy token matching based string similarity join
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767865
Jiannan Wang, Guoliang Li, Jianhua Feng
Abstract: String similarity join, which finds similar string pairs between two string sets, is an essential operation in many applications and has attracted significant attention recently in the database community. A significant challenge in similarity join is to implement an effective fuzzy match operation to find all similar string pairs that may not match exactly. In this paper, we propose a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions (e.g., Jaccard similarity and Cosine similarity) by allowing fuzzy match between two tokens. We study the problem of similarity join using this new similarity metric and present a signature-based method to address this problem. We propose new signature schemes and develop effective pruning techniques to improve the performance. Experimental results show that our approach achieves high efficiency and result quality, and significantly outperforms state-of-the-art methods.
Citations: 141
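The idea of extending Jaccard similarity with fuzzy token matches, as described in the entry above, can be sketched as follows. The abstract does not give the formal definition, so the choices here are assumptions: normalized edit similarity between tokens, a threshold `delta` below which two tokens do not match, and a greedy one-to-one matching. The signature schemes and pruning techniques mentioned in the abstract are not reproduced.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def token_similarity(a: str, b: str) -> float:
    """Edit similarity in [0, 1]; 1.0 means the tokens are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_overlap(tokens_a, tokens_b, delta=0.8):
    """Greedily pair tokens whose similarity is at least delta (assumed rule)."""
    pairs = sorted(((token_similarity(a, b), i, j)
                    for i, a in enumerate(tokens_a)
                    for j, b in enumerate(tokens_b)), reverse=True)
    used_a, used_b, total = set(), set(), 0.0
    for sim, i, j in pairs:
        if sim < delta:
            break  # remaining pairs are even less similar
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += sim
    return total


def fuzzy_jaccard(s: str, t: str, delta: float = 0.8) -> float:
    """Jaccard similarity with the exact overlap replaced by the fuzzy overlap."""
    a, b = s.lower().split(), t.lower().split()
    if not a and not b:
        return 1.0
    overlap = fuzzy_overlap(a, b, delta)
    return overlap / (len(a) + len(b) - overlap)
```

With `delta=1.0` only identical tokens match, so the function falls back to ordinary Jaccard on token lists without repeated tokens; lowering `delta` lets near-miss tokens such as "bryant" and "bryan" contribute to the overlap.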
Preventing equivalence attacks in updated, anonymized data
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767924
Yeye He, Siddharth Barman, J. Naughton
Abstract: In comparison to the extensive body of existing work considering publish-once, static anonymization, dynamic anonymization is less well studied. Previous work, most notably m-invariance, has made considerable progress in devising a scheme that attempts to prevent individual records from being associated with too few sensitive values. We show, however, that in the presence of updates, even an m-invariant table can be exploited by a new type of attack we call the “equivalence attack.” To deal with the equivalence attack, we propose a graph-based anonymization algorithm that leverages solutions to the classic “min-cut/max-flow” problem, and demonstrate with experiments that our algorithm is efficient and effective in preventing equivalence attacks.
Citations: 49
Social networking on top of the WebdamExchange system
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767939
Émilien Antoine, A. Galland, K. Lyngbaek, A. Marian, N. Polyzotis
Abstract: The demonstration presents the WebdamExchange system, a distributed knowledge base management system with access rights, localization and provenance. This system is based on the exchange of logical statements that describe documents, collections, access rights, keys and localization information, and updates of this data. We illustrate how the model can be used in a social-network context to help users keep control over their data on the web. In particular, we show how users within very different schemes of data distribution (centralized, DHT, unstructured P2P, etc.) can still transparently collaborate while keeping good control over their own data.
Citations: 11
Join queries on uncertain data: Semantics and efficient processing
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767888
Tingjian Ge
Abstract: Uncertain data is quite common nowadays in a variety of modern database applications. At the same time, the join operation is one of the most important but expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join. They are each useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds that are based on the moments of random variables to either early accept or early reject a candidate v-join result tuple. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an efficient algorithm called condensed d-join and an optimal condensation scheme based on dynamic programming. Finally, we perform a comprehensive empirical study using both real datasets and synthetic datasets.
Citations: 14
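To make the early accept / early reject idea in the entry above concrete, here is a minimal sketch of a moment-based decision. The abstract neither defines v-join nor names the bounds it uses, so everything specific here is an assumption for illustration: the join predicate is taken to be X > Y for two independent uncertain values known only by mean and variance, the bound is Cantelli's one-sided inequality, and `cantelli_decision` is a hypothetical name.

```python
def cantelli_decision(mean_x: float, var_x: float,
                      mean_y: float, var_y: float,
                      threshold: float) -> str:
    """Decide whether P(X > Y) >= threshold using only means and variances.

    Returns "accept", "reject", or "undecided" (the exact probability would
    then have to be computed from the full distributions).
    Assumes 0 < threshold <= 1 and independent X and Y.
    """
    mean_d = mean_x - mean_y   # mean of D = X - Y
    var_d = var_x + var_y      # variance of D under independence
    if var_d == 0:
        # Both values are certain, so the predicate is simply true or false.
        return "accept" if mean_d > 0 else "reject"
    # Cantelli: P(D - mean_d >= a) <= var_d / (var_d + a^2) for a > 0.
    tail_bound = var_d / (var_d + mean_d ** 2)
    if mean_d > 0 and 1.0 - tail_bound >= threshold:
        return "accept"   # P(D > 0) >= 1 - tail_bound >= threshold
    if mean_d < 0 and tail_bound < threshold:
        return "reject"   # P(D > 0) <= tail_bound < threshold
    return "undecided"
```

The point of such a test is that it needs only two numbers per value, so many candidate pairs can be settled without touching the full distributions.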
Algorithms for local sensor synchronization
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767841
Lixing Wang, Y. Yang, Xin Miao, D. Papadias, Yunhao Liu
Abstract: In a wireless sensor network (WSN), each sensor monitors environmental parameters and reports its readings to a base station, possibly through other nodes. A sensor works in cycles, in each of which it stays active for a fixed duration and then sleeps until the next cycle. The frequency of such cycles determines the portion of time that a sensor is active, and is the dominant factor in its battery life. The majority of existing work assumes a globally synchronized WSN where all sensors have the same frequency. This wastes battery power in applications that entail different accuracy of measurements, or in environments where sensor readings have large variability. To overcome this problem, we propose LS, a query processing framework for locally synchronized WSNs. We consider that each sensor n_i has a distinct sampling frequency f_i, which is determined by the application or environment requirements. The complication of LS is that n_i has to wake up with a network frequency F_i ≥ f_i in order to forward messages of other sensors. Our goal is to minimize the sum of the F_i without delaying packet transmissions. Specifically, given a routing tree, we first present a dynamic programming algorithm that computes the optimal network frequency of each sensor; then, we develop a heuristic for finding the best tree topology, if this is not fixed in advance.
Citations: 0
Interactive SQL query suggestion: Making databases user-friendly
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767843
Ju Fan, Guoliang Li, Lizhu Zhou
Abstract: SQL is a classical and powerful tool for querying relational databases. However, it is rather hard for inexperienced users to pose SQL queries, as they are required to be proficient in SQL syntax and have a thorough understanding of the underlying schema. To ease this burden, we propose SQLSUGG, an effective and user-friendly keyword-based method to help various users formulate SQL queries. SQLSUGG suggests SQL queries as users type in keywords, and can save users' typing efforts and help users avoid tedious SQL debugging. To achieve high suggestion effectiveness, we propose queryable templates to model the structures of SQL queries. We propose a template ranking model to suggest templates relevant to query keywords. We generate SQL queries from each suggested template based on the degree of matching between keywords and attributes. For efficiency, we propose a progressive algorithm to compute top-k templates, and devise an efficient method to generate SQL queries from templates. We have implemented our methods on two real data sets, and the experimental results show that our method achieves high effectiveness and efficiency.
Citations: 61
Outlier detection in graph streams
2011 IEEE 27th International Conference on Data Engineering Pub Date: 2011-04-11 DOI: 10.1109/ICDE.2011.5767885
C. Aggarwal, Yuchen Zhao, Philip S. Yu
Abstract: A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities that differ from the “typical” behavior of the underlying network. In this paper, we provide first results on the problem of structural outlier detection in massive network streams. Such problems are inherently challenging because of the high volume of the underlying network stream. The stream scenario also increases the computational challenges for the approach. We use a structural connectivity model to define outliers in graph streams. To handle the sparsity problem of massive networks, we dynamically partition the network to construct statistically robust models of the connectivity behavior. We design a reservoir sampling method to maintain structural summaries of the underlying network. These structural summaries are designed to create robust, dynamic and efficient models for outlier detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach.
Citations: 202
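Since the entry above relies on reservoir sampling to summarize an edge stream, a minimal sketch of the textbook technique may help. This is the classic uniform reservoir sampler (often called Algorithm R), not the structural variant the abstract refers to, whose details are not given; the function name and the `(u, v)` edge representation are assumptions.

```python
import random


def reservoir_sample(edge_stream, k, rng=None):
    """Keep a uniform random sample of k edges from a stream of unknown length.

    edge_stream: any iterable of edges, e.g. (u, v) tuples.
    rng: optional random.Random instance, useful for reproducible runs.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for n, edge in enumerate(edge_stream, start=1):
        if n <= k:
            # Fill phase: the first k edges are always kept.
            reservoir.append(edge)
        else:
            # The n-th edge replaces a random slot with probability k / n.
            slot = rng.randrange(n)
            if slot < k:
                reservoir[slot] = edge
    return reservoir
```

Each edge is seen once and memory stays at `k` entries, which is what makes the approach suitable for high-volume streams.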