{"title":"Dynamic plan generation for parameterized queries","authors":"A. Ghazal, Dawit Yimam Seid, Ramesh Bhashyam, A. Crolotte, Manjula Koppuravuri, G. Vinod","doi":"10.1145/1559845.1559946","DOIUrl":"https://doi.org/10.1145/1559845.1559946","url":null,"abstract":"Query processing in a DBMS typically involves two distinct phases: compilation, which generates the best plan and its corresponding execution steps, and execution, which evaluates these steps against database objects. For some queries, considerable resource savings can be achieved by skipping the compilation phase when the same query was previously submitted and its plan was already cached. In a number of important applications the same query, called a Parameterized Query (PQ), is repeatedly submitted in the same basic form but with different parameter values. PQ's are extensively used in both data update (e.g. batch update programs) and data access queries. There are tradeoffs associated with caching and re-using query plans, such as space utilization and maintenance cost. Moreover, pre-compiled plans may be suboptimal for a particular execution for various reasons, including data skew and the inability to exploit value-based query transformations like materialized view rewrite and unsatisfiable predicate elimination. We address these tradeoffs by distinguishing two types of plans for PQ's: generic and specific plans. Generic plans are pre-compiled plans that are independent of the actual parameter values. Prior to execution, parameter values are plugged in to generic plans. In specific plans, parameter values are plugged in prior to the compilation phase. This paper provides a practical framework for dynamically deciding between specific and generic plans for PQ's based on a mix of rule- and cost-based heuristics, which are implemented in the Teradata 12.0 DBMS.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117323402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Top-k generation of integrated schemas based on directed and weighted correspondences","authors":"A. Radwan, Lucian Popa, I. Stanoi, A. Younis","doi":"10.1145/1559845.1559913","DOIUrl":"https://doi.org/10.1145/1559845.1559913","url":null,"abstract":"Schema integration is the problem of creating a unified target schema based on a set of existing source schemas and on a set of correspondences that are the result of matching the source schemas. Previous methods for schema integration rely on the exploration, implicit or explicit, of the multiple design choices that are possible for the integrated schema. Such exploration relies heavily on user interaction; thus, it is time consuming and labor intensive. Furthermore, previous methods have ignored the additional information that typically results from the schema matching process, that is, the weights and in some cases the directions that are associated with the correspondences. In this paper, we propose a more automatic approach to schema integration that is based on the use of directed and weighted correspondences between the concepts that appear in the source schemas. A key component of our approach is a novel top-k ranking algorithm for the automatic generation of the best candidate schemas. The algorithm gives more weight to schemas that combine the concepts with higher similarity or coverage. Thus, the algorithm makes certain decisions that otherwise would likely be taken by a human expert. We show that the algorithm runs in polynomial time and moreover has good performance in practice.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115560415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental maintenance of length normalized indexes for approximate string matching","authors":"Marios Hadjieleftheriou, Nick Koudas, D. Srivastava","doi":"10.1145/1559845.1559891","DOIUrl":"https://doi.org/10.1145/1559845.1559891","url":null,"abstract":"Approximate string matching is a problem that has received a lot of attention recently. Existing work on information retrieval has concentrated on a variety of similarity measures (TF/IDF, BM25, HMM, etc.) specifically tailored for document retrieval purposes. As new applications that depend on retrieving short strings are becoming popular (e.g., local search engines like YellowPages.com, Yahoo!Local, and Google Maps), new indexing methods are needed, tailored for short strings. For that purpose, a number of indexing techniques and related algorithms have been proposed based on length normalized similarity measures. A common denominator of indexes for length normalized measures is that maintaining the underlying structures in the presence of incremental updates is inefficient, mainly due to data dependent, precomputed weights associated with each distinct token and string. Incorporating updates is usually accomplished by rebuilding the indexes at regular time intervals. In this paper we present a framework that advocates lazy update propagation with the following key feature: efficient, incremental updates that immediately reflect the new data in the indexes in a way that gives strict guarantees on the quality of subsequent query answers. More specifically, our techniques guarantee against false negatives and limit the number of false positives produced. We implement a fully working prototype and illustrate that the proposed ideas work well in practice on real datasets.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114207365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for testing query transformation rules","authors":"Hicham G. Elmongui, Vivek R. Narasayya, Ravishankar Ramamurthy","doi":"10.1145/1559845.1559874","DOIUrl":"https://doi.org/10.1145/1559845.1559874","url":null,"abstract":"In order to enable extensibility, modern query optimizers typically leverage a transformation rule based framework. Testing individual rule correctness as well as correctness of rule interactions is crucial in verifying the functionality of a query optimizer. While there has been a lot of work on how to architect optimizers for extensibility using a rule based framework, there has been relatively little work on how to test such optimizers. In this paper we present a framework for testing query transformation rules which enables: (a) efficient generation of queries that exercise a particular transformation rule or a set of rules and (b) efficient execution of corresponding test suites for correctness testing.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125312654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing complex extraction programs over evolving text data","authors":"Fei Chen, Byron J. Gao, A. Doan, Jun Yang, R. Ramakrishnan","doi":"10.1145/1559845.1559881","DOIUrl":"https://doi.org/10.1145/1559845.1559881","url":null,"abstract":"Most information extraction (IE) approaches have considered only static text corpora, over which we apply IE only once. Many real-world text corpora, however, are dynamic. They evolve over time, and so to keep extracted information up to date we often must apply IE repeatedly, to consecutive corpus snapshots. Applying IE from scratch to each snapshot can take a lot of time. To avoid doing this, we have recently developed Cyclex, a system that recycles previous IE results to speed up IE over subsequent corpus snapshots. Cyclex clearly demonstrated the promise of the recycling idea. The work itself, however, is limited in that it considers only IE programs that contain a single IE ``blackbox.'' In practice, many IE programs are far more complex, containing multiple IE blackboxes connected in a compositional ``workflow.'' In this paper, we present Delex, a system that removes the above limitation. First we identify many difficult challenges raised by Delex, including modeling complex IE programs for recycling purposes, implementing the recycling process efficiently, and searching for an optimal execution plan in a vast plan space with different recycling alternatives. Next we describe our solutions to these challenges. Finally, we describe extensive experiments with both rule-based and learning-based IE programs over two real-world data sets, which demonstrate the utility of our approach.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123358867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MobileMiner: a real world case study of data mining in mobile communication","authors":"Tengjiao Wang, Bishan Yang, Jun Gao, Dongqing Yang, Shiwei Tang, Haoyu Wu, Kedong Liu, J. Pei","doi":"10.1145/1559845.1559988","DOIUrl":"https://doi.org/10.1145/1559845.1559988","url":null,"abstract":"Mobile communication data analysis has been often used as a background application to motivate many data mining problems. However, very few data mining researchers have a chance to see a working data mining system on real mobile communication data. In this demo, we showcase our new system MobileMiner on a real mobile communication data set, which presents a case study of business solutions using state-of-the-art data mining techniques. MobileMiner adaptively profiles users' behavior from their calling and moving record streams. Customer segmentation and social community analysis can be conducted based on user profiles. We show how data mining techniques can help in mobile communication data analysis. Moreover, we also show some interesting observations which still cannot be mined by the current techniques, and thus may motivate new research and development.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126680412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special invited session on systems research and information management","authors":"M. Carey","doi":"10.1145/3257477","DOIUrl":"https://doi.org/10.1145/3257477","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124862079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Top-k queries on uncertain data: on score distribution and typical answers","authors":"Tingjian Ge, S. Zdonik, S. Madden","doi":"10.1145/1559845.1559886","DOIUrl":"https://doi.org/10.1145/1559845.1559886","url":null,"abstract":"Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along this score-probability dimension. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in previous work in the area. Our work includes a systematic empirical study on both real and synthetic datasets.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122064920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 9: data on the web","authors":"L. Gravano","doi":"10.1145/3257457","DOIUrl":"https://doi.org/10.1145/3257457","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125703117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data warehouse technology by infobright","authors":"D. Ślęzak, Victoria Eastwood","doi":"10.1145/1559845.1559933","DOIUrl":"https://doi.org/10.1145/1559845.1559933","url":null,"abstract":"We discuss Infobright technology with respect to its main features and architectural differentiators. We introduce the upcoming research and development projects that may be of special interest to the academic and industry communities.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133947262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}