Proceedings of the 2006 ACM SIGMOD international conference on Management of data最新文献_第4页

Proactive identification of performance problems 主动识别性能问题

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142582

S. Duan, S. Babu

引用次数: 15

Approximately detecting duplicates for streaming data using stable bloom filters 近似检测重复流数据使用稳定的布隆过滤器

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142477

Fan Deng, Davood Rafiei

引用次数: 182

Managing information extraction: state of the art and research directions 管理信息提取:现状与研究方向

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142595

A. Doan, R. Ramakrishnan, Shivakumar Vaithyanathan

引用次数: 87

Supporting ad-hoc ranking aggregates 支持临时排序聚合

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142481

Chengkai Li, K. Chang, I. Ilyas

引用次数: 77

Graph-based synopses for relational selectivity estimation 关系选择性估计的基于图的概要

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142497

Joshua Spiegel, N. Polyzotis

引用次数: 54

Database support for matching: limitations and opportunities 数据库对匹配的支持:限制和机会

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142484

A. Kini, S. Shankar, J. Naughton, D. DeWitt

{"title":"Database support for matching: limitations and opportunities","authors":"A. Kini, S. Shankar, J. Naughton, D. DeWitt","doi":"10.1145/1142473.1142484","DOIUrl":"https://doi.org/10.1145/1142473.1142484","url":null,"abstract":"We define a match join of R and S with predicate θ to be a subset of the θ-join of R and S such that each tuple of R and S contributes to at most one result tuple. Match joins and their generalizations belong to a broad class of matching problems that have attracted a great deal of attention in disciplines including operations research and theoretical computer science. Instances of these problems arise in practice in resource allocation scenarios. To the best of our knowledge no one uses an RDBMS as a tool to help solve these problems; our goal in this paper is to explore whether or not this needs to be the case. We show that the simple approach of computing the full θ-join and then applying standard graph-matching algorithms to the result is ineffective for all but the smallest of problem instances. By contrast, a closer study shows that the DBMS primitives of grouping, sorting, and joining can be exploited to yield efficient match join operations. This suggests that RDBMSs can play a role in matching related problems beyond merely serving as expensive file systems exporting data sets to external user programs.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122569579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

To search or to crawl?: towards a query optimizer for text-centric tasks 搜索还是爬行?面向以文本为中心的任务的查询优化器

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142504

Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, L. Gravano

{"title":"To search or to crawl?: towards a query optimizer for text-centric tasks","authors":"Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, L. Gravano","doi":"10.1145/1142473.1142504","DOIUrl":"https://doi.org/10.1145/1142473.1142504","url":null,"abstract":"Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive structured relations from unstructured text; as another example, focused crawlers explore the web to locate pages about specific topics. Execution plans for text-centric tasks follow two general paradigms for processing a text database: either we can scan, or 'crawl,\" the text database or, alternatively, we can exploit search engine indexes and retrieve the documents of interest via carefully crafted queries constructed in task-specific ways. The choice between crawl- and query-based execution plans can have a substantial impact on both execution time and output \"completeness\" (e.g., in terms of recall). Nevertheless, this choice is typically ad-hoc and based on heuristics or plain intuition. In this paper, we present fundamental building blocks to make the choice of execution plans for text-centric tasks in an informed, cost-based way. Towards this goal, we show how to analyze query- and crawl-based plans in terms of both execution time and output completeness. We adapt results from random-graph theory and statistics to develop a rigorous cost model for the execution plans. Our cost model reflects the fact that the performance of the plans depends on fundamental task-specific properties of the underlying text databases. We identify these properties and present efficient techniques for estimating the associated cost-model parameters. Overall, our approach helps predict the most appropriate execution plans for a task, resulting in significant efficiency and output completeness benefits. We complement our results with a large-scale experimental evaluation for three important text-centric tasks and over multiple real-life data sets.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134227609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 81

ObjectRank: a system for authority-based search on databases ObjectRank:一个基于权限的数据库搜索系统

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142593

Heasoo Hwang, Vagelis Hristidis, Y. Papakonstantinou

引用次数: 55

User-defined aggregate functions: bridging theory and practice 用户自定义聚合函数:理论与实践的桥梁

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142480

Sara Cohen

引用次数: 48

LINQ: reconciling object, relations and XML in the .NET framework LINQ:在。net框架中协调对象、关系和XML

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142552

E. Meijer, B. Beckman, G. Bierman

{"title":"LINQ: reconciling object, relations and XML in the .NET framework","authors":"E. Meijer, B. Beckman, G. Bierman","doi":"10.1145/1142473.1142552","DOIUrl":"https://doi.org/10.1145/1142473.1142552","url":null,"abstract":"Many software applications today need to handle data from different data models; typically objects from the host programming language along with the relational and XML data models. The ROX impedance mismatch makes programs awkward to write and hard to maintain.The .NET Language-Integrated Query (LINQ) framework, proposed for the next release of the .NET framework, approaches this problem by defining a pattern of general-purpose standard query operators for traversal, filter, and projection. Based on this pattern, any .NET language can define special query comprehension syntax that is subsequently compiled into these standard operators (our code examples are in VB).Besides the general query operators, the LINQ framework also defines two domain specific APIs that work over XML (XLinq) and relational data (DLinq) respectively. The operators over XML use a lightweight and easy to use in-memory XML representation to provide XQuery-style expressiveness in the host programming language. The operators over relational data provide a simple OR mapping by leveraging remotable queries that are executed directly in the back-end relational store.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"321 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131357138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 449