Proceedings of the 2006 ACM SIGMOD international conference on Management of data最新文献

筛选
英文 中文
Proactive identification of performance problems 主动识别性能问题
S. Duan, S. Babu
{"title":"Proactive identification of performance problems","authors":"S. Duan, S. Babu","doi":"10.1145/1142473.1142582","DOIUrl":"https://doi.org/10.1145/1142473.1142582","url":null,"abstract":"We propose to demonstrate Fa, an automated tool for timely and accurate prediction of Service-Level-Agreement (SLA) violations caused by performance problems in database systems. Fa periodically collects performance data at three levels: applications, database server, and operating system. This data is used to construct probabilistic models for predicting SLA violations. Fa currently uses graphical Bayesian network models because of their ability to support a wide range of inferences, including prediction and diagnosis, as well as their support for interactive visualization and presentation of complex system behavior in intuitive ways.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121360873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Approximately detecting duplicates for streaming data using stable bloom filters 近似检测重复流数据使用稳定的布隆过滤器
Fan Deng, Davood Rafiei
{"title":"Approximately detecting duplicates for streaming data using stable bloom filters","authors":"Fan Deng, Davood Rafiei","doi":"10.1145/1142473.1142477","DOIUrl":"https://doi.org/10.1145/1142473.1142477","url":null,"abstract":"Traditional duplicate elimination techniques are not applicable to many data stream applications. In general, precisely eliminating duplicates in an unbounded data stream is not feasible in many streaming scenarios. Therefore, we target at approximately eliminating duplicates in streaming environments given a limited space. Based on a well-known bitmap sketch, we introduce a data structure, Stable Bloom Filter, and a novel and simple algorithm. The basic idea is as follows: since there is no way to store the whole history of the stream, SBF continuously evicts the stale information so that SBF has room for those more recent elements. After finding some properties of SBF analytically, we show that a tight upper bound of false positive rates is guaranteed. In our empirical study, we compare SBF to alternative methods. The results show that our method is superior in terms of both accuracy and time effciency when a fixed small space and an acceptable false positive rate are given.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115477326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 182
Managing information extraction: state of the art and research directions 管理信息提取:现状与研究方向
A. Doan, R. Ramakrishnan, Shivakumar Vaithyanathan
{"title":"Managing information extraction: state of the art and research directions","authors":"A. Doan, R. Ramakrishnan, Shivakumar Vaithyanathan","doi":"10.1145/1142473.1142595","DOIUrl":"https://doi.org/10.1145/1142473.1142595","url":null,"abstract":"This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text). We first survey research on information extraction in the database, AI, NLP, IR, and Web communities in recent years. Then we discuss why this is the right time for the database community to actively participate and address the problem of managing information extraction (including in particular the challenges of maintaining and querying the extracted information, and accounting for the imprecision and uncertainty inherent in the extraction process). Finally, we show how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127536994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 87
Supporting ad-hoc ranking aggregates 支持临时排序聚合
Chengkai Li, K. Chang, I. Ilyas
{"title":"Supporting ad-hoc ranking aggregates","authors":"Chengkai Li, K. Chang, I. Ilyas","doi":"10.1145/1142473.1142481","DOIUrl":"https://doi.org/10.1145/1142473.1142481","url":null,"abstract":"This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which provide the k groups with the highest aggregates as results. Essential support of such queries is lacking in current systems, which process the queries in a naïve materialize-group-sort scheme that can be prohibitively inefficient. Our framework is based on three fundamental principles. The Upper-Bound Principle dictates the requirements of early pruning, and the Group-Ranking and Tuple-Ranking Principles dictate group-ordering and tuple-ordering requirements. They together guide the query processor toward a provably optimal tuple schedule for aggregate query processing. We propose a new execution framework to apply the principles and requirements. We address the challenges in realizing the framework and implementing new query operators, enabling efficient group-aware and rank-aware query plans. The experimental study validates our framework by demonstrating orders of magnitude performance improvement in the new query plans, compared with the traditional plans.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127589785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 77
Graph-based synopses for relational selectivity estimation 关系选择性估计的基于图的概要
Joshua Spiegel, N. Polyzotis
{"title":"Graph-based synopses for relational selectivity estimation","authors":"Joshua Spiegel, N. Polyzotis","doi":"10.1145/1142473.1142497","DOIUrl":"https://doi.org/10.1145/1142473.1142497","url":null,"abstract":"This paper introduces the Tuple Graph (TUG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summarization framework adopts a \"semi-structured\" view of the relational database, modeling a relational data set as a graph of tuples and join queries as graph traversals respectively. The key idea is to approximate the structure of the induced data graph in a concise synopsis, and to estimate the selectivity of a query by performing the corresponding traversal over the summarized graph. We detail the TUG synopsis model that is based on this novel approach, and we describe an efficient and scalable construction algorithm for building accurate TUGs within a specific storage budget. We validate the performance of TUGs with an extensive experimental study on real-life and synthetic data sets. Our results verify the effectiveness of TUGs in generating accurate selectivity estimates for complex join queries, and demonstrate their benefits over existing summarization techniques.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127016803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 54
Database support for matching: limitations and opportunities 数据库对匹配的支持:限制和机会
A. Kini, S. Shankar, J. Naughton, D. DeWitt
{"title":"Database support for matching: limitations and opportunities","authors":"A. Kini, S. Shankar, J. Naughton, D. DeWitt","doi":"10.1145/1142473.1142484","DOIUrl":"https://doi.org/10.1145/1142473.1142484","url":null,"abstract":"We define a match join of R and S with predicate θ to be a subset of the θ-join of R and S such that each tuple of R and S contributes to at most one result tuple. Match joins and their generalizations belong to a broad class of matching problems that have attracted a great deal of attention in disciplines including operations research and theoretical computer science. Instances of these problems arise in practice in resource allocation scenarios. To the best of our knowledge no one uses an RDBMS as a tool to help solve these problems; our goal in this paper is to explore whether or not this needs to be the case. We show that the simple approach of computing the full θ-join and then applying standard graph-matching algorithms to the result is ineffective for all but the smallest of problem instances. By contrast, a closer study shows that the DBMS primitives of grouping, sorting, and joining can be exploited to yield efficient match join operations. This suggests that RDBMSs can play a role in matching related problems beyond merely serving as expensive file systems exporting data sets to external user programs.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122569579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
To search or to crawl?: towards a query optimizer for text-centric tasks 搜索还是爬行?面向以文本为中心的任务的查询优化器
Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, L. Gravano
{"title":"To search or to crawl?: towards a query optimizer for text-centric tasks","authors":"Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, L. Gravano","doi":"10.1145/1142473.1142504","DOIUrl":"https://doi.org/10.1145/1142473.1142504","url":null,"abstract":"Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive structured relations from unstructured text; as another example, focused crawlers explore the web to locate pages about specific topics. Execution plans for text-centric tasks follow two general paradigms for processing a text database: either we can scan, or 'crawl,\" the text database or, alternatively, we can exploit search engine indexes and retrieve the documents of interest via carefully crafted queries constructed in task-specific ways. The choice between crawl- and query-based execution plans can have a substantial impact on both execution time and output \"completeness\" (e.g., in terms of recall). Nevertheless, this choice is typically ad-hoc and based on heuristics or plain intuition. In this paper, we present fundamental building blocks to make the choice of execution plans for text-centric tasks in an informed, cost-based way. Towards this goal, we show how to analyze query- and crawl-based plans in terms of both execution time and output completeness. We adapt results from random-graph theory and statistics to develop a rigorous cost model for the execution plans. Our cost model reflects the fact that the performance of the plans depends on fundamental task-specific properties of the underlying text databases. We identify these properties and present efficient techniques for estimating the associated cost-model parameters. Overall, our approach helps predict the most appropriate execution plans for a task, resulting in significant efficiency and output completeness benefits. We complement our results with a large-scale experimental evaluation for three important text-centric tasks and over multiple real-life data sets.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134227609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 81
ObjectRank: a system for authority-based search on databases ObjectRank:一个基于权限的数据库搜索系统
Heasoo Hwang, Vagelis Hristidis, Y. Papakonstantinou
{"title":"ObjectRank: a system for authority-based search on databases","authors":"Heasoo Hwang, Vagelis Hristidis, Y. Papakonstantinou","doi":"10.1145/1142473.1142593","DOIUrl":"https://doi.org/10.1145/1142473.1142593","url":null,"abstract":"We present ObjectRank demo system that performs authority-based keyword search on bibliographic databases. We also provide Inverse ObjectRank as a keyword-specific specificity metric and other calibration parameters such as Global ObjectRank. Users can specify various combinations of calibration values to control the behavior of the demo. Finally, we propose a methodology that enables us to extend query results using the ontology graph.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133997972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
User-defined aggregate functions: bridging theory and practice 用户自定义聚合函数:理论与实践的桥梁
Sara Cohen
{"title":"User-defined aggregate functions: bridging theory and practice","authors":"Sara Cohen","doi":"10.1145/1142473.1142480","DOIUrl":"https://doi.org/10.1145/1142473.1142480","url":null,"abstract":"The ability to create user-defined aggregate functions (UDAs) is rapidly becoming a standard feature in relational database systems. Therefore, problems such as query optimization, query rewriting and view maintenance must take into account queries (or views) with UDAs. There is a wealth of research on these problems for queries with general aggregate functions. Unfortunately, there is a mismatch between the manner in which UDAs are created, and the information that the database system requires in order to apply previous research.The purpose of this paper is to explore this mismatch and to bridge the gap between theory and practice, thereby enabling UDAs to become first-class citizens within the database. Specifically, we consider query optimization, query rewriting and view maintenance for queries with UDAs. For each of these problems we first survey previous results and explore the mismatch between theory and practice. We then present theoretical and practical insights that can be combined to derive a coherent framework for defining UDAs within a database system.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"729 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134127122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 48
LINQ: reconciling object, relations and XML in the .NET framework LINQ:在。net框架中协调对象、关系和XML
E. Meijer, B. Beckman, G. Bierman
{"title":"LINQ: reconciling object, relations and XML in the .NET framework","authors":"E. Meijer, B. Beckman, G. Bierman","doi":"10.1145/1142473.1142552","DOIUrl":"https://doi.org/10.1145/1142473.1142552","url":null,"abstract":"Many software applications today need to handle data from different data models; typically objects from the host programming language along with the relational and XML data models. The ROX impedance mismatch makes programs awkward to write and hard to maintain.The .NET Language-Integrated Query (LINQ) framework, proposed for the next release of the .NET framework, approaches this problem by defining a pattern of general-purpose standard query operators for traversal, filter, and projection. Based on this pattern, any .NET language can define special query comprehension syntax that is subsequently compiled into these standard operators (our code examples are in VB).Besides the general query operators, the LINQ framework also defines two domain specific APIs that work over XML (XLinq) and relational data (DLinq) respectively. The operators over XML use a lightweight and easy to use in-memory XML representation to provide XQuery-style expressiveness in the host programming language. The operators over relational data provide a simple OR mapping by leveraging remotable queries that are executed directly in the back-end relational store.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"321 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131357138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 449
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信