Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data: latest articles

DataLens: making a good first impression

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559997

B. Liu, H. Jagadish

Abstract: When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the "best" results appear first. This approach is well suited to information retrieval, and to some database queries, such as similarity queries or under-specified (or keyword) queries with known (or guessable) user preferences. However, standard database query results comprise a set of tuples, with no associated ranking. Systems typically let users sort results on selected attributes, but no actual ranking is defined. An alternative approach is not to try to show the estimated best results on the first page, but instead to help users learn what is available in the whole result set and direct them to what they need. We present DataLens, a framework that: i) generates the most representative data points to display on the first page without sorting or ranking, ii) allows users to drill down to more similar items in a hierarchical fashion, and iii) dynamically adjusts the representatives based on the user's new query conditions. To the best of our knowledge, DataLens is the first to allow hierarchical database result browsing and searching at the same time.
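
The abstract does not say how DataLens chooses its representatives, so the following is only an illustrative sketch of the general idea of "show representatives first, then drill down". It substitutes greedy farthest-point selection over numeric tuples; the function names `pick_representatives` and `drill_down` are hypothetical and are not taken from the paper.

```python
from math import dist


def pick_representatives(points, k):
    """Pick up to k spread-out points (greedy farthest-point heuristic).

    Assumption: this is a stand-in technique, not the DataLens algorithm.
    """
    if not points or k <= 0:
        return []
    reps = [points[0]]
    while len(reps) < min(k, len(points)):
        # Choose the point whose nearest representative is farthest away.
        best = max(points, key=lambda p: min(dist(p, r) for r in reps))
        reps.append(best)
    return reps


def drill_down(points, reps, chosen):
    """Return the points whose nearest representative is `chosen`."""
    return [p for p in points if min(reps, key=lambda r: dist(p, r)) == chosen]


if __name__ == "__main__":
    results = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
    first_page = pick_representatives(results, 3)
    print(first_page)
    print(drill_down(results, first_page, first_page[-1]))
```

Re-running `pick_representatives` on the output of `drill_down` gives the next level of a hierarchy, which loosely mirrors point ii) of the abstract.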

Citations: 10

Session details: Research session 4: security II

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/3257452

Dan Suciu

Citations: 0

Serial and parallel methods for I/O efficient suffix tree construction

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559931

A. Ghoting, K. Makarychev

Abstract: Over the past three decades, the suffix tree has served as a fundamental data structure in string processing. However, its widespread applicability has been hindered because suffix tree construction does not scale well with the size of the input string. With advances in data collection and storage technologies, large strings have become ubiquitous, especially across emerging applications involving text, time series, and biological sequence data. To benefit from these advances, it is imperative that we realize a scalable suffix tree construction algorithm. To deal with this challenge, the past few years have seen the emergence of several disk-based suffix tree construction algorithms. However, construction times continue to be daunting: for example, indexing the entire human genome still takes over 30 hours on a system with 2 gigabytes of physical memory. In this paper, first, we empirically demonstrate and argue that all existing suffix tree construction algorithms have a severe limitation: to achieve reasonable disk I/O efficiency, the input string being indexed must fit in main memory. This limitation is attributed to the poor locality properties of existing suffix tree construction algorithms and inhibits both sequential and parallel scalability. To deal with this limitation, second, we show that through careful algorithm design, one of the simplest suffix tree construction algorithms can be re-architected to build a suffix tree in a tiled fashion, allowing the implementation to maintain a constant working set size and fixed memory footprint when indexing strings of any size. Third, we show how improved locality of reference coupled with effective collective communication facilitates an efficient parallelization on massively parallel systems like the IBM Blue Gene/L. Finally, we empirically show that the proposed approach affords improvements of several orders of magnitude when indexing large strings. Furthermore, we demonstrate that the proposed parallelization is scalable and allows one to index the entire human genome on a 1024-processor system in under 15 minutes.
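
For readers unfamiliar with the data structure, here is a minimal in-memory sketch of the naive "insert every suffix" idea. To stay short it builds an uncompressed suffix trie (one node per character) rather than a true, path-compressed suffix tree, and it is not the tiled or parallel algorithm from the paper; the names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text, terminator="$"):
    """Insert every suffix of text + terminator into a nested-dict trie.

    Assumption: an uncompressed trie with quadratic time and space,
    used only to illustrate what the index answers.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})
        # Leaf marker: the key None cannot collide with a character key.
        node[None] = start
    return root


def contains(root, pattern):
    """Return True if pattern labels a path starting at the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(contains(trie, "nan"), contains(trie, "nab"))
```

Each insertion walks to an essentially arbitrary part of the structure, which gives some intuition for the poor locality of reference that the abstract attributes to existing construction algorithms.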

Citations: 35

Query simplification: graceful degradation for join-order optimization

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559889

Thomas Neumann

Abstract: Join ordering is one of the most important, but also most challenging, problems of query optimization. In general, finding the optimal join order is NP-hard. Existing dynamic programming algorithms exhibit exponential runtime even for the restricted, but highly relevant, class of star joins. Therefore, it is infeasible to find the optimal join order when the query includes a large number of joins. Existing approaches for large queries switch to greedy heuristics or randomized algorithms at some point, which can degrade query execution performance by orders of magnitude. We propose a new paradigm for optimizing large queries: when a query is too complex to be optimized exactly, we simplify the query's join graph until the optimization problem becomes tractable within a given time budget. During simplification, we apply safe simplifications before more risky ones. This way, join ordering problems are solved optimally if possible, and gracefully degrade with increasing query complexity. This paper presents a general framework for query simplification and a strategy for directing the simplification process. Extensive experiments with different kinds of queries, different join-graph structures, and different cost functions indicate that query simplification is very robust and outperforms previous methods for join-order optimization.
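
To make the "exponential dynamic programming" baseline concrete, the sketch below enumerates every subset of relations and every split of each subset. It is a textbook-style subset DP under assumed inputs, not Neumann's simplification framework: the cost model (sum of intermediate result sizes), the independence assumption behind `size`, and the name `optimal_join_cost` are all illustrative choices.

```python
from math import inf


def optimal_join_cost(cards, edges):
    """Minimal total intermediate-result size over cross-product-free plans.

    cards: list of relation cardinalities.
    edges: {(i, j): selectivity} describing the join graph.
    Returns None when the join graph is disconnected (or empty).
    """
    n = len(cards)

    def size(mask):
        # Estimated result size, assuming independent join predicates.
        result = 1.0
        for i in range(n):
            if mask >> i & 1:
                result *= cards[i]
        for (i, j), sel in edges.items():
            if mask >> i & 1 and mask >> j & 1:
                result *= sel
        return result

    def joinable(left, right):
        # True if at least one join edge connects the two sides.
        return any(
            (left >> i & 1 and right >> j & 1) or (left >> j & 1 and right >> i & 1)
            for i, j in edges
        )

    best = {1 << i: 0.0 for i in range(n)}  # base relations cost nothing
    for mask in range(1, 1 << n):  # all 2^n - 1 non-empty subsets
        if mask in best:
            continue
        out = size(mask)
        cheapest = inf
        left = (mask - 1) & mask
        while left:  # every proper non-empty submask of mask
            right = mask ^ left
            if left in best and right in best and joinable(left, right):
                cheapest = min(cheapest, best[left] + best[right] + out)
            left = (left - 1) & mask
        if cheapest < inf:
            best[mask] = cheapest
    return best.get((1 << n) - 1)


if __name__ == "__main__":
    # Chain R0 - R1 - R2 with made-up cardinalities and selectivities.
    print(optimal_join_cost([4, 8, 16], {(0, 1): 0.5, (1, 2): 0.25}))
```

The outer loop alone visits 2^n - 1 subsets, which is why an exact search of this kind stops being practical as the number of joins grows and why the abstract looks for a way to shrink the problem first.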

Citations: 40

Session details: Research session 7: testing and security

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/3257455

Berni Schiefer

Citations: 0

FlexRecs: expressing and combining flexible recommendations

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559923

G. Koutrika, Benjamin Bercovitz, H. Garcia-Molina

Citations: 165

Ordering, distinctness, aggregation, partitioning and DQP optimization in Sybase ASE 15

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559947

Mihnea Andrei, Xun Cheng, Sudipto Chowdhuri, Curtis Johnson, Edwin Seputis

Abstract: The Sybase ASE RDBMS version 15 was subject to major enhancements, including semantic partitions and a full QP rewrite. The new ASE QP supports horizontal and vertical parallel processing over semantically partitioned tables, and many other modern QP techniques, such as cost-based eager aggregation and cost-based join relocation DQP. In the new query optimizer, the ordering, distinctness, aggregation, partitioning, and DQP optimizations were based on a common framework: plan fragment equivalence classes and logical properties. Our main outcomes are a) an eager enforcement policy for ordering, partitioning and DQP location; b) a distinctness and aggregation optimization policy, opportunistically based on the eager ordering enforcement, which has an optimization-time computational complexity similar to join processing; c) support for the user to force all of the above optimizer decisions, still guaranteeing a valid plan, based on the Abstract Plan technology. We describe the implementation of this solution in the ASE 15 optimizer. Finally, we give our experimental results: the generation of such complex plans comes with a small increase of the optimizer's SS size, hence within an acceptable optimization time; at execution, we have obtained performance improvements of orders of magnitude for some queries.

Citations: 3

Exploiting context analysis for combining multiple entity resolution systems

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559869

Zhaoqi Chen, D. Kalashnikov, S. Mehrotra

Abstract: Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks, many different ER approaches have been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER. The framework proposed in this paper leverages the observation that often no single ER method always performs the best, consistently outperforming other ER techniques in terms of quality. Instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, which are based on supervised learning. The two approaches learn a mapping of the clustering decisions of the base-level ER systems, together with the local context, into a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality compared to the current state-of-the-art solutions.

Citations: 93

Bridging the application and DBMS divide using static analysis and dynamic profiling

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559975

S. Chaudhuri, Vivek R. Narasayya, M. Syamala

Abstract: Relational database management systems (RDBMSs) today serve as the backend for many real-world data-intensive applications. Database developers use data access APIs such as ADO.NET to execute SQL queries and access data. While modern program analysis and code profilers are extensively used during the software development life cycle, there is a significant gap in these technologies for database applications because these tools have little or no understanding of data access APIs or the DBMS. We have developed tools that: (a) Enhance traditional static analysis of programs by leveraging understanding of database APIs to help developers identify security, correctness and performance problems in the application. This enables such problems to be detected early in the application lifecycle. (b) Extend the existing DBMS and application profiling infrastructure to enable correlation of application events with DBMS events. This allows profiling across application, data access and DBMS layers. We demonstrate how our tools enable a rich class of analysis, tuning and profiling tasks that are otherwise not possible today.

Citations: 8

Filtered statistics

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559943

P. Terlecki, Hardik Bati, C. Galindo-Legaria, P. Zabback

Citations: 2