{"title":"DataLens: making a good first impression","authors":"B. Liu, H. Jagadish","doi":"10.1145/1559845.1559997","DOIUrl":"https://doi.org/10.1145/1559845.1559997","url":null,"abstract":"When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the \"best\" results appear first. This approach is well-suited for information retrieval, and for some database queries, such as similarity queries or under-specified (or keyword) queries with known (or guessable) user preferences. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined. An alternative approach is not to try to show the estimated best results on the first page, but instead to help users learn what is available in the whole result set and direct them to finding what they need. We present DataLens, a framework that: i) generates the most representative data points to display on the first page without sorting or ranking, ii) allows users to drill-down to more similar items in a hierarchical fashion, and iii) dynamically adjusts the representatives based on the user's new query conditions. To the best of our knowledge, DataLens is the first to allow hierarchical database result browsing and searching at the same time.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"370 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123487446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 4: security II","authors":"Dan Suciu","doi":"10.1145/3257452","DOIUrl":"https://doi.org/10.1145/3257452","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121469118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Serial and parallel methods for i/o efficient suffix tree construction","authors":"A. Ghoting, K. Makarychev","doi":"10.1145/1559845.1559931","DOIUrl":"https://doi.org/10.1145/1559845.1559931","url":null,"abstract":"Over the past three decades, the suffix tree has served as a fundamental data structure in string processing. However, its widespread applicability has been hindered due to the fact that suffix tree construction does not scale well with the size of the input string. With advances in data collection and storage technologies, large strings have become ubiquitous, especially across emerging applications involving text, time series, and biological sequence data. To benefit from these advances, it is imperative that we realize a scalable suffix tree construction algorithm. To deal with the aforementioned challenge, the past few years have seen the emergence of several disk-based suffix tree construction algorithms. However, construction times continue to be daunting -- for e.g., indexing the entire Human genome still takes over 30 hours on a system with 2 gigabytes of physical memory. In this paper, first, we empirically demonstrate and argue that all existing suffix tree construction algorithms have a severe limitation -- to glean reasonable disk I/O efficiency, the input string being indexed must fit in main memory. This limitation is attributed to the poor locality properties of existing suffix tree construction algorithms and inhibits both sequential and parallel scalability. To deal with this limitation, second, we show that through careful algorithm design, one of the simplest suffix tree construction algorithms can be re-architected to build a suffix tree in a tiled fashion, allowing the implementation to maintain a constant working set size and fixed memory footprint when indexing strings of any size. Third, we show how improved locality of reference coupled with effective collective communication facilitates an efficient parallelization on massively parallel systems like the IBM Blue Gene/L. Finally, we empirically show that the proposed approach affords improvements of several orders of magnitude when indexing large strings. Furthermore, we demonstrate that the proposed parallelization is scalable and allows one to index the entire Human genome on a 1024 processor system in under 15 minutes.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121556956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query simplification: graceful degradation for join-order optimization","authors":"Thomas Neumann","doi":"10.1145/1559845.1559889","DOIUrl":"https://doi.org/10.1145/1559845.1559889","url":null,"abstract":"Join ordering is one of the most important, but also most challenging problems of query optimization. In general finding the optimal join order is NP-hard. Existing dynamic programming algorithms exhibit exponential runtime even for the restricted, but highly relevant class of star joins. Therefore, it is infeasible to find the optimal join order when the query includes a large number of joins. Existing approaches for large queries switch to greedy heuristics or randomized algorithms at some point, which can degrade query execution performance by orders of magnitude. We propose a new paradigm for optimizing large queries: when a query is too complex to be optimized exactly, we simplify the query's join graph until the optimization problem becomes tractable within a given time budget. During simplification, we apply safe simplifications before more risky ones. This way join ordering problems are solved optimally if possible, and gracefully degrade with increasing query complexity. This paper presents a general framework for query simplification and a strategy for directing the simplification process. Extensive experiments with different kinds of queries, different join-graph structures, and different cost functions indicate that query simplification is very robust and outperforms previous methods for join-order optimization.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124053043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 7: testing and security","authors":"Berni Schiefer","doi":"10.1145/3257455","DOIUrl":"https://doi.org/10.1145/3257455","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132538648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlexRecs: expressing and combining flexible recommendations","authors":"G. Koutrika, Benjamin Bercovitz, H. Garcia-Molina","doi":"10.1145/1559845.1559923","DOIUrl":"https://doi.org/10.1145/1559845.1559923","url":null,"abstract":"Recommendation systems have become very popular but most recommendation methods are `hard-wired' into the system making experimentation with and implementation of new recommendation paradigms cumbersome. In this paper, we propose FlexRecs, a framework that decouples the definition of a recommendation process from its execution and supports flexible recommendations over structured data. In FlexRecs, a recommendation approach can be defined declaratively as a high-level parameterized workflow comprising traditional relational operators and new operators that generate or combine recommendations. We describe a prototype flexible recommendation engine that realizes the proposed framework and we present example workflows and experimental results that show its potential for capturing multiple, existing or novel, recommendations easily and having a flexible recommendation system that combines extensibility with reasonable performance.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133514980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ordering, distinctness, aggregation, partitioning and DQP optimization in sybase ASE 15","authors":"Mihnea Andrei, Xun Cheng, Sudipto Chowdhuri, Curtis Johnson, Edwin Seputis","doi":"10.1145/1559845.1559947","DOIUrl":"https://doi.org/10.1145/1559845.1559947","url":null,"abstract":"The Sybase ASE RDBMS version 15 was subject to major enhancements, including semantic partitions and a full QP rewrite. The new ASE QP supports horizontal and vertical parallel processing over semantically partitioned tables, and many other modern QP techniques, as cost-based eager aggregation and cost-based join relocation DQP. In the new query optimizer, the ordering, distinctness, aggregation, partitioning, and DQP optimizations were based on a common framework: plan fragment equivalence classes and logical properties. Our main outcomes are a) an eager enforcement policy for ordering, partitioning and DQP location; b) a distinctness and aggregation optimization policy, opportunistically based on the eager ordering enforcement, and which has an optimization-time computational complexity similar to join processing; c) support for the user to force all of the above optimizer decisions, still guaranteeing a valid plan, based on the Abstract Plan technology. We describe the implementation of this solution in the ASE 15 optimizer. Finally, we give our experimental results: the generation of such complex plans comes with a small increase of the optimizer's SS size, hence within an acceptable optimization time; at execution, we have obtained performance improvements of orders of magnitude for some queries.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131446614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting context analysis for combining multiple entity resolution systems","authors":"Zhaoqi Chen, D. Kalashnikov, S. Mehrotra","doi":"10.1145/1559845.1559869","DOIUrl":"https://doi.org/10.1145/1559845.1559869","url":null,"abstract":"Entity Resolution (ER) is an important real world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks many different ER approaches has been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER. The framework proposed in this paper leverages the observation that often no single ER method always performs the best, consistently outperforming other ER techniques in terms of quality. Instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, which are based on supervised learning. The two approaches learn a mapping of the clustering decisions of the base-level ER systems, together with the local context, into a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality compared to the current state of the art solutions.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116964914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging the application and DBMS divide using static analysis and dynamic profiling","authors":"S. Chaudhuri, Vivek R. Narasayya, M. Syamala","doi":"10.1145/1559845.1559975","DOIUrl":"https://doi.org/10.1145/1559845.1559975","url":null,"abstract":"Relational database management systems (RDBMSs) today serve as the backend for many real-world data intensive applications. Database developers use data access APIs such as ADO.NET to execute SQL queries and access data. While modern program analysis and code profilers are extensively used during the software development life cycle, there is a significant gap in these technologies for database applications because these tools have little or no understanding of data access APIs or the DBMS. We have developed tools that: (a) Enhance traditional static analysis of programs by leveraging understanding of database APIs to help developers identify security, correctness and performance problems in the application. This enables such problems to be detected early in the application lifecycle. (b) Extend the existing DBMS and application profiling infrastructure to enable correlation of application events with DBMS events. This allows profiling across application, data access and DBMS layers. We demonstrate how our tools enable a rich class of analysis, tuning and profiling tasks that are otherwise not possible today.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116324231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Terlecki, Hardik Bati, C. Galindo-Legaria, P. Zabback
{"title":"Filtered statistics","authors":"P. Terlecki, Hardik Bati, C. Galindo-Legaria, P. Zabback","doi":"10.1145/1559845.1559943","DOIUrl":"https://doi.org/10.1145/1559845.1559943","url":null,"abstract":"Column statistics are an important element of cardinality estimation frameworks. More accurate estimates allow the optimizer of a RDBMS to generate better plans and improve the overall system's efficiency. This paper introduces filtered statistics, which model value distribution over a set of rows restricted by a predicate. This feature, available in Microsoft SQL Server, can be used to handle column correlation, as well as focus on interesting data ranges. In particular, it fits well for scenarios with logical subtables, like flexible schema or multi-tenant applications. Integration with the existing cardinality estimation infrastructure is presented.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"20 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116411318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}