Proceedings of the 2009 ACM SIGMOD International Conference on Management of data: latest publications

DataLens: making a good first impression
B. Liu, H. Jagadish
{"title":"DataLens: making a good first impression","authors":"B. Liu, H. Jagadish","doi":"10.1145/1559845.1559997","DOIUrl":"https://doi.org/10.1145/1559845.1559997","url":null,"abstract":"When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the \"best\" results appear first. This approach is well-suited for information retrieval, and for some database queries, such as similarity queries or under-specified (or keyword) queries with known (or guessable) user preferences. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined. An alternative approach is not to try to show the estimated best results on the first page, but instead to help users learn what is available in the whole result set and direct them to finding what they need. We present DataLens, a framework that: i) generates the most representative data points to display on the first page without sorting or ranking, ii) allows users to drill-down to more similar items in a hierarchical fashion, and iii) dynamically adjusts the representatives based on the user's new query conditions. To the best of our knowledge, DataLens is the first to allow hierarchical database result browsing and searching at the same time.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"370 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123487446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Session details: Research session 4: security II
Dan Suciu
{"title":"Session details: Research session 4: security II","authors":"Dan Suciu","doi":"10.1145/3257452","DOIUrl":"https://doi.org/10.1145/3257452","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121469118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Serial and parallel methods for i/o efficient suffix tree construction
A. Ghoting, K. Makarychev
{"title":"Serial and parallel methods for i/o efficient suffix tree construction","authors":"A. Ghoting, K. Makarychev","doi":"10.1145/1559845.1559931","DOIUrl":"https://doi.org/10.1145/1559845.1559931","url":null,"abstract":"Over the past three decades, the suffix tree has served as a fundamental data structure in string processing. However, its widespread applicability has been hindered due to the fact that suffix tree construction does not scale well with the size of the input string. With advances in data collection and storage technologies, large strings have become ubiquitous, especially across emerging applications involving text, time series, and biological sequence data. To benefit from these advances, it is imperative that we realize a scalable suffix tree construction algorithm. To deal with the aforementioned challenge, the past few years have seen the emergence of several disk-based suffix tree construction algorithms. However, construction times continue to be daunting -- for e.g., indexing the entire Human genome still takes over 30 hours on a system with 2 gigabytes of physical memory. In this paper, first, we empirically demonstrate and argue that all existing suffix tree construction algorithms have a severe limitation -- to glean reasonable disk I/O efficiency, the input string being indexed must fit in main memory. This limitation is attributed to the poor locality properties of existing suffix tree construction algorithms and inhibits both sequential and parallel scalability. To deal with this limitation, second, we show that through careful algorithm design, one of the simplest suffix tree construction algorithms can be re-architected to build a suffix tree in a tiled fashion, allowing the implementation to maintain a constant working set size and fixed memory footprint when indexing strings of any size. Third, we show how improved locality of reference coupled with effective collective communication facilitates an efficient parallelization on massively parallel systems like the IBM Blue Gene/L. Finally, we empirically show that the proposed approach affords improvements of several orders of magnitude when indexing large strings. Furthermore, we demonstrate that the proposed parallelization is scalable and allows one to index the entire Human genome on a 1024 processor system in under 15 minutes.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121556956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Query simplification: graceful degradation for join-order optimization
Thomas Neumann
{"title":"Query simplification: graceful degradation for join-order optimization","authors":"Thomas Neumann","doi":"10.1145/1559845.1559889","DOIUrl":"https://doi.org/10.1145/1559845.1559889","url":null,"abstract":"Join ordering is one of the most important, but also most challenging problems of query optimization. In general finding the optimal join order is NP-hard. Existing dynamic programming algorithms exhibit exponential runtime even for the restricted, but highly relevant class of star joins. Therefore, it is infeasible to find the optimal join order when the query includes a large number of joins. Existing approaches for large queries switch to greedy heuristics or randomized algorithms at some point, which can degrade query execution performance by orders of magnitude. We propose a new paradigm for optimizing large queries: when a query is too complex to be optimized exactly, we simplify the query's join graph until the optimization problem becomes tractable within a given time budget. During simplification, we apply safe simplifications before more risky ones. This way join ordering problems are solved optimally if possible, and gracefully degrade with increasing query complexity. This paper presents a general framework for query simplification and a strategy for directing the simplification process. Extensive experiments with different kinds of queries, different join-graph structures, and different cost functions indicate that query simplification is very robust and outperforms previous methods for join-order optimization.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124053043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Session details: Research session 7: testing and security
Berni Schiefer
{"title":"Session details: Research session 7: testing and security","authors":"Berni Schiefer","doi":"10.1145/3257455","DOIUrl":"https://doi.org/10.1145/3257455","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132538648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FlexRecs: expressing and combining flexible recommendations
G. Koutrika, Benjamin Bercovitz, H. Garcia-Molina
{"title":"FlexRecs: expressing and combining flexible recommendations","authors":"G. Koutrika, Benjamin Bercovitz, H. Garcia-Molina","doi":"10.1145/1559845.1559923","DOIUrl":"https://doi.org/10.1145/1559845.1559923","url":null,"abstract":"Recommendation systems have become very popular but most recommendation methods are `hard-wired' into the system making experimentation with and implementation of new recommendation paradigms cumbersome. In this paper, we propose FlexRecs, a framework that decouples the definition of a recommendation process from its execution and supports flexible recommendations over structured data. In FlexRecs, a recommendation approach can be defined declaratively as a high-level parameterized workflow comprising traditional relational operators and new operators that generate or combine recommendations. We describe a prototype flexible recommendation engine that realizes the proposed framework and we present example workflows and experimental results that show its potential for capturing multiple, existing or novel, recommendations easily and having a flexible recommendation system that combines extensibility with reasonable performance.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133514980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 165
Ordering, distinctness, aggregation, partitioning and DQP optimization in sybase ASE 15
Mihnea Andrei, Xun Cheng, Sudipto Chowdhuri, Curtis Johnson, Edwin Seputis
{"title":"Ordering, distinctness, aggregation, partitioning and DQP optimization in sybase ASE 15","authors":"Mihnea Andrei, Xun Cheng, Sudipto Chowdhuri, Curtis Johnson, Edwin Seputis","doi":"10.1145/1559845.1559947","DOIUrl":"https://doi.org/10.1145/1559845.1559947","url":null,"abstract":"The Sybase ASE RDBMS version 15 was subject to major enhancements, including semantic partitions and a full QP rewrite. The new ASE QP supports horizontal and vertical parallel processing over semantically partitioned tables, and many other modern QP techniques, as cost-based eager aggregation and cost-based join relocation DQP. In the new query optimizer, the ordering, distinctness, aggregation, partitioning, and DQP optimizations were based on a common framework: plan fragment equivalence classes and logical properties. Our main outcomes are a) an eager enforcement policy for ordering, partitioning and DQP location; b) a distinctness and aggregation optimization policy, opportunistically based on the eager ordering enforcement, and which has an optimization-time computational complexity similar to join processing; c) support for the user to force all of the above optimizer decisions, still guaranteeing a valid plan, based on the Abstract Plan technology. We describe the implementation of this solution in the ASE 15 optimizer. Finally, we give our experimental results: the generation of such complex plans comes with a small increase of the optimizer's SS size, hence within an acceptable optimization time; at execution, we have obtained performance improvements of orders of magnitude for some queries.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131446614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Exploiting context analysis for combining multiple entity resolution systems
Zhaoqi Chen, D. Kalashnikov, S. Mehrotra
{"title":"Exploiting context analysis for combining multiple entity resolution systems","authors":"Zhaoqi Chen, D. Kalashnikov, S. Mehrotra","doi":"10.1145/1559845.1559869","DOIUrl":"https://doi.org/10.1145/1559845.1559869","url":null,"abstract":"Entity Resolution (ER) is an important real world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks many different ER approaches has been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER. The framework proposed in this paper leverages the observation that often no single ER method always performs the best, consistently outperforming other ER techniques in terms of quality. Instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, which are based on supervised learning. The two approaches learn a mapping of the clustering decisions of the base-level ER systems, together with the local context, into a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality compared to the current state of the art solutions.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116964914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 93
Bridging the application and DBMS divide using static analysis and dynamic profiling
S. Chaudhuri, Vivek R. Narasayya, M. Syamala
{"title":"Bridging the application and DBMS divide using static analysis and dynamic profiling","authors":"S. Chaudhuri, Vivek R. Narasayya, M. Syamala","doi":"10.1145/1559845.1559975","DOIUrl":"https://doi.org/10.1145/1559845.1559975","url":null,"abstract":"Relational database management systems (RDBMSs) today serve as the backend for many real-world data intensive applications. Database developers use data access APIs such as ADO.NET to execute SQL queries and access data. While modern program analysis and code profilers are extensively used during the software development life cycle, there is a significant gap in these technologies for database applications because these tools have little or no understanding of data access APIs or the DBMS. We have developed tools that: (a) Enhance traditional static analysis of programs by leveraging understanding of database APIs to help developers identify security, correctness and performance problems in the application. This enables such problems to be detected early in the application lifecycle. (b) Extend the existing DBMS and application profiling infrastructure to enable correlation of application events with DBMS events. This allows profiling across application, data access and DBMS layers. We demonstrate how our tools enable a rich class of analysis, tuning and profiling tasks that are otherwise not possible today.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116324231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Filtered statistics
P. Terlecki, Hardik Bati, C. Galindo-Legaria, P. Zabback
{"title":"Filtered statistics","authors":"P. Terlecki, Hardik Bati, C. Galindo-Legaria, P. Zabback","doi":"10.1145/1559845.1559943","DOIUrl":"https://doi.org/10.1145/1559845.1559943","url":null,"abstract":"Column statistics are an important element of cardinality estimation frameworks. More accurate estimates allow the optimizer of a RDBMS to generate better plans and improve the overall system's efficiency. This paper introduces filtered statistics, which model value distribution over a set of rows restricted by a predicate. This feature, available in Microsoft SQL Server, can be used to handle column correlation, as well as focus on interesting data ranges. In particular, it fits well for scenarios with logical subtables, like flexible schema or multi-tenant applications. Integration with the existing cardinality estimation infrastructure is presented.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"20 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116411318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2