2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010): latest papers, page 6

Processing online news streams for large-scale semantic analysis

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452710

Milos Krstajic, Florian Mansmann, A. Stoffel, M. Atkinson, D. Keim

Citations: 34

Caching all plans with just one optimizer call

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452737

D. Dash, Ioannis Alagiannis, Cristina Maier, A. Ailamaki

Abstract: Modern database management systems (DBMS) answer a multitude of complex queries on increasingly large datasets. Given the complexity of the queries and the numerous design features, manual design is no longer an option. Instead, automating the database design is vital to maximize performance and reduce the total cost of ownership. For this purpose, commercial DBMS feature automated physical designers that suggest an efficient database design by using the optimizer as a cost model. Unfortunately, consulting the optimizer is time-consuming, an effect that is typically counteracted by drastically pruning the search space, thereby potentially missing the optimal solution. Recent techniques cache the optimizer's output and evaluate some plans with the cached results, reducing the number of calls to the optimizer. However, the cost of invoking the optimizer to fill the cache is still nontrivial, undermining scalability when running workloads with thousands of queries. In this paper, we use the intermediate optimization results in a dynamic-programming-based optimizer to reduce the cache initialization overhead. We demonstrate the accuracy and efficiency of our techniques by implementing them on the PostgreSQL open source query optimizer. For a star-schema workload, our techniques build the cost model 5 to 10 times faster than the conventional approach, while preserving accuracy.
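The general idea of reusing cached optimizer output, as described in the abstract above, can be illustrated with a small memoizing wrapper. This is only a minimal sketch, not the paper's technique: `CostCache` and `fake_optimizer` are hypothetical names, the cost values are made up, and the paper's actual contribution (filling the cache from intermediate dynamic-programming results) is not modeled here.

```python
from typing import Callable, Dict, FrozenSet, Tuple


class CostCache:
    """Memoizes an expensive cost function keyed by (query, index set)."""

    def __init__(self, optimizer: Callable[[str, FrozenSet[str]], float]) -> None:
        self._optimizer = optimizer
        self._cache: Dict[Tuple[str, FrozenSet[str]], float] = {}
        self.optimizer_calls = 0  # counts real (uncached) optimizer invocations

    def cost(self, query: str, indexes: FrozenSet[str]) -> float:
        key = (query, indexes)
        if key not in self._cache:
            # Cache miss: pay for one optimizer call and remember the result.
            self.optimizer_calls += 1
            self._cache[key] = self._optimizer(query, indexes)
        return self._cache[key]


def fake_optimizer(query: str, indexes: FrozenSet[str]) -> float:
    # Hypothetical stand-in for a real optimizer: 100 cost units,
    # halved when at least one index is available.
    return 50.0 if indexes else 100.0


cache = CostCache(fake_optimizer)
no_index = cache.cost("q1", frozenset())
with_index = cache.cost("q1", frozenset({"idx_a"}))
repeated = cache.cost("q1", frozenset({"idx_a"}))  # served from the cache
```

A physical designer that evaluates the same query under the same candidate design many times would pay for the optimizer only on the first evaluation; here three cost lookups trigger two optimizer calls.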

Citations: 9

Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452704

Y. Tanimura, Akiyoshi Matono, S. Lynden, I. Kojima

Citations: 20

Top-k pipe join

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452769

D. Martinenghi, M. Tagliasacchi

Citations: 7

Keyword based search over semantic data in polynomial time

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452697

P. Cappellari, R. De Virgilio, A. Maccioni, M. Miscione

Citations: 2

Duplicate detection in probabilistic data

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2009-12-01 DOI: 10.1109/ICDEW.2010.5452759

Fabian Panse, M. van Keulen, A. de Keijzer, N. Ritter

Citations: 25

Vertical partitioning of relational OLTP databases using integer programming

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2009-11-09 DOI: 10.1109/ICDEW.2010.5452739

Rasmus Resen Amossen

Abstract: One way to optimize the performance of relational row-store databases is to reduce row widths by vertically partitioning tables into table fractions, minimizing the number of irrelevant columns/attributes read by each transaction. This paper considers vertical partitioning algorithms for relational row-store OLTP databases with an H-store-like architecture, meaning that we would like to maximize the number of single-sited transactions. We present a model for the vertical partitioning problem that, given a schema together with a vertical partitioning and a workload, estimates the costs (bytes read/written by storage layer access methods and bytes transferred between sites) of evaluating the workload on the given partitioning. The cost model allows for arbitrarily prioritizing load balancing of sites vs. total cost minimization. We show that finding a minimum-cost vertical partitioning in this model is NP-hard, and therefore the problem should not be solved manually by a human DBA. We present two algorithms returning solutions in which single-sitedness of read queries is preserved while allowing column replication (which may allow a drastically reduced cost compared to disjoint partitioning). The first algorithm is a quadratic integer program that finds optimal minimum-cost solutions with respect to the model, and the second algorithm is a more scalable heuristic based on simulated annealing. Experiments show that the algorithms can reduce the cost of the model objective by 37% when applied to the TPC-C benchmark, and the heuristic is shown to obtain solutions with costs close to the ones found using the quadratic program.
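To make the "irrelevant columns read" intuition from the abstract above concrete, here is a minimal sketch of a read-only cost estimate. It is a deliberately simplified assumption, not the paper's model: it ignores writes, sites, inter-site transfer, load balancing, and column replication, and the function name `read_cost`, the column widths, and the workload are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fraction = FrozenSet[str]            # one table fraction: a set of column names
Workload = List[Tuple[Fraction, int]]  # (columns a query reads, its frequency)


def read_cost(widths: Dict[str, int],
              partitioning: List[Fraction],
              workload: Workload) -> int:
    """Bytes read when each query reads every fraction it touches in full."""
    total = 0
    for columns, frequency in workload:
        for fraction in partitioning:
            if fraction & columns:
                # A row store reads the whole fraction row, wanted or not.
                total += frequency * sum(widths[c] for c in fraction)
    return total


# Hypothetical table: column name -> width in bytes.
widths = {"id": 4, "name": 32, "balance": 8, "notes": 200}
workload: Workload = [
    (frozenset({"id", "balance"}), 100),        # frequent narrow query
    (frozenset({"id", "name", "notes"}), 5),    # rare wide query
]

unpartitioned = [frozenset(widths)]
partitioned = [frozenset({"id", "balance"}), frozenset({"name", "notes"})]

cost_before = read_cost(widths, unpartitioned, workload)
cost_after = read_cost(widths, partitioned, workload)
```

In this toy instance the frequent narrow query no longer drags the wide `notes` column along, so the estimated bytes read drop from 25620 to 2420. Searching for the best partitioning, which the abstract addresses with integer programming and simulated annealing, is outside this sketch.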

Citations: 19

A database server for next-generation scientific data management

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 1900-01-01 DOI: 10.1109/ICDEW.2010.5452723

M. Eltabakh, Walid G. Aref, A. Elmagarmid

Citations: 3