2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010): Latest Publications

Processing online news streams for large-scale semantic analysis
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452710
Milos Krstajic, Florian Mansmann, A. Stoffel, M. Atkinson, D. Keim
Abstract: While the Internet has enabled us to access a vast amount of online news articles originating from thousands of different sources, the human capability to read all these articles has stayed rather constant. Usually, the publishing industry takes over the role of filtering this enormous amount of information and presenting it in an appropriate way to its subscribers. In this paper, the semantic analysis of such news streams is discussed by introducing a system that streams online news collected by the Europe Media Monitor to our proposed semantic news analysis system. We describe in detail the emerging challenges and the corresponding engineering solutions for processing incoming articles close to real time. To demonstrate the use of our system, the case studies show a) temporal analysis of entities, such as institutions or persons, and b) their co-occurrence in news articles.
Citations: 34
Caching all plans with just one optimizer call
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452737
D. Dash, Ioannis Alagiannis, Cristina Maier, A. Ailamaki
Abstract: Modern database management systems (DBMS) answer a multitude of complex queries on increasingly large datasets. Given the complexity of the queries and the numerous design features, manual design is no longer an option. Instead, automatically designing the database is vital to maximize its performance and to reduce the total cost of ownership. For this purpose, commercial DBMS feature automated physical designers that suggest an efficient DB design by using the optimizer as a cost model. Unfortunately, consulting the optimizer is time-consuming, an effect that is typically counteracted by drastically pruning the search space, thereby potentially missing the optimal solution. Recent techniques cache the optimizer's output and evaluate some plans with the cached results, reducing the number of calls to the optimizer. However, the cost of invoking the optimizer to fill the cache is still nontrivial, undermining scalability when running workloads with thousands of queries. In this paper, we use the intermediate optimization results in a dynamic-programming-based optimizer to reduce the cache initialization overhead. We demonstrate the accuracy and efficiency of our techniques by implementing them on the PostgreSQL open source query optimizer. For a star-schema workload, our techniques build the cost model 5 to 10 times faster than the conventional approach, while preserving accuracy.
Citations: 9
Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452704
Y. Tanimura, Akiyoshi Matono, S. Lynden, I. Kojima
Abstract: In order to effectively handle the growing amount of available RDF data, a scalable and flexible RDF data processing framework is needed. We previously proposed a Hadoop-based framework, which takes advantage of scalable and fault-tolerant distributed processing technologies originally proposed as Google's distributed file system and the MapReduce parallel model. In this paper, we present a method that extends the Pig data processing platform on top of the Hadoop infrastructure. Pig compiles programs written in a high-level language, called Pig Latin, into MapReduce programs that can be executed by Hadoop. In order to support RDF, Pig was extended with the ability to load and store RDF data efficiently. Furthermore, as reasoning is an important requirement for most systems storing RDF data, support for inferring new triples using entailment rules was also added. We describe these extensions and present an evaluation of their performance.
Citations: 20
Top-k pipe join
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452769
D. Martinenghi, M. Tagliasacchi
Abstract: In the context of service composition and orchestration, service invocation is typically scheduled according to execution plans, whose topology establishes whether different services are to be invoked in parallel or in sequence. In the latter case, we may have a configuration, called pipe join, in which the output of one service is used as input for another service. When the services involved in a pipe join output results sorted by score, the problem arises of efficiently determining the join tuples (aka combinations) with the highest combined scores. In this paper we study different execution strategies related to the pipe join configuration. First, we consider a strategy that minimizes the access costs to achieve a target number of combinations. Then, we propose a strategy that explicitly considers the scores of the output tuples in order to provide deterministic guarantees that the top-k combinations have been found. Finally, a hybrid strategy is presented.
Citations: 7
Keyword based search over semantic data in polynomial time
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452697
P. Cappellari, R. D. Virgilio, A. Maccioni, M. Miscione
Abstract: In pursuing the development of Yanii, a novel keyword-based search system on graph structures, we present in this paper a computational complexity study of the approach, highlighting a comparison with current PTIME state-of-the-art solutions. The comparative study focuses on a theoretical analysis of different frameworks to define the complexity ranges to which they correspond within the polynomial-time class. We characterize such systems in terms of general measures, which describe the behavior of these frameworks according to different aspects and are more general and informative than mere benchmark tests on a few test cases. We show that Yanii performs better than the others, confirming itself as a promising approach deserving further practical investigation and improvement.
Citations: 2
Duplicate detection in probabilistic data
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2009-12-01 DOI: 10.1109/ICDEW.2010.5452759
Fabian Panse, M. V. Keulen, A. D. Keijzer, N. Ritter
Abstract: Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities.
Citations: 25
Vertical partitioning of relational OLTP databases using integer programming
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 2009-11-09 DOI: 10.1109/ICDEW.2010.5452739
Rasmus Resen Amossen
Abstract: A way to optimize performance of relational row-store databases is to reduce the row widths by vertically partitioning tables into table fractions in order to minimize the number of irrelevant columns/attributes read by each transaction. This paper considers vertical partitioning algorithms for relational row-store OLTP databases with an H-store-like architecture, meaning that we would like to maximize the number of single-sited transactions. We present a model for the vertical partitioning problem that, given a schema together with a vertical partitioning and a workload, estimates the costs (bytes read/written by storage layer access methods and bytes transferred between sites) of evaluating the workload on the given partitioning. The cost model allows for arbitrarily prioritizing load balancing of sites vs. total cost minimization. We show that finding a minimum-cost vertical partitioning in this model is NP-hard and therefore the problem should obviously not be solved manually by a human DBA. We present two algorithms returning solutions in which single-sitedness of read queries is preserved while allowing column replication (which may allow a drastically reduced cost compared to disjoint partitioning). The first algorithm is a quadratic integer program that finds optimal minimum-cost solutions with respect to the model, and the second algorithm is a more scalable heuristic based on simulated annealing. Experiments show that the algorithms can reduce the cost of the model objective by 37% when applied to the TPC-C benchmark, and the heuristic is shown to obtain solutions with costs close to the ones found using the quadratic program.
Citations: 19
A database server for next-generation scientific data management
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010) Pub Date : 1900-01-01 DOI: 10.1109/ICDEW.2010.5452723
M. Eltabakh, Walid G. Aref, A. Elmagarmid
Abstract: The growth of scientific information and the increasing automation of data collection have made databases integral to many scientific disciplines including life sciences, physics, meteorology, earth and atmospheric sciences, and chemistry. These sciences pose new data management challenges to current database system technologies. The thesis work presented in this paper proposes a database server for next-generation scientific data management. The proposed server realizes two core requirements in scientific databases, namely (1) annotation management and (2) complex dependencies involving human actions. In the paper, we discuss the challenges involved in each of these requirements and present the key contributions and main results on each of the two fronts.
Citations: 3