Proceedings. International Database Engineering and Applications Symposium: Latest Articles (Page 8)

Query optimization using column statistics in hive

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076636

Anja Gruenheid, E. Omiecinski, L. Mark

Abstract: Hive is a data warehousing solution on top of the Hadoop MapReduce framework. It has been designed to handle large amounts of data and store them in tables, like a relational database management system or a conventional data warehouse, while using the parallelization and batch processing functionality of the Hadoop MapReduce framework to speed up the execution of queries. Data inserted into Hive is stored in the Hadoop Distributed File System (HDFS), which is part of the Hadoop framework. To make the data accessible to the user, Hive uses a query language similar to SQL, called HiveQL. When a query is issued in HiveQL, a parser translates it into a query execution plan that is optimized and then turned into a series of map and reduce iterations. These iterations are then executed on the data stored in HDFS, and the output is written to a file.

The goal of this work is to develop an approach for improving the performance of HiveQL queries executed in the Hive framework. For that purpose, we introduce an extension to the Hive MetaStore that stores metadata extracted at the column level of the user database. These column-level statistics are then used, for example, in combination with join ordering algorithms adapted to the specific needs of the Hadoop MapReduce environment, to improve the overall performance of HiveQL query execution.

Pages: 97-105
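As a rough illustration of how column-level statistics can feed a join ordering decision, the sketch below orders tables greedily by estimated intermediate result size. It is not the algorithm from the paper: the `ColumnStats` record, the function names, and the simplifying assumption that all tables join on one shared key are hypothetical, and the size estimate is the textbook equi-join formula |R| * |S| / max(V(R, a), V(S, a)).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-table metadata for the join column."""
    rows: int      # number of rows in the table
    distinct: int  # number of distinct values in the join column


def estimate_join_rows(left_rows: float, left_distinct: int,
                       right: ColumnStats) -> float:
    # Textbook equi-join estimate: |R| * |S| / max(V(R, a), V(S, a)).
    return left_rows * right.rows / max(left_distinct, right.distinct)


def greedy_join_order(stats: dict[str, ColumnStats]) -> list[str]:
    """Start with the smallest table, then repeatedly add the table that
    yields the smallest estimated intermediate result.

    Assumption: every table joins on the same key, so any order is valid.
    """
    remaining = dict(stats)
    first = min(remaining, key=lambda t: remaining[t].rows)
    start = remaining.pop(first)
    order = [first]
    cur_rows, cur_distinct = float(start.rows), start.distinct

    while remaining:
        best = min(
            remaining,
            key=lambda t: estimate_join_rows(cur_rows, cur_distinct,
                                             remaining[t]),
        )
        nxt = remaining.pop(best)
        cur_rows = estimate_join_rows(cur_rows, cur_distinct, nxt)
        # After an equi-join, at most the smaller set of key values survives.
        cur_distinct = min(cur_distinct, nxt.distinct)
        order.append(best)
    return order


example_stats = {
    "big": ColumnStats(rows=1_000_000, distinct=50_000),
    "medium": ColumnStats(rows=50_000, distinct=50_000),
    "small": ColumnStats(rows=10, distinct=10),
}
example_order = greedy_join_order(example_stats)
```

In a MapReduce setting each join typically materializes its output between jobs, so keeping intermediate results small is one plausible reason to prefer such an order; the real cost model would also need to account for job startup and shuffle costs.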

Citations: 34

Efficient incremental breadth-depth XML event mining

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076649

Rashed K. Salem, J. Darmont, Omar Boussaïd

Citations: 5

Databases on the web: national web domain survey

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076646

Denis Shestakov

Citations: 30

Addressing resource usage in stream processing systems: sizing window effect

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076658

Sabina Surdu, Vasile-Marian Scuturici

Citations: 2

Union rewritings for XPath fragments

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076630

F. Afrati, M. Damigos, M. Gergatsoulis

Citations: 4

Query answering on trajectory cuboids using prime numbers encodings

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076652

E. Masciari

Citations: 2

An efficient local region and clustering-based ensemble system for intrusion detection

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076647

H. Huu, Nouria Harbi, J. Darmont

Abstract: The dramatic proliferation of sophisticated cyber attacks, in conjunction with the ever-growing use of Internet-based services and applications, is becoming a great concern in any organization. Among the many efficient security solutions proposed in the literature to deal with this evolving threat, ensemble approaches, a particular family of data mining techniques, have proven very successful in designing high-performance intrusion detection systems (IDSs) that rest on the combination of multiple classifiers. However, the strength of ensemble systems depends heavily on the methods used to generate and combine individual classifiers. In this vein, we propose a novel design method to generate a robust ensemble-based IDS. In our approach, individual classifiers are built using both the input feature space and additional features derived from k-means clustering. In addition, the ensemble combination is calculated based on the classification ability of the classifiers on different local data regions defined by k-means clustering. Experimental results prove that our solution is superior to several well-known methods.

Pages: 185-191
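The combination step described in the abstract, weighting each classifier by how well it performs in the local region of the sample, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: centroids are assumed to come from a k-means run done elsewhere, classifiers are plain callables returning a label, and all function names are hypothetical. The cluster-derived input features mentioned in the abstract are not covered here.

```python
import math
from collections import defaultdict


def nearest_region(x, centroids):
    """Index of the closest centroid (the local region of sample x)."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))


def local_accuracies(classifiers, centroids, X_val, y_val):
    """Per-region accuracy of every classifier on validation data."""
    hits = defaultdict(lambda: [0] * len(classifiers))
    counts = defaultdict(int)
    for x, y in zip(X_val, y_val):
        region = nearest_region(x, centroids)
        counts[region] += 1
        for j, clf in enumerate(classifiers):
            hits[region][j] += int(clf(x) == y)
    return {r: [h / counts[r] for h in hits[r]] for r in counts}


def predict(x, classifiers, centroids, accuracies):
    """Weighted vote, using the accuracies of the region that x falls in.

    Regions with no validation data fall back to equal weights.
    """
    region = nearest_region(x, centroids)
    weights = accuracies.get(region, [1.0] * len(classifiers))
    votes = defaultdict(float)
    for weight, clf in zip(weights, classifiers):
        votes[clf(x)] += weight
    return max(votes, key=votes.get)


# Toy setup: two one-dimensional regions and two trivial classifiers.
toy_centroids = [(0.0,), (10.0,)]
toy_classifiers = [lambda x: "normal", lambda x: "attack"]
toy_X_val = [(0.5,), (1.0,), (9.0,), (11.0,)]
toy_y_val = ["normal", "normal", "attack", "attack"]
toy_acc = local_accuracies(toy_classifiers, toy_centroids, toy_X_val, toy_y_val)
```

With this toy data, the first classifier is trusted near the first centroid and the second near the other, so the same ensemble gives different answers depending on where a sample lies.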

Citations: 23

Semantics-enabled web APIs selection patterns

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076650

D. Bianchini, V. D. Antonellis, M. Melchiori

Citations: 4

A new architectural paradigm for content-based web applications: Borè

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076648

Antonio Bevacqua, M. Carnuccio, R. Ortale, E. Ritacco

Citations: 5

Scalable queries for large datasets using cloud computing: a case study

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076626

James P. McGlothlin, L. Khan

Abstract: Cloud computing is rapidly growing in popularity as a solution for processing and retrieving huge amounts of data over clusters of inexpensive commodity hardware. The most common data model used by cloud computing software is the NoSQL data model. While this data model is extremely scalable, it is much more efficient for simple retrievals and scans than for the complex analytical queries typical of a relational database model. In this paper, we evaluate emerging cloud computing technologies using a representative use case. Our use case involves analyzing telecommunications logs for performance monitoring and quality assurance. The size of such logs is growing exponentially as more devices communicate more frequently and the amount of data being transferred steadily increases. We analyze potential solutions to provide a scalable database that supports both retrieval and analysis. We investigate and analyze all the major open source cloud computing solutions and designs. We then choose the most applicable subset of these technologies for experimentation. We provide a performance evaluation of these products, analyze our results, and make recommendations. This paper provides a comprehensive survey of technologies for scalable data processing and an in-depth performance evaluation of these technologies.

Pages: 8-16

Citations: 2