{"title":"PolarDBMS: Towards a cost-effective and policy-based data management in the cloud","authors":"Ilir Fetai, Filip-Martin Brinkmann, H. Schuldt","doi":"10.1109/ICDEW.2014.6818323","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818323","url":null,"abstract":"The proliferation of Cloud computing has attracted a large variety of applications which are completely deployed on the resources of Cloud providers. As data management is an essential part of these applications, Cloud providers have to deal with many different requirements for data management, depending on the characteristics and guarantees these applications are supposed to have. The objective of a Cloud provider is to support these diverse requirements with a basic set of customizable modules and protocols that can be (dynamically) combined. With the pay-as-you-go cost model of the Cloud, literally every user action and resource usage has a price tag attached to it. Thus, for application providers, it is essential that the needs of their applications are met in a cost-optimized manner. In this paper, we present the work-in-progress PolarDBMS, a flexible and dynamically adaptable system for managing data in the Cloud. PolarDBMS derives policies from application and service objectives. Based on these policies, it will automatically deploy the most efficient and cost-optimized set of modules and protocols and monitor their compliance. If necessary, the modules and/or their customization are changed dynamically at run-time. Several modules and protocols that have already been developed are presented. Additionally, we discuss the challenges that have to be met to fully implement PolarDBMS.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123741733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards optimization of RDF analytical queries on MapReduce","authors":"P. Ravindra","doi":"10.1109/ICDEW.2014.6818351","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818351","url":null,"abstract":"The broadened use of Semantic Web technologies across domains has led to a shift in focus from simple pattern matching queries on RDF data to analytical queries with complex grouping and aggregations. An RDF analytical query involves graph pattern matching, which translates to several join operations due to the fine-grained nature of the RDF data model. Complex analytical queries involve multiple grouping-aggregations on different graph patterns, making such tasks join-intensive. Scale-out processing of RDF analytical queries on existing relational-style MapReduce platforms such as Apache Hive and Pig results in lengthy execution workflows with multiple cycles of I/O and network transfer. Additionally, certain graph patterns result in avoidable redundancy in intermediate results, which negatively impacts processing costs. The PhD thesis summarized in this paper proposes a two-pronged approach to minimize the costs of processing RDF queries on MapReduce: an algebraic approach based on a Nested TripleGroup Data Model and Algebra that reinterprets graph pattern queries in a way that reduces the required number of map-reduce cycles, and special strategies to minimize the redundancy in intermediate data while processing certain graph patterns. The proposed techniques are integrated into Apache Pig. Empirical evaluation of this work for processing graph pattern queries shows 45-60% performance gains over systems such as Pig and Hive.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115167492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A provenance-based approach to manage long term preservation of scientific data","authors":"Renato Beserra Sousa, D. C. Cugler, Joana G. Malaverri, C. B. Medeiros","doi":"10.1109/ICDEW.2014.6818316","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818316","url":null,"abstract":"Long term preservation of scientific data goes beyond the data, and extends to metadata preservation and curation. While several researchers emphasize curation processes, our work is geared towards assessing the quality of scientific (meta)data. The rationale behind this strategy is that scientific data are often accessible via metadata - and thus ensuring metadata quality is a means to provide long term accessibility. This paper discusses our quality assessment architecture, presenting a case study on animal sound recording metadata. Our case study is an example of the importance of periodically assessing (meta)data quality, since knowledge about the world may evolve, and quality may decrease with time, hampering long term preservation.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128192066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data and Software Preservation for Open Science (DASPOS)","authors":"M. Hildreth","doi":"10.1109/ICDEW.2014.6818318","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818318","url":null,"abstract":"Data and Software Preservation for Open Science (DASPOS) represents a first attempt to establish a formal collaboration tying together physicists from the CMS and ATLAS experiments at the LHC and the Tevatron experiments with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure. Recently funded by the National Science Foundation, the project is organizing multiple workshops aimed at understanding use cases for data, software, and knowledge preservation in High Energy Physics and other scientific disciplines, including BioInformatics and Astrophysics. The goal of this project is the technical development and specification of an architecture for curating HEP data and software to the point where the repetition of a physics analysis using only the archived data, software, and analysis description is possible. The novelty of this effort is its holistic approach, where not only data but also the software and frameworks necessary to use the data are part of the preservation effort, making it true \u201cphysics preservation\u201d rather than merely data preservation.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134102234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overlap versus partition: Marketing classification and customer profiling in complex networks of products","authors":"Diego Pennacchioli, M. Coscia, D. Pedreschi","doi":"10.1109/ICDEW.2014.6818312","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818312","url":null,"abstract":"In recent years we have witnessed an explosion in the availability of data regarding human and customer behavior in the market. This data richness era has fostered the development of useful applications for understanding how markets and the minds of customers work. In this paper we focus on the analysis of complex networks based on customer behavior. Complex network analysis has provided a new and wide toolbox for the classic data mining task of clustering. With community discovery, i.e. the detection of functional modules in complex networks, we are now able to group together customers and products using a variety of different criteria. The aim of this paper is to explore this new analytic degree of freedom. We are interested in providing a case study uncovering the meaning of different community discovery algorithms on a network of products connected together because they are co-purchased by the same customers. We focus on the different interpretations of a partition approach, where each product belongs to a single community, versus an overlapping approach, where each product can belong to multiple communities. We found that the former is useful to improve the marketing classification of products, while the latter is able to create a collection of different customer profiles.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134416106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing comparison shopping behavior: A case study","authors":"Mona Gupta, Happy Mittal, Parag Singla, A. Bagchi","doi":"10.1109/ICDEW.2014.6818314","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818314","url":null,"abstract":"In this work we study the behavior of users of an online comparison shopping service using session traces collected over one year from an Indian mobile phone comparison website: http://smartprix.com. There are two aspects to our study: data analysis and behavior prediction. The first aspect, data analysis, is geared towards providing insights into user behavior that could enable vendors to offer the right kinds of products and prices, and that could help the comparison shopping engine to customize search based on user preferences. We discover the correlation between the search queries users write before arriving at the site and their subsequent behavior on it. We have also studied the distribution of users based on geographic location, time of the day, day of the week, number of sessions which have a click to buy (convert), repeat users, and phones/brands visited and compared. We analyze the impact of price change on the popularity of a product and how special events such as the launch of a new model affect the popularity of a brand. Our analysis corroborates intuitions such as an increasing price leading to a decrease in popularity and vice-versa. Further, we characterize the time lag in the effect of such phenomena on popularity. We characterize user behavior on the website in terms of sequences of transitions between multiple states (defined in terms of the kind of page being visited, e.g. home, visit, compare, etc.). We use KL divergence to show that a time-homogeneous Markov chain is the right model for session traces when the number of clicks varies from 5 to 30. Finally, we build a model using Markov logic that uses the history of the user's activity in a session to predict whether a user is going to click to convert in that session. Our methodology of combining data analysis with machine learning is, in our opinion, a new approach to the empirical study of such data sets.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130977683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SortingHat: A framework for deep matching between classes of entities","authors":"Sumant Kulkarni, S. Srinivasa, Jyotiska Nath Khasnabish, K. Nagal, Sandeep G. Kurdagi","doi":"10.1109/ICDEW.2014.6818309","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818309","url":null,"abstract":"This paper addresses the problem of \u201cdeep matching\u201d - matching different classes of entities based on latent underlying semantics, rather than just their visible attributes. An example of this is the \u201cautomatic task assignment\u201d problem, where several tasks have to be assigned to people with varied skill-sets and experiences. Datasets showing types of entities (tasks and people), along with their involvement with other concepts, are used as the basis for deep matching. This paper describes a work in progress, a deep matching application called SortingHat. We analyze issue tracking data of a large corporation containing task descriptions and assignments to people that were computed manually. We identify several entities and concepts from the dataset and build a co-occurrence graph as the basic data structure for computing deep matches. We then propose a set of query primitives that can establish several forms of semantic matching across different classes of entities.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125190403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLA-driven workload management for cloud databases","authors":"D. Stamatakis, Olga Papaemmanouil","doi":"10.1109/ICDEW.2014.6818324","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818324","url":null,"abstract":"Despite the fast growth and increased adoption of cloud databases, challenges related to Service-Level-Agreement (SLA) specification and management still exist. Supporting application-specific performance goals and SLAs, assigning incoming query processing workloads to the reserved resources to avoid SLA violations, and monitoring performance factors to ensure acceptable QoS levels are some of the critical tasks that have not yet been addressed by the database community. In this position paper, we argue that SLA management for cloud databases should itself be offered to developers as a cloud-based automated service. Towards this goal, we discuss the design of a framework that a) enables the specification of custom application-level performance SLAs and b) offers workload management mechanisms that can automatically customize their functionality towards meeting these application-specific SLAs.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115133807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LODHub — A platform for sharing and integrated processing of linked open data","authors":"Stefan Hagedorn, K. Sattler","doi":"10.1109/ICDEW.2014.6818336","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818336","url":null,"abstract":"In this paper we discuss the need for a new platform that combines existing solutions for publishing and sharing linked open data with the infrastructure of services for exploring, processing, and analyzing data across multiple data sets. We identify various requirements for such a platform, describe the architecture, and sketch initial results of our prototype.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123098514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cinderella — Adaptive online partitioning of irregularly structured data","authors":"K. Herrmann, H. Voigt, Wolfgang Lehner","doi":"10.1109/ICDEW.2014.6818342","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818342","url":null,"abstract":"In an increasing number of use cases, databases face the challenge of managing irregularly structured data. Irregularly structured data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table. Usually, this leads to a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem. Queries that reference only a subset of attributes have to read the whole universal table, including many irrelevant entities. One possible solution is to use a partitioning of the table, which allows pruning partitions of irrelevant entities before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible due to the enormous complexity. Thus, an autonomous solution is desirable. In this paper, we define the Online Partitioning Problem for irregularly structured data and present Cinderella. Cinderella is an autonomous online algorithm for horizontal partitioning of irregularly structured entities in universal tables. It is designed to keep its overhead low by incrementally assigning entities to partitions while they are touched anyway during modifications. The achieved partitioning allows queries that retrieve only entities with a subset of attributes to easily prune partitions of irrelevant entities. Cinderella increases the locality of queries and reduces query execution cost.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"287 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116565053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}