2014 IEEE 30th International Conference on Data Engineering Workshops: Latest Publications

PolarDBMS: Towards a cost-effective and policy-based data management in the cloud
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818323
Ilir Fetai, Filip-Martin Brinkmann, H. Schuldt
Abstract: The proliferation of Cloud computing has attracted a large variety of applications that are completely deployed on resources of Cloud providers. As data management is an essential part of these applications, Cloud providers have to deal with many different data management requirements, depending on the characteristics and guarantees these applications are supposed to have. The objective of a Cloud provider is to support these diverse requirements with a basic set of customizable modules and protocols that can be (dynamically) combined. With the pay-as-you-go cost model of the Cloud, literally every user action and resource usage has a price tag attached to it. Thus, for application providers, it is essential that the needs of their applications are met in a cost-optimized manner. In this paper, we present PolarDBMS, a work in progress: a flexible and dynamically adaptable system for managing data in the Cloud. PolarDBMS derives policies from application and service objectives. Based on these policies, it will automatically deploy the most efficient and cost-optimized set of modules and protocols and monitor their compliance. If necessary, the modules and/or their customization are changed dynamically at run-time. Several modules and protocols that have already been developed are presented. Additionally, we discuss the challenges that have to be met to fully implement PolarDBMS.
Citations: 5
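The abstract describes deriving policies from objectives and then picking the most cost-optimized set of modules, but gives no mechanism. As a loose, hypothetical illustration of policy-driven module selection (the module names, consistency levels, and hourly prices below are invented, not from the paper):

```python
# Hypothetical sketch of policy-driven, cost-optimized module selection.
# Module catalog: higher "consistency" = stronger guarantee, higher cost.
MODULES = [
    {"name": "eventual-async", "consistency": 1, "cost": 0.10},
    {"name": "session",        "consistency": 2, "cost": 0.25},
    {"name": "strong-paxos",   "consistency": 3, "cost": 0.60},
]

def select_module(policy):
    """Return the cheapest module that meets the policy's minimum
    consistency level and hourly budget, or None if none qualifies."""
    feasible = [m for m in MODULES
                if m["consistency"] >= policy["min_consistency"]
                and m["cost"] <= policy["budget"]]
    return min(feasible, key=lambda m: m["cost"]) if feasible else None

choice = select_module({"min_consistency": 2, "budget": 0.50})
print(choice["name"])  # cheapest configuration satisfying the policy
```

A real system would score multi-dimensional policies (latency, availability, consistency) against a richer cost model and, as PolarDBMS proposes, re-evaluate and swap the chosen modules at run-time.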
Towards optimization of RDF analytical queries on MapReduce
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818351
P. Ravindra
Abstract: The broadened use of Semantic Web technologies across domains has led to a shift in focus from simple pattern-matching queries on RDF data to analytical queries with complex grouping and aggregations. An RDF analytical query involves graph pattern matching, which translates to several join operations due to the fine-grained nature of the RDF data model. Complex analytical queries involve multiple grouping-aggregations on different graph patterns, making such tasks join-intensive. Scale-out processing of RDF analytical queries on existing relational-style MapReduce platforms such as Apache Hive and Pig results in lengthy execution workflows with multiple cycles of I/O and network transfer. Additionally, certain graph patterns result in avoidable redundancy in intermediate results, which negatively impacts processing costs. The PhD thesis summarized in this paper proposes a two-pronged approach to minimize these costs while processing RDF queries on MapReduce: an algebraic approach based on a Nested TripleGroup Data Model and Algebra that reinterprets graph pattern queries in a way that reduces the required number of map-reduce cycles, and special strategies to minimize the redundancy in intermediate data while processing certain graph patterns. The proposed techniques are integrated into Apache Pig. Empirical evaluation of this work for processing graph pattern queries shows 45-60% performance gains over systems such as Pig and Hive.
Citations: 7
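The key algebraic intuition, as the abstract sketches it, is that grouping triples by subject lets a star-shaped graph pattern be matched in one grouping pass instead of a cascade of joins. A much-simplified sketch of that idea (toy data; the actual NTGA operators integrated into Apache Pig are far richer):

```python
from collections import defaultdict

# Toy RDF-like triples (subject, predicate, object); invented for
# illustration of subject-based "triplegroups".
triples = [
    ("p1", "type",  "Product"), ("p1", "price", 30), ("p1", "vendor", "v1"),
    ("p2", "type",  "Product"), ("p2", "price", 45),
    ("v1", "type",  "Vendor"),  ("v1", "country", "US"),
]

def star_match(triples, required_predicates):
    """Group triples by subject, then keep the groups containing every
    predicate of the star pattern: one scan, no pairwise-join cascade."""
    groups = defaultdict(dict)
    for s, p, o in triples:
        groups[s][p] = o
    return {s: g for s, g in groups.items()
            if required_predicates <= g.keys()}

matches = star_match(triples, {"type", "price", "vendor"})
print(sorted(matches))  # only p1 carries all three predicates
```

On MapReduce, the grouping step maps to a single map-reduce cycle, whereas evaluating the same star pattern as binary joins would need one cycle per join; that is the cost reduction the thesis targets.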
A provenance-based approach to manage long term preservation of scientific data
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818316
Renato Beserra Sousa, D. C. Cugler, Joana G. Malaverri, C. B. Medeiros
Abstract: Long term preservation of scientific data goes beyond the data itself, extending to metadata preservation and curation. While several researchers emphasize curation processes, our work is geared towards assessing the quality of scientific (meta)data. The rationale behind this strategy is that scientific data are often accessible via metadata - and thus ensuring metadata quality is a means to provide long term accessibility. This paper discusses our quality assessment architecture, presenting a case study on animal sound recording metadata. Our case study is an example of the importance of periodically assessing (meta)data quality, since knowledge about the world may evolve, and quality may decrease with time, hampering long term preservation.
Citations: 12
Data and Software Preservation for Open Science (DASPOS)
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818318
M. Hildreth
Abstract: Data and Software Preservation for Open Science (DASPOS) represents a first attempt to establish a formal collaboration tying together physicists from the CMS and ATLAS experiments at the LHC and the Tevatron experiments with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure. Recently funded by the National Science Foundation, the project is organizing multiple workshops aimed at understanding use cases for data, software, and knowledge preservation in High Energy Physics and other scientific disciplines, including BioInformatics and Astrophysics. The goal of this project is the technical development and specification of an architecture for curating HEP data and software to the point where the repetition of a physics analysis using only the archived data, software, and analysis description is possible. The novelty of this effort is its holistic approach: not only the data but also the software and frameworks necessary to use the data are part of the preservation effort, making it true “physics preservation” rather than merely data preservation.
Citations: 5
Overlap versus partition: Marketing classification and customer profiling in complex networks of products
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818312
Diego Pennacchioli, M. Coscia, D. Pedreschi
Abstract: In recent years we have witnessed an explosion in the availability of data on human and customer behavior in the market. This era of data richness has fostered the development of useful applications for understanding how markets and the minds of customers work. In this paper we focus on the analysis of complex networks based on customer behavior. Complex network analysis has provided a new and wide toolbox for the classic data mining task of clustering. With community discovery, i.e. the detection of functional modules in complex networks, we are now able to group together customers and products using a variety of different criteria. The aim of this paper is to explore this new analytic degree of freedom. We provide a case study uncovering the meaning of different community discovery algorithms on a network of products connected because they are co-purchased by the same customers. We focus on the different interpretations of a partition approach, where each product belongs to a single community, as opposed to an overlapping approach, where each product can belong to multiple communities. We found that the former is useful to improve the marketing classification of products, while the latter is able to create a collection of different customer profiles.
Citations: 8
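The partition-versus-overlap distinction can be made concrete on a toy co-purchase graph. In this sketch (invented products; real community-discovery algorithms such as modularity optimization or clique percolation are more sophisticated), maximal cliques stand in for overlapping communities, and flattening them into a partition loses the bridging product's dual role:

```python
# Toy co-purchase network: an edge means two products were bought
# together. Products and edges are invented for illustration.
edges = {("A", "B"), ("A", "C"), ("B", "C"),   # snacks cluster
         ("C", "D"), ("C", "E"), ("D", "E")}   # gadgets cluster; C bridges
nodes = {n for e in edges for n in e}
adj = {n: {m for m in nodes if (n, m) in edges or (m, n) in edges}
       for n in nodes}

def maximal_cliques(r, p, x):
    """Bron-Kerbosch enumeration of maximal cliques; here they serve
    as overlapping communities (a node may appear in several)."""
    if not p and not x:
        yield r
    for v in list(p):
        yield from maximal_cliques(r | {v}, p & adj[v], x & adj[v])
        p, x = p - {v}, x | {v}

overlap = [frozenset(c) for c in maximal_cliques(set(), set(nodes), set())]

# Forcing a partition: keep each product only in the first community
# that claims it, so every product gets exactly one label.
partition, seen = [], set()
for c in overlap:
    rest = c - seen
    if rest:
        partition.append(rest)
        seen |= rest

# C sits in two overlapping communities but in only one partition block.
print(sum("C" in c for c in overlap), sum("C" in c for c in partition))
```

In the paper's terms, the overlapping view supports multiple customer profiles per product (C serves both snack and gadget buyers), while the forced partition is the natural input for a single marketing classification.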
Characterizing comparison shopping behavior: A case study
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818314
Mona Gupta, Happy Mittal, Parag Singla, A. Bagchi
Abstract: In this work we study the behavior of users of online comparison shopping using session traces collected over one year from an Indian mobile phone comparison website: http://smartprix.com. There are two aspects to our study: data analysis and behavior prediction. The first aspect, data analysis, is geared towards providing insights into user behavior that could enable vendors to offer the right kinds of products and prices, and that could help the comparison shopping engine customize search based on user preferences. We discover correlations between the search queries users issue before arriving at the site and their subsequent behavior on it. We have also studied the distribution of users based on geographic location, time of day, day of the week, number of sessions that have a click to buy (convert), repeat users, and phones/brands visited and compared. We analyze the impact of price changes on the popularity of a product and how special events, such as the launch of a new model, affect the popularity of a brand. Our analysis corroborates intuitions such as that a price increase leads to a decrease in popularity and vice versa. Further, we characterize the time lag in the effect of such phenomena on popularity. We characterize user behavior on the website as a sequence of transitions between multiple states (defined in terms of the kind of page being visited, e.g. home, visit, compare). We use KL divergence to show that a time-homogeneous Markov chain is the right model for session traces when the number of clicks varies from 5 to 30. Finally, we build a model using Markov logic that uses the history of the user's activity in a session to predict whether the user is going to click to convert in that session. Our methodology of combining data analysis with machine learning is, in our opinion, a new approach to the empirical study of such data sets.
Citations: 9
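The modeling step described above, estimating state transitions from session traces and comparing estimates across time with KL divergence, can be sketched as follows. The sessions, states, and smoothing are invented for illustration, and the divergence here is a crude per-transition score rather than the paper's exact test:

```python
import math
from collections import Counter

# Invented session traces over simplified page states.
sessions = [
    ["home", "visit", "compare", "visit", "buy"],
    ["home", "visit", "visit", "compare", "buy"],
    ["home", "compare", "visit", "buy"],
    ["home", "visit", "compare", "buy"],
]

def transition_probs(seqs):
    """Maximum-likelihood estimate of P(next state | current state)."""
    counts = Counter((a, b) for s in seqs for a, b in zip(s, s[1:]))
    totals = Counter()
    for (a, _b), n in counts.items():
        totals[a] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}

def divergence(p, q, eps=1e-9):
    """KL-style score between two transition estimates, smoothing
    transitions that are absent from one of them."""
    keys = p.keys() | q.keys()
    return sum(p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps))
               for k in keys)

first = transition_probs(sessions[:2])    # "early" traffic window
second = transition_probs(sessions[2:])   # "late" traffic window
print(round(divergence(first, second), 3))
```

A consistently low score between successive windows would support modeling the traces as a time-homogeneous Markov chain; note that under this crude smoothing, transitions seen in one window but not the other inflate the score sharply.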
SortingHat: A framework for deep matching between classes of entities
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818309
Sumant Kulkarni, S. Srinivasa, Jyotiska Nath Khasnabish, K. Nagal, Sandeep G. Kurdagi
Abstract: This paper addresses the problem of “deep matching” - matching different classes of entities based on latent underlying semantics, rather than just their visible attributes. An example of this is the “automatic task assignment” problem, where several tasks have to be assigned to people with varied skill-sets and experiences. Datasets showing types of entities (tasks and people) along with their involvement with other concepts are used as the basis for deep matching. This paper describes work in progress on a deep matching application called SortingHat. We analyze issue tracking data of a large corporation containing task descriptions and assignments to people that were made manually. We identify several entities and concepts from the dataset and build a co-occurrence graph as the basic data structure for computing deep matches. We then propose a set of query primitives that can establish several forms of semantic matching across different classes of entities.
Citations: 3
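The co-occurrence-graph idea can be caricatured in a few lines. Everything below (tasks, concepts, and the additive scoring) is invented to convey the flavor of matching via latent concept co-occurrence; it is not SortingHat's actual set of query primitives:

```python
from collections import Counter
from itertools import combinations

# Invented historical task descriptions, reduced to concept sets.
past_tasks = [
    {"jdbc", "oracle", "sql"},
    {"sql", "index", "tuning"},
    {"css", "html", "layout"},
]

# Edge weights of the concept co-occurrence graph: how often two
# concepts appeared in the same task.
cooc = Counter()
for concepts in past_tasks:
    for a, b in combinations(sorted(concepts), 2):
        cooc[(a, b)] += 1

def strength(x, y):
    return cooc[tuple(sorted((x, y)))]

def score(person_skills, task_concepts):
    """Latent match score: sum of co-occurrence strengths between every
    skill/concept pair, so a 'sql' expert scores on a 'tuning' task
    even without the literal keyword 'tuning' in their profile."""
    return sum(strength(s, c) for s in person_skills for c in task_concepts)

print(score({"sql"}, {"tuning", "index"}))  # 2: matched via co-occurrence
```

Matching on visible attributes alone would give the 'sql' person and the 'css' person the same (zero) overlap with a {'tuning', 'index'} task; the co-occurrence graph is what surfaces the latent connection.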
SLA-driven workload management for cloud databases
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818324
D. Stamatakis, Olga Papaemmanouil
Abstract: Despite the fast growth and increased adoption of cloud databases, challenges related to Service-Level-Agreement (SLA) specification and management still exist. Supporting application-specific performance goals and SLAs, assigning incoming query processing workloads to the reserved resources to avoid SLA violations, and monitoring performance factors to ensure acceptable QoS levels are some of the critical tasks that have not yet been addressed by the database community. In this position paper, we argue that SLA management for cloud databases should itself be offered to developers as a cloud-based automated service. Towards this goal, we discuss the design of a framework that a) enables the specification of custom application-level performance SLAs and b) offers workload management mechanisms that can automatically customize their functionality towards meeting these application-specific SLAs.
Citations: 9
LODHub — A platform for sharing and integrated processing of linked open data
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818336
Stefan Hagedorn, K. Sattler
Abstract: In this paper we discuss the need for a new platform that combines existing solutions for publishing and sharing linked open data with an infrastructure of services for exploring, processing, and analyzing data across multiple data sets. We identify various requirements for such a platform, describe the architecture, and sketch initial results of our prototype.
Citations: 2
Cinderella — Adaptive online partitioning of irregularly structured data
2014 IEEE 30th International Conference on Data Engineering Workshops | Pub Date: 2014-03-01 | DOI: 10.1109/ICDEW.2014.6818342
K. Herrmann, H. Voigt, Wolfgang Lehner
Abstract: In an increasing number of use cases, databases face the challenge of managing irregularly structured data. Irregularly structured data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table. Usually, this leads to a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem. Queries that reference only a subset of attributes have to read the whole universal table, including many irrelevant entities. One possible solution is to partition the table, which allows pruning partitions of irrelevant entities before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible, due to the enormous complexity. Thus an autonomous solution is desirable. In this paper, we define the Online Partitioning Problem for irregularly structured data and present Cinderella. Cinderella is an autonomous online algorithm for horizontal partitioning of irregularly structured entities in universal tables. It is designed to keep its overhead low by incrementally assigning entities to partitions while they are touched anyway during modifications. The achieved partitioning allows queries that reference only a subset of attributes to easily prune partitions of irrelevant entities. Cinderella increases the locality of queries and reduces query execution cost.
Citations: 8
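Cinderella's actual cost model and assignment rule are in the paper; the abstract only states that entities are assigned to partitions incrementally as they arrive. A hypothetical greedy rule in that spirit, with an invented Jaccard-similarity criterion and threshold, might look like this:

```python
# Hypothetical greedy sketch of online horizontal partitioning by
# attribute-set similarity. The Jaccard criterion and the 0.5 threshold
# are invented for illustration, not taken from the Cinderella paper.

def jaccard(a, b):
    return len(a & b) / len(a | b)

class OnlinePartitioner:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.partitions = []  # each: {"signature": set, "entities": list}

    def insert(self, entity):
        """Assign an entity (a dict of attributes) to the most similar
        existing partition, or open a new one; the partition's attribute
        signature grows with the entities it absorbs."""
        attrs = set(entity)
        best, best_sim = None, 0.0
        for part in self.partitions:
            sim = jaccard(attrs, part["signature"])
            if sim > best_sim:
                best, best_sim = part, sim
        if best is None or best_sim < self.threshold:
            best = {"signature": set(), "entities": []}
            self.partitions.append(best)
        best["entities"].append(entity)
        best["signature"] |= attrs

p = OnlinePartitioner()
p.insert({"title": "A", "pages": 10})
p.insert({"title": "B", "pages": 12, "isbn": "x"})
p.insert({"genus": "Felis", "habitat": "urban"})
print(len(p.partitions))  # 2: the books cluster; the animal opens a new partition
```

A query referencing only {"isbn"} could then prune the second partition outright, which is exactly the locality benefit the abstract claims for the achieved partitioning.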