Title: Harnessing the crowds for multi-channel marketing monitoring
Authors: Haggai Roitman, Gilad Barkai, D. Konopnicki, Michal Shmueli-Scheuer
Venue: 2014 IEEE 30th International Conference on Data Engineering Workshops (May 19, 2014)
DOI: https://doi.org/10.1109/ICDEW.2014.6818313
Abstract: Measuring the effectiveness of marketing efforts across various channels is a challenging task. Such measurement usually relies on a combination of key performance indicators (KPIs) used to assess marketing outcomes. In this work we present the Multi-channel Marketing Monitoring Platform (M3P). M3P harnesses the crowds (people) as sources for effective collection of marketing KPIs across all possible channels. We describe M3P's main challenges and characterize marketing KPIs. We then describe the M3P solution, focusing on its KPI extraction process.
Title: Towards automated personalized data storage
Authors: Jack Lange, Alexandros Labrinidis, Panos K. Chrysanthis
Venue: 2014 IEEE 30th International Conference on Data Engineering Workshops (May 19, 2014)
DOI: https://doi.org/10.1109/ICDEW.2014.6818341
Abstract: User data is growing at an ever greater pace, threatening to overwhelm our ability to manage it effectively. As the types of data increase and the storage environments become ever more heterogeneous, even reasoning about basic data management decisions becomes increasingly difficult. This growth in complexity requires new methodologies for managing data that lift as much of the burden as possible from the individual user. Instead of requiring users to understand their full collection of data and the underlying storage architectures, future storage systems need to decide on their own how to manage individual files, both in terms of the appropriate storage medium and the necessary file operation semantics. In this paper we present a vision for future storage systems that address the dramatic increase in complexity and volume by making autonomic storage management decisions based on dynamically collected metrics that measure the relationship between individual users and each of their personal files.
{"title":"How to generate query parameters in RDF benchmarks?","authors":"Andrey Gubichev, Renzo Angles, P. Boncz","doi":"10.1109/ICDEW.2014.6818339","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818339","url":null,"abstract":"In this paper we consider the problem of generating parameters for queries in RDF benchmarks. We show that uniform random sampling of the substitution parameters is not well suited for RDF benchmarks, since it results in unpredictable runtime behavior of queries. We formulate a formal problem of parameter generation to ensure stable and statistically significant benchmark results.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122715593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking cloud-based tagging services","authors":"T. Malik, K. Chard, Ian T Foster","doi":"10.1109/ICDEW.2014.6818331","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818331","url":null,"abstract":"Tagging services have emerged as a useful and popular way to organize data resources. Despite popular interest, an efficient implementation of tagging services is a challenge since highly dynamic schemas and sparse, heterogeneous attributes must be supported within a shared, openly writable database. NoSQL databases support dynamic schemas and sparse data but lack efficient native support for joins that are inherent to query and search functionality in tagging services. Relational databases provide sufficient support for joins, but offer a multitude of options to manifest dynamic schemas and tune sparse data models, making evaluation of a tagging service time consuming and painful. In this case-study paper, we describe a benchmark for tagging services, and propose benchmarking modules that can be used to evaluate the suitability of a database for workloads generated from tagging services. We have incorporated our modules as part of OLTP-Bench, a cloud-based benchmarking infrastructure, to understand performance characteristics of tagging systems on several relational DBMSs and cloud-based database-as-a-service (DBaaS) offerings.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128601697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Assigning global relevance scores to DBpedia facts
Authors: Philipp Langer, P. Schulze, Stefan George, Matthias Kohnen, Tobias Metzke, Ziawasch Abedjan, G. Kasneci
Venue: 2014 IEEE 30th International Conference on Data Engineering Workshops (May 19, 2014)
DOI: https://doi.org/10.1109/ICDEW.2014.6818334
Abstract: Knowledge bases have become ubiquitous assets in today's Web. They provide access to billions of statements about real-world entities derived from governmental, institutional, product-oriented, bibliographic, bio-chemical, and many other domain-oriented and general-purpose datasets. The sheer amount of statements that can be retrieved for a given entity calls for ranking techniques that return the most salient, i.e., globally relevant, statements as top results. In this paper we analyze and compare various strategies for assigning global relevance scores to DBpedia facts, with the goal of identifying the best among them. Some of these strategies build on complementary aspects such as frequency and inverse document frequency, yet others combine structural information about the underlying knowledge graph with Web-based co-occurrence statistics for entity pairs. A user evaluation of the discussed approaches has been conducted on the popular DBpedia knowledge base with statistics derived from an indexed version of the ClueWeb09 corpus. The created dataset can be seen as a strong baseline for comparing entity ranking strategies (especially in terms of global relevance) and can be used as a building block for developing new ranking and mining techniques on linked data.
{"title":"Aggregates caching for enterprise applications","authors":"Stephan Müller","doi":"10.1109/ICDEW.2014.6818353","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818353","url":null,"abstract":"Modern enterprise applications generate a mixed workload comprised of short-running transactional queries and long-running analytical queries containing expensive aggregations. Based on the fact that columnar in-memory databases are capable of handling these mixed workloads, we evaluate how existing materialized view maintenance strategies can accelerate the execution of aggregate queries. We contribute by introducing a novel materialized view maintenance approach that leverages the main-delta architecture of columnar storage, outperforming existing strategies for a wide range of workloads. As an optimization, we further propose an approach that adapts the aggregate maintenance strategy based upon the currently monitored workload characteristics.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131805061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated operator placement in distributed Data Stream Management Systems subject to user constraints","authors":"C. Thoma, Alexandros Labrinidis, Adam J. Lee","doi":"10.1109/ICDEW.2014.6818346","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818346","url":null,"abstract":"Traditional distributed Data Stream Management Systems assign query operators to sites by optimizing for some criterion such as query throughput, or network delay. The work presented in this paper begins to augment this traditional operator placement technique by allowing the user issuing a continuous query to specify a variety of constraints - including collocation, upstream/downstream, and tag- or attribute-based constraints - controlling operator placement within the query network. Given a set of constraints, operators, and sites; four strategies are presented for optimizing the operator placement. An optimal brute force algorithm is presented first for smaller cases, followed by linear programming, constraint satisfaction, and local search strategies. The four methods are compared for speed, accuracy, and efficiency, with constraint satisfaction performing the best, and allowing assignments to be adapted on the fly by the DDSMS.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127476988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving reachability query services for sparse graphs","authors":"Peipei Yi, Zhe Fan, Shuxiang Yin","doi":"10.1109/ICDEW.2014.6818298","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818298","url":null,"abstract":"This paper studies privacy-preserving query services for reachability queries under the paradigm of data outsourcing. Specifically, graph data have been outsourced to a third-party service provider (SP), query clients submit their queries to the SP, and the SP returns the query answers. However, SP may not always be trustworthy. Therefore, this paper considers protecting the structural information of the graph data and the query answers from the SP. This paper proposes simple yet optimized privacy-preserving 2-hop labeling. In particular, this paper proposes that the encrypted intermediate results of encrypted query evaluation are indistinguishable. The proposed technique is secure under chosen plaintext attack. We perform an experimental study on the effectiveness of the proposed techniques on both real-world and synthetic datasets.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122434192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On isomorphic matching of large disk-resident graphs using an XQuery engine","authors":"Carlos R. Rivero, H. Jamil","doi":"10.1109/ICDEW.2014.6818296","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818296","url":null,"abstract":"There exists an increasing interest in using graphs to model data, and managing them is a challenging research field. One of the major hurdles in large graph management and processing is our ability to store graphs on disk, and develop techniques that can process the data in their native representation on the disk. Currently, many powerful processing techniques only ensure efficient processing while the graphs reside fully in volatile memory, which limits their applications. In this paper, we present a disk representation of unit graphs, called graphlets, that is amenable to leveraging both XML and relational storage structures, and associated query engines such as XQuery and SQL3. Specifically, we focus on XML and XQuery to implement a graph decomposition-based isomorphic subgraph matching technique, called NetQL, that exploits the graphlet representation. Furthermore, we present a new covering concept, called the minimum hub cover, that allows node-at-a-time processing of arbitrarily large graphs and opens up new opportunities for cost-based graph query optimization. Finally, we discuss some early results to show that such optimizations are feasible and promising by comparing our strategy with GraphQL.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127423475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orestes: A scalable Database-as-a-Service architecture for low latency","authors":"Felix Gessert, Florian Bücklers, N. Ritter","doi":"10.1109/ICDEW.2014.6818329","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818329","url":null,"abstract":"Today, the applicability of database systems in cloud environments is considerably restricted because of three major problems: I) high network latencies for remote/mobile clients, II) lack of elastic horizontal scalability mechanisms, and III) missing abstraction of storage and data models. In this paper, we propose an architecture, a REST/HTTP protocol and a set of algorithms to solve these problems through a Database-as-a-Service middleware called Orestes (Objects RESTfully Encapsulated in Standard Formats). Orestes exposes cloud-hosted NoSQL database systems through a scalable tier of REST servers. These provide database-independent, object-oriented schema design, a client-independent REST-API for database operations, globally distributed caching, cache consistency mechanisms and optimistic ACID transactions. By comparative evaluations we offer empirical evidence that the proposed Database-as-a-Service architecture indeed solves common latency, scalability and abstraction problems encountered in modern cloud-based applications.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"299 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131511105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}