{"title":"A Framework for User-Centered Declarative ETL","authors":"V. Theodorou, A. Abelló, Maik Thiele, Wolfgang Lehner","doi":"10.1145/2666158.2666178","DOIUrl":"https://doi.org/10.1145/2666158.2666178","url":null,"abstract":"As business requirements evolve with increasing information density and velocity, there is a growing need for efficiency and automation of Extract-Transform-Load (ETL) processes. Current approaches for the modeling and optimization of ETL processes provide platform-independent optimization solutions for the (semi-)automated transition among different abstraction levels, focusing on cost and performance. However, the suggested representations are not abstract enough to communicate business requirements and the role of the process quality in a user-centered perspective has not yet been adequately examined. In this paper, we introduce a novel methodology for the end-to-end design of ETL processes that takes under consideration both functional and non-functional requirements. Based on existing work, we raise the level of abstraction for the conceptual representation of ETL operations and we show how process quality characteristics can generate specific patterns on the process design.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123809203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Business Intelligence to Location Intelligence with the Lily Library","authors":"M. Golfarelli, M. Mantovani, Federico Ravaldi, S. Rizzi","doi":"10.1145/2666158.2666176","DOIUrl":"https://doi.org/10.1145/2666158.2666176","url":null,"abstract":"Location intelligence is a set of tools and techniques to integrate spatial features into BI platforms, aimed at better monitoring and interpreting business events related to the territory. In this demonstration we present Lily, a geo-enhanced library that relies on a spatial data warehouse to add real location intelligence capabilities to existing BI platforms. Lily provides end-users with a highly-interactive interface that seamlessly achieves a bidirectional integration between the BI and the geospatial worlds, so as to enable advanced analytical, prediction, and simulation features taking into account the spatial dimension. In particular we focus on the impact of Lily on the user experience with reference to three case studies in the domain of healthcare, telco, and school services respectively.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126400749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of Data-intensive Flows: Is it Needed? Is it Solved?","authors":"Georgia Kougka, A. Gounaris","doi":"10.1145/2666158.2666174","DOIUrl":"https://doi.org/10.1145/2666158.2666174","url":null,"abstract":"Modern data analysis is increasingly employing data-intensive flows for processing very large volumes of data. As the data flows become more and more complex and operate in a highly dynamic environment, we argue that we need to resort to automated cost-based optimization solutions rather than relying on efficient designs by human experts. We further demonstrate that the current state-of-the-art in flow optimizations needs to be extended and we propose a promising direction for optimizing flows at the logical level, and more specifically, for deciding the sequence of flow tasks.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122853797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"fVSS: A New Secure and Cost-Efficient Scheme for Cloud Data Warehouses","authors":"Varunya Attasena, Nouria Harbi, J. Darmont","doi":"10.1145/2666158.2666173","DOIUrl":"https://doi.org/10.1145/2666158.2666173","url":null,"abstract":"Cloud business intelligence is an increasingly popular choice to deliver decision support capabilities via elastic, pay-per-use resources. However, data security issues are one of the top concerns when dealing with sensitive data. In this paper, we propose a novel approach for securing cloud data warehouses by flexible verifiable secret sharing, fVSS. Secret sharing encrypts and distributes data over several cloud service providers, thus enforcing data privacy and availability. fVSS addresses four shortcomings in existing secret sharing-based approaches. First, it allows refreshing the data warehouse when some service providers fail. Second, it allows on-line analysis processing. Third, it enforces data integrity with the help of both inner and outer signatures. Fourth, it helps users control the cost of cloud warehousing by balancing the load among service providers with respect to their pricing policies. To illustrate fVSS' efficiency, we thoroughly compare it with existing secret sharing-based approaches with respect to security features, querying power and data storage and computing costs.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"64 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132574953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Holistic Approach to OLAP Sessions Composition: The Falseto Experience","authors":"Julien Aligon, Kamal Boulil, Patrick Marcel, Verónika Peralta","doi":"10.1145/2666158.2666179","DOIUrl":"https://doi.org/10.1145/2666158.2666179","url":null,"abstract":"OLAP is the main paradigm for flexible and effective exploration of multidimensional cubes in data warehouses. During an OLAP session the user analyzes the results of a query and determines a new query that will give her a better understanding of information. Given the huge size of the data space, this exploration process is often tedious and may leave the user disoriented and frustrated. This paper presents an OLAP tool named Falseto (Former AnalyticaL Sessions for lEss Tedious Olap), that is meant to assist query and session composition, by letting the user summarize, browse, query, and reuse former analytical sessions. Falseto's implementation on top of a formal framework is detailed. We also report the experiments we run to obtain and analyze real OLAP sessions and assess Falseto with them. Finally, we discuss how Falseto can be seen as a starting point for bridging OLAP with exploratory search, a search paradigm centered on the user and the evolution of her knowledge.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133740131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Querying Big, Dynamic, Distributed Data","authors":"M. Garofalakis","doi":"10.1145/2666158.2666184","DOIUrl":"https://doi.org/10.1145/2666158.2666184","url":null,"abstract":"Effective Big Data analytics pose several difficult challenges for modern data management architectures. One key such challenge arises from the naturally streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this talk, we introduce the distributed data streaming model, and discuss recent work on tracking complex queries over massive distributed streams, as well as new research directions in this space.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129341566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Semantic Model for Movement Data Warehouses","authors":"Renato Fileto, Alessandra Raffaetà, A. Roncato, Juarez A. P. Sacenti, C. May, Douglas Klein","doi":"10.1145/2666158.2666180","DOIUrl":"https://doi.org/10.1145/2666158.2666180","url":null,"abstract":"Despite recent progresses in methods for processing data about the movement of objects in the geographic space, some fundamental issues remain unresolved. One of them is how to describe movement segments (e.g., semantic trajectories, episodes like stops and moves) and diverse movement patterns (e.g., moving clusters, hotel-restaurant-shop-hotel), with formal semantic descriptions. Another issue is how to arrange descriptive data and measures in a Movement Data Warehouse (MDW) for powerful information analyses and reasonable performance. This paper introduces general definitions for movement segments, movement patterns, their categories and hierarchies. The proposed constructs are semantically enriched with references to concepts (categories) and/or instances of these concepts (objects) arranged in distinct hierarchies. Based on these constructs, we propose a semantic multidimensional model for MDW. A case study illustrates the expressiveness of the proposal for analyzing movement data collected via social media and semantically enriched with Linked Open Data (LOD).","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"74 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124882443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SM4AM: A Semantic Metamodel for Analytical Metadata","authors":"Jovan Varga, Oscar Romero, T. Pedersen, Christian Thomsen","doi":"10.1145/2666158.2666182","DOIUrl":"https://doi.org/10.1145/2666158.2666182","url":null,"abstract":"Next generation BI systems emerge as platforms where traditional BI tools meet semi-structured and unstructured data coming from the Web. In these settings, the user-centric orientation represents a key characteristic for the acceptance and wide usage by numerous and diverse end users in their data analysis tasks. System and user related metadata are the base for enabling user assistance features. However, current approaches typically store these metadata in ad-hoc manners. In this paper, we propose a generic and extensible approach for the definition and modeling of the relevant metadata artifacts. We present SM4AM, a Semantic Metamodel for Analytical Metadata created as an RDF formalization of the Analytical Metadata artifacts needed for user assistance exploitation purposes in next generation BI systems. We consider the Linked Data initiative and its relevance for user assistance functionalities. We discuss the metamodel benefits and present directions for future work.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130168639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What can Emerging Hardware do for your DBMS Buffer?","authors":"Salmi Cheikh, Abdelhakim Nacef, Ladjel Bellatreche, Jalil Boukhobza","doi":"10.1145/2666158.2666181","DOIUrl":"https://doi.org/10.1145/2666158.2666181","url":null,"abstract":"The spectacular development of business intelligence applications (BIA), built around the data warehousing technology, increases the demand on query performance of DBMS hosting with its extremely high amount of data. In such a context a high interaction among queries exists since they share a large number of intermediate results. This is due to the fact that BIA use relational schemes such as a star schema in which each join passes through the fact table. The decision to cache these intermediate results in the traditional buffer becomes a critical issue since it depends on the size of the buffer and the number of intermediate results candidate for caching. As flash memory is more and more adopted in mass storage systems, we rely on it to buffer some intermediate results. In this paper, we first propose to couple the RAM and Solid State Drive, to respond to the problem combining buffer management and query scheduling sub problems. Secondly, a cost model for evaluating the quality of buffering data and scheduling queries is given. Based on this cost model, an algorithm is given to solve our joint problem. Simulations show that our proposal enhances the performance of SQL queries up to 86%.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124507056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive Query Evaluation in a Column DBMS to Analyze Large Graphs","authors":"C. Ordonez, Achyuth Gurram, N. Rai","doi":"10.1145/2666158.2666177","DOIUrl":"https://doi.org/10.1145/2666158.2666177","url":null,"abstract":"Graphs represent a major challenge on big data analytics, for which there are many systems and prototypes, most of them not based on relational database management systems (DBMSs). Graph problems require substantially different algorithms compared to other analytical techniques (i.e., cubes, statistical models, machine learning) and they are especially important in the analysis of social networks and the Internet. On the other hand, recursive queries are a fundamental query mechanism to analyze graphs in a DBMS, but they can be slow with large graphs. Column DBMSs are a novel kind of faster database systems, but with significantly different storage and retrieval mechanisms compared to traditional row DBMSs. Thus we study the pros and cons of optimizing recursive queries on a column DBMS. Specifically, we study two inter-related graph problems: transitive closure and adjacency matrix multiplication, together with their respective optimization of queries combining recursive joins and recursive aggregations. An experimental evaluation with large graphs compares query optimization in a column DBMS and a row DBMS. We analyze performance tradeoffs with graphs having significantly different size, shape and connectivity. Our benchmark results prove column DBMSs are much faster than row DBMSs to analyze graphs, especially as graphs get larger and denser.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129525782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}