{"title":"Secure Range Query Processing over Untrustworthy Cloud Services","authors":"T. Tzouramanis","doi":"10.1145/3105831.3105872","DOIUrl":"https://doi.org/10.1145/3105831.3105872","url":null,"abstract":"Database-as-a-service is a relatively new cloud computing service offered on a pay-per-use basis and providing on-demand access to a database. The way data has dramatically increased in volume explains its success, while security and privacy issues arise, leaving enterprises, in particular, exposed to the risk of leakage of the data which they entrust to specialized cloud service providers or to other parties in order to reduce storage and query processing costs. Since traditional encryption does not support the execution of queries on encrypted data, this paper focuses on the problem of secure computation on encrypted data and puts forward a cloud database model that supports secure range query processing and retrieval of multi-dimensional (i.e. multi-attribute) data. It proposes two schemes to resist practical attacks operating on the basis of powerful background knowledge. A performance and efficiency evaluation of these schemes is also carried out to confirm their efficiency and practicability.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121649616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating SQL-on-Hadoop for Big Data Warehousing on Not-So-Good Hardware","authors":"M. Y. Santos, Carlos A. Costa, João Galvão, Carina Andrade, Bruno Martinho, F. V. Lima, Eduarda Costa","doi":"10.1145/3105831.3105842","DOIUrl":"https://doi.org/10.1145/3105831.3105842","url":null,"abstract":"Big Data is currently conceptualized as data whose volume, variety or velocity impose significant difficulties in traditional techniques and technologies. Big Data Warehousing is emerging as a new concept for Big Data analytics. In this context, SQL-on-Hadoop systems increased notoriety, providing Structured Query Language (SQL) interfaces and interactive queries on Hadoop. A benchmark based on a denormalized version of the TPC-H is used to compare the performance of Hive on Tez, Spark, Presto and Drill. Some key contributions of this work include: the direct comparison of a vast set of technologies; unlike previous scientific works, SQL-on-Hadoop systems were connected to Hive tables instead of raw files; allow to understand the behaviour of these systems in scenarios with ever-increasing requirements, but not-so-good hardware. Besides these benchmark results, this paper also makes available interesting findings regarding an architecture and infrastructure in SQL-on-Hadoop for Big Data Warehousing, helping practitioners and fostering future research.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125866341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Dynamic Itemset Mining Algorithms for Multiple Support Thresholds","authors":"Nourhan Abuzayed, B. Ergenç","doi":"10.1145/3105831.3105846","DOIUrl":"https://doi.org/10.1145/3105831.3105846","url":null,"abstract":"Mining frequent itemsets is an important part of association rule mining process. Handling dynamic aspect of databases and multiple support threshold requirements of items are two important challenges of frequent itemset mining algorithms. Most of the existing dynamic itemset mining algorithms are devised for single support threshold whereas multiple support threshold algorithms are static. This work focuses on dynamic update problem of frequent itemsets under multiple support thresholds and proposes tree-based Dynamic CFP-Growth++ algorithm. Proposed algorithm is compared to our previous dynamic algorithm Dynamic MIS [50] and a recent static algorithm CFP-Growth++ [2] and, findings are; in dynamic database, 1) both of the dynamic algorithms are better than the static algorithm CFP-Growth++, 2) as memory usage performance; Dynamic CFP-Growth++ performs better than Dynamic MIS, 3) as execution time performance; Dynamic MIS is better than Dynamic CFP-Growth++. In short, Dynamic CFP-Growth++ and Dynamic MIS have a trade-off relationship in terms of memory usage and execution time.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"16 9-12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123622016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Big Data Visualization","authors":"Murali Mani, Si Fei","doi":"10.1145/3105831.3105857","DOIUrl":"https://doi.org/10.1145/3105831.3105857","url":null,"abstract":"In the last several years, big data analytics has found an increasing role in our everyday lives. Data visualization has long been accepted as an integral part of data analytics. However, data visualization systems are not equipped to handle the complexities typically found in big data. Our work examines effective ways of visualizing big data, while also realizing that most visualization processes are interactive. During an interactive visualization session, an analyst issues several visualization requests, each of which builds on prior visualizations. In our approach, we integrate a distributed data processing system that can effectively process big data with a visualization system that can provide effective interactive visualization but for smaller amounts of data. The analyst's current request is used to infer contextual information about the analyst such as their expertise and tolerance for delay. This information is used to carefully determine additional data that can be sent to the visualization system for decreasing the response time for future requests, thus providing a better experience for the analyst and increasing their productivity.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132198854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Project-management-tools are used in Agile Practice: Benefits, Drawbacks and Potentials","authors":"Florian Raith, Ingo Richter, Robert Lindermeier","doi":"10.1145/3105831.3105865","DOIUrl":"https://doi.org/10.1145/3105831.3105865","url":null,"abstract":"The use of agile methodologies is quite common in distributed software development. To facilitate sharing of project relevant information across distributed agile teams, project-management-tools (e.g. Jira, Youtrack) are commonly used. Literature and practice show drawbacks in teamwork and communication when those mainly browser-based tools are used instead of traditional paper-based media. Improvements in this area are possible if we acquire knowledge of how these tools are used in practice and what problems or challenges need to be addressed in the future. Therefore we conducted an exploratory semi-structured interview study investigating in what manner common project-management-tools are used in the particular phases and meetings of an agile development process. We have interviewed five experienced agile coaches that guided projects with different constellations regarding the number of involved agile teams and their locations. As a first result we summarized and structured our findings according to all steps of an agile process in the form of textual descriptions and diagrams. Thereby we focused on the combined usage of digital project-management-tools and traditional paper-based media. As a second outcome we listed benefits, drawbacks and potential improvements of digital project-management-tools in agile software development from the interviewees' points of view.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122083846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiPCoDing: A Differentially Private Approach for Correlated Data with Clustering","authors":"André L. C. Mendonça, Felipe T. Brito, L. S. Linhares, Javam C. Machado","doi":"10.1145/3105831.3105861","DOIUrl":"https://doi.org/10.1145/3105831.3105861","url":null,"abstract":"Differential privacy is a model which gives strong privacy guarantees. It was designed to make difficult to distinguish individuals' records on statistical databases while maximizing data utility. Differential privacy approaches usually assume that database records are sampled independently, i.e., each record of this database is independent of the rest. However, this assumption is not always true in the context of real-world applications. In this paper we propose DiPCoDing, a novel approach to calculate the correlation between records in statistical databases using clusterization. For this matter, we have considered Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Gaussian Mixture Model (GMM). Our method aims to group similar records, which are more likely to be correlated, to reduce the sensitivity of differential privacy and consequently the amount of noise added to the query answer, increasing data utility while providing privacy for correlated data. The experimental results of our approach showed that relative errors and noisy answers are significantly lower than those from existing works.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121005108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing a Deterministic Semantics for P2P Deductive Databases","authors":"Luciano Caroprese, E. Zumpano","doi":"10.1145/3105831.3105837","DOIUrl":"https://doi.org/10.1145/3105831.3105837","url":null,"abstract":"This paper proposes a logic based framework for data integration and query answering for deductive databases in a P2P environment. It is based on a special interpretation of mapping rules that leads to a declarative semantics for P2P systems defined in terms of preferred weak models. Under this semantics, only facts not making the local databases inconsistent can be imported, and the preferred weak models are the consistent scenarios in which peers import, by means of mapping rules, maximal sets of facts not violating (directly or indirectly) integrity constraints. The preferred weak models can be computed by means of a rewriting technique allowing to model a P2P system as a unique logic program whose stable models correspond to its preferred weak models. In the general case a P2P system may admit many preferred weak models and it has been shown that the complexity of their computation is prohibitive. Therefore, the paper looks for a more pragmatic solution assigning to a P2P system a new and more suitable semantics: the Well Founded Model Semantics. It allows to obtain a deterministic model whose computation is polynomial time. This model is a (partial) stable model obtained by evaluating with a three-value semantics the normal version of the rewriting of the P2P system. Finally, a distributed algorithm for the computation of the well founded model is proposed.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124045446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constrained Hierarchical Clustering for News Events","authors":"R. Florence, Bruno M. Nogueira, R. Marcacini","doi":"10.1145/3105831.3105859","DOIUrl":"https://doi.org/10.1145/3105831.3105859","url":null,"abstract":"Knowledge discovery from web news events has received great attention in recent years. In practice, this knowledge is a digital representation (virtual world) of various phenomena that occur in our physical world. Hierarchical clustering algorithms are used to organize related events into groups and subgroups according to some similarity measure. The main motivation for this organization is based on the hypothesis that if the user is interested in a specific event of a certain cluster, then the user may also be interested in other related events of this same cluster. However, existing event clustering methods do not effectively use the different types of information about events, such as temporal information, geographical data, name of people and organizations. In this paper, we propose the COH-KMeans algorithm (Constrained Hierarchical K-Means) that obtains a hierarchical clustering structure considering certain conditions imposed by the users, for example, events of similar content that occurred in nearby geographic locations or that occurred within a predefined time window. A statistical analysis of the experimental results reveals that the incorporation of constraints performed by COH-KMeans allows to obtain higher quality clusters when compared to a state-of-the-art unsupervised hierarchical clustering method. Moreover, we present our tool for exploratory analysis of events and we discuss how event clustering can be used to support the decision-making process from the perspective of a Data Analytics System.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"514 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132622036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BDgen: A Universal Big Data Generator","authors":"Tomás Faltín, Michal Hanzeli, Vojtech Sípek, Jan Skvaril, Dusan Varis, Irena Holubová Mlýnková","doi":"10.1145/3105831.3105847","DOIUrl":"https://doi.org/10.1145/3105831.3105847","url":null,"abstract":"This paper introduces BDgen, a generator of Big Data targeting various types of users, implemented as a general and easily extensible framework. It is divided into a scalable backend designed to generate Big Data on clusters and a frontend for user-friendly definition of the structure of the required data, or its automatic inference from a sample data set. In the first release we have implemented generators of two commonly used formats (JSON and CSV) and the support for general grammars. We have also performed preliminary experimental comparisons confirming the advantages and competitiveness of the solution.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"119 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130227462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using a Model-driven Approach in Building a Provenance Framework for Tracking Policy-making Processes in Smart Cities","authors":"Barkha Javed, Z. Khan, R. McClatchey","doi":"10.1145/3105831.3105849","DOIUrl":"https://doi.org/10.1145/3105831.3105849","url":null,"abstract":"The significance of provenance in various settings has emphasised its potential in the policy-making process for analytics in Smart Cities. At present, there exists no framework that can capture the provenance in a policy-making setting. This research therefore aims at defining a novel framework, namely, the Policy Cycle Provenance (PCP) Framework, to capture the provenance of the policymaking process. However, it is not straightforward to design the provenance framework due to a number of associated policy design challenges. The design challenges revealed the need for an adaptive system for tracking policies therefore a model-driven approach has been considered in designing the PCP framework. Also, suitability of a networking approach is proposed for designing workflows for tracking the policy-making process.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121329776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}