{"title":"A Network-based Approach to Capture Provenance of a Policy-making Process","authors":"Barkha Javed, Z. Khan, R. McClatchey","doi":"10.1145/3105831.3105850","DOIUrl":"https://doi.org/10.1145/3105831.3105850","url":null,"abstract":"A policy-making process entails large sources of data for its creation, the tracking of which can provide a significant insight into a process and data that was being employed for its creation. However, a process employed for creating policies varies with each policy case due to different nature and requirements of policies. Therefore, a flexible approach is required to capture process details of different policies. Thus a novel autonomous provenance capturing technique has been introduced that is inspired from IP packet switching networking concepts. This novel technique is used with a model-driven approach to enable flexibility in a system. It is expected that this new technique will be able to systematically capture provenance in a dynamic sociotechnical environments.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"293 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123181658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Density-Based Blocking for Record Matching","authors":"Chenxiao Dou, Ruoyu Wang, Daniel W. Sun, M. Atif","doi":"10.1145/3105831.3105844","DOIUrl":"https://doi.org/10.1145/3105831.3105844","url":null,"abstract":"Record Matching in data engineering refers to searching for data records originating from the same entities across different data sources. In practice, the main challenge of record matching is that the amount of non-matches typically far exceeds the amount of matches. This is called imbalance problem, which notoriously affects efficiency and effectiveness of matching algorithms. To solve the imbalance problem, recently, density-based blocking algorithms have been studied and demonstrated an effective blocking performance. However, the efficiency of density-based blocking approaches is not good as their effectiveness. In this paper, we improve the efficiency of density-based blocking by exploiting the idea of pre-computing and pruning. Our approach optimizes the method of computing density to speed up the blocking process. Throughout experiments on real-world datasets, the proposed approach demonstrated a high performance on both blocking efficiency and blocking effectiveness.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134142683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud Service Composition Modeling Using Bigraphical Reactive Systems","authors":"Oussama Kamel, A. Chaoui, Mohamed Gharzouli","doi":"10.1145/3105831.3105851","DOIUrl":"https://doi.org/10.1145/3105831.3105851","url":null,"abstract":"In the last decade, cloud computing has emerged as one of the most popular computing models. This model delivers a pool of computing resources as on-demand services to different categories of users. As the number of cloud services available on the Internet is increasing and owing to the users' complicated requirement, the composition of cloud services has become more and more challenging. In this work, we are interested in the vertical composition that represents the collaboration of services from different cloud layers to offer a complete solution to end-users. This paper proposes the Bigraphical Reactive Systems (BRS) to model the service composition. We use bigraphs to describe the different actors and services evolving in the composition. In addition, we define a set of bigraphical reactive rules to express the dynamic behaviors of these services and actors and to show the direct and indirect dependencies between them.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132753432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DWAHP: Workload Aware Hybrid Partitioning and Distribution of RDF Data","authors":"Trupti Padiya, Minal Bhise","doi":"10.1145/3105831.3105864","DOIUrl":"https://doi.org/10.1145/3105831.3105864","url":null,"abstract":"Proliferation of RDF data has reached to a peak where data is partitioned across multiple nodes. Significant contribution for developing solutions to manage RDF data in distributed environment is witnessed in recent years. We propose a workload aware hybrid partitioning approach for a distributed environment. The objective of our approach is reducing query joins and inter-node communication leading it to faster query execution for frequent queries. Our approach considers a query workload and partitions data based on workload information. It distributes data by exploiting underlying structural relationship between properties using a property reachability matrix to optimize query performance. DWAHP gets rid of inter-node communication cost for frequent queries like linear and star queries and answers 83% of frequent query workload without inter-node communication. DWAHP is compared with state-of-the-art solutions in terms of query execution time, query cost, storage space, and inter-node communication. It has demonstrated significant improvement over state-of-the-art solution.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"151 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131060669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Separation of Duties for Multiple Relations in Cloud Databases as an Optimization Problem","authors":"Ferdinand Bollwein, L. Wiese","doi":"10.1145/3105831.3105873","DOIUrl":"https://doi.org/10.1145/3105831.3105873","url":null,"abstract":"Confidentiality concerns are important in the context of cloud databases. In this paper, the technique of vertical fragmentation is explored to break sensitive associations between columns of several database tables according to confidentiality constraints. By storing insensitive portions of the database at different non-communicating servers it is possible to overcome confidentiality concerns. In addition, visibility constraints and data dependencies are supported. Moreover, to provide some control over the distribution of columns among different servers, novel closeness constraints are introduced. Finding confidentiality-preserving fragmentations is studied in the context of mathematical optimization and a corresponding integer linear program formulation is presented. Benchmarks were performed to evaluate the suitability of our approach.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134235316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Data and Artifacts between Software Engineers and Artists: an ISSv2 Case Study","authors":"Serguei A. Mokhov, Miao Song, Amandeep Kaur, Mehak Talwar, Keerthana Gudavalli, S. Mudur","doi":"10.1145/3105831.3105862","DOIUrl":"https://doi.org/10.1145/3105831.3105862","url":null,"abstract":"We describe our experience of managing different types of artifacts between multidisciplinary teams of computer scientists and software engineers with computation and design artists while designing, developing, and deploying Illimitable Space System v2 (ISSv2) in real production environments. The artifacts include design documentation, source code, hardware and production equipment inventory, stage data, SCM and issue tracking data, git repository, and multimedia assets. These types of projects are challenging due to the nature of interdisciplinary teams and their working habits. We show how we manage the data and the teams to enable successful public productions.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134537405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Deletion Strategies for the BoND-Tree in Multidimensional Non-ordered Discrete Data Spaces","authors":"R. Cherniak, Qiang Zhu, Yarong Gu, S. Pramanik","doi":"10.1145/3105831.3105840","DOIUrl":"https://doi.org/10.1145/3105831.3105840","url":null,"abstract":"Box queries on a dataset in a multidimensional data space are a type of query which specifies a set of allowed values for each dimension. Indexing a dataset in a multidimensional Non-ordered Discrete Data Space (NDDS) for supporting efficient box queries is becoming increasingly important in many application domains such as genome sequence analysis. The BoND-tree was recently introduced as an index structure specifically designed for box queries in an NDDS. Earlier work focused on developing strategies for building an effective BoND-tree to achieve high query performance. Developing efficient and effective techniques for deleting indexed vectors from the BoND-tree remains an open issue. In this paper, we present three deletion algorithms based on different underflow handling strategies in an NDDS. Our study shows that incorporating a new BoND-tree inspired heuristic can provide improved performance compared to the traditional underflow handling heuristics in NDDSs.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134575296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DyadChurn: Customer Churn Prediction using Strong Social Ties","authors":"Marwa N. Abd-Allah, S. El-Beltagy, A. Salah","doi":"10.1145/3105831.3105832","DOIUrl":"https://doi.org/10.1145/3105831.3105832","url":null,"abstract":"The increase in mobile phone subscriptions in recent years has led to near market saturation in the telecom industry. As a result, it has become harder for telecom providers to acquire new customers, and the need for retaining existing ones has become of paramount importance. Because of fierce competition between different telecom providers and because of the ease with which customers can move from one provider to another, all telecom service providers suffer from customer churn. In this paper, we propose a dyadic based churn prediction model, DyadChurn, where customer churn is modeled through social influence that propagates in the telecom network over strong social ties. We propose a novel method for evaluating social tie strength between telecom customers. We then incorporate strong social ties in an influence propagation model to predict the set of future potential churners. The evaluation of the proposed dyadic based churn prediction model has been done using a real dataset, from one of the largest telecom companies in Egypt. The experimental results showed that the \"length of calls\" between customers is the most effective attribute in predicting social influence that results in churning. The results also showed that strong social ties (as opposed to weak ties) were the most effective ties in determining churn. Using strong social ties only enhanced the prediction accuracy (in terms of the lift curve) by more than 20%, when compared to a diffusion model.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130785883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The SusCity Big Data Warehousing Approach for Smart Cities","authors":"Carlos A. Costa, M. Y. Santos","doi":"10.1145/3105831.3105841","DOIUrl":"https://doi.org/10.1145/3105831.3105841","url":null,"abstract":"Nowadays, the concept of Smart City provides a rich analytical context, highlighting the need to store and process vast amounts of heterogeneous data flowing at different velocities. This data is defined as Big Data, which imposes significant difficulties in traditional data techniques and technologies. Data Warehouses (DWs) have long been recognized as a fundamental enterprise asset, providing fact-based decision support for several organizations. The concept of DW is evolving. Traditionally, Relational Database Management Systems (RDBMSs) are used to store historical data, providing different analytical perspectives regarding several business processes. With the current advancements in Big Data techniques and technologies, the concept of Big Data Warehouse (BDW) emerges to surpass several limitations of traditional DWs. This paper presents a novel approach for designing and implementing BDWs, which has been supporting the SusCity data visualization platform. The BDW is a crucial component of the SusCity research project in the context of Smart Cities, supporting analytical tasks based on data collected in the city of Lisbon.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126090995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 21st International Database Engineering & Applications Symposium","authors":"","doi":"10.1145/3105831","DOIUrl":"https://doi.org/10.1145/3105831","url":null,"abstract":"","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127115679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}