{"title":"A Zone-Based Data Lake Architecture for IoT, Small and Big Data","authors":"Yan Zhao, I. Megdiche, F. Ravat, Vincent-nam Dang","doi":"10.1145/3472163.3472185","DOIUrl":"https://doi.org/10.1145/3472163.3472185","url":null,"abstract":"Data lakes are supposed to enable analysts to perform more efficient and efficacious data analysis by crossing multiple existing data sources, processes and analyses. However, it is impossible to achieve that when a data lake does not have a metadata governance system that progressively capitalizes on all the performed analysis experiments. The objective of this paper is to have an easily accessible, reusable data lake that capitalizes on all user experiences. To meet this need, we propose an analysis-oriented metadata model for data lakes. This model includes the descriptive information of datasets and their attributes, as well as all metadata related to the machine learning analyses performed on these datasets. To illustrate our metadata solution, we implemented a web application of data lake metadata management. This application allows users to find and use existing data, processes and analyses by searching relevant metadata stored in a NoSQL data store within the data lake. 
To demonstrate how to easily discover metadata with the application, we present two use cases, with real data, including datasets similarity detection and machine learning guidance.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130288822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploratory analysis of methods for automated classification of clinical diagnoses in Veterinary Medicine","authors":"Oscar Tamburis, E. Masciari, G. Fatone","doi":"10.1145/3472163.3472165","DOIUrl":"https://doi.org/10.1145/3472163.3472165","url":null,"abstract":"The present work describes the analysis conducted on the diagnoses made during the general physical examinations in the decade 2010–2020, starting from the DB of the EMR previously implemented in the University Veterinary Teaching Hospital at Federico II University of Naples. A decision tree algorithm was implemented to work out a predictive model for an effective recognition of neoplastic diseases and zoonoses for cats and dogs from Campania Region. The results achievable by data mining techniques for what concerns computer aided disease diagnosis and exploration of risk factors and their relations to diseases, show the increasing importance of Veterinary Informatics within the wider field of Biomedical and Health Informatics, and in particular its capacity to point out the existing connections between humans, animals, and surrounding environment, according to the One (Digital) Health perspective specifics.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127209622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bringing Common Subexpression Problem from the Dark to Light: Towards Large-Scale Workload Optimizations","authors":"Mohamed Kechar, Ladjel Bellatreche, S. N. Bahloul","doi":"10.1145/3472163.3472180","DOIUrl":"https://doi.org/10.1145/3472163.3472180","url":null,"abstract":"Nowadays large-scale data-centric systems have become an essential element for companies to store, manipulate and derive value from large volumes of data. Capturing this value depends on the ability of these systems in managing large-scale workloads including complex analytical queries. One of the main characteristics of these queries is that they share computations in terms of selections and joins. Materialized views (MV) have shown their force in speeding up queries by exploiting these redundant computations. MV selection problem (VSP) is one of the most studied problems in the database field. A large majority of the existing solutions follow workload-driven approaches since they facilitate the identification of shared computations. Interesting algorithms have been proposed and implemented in commercial DBMSs. But they fail in managing large-scale workloads. In this paper, we presented a comprehensive framework to select the most beneficial materialized views based on the detection of the common subexpressions shared between queries. This framework gives the right place of the problem of selection of common subexpressions representing the causes of the redundancy. The utility of final MV depends strongly on the selected subexpressions. Once selected, a heuristic is given to select the most beneficial materialized views by considering different query ordering. 
Finally, experiments have been conducted to evaluate the effectiveness and efficiency of our proposal by considering large workloads.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125140542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Transaction Schedules on Universal Quantum Computers via Code Generation for Grover’s Search Algorithm","authors":"Sven Groppe, Jinghua Groppe","doi":"10.1145/3472163.3472164","DOIUrl":"https://doi.org/10.1145/3472163.3472164","url":null,"abstract":"Quantum computers are known to be efficient for solving combinatorial problems like finding optimal schedules for processing transactions in parallel without blocking. We show how Grover’s search algorithm for quantum computers can be applied for finding an optimal transaction schedule via generating code from the problem instance. We compare our approach with existing approaches for traditional computers and quantum annealers in terms of preprocessing, runtime, space and code length complexity. Furthermore, we show by experiments the expected number of optimal solutions of this problem as well as suboptimal ones. With the help of an estimator of the number of solutions, we further speed up our optimizer for optimal and suboptimal transaction schedules.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126263193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Enhancing Deep Learning Based Recommender Systems with Knowledge Graphs","authors":"S. Mudur, Serguei A. Mokhov, Yuhao Mao","doi":"10.1145/3472163.3472183","DOIUrl":"https://doi.org/10.1145/3472163.3472183","url":null,"abstract":"Recommendation methods fall into three major categories, content based filtering, collaborative filtering and deep learning based. Information about products and the preferences of earlier users are used in an unsupervised manner to create models which help make personalized recommendations to a specific new user. The more information we provide to these methods, the more likely it is that they yield better recommendations. Deep learning based methods are relatively recent, and are generally more robust to noise and missing information. This is because deep learning models can be trained even when some of the information records have partial information. Knowledge graphs represent the current trend in recording information in the form of relations between entities, and can provide any available information about products and users. This information is used to train the recommendation model. In this work, we present a new generic recommender systems framework, that integrates knowledge graphs into the recommendation pipeline. We describe its design and implementation, and then show through experiments, how such a framework can be specialized, taking the domain of movies as an example, and the resulting improvements in recommendations made possible by using all the information obtained using knowledge graphs. 
Our framework, to be made publicly available, supports different knowledge graph representation formats, and facilitates format conversion, merging and information extraction needed for training recommendation models.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124152186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load Balanced Semantic Aware Distributed RDF Graph","authors":"Ami Pandat, Nidhi Gupta, Minal Bhise","doi":"10.1145/3472163.347216","DOIUrl":"https://doi.org/10.1145/3472163.347216","url":null,"abstract":"Modern day application development requires efficient management of huge RDF data. The major approaches for RDF data management are Relational and Graph based techniques. As the relational approach suffers from query joins, we propose a semantic aware graph based partitioning method. The partitioned fragments are further allocated in a load balanced way. For efficient query processing, partial replication is implemented. It reduces Inter node Communication thereby accelerating queries on distributed RDF Graph. This approach has been demonstrated in two phases partitioning and Distribution of Linked Observation Data (LOD). The time complexity for partitioning and distribution of Load Balanced Semantic Aware RDF Graph (LBSD) is O(n) where n is the number of triples which is demonstrated by linear increment in algorithm execution time (AET) for LOD data scaled from 1x to 5x. LBSD has been found to behave well till 4x. LBSD is compared with the state of the art relational and graph-based partitioning techniques. LBSD records 71% QET gain when averaged over all the four query types. For most frequent query types, Linear and Star, on an average 65% QET gain is recorded over original configuration for scaling experiments. 
The optimal replication level has been found to be 12% of original data.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114330406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Automatic Schema-Instance Approach for Merging Multidimensional Data Warehouses","authors":"Yuzhao Yang, J. Darmont, F. Ravat, O. Teste","doi":"10.1145/3472163.3472268","DOIUrl":"https://doi.org/10.1145/3472163.3472268","url":null,"abstract":"Using data warehouses to analyse multidimensional data is a significant task in company decision-making. The need for analyzing data stored in different data warehouses generates the requirement of merging them into one integrated data warehouse. The data warehouse merging process is composed of two steps: matching multidimensional components and then merging them. Current approaches do not take all the particularities of multidimensional data warehouses into account, e.g., only merging schemata, but not instances; or not exploiting hierarchies nor fact tables. Thus, in this paper, we propose an automatic merging approach for star schema-modeled data warehouses that works at both the schema and instance levels. We also provide algorithms for merging hierarchies, dimensions and facts. Eventually, we implement our merging algorithms and validate them with the use of both synthetic and benchmark datasets.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130373858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing a Business View of Enterprise Data: An approach based on a Decentralised Enterprise Knowledge Graph","authors":"Bastien Vidé, J. Marty, F. Ravat, Max Chevalier","doi":"10.1145/3472163.3472276","DOIUrl":"https://doi.org/10.1145/3472163.3472276","url":null,"abstract":"Nowadays, companies manage a large volume of data usually organised in ”silos”. Each ”data silo” contains data related to a specific Business Unit, or a project. This scattering of data does not facilitate decision-making requiring the use and cross-checking of data coming from different silos. So, a challenge remains: the construction of a Business View of all data in a company. In this paper, we introduce the concepts of Enterprise Knowledge Graph (EKG) and Decentralised EKG (DEKG). Our DEKG aims at generating a Business View corresponding to a synthetic view of data sources. We first define and model a DEKG with an original process to generate a Business View before presenting the possible implementation of a DEKG.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127075432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IDEAS: the first quarter century","authors":"B. Desai","doi":"10.1145/3472163.3472168","DOIUrl":"https://doi.org/10.1145/3472163.3472168","url":null,"abstract":"This year marks the silver anniversary of IDEAS. It has been an exciting quarter century to shepherd this meeting through good times and not so good ones. We have survived Ebola, MERS and SARS. Whereas the others were local, the COVID pandemic, which still rages, has forced us to move to an on-line version, but thanks to the participants and the dedicated program committee we have continued. This paper is a photographic journey through the years of IDEAS. Unfortunately we have not been able to have the images of all participants over the quarter century of IDEAS. This is just a sampling of some of the fond moments during the social gatherings of the IDEAS family.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126079143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An In-Browser Collaborative System for Functional Annotations","authors":"Yui Saeki, Motomichi Toyama","doi":"10.1145/3472163.3472275","DOIUrl":"https://doi.org/10.1145/3472163.3472275","url":null,"abstract":"In this work, we used the Web IndeX system, which converts words into hyperlinks in arbitrary Web pages, to implement a system for sharing annotations registered on keywords within a limited group. This system allows group members to view all the written annotations by simply mousing over the keyword when it appears on a web page, facilitating sharing information and awareness in collaborative research and work. In this study, we define our system as a sticky note type annotation sharing system. In contrast to the sticky note type, our system is positioned as a functional type, but it may display annotations that are not necessary since it allows browsing on any page. To improve this point, we propose a usability judgment method that shows only the most useful posts.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125144334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}