Firas Zouari, N. Kabachi, Khouloud Boukadi, C. Ghedira
{"title":"Data Management in the Data Lake: A Systematic Mapping","authors":"Firas Zouari, N. Kabachi, Khouloud Boukadi, C. Ghedira","doi":"10.1145/3472163.3472173","DOIUrl":"https://doi.org/10.1145/3472163.3472173","url":null,"abstract":"The computer science community is paying more and more attention to data due to its crucial role in performing analysis and prediction. Over the last decade, researchers have proposed many data containers such as files, databases, data warehouses, cloud systems, and, most recently, data lakes. The latter enables holding data in its native format, making it suitable for performing massive data prediction, particularly for real-time application development. Although the data lake is well adopted in the computer science industry, its acceptance by the research community is still in its infancy. This paper sheds light on existing works for performing analysis and predictions on data placed in data lakes. Our study reveals the necessary data management steps, which need to be followed in a decision process, and the requirements to be respected, namely curation, quality evaluation, privacy-preservation, and prediction. This study aims to categorize and analyze proposals related to each step mentioned above.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131196719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Mufida, Abdessamad Ait El Cadi, T. Delot, M. Trépanier
{"title":"Towards a continuous forecasting mechanism of parking occupancy in urban environments","authors":"M. Mufida, Abdessamad Ait El Cadi, T. Delot, M. Trépanier","doi":"10.1145/3472163.3472265","DOIUrl":"https://doi.org/10.1145/3472163.3472265","url":null,"abstract":"Searching for an available parking space is a stressful and time-consuming task, which leads to increasing traffic and environmental pollution due to the emission of gases. To solve these issues, various solutions relying on information technologies (e.g., wireless networks, sensors, etc.) have been deployed over the last years to help drivers identify available parking spaces. Several recent works have also considered the use of historical data about parking availability and applied learning techniques (e.g., machine learning, deep learning) to estimate the occupancy rates in the near future. In this paper, we not only focus on training forecasting models for different types of parking lots to provide the best accuracy, but also consider the deployment of such a service in real conditions, to solve actual parking occupancy problems. It is therefore necessary not only to continuously provide accurate information to drivers but also to handle the frequent updates of parking occupancy data. The underlying challenges addressed in the present work thus concern (1) the self-tuning of the forecasting model hyper-parameters according to the characteristics of the considered parking lots and (2) the need to maintain the performance of the forecasting model over time. To demonstrate the effectiveness of our approach, we present in the paper several evaluations using real data provided for different parking lots by the city of Lille in France. The results of these evaluations highlight the accuracy of the forecasts and the ability of our solution to maintain model performance over time.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"32 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114059730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SSstory: 3D data storytelling based on SuperSQL and Unity","authors":"Jingrui Li, Kento Goto, Motomichi Toyama","doi":"10.1145/3472163.3472277","DOIUrl":"https://doi.org/10.1145/3472163.3472277","url":null,"abstract":"SuperSQL is an extended SQL language, which brings out a rich layout presentation of a relational database with a particular query. This paper proposes SSstory, a storytelling system in a 3D data space created by a relational database. SSstory uses SuperSQL and Unity to generate a data video and add cinematic directions to the data video. Without learning special authoring tooling, users can easily create data videos with a small quantity of code.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132314194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Model Data Modeling and Representation: State of the Art and Research Challenges","authors":"I. Holubová, Pavel Contos, M. Svoboda","doi":"10.1145/3472163.3472267","DOIUrl":"https://doi.org/10.1145/3472163.3472267","url":null,"abstract":"Following the current trend, most of the well-known database systems, being relational, NoSQL, or NewSQL, denote themselves as multi-model. This industry-driven approach, however, lacks plenty of important features of the traditional DBMSs. The primary problem is a design of an optimal multi-model schema and its sufficiently general and efficient representation. In this paper, we provide an overview and discussion of the promising approaches that could potentially be capable of solving these issues, along with a summary of the remaining open problems.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131038561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Quality Assessment of a Web Application for Biomedical Data Analysis","authors":"L. Wiese, Ingmar Wiese, Kristina Lietz","doi":"10.1145/3472163.3472172","DOIUrl":"https://doi.org/10.1145/3472163.3472172","url":null,"abstract":"Data Science as a multidisciplinary discipline has seen a massive transformation in the direction of operationalisation of analysis workflows. Yet it can be observed that such a workflow consists of potentially many diverse components: like modules in different programming languages, database backends, or web frontends. In order to achieve high efficiency and reproducibility of the analysis, a sufficiently high level of software engineering for the different components as well as an overall software architecture that integrates and automates the different components is needed. For the use case of gene expression analysis, from a software quality point of view we analyze a newly developed web application that allows user-friendly access to the underlying workflow.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124429889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Mining Autosomal Archaeogenetic Data to Determine Minoan Origins","authors":"P. Revesz","doi":"10.1145/3472163.3472178","DOIUrl":"https://doi.org/10.1145/3472163.3472178","url":null,"abstract":"This paper presents a method for data mining archaeogenetic autosomal data. The method is applied to the widely debated topic of the origin of the Bronze Age Minoan culture that existed on the island of Crete from 5000 to 3500 years ago. The data is compared with some Neolithic and early Bronze Age samples from the nearby Cycladic islands, mainland Greece and other Neolithic sites. The method shows that a large component of the Minoan autosomal genomes has sources from the Neolithic areas of northern Greece and the rest of the Balkans and a minor component comes directly from Neolithic Anatolia and the Caucasus.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121449895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning","authors":"V. Sharma, C. Dyreson, N. Flann","doi":"10.1145/3472163.3472176","DOIUrl":"https://doi.org/10.1145/3472163.3472176","url":null,"abstract":"DBMS performance is dependent on many parameters, such as index selection, cache size, physical layout, and data partitioning. Some combinations of these parameters can lead to optimal performance for a given workload, but selecting an optimal or near-optimal combination is challenging, especially for large databases with complex workloads. Among the hundreds of parameters, index selection is arguably the most critical parameter for performance. We propose a self-administered framework, called the Multiple Type and Attribute Index Selector (MANTIS), that automatically selects near-optimal indexes. The framework advances the state of the art in index selection by considering both multi-attribute and multiple types of indexes within a bounded storage size constraint, a combination not previously addressed. MANTIS combines supervised and reinforcement learning: a Deep Neural Network recommends the type of index for a given workload, while a Deep Q-Learning network recommends the multi-attribute aspect. MANTIS is sensitive to storage cost constraints and incorporates noisy rewards in its reward function for better performance. Our experimental evaluation shows that MANTIS outperforms the current state-of-the-art methods by an average of 9.53% QphH@size.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117076129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SQL-like query language and referential constraints on tree-structured data","authors":"F. Afrati, M. Damigos, Nikos Stasinopoulos","doi":"10.1145/3472163.3472184","DOIUrl":"https://doi.org/10.1145/3472163.3472184","url":null,"abstract":"In this paper we investigate within-record referential constraints on tree-structured data. We consider an SQL-like query language such as the one used in Dremel, which we call tree-SQL. We show how to define and process a query in tree-SQL in the presence of referential constraints. We give the semantics of tree-SQL via flattening and show how to produce equivalent semantics using the notion of tree-expansion of a query in the presence of referential constraints.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124912116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentimental Analysis Applications and Approaches during COVID-19: A Survey","authors":"Areeba Umair, E. Masciari, Muhammad Habib Ullah","doi":"10.1145/3472163.3472274","DOIUrl":"https://doi.org/10.1145/3472163.3472274","url":null,"abstract":"Social media and electronic media contain a vast amount of user-generated data, such as people’s comments and reviews about different products, diseases, government policies, etc. Sentimental analysis is an emerging field in text mining in which people’s feelings and emotions are extracted using different techniques. COVID-19 has been declared a pandemic and has affected people’s lives all over the globe. It has caused feelings of fear, anxiety, anger, and depression, along with many other psychological issues. In this survey paper, the sentimental analysis applications and methods that are used for COVID-19 research are briefly presented. The comparison of thirty primary studies shows that Naive Bayes and SVM are the widely used algorithms of sentimental analysis for COVID-19 research. The applications of sentimental analysis during COVID-19 include the analysis of people’s sentiments, especially those of students, reopening sentiments, the analysis of restaurant reviews, and the analysis of vaccine sentiments.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129610238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ChuangMing Liu, Denis Pak, Ari Ernesto Ortiz Castellanos
{"title":"Priority-Based Skyline Query Processing for Incomplete Data","authors":"ChuangMing Liu, Denis Pak, Ari Ernesto Ortiz Castellanos","doi":"10.1145/3472163.3472272","DOIUrl":"https://doi.org/10.1145/3472163.3472272","url":null,"abstract":"Over the years, several skyline query techniques have been introduced to handle incompleteness of data, the most recent of which has proposed to sort the points of a dataset into several distinct lists based on each dimension. The points would be accessed based on these lists in round-robin fashion, and the points that haven’t been dominated by the end would compose the final skyline. The work is based on the assumption that relatively dominant points, if sorted, would be processed first, and even if such a point weren’t a skyline point, it would prune a huge amount of data. However, that approach doesn’t take into consideration that the dominance of a point depends not only on the highest value of a given dimension, but also on the number of complete dimensions a point has. Hence, we propose a Priority-First Sort-Based Incomplete Data Skyline (PFSIDS) that utilizes a different indexing technique that allows optimization of access based on both the number of complete dimensions a point has and the sorting of the data.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125283172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}