{"title":"ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics","authors":"Pengfeï Liu, Sabine Loudcher, J. Darmont, C. Noûs","doi":"10.1145/3472163.3472266","DOIUrl":"https://doi.org/10.1145/3472163.3472266","url":null,"abstract":"With new emerging technologies, such as satellites and drones, archaeologists collect data over large areas. However, it becomes difficult to process such data in time. Archaeological data also have many different formats (images, texts, sensor data) and can be structured, semi-structured and unstructured. Such variety makes data difficult to collect, store, manage, search and analyze effectively. A few approaches have been proposed, but none of them covers the full data lifecycle nor provides an efficient data management system. Hence, we propose the use of a data lake to provide centralized data stores to host heterogeneous data, as well as tools for data quality checking, cleaning, transformation and analysis. In this paper, we propose a generic, flexible and complete data lake architecture. Our metadata management system exploits goldMEDAL, which is the most generic metadata model currently available. Finally, we detail the concrete implementation of this architecture dedicated to an archaeological project.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131414348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sink Group Betweenness Centrality","authors":"E. Fragkou, Dimitrios Katsaros, Y. Manolopoulos","doi":"10.1145/3472163.3472182","DOIUrl":"https://doi.org/10.1145/3472163.3472182","url":null,"abstract":"This article introduces the concept of Sink Group Node Betweenness centrality to identify those nodes in a network that can “monitor” the geodesic paths leading towards a set of subsets of nodes; it generalizes both the traditional node betweenness centrality and the sink betweenness centrality. We also provide extensions of the basic concept for node-weighted networks and describe the dual notion of Sink Group Edge Betweenness centrality. We exemplify the merits of these concepts and describe some areas where they can be applied.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"17 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131673024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Efficiently Equi-Joining Graphs","authors":"Giacomo Bergami","doi":"10.1145/3472163.3472269","DOIUrl":"https://doi.org/10.1145/3472163.3472269","url":null,"abstract":"Despite the growing popularity of techniques related to graph summarization, a general operator for joining graphs on both the vertices and the edges is still missing. Current languages such as Cypher and SPARQL express binary joins through the non-scalable and inefficient composition of multiple traversal and graph creation operations. In this paper, we propose an efficient equi-join algorithm that is able to perform vertex and path joins over a secondary memory indexed graph; the resulting graph is also serialised in secondary memory. The results show that the implementation of the proposed model outperforms solutions based on graphs, such as Neo4J and Virtuoso, and the relational model, such as PostgreSQL. Moreover, we propose two ways in which edges can be combined, namely the conjunctive and disjunctive semantics. Preliminary experiments on the graph conjunctive join are also carried out with incremental updates, thus suggesting that our solution outperforms materialized views over PostgreSQL.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121362117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICIX: A Semantic Information Extraction Architecture","authors":"Angel L. Garrido, Álvaro Peiró, Carlos Bobed, E. Mena, Cristian Morte","doi":"10.1145/3472163.3472174","DOIUrl":"https://doi.org/10.1145/3472163.3472174","url":null,"abstract":"Public and private organizations produce and store huge amounts of documents which contain information about their domains in non-structured formats. Although from the final user’s point of view we can rely on different retrieval tools to access such data, the progressive structuring of such documents has important benefits for daily operations. While there exist many approaches to extract information in open domains, we lack tools flexible enough to adapt themselves to the particularities of different domains. In this paper, we present the design and implementation of ICIX, an architecture to extract structured information from text documents. ICIX aims at obtaining specific information within a given domain, defined by means of an ontology which guides the extraction process. Besides, to optimize such an extraction, ICIX relies on document classification and data curation adapted to the particular domain. Our proposal has been implemented and evaluated in the specific context of managing legal documents, with promising results.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121891146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting late symptoms of head and neck cancer treatment using LSTM and patient reported outcomes","authors":"Yaohua Wang, G. Canahuate, L. V. Dijk, A. Mohamed, C. Fuller, Xinhua Zhang, G. Marai","doi":"10.1145/3472163.3472177","DOIUrl":"https://doi.org/10.1145/3472163.3472177","url":null,"abstract":"Patient-Reported Outcome (PRO) surveys are used to monitor patients’ symptoms during and after cancer treatment. Late symptoms refer to those experienced after treatment. While most patients experience severe symptoms during treatment, these usually subside in the late stage. However, for some patients, late toxicities persist, negatively affecting the patient’s quality of life (QoL). In the case of head and neck cancer patients, PRO surveys are recorded every week during the patient’s visit to the clinic and at different follow-up times after the treatment has concluded. In this paper, we model the PRO data as a time series and apply Long Short-Term Memory (LSTM) neural networks for predicting symptom severity in the late stage. The PRO data used in this project correspond to MD Anderson Symptom Inventory (MDASI) questionnaires collected from head and neck cancer patients treated at the MD Anderson Cancer Center. We show that the LSTM model is effective in predicting symptom ratings under the RMSE and NRMSE metrics. Our experiments show that the LSTM model also outperforms other machine learning models and time-series prediction models for these data.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125538563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}