{"title":"Predicting SPARQL Query Dynamics","authors":"Alberto Moya Loustaunau, A. Hogan","doi":"10.1145/3460210.3493565","DOIUrl":"https://doi.org/10.1145/3460210.3493565","url":null,"abstract":"Given historical versions of an RDF graph, we propose and compare several methods to predict whether or not the results of a SPARQL query will change for the next version. Unsurprisingly, we find that the best results for this task are achievable by considering the full history of results for the query over previous versions of the graph. However, given a previously unseen query, producing historical results requires costly offline maintenance of previous versions of the data, and costly online computation of the query results over these previous versions. This prompts us to explore more lightweight alternatives that rely on features computed from the query and statistical summaries of historical versions of the graph. We evaluate the quality of the predictions produced over weekly snapshots of Wikidata and daily snapshots of DBpedia. Our results provide insights into the trade-offs for predicting SPARQL query dynamics, where we find that a detailed history of changes for a query's results enables much more accurate predictions, but has higher overhead versus more lightweight alternatives.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115873856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Providing Humanitarian Relief Support through Knowledge Graphs","authors":"Rui Zhu, Ling Cai, Gengchen Mai, C. Shimizu, C. Fisher, K. Janowicz, Anna Lopez-Carr, A. Schroeder, M. Schildhauer, Yuanyuan Tian, Shirly Stephen, Zilong Liu","doi":"10.1145/3460210.3493581","DOIUrl":"https://doi.org/10.1145/3460210.3493581","url":null,"abstract":"Disasters are often unpredictable and complex events, requiring humanitarian organizations to understand and respond to many different issues simultaneously and immediately. Often the biggest challenge to improving the effectiveness of the response is quickly finding the right expert, with the right expertise concerning a specific disaster type/disaster and geographic region. To assist in achieving such a goal, this paper demonstrates a knowledge graph-based search engine developed on top of an expert knowledge graph. It accommodates three modes of information retrieval: a follow-your-nose search, an expert similarity search, and a SPARQL query interface. We will demonstrate how the system can be used, for example, to rapidly navigate from a hazard event to a specific expert who may be helpful. More importantly, as the data is fully integrated, including links between hazards and their abstract topics, we can find experts who have relevant expertise while navigating the graph.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116145995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Challenges of Cross-Document Coreference Resolution for Email","authors":"Xue Li, Sara Magliacane, Paul Groth","doi":"10.1145/3460210.3493573","DOIUrl":"https://doi.org/10.1145/3460210.3493573","url":null,"abstract":"Long-form conversations such as email are an important source of information for knowledge capture. For tasks such as knowledge graph construction, conversational search, and entity linking, being able to resolve entities from across documents is important. Building on recent work on within document coreference resolution for email, we study for the first time a cross-document formulation of the problem. Our results show that the current state-of-the-art deep learning models for general cross-document coreference resolution are insufficient for email conversations. Our experiments show that the general task is challenging and, importantly for knowledge intensive tasks, coreference resolution models that only treat entity mentions perform worse. Based on these results, we outline the work needed to address this challenging task.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"67 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126770366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient RDF Knowledge Graph Partitioning Using Querying Workload","authors":"A. Akhter, Muhammad Saleem, Alexander Bigerl, A. N. Ngomo","doi":"10.1145/3460210.3493577","DOIUrl":"https://doi.org/10.1145/3460210.3493577","url":null,"abstract":"Data partitioning is an effective way to manage large datasets. While a broad range of RDF graph partitioning techniques has been proposed in previous works, little attention has been given to workload-aware RDF graph partitioning. In this paper, we propose two techniques that make use of the querying workload to detect the portions of RDF graphs that are often queried concurrently. Our techniques leverage predicate co-occurrences in SPARQL queries. By detecting highly co-occurring predicates, our techniques can keep data pertaining to these predicates in the same data partition. We evaluate the proposed partitioning techniques using various real-data and query benchmarks generated by the FEASIBLE SPARQL benchmark generation framework. Our evaluation results show the superiority of the proposed techniques over previous techniques in terms of query runtime performance.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122171539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs","authors":"Hamada M. Zahera, Stefan Heindorf, A. N. Ngomo","doi":"10.1145/3460210.3493563","DOIUrl":"https://doi.org/10.1145/3460210.3493563","url":null,"abstract":"Entity typing in knowledge graphs (KGs) aims to infer missing types of entities and might be considered one of the most significant tasks of knowledge graph construction since type information is highly relevant for querying, quality assurance, and KG applications. While supervised learning approaches for entity typing have been proposed, they require large amounts of (manually) labeled data, which can be expensive to obtain. In this paper, we propose a novel approach for KG entity typing that leverages semi-supervised learning from massive unlabeled data. Our approach follows a teacher-student paradigm that allows combining a small amount of labeled data with a large amount of unlabeled data to boost performance. We conduct several experiments on two benchmarking datasets (FB15k-ET and YAGO43k-ET). Our results demonstrate the effectiveness of our approach in improving entity typing in KGs. Given type information for only 1% of entities, our approach ASSET predicts missing types with F1-scores of 0.47 and 0.64 on the datasets FB15k-ET and YAGO43k-ET, respectively, outperforming supervised baselines.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124839202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Extraction from Auto-Encoders on Anomaly Detection Tasks Using Co-activation Graphs","authors":"Daniyal Selani, Ilaria Tiddi","doi":"10.1145/3460210.3493571","DOIUrl":"https://doi.org/10.1145/3460210.3493571","url":null,"abstract":"Deep neural networks have exploded in popularity and different types of networks are used to solve a multitude of complex tasks. One such task is anomaly detection, which a type of deep neural network called an auto-encoder has become extremely proficient at solving. The low-level neural activity produced by such a network generates extremely rich representations of the data, which can be used to extract task-specific knowledge. In this paper, we built upon previous work and used co-activation graph analysis to extract knowledge from auto-encoders that were trained for the specific task of anomaly detection. First, we outlined a method for extracting co-activation graphs from auto-encoders. Then, we performed graph analysis to discover that task-specific knowledge from the auto-encoder was being encoded into the co-activation graph, and that the extracted knowledge could be used to reveal the role of individual neurons in the network.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"482 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116523958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RML2SHACL: RDF Generation Taking Shape","authors":"Thomas Delva, B. De Smedt, Sitt Min Oo, Dylan Van Assche, S. Lieber, Anastasia Dimou","doi":"10.1145/3460210.3493562","DOIUrl":"https://doi.org/10.1145/3460210.3493562","url":null,"abstract":"RDF graphs are often generated by mapping data in other (semi-)structured data formats to RDF. Such mapped graphs have a repetitive structure defined by (i) the mapping rules and (ii) the schema of the input sources. However, this information is not exploited beyond its original scope. SHACL was recently introduced to model constraints that RDF graphs should validate. SHACL shapes and their constraints are either manually defined or derived from ontologies or RDF graphs. We investigate a method to derive the shapes and their constraints from mapping rules, allowing the generation of the RDF graph and the corresponding shapes in one step. In this paper, we present RML2SHACL: an approach to generate SHACL shapes that validate RDF graphs defined by RML mapping rules. RML2SHACL relies on our proposed set of correspondences between RML and SHACL constructs. RML2SHACL covers a large variety of RML constructs, as proven by generating shapes for the RML test cases. A comparative analysis shows that shapes generated by RML2SHACL are similar to shapes generated by ontology-based tools, with a larger focus on data value-based constraints instead of schema-based constraints. We also found that RML2SHACL has a faster execution time than data-graph based approaches for data sizes of 90MB and higher.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122744666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Measuring the Resemblance of Embedding Models for Evolving Ontologies","authors":"Romana Pernisch, Daniele Dell'Aglio, A. Bernstein","doi":"10.1145/3460210.3493540","DOIUrl":"https://doi.org/10.1145/3460210.3493540","url":null,"abstract":"Updates on ontologies affect the operations built on top of them. But not all changes are equal: some updates drastically change the result of operations; others lead to minor variations, if any. Hence, estimating the impact of a change ex-ante is highly important, as it might make ontology engineers aware of the consequences of their actions during editing. However, in order to estimate the impact of changes, we need to understand how to measure them. To address this gap for embeddings, we propose a new measure called Embedding Resemblance Indicator (ERI), which takes into account both the stochasticity of learning embeddings as well as the shortcomings of established comparison methods. We base ERI on (i) a similarity score, (ii) a robustness factor $\\hat{\\mu}$ (based on the embedding method, similarity measure, and dataset), and (iii) the number of added or deleted entities to the embedding computed with the Jaccard index. To evaluate ERI, we investigate its usage in the context of two biomedical ontologies and three embedding methods---GraRep, LINE, and DeepWalk---as well as the two standard benchmark datasets---FB15k-237 and Wordnet-18-RR---with TransE and RESCAL embeddings. To study different aspects of ERI, we introduce synthetic changes in the knowledge graphs, generating two test-cases with five versions each and compare their impact with the expected behaviour. Our studies suggest that ERI behaves as expected and captures the similarity of embeddings based on the severity of changes. ERI is crucial for enabling further studies into the impact of changes on embeddings.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115026555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain for Trustworthy Publication and Integration of Linked Open Data","authors":"Fabian Kirstein, M. Hauswirth","doi":"10.1145/3460210.3493572","DOIUrl":"https://doi.org/10.1145/3460210.3493572","url":null,"abstract":"The timely, traceable and provenance-aware publication of Linked Open Data (LOD) is crucial for its success and to fulfill the vision of a global, decentralized, and machine-readable database of knowledge. Yet, the access to LOD is still fragmented and mainly centralized aggregations are being used, relying on complex harvesting mechanisms. As a remedy, we propose a blockchain-based approach enabling an integrated, traceable, and timely view on LOD. We use a blockchain to meet the organizational requirements of publishing LOD in a decentralized fashion while still supporting the sovereignty of the data providers and supporting provenance and proper integration into a harmonized knowledge graph. We present an approach and an implemented system that fulfills the requirements regarding volume and throughput and can be used as the foundation for practical deployments. We use Linked Open Government Data (LOGD) as our case study to demonstrate the feasibility of our approach. We developed a prototype to address the specific requirements of LOGD publication and apply the Practical Byzantine Fault Tolerance algorithm at its core to enable robust state replication.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122773182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Workflow Composition with OSLO-steps: Data Re-use and Simplification of Automated Administration","authors":"Dörthe Arndt, S. Lieber, Raf Buyle, S. Goossens, David De Block, B. Meester, E. Mannens","doi":"10.1145/3460210.3493559","DOIUrl":"https://doi.org/10.1145/3460210.3493559","url":null,"abstract":"e-Government applications have hard-coded and non-personalized user journeys with high maintenance costs to keep up with, e.g., changing legislation. Automatic administrative workflows are needed. We present the OSLO-steps vocabulary and the workflow composer: combined, they are a means to create cross-organizational interoperable user journeys, adapted to the user's needs. We identify the requirements for automating administrative workflows and present an architecture and its implemented components. By using Linked Data principles to decentrally describe independent steps using states as pre- and postconditions, and composing workflows on-the-fly whilst matching a user's state to those preconditions, we automatically generate next steps to reach the user's goal. The validated solution shows its feasibility, and the upcoming interest around interoperable personal data pods (e.g., via Solid) can further increase its potential.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116274246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}