Estimation-based optimizations for the semantic compression of RDF knowledge bases
Ruoyu Wang, Raymond Wong, Daniel Sun
Information Processing & Management (journal article), published 2024-06-08
DOI: 10.1016/j.ipm.2024.103799
URL: https://www.sciencedirect.com/science/article/pii/S0306457324001584
Citations: 0
Abstract
Structured knowledge bases (KBs) are critical for the interpretability of AI techniques. RDF KBs, the dominant representation of structured knowledge, are growing rapidly to broaden their knowledge coverage; this strengthens knowledge reasoning but places a heavy burden on downstream applications. Recent studies use semantic compression to detect and remove redundant knowledge via semantic models, and reuse the induced models for further applications such as knowledge completion and error detection. However, because logic induction is hard, semantic models expressive enough for semantic compression cannot be induced efficiently, especially for large-scale KBs. In this article, we present estimation-based optimizations for the semantic compression of RDF KBs that target two kinds of data involved in inducing first-order logic rules: the input data and the intermediate data. A negative sampling technique selects a representative subset of all negative tuples under the closed-world assumption, reducing the cost of evaluating the quality of a logic rule used for knowledge inference. A statistical estimation technique prunes low-quality logic rules, reducing the number of logic inference operations performed during compression. The evaluation results show that both techniques are feasible for semantic compression and accelerate the compression algorithm by up to 47× compared with the state-of-the-art system.
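The abstract does not specify the sampling distribution, the rule-quality measure, or the pruning statistic, so the following Python sketch only illustrates the general idea of estimating rule quality from a sample of closed-world negatives and pruning on that estimate. Everything in it is an assumption for illustration: the toy `KB`, the single-atom rule shape `head(X, Y) :- body(X, Y)`, uniform rejection sampling of negatives, and the hypothetical score "positives minus estimated negatives" used by `keep_rule`. It is not the authors' method.

```python
import random

# Hypothetical toy KB of (relation, subject, object) triples.
KB = {
    ("parent", "ann", "bob"), ("parent", "bob", "cid"), ("parent", "cid", "dan"),
    ("mother", "ann", "bob"), ("mother", "cid", "dan"),
    ("sibling", "bob", "eve"), ("sibling", "eve", "bob"),
    ("sibling", "cid", "eve"), ("sibling", "eve", "cid"),
}
# All constants appearing in the KB: ann, bob, cid, dan, eve.
CONSTANTS = sorted({s for _, s, _ in KB} | {o for _, _, o in KB})


def entails(body_rel, s, o):
    """Evaluate the single-atom rule head(X, Y) :- body_rel(X, Y) for one grounding."""
    return (body_rel, s, o) in KB


def count_positives(head_rel, body_rel):
    """Exact number of head_rel facts in the KB that the rule entails."""
    return sum(1 for r, s, o in KB if r == head_rel and entails(body_rel, s, o))


def exact_negatives(head_rel, body_rel):
    """Brute force over every CWA negative: the full scan sampling aims to avoid."""
    return sum(
        1
        for s in CONSTANTS
        for o in CONSTANTS
        if (head_rel, s, o) not in KB and entails(body_rel, s, o)
    )


def estimate_negatives(head_rel, body_rel, k, rng):
    """Estimate how many CWA negatives of head_rel the rule entails.

    Under the closed-world assumption, every (s, o) pair over CONSTANTS that is
    not a head_rel fact is negative. Draw k negatives uniformly (rejection
    sampling, with replacement) and scale the hit rate by the negative count.
    """
    n_facts = sum(1 for r, _, _ in KB if r == head_rel)
    n_negatives = len(CONSTANTS) ** 2 - n_facts
    hits = drawn = 0
    while drawn < k:
        s, o = rng.choice(CONSTANTS), rng.choice(CONSTANTS)
        if (head_rel, s, o) in KB:
            continue  # a known fact, not a negative: reject and redraw
        drawn += 1
        if entails(body_rel, s, o):
            hits += 1
    return hits / k * n_negatives


def keep_rule(head_rel, body_rel, k=200, threshold=0.0, seed=0):
    """Hypothetical pruning test: keep the rule only if its positives minus
    its estimated negatives exceed the threshold."""
    rng = random.Random(seed)
    score = count_positives(head_rel, body_rel) - estimate_negatives(
        head_rel, body_rel, k, rng
    )
    return score > threshold


if __name__ == "__main__":
    print(exact_negatives("parent", "sibling"))  # 4
    print(keep_rule("parent", "mother"))   # True: 2 positives, no negatives
    print(keep_rule("parent", "sibling"))  # False: 0 positives, score <= 0
```

The estimate scales the sample hit rate by the size of the negative set, so its expected value equals the exact count and its variance shrinks as `k` grows. How a real system chooses the sample size and the pruning threshold is exactly what the abstract leaves open.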
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.