Estimation-based optimizations for the semantic compression of RDF knowledge bases

IF 7.4 · CAS Tier 1 (Management) · JCR Q1, Computer Science, Information Systems
Ruoyu Wang, Raymond Wong, Daniel Sun
DOI: 10.1016/j.ipm.2024.103799
Journal: Information Processing & Management
Published: 2024-06-08 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0306457324001584
Citations: 0

Abstract

Structured knowledge bases are critical for the interpretability of AI techniques. RDF KBs, which are the dominant representation of structured knowledge, are expanding extremely fast to increase their knowledge coverage, enhancing the capability of knowledge reasoning while bringing heavy burdens to downstream applications. Recent studies employ semantic compression to detect and remove knowledge redundancies via semantic models and use the induced model for further applications, such as knowledge completion and error detection. However, semantic models that are sufficiently expressive for semantic compression cannot be efficiently induced, especially for large-scale KBs, due to the hardness of logic induction. In this article, we present estimation-based optimizations for the semantic compression of RDF KBs from the perspectives of input and intermediate data involved in the induction of first-order logic rules. The negative sampling technique selects a representative subset of all negative tuples with respect to the closed-world assumption, reducing the cost of evaluating the quality of a logic rule used for knowledge inference. The number of logic inference operations used during a compression procedure is reduced by a statistical estimation technique that prunes logic rules of low quality. The evaluation results show that the two techniques are feasible for the purpose of semantic compression and accelerate the compression algorithm by up to 47x compared to the state-of-the-art system.
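The abstract does not spell out the algorithms; as a purely illustrative sketch (the toy KB, predicate names, and confidence formula below are assumptions, not the paper's actual design), negative sampling under the closed-world assumption and estimation-based rule pruning might look like:

```python
import random

# A tiny RDF-style KB as a set of (subject, predicate, object) triples.
kb = {
    ("alice", "parent", "bob"),
    ("bob", "parent", "carol"),
    ("alice", "ancestor", "bob"),
    ("alice", "ancestor", "carol"),
    ("bob", "ancestor", "carol"),
}
entities = sorted({e for s, _, o in kb for e in (s, o)})

def sample_negatives(predicate, k, rng):
    """Under the closed-world assumption, every (s, p, o) absent from the
    KB is a negative tuple. Sampling k of them stands in for the full
    negative set, which grows quadratically with the number of entities."""
    negatives = set()
    while len(negatives) < k:
        s, o = rng.choice(entities), rng.choice(entities)
        if (s, predicate, o) not in kb:
            negatives.add((s, predicate, o))
    return negatives

def rule_confidence(head_pred, body_pred, negatives):
    """Estimate the quality of the rule body_pred(X,Y) -> head_pred(X,Y)
    from KB positives plus the sampled negatives, instead of the full
    closed-world negative set."""
    covered_pos = sum(1 for s, p, o in kb
                      if p == body_pred and (s, head_pred, o) in kb)
    covered_neg = sum(1 for s, p, o in negatives
                      if (s, body_pred, o) in kb)
    total = covered_pos + covered_neg
    return covered_pos / total if total else 0.0

rng = random.Random(0)
negs = sample_negatives("ancestor", 5, rng)
conf = rule_confidence("ancestor", "parent", negs)
# Estimation-based pruning: drop candidate rules whose estimated
# confidence falls below a threshold, before running full logic inference.
keep_rule = conf >= 0.5
```

The point of the sketch is the cost model: evaluating a rule against a sampled negative subset is linear in the sample size, whereas the full closed-world negative set for a predicate is quadratic in the number of entities.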

Source journal: Information Processing & Management (Computer Science, Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Annual articles: 276
Review time: 39 days
About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. The journal serves both primary researchers and practitioners as a platform for the timely dissemination of advanced and topical issues in this interdisciplinary field, with particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research.