Mining Expressive Rules in Knowledge Graphs

N. Ahmadi, Viet-Phi Huynh, Venkata Vamsikrishna Meduri, Stefano Ortona, Paolo Papotti
{"title":"Mining Expressive Rules in Knowledge Graphs","authors":"N. Ahmadi, Viet-Phi Huynh, Venkata Vamsikrishna Meduri, Stefano Ortona, Paolo Papotti","doi":"10.1145/3371315","DOIUrl":null,"url":null,"abstract":"We describe RuDiK, an algorithm and a system for mining declarative rules over RDF knowledge graphs (KGs). RuDiK can discover rules expressing both positive relationships between KG elements, e.g., “if two persons share at least one parent, they are likely to be siblings,” and negative patterns identifying data contradictions, e.g., “if two persons are married, one cannot be the child of the other” or “the birth year for a person cannot be bigger than her graduation year.” While the first kind of rules identify new facts in the KG, the second kind enables the detection of incorrect triples and the generation of (training) negative examples for learning algorithms. High-quality rules are also critical for any reasoning task involving the KGs. Our approach increases the expressive power of the supported rule language w.r.t. the existing systems. RuDiK discovers rules containing (i) comparisons among literal values and (ii) selection conditions with constants. Richer rules increase the accuracy and the coverage over the facts in the KG for the task at hand. This is achieved with aggressive pruning of the search space and with disk-based algorithms, which enable the execution of the system in commodity machines. Also, RuDiK is robust to errors and missing data in the input graph. It discovers approximate rules with a measure of support that is aware of the quality issues. Our experimental evaluation with real-world KGs shows that RuDiK does better than existing solutions in terms of scalability and that it can identify effective rules for different target applications.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"48 1","pages":"1 - 27"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Data and Information Quality (JDIQ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3371315","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

We describe RuDiK, an algorithm and a system for mining declarative rules over RDF knowledge graphs (KGs). RuDiK can discover rules expressing both positive relationships between KG elements, e.g., “if two persons share at least one parent, they are likely to be siblings,” and negative patterns identifying data contradictions, e.g., “if two persons are married, one cannot be the child of the other” or “the birth year for a person cannot be bigger than her graduation year.” While the first kind of rules identify new facts in the KG, the second kind enables the detection of incorrect triples and the generation of (training) negative examples for learning algorithms. High-quality rules are also critical for any reasoning task involving the KGs. Our approach increases the expressive power of the supported rule language w.r.t. the existing systems. RuDiK discovers rules containing (i) comparisons among literal values and (ii) selection conditions with constants. Richer rules increase the accuracy and the coverage over the facts in the KG for the task at hand. This is achieved with aggressive pruning of the search space and with disk-based algorithms, which enable the execution of the system in commodity machines. Also, RuDiK is robust to errors and missing data in the input graph. It discovers approximate rules with a measure of support that is aware of the quality issues. Our experimental evaluation with real-world KGs shows that RuDiK does better than existing solutions in terms of scalability and that it can identify effective rules for different target applications.
挖掘知识图中的表达规则
本文描述了一种基于RDF知识图(KGs)的声明性规则挖掘算法和系统RuDiK。RuDiK可以发现表达KG元素之间积极关系的规则,例如,“如果两个人共享至少一个父母,他们很可能是兄弟姐妹”,以及识别数据矛盾的消极模式,例如,“如果两个人结婚,一个人不能是另一个人的孩子”或“一个人的出生年份不能大于她的毕业年份”。当第一类规则识别KG中的新事实时,第二类规则能够检测不正确的三元组并为学习算法生成(训练)负例。高质量的规则对于任何涉及到KGs的推理任务都是至关重要的,我们的方法在现有系统的基础上提高了所支持规则语言的表达能力。RuDiK发现了包含(i)文字值之间比较和(ii)带有常量的选择条件的规则。更丰富的规则增加了当前任务KG中事实的准确性和覆盖范围。这是通过搜索空间的积极修剪和基于磁盘的算法实现的,这使得系统可以在普通机器上执行。此外,RuDiK对输入图中的错误和缺失数据具有鲁棒性。它发现近似的规则与支持的措施,是意识到质量问题。我们对真实世界的KGs进行的实验评估表明,就可扩展性而言,RuDiK比现有的解决方案做得更好,并且它可以为不同的目标应用程序识别有效的规则。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信