NEARM: Natural Language Enhanced Association Rules Mining

Shiya Ren, Zhixing Li, Huaming Wang, Yuan Li, Ke Shen, Sijie Cheng
Published in: 2018 IEEE International Conference on Data Mining Workshops (ICDMW)
Publication date: 2018-11-01
DOI: 10.1109/ICDMW.2018.00071 (https://doi.org/10.1109/ICDMW.2018.00071)
Citations: 2

Abstract

Knowledge bases (KBs), which are typical heterogeneous graphs containing numerous triple facts of various types and relations, have shown remarkable advantages in many natural language processing (NLP) tasks. KBs usually integrate information from different sources such as human-edited online encyclopedias, news articles and even social networks. Due to the heterogeneous nature of these sources, both the KBs themselves and their applications to NLP tasks are far from perfect. On the one hand, KBs need further completion and refinement to cover more knowledge at higher quality. On the other hand, the joint modeling of structured knowledge in KBs and unstructured text has not been well investigated. This paper proposes a novel natural language enhanced association rules mining (NEARM) framework to improve KBs. NEARM finds knowledge fragments in free text in a data-driven manner. It first groups raw data (sentences) that contain related entity pairs into clusters of different granularities, and then integrates them with facts from KBs to mine rules in each cluster. To capture the relations between plain text and triple facts, NEARM produces rules that contain natural language patterns and/or triple facts in the antecedent, and triple facts in the consequent. In this way, NEARM can infer triple facts directly from plain text. Finally, experimental results demonstrate the effectiveness of NEARM on relation classification and triple fact reasoning.
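
The rule shape described in the abstract (natural language patterns and/or triple facts in the antecedent, a triple fact in the consequent) can be illustrated with a small sketch. This is not the authors' implementation: the toy transactions, the `text:` and `kb:` item prefixes, the function name `mine_rules`, and the thresholds are all assumptions made for illustration, and the clustering step is omitted. Each transaction is assumed to hold the items observed for one entity pair, and rules are scored with the standard support and confidence measures of association rule mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one transaction per entity pair (X, Y).
# "text:" items are sentence patterns, "kb:" items are KB relations.
transactions = [
    {"text:X was born in Y", "kb:birthPlace"},
    {"text:X was born in Y", "kb:birthPlace", "kb:nationality"},
    {"text:X was born in Y", "text:X grew up in Y", "kb:birthPlace"},
    {"text:X grew up in Y", "kb:residence"},
    {"text:X was born in Y"},
]


def mine_rules(transactions, min_support=0.2, min_confidence=0.6,
               max_antecedent=2):
    """Return rules (antecedent, consequent, support, confidence).

    Antecedents may mix text patterns and KB facts; the consequent
    is always a single KB fact that is not part of the antecedent.
    """
    n = len(transactions)
    antecedent_counts = Counter()
    joint_counts = Counter()
    for items in transactions:
        all_items = sorted(items)
        facts = [i for i in all_items if i.startswith("kb:")]
        for size in range(1, max_antecedent + 1):
            for antecedent in combinations(all_items, size):
                antecedent_counts[antecedent] += 1
                for fact in facts:
                    if fact not in antecedent:
                        joint_counts[(antecedent, fact)] += 1

    rules = []
    for (antecedent, fact), joint in joint_counts.items():
        support = joint / n
        confidence = joint / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, fact, support, confidence))
    # Highest confidence first, then highest support.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    for antecedent, fact, support, confidence in mine_rules(transactions):
        print(f"{' & '.join(antecedent)} => {fact} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```

A rule such as `text:X was born in Y => kb:birthPlace` is the kind that would let a triple fact be inferred from a sentence alone. The brute-force enumeration of antecedents is only workable on tiny inputs; a real system would use a frequent-itemset algorithm such as Apriori or FP-Growth.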