CMVC+: A Multi-View Clustering Framework for Open Knowledge Base Canonicalization Via Contrastive Learning

IF 8.9 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yang Yang;Wei Shen;Junfeng Shu;Yinan Liu;Edward Curry;Guoliang Li
{"title":"CMVC+: A Multi-View Clustering Framework for Open Knowledge Base Canonicalization Via Contrastive Learning","authors":"Yang Yang;Wei Shen;Junfeng Shu;Yinan Liu;Edward Curry;Guoliang Li","doi":"10.1109/TKDE.2025.3543423","DOIUrl":null,"url":null,"abstract":"Open information extraction (OIE) methods extract plenty of OIE triples <italic><inline-formula><tex-math>$&lt; $</tex-math><alternatives><mml:math><mml:mo>&lt;</mml:mo></mml:math><inline-graphic></alternatives></inline-formula>noun phrase, relation phrase, noun phrase<inline-formula><tex-math>$&gt; $</tex-math><alternatives><mml:math><mml:mo>&gt;</mml:mo></mml:math><inline-graphic></alternatives></inline-formula></i> from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. In order to leverage these two views of knowledge jointly, we propose CMVC+, a novel unsupervised framework for canonicalizing OKBs without the need for manually annotated labels. Specifically, we propose a multi-view CHF K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering the clustering quality in a fine-grained manner. Furthermore, we propose a novel contrastive learning module to refine the learned view-specific embeddings and further enhance the canonicalization performance. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2296-2310"},"PeriodicalIF":8.9000,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10891880/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Open information extraction (OIE) methods extract plenty of OIE triples $< $<noun phrase, relation phrase, noun phrase$> $> from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. In order to leverage these two views of knowledge jointly, we propose CMVC+, a novel unsupervised framework for canonicalizing OKBs without the need for manually annotated labels. Specifically, we propose a multi-view CHF K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering the clustering quality in a fine-grained manner. Furthermore, we propose a novel contrastive learning module to refine the learned view-specific embeddings and further enhance the canonicalization performance. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
CMVC+:基于对比学习的开放知识库规范化多视图聚类框架
开放信息提取(OIE)方法提取了大量的OIE三元组$<;名词短语,关系短语,名词短语$>;在美元;从非结构化的文本,组成大型开放知识库(okb)。这些okb中的名词短语和关系短语没有规范化,导致事实的分散和冗余。研究发现,两种知识视图(即基于事实三元组的事实视图和基于事实三元组源上下文的上下文视图)提供了对OKB规范化任务至关重要的互补信息,该任务将同义名词短语和关系短语聚类到同一组中,并为它们分配唯一标识符。为了共同利用这两种知识视图,我们提出了CMVC+,这是一种新的无监督框架,用于规范化okb,而不需要手动注释标签。具体来说,我们提出了一种多视图CHF K-Means聚类算法,通过细粒度方式考虑聚类质量,相互加强从每个视图学习的特定视图嵌入的聚类。此外,我们提出了一种新的对比学习模块来改进学习到的特定于视图的嵌入,进一步提高规范化性能。我们通过在多个真实世界的OKB数据集上对最先进的方法进行了广泛的实验,证明了我们框架的优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering 工程技术-工程:电子与电气
CiteScore
11.70
自引率
3.40%
发文量
515
审稿时长
6 months
期刊介绍: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信