Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning

IF 5.1 · Zone 2, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence · Pub Date: 2024-05-13 · DOI: 10.1016/j.artint.2024.104147

Wenbo Zheng, Lan Yan, Fei-Yue Wang

Volume 333, Article 104147 · https://www.sciencedirect.com/science/article/pii/S0004370224000833

Citations: 0

Abstract


Knowledge-based visual reasoning requires the ability to associate outside knowledge that is not present in a given image for cross-modal visual understanding. Two deficiencies of the existing approaches are that (1) they only employ or construct elementary and explicit but superficial knowledge graphs while lacking complex and implicit but indispensable cross-modal knowledge for visual reasoning, and (2) they also cannot reason new/unseen images or questions in open environments and are often violated in real-world applications. How to represent and leverage tacit multimodal knowledge for open-world visual reasoning scenarios has been less studied. In this paper, we propose a novel open-world knowledge representation learning method to not only construct implicit knowledge representations from the given images and their questions but also enable knowledge transfer from a known given scene to an unknown scene for answer prediction. Extensive experiments conducted on six benchmarks demonstrate the superiority of our approach over other state-of-the-art methods. We apply our approach to other visual reasoning tasks, and the experimental results show that our approach, with its good performance, can support related reasoning applications.


Source journal

Artificial Intelligence (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 11.20

Self-citation rate: 1.40%

Articles published: 118

Review time: 8 months

Journal description: Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advances in AI and propose innovative approaches to AI problems. The journal also accepts papers describing AI applications, provided they focus on how new methods improve performance rather than reiterating conventional approaches. In addition to regular papers, AIJ accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.