Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning

IF 5.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Wenbo Zheng, Lan Yan, Fei-Yue Wang
{"title":"知识就是力量:基于知识的视觉推理的开放世界知识表征学习","authors":"Wenbo Zheng ,&nbsp;Lan Yan ,&nbsp;Fei-Yue Wang","doi":"10.1016/j.artint.2024.104147","DOIUrl":null,"url":null,"abstract":"<div><p>Knowledge-based visual reasoning requires the ability to associate outside knowledge that is not present in a given image for cross-modal visual understanding. Two deficiencies of the existing approaches are that (1) they only employ or construct elementary and <em>explicit</em> but superficial knowledge graphs while lacking complex and <em>implicit</em> but indispensable cross-modal knowledge for visual reasoning, and (2) they also cannot reason new/<em>unseen</em> images or questions in open environments and are often violated in real-world applications. How to represent and leverage tacit multimodal knowledge for open-world visual reasoning scenarios has been less studied. In this paper, we propose a novel open-world knowledge representation learning method to not only construct implicit knowledge representations from the given images and their questions but also enable knowledge transfer from a <em>known</em> given scene to an <em>unknown</em> scene for answer prediction. Extensive experiments conducted on six benchmarks demonstrate the superiority of our approach over other state-of-the-art methods. We apply our approach to other visual reasoning tasks, and the experimental results show that our approach, with its good performance, can support related reasoning applications.</p></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"333 ","pages":"Article 104147"},"PeriodicalIF":5.1000,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning\",\"authors\":\"Wenbo Zheng ,&nbsp;Lan Yan ,&nbsp;Fei-Yue Wang\",\"doi\":\"10.1016/j.artint.2024.104147\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Knowledge-based visual reasoning requires the ability to associate outside knowledge that is not present in a given image for cross-modal visual understanding. Two deficiencies of the existing approaches are that (1) they only employ or construct elementary and <em>explicit</em> but superficial knowledge graphs while lacking complex and <em>implicit</em> but indispensable cross-modal knowledge for visual reasoning, and (2) they also cannot reason new/<em>unseen</em> images or questions in open environments and are often violated in real-world applications. How to represent and leverage tacit multimodal knowledge for open-world visual reasoning scenarios has been less studied. In this paper, we propose a novel open-world knowledge representation learning method to not only construct implicit knowledge representations from the given images and their questions but also enable knowledge transfer from a <em>known</em> given scene to an <em>unknown</em> scene for answer prediction. Extensive experiments conducted on six benchmarks demonstrate the superiority of our approach over other state-of-the-art methods. 
We apply our approach to other visual reasoning tasks, and the experimental results show that our approach, with its good performance, can support related reasoning applications.</p></div>\",\"PeriodicalId\":8434,\"journal\":{\"name\":\"Artificial Intelligence\",\"volume\":\"333 \",\"pages\":\"Article 104147\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0004370224000833\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370224000833","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Knowledge-based visual reasoning requires the ability to associate outside knowledge that is not present in a given image for cross-modal visual understanding. Existing approaches have two deficiencies: (1) they only employ or construct elementary, explicit, but superficial knowledge graphs, lacking the complex, implicit, yet indispensable cross-modal knowledge needed for visual reasoning; and (2) they cannot reason about new/unseen images or questions in open environments, a closed-world assumption that is often violated in real-world applications. How to represent and leverage tacit multimodal knowledge in open-world visual reasoning scenarios has been little studied. In this paper, we propose a novel open-world knowledge representation learning method that not only constructs implicit knowledge representations from the given images and their questions but also enables knowledge transfer from a known given scene to an unknown scene for answer prediction. Extensive experiments on six benchmarks demonstrate the superiority of our approach over other state-of-the-art methods. We also apply our approach to other visual reasoning tasks, and the experimental results show that our approach performs well and can support related reasoning applications.
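The abstract stays at a high level, so as a purely illustrative aid, the sketch below shows one generic way a fact about a known concept can transfer to an unseen one in a shared embedding space. It is a minimal toy example assuming a TransE-style translational embedding; it is not the paper's method, and the concept names ("dog", "wolf", "frisbee", "park"), the relation plays_with, and the neighbor-averaging heuristic embed_unseen are all hypothetical.

```python
# Toy sketch ONLY -- NOT the paper's method, whose architecture the abstract
# does not describe. A TransE-style translation embedding makes the open-world
# transfer idea concrete: a relational fact learned for a *known* concept
# ("dog") remains plausible for an *unseen* concept ("wolf") placed near its
# known relatives in a shared embedding space.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of the shared multimodal embedding space

# Random stand-ins for embeddings that would really be learned from images,
# questions, and external knowledge.
concepts = {
    "dog": rng.normal(size=DIM),
    "frisbee": rng.normal(size=DIM),
    "park": rng.normal(size=DIM),
}

# "Learn" the relation so the known fact (dog, plays_with, frisbee) holds
# exactly, TransE-style: subject + relation ~= object.
plays_with = concepts["frisbee"] - concepts["dog"]

def embed_unseen(neighbors, noise_scale=0.01):
    """Place an unseen concept near semantically related known concepts --
    a crude stand-in for genuine open-world representation learning."""
    anchor = np.mean([concepts[n] for n in neighbors], axis=0)
    return anchor + noise_scale * rng.normal(size=DIM)

# An unseen concept enters at test time; we anchor it to "dog".
concepts["wolf"] = embed_unseen(["dog"])

def plausibility(subj, rel_vec, obj):
    """TransE-style score: closer to zero means more plausible."""
    return -float(np.linalg.norm(concepts[subj] + rel_vec - concepts[obj]))

print(plausibility("dog", plays_with, "frisbee"))   # 0.0: the seen fact
print(plausibility("wolf", plays_with, "frisbee"))  # near 0: transferred fact
print(plausibility("park", plays_with, "frisbee"))  # strongly negative: unrelated
```

Translational scoring is used here only because it makes "transfer to a nearby embedding" visible in a few lines; the paper's actual representation learning is not reproduced.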

Source journal: Artificial Intelligence (CAS category: Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months
Journal introduction: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.