CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph
Huan Rong, Minfeng Qian, Tinghuai Ma, Di Jin, Victor S. Sheng
Journal: ACM Transactions on Knowledge Discovery from Data (JCR Q1, Computer Science, Information Systems; Impact Factor 4.0)
DOI: 10.1145/3643565
Published: 2024-01-26 (Journal Article)
Citations: 0
Abstract
Object detection is a widely studied problem. In this paper, however, we turn to a more challenging problem, "Covered Object Reasoning", which aims to infer the category label of a target object in a given image even when that object is completely covered (i.e., invisible). To address this problem, we propose CoBjeason, which seizes the opportunity created when visual reasoning meets the knowledge graph: "empirical cognition" of common visual contexts is encoded as a knowledge graph, over which two collaborative agents perform reinforced multi-hop reasoning. The first agent stands at the covered object (the unknown entity), observes the surrounding visual cues in the given image, and gradually selects entities and relations from the global gallery-level knowledge graph, which contains entity pairs that frequently co-occur across the entire image collection, so as to infer the main structure of an image-level knowledge graph expanded forward from the unknown entity. The second agent, based on the reasoned image-level knowledge graph, aggregates the semantic context among entities backward into the unknown entity and selects an appropriate entity from the global gallery-level knowledge graph as the reasoning result. The two agents collaborate with each other, ensuring that this Forward & Backward Reasoning converges on the same goal: higher performance on covered object reasoning. To the best of our knowledge, this is the first work on Covered Object Reasoning with knowledge graphs and reinforced multi-agent collaboration. In particular, our study of Covered Object Reasoning and the proposed model CoBjeason could offer novel insights into more basic Computer Vision (CV) tasks, such as Semantic Segmentation (better understanding of the current scene when some objects are blurred or covered), Visual Question Answering (stronger inference in complicated visual contexts when some objects are covered or invisible), and Image Caption Generation (a richer visual context for images containing partially visible objects). Improvements on these basic CV tasks can in turn refine more complicated ones that require nuanced visual interpretation, such as Autonomous Driving, where recognizing and reasoning about partially visible or covered objects is critical. According to the experimental results, our proposed CoBjeason achieves the best overall ranking performance on covered object reasoning compared with other models, while enjoying a lower "exploration cost", insensitivity to long-tail covered objects, and acceptable time complexity.
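To make the forward/backward two-agent loop described above more concrete, the following is a minimal, hypothetical Python sketch of the general idea, not the authors' implementation: the gallery-level knowledge graph is reduced to a handful of made-up co-occurrence triples, the "visual cues" are simply the labels of visible objects, and candidate labels are ranked by plain frequency counting rather than the reinforced multi-hop reasoning policies used by CoBjeason. All entity names, relations, weights, and function names are illustrative assumptions.

```python
# Hypothetical sketch of "forward expansion + backward aggregation" for covered
# object reasoning. This is NOT the CoBjeason model; it only illustrates the
# control flow sketched in the abstract with toy data and frequency scoring.
from collections import Counter

# Gallery-level KG: (head, relation, tail, co-occurrence count) across a toy
# image collection. Numbers are made up for illustration.
GALLERY_KG = [
    ("monitor", "on", "desk", 50),
    ("keyboard", "on", "desk", 45),
    ("mouse", "next_to", "keyboard", 40),
    ("chair", "in_front_of", "desk", 35),
    ("cup", "on", "desk", 20),
]

def forward_expand(visible_cues, hops=1):
    """Forward agent (simplified): starting from the visual cues surrounding the
    covered object, pull entities/relations from the gallery-level KG to build
    an image-level subgraph expanded outward from the unknown entity."""
    frontier, subgraph = set(visible_cues), []
    for _ in range(hops):
        next_frontier = set()
        for head, rel, tail, weight in GALLERY_KG:
            if head in frontier or tail in frontier:
                subgraph.append((head, rel, tail, weight))
                next_frontier.update({head, tail})
        frontier = next_frontier
    return subgraph

def backward_aggregate(subgraph, visible_cues):
    """Backward agent (simplified): aggregate semantic context from the expanded
    subgraph back into the unknown entity and rank candidate category labels."""
    scores = Counter()
    for head, _rel, tail, weight in subgraph:
        for candidate in (head, tail):
            if candidate not in visible_cues:   # the covered object is unseen
                scores[candidate] += weight     # co-occurrence acts as crude confidence
    return scores.most_common()

if __name__ == "__main__":
    cues = {"desk", "keyboard"}                 # objects actually visible in the image
    image_kg = forward_expand(cues, hops=1)     # forward reasoning step
    print(backward_aggregate(image_kg, cues))   # backward step: ranked guesses, e.g. "monitor" first
```

In the paper itself, both steps are learned policies trained with reinforcement so that forward expansion and backward aggregation reinforce each other; the counting heuristic above only mirrors the direction of information flow.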
Journal description:
TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.