{"title":"Compositional scene modeling with global object-centric representations","authors":"","doi":"10.1007/s10994-023-06419-5","DOIUrl":null,"url":null,"abstract":"<h3>Abstract</h3> <p>The appearance of the same object may vary in different scene images due to occlusions between objects. Humans can quickly identify the same object, even if occlusions exist, by completing the occluded parts based on its complete canonical image in the memory. Achieving this ability is still challenging for existing models, especially in the unsupervised learning setting. Inspired by such an ability of humans, we propose a novel object-centric representation learning method to identify the same object in different scenes that may be occluded by learning global object-centric representations of complete canonical objects without supervision. The representation of each object is divided into an extrinsic part, which characterizes scene-dependent information (i.e., position and size), and an intrinsic part, which characterizes globally invariant information (i.e., appearance and shape). The former can be inferred with an improved IC-SBP module. The latter is extracted by combining rectangular and arbitrary-shaped attention and is used to infer the identity representation via a proposed patch-matching strategy with a set of learnable global object-centric representations of complete canonical objects. In the experiment, three 2D scene datasets are used to verify the proposed method’s ability to recognize the identity of the same object in different scenes. A complex 3D scene dataset and a real-world dataset are used to evaluate the performance of scene decomposition. Our experimental results demonstrate that the proposed method outperforms the comparison methods in terms of same object recognition and scene decomposition.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"14 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-023-06419-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The appearance of the same object may vary in different scene images due to occlusions between objects. Humans can quickly identify the same object, even if occlusions exist, by completing the occluded parts based on its complete canonical image in the memory. Achieving this ability is still challenging for existing models, especially in the unsupervised learning setting. Inspired by such an ability of humans, we propose a novel object-centric representation learning method to identify the same object in different scenes that may be occluded by learning global object-centric representations of complete canonical objects without supervision. The representation of each object is divided into an extrinsic part, which characterizes scene-dependent information (i.e., position and size), and an intrinsic part, which characterizes globally invariant information (i.e., appearance and shape). The former can be inferred with an improved IC-SBP module. The latter is extracted by combining rectangular and arbitrary-shaped attention and is used to infer the identity representation via a proposed patch-matching strategy with a set of learnable global object-centric representations of complete canonical objects. In the experiment, three 2D scene datasets are used to verify the proposed method’s ability to recognize the identity of the same object in different scenes. A complex 3D scene dataset and a real-world dataset are used to evaluate the performance of scene decomposition. Our experimental results demonstrate that the proposed method outperforms the comparison methods in terms of same object recognition and scene decomposition.
期刊介绍:
Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.