Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

IF 11.1 · CAS Tier 1 (Engineering & Technology) · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Kunshan Yang;Lin Zuo;Mengmeng Jing;Xianlong Tian;Kunbin He;Yongqi Ding
{"title":"Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition","authors":"Kunshan Yang;Lin Zuo;Mengmeng Jing;Xianlong Tian;Kunbin He;Yongqi Ding","doi":"10.1109/TCSVT.2025.3534204","DOIUrl":null,"url":null,"abstract":"Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from the lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize the self-saliency and thereby improve the discrimination of the representations for flexible objects. Specifically, on one hand, we propose to maximize the channel-aware saliency by extracting the weight of neighboring graph nodes, which is employed to identify flexible objects with minimal inter-class differences. On the other hand, we maximize the spatial-aware saliency based on clustering to aggregate neighborhood information for the centroid graph nodes. This introduces local context information and enables extracting of consistent representation, effectively adapting to the shape and size variations in flexible objects. To verify the performance of flexible objects recognition thoroughly, for the first time we propose the Flexible Dataset (FDA), which consists of various images of flexible objects collected from real-world scenarios or online. Extensive experiments evaluated on our FDA, FireNet, CIFAR-100 and ImageNet-Hard datasets demonstrate the effectiveness of our method on enhancing the discrimination of flexible objects.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"6424-6436"},"PeriodicalIF":11.1000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10854580/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from a lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize self-saliency and thereby improve the discrimination of representations for flexible objects. Specifically, on the one hand, we maximize channel-aware saliency by extracting the weights of neighboring graph nodes, which helps distinguish flexible objects with minimal inter-class differences. On the other hand, we maximize spatial-aware saliency by clustering to aggregate neighborhood information for the centroid graph nodes; this introduces local context information and enables the extraction of consistent representations, effectively adapting to the shape and size variations of flexible objects. To evaluate flexible object recognition thoroughly, we propose, for the first time, the Flexible Dataset (FDA), which consists of images of various flexible objects collected from real-world scenarios or online. Extensive experiments on our FDA, FireNet, CIFAR-100, and ImageNet-Hard datasets demonstrate the effectiveness of our method in enhancing the discrimination of flexible objects.
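The abstract does not specify FViG's implementation, but the two mechanisms it names (channel-aware weighting of neighboring graph nodes, and cluster-based aggregation of neighborhood information into centroid nodes) can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical illustration of the general idea only: the function names, the softmax weighting form, and the naive k-means step are our assumptions, not the authors' method.

```python
# Illustrative sketch only: FViG's exact architecture is not given in the
# abstract. This shows the general pattern of (1) channel-aware weighting of
# k-NN graph neighbors and (2) cluster-based aggregation into centroid nodes,
# using hypothetical function names and weighting forms.
import torch


def knn_graph(x: torch.Tensor, k: int) -> torch.Tensor:
    """x: (N, C) node features. Returns (N, k) neighbor indices by L2 distance."""
    dist = torch.cdist(x, x)                                 # (N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]   # drop self (distance 0)


def channel_aware_aggregate(x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Weight each neighbor per channel before aggregation (assumed form)."""
    neighbors = x[idx]                          # (N, k, C) gathered neighbor features
    diff = neighbors - x.unsqueeze(1)           # relative features w.r.t. each node
    w = torch.softmax(diff, dim=1)              # per-channel weights over neighbors
    return x + (w * neighbors).sum(dim=1)       # saliency-weighted node update


def spatial_aware_aggregate(x: torch.Tensor, n_clusters: int,
                            iters: int = 10) -> torch.Tensor:
    """Naive k-means: cluster members are averaged into centroid nodes."""
    centroids = x[torch.randperm(x.size(0))[:n_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)     # (N,) cluster labels
        for c in range(n_clusters):
            mask = assign == c
            if mask.any():
                centroids[c] = x[mask].mean(dim=0)           # aggregate neighborhood
    return centroids                                         # (n_clusters, C)


# Toy usage: 64 graph nodes with 32-dim features.
x = torch.randn(64, 32)
idx = knn_graph(x, k=8)
x = channel_aware_aggregate(x, idx)
centroids = spatial_aware_aggregate(x, n_clusters=4)
print(x.shape, centroids.shape)  # torch.Size([64, 32]) torch.Size([4, 32])
```

In the actual FViG these operations would presumably be learned jointly inside a vision GNN backbone; the sketch only isolates the saliency-weighted aggregation pattern the abstract describes.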
Source journal
CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.