Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

IF 11.1 · CAS Tier 1 (Engineering & Technology) · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Kunshan Yang;Lin Zuo;Mengmeng Jing;Xianlong Tian;Kunbin He;Yongqi Ding
{"title":"Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition","authors":"Kunshan Yang;Lin Zuo;Mengmeng Jing;Xianlong Tian;Kunbin He;Yongqi Ding","doi":"10.1109/TCSVT.2025.3534204","DOIUrl":null,"url":null,"abstract":"Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from the lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize the self-saliency and thereby improve the discrimination of the representations for flexible objects. Specifically, on one hand, we propose to maximize the channel-aware saliency by extracting the weight of neighboring graph nodes, which is employed to identify flexible objects with minimal inter-class differences. On the other hand, we maximize the spatial-aware saliency based on clustering to aggregate neighborhood information for the centroid graph nodes. This introduces local context information and enables extracting of consistent representation, effectively adapting to the shape and size variations in flexible objects. To verify the performance of flexible objects recognition thoroughly, for the first time we propose the Flexible Dataset (FDA), which consists of various images of flexible objects collected from real-world scenarios or online. Extensive experiments evaluated on our FDA, FireNet, CIFAR-100 and ImageNet-Hard datasets demonstrate the effectiveness of our method on enhancing the discrimination of flexible objects.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"6424-6436"},"PeriodicalIF":11.1000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10854580/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from a lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize self-saliency and thereby improve the discrimination of representations for flexible objects. Specifically, on the one hand, we maximize channel-aware saliency by extracting the weights of neighboring graph nodes, which helps distinguish flexible objects with minimal inter-class differences. On the other hand, we maximize spatial-aware saliency by clustering to aggregate neighborhood information for the centroid graph nodes; this introduces local context information and enables the extraction of consistent representations, effectively adapting to the shape and size variations of flexible objects. To evaluate flexible object recognition thoroughly, we propose, for the first time, the Flexible Dataset (FDA), which consists of images of various flexible objects collected from real-world scenarios or online. Extensive experiments on our FDA, FireNet, CIFAR-100, and ImageNet-Hard datasets demonstrate the effectiveness of our method in enhancing the discrimination of flexible objects.
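The abstract does not specify FViG's implementation, but the two mechanisms it names (channel-aware weighting of neighboring graph nodes, and cluster-based aggregation of neighborhood information into centroid nodes) can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical illustration of the general idea only: the function names, the softmax weighting form, and the naive k-means step are our assumptions, not the authors' method.

```python
# Illustrative sketch only: FViG's exact architecture is not given in the
# abstract. This shows the general pattern of (1) channel-aware weighting of
# k-NN graph neighbors and (2) cluster-based aggregation into centroid nodes,
# using hypothetical function names and weighting forms.
import torch


def knn_graph(x: torch.Tensor, k: int) -> torch.Tensor:
    """x: (N, C) node features. Returns (N, k) neighbor indices by L2 distance."""
    dist = torch.cdist(x, x)                                 # (N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]   # drop self (distance 0)


def channel_aware_aggregate(x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Weight each neighbor per channel before aggregation (assumed form)."""
    neighbors = x[idx]                          # (N, k, C) gathered neighbor features
    diff = neighbors - x.unsqueeze(1)           # relative features w.r.t. each node
    w = torch.softmax(diff, dim=1)              # per-channel weights over neighbors
    return x + (w * neighbors).sum(dim=1)       # saliency-weighted node update


def spatial_aware_aggregate(x: torch.Tensor, n_clusters: int,
                            iters: int = 10) -> torch.Tensor:
    """Naive k-means: cluster members are averaged into centroid nodes."""
    centroids = x[torch.randperm(x.size(0))[:n_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)     # (N,) cluster labels
        for c in range(n_clusters):
            mask = assign == c
            if mask.any():
                centroids[c] = x[mask].mean(dim=0)           # aggregate neighborhood
    return centroids                                         # (n_clusters, C)


# Toy usage: 64 graph nodes with 32-dim features.
x = torch.randn(64, 32)
idx = knn_graph(x, k=8)
x = channel_aware_aggregate(x, idx)
centroids = spatial_aware_aggregate(x, n_clusters=4)
print(x.shape, centroids.shape)  # torch.Size([64, 32]) torch.Size([4, 32])
```

In the actual FViG these operations would presumably be learned jointly inside a vision GNN backbone; the sketch only isolates the saliency-weighted aggregation pattern the abstract describes.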
Source journal
CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.