Authors: Kunshan Yang; Lin Zuo; Mengmeng Jing; Xianlong Tian; Kunbin He; Yongqi Ding
DOI: 10.1109/TCSVT.2025.3534204
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 7, pp. 6424-6436
Published: 2025-01-27
URL: https://ieeexplore.ieee.org/document/10854580/
Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition
Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains largely unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from the lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize the self-saliency and thereby improve the discrimination of flexible-object representations. Specifically, on one hand, we propose to maximize the channel-aware saliency by extracting the weights of neighboring graph nodes, which are employed to identify flexible objects with minimal inter-class differences. On the other hand, we maximize the spatial-aware saliency by using clustering to aggregate neighborhood information into the centroid graph nodes. This introduces local context information and enables the extraction of consistent representations, effectively adapting to the shape and size variations of flexible objects. To thoroughly evaluate flexible object recognition, we propose, for the first time, the Flexible Dataset (FDA), which consists of various images of flexible objects collected from real-world scenarios or the Internet. Extensive experiments on our FDA, FireNet, CIFAR-100, and ImageNet-Hard datasets demonstrate the effectiveness of our method in enhancing the discrimination of flexible objects.
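The two mechanisms the abstract describes — weighting channels against a node's graph neighborhood, and aggregating neighborhood features into a centroid node — can be illustrated with a toy sketch. This is not the authors' FViG implementation; the graph, shapes, and weighting rule below are illustrative assumptions only.

```python
import numpy as np

# Toy graph: 6 nodes on a ring, each with a 4-channel feature vector.
# (Illustrative stand-in; FViG builds its graph from image patches.)
rng = np.random.default_rng(0)
num_nodes, num_channels = 6, 4
features = rng.normal(size=(num_nodes, num_channels))
neighbors = {i: [(i - 1) % num_nodes, (i + 1) % num_nodes]
             for i in range(num_nodes)}

def channel_aware_weights(node: int) -> np.ndarray:
    """Per-channel weights from a node's deviation relative to its
    neighbors (a hypothetical stand-in for 'extracting the weights of
    neighboring graph nodes')."""
    neigh = features[neighbors[node]]                  # (2, 4)
    diff = np.abs(features[node] - neigh.mean(axis=0)) # (4,)
    return diff / (diff.sum() + 1e-8)                  # normalized weights

def spatial_aggregate(node: int) -> np.ndarray:
    """Mix a centroid node's feature with its neighborhood mean
    (a hypothetical stand-in for the clustering-based aggregation)."""
    neigh = features[neighbors[node]]
    return 0.5 * features[node] + 0.5 * neigh.mean(axis=0)

w = channel_aware_weights(0)
agg = spatial_aggregate(0)
print(w.shape, agg.shape)  # both per-node vectors of length num_channels
```

The sketch only shows the data flow: per-channel weights are normalized over the neighborhood, and the centroid update blends local context into the node, which is the intuition behind adapting to shape and size variation.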
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.