Dongdong Zhang, Chunping Wang, Huiying Wang, Qiang Fu, Zhaorui Li
Journal: Computer Vision and Image Understanding, vol. 259, Article 104431 (published 2025-06-21)
DOI: 10.1016/j.cviu.2025.104431
Article page: https://www.sciencedirect.com/science/article/pii/S1077314225001547
Journal metrics: impact factor 4.3; JCR Q2 (Computer Science, Artificial Intelligence)
An effective CNN and Transformer fusion network for camouflaged object detection
Camouflaged object detection (COD) aims to identify concealed objects in images. Global context and local spatial details are both crucial for this task. Convolutional neural networks (CNNs) excel at capturing fine-grained local features, while Transformers are adept at modeling global contextual information. To leverage their respective strengths, we propose a novel CNN-Transformer fusion network (CTF-Net) for COD to achieve more accurate detection. Our approach employs parallel CNN and Transformer branches as an encoder to extract complementary features. We then propose a cross-domain fusion module (CDFM) to fuse these features with cross-modulation. Additionally, we develop a boundary-aware module (BAM) that combines low-level edge details with high-level global context to extract the edge features of camouflaged objects. Furthermore, we design a feature enhancement module (FEM) to mitigate background and noise interference during cross-layer feature fusion, thereby highlighting camouflaged object regions for precise predictions. Extensive experiments show that CTF-Net outperforms 16 existing state-of-the-art methods on four widely used COD datasets. In particular, compared with all of these models, CTF-Net improves performance by approximately 5.1% (F-measure) on the NC4K dataset, showing that CTF-Net can accurately detect camouflaged objects. Our code is publicly available at https://github.com/zcc0616/CTF-Net.
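For readers unfamiliar with the F-measure cited above, the following is a minimal sketch of a thresholded F-beta score for a predicted map against a binary ground-truth mask. It is not taken from the CTF-Net code. The abstract does not say which F-measure variant or threshold the authors report, so the function name `f_measure`, the fixed threshold of 0.5, and the weighting beta² = 0.3 (a common convention in salient and camouflaged object detection) are all assumptions made for illustration.

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """Thresholded F-beta score (illustrative; not the authors' evaluation code).

    pred: 2D list of floats in [0, 1] (predicted camouflage map).
    gt:   2D list of 0/1 values (ground-truth mask).
    threshold and beta_sq are assumed defaults, not values from the paper.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold
            if predicted and g:
                tp += 1          # object pixel correctly detected
            elif predicted and not g:
                fp += 1          # background pixel flagged as object
            elif not predicted and g:
                fn += 1          # object pixel missed
    if tp == 0:
        return 0.0               # avoids division by zero when nothing is detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x3 example: two true positives, one false positive, one false negative.
pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.1]]
gt = [[1, 1, 0],
      [0, 1, 0]]
score = f_measure(pred, gt)
```

With beta² below 1, precision is weighted more heavily than recall, so false positives in the background cost more than missed object pixels of the same count.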
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems