A three-stage model for camouflaged object detection

IF 5.5, Zone 2 (Computer Science), JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing, Pub Date: 2024-10-28, DOI: 10.1016/j.neucom.2024.128784

Tianyou Chen, Hui Ruan, Shaojie Wang, Jin Xiao, Xiaoguang Hu

Neurocomputing, volume 614, Article 128784. Full text: https://www.sciencedirect.com/science/article/pii/S0925231224015558

Citations: 0

Abstract

Camouflaged objects are typically assimilated into their backgrounds and exhibit fuzzy boundaries. The complex environmental conditions and the high intrinsic similarity between camouflaged targets and their surroundings pose significant challenges in accurately locating and segmenting these objects in their entirety. While existing methods have demonstrated remarkable performance in various real-world scenarios, they still face limitations when confronted with difficult cases, such as small targets, thin structures, and indistinct boundaries. Drawing inspiration from human visual perception when observing images containing camouflaged objects, we propose a three-stage model that enables coarse-to-fine segmentation in a single iteration. Specifically, our model employs three decoders to sequentially process subsampled features, cropped features, and high-resolution original features. This proposed approach not only reduces computational overhead but also mitigates interference caused by background noise. Furthermore, considering the significance of multi-scale information, we have designed a multi-scale feature enhancement module that enlarges the receptive field while preserving detailed structural cues. Additionally, a boundary enhancement module has been developed to enhance performance by leveraging boundary information. Subsequently, a mask-guided fusion module is proposed to generate fine-grained results by integrating coarse prediction maps with high-resolution feature maps. Our network shows superior performance without introducing unnecessary complexities. Upon acceptance of the paper, the source code will be made publicly available at https://github.com/clelouch/TSNet.
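The abstract does not give implementation details, so the following is only a minimal sketch of the general coarse-to-fine idea it describes: locate the target on a subsampled input, then crop the corresponding region at full resolution for finer processing. It is not the authors' TSNet code. The function names (`subsample`, `bounding_box`, `coarse_to_fine_crop`), the strided subsampling, the thresholded stand-in for a coarse prediction, and the `factor` and `margin` parameters are all assumptions made for illustration.

```python
import numpy as np


def subsample(image, factor):
    """Naive strided subsampling (hypothetical stand-in for a low-resolution first stage)."""
    return image[::factor, ::factor]


def bounding_box(mask, margin=0):
    """Return (top, bottom, left, right) around the True region of a 2-D mask.

    The box is expanded by `margin` pixels and clipped to the mask size.
    If the mask is empty, the whole frame is returned as a fallback.
    """
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return 0, mask.shape[0], 0, mask.shape[1]
    top = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + 1 + margin, mask.shape[0])
    left = max(int(cols[0]) - margin, 0)
    right = min(int(cols[-1]) + 1 + margin, mask.shape[1])
    return top, bottom, left, right


def coarse_to_fine_crop(image, coarse_mask, factor, margin=1):
    """Map a box found on the coarse mask to full-resolution coordinates and crop.

    `coarse_mask` is assumed to come from the image subsampled by `factor`.
    Returns the cropped region and its (top, bottom, left, right) box.
    """
    top, bottom, left, right = bounding_box(coarse_mask, margin)
    height, width = image.shape[:2]
    box = (
        top * factor,
        min(bottom * factor, height),
        left * factor,
        min(right * factor, width),
    )
    return image[box[0]:box[1], box[2]:box[3]], box
```

In this sketch a later stage would only need to process the cropped region, which is one plausible reading of how cropping could reduce computation and limit background interference. In the paper the coarse mask would come from a learned decoder rather than a threshold.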

Source journal

Neurocomputing (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics covered.