Structure-Aware Transformer for Shadow Detection

IF 2.0 | Zone 4 (Computer Science) | JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IET Image Processing Pub Date : 2025-03-04 DOI:10.1049/ipr2.70031

Wanlu Sun, Liyun Xiang, Wei Zhao

Citations: 0

Abstract

Shadow detection helps reduce ambiguity in object detection and tracking. However, existing shadow detection methods treat all cases equally and therefore tend to misidentify complex shadows and similar-looking patterns, such as soft shadow regions and shadow-like regions, which leaves the detected shadow regions structurally incomplete. To alleviate this issue, we propose a structure-aware transformer network (STNet) for robust shadow detection. Specifically, we first develop a transformer-based shadow detection network to learn significant contextual interactions; to this end, a context-aware enhancement (CaE) block is introduced into the backbone to expand the receptive field and thus enhance semantic interaction. Then, we design an edge-guided multi-task learning framework that produces intermediate and main predictions with rich structure. Fusing these two complementary predictions yields an edge-preserving refined shadow map. Finally, we introduce auxiliary semantic-aware learning to overcome interference from complex scenes; it uses a semantic affinity loss to help the model distinguish shadow from non-shadow regions. Together, these components allow STNet to predict high-quality shadow maps across different scenarios. Experimental results demonstrate that our method reduces the balance error rate (BER) by 4.53%, 2.54%, and 3.49% compared to state-of-the-art (SOTA) methods on the benchmark datasets SBU, ISTD, and UCF, respectively.
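
The balance error rate (BER) quoted above is usually defined in the shadow detection literature as BER = (1 - 0.5 * (TP / (TP + FN) + TN / (TN + FP))) * 100, where shadow pixels are the positive class. The abstract does not spell out the evaluation protocol, so the following is only a minimal per-mask sketch under that common definition. The function name `balance_error_rate` and the toy masks are illustrative assumptions, not taken from the paper, and benchmark results are often computed from counts accumulated over a whole dataset rather than per image.

```python
from typing import Sequence


def balance_error_rate(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> float:
    """Return BER in percent for binary masks (1 = shadow, 0 = non-shadow).

    Hypothetical helper: uses the common definition of BER, which may
    differ in detail from the protocol used in the paper.
    """
    tp = tn = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if t == 1 and p == 1:
                tp += 1  # shadow pixel correctly detected
            elif t == 1:
                fn += 1  # shadow pixel missed
            elif p == 1:
                fp += 1  # non-shadow pixel labelled as shadow
            else:
                tn += 1  # non-shadow pixel correctly rejected
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("ground truth must contain both classes")
    shadow_recall = tp / (tp + fn)
    non_shadow_recall = tn / (tn + fp)
    # Averaging the two recalls keeps a small shadow region from being
    # swamped by a large non-shadow background.
    return (1.0 - 0.5 * (shadow_recall + non_shadow_recall)) * 100.0


# Toy 2x4 masks (illustrative values only).
truth_mask = [[1, 1, 0, 0],
              [1, 1, 0, 0]]
pred_mask = [[1, 0, 0, 0],
             [1, 1, 1, 0]]
print(balance_error_rate(pred_mask, truth_mask))  # 25.0
```

In the toy example, three of four shadow pixels and three of four non-shadow pixels are classified correctly, so both per-class recalls are 0.75 and the BER is 25%. Lower values are better, and a perfect prediction gives 0.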

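Source journal

IET Image Processing (Engineering and Technology: Engineering, Electrical & Electronic)

CiteScore: 5.40

Self-citation rate: 8.70%

Articles published: 282

Review time: 6 months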

Journal description: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.

Principal topics include:

Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.

Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.

Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.

Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.

Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.

Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.

Current Special Issue Call for Papers:

Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf

AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf

Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf

Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf