PreCM: The Padding-Based Rotation Equivariant Convolution Mode for Semantic Segmentation

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2025-04-18. DOI: 10.1109/TIP.2025.3558425

Xinyu Xu;Huazhen Liu;Tao Zhang;Huilin Xiong;Wenxian Yu

Volume 34, pages 2781-2795. Full text: https://ieeexplore.ieee.org/document/10970426/


Abstract

Semantic segmentation is an important branch of image processing and computer vision. With the popularity of deep learning, various convolutional neural networks (CNNs) have been proposed for pixel-level classification and segmentation tasks. In practical scenarios, however, imaging angles are often arbitrary, as with water body images from remote sensing and capillary and polyp images in the medical domain, and prior orientation information is typically unavailable to guide these networks toward more effective features. Learning features from objects with diverse orientations is therefore a significant challenge, because most CNN-based semantic segmentation networks lack the rotation equivariance needed to resist disturbance from orientation information. To address this challenge, this paper first constructs a universal convolution-group framework that aims to use orientation information more fully and to equip the network with rotation equivariance. We then mathematically design a padding-based rotation equivariant convolution mode (PreCM), which is applicable to multi-scale images and convolutional kernels and can also serve as a replacement component for various types of convolutions, such as dilated convolutions, transposed convolutions, and asymmetric convolutions. To quantitatively assess the impact of image rotation on semantic segmentation, we also propose a new evaluation metric, Rotation Difference (RD). Replacement experiments with six existing semantic segmentation networks on three datasets (Satellite Images of Water Bodies, DRIVE, and Floodnet) show that, under random-angle rotation, the average Intersection over Union (IoU) of the PreCM-based versions improves by 6.91%, 10.63%, 4.53%, 5.93%, 7.48%, and 8.33%, respectively, compared with the original versions. The average RD values decrease by 3.58%, 4.56%, 3.47%, 3.66%, 3.47%, and 3.43%, respectively.
The code can be downloaded from https://github.com/XinyuXu414
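
To make the notion of rotation equivariance concrete, the following is a minimal NumPy sketch that checks whether convolving a rotated image gives the same result as rotating the convolved image. It is not the PreCM construction from the paper. It assumes 90-degree rotations only, stride 1, an odd square kernel, and symmetric zero padding, and the function names (conv2d_same, c4_symmetrize, equivariance_error) are hypothetical, chosen for illustration.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive stride-1 cross-correlation with symmetric zero padding.

    Assumes an odd, square kernel so the output has the input's shape.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def c4_symmetrize(kernel):
    """Average a kernel over its four 90-degree rotations.

    The result is unchanged by np.rot90, which is one simple way
    (an illustrative assumption, not the paper's method) to obtain
    a filter that commutes with 90-degree rotations.
    """
    return sum(np.rot90(kernel, r) for r in range(4)) / 4.0

def equivariance_error(image, kernel, r=1):
    """Max absolute gap between conv(rot(x)) and rot(conv(x))."""
    rotate_then_conv = conv2d_same(np.rot90(image, r), kernel)
    conv_then_rotate = np.rot90(conv2d_same(image, kernel), r)
    return float(np.max(np.abs(rotate_then_conv - conv_then_rotate)))

# Illustrative inputs: a 5x5 ramp image and an arbitrary 3x3 kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 2.0, 0.0],
                   [0.0, 3.0, 1.0],
                   [5.0, 0.0, 0.0]])

print("generic kernel error:    ", equivariance_error(image, kernel))
print("symmetrized kernel error:", equivariance_error(image, c4_symmetrize(kernel)))
```

With a generic kernel the two orders of operation disagree, while the rotation-averaged kernel makes them agree for 90-degree rotations. This toy check fixes stride 1 and an odd kernel and does not cover strided, dilated, or transposed convolutions.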
