Background-Tolerant Object Classification With Embedded Segmentation Mask For Infrared And Color Imagery

Maliha Arif, Calvin Yong, Abhijit Mahalanobis, N. Rahnavard
{"title":"Background-Tolerant Object Classification With Embedded Segmentation Mask For Infrared And Color Imagery","authors":"Maliha Arif, Calvin Yong, Abhijit Mahalanobis, N. Rahnavard","doi":"10.1109/ICIP46576.2022.9897418","DOIUrl":null,"url":null,"abstract":"Even though convolutional neural networks (CNNs) can classify objects in images very accurately, it is well known that the attention of the network may not always be on the semantically important regions of the scene. It has been observed that networks often learn background textures, which are not relevant to the object of interest. In turn this makes the networks susceptible to variations and changes in the background which may negatively affect their performance.We propose a new three-step training procedure called split training to reduce this bias in CNNs for object recognition using Infrared imagery and Color (RGB) data. Our split training procedure has three steps. First, a baseline model is trained to recognize objects in images without background, and the activations produced by the higher layers are observed. Next, a second network is trained using Mean Square Error (MSE) loss to produce the same activations, but in response to the objects embedded in background. This forces the second network to ignore the background while focusing on the object of interest. Finally, with layers producing the activations frozen, the rest of the second network is trained using cross-entropy loss to classify the objects in images with background. Our training method outperforms the traditional training procedure in both a simple CNN architecture, as well as for deep CNNs like VGG and DenseNet, and learns to mimic human vision which focuses more on shape and structure than background with higher accuracy.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":" 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP46576.2022.9897418","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Even though convolutional neural networks (CNNs) can classify objects in images very accurately, it is well known that the attention of the network may not always be on the semantically important regions of the scene. It has been observed that networks often learn background textures, which are not relevant to the object of interest. In turn, this makes the networks susceptible to variations in the background, which may negatively affect their performance. We propose a new three-step training procedure called split training to reduce this bias in CNNs for object recognition using infrared imagery and color (RGB) data. Split training proceeds in three steps. First, a baseline model is trained to recognize objects in images without background, and the activations produced by its higher layers are observed. Next, a second network is trained using a Mean Square Error (MSE) loss to produce the same activations, but in response to the objects embedded in background. This forces the second network to ignore the background while focusing on the object of interest. Finally, with the layers producing the activations frozen, the rest of the second network is trained using a cross-entropy loss to classify the objects in images with background. Our training method achieves higher accuracy than the traditional training procedure, both on a simple CNN architecture and on deep CNNs such as VGG and DenseNet, and learns to mimic human vision, which attends more to shape and structure than to background.
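
The abstract describes the three training stages only at a high level. The sketch below, in PyTorch, illustrates one plausible realization: the toy CNN, the paired data loaders (each image available both with background and background-free), and all hyperparameters are illustrative assumptions, not the paper's actual architecture or settings.

```python
# A minimal sketch of the three-step "split training" procedure, assuming
# PyTorch and a dataset that provides each image both with background and
# with the background masked out. Architecture and hyperparameters are
# placeholders, not the paper's configuration.
import torch
import torch.nn as nn
import torch.optim as optim

def make_cnn(num_classes: int = 10):
    """A toy CNN split into feature layers (whose higher-layer activations
    are matched in step 2) and a classifier head (trained in step 3)."""
    features = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )
    classifier = nn.Linear(64, num_classes)
    return features, classifier

# Step 1: train a baseline model on background-free (masked) images.
def train_baseline(features, classifier, clean_loader, epochs=10):
    model = nn.Sequential(features, classifier)
    opt, ce = optim.Adam(model.parameters(), lr=1e-3), nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_clean, y in clean_loader:           # object chips, no background
            opt.zero_grad()
            ce(model(x_clean), y).backward()
            opt.step()

# Step 2: train a second network's feature layers with MSE loss so that,
# fed the image WITH background, they reproduce the baseline's activations
# for the same image WITHOUT background.
def train_activation_match(base_features, new_features, paired_loader, epochs=10):
    base_features.eval()
    opt, mse = optim.Adam(new_features.parameters(), lr=1e-3), nn.MSELoss()
    for _ in range(epochs):
        for x_bg, x_clean, _ in paired_loader:    # (with-bg, masked, label)
            with torch.no_grad():
                target = base_features(x_clean)   # baseline's "clean" activations
            opt.zero_grad()
            mse(new_features(x_bg), target).backward()
            opt.step()

# Step 3: freeze the matched feature layers and train the remaining layers
# (here just the classifier head) with cross-entropy on images with background.
def train_classifier(new_features, classifier, paired_loader, epochs=10):
    for p in new_features.parameters():
        p.requires_grad = False
    new_features.eval()
    opt, ce = optim.Adam(classifier.parameters(), lr=1e-3), nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_bg, _, y in paired_loader:
            opt.zero_grad()
            ce(classifier(new_features(x_bg)), y).backward()
            opt.step()
```

The design point the sketch makes concrete: step 2 never uses class labels, so the MSE target alone pushes the second network toward background-invariant activations, and step 3 only has to fit a classifier on top of features that, by construction, ignore the background.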