Maliha Arif, Calvin Yong, Abhijit Mahalanobis, N. Rahnavard
2022 IEEE International Conference on Image Processing (ICIP), October 16, 2022. DOI: 10.1109/ICIP46576.2022.9897418
Background-Tolerant Object Classification With Embedded Segmentation Mask For Infrared And Color Imagery
Even though convolutional neural networks (CNNs) can classify objects in images very accurately, it is well known that the attention of the network may not always be on the semantically important regions of the scene. Networks often learn background textures that are not relevant to the object of interest, which in turn makes them susceptible to variations and changes in the background that may degrade their performance. We propose a new three-step training procedure, called split training, to reduce this bias in CNNs for object recognition using infrared imagery and color (RGB) data. First, a baseline model is trained to recognize objects in images without background, and the activations produced by its higher layers are recorded. Next, a second network is trained with a mean squared error (MSE) loss to produce the same activations, but in response to the objects embedded in background; this forces the second network to ignore the background while focusing on the object of interest. Finally, with the layers producing these activations frozen, the rest of the second network is trained with a cross-entropy loss to classify the objects in images with background. Our split training outperforms the traditional training procedure both for a simple CNN architecture and for deep CNNs such as VGG and DenseNet, and achieves higher accuracy by mimicking human vision, which attends more to shape and structure than to background.
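The three-step split-training procedure described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it is a minimal NumPy analogue under assumed simplifications, where the "object" is a fixed signal in a few input dimensions, the "background" is random clutter in the remaining dimensions, and each "network" is a single linear feature layer followed by a softmax classifier. The three training steps (baseline on clean data, MSE activation matching on cluttered data, frozen features plus cross-entropy head) mirror the procedure's structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the "object" is a one-hot signal in dims 0-3
# (class 0 -> dim 0, class 1 -> dim 2); the "background" is Gaussian
# clutter in dims 4-7. make_pair returns the same images with and
# without background, as split training needs both views.
def make_pair(n):
    y = rng.integers(0, 2, n)
    X_clean = np.zeros((n, 8))
    X_clean[np.arange(n), y * 2] = 1.0          # object signal only
    X_bg = X_clean.copy()
    X_bg[:, 4:] = rng.normal(0.0, 1.0, (n, 4))  # add background clutter
    return X_clean, X_bg, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Step 1: train a baseline (feature layer W1 + classifier head W2) on
# background-free images with cross-entropy loss.
def train_baseline(X, y, dim_h=4, lr=0.5, steps=400):
    W1 = rng.normal(0, 0.1, (X.shape[1], dim_h))
    W2 = rng.normal(0, 0.1, (dim_h, 2))
    for _ in range(steps):
        H = X @ W1
        G = softmax(H @ W2)                 # predicted probabilities
        G[np.arange(len(y)), y] -= 1.0      # gradient of cross-entropy
        G /= len(y)
        W2 -= lr * (H.T @ G)
        W1 -= lr * (X.T @ (G @ W2.T))
    return W1, W2

# Step 2: train a second feature layer with MSE loss so that, given
# images WITH background, it reproduces the baseline's activations on
# the same images WITHOUT background.
def match_activations(X_bg, H_target, lr=0.1, steps=600):
    W1b = rng.normal(0, 0.1, (X_bg.shape[1], H_target.shape[1]))
    for _ in range(steps):
        diff = X_bg @ W1b - H_target
        W1b -= lr * (X_bg.T @ diff) / len(X_bg)
    return W1b

# Step 3: freeze the matched feature layer; train only the classifier
# head with cross-entropy on images with background.
def train_head(H, y, lr=0.5, steps=400):
    W2b = rng.normal(0, 0.1, (H.shape[1], 2))
    for _ in range(steps):
        G = softmax(H @ W2b)
        G[np.arange(len(y)), y] -= 1.0
        G /= len(y)
        W2b -= lr * (H.T @ G)
    return W2b

X_clean, X_bg, y = make_pair(500)
W1, W2 = train_baseline(X_clean, y)           # step 1: clean baseline
W1b = match_activations(X_bg, X_clean @ W1)   # step 2: MSE matching
W2b = train_head(X_bg @ W1b, y)               # step 3: frozen features

# Evaluate on fresh background-embedded images: the matched feature
# layer has learned to suppress the clutter dimensions.
_, X_test, y_test = make_pair(500)
acc = (np.argmax(X_test @ W1b @ W2b, axis=1) == y_test).mean()
print(f"split-training accuracy on cluttered test set: {acc:.2f}")
```

In step 2 the MSE objective drives the weights on the clutter dimensions toward zero, because the clutter is uncorrelated with the clean-image activations; the frozen feature layer then hands the step-3 classifier features that look as if the background were absent, which is the intuition behind the paper's procedure.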