Score-based Aggregation for Attention Modules in Image Classification Tasks

2019 IEEE 4th International Conference on Technology, Informatics, Management, Engineering & Environment (TIME-E) Pub Date : 2019-11-13 DOI:10.1109/TIME-E47986.2019.9353302

Changwoo Lee, Ki-Seok Chung



Abstract

Deep Convolutional Neural Networks (CNNs) have been widely used for various computer vision tasks because they hierarchically extract rich features from a high-dimensional image. Some CNNs also incorporate channel attention mechanisms that re-scale each channel of the intermediate feature maps according to its importance. A channel attention module squeezes the spatial information of each feature map into a single representative value and then transforms that value into a re-scaling factor. To reduce the amount of information, attention modules have relied on hand-designed pooling functions such as max pooling or average pooling, which are widely adopted in CNNs because they add negligible computational complexity. However, these pooling functions discard a significant amount of spatial information. In this paper, we propose a generalized pooling function that scales down spatial information according to the importance of each pixel. Unlike max pooling or average pooling, our score-based aggregation adapts flexibly to its input. The score-based aggregation function also learns how to squeeze the spatial information into the most appropriate representative value, which turns the pooling step into a spatial attention mechanism. Finally, we propose a novel method called the Score-based Aggregated Attention Module (SAAM), which uses the proposed score-based aggregation. Our experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that SAAM achieves the largest improvement in classification accuracy among existing channel attention modules, because the score-based aggregation in SAAM is more dynamic and effective than hand-designed aggregations.
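The idea of squeezing each channel with per-pixel importance weights can be illustrated with a minimal sketch. The abstract does not specify how the scores are computed or normalized, so everything here is an assumption made for illustration: the softmax normalization, the sigmoid gate, and the names `score_based_aggregate`, `rescale_channels`, and `score_fn` are all hypothetical and are not taken from the paper. The sketch only shows why a score-weighted sum can be read as a generalization of average and max pooling.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_based_aggregate(pixels, scores):
    # Squeeze one channel (a flat list of pixel values) into one value.
    # Assumption: weights are a softmax over per-pixel scores.
    weights = softmax(scores)
    return sum(w * p for w, p in zip(weights, pixels))

def rescale_channels(feature_map, score_fn):
    # feature_map: list of channels, each a flat list of pixel values.
    # score_fn: hypothetical stand-in for a learned scoring function.
    rescaled = []
    for pixels in feature_map:
        squeezed = score_based_aggregate(pixels, score_fn(pixels))
        # Assumption: a plain sigmoid turns the squeezed value into a gate.
        gate = 1.0 / (1.0 + math.exp(-squeezed))
        rescaled.append([gate * p for p in pixels])
    return rescaled

channel = [0.1, 0.5, 2.0, -1.0]

# Equal scores give uniform weights, which is average pooling.
avg_like = score_based_aggregate(channel, [0.0] * len(channel))

# Sharply peaked scores put almost all weight on the largest pixel,
# which approaches max pooling.
max_like = score_based_aggregate(channel, [50.0 * p for p in channel])

# Re-scale a one-channel feature map using constant (uniform) scores.
gated = rescale_channels([[1.0, 1.0]], lambda pixels: [0.0] * len(pixels))
```

In this sketch the two hand-designed poolings appear as special cases of the score distribution: flat scores recover the mean and very peaked scores recover the maximum. A learned scoring function would sit somewhere in between and could vary with the input.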

