Score-based Aggregation for Attention Modules in Image Classification Tasks
Changwoo Lee, Ki-Seok Chung
2019 IEEE 4th International Conference on Technology, Informatics, Management, Engineering & Environment (TIME-E), 2019-11-13
DOI: 10.1109/TIME-E47986.2019.9353302
Deep Convolutional Neural Networks (CNNs) have been widely used for various computer vision tasks because they hierarchically extract rich features from high-dimensional images. Some CNNs also incorporate channel attention mechanisms that re-scale each channel of the intermediate feature maps according to its importance. A channel attention module squeezes the spatial information of each feature channel into a single representative value and then transforms that value into a re-scaling factor. To perform this reduction, attention modules have relied on hand-designed pooling functions such as max pooling or average pooling, which are widely adopted in CNNs because they add negligible computational cost. However, these pooling functions discard a significant amount of spatial information. In this paper, we propose a generalized pooling function that condenses spatial information according to the importance of each pixel. Unlike max pooling or average pooling, our score-based aggregation adapts flexibly to its input. Moreover, the score-based aggregation function learns how to squeeze the spatial information into the most appropriate representative value, which effectively turns the pooling step into a spatial attention mechanism. Finally, we propose a novel method, the Score-based Aggregated Attention Module (SAAM), that uses the proposed score-based aggregation. Our experimental results on the CIFAR-10 and CIFAR-100 datasets show that SAAM achieves the highest classification accuracy improvement among existing channel attention modules, because the score-based aggregation in SAAM is more dynamic and effective than hand-designed aggregations.
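The abstract does not give SAAM's exact equations, so the following is only a minimal sketch of one plausible reading, written in Python with NumPy. It assumes that per-pixel scores come from a hypothetical 1x1 channel-mixing projection (`score_weight`), that the scores are normalized with a softmax over spatial positions, and that the resulting channel descriptor feeds a Squeeze-and-Excitation-style bottleneck (`w1`, `w2`) with a sigmoid gate. All function and parameter names are illustrative, not the authors'. Under this softmax assumption, uniform scores recover average pooling, and scores proportional to the activations with a large multiplier approach max pooling, which is one way a score-based aggregation can generalize both hand-designed poolings.

```python
import numpy as np


def spatial_softmax(scores: np.ndarray) -> np.ndarray:
    """Softmax over the H*W positions of each channel of a (C, H, W) score map."""
    c = scores.shape[0]
    flat = scores.reshape(c, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.reshape(scores.shape)


def score_based_aggregation(x: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Squeeze each channel of x (C, H, W) into one value: a score-weighted average of its pixels."""
    weights = spatial_softmax(scores)
    return (weights * x).sum(axis=(1, 2))  # shape (C,)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def saam_like_block(x, score_weight, w1, w2):
    """Hypothetical SAAM-style block (an assumption, not the paper's exact design).

    x:            (C, H, W) feature map
    score_weight: (C, C) hypothetical 1x1 projection producing per-pixel scores
    w1, w2:       (C/r, C) and (C, C/r) bottleneck weights, as in SE-style excitation
    """
    # Per-pixel, per-channel scores via a 1x1 channel-mixing projection (assumed scoring function).
    scores = np.einsum("oc,chw->ohw", score_weight, x)
    # Score-based squeeze: one descriptor value per channel.
    z = score_based_aggregation(x, scores)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1).
    gate = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Re-scale every channel of the input by its gate value.
    return x * gate[:, None, None]


if __name__ == "__main__":
    x = np.arange(12, dtype=float).reshape(3, 2, 2)
    # Uniform (all-zero) scores reduce to average pooling.
    print(np.allclose(score_based_aggregation(x, np.zeros_like(x)), x.mean(axis=(1, 2))))  # True
    # Scores proportional to the activations, with a large factor, approach max pooling:
    # the result is approximately the per-channel maxima 3, 7, 11.
    print(score_based_aggregation(x, 50.0 * x))
```

The softmax keeps the aggregated value a convex combination of a channel's pixel values, which is why it interpolates between average and max pooling; the paper's actual scoring and normalization may differ.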