Image Multi-Label Classification Based on Pyramid Convolution and Split-Attention Mechanism
Yang Xianhua, Yang Yi, Yang Juan, Yao Han, Wang Zheng, Long Shuquan
2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), published 2021-12-17
DOI: 10.1109/ICCWAMTIP53232.2021.9674123
Abstract
Multi-label image classification is a critical task in computer vision. Its primary difficulty is that the classifier must rely on complex information in the image to distinguish among labels, which significantly increases the difficulty of classification. We propose a method for modifying an existing model. Using TResNet as the baseline, we replace its ordinary convolutions with pyramid convolution and its attention mechanism with split-attention. The models are trained on the VOC2007 and MS-COCO data sets. Comparative experiments show how the model's parameters were selected and how the best modification scheme was determined. Finally, a comparison of the modified models with the unmodified baseline shows that both modifications effectively improve performance: on the VOC data set, the two modifications improve performance by 1% and 1.6%, respectively.
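As a rough illustration of what replacing an ordinary convolution with a pyramid convolution involves, the sketch below counts weights for both. It assumes the usual description of pyramid convolution: several parallel convolutions ("levels") with increasing kernel sizes, where larger kernels use more groups, and the level outputs are concatenated along the channel axis. The kernel sizes, group counts, channel split, and function names are illustrative assumptions, not values or code from the paper.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2-D convolution (no bias) with a square kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return kernel * kernel * (c_in // groups) * c_out


def pyramid_conv_params(c_in, levels):
    """Total weight count of a pyramid convolution.

    `levels` is a list of (out_channels, kernel, groups) tuples, one per
    pyramid level. Every level reads the full input; the level outputs
    are concatenated along the channel axis.
    """
    return sum(conv_params(c_in, c_out, k, g) for c_out, k, g in levels)


# Hypothetical configuration (an assumption, not taken from the paper):
# 64 input channels, four levels of 16 output channels each.
C_IN = 64
LEVELS = [(16, 3, 1), (16, 5, 4), (16, 7, 8), (16, 9, 16)]

standard = conv_params(C_IN, 64, 3)           # ordinary 3x3 convolution
pyramid = pyramid_conv_params(C_IN, LEVELS)   # four parallel levels
out_channels = sum(c_out for c_out, _, _ in LEVELS)

print(standard, pyramid, out_channels)
```

Under these assumed settings the pyramid layer produces the same 64 output channels as the ordinary 3x3 convolution while covering kernel sizes up to 9x9. The grouped convolutions are what keep the weight count from growing with the larger kernels.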