Research on Data Augmentation Strategy Methods for Image Caption
Authors: Nan Lin, Shuang Li, Yingkang Han, Mengdi Liu
Published in: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, 2022-10-21
DOI: 10.1145/3573428.3573673 (https://doi.org/10.1145/3573428.3573673)
Abstract
Data augmentation can expand the number of samples in a dataset and increase their diversity. Image captioning generates a descriptive sentence for an image, and the quality of the image directly affects the accuracy of that description. In this paper, we analyze data augmentation together with the VizWiz dataset and find that data augmentation can effectively simulate the image quality problems present in VizWiz. To improve the accuracy of image captioning models on the VizWiz dataset, we present a method based on a data augmentation strategy that mainly uses four augmentation operators to simulate camera shake, out-of-focus blur, flash, and low-light conditions. The strategy space also contains basic translate, shear, and contrast operations. The method achieves BLEU_1 of 62.5, BLEU_4 of 23.1, ROUGE_L of 46.6, and CIDEr of 49.6 on the VizWiz dataset.
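
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of what such operators might look like, not the authors' method. Every function name, kernel choice, and default parameter is an assumption made for illustration. Shear is omitted for brevity, and the uniform random choice of operators in `augment` is a stand-in for whatever strategy the paper actually uses.

```python
import numpy as np


def motion_blur(img, length=9):
    """Rough camera-shake stand-in: average `length` horizontally shifted copies."""
    img = img.astype(np.float32)
    acc = np.zeros_like(img)
    for i in range(length):
        acc += np.roll(img, i - length // 2, axis=1)  # note: wraps at the borders
    return acc / length


def defocus_blur(img, radius=2):
    """Rough out-of-focus stand-in: (2r+1) x (2r+1) box filter."""
    img = img.astype(np.float32)
    acc = np.zeros_like(img)
    offsets = range(-radius, radius + 1)
    for dy in offsets:
        for dx in offsets:
            acc += np.roll(img, (dy, dx), axis=(0, 1))
    return acc / (len(offsets) ** 2)


def flash(img, strength=120.0):
    """Rough flash stand-in: add a bright Gaussian spot centred on the image."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - h / 2) ** 2 + (xs - w / 2) ** 2
    sigma2 = (0.25 * min(h, w)) ** 2
    spot = strength * np.exp(-dist2 / (2 * sigma2))
    if img.ndim == 3:
        spot = spot[..., None]  # broadcast over colour channels
    return np.clip(img.astype(np.float32) + spot, 0, 255)


def low_light(img, gain=0.3):
    """Rough low-light stand-in: scale all intensities down."""
    return img.astype(np.float32) * gain


def translate(img, dx=4, dy=0):
    """Shift the image by (dx, dy) pixels, filling exposed pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    y0, x0 = max(0, dy), max(0, dx)
    out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
    return out


def contrast(img, factor=1.5):
    """Stretch intensities around the global mean by `factor`."""
    img = img.astype(np.float32)
    mean = img.mean()
    return mean + factor * (img - mean)


# Hypothetical strategy space; the paper's actual space also includes shear.
OPERATORS = [motion_blur, defocus_blur, flash, low_light, translate, contrast]


def augment(img, rng, n_ops=2):
    """Apply `n_ops` distinct operators chosen uniformly at random (an assumption)."""
    out = img
    for idx in rng.choice(len(OPERATORS), size=n_ops, replace=False):
        out = OPERATORS[idx](out)
    return np.clip(out, 0, 255).astype(np.uint8)
```

For an `H x W x 3` `uint8` array `img`, a call such as `augment(img, np.random.default_rng(0))` returns an augmented `uint8` image of the same shape that could be paired with the original caption during training.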