Research on Image Caption of Children's Image Based on Attention Mechanism
Authors: Haibing Li, Xiang Li, Wenyon Wang
Published in: 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2021-06-28
DOI: 10.1109/ICAICA52286.2021.9498013
Abstract
Image captioning is a technique in which a computer uses neural networks to identify the content of an image, including object categories, attributes, and relationships among objects, and outputs text that conforms to people's reading habits. This paper builds an image captioning network model with an attention mechanism. The model first uses the convolutional neural network ResNet50 to extract image features and encode the image information, and then weights the image features through the attention mechanism. Finally, a three-layer stacked LSTM network decodes the image features and outputs the description sentences. In addition, Smooth L1 is used as the loss function of the attention mechanism to address the gradient explosion caused by excessively large gradients and to strengthen training. Because the whole process of image captioning resembles having the machine "talk about pictures", this paper applies the technology to early childhood education to help children with "talking about pictures".
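The abstract names two ingredients that a small example can make concrete: weighting image features by attention, and the Smooth L1 loss. The sketch below is not the authors' implementation. It assumes a generic soft-attention step (softmax over per-region scores, then a weighted sum) and the standard Smooth L1 definition with a threshold `beta`. The function names, the `scores` input, and the default `beta=1.0` are illustrative assumptions, since the abstract does not say how scores are computed or how the loss is attached to the attention module.

```python
import math


def soft_attention(features, scores):
    """Weight region feature vectors by softmax-normalized scores.

    features: list of region vectors (lists of floats), standing in for CNN outputs.
    scores:   one unnormalized relevance score per region. This input is
              hypothetical; a real model would produce it with a learned layer.
    Returns (weights, context), where context is the weighted sum of the features.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


def smooth_l1(pred, target, beta=1.0):
    """Standard Smooth L1: quadratic for small errors, linear for large ones."""
    diff = pred - target
    if abs(diff) < beta:
        return 0.5 * diff * diff / beta
    return abs(diff) - 0.5 * beta


def smooth_l1_grad(pred, target, beta=1.0):
    """Derivative of smooth_l1 with respect to pred; its magnitude never exceeds 1."""
    diff = pred - target
    if abs(diff) < beta:
        return diff / beta
    return 1.0 if diff > 0 else -1.0


if __name__ == "__main__":
    # Two toy "regions" with equal scores receive equal attention weights.
    weights, context = soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(weights, context)
    # A large error yields a bounded gradient, unlike squared error.
    print(smooth_l1(100.0, 0.0), smooth_l1_grad(100.0, 0.0))
```

The bounded derivative is the property the abstract appeals to. For an error of 100, the gradient of a squared-error loss `0.5 * diff**2` would be 100, whereas the Smooth L1 gradient stays at 1, so a single outlier cannot produce an arbitrarily large update.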