Multiple Perspective Caption Generation with Attention Mechanism
H. Yanagimoto, Maaki Shozu
2020 9th International Congress on Advanced Applied Informatics (IIAI-AAI), published 2020-09-01
DOI: 10.1109/IIAI-AAI50415.2020.00031 (https://doi.org/10.1109/IIAI-AAI50415.2020.00031)
Citations: 4
Abstract
A caption generation system describes the content of an image in natural language, so it must understand both images and text. Caption generation is therefore an essential task spanning natural language processing and image processing. Many researchers have recently turned to deep learning as a key technique for building caption generation systems because it can construct an intermediate representation shared by image processing and natural language processing. First, the system extracts a feature from a given image with a convolutional neural network. Then, it generates a word sequence, the caption, from that feature. The system thus consists of two modules, an image processing module and a language model module, which are trained simultaneously on a training dataset. A deep learning based caption generation system is a black box, which makes it difficult for a human to collaborate with it. We therefore introduce an attention mechanism into the caption generation system and control caption generation through the attention weights. Conventional caption generation systems can generate only a single caption from one image because they generate the caption directly from the image. The proposed system can generate several different captions from the same image because we can control the initial attention weights. Our results demonstrate how a human can collaborate with deep learning and that this collaboration improves caption generation.
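The idea of steering a caption through the initial attention weights can be illustrated with a toy sketch. It is not the authors' implementation: the abstract gives no architectural details, so the region features, the vocabulary embeddings, and the helper names `initial_attention`, `context_vector`, and `first_word` are all hypothetical. The sketch only shows how different initial attention weights over the same image regions yield different context vectors, and hence a different first word.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def initial_attention(num_regions, focus, sharpness=4.0):
    """Hypothetical helper: attention weights peaked on region `focus`."""
    scores = [sharpness if i == focus else 0.0 for i in range(num_regions)]
    return softmax(scores)


def context_vector(features, weights):
    """Attention-weighted sum of the region feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def first_word(features, weights, vocab):
    """Toy stand-in for a language model: pick the word whose embedding
    has the largest dot product with the context vector."""
    ctx = context_vector(features, weights)
    return max(vocab, key=lambda w: sum(c * e for c, e in zip(ctx, vocab[w])))


# Assumed toy data: three image regions with made-up features
# (stand-ins for CNN outputs) and made-up word embeddings.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vocab = {
    "dog": [1.0, 0.0, 0.0],
    "ball": [0.0, 1.0, 0.0],
    "grass": [0.0, 0.0, 1.0],
}

# Same image, different initial attention, different starting word.
for focus in range(len(features)):
    weights = initial_attention(len(features), focus)
    print(focus, first_word(features, weights, vocab))
```

In a real system the context vector would feed a trained recurrent or transformer decoder, and the attention weights would be updated at every decoding step; only the initial weights are set by hand here.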