Attention Analysis in Caption Generation
Maaki Shozu, H. Yanagimoto
2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), 2019-07-01
DOI: 10.1109/IIAI-AAI.2019.00029 (https://doi.org/10.1109/IIAI-AAI.2019.00029)
Citation count: 1
Abstract
Caption generation is one of the fundamental tasks combining computer vision and natural language processing, and caption generation systems are commonly implemented with neural networks. In this paper, we propose a caption generation system that combines a CNN-based object detection system with a recurrent neural network language model. In particular, the vector passed from the object detection system to the language model is generated using an attention mechanism. Attention visualization can help us understand which part of the input image the system focuses on when generating a caption. In the experiments, we evaluate the performance of the proposed system and discuss the effects of the attention mechanism on image captioning. We find that attention contributes to improved caption generation, but that the attention is uncorrelated with system interpretation.
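The abstract does not say how the attention weights are computed, so the following is only a minimal sketch of generic soft attention, not the authors' implementation. It assumes dot-product scoring between each image-region feature and the language model's hidden state; the names `attend`, `region_features`, and `hidden_state` are hypothetical and chosen for illustration. The weights it returns are the quantities one would plot over the image for attention visualization.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Return (context_vector, attention_weights).

    region_features: list of feature vectors, one per image region
                     (assumed to come from the CNN-based detector).
    hidden_state:    current hidden state of the recurrent language model,
                     assumed here to have the same dimension as a feature.
    """
    # Assumption: dot-product relevance score for each region.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    weights = softmax(scores)

    # The context vector is the attention-weighted sum of region features.
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return context, weights


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
context, weights = attend(regions, hidden)
```

In this toy example the first and third regions align with the hidden state and therefore receive equal, larger weights than the second region. A learned scoring function (for example, a small feed-forward network) could replace the dot product without changing the rest of the sketch.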