Image Captioning: Methods and Evaluation Metrics
Himanshu Sharma, Swati Srivastava
2022 2nd International Conference on Intelligent Technologies (CONIT), 2022-06-24
DOI: 10.1109/CONIT55038.2022.9847918 (https://doi.org/10.1109/CONIT55038.2022.9847918)
Citations: 0
Abstract
Image captioning is one of the most fascinating topics in computer vision. It is the task of generating a sentence that describes a given picture. Generating captions from images combines two processes: image processing and natural language processing. We use a convolutional neural network (CNN) to extract features from the given picture and a recurrent neural network (RNN) to generate language. Although the process is complex, we have put a good amount of time into planning and research to cover every aspect of it. The paper contains a comprehensive study of the different models used for image captioning: the Retrieval-Based Model, the Template-Based Model, the CNN-RNN Model, and the CNN-CNN Model.
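
The CNN-RNN data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy vocabulary, a single convolution layer with global average pooling standing in for the CNN, and an Elman-style RNN with greedy decoding standing in for the language model. All names (`VOCAB`, `encode`, `decode`, `build_model`, `caption_image`) are hypothetical, and the weights are random and untrained, so the generated words are meaningless. Only the structure (image, then feature vector, then word sequence) is the point.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would learn one from a caption corpus.
VOCAB = ["<start>", "<end>", "a", "dog", "cat", "on", "grass"]


def conv2d_relu(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def encode(image, kernels):
    """Stand-in for the CNN: one conv layer per kernel, then global average pooling."""
    features = []
    for kernel in kernels:
        cells = [v for row in conv2d_relu(image, kernel) for v in row]
        features.append(sum(cells) / len(cells))
    return features


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def decode(features, params, max_len=6):
    """Stand-in for the RNN: Elman-style recurrence with greedy word selection."""
    w_init, w_xh, w_hh, w_hy = params
    # The image features seed the initial hidden state.
    h = [math.tanh(v) for v in matvec(w_init, features)]
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        x = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]  # one-hot input word
        h = [math.tanh(a + b) for a, b in zip(matvec(w_xh, x), matvec(w_hh, h))]
        scores = matvec(w_hy, h)
        token = max(range(len(VOCAB)), key=scores.__getitem__)  # greedy choice
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


def build_model(seed=0, n_kernels=4, hidden=8):
    """Create random (untrained) encoder kernels and decoder weights."""
    rng = random.Random(seed)
    kernels = [rand_matrix(rng, 3, 3) for _ in range(n_kernels)]
    params = (
        rand_matrix(rng, hidden, n_kernels),   # features -> initial hidden state
        rand_matrix(rng, hidden, len(VOCAB)),  # input word -> hidden
        rand_matrix(rng, hidden, hidden),      # hidden -> hidden
        rand_matrix(rng, len(VOCAB), hidden),  # hidden -> word scores
    )
    return kernels, params


def caption_image(image, seed=0):
    """Run the full encoder-decoder pipeline on one grayscale image."""
    kernels, params = build_model(seed)
    return decode(encode(image, kernels), params)
```

Calling `caption_image` on a small grayscale image given as a list of lists of floats returns a list of vocabulary words. In a trained system the random matrices would be replaced by learned parameters, and the single convolution layer by a deep pretrained CNN.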