Image Captioning based on Deep Convolutional Neural Networks and LSTM
Swati Srivastava, Himanshu Sharma, Pragati Dixit
2022 2nd International Conference on Power Electronics & IoT Applications in Renewable Energy and its Control (PARC), 2022-01-21
DOI: 10.1109/PARC52418.2022.9726635
Abstract
Image captioning is a challenging task that requires knowledge from both computer vision and natural language processing. A captioning model must first understand the content of an image and then use language generation techniques to describe it in a natural language such as English. In this paper, we present an image captioning model that uses VGG16 to extract visual features and an LSTM to generate sentences from those features. We evaluate the model on the Flickr8k and Flickr30k datasets, using the Bilingual Evaluation Understudy (BLEU) metric to score the generated captions against reference captions. The proposed model can be extended to a wide range of applications, including IoT-based applications and smart control systems.
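The abstract names BLEU as the evaluation metric but does not say which variant (BLEU-1 to BLEU-4, sentence- or corpus-level, with or without smoothing) the authors report. As a minimal sketch, assuming sentence-level BLEU-4 with uniform n-gram weights and no smoothing, the score combines clipped n-gram precisions with a brevity penalty that punishes captions shorter than the closest reference. The helper names `ngrams`, `modified_precision`, and `sentence_bleu` and the example captions are hypothetical illustrations, not taken from the paper or the datasets.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing
    (an assumed configuration; the paper's exact setup is not stated)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined, so unsmoothed BLEU is zero
    # Effective reference length: the reference closest in length to the
    # candidate, preferring the shorter one on ties.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical captions for illustration; Flickr8k and Flickr30k
    # supply five human-written reference captions per image.
    references = [
        "a dog runs on the green grass".split(),
        "a brown dog is running on the grass".split(),
    ]
    candidate = "a dog runs on the grass".split()
    print(f"BLEU-4: {sentence_bleu(candidate, references):.4f}")
```

Clipping each n-gram count to its maximum count in any single reference is what keeps a degenerate caption such as "the the the the" from scoring highly. Captioning results are commonly reported as corpus-level BLEU, which sums clipped counts and candidate/reference lengths over the whole test set before combining them; `corpus_bleu` in NLTK's `nltk.translate.bleu_score` module is one existing implementation of that variant.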