Implementation of Simple and Efficient Picture Caption Generator
V. Mane, Riddhi Selkar
Journal of Artificial Intelligence and Imaging, published 2023-06-21
DOI: 10.48001/joaii.2023.1111-18 (https://doi.org/10.48001/joaii.2023.1111-18)
Citations: 0
Abstract
Image captioning, also called picture captioning, is now widely used in applications that generate captions for photographs. Captioning systems rely on deep neural networks to identify the objects in an image along with their attributes and relationships. The purpose of this research is to detect the objects in a photograph, determine how they relate to one another, and write a caption that describes them. The proposed system is implemented in Python using the Flickr8k dataset. Input images are pre-processed, and image features are then extracted with a convolutional neural network (CNN). A long short-term memory (LSTM) network translates the features and objects extracted by the CNN into a natural English sentence. The system is tested on different types of images, and the results are presented together with the generated captions. These results show the accuracy of the system. The presented method has potential for applications in which image captioning is essential.
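
The abstract does not spell out how captions are prepared for the LSTM or how a sentence is produced at inference time. The sketch below shows one common arrangement for CNN-LSTM captioners, offered as an illustration under stated assumptions rather than as the authors' implementation: each caption is wrapped in start and end markers, split into (prefix, next word) training pairs, and decoded greedily one word at a time. The marker tokens, the function names, and the `toy_predictor` stand-in for a trained CNN+LSTM model are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical marker tokens; the paper does not name its own.
START, END = "<start>", "<end>"


def clean_caption(text: str) -> List[str]:
    """Lowercase, keep alphabetic words only, and wrap with start/end markers."""
    words = re.findall(r"[a-z]+", text.lower())
    return [START] + words + [END]


def build_vocab(captions: Sequence[List[str]]) -> Dict[str, int]:
    """Map each distinct token to an integer id; id 0 is left free for padding."""
    tokens = sorted({tok for cap in captions for tok in cap})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def training_pairs(
    caption: List[str], vocab: Dict[str, int]
) -> List[Tuple[List[int], int]]:
    """Split one caption into (prefix ids, next-word id) pairs for the decoder."""
    ids = [vocab[tok] for tok in caption]
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


def greedy_caption(
    image_features: object,
    predict_next: Callable[[object, List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Pick the highest-scoring next word until END appears or max_len is hit."""
    words = [START]
    for _ in range(max_len):
        scores = predict_next(image_features, words)
        best = max(scores, key=scores.get)
        if best == END:
            break
        words.append(best)
    return " ".join(words[1:])


def toy_predictor(image_features: object, prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a trained CNN+LSTM model: always follows a fixed script."""
    script = ["a", "dog", "runs", END]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in script}


if __name__ == "__main__":
    caption = clean_caption("A dog runs!")
    vocab = build_vocab([caption])
    for prefix_ids, next_id in training_pairs(caption, vocab):
        print(prefix_ids, "->", next_id)
    print(greedy_caption(None, toy_predictor))
```

In a real pipeline, `image_features` would be the feature vector produced by the CNN and `predict_next` would wrap the trained LSTM decoder. Greedy decoding is the simplest choice here; beam search is a common alternative when caption quality matters more than speed.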