Comparative Study of Different Image Captioning Models

Sahil Takkar, Anshul Jain, Piyush Adlakha

2021 5th International Conference on Computing Methodologies and Communication (ICCMC), published 2021-04-08. DOI: 10.1109/ICCMC51019.2021.9418451
This paper compares various deep learning models for generating captions for images from the Flickr8k dataset. The work combines a CNN encoder, which extracts features from images, with a recurrent neural network, which generates a caption from the extracted features. The CNN encoders used are VGG16 and InceptionV3. The extracted features are passed to either a unidirectional or a bidirectional LSTM to generate captions. The proposed model uses both beam search and a greedy algorithm to generate captions from the vocabulary. The generated captions are then compared with the ground-truth captions using BLEU scores. The Bilingual Evaluation Understudy (BLEU) score measures how close a given sentence is to a reference sentence. The BLEU scores of captions generated with beam search and with the greedy algorithm are analyzed and compared to determine which performs better.
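To make the evaluation metric concrete, the sketch below computes a sentence-level BLEU score from scratch: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. This is a simplified, smoothed illustration of the metric named in the abstract, not the authors' actual evaluation code; the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(references, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty. A simplified, smoothed sketch, not the
    paper's exact implementation."""
    if not candidate:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:  # candidate has fewer than n tokens
            return 0.0
        # Clip each n-gram's count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        # Add-one smoothing keeps one missing order from zeroing the score.
        precisions.append((clipped + 1) / (sum(cand_counts.values()) + 1))
    # Brevity penalty: penalize candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(candidate)), l))
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = ["a dog runs across the grass".split()]
print(round(bleu(reference, "a dog runs across the grass".split()), 3))  # 1.0
print(round(bleu(reference, "a cat sleeps".split()), 3))  # 0.0: no overlap beyond "a"
```

A caption identical to a reference scores 1.0, while an unrelated caption scores near 0; comparing such scores for beam-search versus greedy decodings is exactly the comparison the paper performs.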