Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation

Int. J. Image Graph. Pub Date: 2021-06-21. DOI: 10.1142/s0219467822500218

Chaitrali Prasanna Chaudhari, S. Devane


Citations: 3

Abstract


Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation

“Image captioning is the process of generating a textual description of an image.” It draws on both computer vision and natural language processing to generate captions. However, most image captioning systems describe objects only in vague terms such as “man”, “woman”, “group of people”, or “building”. Hence, this paper develops an intelligent image captioning model. The adopted model comprises a few steps, such as word generation, sentence formation, and caption generation. First, the input image is passed to a deep learning classifier, a Convolutional Neural Network (CNN). Because the classifier has already been trained on the words relevant to the images, it can readily classify the words associated with a given image. Next, a set of sentences is formed from the generated words using a Long Short-Term Memory (LSTM) model. The likelihood of each formed sentence is computed using the Maximum Likelihood (ML) function, and the sentences with higher probability are selected and then used to generate a representation of the visual scene in the form of an image caption. As its major novelty, this paper aims to enhance the performance of the CNN by optimally tuning its weights and activation function. For this optimal selection, the paper introduces a new enhanced optimization algorithm, Rider with Randomized Bypass and Over-taker update (RR-BOU). The proposed RR-BOU is an enhanced version of the Rider Optimization Algorithm (ROA). Finally, the performance of the proposed captioning model is compared with that of conventional models through statistical analysis.
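The sentence-selection step in the abstract (score each candidate sentence by its likelihood and keep the more probable ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the candidate sentences, and the per-token probabilities below are all hypothetical, and the sketch assumes a sequence model has already assigned a probability to each word of each candidate.

```python
import math


def sentence_log_likelihood(token_probs):
    """Sum the log-probabilities of the tokens in one candidate sentence."""
    return sum(math.log(p) for p in token_probs)


def rank_candidates(candidates, top_k=1):
    """Return the top_k (sentence, log-likelihood) pairs, most likely first."""
    scored = [
        (sentence, sentence_log_likelihood(probs))
        for sentence, probs in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical per-token probabilities, standing in for the output of a
# sequence model such as an LSTM (values invented for illustration only).
candidates = {
    "a man rides a horse": [0.9, 0.8, 0.7, 0.9, 0.6],
    "a person is outside": [0.9, 0.5, 0.6, 0.4],
    "horse a rides man": [0.2, 0.3, 0.1, 0.2],
}

best_sentence, best_score = rank_candidates(candidates, top_k=1)[0]
print(best_sentence, round(best_score, 4))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. A plain sum of log-probabilities tends to favor shorter sentences, so a practical system might normalize by length; the abstract does not say whether the paper does so.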
