An Efficient Image Captioning Method Based on Beam Search

Impact factor: 0.6; JCR quartile: Q4; Category: Engineering, Electrical & Electronic
Tarun Jaiswal, Manju Pandey, Priyanka Tripathi
Journal: Recent Advances in Electrical & Electronic Engineering
Publication date: 2023-10-18
Publication type: Journal Article
DOI: 10.2174/0123520965254606231009091711
URL: https://doi.org/10.2174/0123520965254606231009091711

Abstract

Introduction: Image captioning is a crucial task at the intersection of computer vision and natural language processing. In recent years, deep neural networks have become an increasingly popular tool for generating descriptive captions for photos. However, these models frequently produce captions that are unoriginal and repetitive. Method: Beam search is a well-known search technique for generating image descriptions efficiently. The algorithm keeps track of a set of partial captions and expands them iteratively, extending each one with the most probable next words at every step until a complete caption is generated. The set of partial captions, known as the beam, is updated at each step based on the predicted probabilities of the next words. This paper presents an image caption generation system based on beam search. To encode the image data and generate captions, the system trains a deep neural network architecture that combines the benefits of a CNN and an RNN. Beam search is then run to produce the final captions. Results: Compared with traditional greedy decoding, this yields a more diverse and descriptive set of captions. The experimental results indicate that the proposed system outperforms existing image caption generation techniques in the precision and variety of the generated captions. Conclusion: These results demonstrate the effectiveness of beam search in enhancing the efficiency of image caption generation systems.
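
The beam update described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `beam_search`, its parameters, the `<end>` token, and the hand-written `TOY_MODEL` table are all hypothetical. In a real captioning system the next-word probabilities would come from the trained CNN-RNN model rather than a lookup table.

```python
import math
from typing import Callable, Dict, List, Tuple

Caption = Tuple[str, ...]
Scored = Tuple[Caption, float]  # (caption, log-probability)


def beam_search(
    next_word_probs: Callable[[Caption], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    end_token: str = "<end>",
) -> List[Scored]:
    """Return finished captions with log-probabilities, best first.

    Captions still unfinished after max_len steps are dropped, so the
    result can be empty if max_len is too small.
    """
    beam: List[Scored] = [((), 0.0)]  # start from the empty caption
    finished: List[Scored] = []

    for _ in range(max_len):
        # Expand every partial caption in the beam by every possible next word.
        candidates: List[Scored] = []
        for caption, score in beam:
            for word, prob in next_word_probs(caption).items():
                if prob > 0.0:
                    candidates.append((caption + (word,), score + math.log(prob)))
        if not candidates:
            break

        # Keep only the beam_width most probable candidates.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = []
        for caption, score in candidates[:beam_width]:
            if caption[-1] == end_token:
                finished.append((caption, score))
            else:
                beam.append((caption, score))
        if not beam:
            break

    finished.sort(key=lambda item: item[1], reverse=True)
    return finished


# Hypothetical stand-in for a model's next-word distribution.
TOY_MODEL: Dict[Caption, Dict[str, float]] = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.5, "cat": 0.5},
    ("the",): {"dog": 0.9, "cat": 0.1},
}


def toy_next_word_probs(caption: Caption) -> Dict[str, float]:
    # Any caption not listed in the table simply ends.
    return TOY_MODEL.get(caption, {"<end>": 1.0})


if __name__ == "__main__":
    for width in (1, 2):
        caption, score = beam_search(toy_next_word_probs, beam_width=width)[0]
        print(width, " ".join(caption), round(math.exp(score), 2))
```

With `beam_width=1` the search reduces to greedy decoding: it commits to "a" (probability 0.6) and ends with a caption of total probability 0.3. With `beam_width=2` it also keeps "the" alive and finds "the dog", whose total probability is 0.36. The toy table is built to show this gap; it says nothing about the size of the effect on real captioning models.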
Source journal
Recent Advances in Electrical & Electronic Engineering (category: Engineering, Electrical & Electronic)
CiteScore: 1.70
Self-citation rate: 16.70%
Articles published: 101
Journal description: Recent Advances in Electrical & Electronic Engineering publishes full-length and mini reviews, research articles, and guest-edited thematic issues on electrical and electronic engineering and its applications. The journal also covers research on fast-emerging applications of electrical power supply, electrical systems, power transmission, electromagnetism, motor control processes, and related technologies in electrical and electronic engineering. The journal is essential reading for all researchers in electrical and electronic engineering science.