A novel key point based ROI segmentation and image captioning using guidance information

IF 2.4 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy
{"title":"A novel key point based ROI segmentation and image captioning using guidance information","authors":"Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy","doi":"10.1007/s00138-024-01597-1","DOIUrl":null,"url":null,"abstract":"<p>Recently, image captioning has become an intriguing task that has attracted many researchers. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROI) and an image captioning model guided by this information to generate more accurate image captions. The Difference of Gaussian (DoG) is used to identify keypoints. A novel ROI segmentation algorithm then utilizes these keypoints to extract the ROI. Features of the ROI are extracted, and the text features of related images are merged into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. The text features are constructed using a Bag of Words (BoW) model. Based on the guiding information and the entire image features, an LSTM generates a caption for the image. The guiding information helps the LSTM focus on important semantic regions in the image to generate the most significant keywords in the image caption. Experiments on the Flickr8k dataset show that the proposed ROI segmentation algorithm accurately identifies the ROI, and the image captioning model with the guidance information outperforms state-of-the-art methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-024-01597-1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Recently, image captioning has become an intriguing task that has attracted many researchers. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROI) and an image captioning model guided by this information to generate more accurate image captions. The Difference of Gaussian (DoG) is used to identify keypoints. A novel ROI segmentation algorithm then utilizes these keypoints to extract the ROI. Features of the ROI are extracted, and the text features of related images are merged into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. The text features are constructed using a Bag of Words (BoW) model. Based on the guiding information and the entire image features, an LSTM generates a caption for the image. The guiding information helps the LSTM focus on important semantic regions in the image to generate the most significant keywords in the image caption. Experiments on the Flickr8k dataset show that the proposed ROI segmentation algorithm accurately identifies the ROI, and the image captioning model with the guidance information outperforms state-of-the-art methods.

Abstract Image

基于关键点的新颖 ROI 分割和使用引导信息的图像字幕制作
近来,图像标题已成为一项吸引众多研究人员的有趣任务。本文提出了一种新颖的基于关键点的分割算法,用于提取感兴趣区域(ROI),并在此信息指导下建立图像字幕模型,以生成更准确的图像字幕。高斯差(DoG)用于识别关键点。然后,一种新颖的 ROI 分割算法利用这些关键点提取 ROI。提取 ROI 的特征,并使用规范相关分析 (CCA) 将相关图像的文本特征合并到一个共同的语义空间,从而生成引导信息。文本特征使用词袋(BoW)模型构建。基于引导信息和整个图像特征,LSTM 为图像生成标题。引导信息可帮助 LSTM 专注于图像中的重要语义区域,从而生成图像标题中最重要的关键词。在 Flickr8k 数据集上进行的实验表明,所提出的 ROI 分割算法能准确识别 ROI,带有引导信息的图像标题模型优于最先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Machine Vision and Applications
Machine Vision and Applications 工程技术-工程:电子与电气
CiteScore
6.30
自引率
3.00%
发文量
84
审稿时长
8.7 months
期刊介绍: Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submittals in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision, are all within the scope of the journal. Particular emphasis is placed on engineering and technology aspects of image processing and computer vision. The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信