A novel key point based ROI segmentation and image captioning using guidance information
Authors: Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy
Journal: Machine Vision and Applications
Published: 2024-09-12
DOI: 10.1007/s00138-024-01597-1 (https://doi.org/10.1007/s00138-024-01597-1)
Citations: 0
Abstract
Image captioning has recently attracted considerable research interest. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROIs). It also proposes an image captioning model that uses this information as guidance to generate more accurate captions. Keypoints are detected with the Difference of Gaussians (DoG), and the proposed ROI segmentation algorithm uses them to extract the ROI. Features are then extracted from the ROI. Together with the text features of related images, they are projected into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. The text features are built with a Bag of Words (BoW) model. An LSTM then generates the caption from the guiding information and the features of the entire image. The guiding information helps the LSTM focus on the semantically important regions of the image, so that the most significant keywords appear in the caption. Experiments on the Flickr8k dataset show two results: the proposed ROI segmentation algorithm accurately identifies the ROI, and the captioning model with guiding information outperforms state-of-the-art methods.
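The abstract does not describe the ROI segmentation algorithm itself. The sketch below therefore only illustrates the first stage it names: detecting DoG keypoints as local extrema of a difference-of-Gaussians scale space, as in SIFT.

The following are assumptions, not values from the paper:
- a single octave;
- `sigma0=1.6` and `k=√2`;
- four DoG levels;
- a contrast threshold of 0.03.

`keypoint_roi` is a hypothetical stand-in that simply draws a box around all keypoints. It is not the authors' novel segmentation method.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur; kernel truncated at 3*sigma, edges replicated."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), radius, mode="edge")

    def conv(v: np.ndarray) -> np.ndarray:
        # 'valid' mode trims the padding back off
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur along each row
    return np.apply_along_axis(conv, 0, rows)    # then along each column


def dog_keypoints(img, sigma0=1.6, k=2 ** 0.5, n_levels=4, threshold=0.03):
    """Return (row, col, level) of local extrema in a single-octave DoG stack."""
    blurred = [gaussian_blur(img, sigma0 * k ** i) for i in range(n_levels + 1)]
    # D_i = L(sigma_{i+1}) - L(sigma_i); stacked shape is (n_levels, H, W)
    dog = np.stack([b - a for a, b in zip(blurred, blurred[1:])])
    levels, height, width = dog.shape
    keypoints = []
    for s in range(1, levels - 1):  # an extremum needs a level above and below
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                v = dog[s, y, x]
                if abs(v) < threshold:  # discard low-contrast responses
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]  # 3x3x3 neighbourhood
                if v >= cube.max() or v <= cube.min():
                    keypoints.append((y, x, s))
    return keypoints


def keypoint_roi(keypoints, shape, margin=2):
    """Hypothetical stand-in ROI: (top, bottom, left, right) box around all keypoints."""
    if not keypoints:
        raise ValueError("no keypoints found")
    ys = [p[0] for p in keypoints]
    xs = [p[1] for p in keypoints]
    top, left = max(0, min(ys) - margin), max(0, min(xs) - margin)
    bottom = min(shape[0], max(ys) + margin + 1)
    right = min(shape[1], max(xs) + margin + 1)
    return top, bottom, left, right


# Usage: a bright Gaussian blob centred at (16, 16) on a dark 32x32 image.
yy, xx = np.mgrid[0:32, 0:32]
blob = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.75 ** 2))
kps = dog_keypoints(blob)
box = keypoint_roi(kps, blob.shape)
```

A real ROI method would more likely cluster keypoints or grow a region from them rather than take one bounding box. The box is used here only to keep the example short.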
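For the text side, a Bag of Words feature is simply a vector of word counts over a fixed vocabulary. The following is a minimal sketch, assuming the text comes from captions of related images. The abstract does not say how related images are retrieved, nor whether stop words or rare words are filtered. The tokenizer (lowercase alphabetic runs) and the helper names are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(captions: list[str], min_count: int = 1) -> dict[str, int]:
    """Map each word seen at least min_count times to a column index (alphabetical)."""
    counts = Counter(tok for caption in captions for tok in tokenize(caption))
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i for i, w in enumerate(words)}


def bow_vector(text: str, vocab: dict[str, int]) -> list[int]:
    """Count vocabulary words in text; out-of-vocabulary words are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


captions = ["A dog runs on the grass", "A dog catches a ball"]
vocab = build_vocab(captions)
vec = bow_vector("a dog and a cat", vocab)
```

A text feature for one image could then be formed, for example, by summing the vectors of its related images' captions. That aggregation step is likewise an assumption.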
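CCA finds paired linear projections of two views that are maximally correlated. The sketch below makes these assumptions:
- rows of `X` are visual (ROI) feature vectors;
- rows of `Y` are the BoW vectors of matching captions.

It fits linear CCA by whitening each view and taking an SVD of the whitened cross-covariance. The ridge term `reg` is an assumption added for numerical stability, and the data are synthetic. Projecting an image's features with `Wx` gives a vector in the shared space that could serve as guidance. The abstract does not state exactly how the paper forms its guiding vector.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X: np.ndarray, Y: np.ndarray, dim: int, reg: float = 1e-3):
    """Linear CCA. Returns projections Wx, Wy, the centring means and canonical correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])  # regularised covariances
    cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    cxy = Xc.T @ Yc / (n - 1)
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    Wx = kx @ u[:, :dim]
    Wy = ky @ vt.T[:, :dim]
    return Wx, Wy, mx, my, s[:dim]


# Synthetic paired data: both views are noisy linear functions of a shared latent z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, -2.0, 0.5, 3.0, 1.5]]) + 0.1 * rng.normal(size=(500, 5))
Y = z @ np.array([[2.0, 1.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 4))
Wx, Wy, mx, my, corrs = fit_cca(X, Y, dim=1)
guidance = (X[:1] - mx) @ Wx  # one "image" projected into the shared space
```

At inference time only the visual view is available, so only `Wx` and `mx` are needed to produce a guidance vector for a new image.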
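The abstract does not say how the guidance enters the LSTM. One common design, assumed here, adds the guidance vector `g` as an extra input to every gate at every time step. The decoder then sees the guidance throughout generation rather than only at the first step. The sketch shows a single step in NumPy. The parameter layout, sizes and initialisation are hypothetical, and so is the choice of feeding whole-image features as `x` at the first step.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def init_params(x_dim: int, h_dim: int, g_dim: int, rng) -> dict:
    """Hypothetical parameters: gate name -> (W input, U hidden, G guidance, bias)."""
    return {
        name: (
            rng.normal(scale=0.1, size=(h_dim, x_dim)),
            rng.normal(scale=0.1, size=(h_dim, h_dim)),
            rng.normal(scale=0.1, size=(h_dim, g_dim)),
            np.zeros(h_dim),
        )
        for name in "ifoc"  # input, forget, output gates and cell candidate
    }


def guided_lstm_step(x, h, c, g, params):
    """One LSTM step in which the guidance vector g feeds every gate."""
    def gate(name, act):
        W, U, G, b = params[name]
        return act(W @ x + U @ h + G @ g + b)

    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    o = gate("o", sigmoid)
    c_tilde = gate("c", np.tanh)
    c_new = f * c + i * c_tilde
    h_new = o * np.tanh(c_new)
    return h_new, c_new


rng = np.random.default_rng(0)
params = init_params(x_dim=8, h_dim=16, g_dim=4, rng=rng)
x0 = rng.normal(size=8)  # e.g. whole-image features at t = 0 (assumption)
h0, c0 = np.zeros(16), np.zeros(16)
g = rng.normal(size=4)   # e.g. a CCA-space guidance vector (assumption)
h1, c1 = guided_lstm_step(x0, h0, c0, g, params)
h1_unguided, _ = guided_lstm_step(x0, h0, c0, np.zeros(4), params)
```

A full captioner would repeat this step, feeding word embeddings as `x`. A softmax over the vocabulary would then be applied to each `h`. The same `g` is reused at every step.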
About the journal:
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions on all applications and engineering aspects of image-related computing. Original contributions dealing with scientific, commercial, industrial, military and biomedical applications of machine vision are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.