Image Dense Captioning of Irregular Regions Based on Visual Saliency

2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA) Pub Date : 2023-03-01 DOI:10.1109/PRMVIA58252.2023.00008

Xiaosheng Wen, Ping Jian



Abstract

Dense captioning aims to describe the local details of an image in natural language. Conventional methods first run object detection and then describe the contents of each detected bounding box, which yields rich descriptions. However, captioning based on object detection often pays little attention to the relationships between objects and their environment, or among the objects themselves. Moreover, to date no dense captioning method can handle irregular regions. To address these problems, we propose a region division method based on visual saliency that focuses on areas rather than only on objects. Local descriptions of irregular regions are then generated from this division. For each region, we combine the full image with the target region to generate features, which are fed into the captioning model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline on traditional bounding boxes, and the descriptions it generates for irregular regions are equally good. The model also performs well in image retrieval experiments and produces less redundant information. In the application, users can manually select a region of interest in the image to be described, which can assist in expanding the dataset.
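The abstract does not specify the saliency model, the division rule, or how image and region are fused, so the following is only a minimal sketch under stated assumptions: a saliency map is thresholded and split into 4-connected regions, and each region's feature is the concatenation of a global average feature with a mask-weighted average. The function names `split_salient_regions` and `region_feature`, the threshold value, and the fusion by concatenation are all hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np


def split_salient_regions(saliency, threshold=0.5):
    """Label 4-connected regions whose saliency exceeds `threshold`.

    Hypothetical stand-in for the paper's region division step.
    Returns an int array: 0 is background, 1..K are region ids.
    """
    salient = saliency > threshold
    labels = np.zeros(saliency.shape, dtype=np.int32)
    height, width = saliency.shape
    next_id = 0
    for r in range(height):
        for c in range(width):
            if not salient[r, c] or labels[r, c]:
                continue
            # Start a new region and flood-fill it breadth-first.
            next_id += 1
            labels[r, c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and salient[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = next_id
                        queue.append((ny, nx))
    return labels


def region_feature(feature_map, mask):
    """Combine whole-image context with one irregular region.

    Assumption: fusion is a plain concatenation of the global average
    feature and the average over pixels inside `mask`.
    feature_map has shape (H, W, C); mask has shape (H, W).
    """
    global_feat = feature_map.mean(axis=(0, 1))
    weights = mask.astype(feature_map.dtype)
    # Guard against an empty mask to avoid division by zero.
    area = max(weights.sum(), 1.0)
    region_feat = (feature_map * weights[..., None]).sum(axis=(0, 1)) / area
    return np.concatenate([global_feat, region_feat])


# Toy inputs: a 4x4 saliency map with two separate salient blobs and a
# random 8-channel feature map standing in for CNN features.
demo_saliency = np.array([
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.9, 0.9],
])
demo_features = np.random.default_rng(0).random((4, 4, 8))
demo_labels = split_salient_regions(demo_saliency)
demo_vectors = [
    region_feature(demo_features, demo_labels == region_id)
    for region_id in range(1, demo_labels.max() + 1)
]
```

Each vector in `demo_vectors` would then be passed to a captioning model. Because the mask is an arbitrary boolean array, the same `region_feature` call also covers a region selected by hand, which is the use case the abstract mentions for the application.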


