Image Dense Captioning of Irregular Regions Based on Visual Saliency

Xiaosheng Wen, Ping Jian
{"title":"基于视觉显著性的不规则区域图像密集字幕","authors":"Xiaosheng Wen, Ping Jian","doi":"10.1109/PRMVIA58252.2023.00008","DOIUrl":null,"url":null,"abstract":"Traditional Dense Captioning intends to describe local details of image with natural language. It usually uses target detection first and then describes the contents in the detected bounding box, which will make the description content rich. But captioning based on target detection often lacks the attention to the association between objects and the environment, or between the objects. And for now, there is no dense captioning method has the ability to deal with irregular areas. To solve these problems, we propose a visual-saliency based region division method. It focuses more on areas than just on objects. Based on the division, the local description of the irregular region is carried out. For each area, we combine the image with the target area to generate features, which are put into the caption model. We used the Visual Genome dataset for training and testing. Through experiments, our model is comparable to the baseline under the traditional bounding box. And the description of irregular region generated by our method is equally good. Our model performs well in image retrieval experiments and has less information redundancy. In the application, we support to manually select the region of interest on the image for description, for assist in expanding the dataset.","PeriodicalId":221346,"journal":{"name":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Image Dense Captioning of Irregular Regions Based on Visual Saliency\",\"authors\":\"Xiaosheng Wen, Ping Jian\",\"doi\":\"10.1109/PRMVIA58252.2023.00008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Traditional Dense Captioning intends to describe local details of image with natural language. It usually uses target detection first and then describes the contents in the detected bounding box, which will make the description content rich. But captioning based on target detection often lacks the attention to the association between objects and the environment, or between the objects. And for now, there is no dense captioning method has the ability to deal with irregular areas. To solve these problems, we propose a visual-saliency based region division method. It focuses more on areas than just on objects. Based on the division, the local description of the irregular region is carried out. For each area, we combine the image with the target area to generate features, which are put into the caption model. We used the Visual Genome dataset for training and testing. Through experiments, our model is comparable to the baseline under the traditional bounding box. And the description of irregular region generated by our method is equally good. Our model performs well in image retrieval experiments and has less information redundancy. 
In the application, we support to manually select the region of interest on the image for description, for assist in expanding the dataset.\",\"PeriodicalId\":221346,\"journal\":{\"name\":\"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PRMVIA58252.2023.00008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PRMVIA58252.2023.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Traditional dense captioning aims to describe the local details of an image in natural language. It typically runs object detection first and then describes the contents of each detected bounding box, which yields rich descriptions. However, captioning based on object detection often neglects the associations between objects and their environment, or among the objects themselves. Moreover, no existing dense captioning method can handle irregular regions. To address these problems, we propose a visual-saliency based region division method that attends to areas rather than only to objects. Based on this division, local descriptions of irregular regions are generated. For each region, we combine the full image with the target area to produce features, which are fed into the caption model. We train and test on the Visual Genome dataset. Experiments show that our model is comparable to the baseline under traditional bounding boxes, and that the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and exhibits less information redundancy. In application, users can manually select a region of interest on an image for description, which helps expand the dataset.
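The abstract describes two concrete steps: dividing the image into salient, irregularly shaped regions, and fusing each region with the full image before feature extraction. The sketch below illustrates one plausible reading of that pipeline. It is a minimal sketch, not the paper's implementation: the spectral-residual saliency detector (from opencv-contrib), the Otsu thresholding, the connected-component grouping, and the mask-as-extra-channel fusion are all assumptions standing in for components the abstract does not specify.

```python
# Hedged sketch of saliency-based irregular region division and
# region-conditioned input fusion. The saliency backend, thresholding
# scheme, and fusion strategy are illustrative assumptions; the paper
# does not commit to these specific choices.
import cv2  # requires opencv-contrib-python for cv2.saliency
import numpy as np

def propose_irregular_regions(image, min_area=500):
    """Return binary masks of salient, irregularly shaped regions."""
    # Compute a per-pixel saliency map in [0, 1].
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = saliency.computeSaliency(image)
    if not ok:
        return []

    # Binarize the map; Otsu picks the threshold adaptively.
    saliency_u8 = (saliency_map * 255).astype(np.uint8)
    _, binary = cv2.threshold(saliency_u8, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Each connected component becomes one candidate region,
    # represented by a free-form mask rather than a bounding box.
    num_labels, labels = cv2.connectedComponents(binary)
    masks = []
    for label in range(1, num_labels):  # label 0 is background
        mask = (labels == label).astype(np.uint8)
        if mask.sum() >= min_area:
            masks.append(mask)
    return masks

def region_conditioned_input(image, mask):
    """Combine the full image with one region mask, mirroring the
    abstract's step of generating features from the image plus the
    target area before captioning."""
    # One simple fusion: stack the mask as a fourth channel so a
    # feature extractor sees both global context and the region.
    return np.dstack([image, mask * 255])
```

A feature extractor adapted to the four-channel input (for example, a CNN whose first convolution accepts an extra mask channel) would then embed each fused tensor for the captioning decoder; that adaptation, like the rest of the sketch, is an assumption about one way to realize the described pipeline.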