用于弱监督引用表达接地的通用重定位器

IF 5.2 3区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Panpan Zhang, Meng Liu, Xuemeng Song, Da Cao, Zan Gao, Liqiang Nie
{"title":"用于弱监督引用表达接地的通用重定位器","authors":"Panpan Zhang, Meng Liu, Xuemeng Song, Da Cao, Zan Gao, Liqiang Nie","doi":"10.1145/3656045","DOIUrl":null,"url":null,"abstract":"<p>This paper introduces the Universal Relocalizer, a novel approach designed for weakly supervised referring expression grounding. Our method strives to pinpoint a target proposal that corresponds to a specific query, eliminating the need for region-level annotations during training. To bolster the localization precision and enrich the semantic understanding of the target proposal, we devise three key modules: the category module, the color module, and the spatial relationship module. The category and color modules assign respective category and color labels to region proposals, enabling the computation of category and color scores. Simultaneously, the spatial relationship module integrates spatial cues, yielding a spatial score for each proposal to enhance localization accuracy further. By adeptly amalgamating the category, color, and spatial scores, we derive a refined grounding score for every proposal. Comprehensive evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets manifest the prowess of the Universal Relocalizer, showcasing its formidable performance across the board.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"17 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Universal Relocalizer for Weakly Supervised Referring Expression Grounding\",\"authors\":\"Panpan Zhang, Meng Liu, Xuemeng Song, Da Cao, Zan Gao, Liqiang Nie\",\"doi\":\"10.1145/3656045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>This paper introduces the Universal Relocalizer, a novel approach designed for weakly supervised referring expression grounding. Our method strives to pinpoint a target proposal that corresponds to a specific query, eliminating the need for region-level annotations during training. To bolster the localization precision and enrich the semantic understanding of the target proposal, we devise three key modules: the category module, the color module, and the spatial relationship module. The category and color modules assign respective category and color labels to region proposals, enabling the computation of category and color scores. Simultaneously, the spatial relationship module integrates spatial cues, yielding a spatial score for each proposal to enhance localization accuracy further. By adeptly amalgamating the category, color, and spatial scores, we derive a refined grounding score for every proposal. Comprehensive evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets manifest the prowess of the Universal Relocalizer, showcasing its formidable performance across the board.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"17 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3656045\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3656045","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

本文介绍了通用重定位器(Universal Relocalizer),这是一种专为弱监督引用表达接地而设计的新方法。我们的方法致力于精确定位与特定查询相对应的目标建议,在训练过程中无需区域级注释。为了提高定位精度并丰富对目标建议的语义理解,我们设计了三个关键模块:类别模块、颜色模块和空间关系模块。类别和颜色模块分别为区域提案分配类别和颜色标签,从而计算出类别和颜色分数。与此同时,空间关系模块整合空间线索,为每个建议得出空间分数,从而进一步提高定位精度。通过将类别、颜色和空间得分巧妙地结合在一起,我们为每个建议得出了一个精细的定位得分。在 RefCOCO、RefCOCO+ 和 RefCOCOg 数据集上进行的综合评估体现了通用重定位器的实力,展示了其强大的全面性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Universal Relocalizer for Weakly Supervised Referring Expression Grounding

This paper introduces the Universal Relocalizer, a novel approach designed for weakly supervised referring expression grounding. Our method strives to pinpoint a target proposal that corresponds to a specific query, eliminating the need for region-level annotations during training. To bolster the localization precision and enrich the semantic understanding of the target proposal, we devise three key modules: the category module, the color module, and the spatial relationship module. The category and color modules assign respective category and color labels to region proposals, enabling the computation of category and color scores. Simultaneously, the spatial relationship module integrates spatial cues, yielding a spatial score for each proposal to enhance localization accuracy further. By adeptly amalgamating the category, color, and spatial scores, we derive a refined grounding score for every proposal. Comprehensive evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets manifest the prowess of the Universal Relocalizer, showcasing its formidable performance across the board.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
8.50
自引率
5.90%
发文量
285
审稿时长
7.5 months
期刊介绍: The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信