MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding

Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon
{"title":"MIP-GAF:最重要人物定位和群体上下文理解的 MLLM 注释基准","authors":"Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon","doi":"arxiv-2409.06224","DOIUrl":null,"url":null,"abstract":"Estimating the Most Important Person (MIP) in any social event setup is a\nchallenging problem mainly due to contextual complexity and scarcity of labeled\ndata. Moreover, the causality aspects of MIP estimation are quite subjective\nand diverse. To this end, we aim to address the problem by annotating a\nlarge-scale `in-the-wild' dataset for identifying human perceptions about the\n`Most Important Person (MIP)' in an image. The paper provides a thorough\ndescription of our proposed Multimodal Large Language Model (MLLM) based data\nannotation strategy, and a thorough data quality analysis. Further, we perform\na comprehensive benchmarking of the proposed dataset utilizing state-of-the-art\nMIP localization methods, indicating a significant drop in performance compared\nto existing datasets. The performance drop shows that the existing MIP\nlocalization algorithms must be more robust with respect to `in-the-wild'\nsituations. We believe the proposed dataset will play a vital role in building\nthe next-generation social situation understanding methods. The code and data\nis available at https://github.com/surbhimadan92/MIP-GAF.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding\",\"authors\":\"Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon\",\"doi\":\"arxiv-2409.06224\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Estimating the Most Important Person (MIP) in any social event setup is a\\nchallenging problem mainly due to contextual complexity and scarcity of labeled\\ndata. Moreover, the causality aspects of MIP estimation are quite subjective\\nand diverse. To this end, we aim to address the problem by annotating a\\nlarge-scale `in-the-wild' dataset for identifying human perceptions about the\\n`Most Important Person (MIP)' in an image. The paper provides a thorough\\ndescription of our proposed Multimodal Large Language Model (MLLM) based data\\nannotation strategy, and a thorough data quality analysis. Further, we perform\\na comprehensive benchmarking of the proposed dataset utilizing state-of-the-art\\nMIP localization methods, indicating a significant drop in performance compared\\nto existing datasets. The performance drop shows that the existing MIP\\nlocalization algorithms must be more robust with respect to `in-the-wild'\\nsituations. We believe the proposed dataset will play a vital role in building\\nthe next-generation social situation understanding methods. 
The code and data\\nis available at https://github.com/surbhimadan92/MIP-GAF.\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"26 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.06224\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06224","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Estimating the Most Important Person (MIP) in any social event setup is a challenging problem, mainly due to contextual complexity and the scarcity of labeled data. Moreover, the causal aspects of MIP estimation are quite subjective and diverse. To this end, we aim to address the problem by annotating a large-scale `in-the-wild' dataset for identifying human perceptions of the `Most Important Person (MIP)' in an image. The paper provides a thorough description of our proposed Multimodal Large Language Model (MLLM) based data annotation strategy, along with an in-depth data quality analysis. Further, we perform comprehensive benchmarking of the proposed dataset using state-of-the-art MIP localization methods, which show a significant drop in performance compared to their results on existing datasets. This drop indicates that existing MIP localization algorithms need to be more robust to `in-the-wild' situations. We believe the proposed dataset will play a vital role in building next-generation social situation understanding methods. The code and data are available at https://github.com/surbhimadan92/MIP-GAF.
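
To make the MLLM-based annotation strategy concrete, below is a minimal Python sketch of such a loop. The `query_mllm` helper, the prompt wording, and the JSON output schema are hypothetical stand-ins for illustration, not the authors' actual pipeline; consult the repository linked above for the real implementation.

```python
# Minimal sketch of an MLLM-based MIP annotation loop.
# Assumption: `query_mllm` stands in for a real vision-language model call;
# the prompt and output schema are illustrative only.
import json
from pathlib import Path

ANNOTATION_PROMPT = (
    "You are shown a social scene with numbered person bounding boxes. "
    "Identify the Most Important Person (MIP) and briefly justify the choice. "
    'Answer as JSON: {"mip_box_id": <int>, "reason": "<string>"}'
)

def query_mllm(image_bytes: bytes, prompt: str) -> str:
    """Hypothetical MLLM call; swap in a real vision-language model API.
    Returns a dummy answer so the sketch runs end to end."""
    return '{"mip_box_id": 0, "reason": "placeholder"}'

def annotate_dataset(image_dir: str, out_path: str) -> None:
    """Query the MLLM once per image and save one label record per image."""
    records = []
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        raw = query_mllm(image_path.read_bytes(), ANNOTATION_PROMPT)
        label = json.loads(raw)  # {"mip_box_id": ..., "reason": ...}
        records.append({"image": image_path.name, **label})
    Path(out_path).write_text(json.dumps(records, indent=2))
```

The abstract reports a performance drop under benchmarking but does not name the metric; a common choice for localization benchmarks is IoU-based top-1 accuracy, where the predicted MIP box must overlap the ground-truth box above a threshold. A self-contained sketch of that evaluation, under the assumption that this is the metric in use:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def mip_accuracy(preds, gts, thresh=0.5):
    """Fraction of images whose predicted MIP box matches the
    ground-truth MIP box at IoU >= thresh."""
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)
```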