诊断人-物交互检测器

IF 11.6 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang
{"title":"诊断人-物交互检测器","authors":"Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang","doi":"10.1007/s11263-025-02369-8","DOIUrl":null,"url":null,"abstract":"<p>We have witnessed significant progress in human-object interaction (HOI) detection. However, relying solely on <i>mAP</i> (mean Average Precision) scores as a summary metric does not provide sufficient insight into the nuances of model performance (<i>e.g.</i>, why one model outperforms another), which can hinder further innovation in this field. To address this issue, we introduce a diagnosis toolbox in this paper to offer a detailed quantitative breakdown of HOI detection models, inspired by the success of object detection diagnosis tools. We first conduct a holistic investigation into the HOI detection pipeline. By defining a set of errors and using oracles to fix each one, we quantitatively analyze the significance of different errors based on the <i>mAP</i> improvement gained from fixing them. Next, we explore the two key sub-tasks of HOI detection: human-object pair localization and interaction classification. For the pair localization task, we compute the coverage of ground-truth human-object pairs and assess the noisiness of the localization results. For the classification task, we measure a model’s ability to distinguish between positive and negative detection results and to classify actual interactions when human-object pairs are correctly localized. We analyze eight state-of-the-art HOI detection models, providing valuable diagnostic insights to guide future research. For instance, our diagnosis reveals that the state-of-the-art model RLIPv2 outperforms others primarily due to its significant improvement in multi-label interaction classification accuracy. Our toolbox is applicable across various methods and datasets and is available at https://neu-vi.github.io/Diag-HOI/.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"2 1","pages":""},"PeriodicalIF":11.6000,"publicationDate":"2025-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diagnosing Human-Object Interaction Detectors\",\"authors\":\"Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang\",\"doi\":\"10.1007/s11263-025-02369-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>We have witnessed significant progress in human-object interaction (HOI) detection. However, relying solely on <i>mAP</i> (mean Average Precision) scores as a summary metric does not provide sufficient insight into the nuances of model performance (<i>e.g.</i>, why one model outperforms another), which can hinder further innovation in this field. To address this issue, we introduce a diagnosis toolbox in this paper to offer a detailed quantitative breakdown of HOI detection models, inspired by the success of object detection diagnosis tools. We first conduct a holistic investigation into the HOI detection pipeline. By defining a set of errors and using oracles to fix each one, we quantitatively analyze the significance of different errors based on the <i>mAP</i> improvement gained from fixing them. Next, we explore the two key sub-tasks of HOI detection: human-object pair localization and interaction classification. For the pair localization task, we compute the coverage of ground-truth human-object pairs and assess the noisiness of the localization results. For the classification task, we measure a model’s ability to distinguish between positive and negative detection results and to classify actual interactions when human-object pairs are correctly localized. We analyze eight state-of-the-art HOI detection models, providing valuable diagnostic insights to guide future research. For instance, our diagnosis reveals that the state-of-the-art model RLIPv2 outperforms others primarily due to its significant improvement in multi-label interaction classification accuracy. Our toolbox is applicable across various methods and datasets and is available at https://neu-vi.github.io/Diag-HOI/.</p>\",\"PeriodicalId\":13752,\"journal\":{\"name\":\"International Journal of Computer Vision\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":11.6000,\"publicationDate\":\"2025-02-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Vision\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11263-025-02369-8\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11263-025-02369-8","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

我们见证了人-物交互(HOI)检测方面的重大进展。然而,仅仅依靠mAP(平均精度)分数作为总结度量并不能提供足够的洞察模型性能的细微差别(例如,为什么一个模型优于另一个模型),这可能会阻碍该领域的进一步创新。为了解决这个问题,我们在本文中引入了一个诊断工具箱,以提供详细的HOI检测模型的定量分解,灵感来自于目标检测诊断工具的成功。我们首先对HOI检测管道进行全面调查。通过定义一组错误并使用oracle来修复每个错误,我们根据修复这些错误所获得的mAP改进来定量分析不同错误的重要性。接下来,我们探讨了HOI检测的两个关键子任务:人-目标对定位和交互分类。对于配对定位任务,我们计算了地真人物对的覆盖范围,并评估了定位结果的噪声。对于分类任务,我们衡量模型区分阳性和阴性检测结果的能力,以及当人-物体对被正确定位时对实际交互进行分类的能力。我们分析了八种最先进的HOI检测模型,为指导未来的研究提供了有价值的诊断见解。例如,我们的诊断显示,最先进的模型RLIPv2优于其他模型,主要是由于其在多标签交互分类精度方面的显着提高。我们的工具箱适用于各种方法和数据集,可以在https://neu-vi.github.io/Diag-HOI/上获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Diagnosing Human-Object Interaction Detectors

We have witnessed significant progress in human-object interaction (HOI) detection. However, relying solely on mAP (mean Average Precision) scores as a summary metric does not provide sufficient insight into the nuances of model performance (e.g., why one model outperforms another), which can hinder further innovation in this field. To address this issue, we introduce a diagnosis toolbox in this paper to offer a detailed quantitative breakdown of HOI detection models, inspired by the success of object detection diagnosis tools. We first conduct a holistic investigation into the HOI detection pipeline. By defining a set of errors and using oracles to fix each one, we quantitatively analyze the significance of different errors based on the mAP improvement gained from fixing them. Next, we explore the two key sub-tasks of HOI detection: human-object pair localization and interaction classification. For the pair localization task, we compute the coverage of ground-truth human-object pairs and assess the noisiness of the localization results. For the classification task, we measure a model’s ability to distinguish between positive and negative detection results and to classify actual interactions when human-object pairs are correctly localized. We analyze eight state-of-the-art HOI detection models, providing valuable diagnostic insights to guide future research. For instance, our diagnosis reveals that the state-of-the-art model RLIPv2 outperforms others primarily due to its significant improvement in multi-label interaction classification accuracy. Our toolbox is applicable across various methods and datasets and is available at https://neu-vi.github.io/Diag-HOI/.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
International Journal of Computer Vision
International Journal of Computer Vision 工程技术-计算机:人工智能
CiteScore
29.80
自引率
2.10%
发文量
163
审稿时长
6 months
期刊介绍: The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs. Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision. Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community. Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or offer tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas. In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research. Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信