Exploring interaction concepts for human–object-interaction detection via global- and local-scale enhancing

IF 5.5 | CAS Tier 2, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Tianlun Luo, Qiao Yuan, Boxuan Zhu, Steven Guan, Rui Yang, Jeremy S. Smith, Eng Gee Lim
{"title":"Exploring interaction concepts for human–object-interaction detection via global- and local-scale enhancing","authors":"Tianlun Luo ,&nbsp;Qiao Yuan ,&nbsp;Boxuan Zhu ,&nbsp;Steven Guan ,&nbsp;Rui Yang ,&nbsp;Jeremy S. Smith ,&nbsp;Eng Gee Lim","doi":"10.1016/j.neucom.2025.130882","DOIUrl":null,"url":null,"abstract":"<div><div>Understanding the interactions between human–object (HO) pairs is the key to the human–object interaction (HOI) detection task. Visual understanding research has been significantly impacted by recent advances in linguistic-visual contrastive learning. For HOI detection studies, the alignment of linguistic and visual features is usually required to be performed when linguistic knowledge is used for enhancement. This usually results in the demands of extra training data or extended training time. In this study, an effective approach for utilizing multimodal knowledge to enhance HOI learning from global and instance scales is proposed. Model performance on Rare HOI categories can be prominently improved by using projection guided by linguistic knowledge at a global scale and merging multimodal features at an instance scale. State-of-the-art performance on the HICO-Det benchmark is achieved by the proposed model, and the effectiveness of the proposed global- and local-scale multimodal learning approach is validated.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"651 ","pages":"Article 130882"},"PeriodicalIF":5.5000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225015541","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Understanding the interactions between human–object (HO) pairs is the key to the human–object interaction (HOI) detection task. Recent advances in linguistic–visual contrastive learning have had a significant impact on visual-understanding research. In HOI detection, using linguistic knowledge for enhancement usually requires aligning linguistic and visual features, which in turn demands extra training data or extended training time. This study proposes an effective approach that uses multimodal knowledge to enhance HOI learning at both the global and the instance scale. Performance on rare HOI categories is markedly improved by applying linguistically guided projection at the global scale and merging multimodal features at the instance scale. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark, validating the effectiveness of the proposed global- and local-scale multimodal learning approach.
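
The abstract gives no implementation details, so the following is only a minimal, hypothetical PyTorch sketch of the two ideas it names: a global-scale projection of visual features guided by frozen linguistic (text) embeddings of interaction concepts, and an instance-scale fusion that merges per-HO-pair visual features with matched text features. All module names, dimensions, and the concatenation-based fusion scheme are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical sketch of global- and instance-scale linguistic enhancement
# for HOI detection. Shapes, module names, and the fusion scheme are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLinguisticProjection(nn.Module):
    """Global scale: project an image-level visual feature into a space
    aligned with frozen text embeddings of HOI concepts, then score the
    image against every concept by cosine similarity."""

    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, txt_dim)

    def forward(self, vis_feat: torch.Tensor, concept_emb: torch.Tensor):
        # vis_feat:     (B, vis_dim) global image features
        # concept_emb:  (C, txt_dim) frozen text embeddings, one per concept
        z = F.normalize(self.proj(vis_feat), dim=-1)   # (B, txt_dim)
        c = F.normalize(concept_emb, dim=-1)           # (C, txt_dim)
        return z @ c.t()                               # (B, C) similarity logits


class InstanceMultimodalFusion(nn.Module):
    """Instance scale: merge each human-object pair's visual feature with
    the text feature of its candidate interaction via a small MLP."""

    def __init__(self, vis_dim: int, txt_dim: int, hidden: int = 256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, pair_feat: torch.Tensor, txt_feat: torch.Tensor):
        # pair_feat: (N, vis_dim), txt_feat: (N, txt_dim), matched per pair
        return self.fuse(torch.cat([pair_feat, txt_feat], dim=-1))


# Toy usage with random tensors standing in for detector / text-encoder outputs.
if __name__ == "__main__":
    B, N, C = 2, 5, 600            # batch, HO pairs, HOI classes (HICO-Det has 600)
    vis_dim, txt_dim = 512, 512
    glob = GlobalLinguisticProjection(vis_dim, txt_dim)
    inst = InstanceMultimodalFusion(vis_dim, txt_dim)
    logits = glob(torch.randn(B, vis_dim), torch.randn(C, txt_dim))
    fused = inst(torch.randn(N, vis_dim), torch.randn(N, txt_dim))
    print(logits.shape, fused.shape)   # torch.Size([2, 600]) torch.Size([5, 256])
```

Keeping the concept embeddings frozen is one plausible way to avoid the extra alignment training the abstract says the approach sidesteps; whether the paper does exactly this is not stated in the abstract.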
Source journal: Neurocomputing (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual publication volume: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.