{"title":"通过注意力聚合学习人与物体的交互","authors":"Dongzhou Gu, Shuang Cai, Shiwei Ma","doi":"10.1117/12.2604708","DOIUrl":null,"url":null,"abstract":"Recent years, deep neural networks have achieved impressive progress in object detection. However, detecting the interactions between objects is still challenging. Many researchers pay attention to human-object interaction (HOI) detection as a basic task in detailed scene understanding. Most conventional HOI detectors are in a two-stage manner and usually slow in inference. One-stage methods for direct parallel detection of HOI triples breaks through the limitation of object detection, but the extracted features are still insufficient. To overcome these drawbacks above, we propose an improved one-stage HOI detection approach, in which attention aggregation module and dynamic point matching strategy play key roles. The attention aggregation enhances the semantic expression ability of interaction points explicitly by aggregating contextually important information, while the matching strategy can filter the negative HOI pairs effectively in the inference stage. Extensive experiments on two challenging HOI detection benchmarks: VCOCO and HICO-DET show that our method achieves considerable performance compared to state-of-the-art performance without any additional human pose and language features.","PeriodicalId":90079,"journal":{"name":"... International Workshop on Pattern Recognition in NeuroImaging. International Workshop on Pattern Recognition in NeuroImaging","volume":"68 1","pages":"119130H - 119130H-5"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning human-object interactions by attention aggregation\",\"authors\":\"Dongzhou Gu, Shuang Cai, Shiwei Ma\",\"doi\":\"10.1117/12.2604708\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent years, deep neural networks have achieved impressive progress in object detection. However, detecting the interactions between objects is still challenging. Many researchers pay attention to human-object interaction (HOI) detection as a basic task in detailed scene understanding. Most conventional HOI detectors are in a two-stage manner and usually slow in inference. One-stage methods for direct parallel detection of HOI triples breaks through the limitation of object detection, but the extracted features are still insufficient. To overcome these drawbacks above, we propose an improved one-stage HOI detection approach, in which attention aggregation module and dynamic point matching strategy play key roles. The attention aggregation enhances the semantic expression ability of interaction points explicitly by aggregating contextually important information, while the matching strategy can filter the negative HOI pairs effectively in the inference stage. Extensive experiments on two challenging HOI detection benchmarks: VCOCO and HICO-DET show that our method achieves considerable performance compared to state-of-the-art performance without any additional human pose and language features.\",\"PeriodicalId\":90079,\"journal\":{\"name\":\"... International Workshop on Pattern Recognition in NeuroImaging. International Workshop on Pattern Recognition in NeuroImaging\",\"volume\":\"68 1\",\"pages\":\"119130H - 119130H-5\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"... International Workshop on Pattern Recognition in NeuroImaging. International Workshop on Pattern Recognition in NeuroImaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2604708\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"... International Workshop on Pattern Recognition in NeuroImaging. International Workshop on Pattern Recognition in NeuroImaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2604708","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Learning human-object interactions by attention aggregation
Recent years, deep neural networks have achieved impressive progress in object detection. However, detecting the interactions between objects is still challenging. Many researchers pay attention to human-object interaction (HOI) detection as a basic task in detailed scene understanding. Most conventional HOI detectors are in a two-stage manner and usually slow in inference. One-stage methods for direct parallel detection of HOI triples breaks through the limitation of object detection, but the extracted features are still insufficient. To overcome these drawbacks above, we propose an improved one-stage HOI detection approach, in which attention aggregation module and dynamic point matching strategy play key roles. The attention aggregation enhances the semantic expression ability of interaction points explicitly by aggregating contextually important information, while the matching strategy can filter the negative HOI pairs effectively in the inference stage. Extensive experiments on two challenging HOI detection benchmarks: VCOCO and HICO-DET show that our method achieves considerable performance compared to state-of-the-art performance without any additional human pose and language features.