Knowledge-Driven Compositional Action Recognition
Yang Liu, Fang Liu, Licheng Jiao, Qianyue Bao, Shuo Li, Lingling Li, Xu Liu
Pattern Recognition, Volume 163, Article 111452. Published 2025-02-14. DOI: 10.1016/j.patcog.2025.111452
Citations: 0
Abstract
Human action often involves interaction with objects, so in action recognition, action labels can be defined by compositions of verbs and nouns. It is almost infeasible to collect and annotate enough training data for every possible composition in the real world. Therefore, the main challenge in compositional action recognition is to enable the model to understand “action-objects” compositions that have not been seen during training. We propose a Knowledge-Driven Composition Modulation Model (KCMM), which constructs unseen “action-objects” compositions to improve action recognition generalization. We first design a Grammar Knowledge-Driven Composition (GKC) module, which extracts the labels of verbs and nouns and their corresponding feature representations from compositional actions, and then modulates them under the guidance of grammatical rules to construct new “action-objects” actions. Subsequently, to verify the rationality of the new “action-objects” actions, we design a Common Knowledge-Driven Verification (CKV) module. This module extracts motion commonsense from ConceptNet and infuses it into the compositional labels to improve the comprehensiveness of the verification. It should be noted that GKC does not construct new videos, but directly composes verbs and nouns in the label and feature spaces to obtain new compositional action label-feature pairs. We conduct extensive experiments on the Something-Else and NEU-I datasets, and our method significantly outperforms current state-of-the-art methods in both compositional and few-shot settings. The source code is available at https://github.com/XDLiuyyy/KCMM.
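The core idea behind GKC as described above, that new compositional actions are formed by recombining verbs and nouns from seen actions rather than by generating new videos, can be illustrated with a minimal sketch. The action phrases and pairing logic below are hypothetical placeholders, not the paper's actual label set or grammar rules:

```python
from itertools import product

# Hypothetical set of seen (verb, noun) compositions; the real datasets
# (e.g. Something-Else) use their own label vocabulary.
seen_actions = {
    ("pick up", "cup"),
    ("put down", "book"),
    ("push", "box"),
}

# Split seen compositions into their verb and noun vocabularies.
verbs = {verb for verb, _ in seen_actions}
nouns = {noun for _, noun in seen_actions}

# Recombine verbs and nouns, then keep only the compositions that were
# never observed during training. In the paper these candidates would
# additionally be verified against commonsense knowledge (CKV module).
unseen_compositions = sorted(set(product(verbs, nouns)) - seen_actions)

for verb, noun in unseen_compositions:
    print(f"{verb} {noun}")
```

Analogously, the feature-space side of GKC would pair the verb and noun feature representations for each surviving composition, yielding new label-feature pairs without any new video data.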
Journal Introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.