Xiaotian Wang, Xiang Jiang, Zhifu Zhao, Kexin Wang, Yifan Yang
Title: Exploring interaction: Inner-outer spatial–temporal transformer for skeleton-based mutual action recognition
DOI: 10.1016/j.neucom.2025.130007
Journal: Neurocomputing, Volume 636, Article 130007 (JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-03-20
URL: https://www.sciencedirect.com/science/article/pii/S0925231225006794
Citation count: 0
Abstract
Transformer-based methods have achieved significant results in skeleton-based action recognition. However, when dealing with two-person interaction, existing approaches normally embed each person's skeleton separately and then introduce an additional module to learn their interactions. This risks losing the spatial and semantic connection information between the two entities, which is crucial for interaction identification. To address this issue, a unified interactive spatial–temporal transformer is proposed in this paper. First, a Two-Person Embedding (TPE) is performed to provide a holistic representation of the interactive relationship, which effectively avoids the information gap caused by dividing the interacting entities. Second, an innovative Inner-Outer Transformer (IOformer), combined with a new spatio-temporal partition strategy, is proposed to simultaneously learn the interactions between intra-partition joints and inter-partition skeletal parts. By comprehensively capturing the key spatio-temporal interactive features, the accuracy and robustness of interaction recognition are significantly improved. Extensive experiments on three challenging benchmark datasets validate that our method achieves superior performance under comprehensive evaluation protocols.
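The core idea behind the Two-Person Embedding described above can be sketched in a few lines: rather than tokenizing each skeleton separately, both skeletons are merged into a single joint set before projection, so that cross-person spatial relations remain visible to subsequent attention layers. The snippet below is a minimal NumPy illustration of that idea; the shapes (T frames, V joints, C coordinate channels, D embedding dimension) and the shared linear projection are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

# Hypothetical shapes: T frames, V joints per person, C coordinate channels,
# D embedding dimension.
T, V, C, D = 16, 25, 3, 64

rng = np.random.default_rng(0)
person_a = rng.standard_normal((T, V, C))
person_b = rng.standard_normal((T, V, C))

# Two-Person Embedding (sketch): treat both skeletons as one 2V-joint graph,
# so later attention layers see intra- and inter-person relations uniformly,
# instead of embedding each person separately and fusing afterwards.
joint_skeleton = np.concatenate([person_a, person_b], axis=1)  # (T, 2V, C)

# Shared linear projection of each joint's coordinates into a D-dim token.
W = rng.standard_normal((C, D)) * 0.02
tokens = joint_skeleton @ W  # (T, 2V, D)

print(tokens.shape)
```

The contrast with the separate-embedding baseline criticized in the abstract is that, here, a single token sequence of length 2V per frame carries both entities, so no extra interaction module is needed to reconnect them.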
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.