Bowen Chen;Wei Nie;Haoyu Ji;Weihong Ren;Qiyi Tong;Zhiyong Wang;Honghai Liu
IEEE Transactions on Cybernetics, vol. 55, no. 6, pp. 2779-2791. Publication date: 2025-04-23. DOI: 10.1109/TCYB.2025.3559660. URL: https://ieeexplore.ieee.org/document/10974698/. Journal impact factor: 9.4 (JCR Q1, Automation & Control Systems).
Multiscale Skeleton-Based Temporal Action Segmentation Using Hierarchical Temporal Modeling and Prediction Ensemble
Skeleton-based temporal action segmentation (TAS) decomposes an untrimmed skeleton sequence into meaningful segments. Because actions vary widely in temporal scale, the skeleton modeling network must strike a balance between over-segmentation and under-segmentation. Current methods often rely on parallel multiscale feature extractors and additional refinement modules to mitigate the multiscale issue, which adds significant computation and complexity. To address these issues, this article proposes multiscale skeleton-based TAS (MSTAS), consisting of a temporal probability pyramid (TPP) and a smoothed multiscale ensemble (SME). TPP represents each action as a collection of multiscale probability distributions using a U-shaped hierarchical temporal pyramid. SME then averages these distributions to produce the action segmentation, instead of deploying additional refinement stages. Because each scale tends to be overconfident, SME incorporates a novel label smoothing phase that improves the probability distributions by dynamically calibrating the confidence of each scale. Experimental results on four public datasets show that MSTAS achieves state-of-the-art performance with lower computational overhead, for example +1.1% accuracy and +2.8% F1@0.5 on the challenging LARa dataset with 70% fewer parameters and 80% fewer GFLOPs. Benefiting from confidence calibration, MSTAS efficiently utilizes more temporal scales while remaining better calibrated on ambiguous action instances. Additionally, the U-shaped pyramid is highly compatible with classical refinement modules, enabling the efficient extraction of multiscale motion representations.
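The ensemble step described in the abstract (averaging per-scale probability distributions rather than adding refinement stages) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the nearest-neighbor upsampling, and the fixed smoothing factor `eps` are all assumptions made for illustration. In particular, `smooth_label` shows only standard uniform label smoothing, not the dynamic per-scale calibration the paper proposes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def upsample_nearest(frames, target_len):
    """Repeat coarse-scale frames so the sequence has target_len entries."""
    src_len = len(frames)
    return [frames[min(t * src_len // target_len, src_len - 1)]
            for t in range(target_len)]


def ensemble_average(per_scale_logits, target_len):
    """Average per-frame class probabilities across temporal scales."""
    num_scales = len(per_scale_logits)
    num_classes = len(per_scale_logits[0][0])
    avg = [[0.0] * num_classes for _ in range(target_len)]
    for logits in per_scale_logits:
        probs = upsample_nearest([softmax(f) for f in logits], target_len)
        for t in range(target_len):
            for c in range(num_classes):
                avg[t][c] += probs[t][c] / num_scales
    return avg


def smooth_label(true_class, num_classes, eps=0.1):
    """Standard uniform label smoothing with a fixed eps (assumption:
    the paper's dynamic calibration is NOT reproduced here)."""
    off = eps / num_classes
    return [(1.0 - eps) + off if c == true_class else off
            for c in range(num_classes)]


def predict(avg_probs):
    """Pick the most probable class for every frame."""
    return [max(range(len(p)), key=p.__getitem__) for p in avg_probs]


# Toy input: a fine scale (4 frames) and a coarse scale (2 frames), 2 classes.
fine = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
coarse = [[1.0, 0.0], [0.0, 1.0]]
avg_probs = ensemble_average([fine, coarse], target_len=4)
labels = predict(avg_probs)  # [0, 0, 1, 1]
```

Averaging probabilities rather than logits keeps each scale's contribution bounded, so a single overconfident scale cannot dominate the ensemble as easily. That is one plausible reason calibration matters for this kind of design.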
About the journal:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.