Multiscale Skeleton-Based Temporal Action Segmentation Using Hierarchical Temporal Modeling and Prediction Ensemble

Impact factor 9.4; Zone 1, Computer Science; JCR Q1, AUTOMATION & CONTROL SYSTEMS
Bowen Chen;Wei Nie;Haoyu Ji;Weihong Ren;Qiyi Tong;Zhiyong Wang;Honghai Liu
DOI: 10.1109/TCYB.2025.3559660
Journal: IEEE Transactions on Cybernetics, vol. 55, no. 6, pp. 2779-2791
Published: 2025-04-23
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10974698/
Citation count: 0

Abstract

Skeleton-based temporal action segmentation (TAS) decomposes an untrimmed skeleton sequence into meaningful segments. Because actions vary widely in temporal scale, the skeleton modeling network must balance over-segmentation against under-segmentation. Current methods often rely on parallel multiscale feature extractors and additional refinement modules to mitigate the multiscale issue, which adds significant computation and complexity. To address these issues, this article proposes multiscale skeleton-based TAS (MSTAS), which consists of a temporal probability pyramid (TPP) and a smoothed multiscale ensemble (SME). TPP represents each action as a collection of multiscale probability distributions using a U-shaped hierarchical temporal pyramid. SME then averages these distributions to produce the segmentation, instead of deploying additional refinement stages. Because the predictions at each scale tend to be overconfident, SME incorporates a novel label smoothing phase that improves the probability distributions by dynamically calibrating the confidence of each scale. Experimental results on four public datasets show that MSTAS achieves state-of-the-art performance with lower computational overhead, such as +1.1% accuracy and +2.8% F1@0.5 on the challenging LARa dataset with 70% fewer parameters and 80% fewer GFLOPs. Benefiting from confidence calibration, MSTAS efficiently utilizes more temporal scales while remaining better calibrated on ambiguous action instances. Additionally, the U-shaped pyramid is highly compatible with classical refinement modules, enabling the efficient extraction of multiscale motion representations.
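The abstract describes averaging per-scale probability distributions and training with smoothed labels, but gives no implementation details. The following is only a minimal NumPy sketch of those two generic ideas, not the authors' method. Everything in it is an assumption made for illustration: the function names, nearest-neighbor upsampling to a common length, an unweighted mean across scales, and a fixed smoothing factor `eps` (the paper calibrates confidence dynamically per scale, which is not reproduced here).

```python
import numpy as np

def upsample_nearest(probs, length):
    """Repeat coarse frame-wise distributions so they cover `length` frames.

    Assumption: nearest-neighbor upsampling; the paper may use something else.
    """
    idx = np.arange(length) * probs.shape[0] // length
    return probs[idx]

def ensemble_average(scale_probs, length):
    """Unweighted mean of per-scale distributions at a common temporal length."""
    stacked = np.stack([upsample_nearest(p, length) for p in scale_probs])
    return stacked.mean(axis=0)

def smooth_labels(labels, num_classes, eps):
    """Standard label smoothing with a fixed eps (hypothetical choice).

    The true class gets 1 - eps + eps / num_classes, every other class
    gets eps / num_classes, so each row still sums to 1.
    """
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

# Toy example: two classes, a fine scale (4 frames) and a coarse scale (2 frames).
fine = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]])
coarse = np.array([[0.8, 0.2], [0.4, 0.6]])

avg = ensemble_average([fine, coarse], length=4)
pred = avg.argmax(axis=1)  # frame-wise labels from the averaged distributions
targets = smooth_labels(np.array([0, 1]), num_classes=2, eps=0.1)
```

Because a mean of valid probability distributions is itself a valid distribution, the ensemble output needs no renormalization before taking the frame-wise argmax.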
Source journal
IEEE Transactions on Cybernetics
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS
CiteScore: 25.40
Self-citation rate: 11.00%
Articles published: 1869
Journal description: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.