基于全局时空信息编码器-解码器的无剪辑视频中的动作分割

IF 6.6 1区 计算机科学 Q1 Multidisciplinary
Yichao Liu;Yiyang Sun;Zhide Chen;Chen Feng;Kexin Zhu
{"title":"基于全局时空信息编码器-解码器的无剪辑视频中的动作分割","authors":"Yichao Liu;Yiyang Sun;Zhide Chen;Chen Feng;Kexin Zhu","doi":"10.26599/TST.2024.9010041","DOIUrl":null,"url":null,"abstract":"Action segmentation has made significant progress, but segmenting and recognizing actions from untrimmed long videos remains a challenging problem. Most state-of-the-art methods focus on designing models based on temporal convolution. However, the limitations of modeling long-term temporal dependencies and the inflexibility of temporal convolutions restrict the potential of these models. To address the issue of over-segmentation in existing action segmentation methods, which leads to classification errors and reduced segmentation quality, this paper proposes a global spatial-temporal information encoder-decoder based action segmentation method. The method proposed in this paper uses the global temporal information captured by refinement layer to assist the Encoder-Decoder (ED) structure in judging the action segmentation point more accurately and, at the same time, suppress the excessive segmentation phenomenon caused by the ED structure. The method proposed in this paper achieves 93% frame accuracy on the constructed real Tai Chi action dataset. The experimental results prove that this method can accurately and efficiently complete the long video action segmentation task.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 1","pages":"290-302"},"PeriodicalIF":6.6000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10676351","citationCount":"0","resultStr":"{\"title\":\"Global Spatial-Temporal Information Encoder-Decoder Based Action Segmentation in Untrimmed Video\",\"authors\":\"Yichao Liu;Yiyang Sun;Zhide Chen;Chen Feng;Kexin Zhu\",\"doi\":\"10.26599/TST.2024.9010041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Action segmentation has made significant progress, but segmenting and recognizing actions from untrimmed long videos remains a challenging problem. Most state-of-the-art methods focus on designing models based on temporal convolution. However, the limitations of modeling long-term temporal dependencies and the inflexibility of temporal convolutions restrict the potential of these models. To address the issue of over-segmentation in existing action segmentation methods, which leads to classification errors and reduced segmentation quality, this paper proposes a global spatial-temporal information encoder-decoder based action segmentation method. The method proposed in this paper uses the global temporal information captured by refinement layer to assist the Encoder-Decoder (ED) structure in judging the action segmentation point more accurately and, at the same time, suppress the excessive segmentation phenomenon caused by the ED structure. The method proposed in this paper achieves 93% frame accuracy on the constructed real Tai Chi action dataset. The experimental results prove that this method can accurately and efficiently complete the long video action segmentation task.\",\"PeriodicalId\":48690,\"journal\":{\"name\":\"Tsinghua Science and Technology\",\"volume\":\"30 1\",\"pages\":\"290-302\"},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2024-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10676351\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Tsinghua Science and Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10676351/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Multidisciplinary\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Tsinghua Science and Technology","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10676351/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Multidisciplinary","Score":null,"Total":0}
引用次数: 0

摘要

动作分割技术已经取得了重大进展,但从未经剪辑的长视频中分割和识别动作仍然是一个具有挑战性的问题。大多数最先进的方法都侧重于设计基于时态卷积的模型。然而,长期时间依赖性建模的局限性和时间卷积的不灵活性限制了这些模型的潜力。为了解决现有动作分割方法中存在的过度分割问题,即导致分类错误和分割质量下降的问题,本文提出了一种基于全局时空信息编码器-解码器的动作分割方法。本文提出的方法利用细化层捕获的全局时空信息,辅助编码器-解码器(ED)结构更准确地判断动作分割点,同时抑制 ED 结构造成的过度分割现象。本文提出的方法在构建的真实太极拳动作数据集上实现了 93% 的帧准确率。实验结果证明,该方法可以准确高效地完成长视频动作分割任务。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Global Spatial-Temporal Information Encoder-Decoder Based Action Segmentation in Untrimmed Video
Action segmentation has made significant progress, but segmenting and recognizing actions from untrimmed long videos remains a challenging problem. Most state-of-the-art methods focus on designing models based on temporal convolution. However, the limitations of modeling long-term temporal dependencies and the inflexibility of temporal convolutions restrict the potential of these models. To address the issue of over-segmentation in existing action segmentation methods, which leads to classification errors and reduced segmentation quality, this paper proposes a global spatial-temporal information encoder-decoder based action segmentation method. The method proposed in this paper uses the global temporal information captured by refinement layer to assist the Encoder-Decoder (ED) structure in judging the action segmentation point more accurately and, at the same time, suppress the excessive segmentation phenomenon caused by the ED structure. The method proposed in this paper achieves 93% frame accuracy on the constructed real Tai Chi action dataset. The experimental results prove that this method can accurately and efficiently complete the long video action segmentation task.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Tsinghua Science and Technology
Tsinghua Science and Technology COMPUTER SCIENCE, INFORMATION SYSTEMSCOMPU-COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore
10.20
自引率
10.60%
发文量
2340
期刊介绍: Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. This journal aims at presenting the up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions all over the world are welcome.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信