MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming

IF 4.3 · CAS Tier 3 (Computer Science) · JCR Q2, Computer Science, Artificial Intelligence
Lin Chen, Jing Zhang, Yian Zhang, Junpeng Kang, Li Zhuo
{"title":"MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming","authors":"Lin Chen ,&nbsp;Jing Zhang ,&nbsp;Yian Zhang ,&nbsp;Junpeng Kang ,&nbsp;Li Zhuo","doi":"10.1016/j.cviu.2024.104109","DOIUrl":null,"url":null,"abstract":"<div><p>Standardized regulation of livestreaming is an important element of cyberspace governance. Temporal action localization (TAL) can localize the occurrence of specific actions to better understand human activities. Due to the short duration and inconspicuous boundaries of human-specific actions, it is very cumbersome to obtain sufficient labeled data for training in untrimmed livestreaming. The point-supervised approach requires only a single-frame annotation for each action instance and can effectively balance cost and performance. Therefore, we propose a memory knowledge propagation network (MKP-Net) for point-supervised temporal action localization in livestreaming, including (1) a plug-and-play memory module is introduced to model prototype features of foreground actions and background knowledge using point-level annotations, (2) the memory knowledge propagation mechanism is used to generate discriminative feature representation in a multi-instance learning pipeline, and (3) localization completeness learning is performed by designing a dual optimization loss for refining and localizing temporal actions. Experimental results show that our method achieves 61.4% and 49.1% SOTAs on THUMOS14 and self-built BJUT-PTAL datasets, respectively, with an inference speed of 711 FPS.</p></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"248 ","pages":"Article 104109"},"PeriodicalIF":4.3000,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314224001905","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Standardized regulation of livestreaming is an important element of cyberspace governance. Temporal action localization (TAL) localizes the occurrence of specific actions in video to better understand human activities. Because human-specific actions are short and have inconspicuous boundaries, obtaining sufficient labeled training data from untrimmed livestreaming video is laborious. The point-supervised approach requires only a single-frame annotation per action instance and effectively balances annotation cost and performance. We therefore propose a memory knowledge propagation network (MKP-Net) for point-supervised temporal action localization in livestreaming, in which (1) a plug-and-play memory module models prototype features of foreground actions and background knowledge from point-level annotations, (2) a memory knowledge propagation mechanism generates discriminative feature representations within a multi-instance learning pipeline, and (3) localization completeness learning is performed via a dual optimization loss that refines and localizes temporal actions. Experimental results show that our method achieves state-of-the-art results of 61.4% and 49.1% on the THUMOS14 and self-built BJUT-PTAL datasets, respectively, with an inference speed of 711 FPS.
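The abstract describes the pipeline only at a high level. As a rough illustration of the general idea behind components (1) and (2), a learnable memory of foreground/background prototypes used to refine snippet features inside a multi-instance learning (MIL) head, consider the minimal PyTorch sketch below. It is not the authors' implementation: all names (`MemoryMILHead`, `num_prototypes`, the top-k pooling ratio) and design details are hypothetical choices made for this example.

```python
# Illustrative sketch only: a memory-augmented MIL head for point-supervised TAL.
# Assumptions (not from the paper): one prototype per class plus background,
# cosine-similarity attention, residual fusion, and top-k temporal pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryMILHead(nn.Module):
    def __init__(self, feat_dim=2048, num_classes=20, num_prototypes=21):
        super().__init__()
        # Learnable memory: prototype features for foreground actions and background.
        self.memory = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        self.classifier = nn.Linear(feat_dim, num_classes + 1)  # +1 for background

    def forward(self, snippets):
        # snippets: (B, T, feat_dim) snippet features of an untrimmed video
        attn = torch.einsum('btd,pd->btp',
                            F.normalize(snippets, dim=-1),
                            F.normalize(self.memory, dim=-1))
        attn = attn.softmax(dim=-1)  # affinity of each snippet to each prototype
        # Propagate memory knowledge back into snippet features (residual fusion).
        refined = snippets + torch.einsum('btp,pd->btd', attn, self.memory)
        cas = self.classifier(refined)  # class activation sequence: (B, T, C+1)
        # MIL: top-k pooling over time yields a video-level prediction.
        k = max(1, snippets.shape[1] // 8)
        video_logits = cas.topk(k, dim=1).values.mean(dim=1)
        return cas, video_logits

# Usage with dummy features: 2 videos, 64 snippets each.
head = MemoryMILHead()
cas, video_logits = head(torch.randn(2, 64, 2048))
```

Under a point-supervised setup, the class activation sequence `cas` could be supervised with a frame-level loss at the annotated single frames while `video_logits` receives a video-level MIL loss; the paper's dual optimization loss for localization completeness is a separate component not sketched here.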

Source Journal
Computer Vision and Image Understanding
Category: Engineering Technology – Electrical & Electronic Engineering
CiteScore: 7.80
Self-citation rate: 4.40%
Articles per year: 112
Review time: 79 days
Journal introduction: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems