Adaptive Multi-tasking Framework for Video Action Proposal Localization

2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML). Pub Date: 2022-03-01. DOI: 10.1109/CACML55074.2022.00023

H. Jia



Abstract

This paper aims to improve the accuracy of activity detection in multi-camera and extended video streams. Most existing methods sample frames from the video with a sliding window. Action localization in video can be divided into two phases: temporal proposal generation and action classification. In the proposal generation stage, most works use a static sampling method; that is, at evaluation time the same sampling rules are applied to every input video. We observe that the training data also contains guidance information for generating proposals. In this paper, we propose an Adaptive Multi-tasking Framework that generates proposals automatically according to the input video. For each video, we first establish a mapping from visual signals to proposal boundaries (the starting and ending frames of the proposal), and then combine the generated proposals with the state-of-the-art SlowFast model to complete the action classification task. We call this framework the Adaptive Proposal Generation Network (APGN). We train and test our model on the VIRAT dataset, which consists of real outdoor video of actions performed by non-actors. We expect that combining our model with existing activity detection networks based on conventional methods will improve their detection accuracy. In tests with the SlowFast network, we improve Mean Average Precision (mAP) by more than 10 percent. We believe that replacing the typical sliding-window framework with our proposed framework can improve the accuracy and performance of other models, which we will explore in future work.
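The static sliding-window baseline that the abstract contrasts with adaptive proposal generation can be illustrated with a minimal sketch. This is not the paper's APGN implementation. The function names, the half-open `(start_frame, end_frame)` convention, and the `window` and `stride` parameters are assumptions made for illustration only. The temporal IoU helper is the usual overlap measure for comparing a proposal with an annotated action segment.

```python
from typing import List, Tuple

# Hypothetical convention: (start_frame, end_frame), with end_frame exclusive.
Proposal = Tuple[int, int]


def sliding_window_proposals(num_frames: int, window: int, stride: int) -> List[Proposal]:
    """Static baseline: the same window length and stride for every video."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    proposals = []
    # Slide a fixed-length window over the video; clip the last window to the video end.
    for start in range(0, max(num_frames - window, 0) + 1, stride):
        proposals.append((start, min(start + window, num_frames)))
    return proposals


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection over union of two frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    windows = sliding_window_proposals(num_frames=10, window=4, stride=3)
    action = (2, 9)  # hypothetical annotated action segment
    for w in windows:
        print(w, round(temporal_iou(w, action), 3))
```

In this toy setting no fixed window lines up with the annotated segment, which is the limitation that predicting the start and end frames from the video itself is meant to address.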

