Adaptive Multi-tasking Framework for Video Action Proposal Localization
H. Jia
2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML), March 2022
DOI: 10.1109/CACML55074.2022.00023
Abstract
This paper focuses on improving the accuracy of activity detection in multi-camera and extended video streams. Most existing methods sample frames from the video with a sliding window. Action localization in video can be divided into two phases: temporal proposal generation and action classification. In the proposal generation phase, most works adopt static sampling; that is, at evaluation time the same sampling rules are applied to every input video. We argue that the training data also contains information that can guide proposal generation. In this paper, we propose an Adaptive Multi-tasking Framework that automatically generates proposals tailored to each input video. For each video, the framework first establishes a mapping from visual signals to proposal boundaries (the start and end frames of each proposal), and then passes the generated proposals to the state-of-the-art SlowFast model to perform action classification. We call this framework the Adaptive Proposal Generation Network (APGN). We train and test our model on the VIRAT dataset, which consists of real outdoor videos of actions performed by non-actors. We expect that combining our model with existing activity detection networks built on conventional sliding-window methods will improve their detection accuracy. In tests with the SlowFast network, our framework improves mean Average Precision (mAP) by more than 10 percent. We believe that other models can likewise gain accuracy and performance by replacing the typical sliding-window framework with the proposed one, which we will explore further in future work.
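The abstract contrasts static sliding-window sampling with proposals adapted to each video. The following is a minimal sketch of the static baseline only, not the APGN method: it generates fixed-length windows with a fixed stride and scores them against a ground-truth segment using temporal IoU, a common way to match proposals to annotated actions. The window length (64), stride (32), clip length, and segments are hypothetical values chosen for illustration, and the "adaptive" proposal is a hand-written stand-in for boundaries that a learned model would predict, not output from any model.

```python
from typing import List, Tuple

# A segment is (start_frame, end_frame) with end exclusive.
Segment = Tuple[int, int]


def sliding_window_proposals(num_frames: int, window: int, stride: int) -> List[Segment]:
    """Static sampling: the same window length and stride for every video.

    In this sketch, trailing frames that do not fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    proposals: List[Segment] = []
    start = 0
    while start + window <= num_frames:
        proposals.append((start, start + window))
        start += stride
    return proposals


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two half-open frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def best_iou(proposals: List[Segment], gt: Segment) -> float:
    """Highest temporal IoU any proposal achieves against one ground-truth segment."""
    return max((temporal_iou(p, gt) for p in proposals), default=0.0)


if __name__ == "__main__":
    # Hypothetical 300-frame clip containing one 30-frame action.
    gt_action: Segment = (40, 70)

    fixed = sliding_window_proposals(300, window=64, stride=32)
    print(len(fixed), round(best_iou(fixed, gt_action), 3))  # prints: 8 0.469

    # Hand-written stand-in for a proposal whose boundaries were predicted per video.
    adaptive = [(38, 72)]
    print(round(best_iou(adaptive, gt_action), 3))  # prints: 0.882
```

Because the fixed windows are laid out before the video is seen, the best achievable overlap depends on where the action happens to fall relative to the grid; a generator that predicts start and end frames per video is not bound by that grid. Whether this translates into the mAP gain reported above depends on the actual model and is not shown by this sketch.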