Video Object Detection Using Motion Context and Feature Aggregation
Authors: Jaekyum Kim, Junho Koh, J. Choi
Venue: 2020 International Conference on Information and Communication Technology Convergence (ICTC)
Published: 2020-10-21
DOI: 10.1109/ICTC49870.2020.9289386 (https://doi.org/10.1109/ICTC49870.2020.9289386)
Citations: 0
Abstract
Deep learning has recently led to significant improvements in object detection accuracy. Numerous object detection schemes have been designed to process each frame independently. In many applications, however, object detection is performed on video data, which consists of a sequence of image frames, so detection accuracy can be improved by exploiting the temporal context of the video sequence. In this paper, we propose a novel video object detection method that exploits both the motion context of the object and spatio-temporally aggregated features to enhance video object detection performance. First, the motion context of the object is extracted by applying a correlation operator to the feature maps of two adjacent frames. In addition to generating the motion context, the spatial feature maps of N adjacent frames are aggregated using a gated attention network to improve the quality of the feature map.
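The two operations named in the abstract can be illustrated with a rough NumPy sketch. The abstract does not give the exact form of either operation, so everything below is an assumption made for illustration: `local_correlation` uses a displacement-window correlation (window size set by the hypothetical parameter `max_disp`), and `aggregate_features` replaces the learned gated attention network with hand-written per-pixel softmax weights based on cosine similarity to a reference frame. The function names, shapes, and parameters are not taken from the paper.

```python
import numpy as np


def local_correlation(feat_a, feat_b, max_disp=1):
    """Correlate two (C, H, W) feature maps over a displacement window.

    Assumed form, not the paper's: output channel k holds the channel-averaged
    dot product between feat_a at (y, x) and feat_b at (y + dy, x + dx).
    Output shape is ((2 * max_disp + 1) ** 2, H, W).
    """
    c, h, w = feat_a.shape
    d = max_disp
    # Zero-pad feat_b spatially so every displacement is defined at the borders.
    padded = np.pad(feat_b, ((0, 0), (d, d), (d, d)))
    out = np.empty(((2 * d + 1) ** 2, h, w), dtype=np.float64)
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[:, y, x] equals feat_b[:, y + dy, x + dx] (zero outside the map).
            shifted = padded[:, d + dy:d + dy + h, d + dx:d + dx + w]
            out[k] = (feat_a * shifted).sum(axis=0) / c
            k += 1
    return out


def aggregate_features(feats, ref_index):
    """Fuse N (C, H, W) feature maps with per-pixel softmax weights.

    Stand-in for a learned gate: weights come from the cosine similarity
    between each frame's features and the reference frame's features.
    """
    stack = np.stack(feats)                       # (N, C, H, W)
    ref = stack[ref_index]                        # (C, H, W)
    eps = 1e-8                                    # avoids division by zero
    norms = np.linalg.norm(stack, axis=1) * np.linalg.norm(ref, axis=0) + eps
    sim = (stack * ref).sum(axis=1) / norms       # (N, H, W)
    weights = np.exp(sim - sim.max(axis=0))       # numerically stable softmax over N
    weights /= weights.sum(axis=0)
    fused = (weights[:, None] * stack).sum(axis=0)
    return fused, weights


# Toy usage: random arrays stand in for backbone feature maps of 3 adjacent frames.
rng = np.random.default_rng(0)
frames = [rng.standard_normal((8, 6, 6)) for _ in range(3)]
motion = local_correlation(frames[1], frames[0], max_disp=1)
fused, weights = aggregate_features(frames, ref_index=1)
print(motion.shape, fused.shape)
```

In a trained detector both steps would typically be differentiable layers with learned parameters; the fixed cosine-similarity weighting here only shows how per-pixel weights over N frames can combine feature maps into one.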