{"title":"RO-TextCNN Based MUL-MOVE-Net for Camera Motion Classification","authors":"Zeyu Chen, Yana Zhang, Lianyi Zhang, Cheng Yang","doi":"10.1109/icisfall51598.2021.9627386","DOIUrl":null,"url":null,"abstract":"Video auto-editing is a new application of artificial intelligence technology in the media industry. Camera motion is one of the critical characteristics of videos and is vital for shot arrangement. Rules based camera motion classification algorithms generalize poorly from one dataset to another. Machine learning or deep learning based algorithms have better cross-task performance. However, in order to remove the motion information in the foreground area, researchers have to use semantic segmentation neural networks that are computationally expensive. Existing foreground segmentation algorithms are only effective for samples with clear foreground areas. The salient areas generated by the saliency segmentation algorithms are not the same as the foreground areas in many cases. This paper proposes a novel deep learning based camera motion classification framework MUL-MOVE-Net, which is composed of multiple instantaneous motion networks MOVE-Net. In MOVE-Net, a lightweight RO- TextCNN module is proposed to learn multi-scale templates in the 1D angle histogram of optical flow information. Without using semantic segmentation network, the algorithm is capable of foreground fault tolerance while ensuring efficiency. For the experiments, a dataset MOVE-SET is constructed with more than 100, 000 pairs of instantaneous camera motion samples. On the testing set, our algorithm achieves an accuracy of 95.3% and a Macro-F1 value of 0.9385. In the shot-level motion classification task, the accuracy of MUL-MOVE-Net gets 4% higher than that of SGNet, and Macro-F1 0.3 higher. As a result, MUL-MOVE-Net could efficiently classify the camera motion in real time and is helpful for video auto-editing.","PeriodicalId":240142,"journal":{"name":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icisfall51598.2021.9627386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Video auto-editing is a new application of artificial intelligence technology in the media industry. Camera motion is one of the critical characteristics of video and is vital for shot arrangement. Rule-based camera motion classification algorithms generalize poorly from one dataset to another. Machine learning and deep learning based algorithms have better cross-task performance. However, in order to remove the motion information in the foreground area, researchers have had to use semantic segmentation neural networks, which are computationally expensive. Existing foreground segmentation algorithms are only effective for samples with clear foreground areas, and the salient areas generated by saliency segmentation algorithms often differ from the true foreground areas. This paper proposes MUL-MOVE-Net, a novel deep learning based camera motion classification framework composed of multiple instantaneous motion networks (MOVE-Net). In MOVE-Net, a lightweight RO-TextCNN module is proposed to learn multi-scale templates in the 1D angle histogram of optical flow information. Without using a semantic segmentation network, the algorithm achieves foreground fault tolerance while ensuring efficiency. For the experiments, a dataset MOVE-SET is constructed with more than 100,000 pairs of instantaneous camera motion samples. On the test set, our algorithm achieves an accuracy of 95.3% and a Macro-F1 of 0.9385. In the shot-level motion classification task, the accuracy of MUL-MOVE-Net is 4% higher than that of SGNet, and its Macro-F1 is 0.3 higher. As a result, MUL-MOVE-Net can efficiently classify camera motion in real time and is helpful for video auto-editing.
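The abstract describes the network input as a 1D angle histogram of optical flow. The paper's exact construction is not given here, so the following is only a minimal sketch of that kind of representation: dense Farneback optical flow (via OpenCV), flow-vector angles binned into a magnitude-weighted histogram. The bin count (36) and the Farneback parameters are illustrative assumptions, not values from the paper.

    # Sketch: 1D angle histogram of dense optical flow between two frames.
    # Assumptions: 36 bins, magnitude weighting, Farneback flow parameters.
    import cv2
    import numpy as np

    def flow_angle_histogram(prev_gray, curr_gray, num_bins=36):
        """Return a magnitude-weighted 1D histogram of optical-flow angles."""
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, curr_gray, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        # Angles in radians, [0, 2*pi); magnitudes used as histogram weights
        # so strong motions dominate the angular profile.
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        hist, _ = np.histogram(ang, bins=num_bins,
                               range=(0.0, 2.0 * np.pi), weights=mag)
        return hist / (hist.sum() + 1e-8)  # normalize for scale invariance

Weighting by magnitude means a few strongly moving foreground pixels shift the histogram only slightly when the camera motion dominates the frame, which is one plausible reading of the abstract's foreground fault tolerance claim.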
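On top of that histogram, the abstract says RO-TextCNN learns multi-scale templates, i.e., TextCNN-style parallel 1D convolutions with different kernel sizes. The abstract does not define the "RO" prefix or any architectural details, so the sketch below models only the generic multi-scale 1D-convolution idea; the kernel sizes, channel width, and the assumed four motion classes are hypothetical.

    # Sketch: TextCNN-style multi-scale 1D CNN over the angle histogram.
    # This is NOT the paper's RO-TextCNN; it illustrates the multi-scale
    # template idea with assumed hyperparameters.
    import torch
    import torch.nn as nn

    class MultiScale1DConvNet(nn.Module):
        def __init__(self, num_bins=36, num_classes=4,
                     kernel_sizes=(3, 5, 7), channels=16):
            super().__init__()
            # One Conv1d branch per kernel size = one template scale each.
            self.branches = nn.ModuleList(
                nn.Conv1d(1, channels, k, padding=k // 2)
                for k in kernel_sizes)
            self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

        def forward(self, hist):       # hist: (batch, num_bins)
            x = hist.unsqueeze(1)      # -> (batch, 1, num_bins)
            # Global max-pool each branch, as in TextCNN, then concatenate.
            feats = [conv(x).relu().amax(dim=-1) for conv in self.branches]
            return self.fc(torch.cat(feats, dim=-1))  # class logits

    # Usage: logits = MultiScale1DConvNet()(torch.rand(8, 36))

Because the input is a short 1D vector rather than an image, such a module stays far cheaper than the semantic segmentation networks the abstract argues against, which is consistent with the paper's real-time claim.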