{"title":"RO-TextCNN Based MUL-MOVE-Net for Camera Motion Classification","authors":"Zeyu Chen, Yana Zhang, Lianyi Zhang, Cheng Yang","doi":"10.1109/icisfall51598.2021.9627386","DOIUrl":null,"url":null,"abstract":"Video auto-editing is a new application of artificial intelligence technology in the media industry. Camera motion is one of the critical characteristics of videos and is vital for shot arrangement. Rules based camera motion classification algorithms generalize poorly from one dataset to another. Machine learning or deep learning based algorithms have better cross-task performance. However, in order to remove the motion information in the foreground area, researchers have to use semantic segmentation neural networks that are computationally expensive. Existing foreground segmentation algorithms are only effective for samples with clear foreground areas. The salient areas generated by the saliency segmentation algorithms are not the same as the foreground areas in many cases. This paper proposes a novel deep learning based camera motion classification framework MUL-MOVE-Net, which is composed of multiple instantaneous motion networks MOVE-Net. In MOVE-Net, a lightweight RO- TextCNN module is proposed to learn multi-scale templates in the 1D angle histogram of optical flow information. Without using semantic segmentation network, the algorithm is capable of foreground fault tolerance while ensuring efficiency. For the experiments, a dataset MOVE-SET is constructed with more than 100, 000 pairs of instantaneous camera motion samples. On the testing set, our algorithm achieves an accuracy of 95.3% and a Macro-F1 value of 0.9385. In the shot-level motion classification task, the accuracy of MUL-MOVE-Net gets 4% higher than that of SGNet, and Macro-F1 0.3 higher. As a result, MUL-MOVE-Net could efficiently classify the camera motion in real time and is helpful for video auto-editing.","PeriodicalId":240142,"journal":{"name":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icisfall51598.2021.9627386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Video auto-editing is a new application of artificial intelligence technology in the media industry. Camera motion is one of the critical characteristics of video and is vital for shot arrangement. Rule-based camera motion classification algorithms generalize poorly from one dataset to another. Machine learning and deep learning based algorithms have better cross-task performance. However, in order to remove the motion information in the foreground area, researchers have had to use semantic segmentation neural networks, which are computationally expensive. Existing foreground segmentation algorithms are only effective for samples with clear foreground areas, and the salient areas generated by saliency segmentation algorithms often differ from the true foreground areas. This paper proposes MUL-MOVE-Net, a novel deep learning based camera motion classification framework composed of multiple instantaneous motion networks (MOVE-Net). In MOVE-Net, a lightweight RO-TextCNN module is proposed to learn multi-scale templates in the 1D angle histogram of optical flow information. Without using a semantic segmentation network, the algorithm achieves foreground fault tolerance while ensuring efficiency. For the experiments, a dataset MOVE-SET is constructed with more than 100,000 pairs of instantaneous camera motion samples. On the test set, our algorithm achieves an accuracy of 95.3% and a Macro-F1 of 0.9385. In the shot-level motion classification task, the accuracy of MUL-MOVE-Net is 4% higher than that of SGNet, and its Macro-F1 is 0.3 higher. As a result, MUL-MOVE-Net can efficiently classify camera motion in real time and is helpful for video auto-editing.
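The abstract describes the network input as a 1D angle histogram of optical flow. The paper's exact construction is not given here, so the following is only a minimal sketch of that kind of representation: dense Farneback optical flow (via OpenCV), flow-vector angles binned into a magnitude-weighted histogram. The bin count (36) and the Farneback parameters are illustrative assumptions, not values from the paper.

    # Sketch: 1D angle histogram of dense optical flow between two frames.
    # Assumptions: 36 bins, magnitude weighting, Farneback flow parameters.
    import cv2
    import numpy as np

    def flow_angle_histogram(prev_gray, curr_gray, num_bins=36):
        """Return a magnitude-weighted 1D histogram of optical-flow angles."""
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, curr_gray, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        # Angles in radians, [0, 2*pi); magnitudes used as histogram weights
        # so strong motions dominate the angular profile.
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        hist, _ = np.histogram(ang, bins=num_bins,
                               range=(0.0, 2.0 * np.pi), weights=mag)
        return hist / (hist.sum() + 1e-8)  # normalize for scale invariance

Weighting by magnitude means a few strongly moving foreground pixels shift the histogram only slightly when the camera motion dominates the frame, which is one plausible reading of the abstract's foreground fault tolerance claim.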
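On top of that histogram, the abstract says RO-TextCNN learns multi-scale templates, i.e., TextCNN-style parallel 1D convolutions with different kernel sizes. The abstract does not define the "RO" prefix or any architectural details, so the sketch below models only the generic multi-scale 1D-convolution idea; the kernel sizes, channel width, and the assumed four motion classes are hypothetical.

    # Sketch: TextCNN-style multi-scale 1D CNN over the angle histogram.
    # This is NOT the paper's RO-TextCNN; it illustrates the multi-scale
    # template idea with assumed hyperparameters.
    import torch
    import torch.nn as nn

    class MultiScale1DConvNet(nn.Module):
        def __init__(self, num_bins=36, num_classes=4,
                     kernel_sizes=(3, 5, 7), channels=16):
            super().__init__()
            # One Conv1d branch per kernel size = one template scale each.
            self.branches = nn.ModuleList(
                nn.Conv1d(1, channels, k, padding=k // 2)
                for k in kernel_sizes)
            self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

        def forward(self, hist):       # hist: (batch, num_bins)
            x = hist.unsqueeze(1)      # -> (batch, 1, num_bins)
            # Global max-pool each branch, as in TextCNN, then concatenate.
            feats = [conv(x).relu().amax(dim=-1) for conv in self.branches]
            return self.fc(torch.cat(feats, dim=-1))  # class logits

    # Usage: logits = MultiScale1DConvNet()(torch.rand(8, 36))

Because the input is a short 1D vector rather than an image, such a module stays far cheaper than the semantic segmentation networks the abstract argues against, which is consistent with the paper's real-time claim.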