RO-TextCNN Based MUL-MOVE-Net for Camera Motion Classification

Zeyu Chen, Yana Zhang, Lianyi Zhang, Cheng Yang
{"title":"RO-TextCNN Based MUL-MOVE-Net for Camera Motion Classification","authors":"Zeyu Chen, Yana Zhang, Lianyi Zhang, Cheng Yang","doi":"10.1109/icisfall51598.2021.9627386","DOIUrl":null,"url":null,"abstract":"Video auto-editing is a new application of artificial intelligence technology in the media industry. Camera motion is one of the critical characteristics of videos and is vital for shot arrangement. Rules based camera motion classification algorithms generalize poorly from one dataset to another. Machine learning or deep learning based algorithms have better cross-task performance. However, in order to remove the motion information in the foreground area, researchers have to use semantic segmentation neural networks that are computationally expensive. Existing foreground segmentation algorithms are only effective for samples with clear foreground areas. The salient areas generated by the saliency segmentation algorithms are not the same as the foreground areas in many cases. This paper proposes a novel deep learning based camera motion classification framework MUL-MOVE-Net, which is composed of multiple instantaneous motion networks MOVE-Net. In MOVE-Net, a lightweight RO- TextCNN module is proposed to learn multi-scale templates in the 1D angle histogram of optical flow information. Without using semantic segmentation network, the algorithm is capable of foreground fault tolerance while ensuring efficiency. For the experiments, a dataset MOVE-SET is constructed with more than 100, 000 pairs of instantaneous camera motion samples. On the testing set, our algorithm achieves an accuracy of 95.3% and a Macro-F1 value of 0.9385. In the shot-level motion classification task, the accuracy of MUL-MOVE-Net gets 4% higher than that of SGNet, and Macro-F1 0.3 higher. As a result, MUL-MOVE-Net could efficiently classify the camera motion in real time and is helpful for video auto-editing.","PeriodicalId":240142,"journal":{"name":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icisfall51598.2021.9627386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Video auto-editing is a new application of artificial intelligence technology in the media industry. Camera motion is one of the critical characteristics of video and is vital for shot arrangement. Rule-based camera motion classification algorithms generalize poorly from one dataset to another. Machine-learning and deep-learning based algorithms have better cross-task performance. However, in order to remove the motion information in the foreground area, researchers have had to use semantic segmentation neural networks, which are computationally expensive. Existing foreground segmentation algorithms are effective only for samples with clear foreground areas, and the salient areas generated by saliency segmentation algorithms often differ from the true foreground areas. This paper proposes a novel deep-learning-based camera motion classification framework, MUL-MOVE-Net, which is composed of multiple instantaneous motion networks (MOVE-Net). In MOVE-Net, a lightweight RO-TextCNN module is proposed to learn multi-scale templates from the 1D angle histogram of optical flow information. Without using a semantic segmentation network, the algorithm achieves foreground fault tolerance while ensuring efficiency. For the experiments, a dataset, MOVE-SET, is constructed with more than 100,000 pairs of instantaneous camera motion samples. On the testing set, our algorithm achieves an accuracy of 95.3% and a Macro-F1 of 0.9385. In the shot-level motion classification task, the accuracy of MUL-MOVE-Net is 4% higher than that of SGNet, and its Macro-F1 is 0.3 higher. As a result, MUL-MOVE-Net can efficiently classify camera motion in real time and is helpful for video auto-editing.
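The abstract only gives the high-level pipeline, so the sketch below is a minimal illustration rather than the published implementation: it reduces dense optical flow between two frames to a magnitude-weighted 1D angle histogram and classifies it with a TextCNN-style 1D CNN whose parallel kernel sizes act as multi-scale templates over that histogram. The bin count, kernel sizes, filter widths, and five-way class set are all assumptions; the paper's exact RO-TextCNN configuration is not specified here.

```python
# Illustrative sketch only: bin counts, kernel sizes, channel widths, and the
# motion label set below are assumptions, not the published RO-TextCNN setup.
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

N_BINS = 64          # assumed number of angle-histogram bins
MOTION_CLASSES = 5   # assumed label set, e.g. static / pan / tilt / zoom / roll

def flow_angle_histogram(prev_gray, curr_gray, n_bins=N_BINS):
    """Dense optical flow between two grayscale frames, reduced to a 1D
    histogram over flow angles, weighted by flow magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)  # normalize away frame-scale effects

class TextCNN1D(nn.Module):
    """TextCNN-style 1D CNN: parallel convolutions with different kernel
    sizes act as multi-scale templates over the angle histogram."""
    def __init__(self, n_bins=N_BINS, n_classes=MOTION_CLASSES,
                 kernel_sizes=(3, 5, 7), n_filters=32):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(1, n_filters, k, padding=k // 2) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, x):            # x: (batch, n_bins)
        x = x.unsqueeze(1)           # -> (batch, 1, n_bins)
        # Global max pooling per scale, then concatenate the scale features.
        feats = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))

# Example usage (hypothetical frames):
#   h = flow_angle_histogram(frame0_gray, frame1_gray)
#   logits = TextCNN1D()(torch.tensor(h, dtype=torch.float32).unsqueeze(0))
```

In spirit, each such per-frame-pair classifier plays the role of an instantaneous MOVE-Net, and MUL-MOVE-Net would aggregate several of their outputs to produce a shot-level motion label.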