Research on video summarization method based on convolutional neural network

Neural Networks, Information and Communication Engineering. Pub Date: 2022-06-30. DOI: 10.1117/12.2639224

Ke-xin Zheng, Xiang Chen


Citations: 0

Abstract



Short videos on the Internet are growing exponentially, and the number of videos uploaded every day is huge; people also encounter large amounts of video data in daily life. Although people can retrieve and view all kinds of videos, this abundance brings problems. On the one hand, the accumulation of videos makes it hard to find a desired video quickly, and repeated scenes within videos waste viewers' time and energy; on the other hand, the sheer volume of video data puts enormous pressure on storage. To address inaccurate key-frame selection and the question of which video frame features to use in existing video summarization models, this paper proposes a multi-feature video summarization model (DME-VSNet) that extracts three features from each video frame: importance score, image memory strength, and image entropy. To address inaccurate video shot segmentation, the model uses a shot segmentation algorithm based on the TransNet network, which divides the original video into several short shots at the shot boundaries. The model then feeds the three features into the proposed MLP architecture to obtain a score for each video frame, and key frames are selected by score to generate the video summary. Comparative experiments verify the effectiveness of both the TransNet-based shot segmentation method and the overall model based on a convolutional neural network. The experimental results show that video summaries generated from the three features achieve better evaluation results.
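The scoring and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the MLP's layer sizes or weights, the exact entropy definition, or the selection rule. The sketch assumes Shannon entropy over gray levels, a one-hidden-layer MLP with hand-picked weights, and one key frame per shot (the highest-scoring one). The function names `image_entropy`, `mlp_score`, and `select_key_frames` are hypothetical, and the shot boundaries are supplied by hand rather than predicted by TransNet.

```python
import math
from collections import Counter


def image_entropy(gray_pixels):
    """Shannon entropy (in bits) of a flat sequence of gray levels (assumed definition)."""
    counts = Counter(gray_pixels)
    total = len(gray_pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mlp_score(features, w1, b1, w2, b2):
    """One-hidden-layer MLP: ReLU hidden units, sigmoid output in (0, 1).

    The layer layout is an assumption; the abstract only says "MLP architecture".
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def select_key_frames(scores, shot_boundaries):
    """Pick the highest-scoring frame index in each shot (assumed selection rule).

    shot_boundaries is a list of (start, end) pairs with end exclusive.
    """
    return [max(range(start, end), key=lambda i: scores[i])
            for start, end in shot_boundaries]


# Toy usage: four frames, each described by
# (importance score, memory strength, entropy scaled to [0, 1]).
frames = [(0.2, 0.4, 0.5), (0.9, 0.8, 0.7), (0.3, 0.1, 0.6), (0.6, 0.7, 0.9)]
w1 = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.25]]  # hypothetical weights
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = -1.0
scores = [mlp_score(f, w1, b1, w2, b2) for f in frames]
shots = [(0, 2), (2, 4)]  # hand-written boundaries standing in for TransNet output
print(select_key_frames(scores, shots))
```

In a real pipeline the weights would be learned from annotated summaries and the per-frame features would come from the corresponding feature extractors; the sketch only shows how three scalar features per frame can be fused into one score and how shot boundaries constrain key-frame selection.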
