Video to Text Study using an Encoder-Decoder Networks Approach

2018 37th International Conference of the Chilean Computer Science Society (SCCC) Pub Date : 2018-11-01 DOI:10.1109/SCCC.2018.8705254

Carlos Ismael Orozco, M. Buemi, J. Jacobo-Berlles

Citations: 0

Abstract

The automatic generation of video descriptions is currently a topic of interest in computer vision because of applications such as web indexing and video description for people with visual disabilities. In this work we present an encoder-decoder neural network architecture. First, a 3D Convolutional Neural Network extracts features from the input video. Then, a Long Short-Term Memory (LSTM) network decodes the resulting feature vector to automatically generate a description of the video. For training and testing we use the Microsoft Video Description Corpus (MSVD) data set. We evaluate the performance of our system using the evaluation of the COCO Image Captioning Challenge. We obtain 0.3984, 0.2941 and 0.5052 for the BLEU, METEOR and CIDEr metrics, respectively. These results are competitive with those reported in the literature.
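To make the reported BLEU figure easier to interpret, the following is a minimal Python sketch of a sentence-level BLEU score (clipped n-gram precision combined with a brevity penalty). It is an illustration only, not the evaluation code used by the authors: the COCO evaluation computes corpus-level scores with its own tokenization, and the function names `ngrams`, `modified_precision` and `sentence_bleu`, the whitespace tokenization, and the absence of smoothing are all assumptions of this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Geometric mean of clipped precisions for n = 1..max_n,
    multiplied by the brevity penalty. No smoothing (assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # any zero precision gives a zero score without smoothing
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Use the reference length closest to the candidate length (ties: shorter).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical captions, not taken from MSVD.
    references = ["a man is playing a guitar".split(),
                  "someone plays the guitar".split()]
    candidate = "a man is playing a guitar".split()
    print(sentence_bleu(candidate, references))
```

A score near 1 means the generated caption shares most of its n-grams with at least one reference, while a caption with no matching 4-gram scores 0 in this unsmoothed form. METEOR and CIDEr use different matching and weighting schemes and are not covered by this sketch.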

