{"title":"使用编码器-解码器网络方法进行视频到文本的学习","authors":"Carlos Ismael Orozco, M. Buemi, J. Jacobo-Berlles","doi":"10.1109/SCCC.2018.8705254","DOIUrl":null,"url":null,"abstract":"The automatic generation of video description is currently a topic of interest in computer vision due to applications such as web indexation, video description for people with visual disabilities, among others. In this work we present a Neural Network architecture Encoder-Decoder. First, a Convolutional Neural Network 3D extracts the features of the input video. Then, an Long Short-Term Memory decodes the vector to automatically generate the description of the video. To perform the training and testing we use the Microsoft Video Description Corpus data set (MSVD). Evaluate the performance of our system using the challenge of COCO Image Captioning Challenge. We obtain as results 0.3984, 0.2941 and 0.5052 for the BLEU, METEOR and CIDEr metrics respectively. Competitive results compared with certificates in the bibliography.","PeriodicalId":235495,"journal":{"name":"2018 37th International Conference of the Chilean Computer Science Society (SCCC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Video to Text Study using an Encoder-Decoder Networks Approach\",\"authors\":\"Carlos Ismael Orozco, M. Buemi, J. Jacobo-Berlles\",\"doi\":\"10.1109/SCCC.2018.8705254\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The automatic generation of video description is currently a topic of interest in computer vision due to applications such as web indexation, video description for people with visual disabilities, among others. In this work we present a Neural Network architecture Encoder-Decoder. First, a Convolutional Neural Network 3D extracts the features of the input video. Then, an Long Short-Term Memory decodes the vector to automatically generate the description of the video. To perform the training and testing we use the Microsoft Video Description Corpus data set (MSVD). Evaluate the performance of our system using the challenge of COCO Image Captioning Challenge. We obtain as results 0.3984, 0.2941 and 0.5052 for the BLEU, METEOR and CIDEr metrics respectively. 
Competitive results compared with certificates in the bibliography.\",\"PeriodicalId\":235495,\"journal\":{\"name\":\"2018 37th International Conference of the Chilean Computer Science Society (SCCC)\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 37th International Conference of the Chilean Computer Science Society (SCCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCCC.2018.8705254\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 37th International Conference of the Chilean Computer Science Society (SCCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCCC.2018.8705254","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Video to Text Study using an Encoder-Decoder Networks Approach
The automatic generation of video descriptions is currently a topic of interest in computer vision due to applications such as web indexing and video description for people with visual disabilities, among others. In this work we present an encoder-decoder neural network architecture. First, a 3D Convolutional Neural Network extracts features from the input video. Then, a Long Short-Term Memory network decodes this feature vector to automatically generate the description of the video. For training and testing we use the Microsoft Video Description corpus (MSVD). We evaluate the performance of our system using the metrics of the COCO Image Captioning Challenge, obtaining 0.3984, 0.2941 and 0.5052 for the BLEU, METEOR and CIDEr metrics respectively. These results are competitive with those reported in the literature.
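The abstract describes the pipeline only at a high level: a 3D CNN encodes the video clip into a feature vector, and an LSTM decodes that vector into a word sequence. The PyTorch sketch below illustrates this general encoder-decoder pattern under stated assumptions; the layer sizes, vocabulary size, and clip dimensions are illustrative placeholders and are not taken from the paper.

# Minimal sketch of a 3D-CNN encoder + LSTM decoder for video captioning.
# Assumption: PyTorch; all hyperparameters below are hypothetical.
import torch
import torch.nn as nn


class VideoEncoder(nn.Module):
    """3D CNN encoder: maps a video clip to a single feature vector."""

    def __init__(self, feature_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pooling over time and space
        )
        self.fc = nn.Linear(128, feature_dim)

    def forward(self, clip):                    # clip: (B, 3, T, H, W)
        x = self.conv(clip).flatten(1)          # (B, 128)
        return self.fc(x)                       # (B, feature_dim)


class CaptionDecoder(nn.Module):
    """LSTM decoder: generates word logits conditioned on the video feature."""

    def __init__(self, vocab_size, feature_dim=512, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feature_dim, hidden_dim)  # video feature -> initial LSTM state
        self.init_c = nn.Linear(feature_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, video_feat, captions):    # captions: (B, L) token ids
        h0 = self.init_h(video_feat).unsqueeze(0)   # (1, B, hidden_dim)
        c0 = self.init_c(video_feat).unsqueeze(0)
        emb = self.embed(captions)                  # (B, L, embed_dim)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                     # (B, L, vocab_size) logits


# Example forward pass on a dummy batch of two clips (16 RGB frames, 112x112).
encoder, decoder = VideoEncoder(), CaptionDecoder(vocab_size=10000)
clip = torch.randn(2, 3, 16, 112, 112)
tokens = torch.randint(0, 10000, (2, 12))
logits = decoder(encoder(clip), tokens)         # shape: (2, 12, 10000)

In a training setup of this kind, the logits would typically be compared against ground-truth captions with a cross-entropy loss, and at inference the decoder would generate words step by step (e.g., greedily or with beam search); those details are not specified in the abstract.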