Title: Video Captioning using Pre-Trained CNN and LSTM
Authors: A. Preethi, P. Dhanalakshmi
DOI: 10.1109/IConSCEPT57958.2023.10170131
Published: 2023-05-25, 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)
Citations: 0
Abstract
Digital video is increasingly prevalent as users produce and consume ever more video data. Short, catchy videos on social media capture people's attention; at the same time, lengthy videos are often left unfinished. Video captioning addresses this by automatically generating captions for a video. The process of generating meaningful natural-language sentences for the corresponding scenes in a video is called video captioning. It involves two steps, namely feature extraction and caption generation. Here, pre-trained CNNs such as InceptionV3 and VGG16 are used to extract features from the video, and caption generation is performed by an LSTM conditioned on the extracted features. Relevant captions are produced by the LSTM with the help of word embeddings.
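The two-step pipeline the abstract describes can be sketched in Keras. This is a minimal illustration, not the authors' implementation: the frame count, embedding size, vocabulary size, and the "merge" decoder architecture are all assumptions, and `weights=None` is used so the sketch runs without downloading ImageNet weights (the paper uses the pre-trained weights).

```python
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

# --- Step 1: feature extraction with a pre-trained CNN ---
# Hypothetical input: a video represented as 8 sampled frames
# (random data stands in for real decoded frames here).
frames = np.random.randint(0, 256, size=(8, 299, 299, 3)).astype("float32")

# InceptionV3 without its classification head; global average pooling
# yields one 2048-dim feature vector per frame.
encoder = InceptionV3(include_top=False, pooling="avg",
                      weights=None, input_shape=(299, 299, 3))
features = encoder.predict(preprocess_input(frames), verbose=0)
print(features.shape)  # (8, 2048)

# --- Step 2: caption generation with an LSTM over word embeddings ---
vocab_size, max_len, feat_dim = 5000, 20, 2048  # assumed sizes

# Visual branch: project the pooled CNN feature vector.
feat_in = layers.Input(shape=(feat_dim,))
feat_proj = layers.Dense(256, activation="relu")(feat_in)

# Textual branch: partial caption as word indices, passed through an
# embedding layer and summarized by an LSTM.
seq_in = layers.Input(shape=(max_len,))
emb = layers.Embedding(vocab_size, 256, mask_zero=True)(seq_in)
lstm_out = layers.LSTM(256)(emb)

# Merge the two contexts and predict the next word of the caption.
merged = layers.add([feat_proj, lstm_out])
hidden = layers.Dense(256, activation="relu")(merged)
out = layers.Dense(vocab_size, activation="softmax")(hidden)

decoder = Model([feat_in, seq_in], out)
decoder.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
print(decoder.output_shape)  # (None, 5000)
```

At inference time, a caption would be generated word by word: feed the video feature plus the tokens emitted so far, take the argmax (or a sampled word) from the softmax, append it, and repeat until an end token or `max_len`.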