Subtitle Generation and Video Scene Indexing using Recurrent Neural Networks

Sajjan Kiran, Umesh Patil, P. S. Shankar, P. Ghuli
{"title":"Subtitle Generation and Video Scene Indexing using Recurrent Neural Networks","authors":"Sajjan Kiran, Umesh Patil, P. S. Shankar, P. Ghuli","doi":"10.1109/ICIRCA51532.2021.9544837","DOIUrl":null,"url":null,"abstract":"Video Subtitles are not only an essential tool for the hearing impaired, but also enhance the user's viewing experience, as they allow users to better understand and interpret different accents, even if they are of the same familiar language. Automatic Speech Recognition Systems would also eradicate the strenuous mechanical process involved in creating subtitle files for movie videos. Searching and indexing of different scenes in a video is still far behind when compared to that available for other forms like text data. With the help of Video Captioning models, the accessibility and indexing requirements of video files can be significantly improved by allowing the users to search for a particular scene/event in a video. This paper discusses about the solution offered to these requirements with the help of sequence-to-sequence recurrent neural networks. 
The paper also includes the different techniques involved in preprocessing the audio data and extracting features from them, the network architectures, CTC algorithm for backpropagation of error through time, suitable evaluation metrics for Sequence-to-Sequence models and the challenges involved during the designing and training phase of such models.","PeriodicalId":245244,"journal":{"name":"2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIRCA51532.2021.9544837","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Video subtitles are not only an essential tool for the hearing impaired; they also enhance the viewing experience by helping users understand and interpret different accents, even within a familiar language. Automatic speech recognition systems would also eliminate the strenuous manual process involved in creating subtitle files for movies. Searching and indexing of individual scenes in a video still lags far behind what is available for other forms of data, such as text. With the help of video captioning models, the accessibility and indexability of video files can be significantly improved by allowing users to search for a particular scene or event in a video. This paper discusses a solution to these requirements based on sequence-to-sequence recurrent neural networks. It also covers the techniques involved in preprocessing the audio data and extracting features from it, the network architectures, the CTC algorithm for backpropagating error through time, suitable evaluation metrics for sequence-to-sequence models, and the challenges encountered while designing and training such models.
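To make the CTC idea mentioned in the abstract concrete, the sketch below shows greedy CTC decoding, the inference-time counterpart of the CTC training objective: per-frame label predictions are collapsed by merging consecutive repeats and dropping the reserved blank symbol. This is an illustrative sketch, not the paper's implementation; the alphabet and blank index are assumptions made for the example.

```python
# Hedged sketch of greedy CTC decoding (best-path decoding).
# Assumption: index 0 is reserved for the CTC blank symbol.
BLANK = 0

def ctc_greedy_decode(frame_label_ids):
    """Collapse a per-frame argmax label sequence into an output sequence:
    1) merge consecutive repeated labels, 2) remove blanks."""
    decoded = []
    prev = None
    for label in frame_label_ids:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded

# Illustrative alphabet (assumed): 1->h, 2->e, 3->l, 4->o.
# Frames "h h _ e e _ l l _ l o" collapse to "hello"; note the blank
# between the two l-runs is what preserves the doubled letter.
alphabet = {1: "h", 2: "e", 3: "l", 4: "o"}
frames = [1, 1, 0, 2, 2, 0, 3, 3, 0, 3, 4]
print("".join(alphabet[i] for i in ctc_greedy_decode(frames)))  # hello
```

During training, CTC instead sums the probabilities of all frame alignments that collapse to the target transcript, which is what allows error to be backpropagated through time without frame-level alignment labels.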