Sajjan Kiran, Umesh Patil, P. S. Shankar, P. Ghuli
DOI: 10.1109/ICIRCA51532.2021.9544837
2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), published 2021-09-02
Subtitle Generation and Video Scene Indexing using Recurrent Neural Networks
Video subtitles are not only an essential tool for the hearing impaired; they also enhance the viewing experience by helping users understand and interpret unfamiliar accents, even in a language they already know. Automatic speech recognition systems can also eliminate the laborious manual process of creating subtitle files for movie videos. Searching and indexing of individual scenes in a video still lags far behind what is available for other forms of data, such as text. With the help of video captioning models, the accessibility and indexability of video files can be significantly improved by allowing users to search for a particular scene or event within a video. This paper discusses a solution to these requirements based on sequence-to-sequence recurrent neural networks. It also covers the techniques involved in preprocessing audio data and extracting features from it, the network architectures, the CTC algorithm for backpropagating error through time, suitable evaluation metrics for sequence-to-sequence models, and the challenges involved in designing and training such models.
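The CTC algorithm mentioned above lets a speech model emit one label per audio frame without frame-level alignment: consecutive repeated labels are merged and a special blank symbol is dropped to recover the transcript. A minimal sketch of this decoding convention (greedy CTC collapse) in Python — the function name and label encoding here are illustrative, not taken from the paper:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence under the CTC convention:
    merge consecutive repeated labels, then drop the blank symbol."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # merge consecutive repeats
            if lab != blank:     # drop blank symbols
                out.append(lab)
        prev = lab
    return out

# Frames predicting "c a a <blank> t t" collapse to "c a t"
# (0 = blank, 1 = 'a', 2 = 'c', 3 = 't')
print(ctc_greedy_decode([2, 1, 1, 0, 3, 3]))  # [2, 1, 3]
```

Note that the blank also serves to separate genuine repeats: the frame sequence `[1, 0, 1]` decodes to `[1, 1]`, whereas `[1, 1]` collapses to a single `[1]`. During training, the CTC loss sums over all frame alignments that collapse to the target transcript.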