Sajjan Kiran, Umesh Patil, P. S. Shankar, P. Ghuli
DOI: 10.1109/ICIRCA51532.2021.9544837
2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), published 2021-09-02
Subtitle Generation and Video Scene Indexing using Recurrent Neural Networks
Video subtitles are not only an essential tool for the hearing impaired; they also enhance the viewing experience by helping users understand and interpret unfamiliar accents, even in a language they already know. Automatic speech recognition systems can also eliminate the laborious manual process of creating subtitle files for movie videos. Searching and indexing of individual scenes in a video still lags far behind what is available for other forms of data, such as text. With the help of video captioning models, the accessibility and indexability of video files can be significantly improved by allowing users to search for a particular scene or event within a video. This paper discusses a solution to these requirements based on sequence-to-sequence recurrent neural networks. It also covers the techniques involved in preprocessing audio data and extracting features from it, the network architectures, the CTC algorithm for backpropagating error through time, suitable evaluation metrics for sequence-to-sequence models, and the challenges involved in designing and training such models.
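The CTC algorithm mentioned above lets a speech model emit one label per audio frame without frame-level alignment: consecutive repeated labels are merged and a special blank symbol is dropped to recover the transcript. A minimal sketch of this decoding convention (greedy CTC collapse) in Python — the function name and label encoding here are illustrative, not taken from the paper:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence under the CTC convention:
    merge consecutive repeated labels, then drop the blank symbol."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # merge consecutive repeats
            if lab != blank:     # drop blank symbols
                out.append(lab)
        prev = lab
    return out

# Frames predicting "c a a <blank> t t" collapse to "c a t"
# (0 = blank, 1 = 'a', 2 = 'c', 3 = 't')
print(ctc_greedy_decode([2, 1, 1, 0, 3, 3]))  # [2, 1, 3]
```

Note that the blank also serves to separate genuine repeats: the frame sequence `[1, 0, 1]` decodes to `[1, 1]`, whereas `[1, 1]` collapses to a single `[1]`. During training, the CTC loss sums over all frame alignments that collapse to the target transcript.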