{"title":"Automatic Subtitle Generation for Videos","authors":"Aditya Ramani, A. Rao, V. Vidya, V. B. Prasad","doi":"10.1109/ICACCS48705.2020.9074180","DOIUrl":null,"url":null,"abstract":"Subtitles are content gotten from either a transcript or screenplay of the discourse or critique in movies, TV programs, computer games. In a lion's share of cases inside a video, the sound holds a critical spot. At present, to incorporate subtitles into the videos, we have to download them from a third-party source and manually add it to our media player of choice. This poses a few problems: The video content we are trying to transcribe may not always have subtitles readily available for it and an active internet connection would be required to even be able to search for such a resource before we can consume it. We aim to try to solve these problems by performing a comparative analysis of various speech recognition engines in a real-time use case, i.e. generating subtitles for a video being played on a media player in an offline environment. Among the speech recognition engines that were compared, DeepSpeech obtained the lower Word Error Rate (WER) of twenty-six percentage, but when we consider system resource usage as well, CMU Sphinx proves to be a better overall engine for the given use case.","PeriodicalId":439003,"journal":{"name":"2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACCS48705.2020.9074180","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
Subtitles are text derived from either a transcript or screenplay of the dialogue or commentary in movies, television programs, and video games. In the majority of videos, the audio plays a critical role. At present, to incorporate subtitles into a video, we have to download them from a third-party source and manually add them to our media player of choice. This poses a few problems: the video content we are trying to transcribe may not always have subtitles readily available, and an active internet connection would be required even to search for such a resource before we can consume it. We aim to solve these problems by performing a comparative analysis of various speech recognition engines in a real-time use case, i.e. generating subtitles for a video being played on a media player in an offline environment. Among the speech recognition engines compared, DeepSpeech obtained the lowest Word Error Rate (WER), at 26%, but when system resource usage is also considered, CMU Sphinx proves to be the better overall engine for the given use case.
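The headline comparison rests on Word Error Rate, so a brief sketch of an offline transcribe-and-score loop can make the metric concrete. The snippet below is a minimal illustration, not the authors' implementation: it assumes the Mozilla DeepSpeech 0.9.x Python bindings (the `deepspeech` package) and a 16 kHz, 16-bit mono WAV input, and the model path, audio file name, and reference transcript are all placeholders. The `wer` function implements the standard word-level Levenshtein definition of WER, which is the quantity the 26% figure refers to.

```python
import wave
import numpy as np
from deepspeech import Model  # Mozilla DeepSpeech 0.9.x Python bindings (assumption)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)


# Transcribe one audio segment (file names are placeholders).
model = Model("deepspeech-0.9.3-models.pbmm")
with wave.open("segment.wav", "rb") as w:  # expected format: 16 kHz, 16-bit mono
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
hypothesis = model.stt(audio)

reference = "the quick brown fox jumps over the lazy dog"  # ground-truth transcript
print(f"WER: {wer(reference, hypothesis):.0%}")
```

Under this definition, a WER of 26% means roughly one word in four of the engine's output is a substitution, insertion, or deletion relative to the ground-truth transcript. CMU Sphinx could be swapped in through its `pocketsphinx` Python bindings to reproduce the accuracy-versus-resource-usage trade-off the paper describes.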