Exploratory Analysis on Topic Modelling for Video Subtitles
Atmik Ajoy, Chethan U Mahindrakar, H. Mamatha
Proceedings of the 2021 2nd International Conference on Control, Robotics and Intelligent System, published 2021-08-20
DOI: 10.1145/3483845.3483878 (https://doi.org/10.1145/3483845.3483878)
In this paper, we explore the models available for performing topic modelling on subtitle files. Subtitle files are sourced from movies and contain the dialogue being spoken, so applying topic modelling to them means trying to obtain the topics of a video from its subtitles alone. Our novel idea is to test whether it is feasible to use topic modelling on subtitles to get the topics of a movie. While topic modelling has previously been used in bioinformatics, patent indexing and many other areas, it has not seen any application in this sphere. We search extensively for datasets, preprocess the subtitle files, and apply three topic modelling methods to these documents: Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Processes (HDP) and Latent Semantic Indexing (LSI). These are the three most prominent topic modelling models in use today. Our results indicate which model works best for subtitle files.
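The abstract describes the pipeline only at a high level. As a minimal sketch, assuming the subtitle files are in the SubRip (.srt) format and using a toy stop-word list chosen for illustration, the code below strips cue numbers, timestamps and formatting tags from each file, tokenizes the remaining dialogue, and fits Latent Dirichlet Allocation by collapsed Gibbs sampling in plain Python. The function names (`srt_to_text`, `tokenize`, `lda_gibbs`, `top_words`), the hyperparameters (`alpha`, `beta`, iteration count) and the sample subtitles are hypothetical and are not the authors' implementation; HDP and LSI are not shown.

```python
import random
import re

# Assumed SubRip (.srt) layout: cue number, timing line, then one or more dialogue lines.
CUE_INDEX = re.compile(r"^\d+$")
TIMING = re.compile(r"^\d+:\d{2}:\d{2}[,.]\d+\s*-->")
TAG = re.compile(r"<[^>]+>|\{[^}]*\}")  # inline markup such as <i>...</i> or {\an8}

# Hypothetical toy stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "you", "that", "this", "for", "with", "are", "was",
             "your", "not", "what", "into", "have", "from", "they", "but"}


def srt_to_text(srt):
    """Keep only the spoken dialogue from the contents of an .srt file."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or CUE_INDEX.match(line) or TIMING.match(line):
            continue  # skip blank separators, cue numbers and timestamps
        kept.append(TAG.sub("", line))
    return " ".join(kept)


def tokenize(text):
    """Lower-case, keep alphabetic tokens of 3+ characters, drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if len(w) >= 3 and w not in STOPWORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic-word counts and vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    nk = [0] * num_topics                       # total tokens assigned to topic k
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token from the counts before resampling its topic.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z_i = t | all other assignments), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return nkw, vocab


def top_words(nkw, vocab, n=5):
    """Most frequently assigned words per topic, as a readable topic summary."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


# Two hypothetical subtitle snippets standing in for real movie subtitle files.
SAMPLES = [
    "1\n00:00:01,000 --> 00:00:03,500\n<i>The ship is sinking, captain!</i>\n\n"
    "2\n00:00:04,000 --> 00:00:06,000\nLower the lifeboats into the ocean.\n",
    "1\n00:00:01,000 --> 00:00:02,500\nThe detective examined the knife.\n\n"
    "2\n00:00:03,000 --> 00:00:05,000\nThe killer left a clue at the crime scene.\n",
]

if __name__ == "__main__":
    docs = [tokenize(srt_to_text(s)) for s in SAMPLES]
    nkw, vocab = lda_gibbs(docs, num_topics=2)
    for k, words in enumerate(top_words(nkw, vocab, n=4)):
        print(f"topic {k}: {words}")
```

In practice a library such as gensim, whose `LdaModel`, `HdpModel` and `LsiModel` classes cover all three methods named above, would likely be used instead; the hand-written sampler here only makes the LDA update rule explicit.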