Educational video classification by using a transcript to image transform and supervised learning

Houssem Chatbri, Marlon Oliveira, Kevin McGuinness, S. Little, K. Kameyama, P. Kwan, Alistair Sutherland, N. O’Connor

2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), 30 September 2017. DOI: 10.1109/IPTA.2017.8853988
In this work, we present a method for automatic topic classification of educational videos using a speech transcript transform. Our method works as follows: first, speech recognition is used to generate video transcripts; then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed; finally, a classifier produces video category labels from a transcript-image input. For our classifiers, we report results using a convolutional neural network (CNN) and a principal component analysis (PCA) model. To evaluate our method, we used the Khan Academy on a Stick dataset, which contains 2,545 videos, each labeled with one or two of 13 categories. Experiments show that our method is effective and strongly competitive with other supervised learning-based methods.
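The abstract does not spell out the details of the statistical co-occurrence transformation, so the following is only a minimal sketch of one plausible variant: a transcript is mapped to a fixed-size word co-occurrence matrix over a chosen vocabulary and normalized so it can be treated as a grayscale image. The function name `transcript_to_image`, the `vocabulary` list, and the sliding `window` size are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter
import numpy as np


def transcript_to_image(transcript: str, vocabulary: list[str], window: int = 5) -> np.ndarray:
    """Map a transcript to a |V| x |V| co-occurrence matrix scaled to [0, 1].

    NOTE: illustrative sketch only; the paper's exact transform is not
    described in the abstract.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    tokens = [t for t in transcript.lower().split() if t in index]

    # Count pairs of vocabulary words that co-occur within a sliding window.
    counts = Counter()
    for start in range(len(tokens)):
        for other in tokens[start + 1 : start + window]:
            counts[(tokens[start], other)] += 1

    image = np.zeros((len(vocabulary), len(vocabulary)), dtype=np.float32)
    for (a, b), c in counts.items():
        image[index[a], index[b]] += c
        if a != b:
            image[index[b], index[a]] += c  # keep the matrix symmetric

    if image.max() > 0:
        image /= image.max()  # normalize to [0, 1] so it reads as a grayscale image
    return image


# Usage: the resulting matrix can then be fed to an image classifier
# (e.g. a small CNN, or PCA followed by a simple classifier).
vocab = ["cell", "energy", "force", "equation", "molecule"]  # hypothetical topic vocabulary
img = transcript_to_image("the cell converts energy and each molecule stores energy", vocab)
print(img.shape)  # (5, 5)
```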