Video Based Transcript Summarizer for Online Courses using Natural Language Processing

Krishna R. Kulkarni, Rushikesh Padaki
2021 IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), published 2021-12-16
DOI: 10.1109/CSITSS54238.2021.9683609 (https://doi.org/10.1109/CSITSS54238.2021.9683609)
Citations: 5

Abstract

Online education has become an effective way to deliver quality education to students. Online courses have grown popular because they are rich in graphical and pictorial content, are delivered by subject experts, and can be followed anytime and anywhere. However, students may not always be able to go through the full course content for lack of time. A video transcript summarizer has considerable scope in this situation: it highlights the important topics of a video, and the idea extends naturally to online course videos. Students can save time by grasping the gist of a class from the summary instead of watching the whole video. Our system focuses on developing a module that uses Natural Language Processing (NLP) with Python to summarize an online class video. The methodology uses two approaches, Term Frequency-Inverse Document Frequency (TF-IDF) and Gensim, to obtain the summary of an online course video. The model takes the URL of a video from the user as input. TF-IDF is an information retrieval technique that weights a term by its frequency and its inverse document frequency. Gensim is an NLP package for topic modeling. The model also lets the user decide how long the summary should be as a percentage of the original lecture. Summarization is a subjective process, so we evaluate the output with two prominent methods: cosine similarity and the ROUGE score. The former does not require a human-generated reference summary, whereas the latter does. The efficiency obtained using cosine similarity is greater than 90% for both TF-IDF and Gensim. The efficiency obtained with the ROUGE score is between 40% and 50%.
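The TF-IDF summarization and the two evaluation measures described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the transcript is already available as plain text (fetching it from a video URL is omitted), treats each sentence as one document when computing inverse document frequency, uses a naive regex sentence splitter, and computes only unigram ROUGE recall. The function names and the `ratio` parameter are hypothetical, and Gensim is not used.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_summary(text, ratio=0.3):
    """Keep roughly `ratio` of the sentences, ranked by summed TF-IDF weight."""
    sentences = split_sentences(text)
    docs = [tokenize(s) for s in sentences]  # each sentence is one "document"
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length-normalised term frequency times log inverse document frequency.
        score = sum((tf[t] / len(doc)) * math.log(n / df[t]) for t in tf)
        scores.append(score)
    keep = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rouge1_recall(candidate, reference):
    """Share of reference unigrams that also appear in the candidate summary."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum(min(cand[t], ref[t]) for t in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    transcript = (
        "Gradient descent updates the weights of a model. "
        "The learning rate controls the step size. "
        "Thanks for watching. "
        "See you next time."
    )
    summary = tfidf_summary(transcript, ratio=0.5)
    print(summary)
    print(cosine_similarity(summary, transcript))
```

In this sketch cosine similarity compares the summary against the transcript itself, so no human-written reference is needed, whereas `rouge1_recall` expects a reference summary as its second argument. This mirrors the distinction the abstract draws between the two measures.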