Anastasios Ntourmas, Y. Dimitriadis, S. Daskalaki, N. Avouris
{"title":"Classification of Discussions in MOOC Forums: An Incremental Modeling Approach","authors":"Anastasios Ntourmas, Y. Dimitriadis, S. Daskalaki, N. Avouris","doi":"10.1145/3430895.3460137","DOIUrl":null,"url":null,"abstract":"Supervised classification models are commonly used for classifying discussions in a MOOC forum. In most cases these models require a tedious process for manual labeling the forum messages as training data. So, new methods are needed to reduce the human effort necessary for the preparation of such training datasets. In this study we follow an incremental approach in order to examine how soon after the beginning of a new course, we have collected enough data for training a supervised classification model. We show that by employing features that derive from a seeded topic modeling method, we achieve classifiers with reliable performance early enough in the course life, thus reducing significantly the human effort. The content of the MOOC platform is used to bias the topic extraction towards discussions related to (a) course content, (b) logistics, or (c) social interactions. Then, we develop a supervised model at the start of each week based on the topic features of all previous weeks and evaluate its performance in classifying the discussions for the rest of the course. Our approach was implemented in three different MOOCs of different subjects and different sizes. The findings reveal that supervised models are able to perform reliably quite early in a MOOC's life and retain a steady overall accuracy across the remaining weeks, without requiring to be trained with the entire forum dataset.","PeriodicalId":125581,"journal":{"name":"Proceedings of the Eighth ACM Conference on Learning @ Scale","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Eighth ACM Conference on Learning @ Scale","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3430895.3460137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Supervised classification models are commonly used for classifying discussions in a MOOC forum. In most cases these models require a tedious process for manual labeling the forum messages as training data. So, new methods are needed to reduce the human effort necessary for the preparation of such training datasets. In this study we follow an incremental approach in order to examine how soon after the beginning of a new course, we have collected enough data for training a supervised classification model. We show that by employing features that derive from a seeded topic modeling method, we achieve classifiers with reliable performance early enough in the course life, thus reducing significantly the human effort. The content of the MOOC platform is used to bias the topic extraction towards discussions related to (a) course content, (b) logistics, or (c) social interactions. Then, we develop a supervised model at the start of each week based on the topic features of all previous weeks and evaluate its performance in classifying the discussions for the rest of the course. Our approach was implemented in three different MOOCs of different subjects and different sizes. The findings reveal that supervised models are able to perform reliably quite early in a MOOC's life and retain a steady overall accuracy across the remaining weeks, without requiring to be trained with the entire forum dataset.