Classification of Discussions in MOOC Forums: An Incremental Modeling Approach

Proceedings of the Eighth ACM Conference on Learning @ Scale; Pub Date: 2021-06-08; DOI: 10.1145/3430895.3460137

Anastasios Ntourmas, Y. Dimitriadis, S. Daskalaki, N. Avouris


Citations: 3

Abstract

Supervised classification models are commonly used to classify discussions in MOOC forums. In most cases, these models require a tedious process of manually labeling forum messages to serve as training data, so new methods are needed to reduce the human effort involved in preparing such training datasets. In this study we follow an incremental approach to examine how soon after the beginning of a new course enough data have been collected to train a supervised classification model. We show that by employing features derived from a seeded topic modeling method, we obtain classifiers with reliable performance early in the course's life, thus significantly reducing the human effort. The content of the MOOC platform is used to bias the topic extraction towards discussions related to (a) course content, (b) logistics, or (c) social interactions. Then, at the start of each week, we develop a supervised model based on the topic features of all previous weeks and evaluate its performance in classifying the discussions for the rest of the course. Our approach was implemented in three MOOCs of different subjects and sizes. The findings reveal that supervised models are able to perform reliably quite early in a MOOC's life and retain a steady overall accuracy across the remaining weeks, without needing to be trained on the entire forum dataset.
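The week-by-week protocol described in the abstract (train at the start of each week on all previous weeks, then evaluate on the rest of the course) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the three features merely stand in for seeded-topic proportions, and the nearest-centroid classifier is a simple placeholder, not the model used by the authors. The function names are hypothetical.

```python
import random
from collections import defaultdict

# The three discussion categories named in the abstract.
LABELS = ("content", "logistics", "social")


def make_synthetic_forum(n_weeks=6, posts_per_week=30, seed=0):
    """Hypothetical data: each post is (week, topic_features, label).

    The three features stand in for seeded-topic proportions; they are
    generated synthetically and are NOT real forum data.
    """
    rng = random.Random(seed)
    posts = []
    for week in range(1, n_weeks + 1):
        for _ in range(posts_per_week):
            label = rng.choice(LABELS)
            # Boost the feature matching the label; the rest is noise.
            raw = [rng.random() + (1.5 if name == label else 0.0)
                   for name in LABELS]
            total = sum(raw)
            posts.append((week, [v / total for v in raw], label))
    return posts


def fit_centroids(train):
    """Placeholder classifier: mean feature vector per label."""
    sums = defaultdict(lambda: [0.0] * len(LABELS))
    counts = defaultdict(int)
    for _, feats, label in train:
        counts[label] += 1
        for i, value in enumerate(feats):
            sums[label][i] += value
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in counts}


def predict(centroids, feats):
    """Return the label whose centroid is closest (squared distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, feats))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def incremental_evaluation(posts, n_weeks):
    """For each week w >= 2, train on weeks < w and test on weeks >= w."""
    results = {}
    for start_week in range(2, n_weeks + 1):
        train = [p for p in posts if p[0] < start_week]
        test = [p for p in posts if p[0] >= start_week]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, f) == y for _, f, y in test)
        results[start_week] = correct / len(test)
    return results


if __name__ == "__main__":
    forum = make_synthetic_forum()
    for week, accuracy in incremental_evaluation(forum, n_weeks=6).items():
        print(f"model trained before week {week}: accuracy {accuracy:.2f}")
```

Reading the resulting accuracies by training week shows the kind of curve the study examines: the earliest week at which accuracy on the remaining weeks becomes acceptable marks the point where manual labeling could stop.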

