Dataset for Analysis of Russian-Language Reviews on MOOCs Extracted from Stepik

IF: 0.5 | JCR: Q4 | Category: EDUCATION, SCIENTIFIC DISCIPLINES

Voprosy Obrazovaniya-Educational Studies Moscow | Pub Date: 2022-01-01 | DOI: 10.17323/1814-9545-2022-4-298-321

Y. Dyulicheva


Citations: 1

Abstract



The article reviews datasets and research areas in educational data analysis based on natural language processing methods. The review reveals a lack of datasets for analyzing Russian-language reviews of MOOCs. Reviews scraped from the Stepik platform were used to form a dataset of 5721 Russian-language reviews of MOOCs in mathematics, programming, biology, chemistry and physics. The reviews were studied using descriptive statistics, frequency analysis of unigrams and bigrams, and sentiment analysis with the dostoevsky Python library, for which the weighted F1-score of sentiment classification was 74%. Unigram analysis revealed the descriptive characteristics of courses with respect to sentiment; bigram analysis revealed descriptions of different aspects of the learning content and the difficulties students encountered in MOOCs. The sentiment analysis shows that positive and neutral reviews predominate in the studied dataset. The dataset is openly available on Mendeley Data and will be useful to specialists in text data analysis and in the development of learning analytics tools.
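As a rough illustration of two techniques named in the abstract, the sketch below counts unigrams and bigrams across review texts and computes a support-weighted F1-score. It is not the author's code: the paper reports using the dostoevsky library for sentiment classification, whereas here the predicted labels are simply written out by hand. The function names (`tokenize`, `ngram_counts`, `weighted_f1`), the toy reviews, and the labels are all made up for the example, and only the Python standard library is used.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and extract runs of Cyrillic or Latin letters."""
    return re.findall(r"[a-zа-яё]+", text.lower())


def ngram_counts(texts, n):
    """Count n-grams (as tuples of tokens) across a list of review texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Hypothetical toy reviews, not taken from the Stepik dataset.
reviews = [
    "Отличный курс, спасибо",
    "Отличный курс по математике",
    "Слишком сложные задачи",
]
unigrams = ngram_counts(reviews, 1)
bigrams = ngram_counts(reviews, 2)
print(unigrams.most_common(3))
print(bigrams.most_common(3))

# Hypothetical gold and predicted sentiment labels.
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral"]
print(weighted_f1(y_true, y_pred))
```

Weighting by support means that frequent classes dominate the score, which matters for a dataset where positive and neutral reviews predominate.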


Source journal

Voprosy Obrazovaniya-Educational Studies Moscow (EDUCATION, SCIENTIFIC DISCIPLINES)

CiteScore

2.20

Self-citation rate

42.90%

Number of articles published