Dataset for Analysis of Russian-Language Reviews on MOOCs Extracted from Stepik

Impact factor: 0.7, JCR quartile: Q4, Category: Education, Scientific Disciplines
Y. Dyulicheva
DOI: 10.17323/1814-9545-2022-4-298-321 (https://doi.org/10.17323/1814-9545-2022-4-298-321)
Journal: Voprosy Obrazovaniya-Educational Studies Moscow
Publication date: 2022-01-01
Publication type: Journal Article
Citations: 1

Abstract

The article reviews datasets and research directions in educational data analysis based on natural language processing methods. The review reveals a lack of datasets for analyzing Russian-language reviews of MOOCs. From reviews scraped from the Stepik platform, a dataset of 5721 Russian-language reviews of MOOCs in mathematics, programming, biology, chemistry and physics was compiled. The reviews were studied using descriptive statistics, frequency analysis of unigrams and bigrams, and sentiment analysis with the dostoevsky Python library; sentiment classification reached a weighted F1-score of 74%. Unigram analysis identified the descriptive characteristics of courses associated with each sentiment, while bigram analysis identified the aspects of learning content that reviewers describe and the difficulties students encounter when taking MOOCs. The sentiment analysis shows that positive and neutral reviews predominate in the dataset. The dataset is publicly available on Mendeley Data and should be useful to specialists in text data analysis and to developers of learning analytics tools.
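The two quantitative steps named in the abstract, n-gram frequency counting and the weighted F1-score, can be illustrated with a minimal sketch. This is not the author's code: it uses only the Python standard library rather than dostoevsky, the function names `ngram_counts` and `weighted_f1` are hypothetical, and the sample reviews and labels are made up for illustration.

```python
import re
from collections import Counter


def ngram_counts(texts, n):
    """Count word n-grams (as tuples) across a list of review strings."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of word characters; \w covers Cyrillic.
        tokens = re.findall(r"\w+", text.lower())
        # Zipping n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its number of true items."""
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * (tp + fn)  # tp + fn is the class support
    return total / len(y_true)


# Made-up toy data, not taken from the Stepik dataset.
sample_reviews = [
    "Отличный курс, спасибо",
    "Отличный курс, но сложные задачи",
    "Сложные задачи",
]
sample_true = ["positive", "positive", "neutral", "negative"]
sample_pred = ["positive", "neutral", "neutral", "negative"]

if __name__ == "__main__":
    print(ngram_counts(sample_reviews, 1).most_common(3))
    print(ngram_counts(sample_reviews, 2).most_common(3))
    print(weighted_f1(sample_true, sample_pred))
```

Weighting by class support matters for data like this, where positive and neutral reviews predominate: the score then reflects performance on the classes that make up most of the dataset rather than treating a rare negative class as equally important.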
Source journal
Voprosy Obrazovaniya-Educational Studies Moscow (Education, Scientific Disciplines)
CiteScore: 2.20
Self-citation rate: 42.90%
Articles published: 23