Synthetic Data Generator for Student Data Serving Learning Analytics: A Comparative Study

Journal of Learning Letters. Pub Date: 1900-01-01. DOI: 10.59453//khzw9006

Chen Zhan, Oscar Blessed Deho, Xuwei Zhang, Srećko Joksimović, M. de Laat


Abstract

The ongoing digital transformation of the education sector has increased the focus on adopting Learning Analytics (LA) techniques. LA collects and uses students’ data to gain insights into their learning and to guide interventions and feedback. Despite its great potential for improving teaching and learning, the use of LA has also raised important questions about the privacy and ethical implications of collecting and using student data. Recent efforts have tried to tackle these challenges through privacy-preserving approaches and proposed ethical guidelines and policies, yet they remain insufficient to fully protect student privacy and well-being. As a response to these privacy and ethical concerns, there is high demand for synthetic data generators that learn from real data and generate synthetic data closely resembling the original. This paper examines existing synthetic data generators from the broader community in terms of their performance on student data and their capability to serve LA models. We conduct a comparative study by applying a set of synthetic data generators from the Synthetic Data Vault (SDV), an open-source ecosystem of synthetic data generation libraries, to real-world student data from a university. We report the efficiency of each generator and the quality of the generated synthetic datasets, measured by how well their statistical properties match those of the real data. Furthermore, we test the compatibility between synthetic data generators and LA models by fitting commonly used LA models to the generated synthetic datasets. Using the real data as ground truth, we evaluate the performance of LA models trained on synthetic datasets as an indicator of each generator’s capability to serve LA models.
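The evaluation idea in the abstract, training a model on synthetic data and scoring it against real data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the student features (`hours`, `quiz`, `passed`), the toy per-class Gaussian generator, and the nearest-centroid classifier are hypothetical stand-ins, not the SDV generators, the LA models, or the dataset used in the study.

```python
import random
import statistics


def make_real_data(n, rng):
    """Hypothetical 'real' student records: (hours, quiz, passed)."""
    rows = []
    for _ in range(n):
        passed = rng.random() < 0.5
        hours = rng.gauss(12.0 if passed else 6.0, 2.0)
        quiz = rng.gauss(75.0 if passed else 50.0, 8.0)
        rows.append((hours, quiz, int(passed)))
    return rows


def fit_generator(rows):
    """Toy generator (not SDV): class frequency plus per-class mean/stdev."""
    params = {}
    for label in (0, 1):
        group = [r for r in rows if r[2] == label]
        columns = zip(*[(r[0], r[1]) for r in group])
        params[label] = {
            "weight": len(group) / len(rows),
            "stats": [(statistics.mean(c), statistics.stdev(c)) for c in columns],
        }
    return params


def sample(params, n, rng):
    """Draw n synthetic records from the fitted per-class Gaussians."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < params[1]["weight"] else 0
        feats = [rng.gauss(m, s) for m, s in params[label]["stats"]]
        rows.append((feats[0], feats[1], label))
    return rows


def fit_centroids(rows):
    """Nearest-centroid classifier standing in for an LA model."""
    cents = {}
    for label in (0, 1):
        group = [r for r in rows if r[2] == label]
        cents[label] = (
            statistics.mean(r[0] for r in group),
            statistics.mean(r[1] for r in group),
        )
    return cents


def predict(cents, row):
    """Return the label whose centroid is closest to the record."""
    return min(
        cents,
        key=lambda c: (row[0] - cents[c][0]) ** 2 + (row[1] - cents[c][1]) ** 2,
    )


def accuracy(cents, rows):
    return sum(predict(cents, r) == r[2] for r in rows) / len(rows)


rng = random.Random(0)
real = make_real_data(600, rng)
train_real, test_real = real[:400], real[400:]

# Fit the toy generator on real training data, then sample a synthetic set.
synthetic = sample(fit_generator(train_real), len(train_real), rng)

# Both models are scored on the same held-out real records (the ground truth).
acc_real = accuracy(fit_centroids(train_real), test_real)
acc_syn = accuracy(fit_centroids(synthetic), test_real)
print(f"trained on real: {acc_real:.3f}, trained on synthetic: {acc_syn:.3f}")
```

A small gap between the two accuracies suggests the synthetic data preserves the signal the downstream model needs. A large gap would indicate that the generator loses information relevant to that model.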
