Chen Zhan, Oscar Blessed Deho, Xuwei Zhang, Srećko Joksimović, M. de Laat
Journal of Learning Letters. DOI: 10.59453//khzw9006
Synthetic Data Generator for Student Data Serving Learning Analytics: A Comparative Study
The ongoing digital transformation of the education sector has increased the focus on adopting Learning Analytics (LA) techniques. LA collects and uses students’ data to gain insights into their learning and to guide interventions and feedback. Although LA has great potential for improving teaching and learning, it has also raised important questions about the privacy and ethical implications of collecting and using student data. Recent efforts to tackle these challenges, through privacy-preserving approaches and proposed ethical guidelines and policies, still fall short of fully protecting student privacy and well-being. In response to these privacy and ethical concerns, there is high demand for synthetic data generators that learn from real data and produce synthetic data that closely resembles the original. This paper examines existing synthetic data generators from the broader community in terms of their performance on student data and their capability to serve LA models. We conduct a comparative study by applying a set of synthetic data generators from Synthetic Data Vault (SDV), an open-source ecosystem of synthetic data generation libraries, to real-world student data from a university. We report the efficiency of each generator and the quality of the generated synthetic datasets, measured by how closely their statistical properties match those of the real data. Furthermore, we test the compatibility between synthetic data generators and LA models by training commonly used LA models on the generated synthetic datasets. Using the real data as ground truth, we evaluate the performance of LA models trained on synthetic datasets as an indicator of each generator's capability to serve LA models.
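The evaluation idea in the abstract, training a model on synthetic data and scoring it against real data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy "student" records, the per-class Gaussian sampler standing in for a generator, and the nearest-centroid classifier standing in for an LA model. It does not use SDV's API and is not the authors' pipeline.

```python
import random
import statistics


def make_real_data(n, rng):
    """Toy stand-in for student records: (weekly_logins, quiz_score, passed)."""
    rows = []
    for _ in range(n):
        passed = rng.random() < 0.6
        logins = rng.gauss(12.0 if passed else 5.0, 2.5)
        score = rng.gauss(72.0 if passed else 48.0, 8.0)
        rows.append((logins, score, int(passed)))
    return rows


def fit_generator(rows):
    """Hypothetical generator: class prior plus independent Gaussians per class."""
    model = {}
    for label in (0, 1):
        subset = [r for r in rows if r[2] == label]
        columns = list(zip(*[(r[0], r[1]) for r in subset]))
        model[label] = {
            "prior": len(subset) / len(rows),
            "params": [(statistics.mean(c), statistics.stdev(c)) for c in columns],
        }
    return model


def sample(model, n, rng):
    """Draw n synthetic rows from the fitted stand-in generator."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < model[1]["prior"] else 0
        feats = [rng.gauss(mu, sd) for mu, sd in model[label]["params"]]
        rows.append((feats[0], feats[1], label))
    return rows


def fit_centroids(rows):
    """Nearest-centroid classifier, a deliberately simple stand-in for an LA model."""
    centroids = {}
    for label in (0, 1):
        subset = [r for r in rows if r[2] == label]
        centroids[label] = (
            statistics.mean([r[0] for r in subset]),
            statistics.mean([r[1] for r in subset]),
        )
    return centroids


def accuracy(centroids, rows):
    """Share of rows whose nearest centroid matches the true label."""
    correct = 0
    for x, y, label in rows:
        pred = min(
            centroids,
            key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
        )
        correct += int(pred == label)
    return correct / len(rows)


rng = random.Random(0)
real = make_real_data(600, rng)
train_real, test_real = real[:400], real[400:]

# The generator only ever sees the real training split.
synthetic = sample(fit_generator(train_real), len(train_real), rng)

# Both models are scored on the same held-out real data (the ground truth).
acc_real = accuracy(fit_centroids(train_real), test_real)
acc_synth = accuracy(fit_centroids(synthetic), test_real)
print(f"trained on real data:      {acc_real:.3f}")
print(f"trained on synthetic data: {acc_synth:.3f}")
```

A small gap between the two accuracies would suggest that the synthetic data preserves the signal the model needs. In this toy setup the stand-in generator matches the data-generating process exactly, so the gap should be small by construction; real student data and real generators offer no such guarantee.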