Pressure Test: Finding Appropriate Data Size for Practice in Data Science Education

Yong Zheng, Arnold Liu, Shuai Zheng
Proceedings of the 23rd Annual Conference on Information Technology Education, published 2022-09-21
DOI: 10.1145/3537674.3554748
Data science topics, such as data analytics, data mining, and machine learning, have become a popular part of information technology curricula. Lectures on these topics cannot stand alone; they need to be paired with coding practice on real-world data sets. Some instructors prefer small data sets for in-class practice or assignments, which limits students' experimental experience and may even mislead them. Others assign large data sets, but students may be unable to tolerate the resulting running times, which depend on several factors (e.g., data size, algorithm complexity, and computing power). In this paper, we first investigated students' preferences regarding the scale of data sets used for practice in data science courses. We then performed an experimental analysis by running different data science algorithms on both student laptops and personal/office computers, in order to recommend appropriate data sizes for multiple practice scenarios (e.g., in-class practice, assignments, class projects, and research projects). We believe our findings can help instructors prepare and assign real-world data sets to students in data science curricula.
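The abstract describes timing data science algorithms at different data sizes on different machines. Below is a minimal, hypothetical sketch of such a pressure test, not the authors' procedure. It assumes synthetic 2-D points stand in for a real-world data set, uses a brute-force O(n²) nearest-neighbor search as a toy workload, and treats a per-scenario time budget as the criterion for an "appropriate" size. The names `make_synthetic_data`, `pressure_test`, and `largest_size_within_budget` are illustrative only.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def make_synthetic_data(n: int, seed: int = 0) -> List[Point]:
    """Hypothetical stand-in for a real-world data set: n random 2-D points."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def brute_force_nearest_neighbors(points: Sequence[Point]) -> List[int]:
    """Toy O(n^2) workload: index of each point's nearest other point."""
    result = []
    for i, (xi, yi) in enumerate(points):
        best_j, best_d = -1, float("inf")
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = (xi - xj) ** 2 + (yi - yj) ** 2  # squared Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result


def pressure_test(
    algorithm: Callable[[Sequence[Point]], object],
    sizes: Sequence[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Median wall-clock seconds of `algorithm` at each data size."""
    timings: Dict[int, float] = {}
    for n in sizes:
        data = make_synthetic_data(n)
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            algorithm(data)
            runs.append(time.perf_counter() - start)
        runs.sort()
        timings[n] = runs[len(runs) // 2]  # median (upper median if repeats is even)
    return timings


def largest_size_within_budget(
    timings: Dict[int, float], budget_seconds: float
) -> Optional[int]:
    """Largest tested size whose median time fits the budget, or None."""
    feasible = [n for n, t in timings.items() if t <= budget_seconds]
    return max(feasible) if feasible else None


if __name__ == "__main__":
    timings = pressure_test(brute_force_nearest_neighbors, sizes=[100, 200, 400])
    for n, t in timings.items():
        print(f"n={n:5d}  median={t:.4f}s")
    # Hypothetical budget, e.g. what an instructor might allow for an in-class exercise.
    print("largest size within 1 s:", largest_size_within_budget(timings, 1.0))
```

Running the same script on a student laptop and on an office computer would show how the feasible size shifts with computing power; the time budget for each scenario (in-class practice versus class or research projects) is an assumption an instructor would set.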