Pressure Test: Finding Appropriate Data Size for Practice in Data Science Education

Yong Zheng, Arnold Liu, Shuai Zheng
Published in: Proceedings of the 23rd Annual Conference on Information Technology Education, 2022-09-21
DOI: 10.1145/3537674.3554748 (https://doi.org/10.1145/3537674.3554748)

Abstract

Data science topics such as data analytics, data mining, and machine learning have become a popular part of the curriculum in information technology education. Lectures on these topics cannot stand alone without coding practice on real-world data sets. Some instructors prefer small data sets for in-class practice or assignments, which limits students' experimental experience and may even mislead them. Others assign large data sets, but students may not be able to tolerate the resulting running time, which depends on several factors (e.g., data size, algorithm complexity, and computing power). In this paper, we first investigated students' preferences regarding the scale of data sets used for practice in data science courses. We then performed an experimental analysis by running different data science algorithms on both student laptops and personal/office computers, in order to suggest appropriate data sizes for practice in multiple scenarios (e.g., in-class practice, assignments, class projects, and research projects). We believe our findings will help instructors prepare and assign real-world data sets to students in data science curricula.
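The kind of pressure test the abstract describes can be illustrated with a minimal sketch: time one workload at increasing data sizes on a given machine, then pick the largest size that fits a time budget for a given scenario. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic point generator, the toy quadratic workload, the function names, and the budget values.

```python
import random
import time


def make_points(n, seed=0):
    """Generate n synthetic 2-D points (a stand-in for a real data set)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def nearest_neighbor_pass(points):
    """Toy O(n^2) workload: squared distance from each point to its nearest other point."""
    result = []
    for i, (x1, y1) in enumerate(points):
        best = None
        for j, (x2, y2) in enumerate(points):
            if i == j:
                continue
            d = (x1 - x2) ** 2 + (y1 - y2) ** 2
            if best is None or d < best:
                best = d
        result.append(best)
    return result


def time_at_sizes(workload, sizes, make_data=make_points):
    """Return {size: elapsed seconds} for a single run of workload at each size."""
    timings = {}
    for n in sizes:
        data = make_data(n)
        start = time.perf_counter()
        workload(data)
        timings[n] = time.perf_counter() - start
    return timings


def largest_size_within(timings, budget_seconds):
    """Largest measured size whose run time fits the budget, or None if none fits."""
    fitting = [n for n, t in timings.items() if t <= budget_seconds]
    return max(fitting) if fitting else None
```

A possible use is to call `time_at_sizes(nearest_neighbor_pass, [100, 200, 400, 800])` on a student laptop and again on an office computer, then call `largest_size_within` with a hypothetical budget for each scenario (for example, a short budget for in-class practice and a longer one for a class project). A single run per size is noisy, so repeating each measurement and taking the median would give steadier numbers.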