Developing Big Data Curriculum with Open Source Infrastructure (Abstract Only)

Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education. Pub Date: 2017-03-08. DOI: 10.1145/3017680.3022386

Anurag Nagar


Citations: 3

Abstract

This lightning talk will focus on our experience of developing and managing large undergraduate and graduate Big Data courses. The demand for trained professionals in the field of Big Data technologies is huge, and there is an urgent need to develop and update courses in this area. One of the biggest hurdles for many schools is the establishment, maintenance, and constant updating of high-performance computing infrastructure. Further, the technology landscape for Big Data is constantly evolving, and newer technologies, such as Apache Spark, require significant expenditure to set up and upgrade at the cluster level. Traditional infrastructure at most higher education institutions is insufficient for this and cannot scale to meet the demands of large class sizes and multiple simultaneous sessions. In this lightning talk, we will share our experience of running large undergraduate and graduate Big Data courses using open source infrastructure. Some of this infrastructure is cloud based, while other parts require students to create a virtualized environment on their personal computers. Both types of resources are freely available, easy to set up, and provide students with enough computational power to run most academic tasks and projects. We will provide specific examples of using such technologies for common tasks, such as setting up a distributed file system, running MapReduce algorithms on large datasets, performing large-scale machine learning and graph mining using Apache Spark, and maintaining a high-availability Cassandra instance.

