Title: Developing Big Data Curriculum with Open Source Infrastructure (Abstract Only)
Author: Anurag Nagar
Venue: Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education
Publication date: 2017-03-08
DOI: https://doi.org/10.1145/3017680.3022386
Citations: 3
Abstract
This lightning talk will focus on our experience of developing and managing large undergraduate and graduate Big Data courses. The demand for trained professionals in the field of Big Data technologies is huge, and there is an urgent need to develop and update courses in this area. One of the biggest hurdles for many schools is the establishment, maintenance, and constant updating of high-performance computing infrastructure. Further, the technology landscape for Big Data is constantly evolving, and newer technologies, such as Apache Spark, require significant expenditure to set up and upgrade at the cluster level. Traditional infrastructure at most higher education institutions is insufficient for this and cannot scale to meet the demands of large class sizes and multiple simultaneous sessions. In this lightning talk, we will share our experience of running large undergraduate and graduate Big Data courses using open source infrastructure. Some of this infrastructure is cloud-based, while the rest requires students to create a virtualized environment on their personal computers. Both types of resources are freely available, easy to set up, and provide students with enough computational power to run most academic tasks and projects. We will provide specific examples of using such technologies for common tasks, such as setting up a distributed file system, running MapReduce algorithms on large datasets, performing large-scale machine learning and graph mining using Apache Spark, and maintaining a high-availability Cassandra instance.