Zhou Lei, Hongguang Du, Shengbo Chen, C. Zhu, Xianyang Liu
2016 International Conference on Audio, Language and Image Processing (ICALIP), published 2016-07-01. DOI: 10.1109/ICALIP.2016.7846626 (https://doi.org/10.1109/ICALIP.2016.7846626)
DCSpark: Virtualizing Spark Using Docker Containers
As MapReduce has become a popular model for large-scale data processing in recent years, companies and researchers have taken advantage of it to solve their problems. Their applications may run on the same MapReduce cluster, each with its own system-wide configuration settings and library dependencies. Sometimes these settings and dependencies conflict with each other. Ensuring that such applications run together correctly, without mutual interference and with high resource utilization, is a challenge for researchers. In this paper, we propose DCSpark, a framework that leverages Docker containers to let users run Spark applications with conflicting configurations and library dependencies in one physical cluster. In addition, we present an implementation of our framework, called DCM, which manages the physical cluster, handles scheduling, and automatically builds container-based Spark cluster images according to the dependency environment of each application. Our experimental evaluation shows that DCSpark introduces negligible CPU and memory overhead compared with a native Spark cluster.
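The idea of building a container image per application from its dependency environment can be illustrated with a minimal sketch. This is not DCM's actual implementation, which the abstract does not describe; the `AppEnvironment` type, the `render_dockerfile` function, the base image name `example/spark-base:latest`, the package names, and the `/opt/spark/conf` path are all hypothetical, and a Debian-based base image with Spark already installed is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class AppEnvironment:
    """Hypothetical description of one Spark application's environment."""
    name: str
    base_image: str        # assumed: Debian-based image with Spark installed
    spark_version: str
    system_packages: list[str] = field(default_factory=list)
    spark_conf: dict[str, str] = field(default_factory=dict)


def render_dockerfile(env: AppEnvironment) -> str:
    """Render Dockerfile text for an image dedicated to one application."""
    lines = [f"FROM {env.base_image}"]
    if env.system_packages:
        # Library dependencies are installed inside the image, so they
        # cannot clash with those of another application's image.
        pkgs = " ".join(sorted(env.system_packages))
        lines.append(f"RUN apt-get update && apt-get install -y {pkgs}")
    lines.append(f"ENV SPARK_VERSION={env.spark_version}")
    # Configuration is written into the image as well (path is assumed).
    for key, value in sorted(env.spark_conf.items()):
        lines.append(
            f'RUN echo "{key} {value}" >> /opt/spark/conf/spark-defaults.conf'
        )
    return "\n".join(lines) + "\n"


# Two applications with conflicting library versions and settings.
app_a = AppEnvironment(
    name="app-a",
    base_image="example/spark-base:latest",
    spark_version="1.6.0",
    system_packages=["libfoo1"],
    spark_conf={"spark.executor.memory": "2g"},
)
app_b = AppEnvironment(
    name="app-b",
    base_image="example/spark-base:latest",
    spark_version="1.6.0",
    system_packages=["libfoo2"],
    spark_conf={"spark.executor.memory": "4g"},
)

dockerfile_a = render_dockerfile(app_a)
dockerfile_b = render_dockerfile(app_b)
```

Because each application gets its own generated Dockerfile, the two conflicting environments end up in separate images and can share one physical cluster without overwriting each other's settings.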