DSMM: A Dynamic Setting for Memory Management in Apache Spark
Suk-Joo Chae, Tae-Sun Chung
2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2019-03-24
https://doi.org/10.1109/ISPASS.2019.00024
Apache Spark (Spark) is a unified analytics engine for large-scale data processing. Unlike traditional data processing engines such as Hadoop, Spark caches data in memory, so memory management in Spark is important. However, several factors complicate memory management. First, users who want to cache data in memory must choose a storage level themselves; if they do not select the optimal storage level, Spark places a heavy burden on memory. Second, users must set the ratio of Spark memory directly within Spark; if they do not choose the optimal ratio, garbage collection overhead is incurred. In this poster, we propose DSMM, which dynamically selects both of these settings within the system for memory management. Our experimental results show a 13% improvement in execution time compared to standard Spark.
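To make the memory-ratio setting concrete, the following minimal sketch works through the arithmetic of Spark's unified memory model as described in the Spark tuning documentation. It assumes that the "ratio for Spark memory" in the abstract corresponds to the `spark.memory.fraction` setting (default 0.6), with `spark.memory.storageFraction` (default 0.5) splitting that region. The abstract does not name these keys, and the sketch does not represent how DSMM chooses its values. The names `memory_regions` and `MemoryRegions` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

# Spark sets aside 300 MiB of the heap before applying spark.memory.fraction.
RESERVED_MIB = 300


@dataclass(frozen=True)
class MemoryRegions:
    unified_mib: float  # shared execution + storage region
    storage_mib: float  # part of the unified region where cached blocks resist eviction
    user_mib: float     # remainder left for user data structures and metadata


def memory_regions(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Split an executor heap (in MiB) into regions under the unified memory model.

    Assumption: memory_fraction stands in for spark.memory.fraction and
    storage_fraction for spark.memory.storageFraction.
    """
    if heap_mib <= RESERVED_MIB:
        raise ValueError("heap must be larger than the reserved 300 MiB")
    if not 0.0 < memory_fraction < 1.0:
        raise ValueError("memory_fraction must be strictly between 0 and 1")
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")

    usable = heap_mib - RESERVED_MIB
    unified = usable * memory_fraction
    storage = unified * storage_fraction
    user = usable - unified
    return MemoryRegions(unified, storage, user)


if __name__ == "__main__":
    # Compare the default ratio with a larger one on a 4 GiB executor heap.
    for fraction in (0.6, 0.8):
        regions = memory_regions(4096, memory_fraction=fraction)
        print(fraction, regions)
```

Raising the fraction enlarges the region available for caching and execution while shrinking the memory left for user objects. That is the kind of trade-off a user has to weigh when setting the ratio by hand.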