DSMM: A Dynamic Setting for Memory Management in Apache Spark

2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) | Pub Date: 2019-03-24 | DOI: 10.1109/ISPASS.2019.00024

Suk-Joo Chae, Tae-Sun Chung


Citations: 4

Abstract

Apache Spark (Spark) is a unified analytics engine for large-scale data processing. Unlike traditional data processing engines such as Hadoop, Spark is a framework that caches data in memory. Memory management in Spark is therefore important. However, several factors interfere with memory management. First, users who want to cache data in memory must choose a storage level themselves. If they do not select the optimal storage level, Spark places a heavy burden on memory. Second, users must set the ratio for Spark memory directly within Spark. If they do not choose the optimal ratio, garbage collection overhead is incurred. In this poster, we propose DSMM, which dynamically selects these factors on the system for memory management. Our experimental results show a 13% improvement in execution time compared to standard Spark.
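To make the "ratio for Spark memory" concrete, the following is a minimal sketch of how Spark's unified memory manager divides an executor heap. It is not DSMM's selection algorithm, which the abstract does not describe. It assumes the ratio in question corresponds to the `spark.memory.fraction` setting (default 0.6 in recent Spark releases), together with `spark.memory.storageFraction` (default 0.5) and the fixed 300 MB reserved region. The function name `unified_memory_regions` is hypothetical.

```python
# Sketch of Spark's unified memory layout (assumed mapping to the abstract's
# "ratio for Spark memory"; not the DSMM method itself).

RESERVED_MB = 300  # memory Spark sets aside before applying any fraction


def unified_memory_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return (spark_mb, storage_mb, user_mb) for a given executor heap.

    spark_mb   : execution + storage memory (heap minus reserved, times
                 spark.memory.fraction)
    storage_mb : share of spark_mb protected from eviction by execution
                 (spark.memory.storageFraction)
    user_mb    : remainder left for user data structures and metadata
    """
    if heap_mb <= RESERVED_MB:
        raise ValueError("heap must exceed the reserved region")
    if not (0.0 < memory_fraction < 1.0 and 0.0 <= storage_fraction <= 1.0):
        raise ValueError("fractions out of range")
    usable_mb = heap_mb - RESERVED_MB
    spark_mb = usable_mb * memory_fraction
    storage_mb = spark_mb * storage_fraction
    user_mb = usable_mb - spark_mb
    return spark_mb, storage_mb, user_mb


if __name__ == "__main__":
    # Compare a few candidate ratios for a 4 GB (+ reserved) executor heap.
    for fraction in (0.4, 0.6, 0.8):
        spark_mb, storage_mb, user_mb = unified_memory_regions(4396, fraction)
        print(f"fraction={fraction}: spark={spark_mb:.0f} MB, "
              f"storage={storage_mb:.0f} MB, user={user_mb:.0f} MB")
```

A larger fraction leaves less room for user objects, while a smaller one shrinks the cache and execution space, which is one plausible reading of why a poorly chosen ratio leads to extra garbage collection. The storage level, the other factor the abstract names, is chosen per dataset in Spark through `persist` with a `StorageLevel` such as `MEMORY_ONLY` or `MEMORY_AND_DISK`.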

