DSMM: A Dynamic Setting for Memory Management in Apache Spark

Suk-Joo Chae, Tae-Sun Chung
Published in: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Publication date: 2019-03-24
DOI: 10.1109/ISPASS.2019.00024 (https://doi.org/10.1109/ISPASS.2019.00024)
Citations: 4

Abstract

Apache Spark (Spark) is a unified analytics engine for large-scale data processing. Unlike traditional data processing engines such as Hadoop, Spark caches data in memory, so memory management in Spark is important. However, several factors interfere with memory management. First, users who want to cache data in memory must choose a storage level themselves; if they do not select the optimal storage level, Spark places a heavy burden on memory. Second, users must set the ratio for Spark memory directly within Spark; if they do not choose the optimal ratio, garbage collection overhead is incurred. In this poster, we propose DSMM, which dynamically selects these factors on the system for memory management. Our experimental results show a 13% improvement in execution time compared to standard Spark.
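
The two settings named in the abstract can be illustrated with a small sketch. It assumes that the "ratio for Spark memory" corresponds to Spark's `spark.memory.fraction` setting (default 0.6) together with `spark.memory.storageFraction` (default 0.5), and that Spark reserves 300 MiB of heap before applying them. The function `suggest_storage_level` is a hypothetical heuristic written for illustration only; it is not the selection method used by DSMM, which the abstract does not describe.

```python
from dataclasses import dataclass

# Spark's unified memory manager sets aside 300 MiB of heap before sizing regions.
RESERVED_BYTES = 300 * 1024 * 1024


@dataclass(frozen=True)
class MemoryRegions:
    unified: int  # shared by execution and storage
    storage: int  # part of unified that execution cannot evict cached blocks from
    user: int     # rest of the usable heap, left for user data structures


def memory_regions(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Approximate the region sizes for a given executor heap.

    memory_fraction mirrors spark.memory.fraction and storage_fraction
    mirrors spark.memory.storageFraction (assumed mapping, see lead-in).
    """
    usable = max(heap_bytes - RESERVED_BYTES, 0)
    unified = int(usable * memory_fraction)
    storage = int(unified * storage_fraction)
    return MemoryRegions(unified=unified, storage=storage, user=usable - unified)


def suggest_storage_level(dataset_bytes, regions):
    """Hypothetical heuristic, NOT the DSMM algorithm.

    Keep the data purely in memory only when it fits in the protected
    storage region; otherwise allow partitions to spill to disk.
    """
    if dataset_bytes <= regions.storage:
        return "MEMORY_ONLY"
    return "MEMORY_AND_DISK"


if __name__ == "__main__":
    regions = memory_regions(4 * 1024**3)  # a 4 GiB executor heap
    print(regions)
    print(suggest_storage_level(1 * 1024**3, regions))
    print(suggest_storage_level(2 * 1024**3, regions))
```

A larger `memory_fraction` leaves less heap for user objects, while a smaller one shrinks the space available for caching and execution. The abstract describes choosing this ratio and the storage level by hand as the source of memory pressure and garbage collection overhead.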