Comparison of the HPC and Big Data Java Libraries Spark, PCJ and APGAS

Jonas Posner, Lukas Reitz, Claudia Fohry
Published in: 2018 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM), 2018-11-01
DOI: 10.1109/PAW-ATM.2018.00007 (https://doi.org/10.1109/PAW-ATM.2018.00007)
Citations: 4

Abstract

Although Java is rarely used in HPC, there are a few notable libraries. Use of Java may help to bridge the gap between HPC and big data processing. This paper compares the big data library Spark and the HPC libraries PCJ and APGAS with regard to productivity and performance. We use the Java versions of all libraries. For APGAS, we include both the original version and our own extension, which adds locality-flexible tasks. We consider three benchmarks: calculation of π and Unbalanced Tree Search (UTS), both from the HPC domain, and WordCount from the big data domain. In performance measurements with up to 144 workers, the extended APGAS library was the clear winner. With 144 workers, APGAS programs were up to more than twice as fast as Spark programs, and up to about 30% faster than PCJ programs. Regarding productivity, the extended APGAS programs consistently needed the lowest number of different library constructs. Spark ranked second in productivity, and PCJ third.
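
For readers unfamiliar with the WordCount benchmark named above, the following is a minimal sketch of the computation in plain Java. It is not the paper's code and uses none of the Spark, PCJ, or APGAS APIs; the class name `WordCountSketch`, the method `countWords`, and the sample input are assumptions made for illustration, and a parallel stream on one JVM merely stands in for the distributed workers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: a single-JVM WordCount, not the benchmark
// implementation evaluated in the paper.
public class WordCountSketch {

    // Splits each line on whitespace and counts how often each word occurs.
    // The parallel stream lets the JVM process lines on several threads.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty()) // drop the empty token produced by blank lines
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        List<String> lines = List.of("to be or not to be", "to do is to be");
        System.out.println(countWords(lines));
    }
}
```

In a distributed setting, each worker would count words in its own share of the input and the partial maps would then be merged; how that merge is expressed is where the three libraries differ.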