High Performance Computing with the Array Package for Java: A Case Study using Data Mining

ACM/IEEE SC 1999 Conference (SC'99). Pub Date: 1999. DOI: 10.1145/331532.331542

J. Moreira, S. Midkiff, M. Gupta, Richard D. Lawrence


Citations: 15


High Performance Computing with the Array Package for Java: A Case Study using Data Mining

This paper discusses several techniques used in developing a parallel, production-quality data mining application in Java. We started by developing three sequential versions of a product recommendation data mining application: (i) a Fortran 90 version used as a performance reference, (ii) a plain Java implementation that uses only the primitive array structures from the language, and (iii) a baseline Java implementation that uses our Array package for Java. This Array package provides parallelism at the level of individual Array and BLAS operations. Using this Array package, we also developed two parallel Java versions of the data mining application: one that relies entirely on the implicit parallelism provided by the Array package, and another that is explicitly parallel at the application level. We discuss the design of the Array package, as well as the design of the data mining application. We compare the trade-offs between performance and the abstraction level that the different Java versions present to the application programmer. Our studies show that, although a plain Java implementation performs poorly, the Java implementation with the Array package is quite competitive in performance with Fortran. We achieve a single-processor performance of 109 Mflops, or 91% of Fortran performance, on a 332 MHz PowerPC 604e processor. Both the implicitly and explicitly parallel forms of our Java implementations also parallelize well. On an SMP with four of those PowerPC processors, the implicitly parallel form achieves 290 Mflops with no effort from the application programmer, while the explicitly parallel form achieves 340 Mflops.
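The performance figures quoted in the abstract can be turned into speedup and efficiency numbers with a little arithmetic. The sketch below is only illustrative. The class and method names are hypothetical and are not part of the authors' Array package. It also assumes that the 109 Mflops single-processor result is the appropriate baseline for the four-processor runs, which the abstract does not state explicitly. The implied Fortran rate is back-computed from the rounded 91% figure, so it is approximate.

```java
// Worked arithmetic on the figures quoted in the abstract.
// All names here are hypothetical and exist only for this sketch.
final class ScalingSketch {
    // Figures taken directly from the abstract.
    static final double SEQUENTIAL_MFLOPS = 109.0;   // Java with Array package, one processor
    static final double FRACTION_OF_FORTRAN = 0.91;  // "91% of Fortran performance"
    static final double IMPLICIT_MFLOPS = 290.0;     // implicitly parallel, four processors
    static final double EXPLICIT_MFLOPS = 340.0;     // explicitly parallel, four processors
    static final int PROCESSORS = 4;

    /** Speedup of a parallel run relative to a sequential baseline. */
    static double speedup(double parallelMflops, double sequentialMflops) {
        return parallelMflops / sequentialMflops;
    }

    /** Parallel efficiency: speedup divided by the number of processors. */
    static double efficiency(double parallelMflops, double sequentialMflops, int processors) {
        return speedup(parallelMflops, sequentialMflops) / processors;
    }

    /** Fortran rate implied by the rounded 91% figure (an estimate, not a reported number). */
    static double impliedFortranMflops() {
        return SEQUENTIAL_MFLOPS / FRACTION_OF_FORTRAN;
    }

    public static void main(String[] args) {
        System.out.printf("Implied Fortran reference: %.1f Mflops%n", impliedFortranMflops());
        System.out.printf("Implicit: speedup %.2f, efficiency %.1f%%%n",
                speedup(IMPLICIT_MFLOPS, SEQUENTIAL_MFLOPS),
                100.0 * efficiency(IMPLICIT_MFLOPS, SEQUENTIAL_MFLOPS, PROCESSORS));
        System.out.printf("Explicit: speedup %.2f, efficiency %.1f%%%n",
                speedup(EXPLICIT_MFLOPS, SEQUENTIAL_MFLOPS),
                100.0 * efficiency(EXPLICIT_MFLOPS, SEQUENTIAL_MFLOPS, PROCESSORS));
    }
}
```

Under these assumptions, the implicitly parallel form reaches a speedup of roughly 2.7 on four processors and the explicitly parallel form roughly 3.1, so the extra programmer effort of explicit parallelism buys about 17% more throughput (340 versus 290 Mflops).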
