Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Akihiro Hayashi, K. Ishizaki, Gita Koblents, Vivek Sarkar
Proceedings of the Principles and Practices of Programming on The Java Platform (PPPJ), September 8, 2015
DOI: 10.1145/2807426.2807429
Citations: 26
Abstract
High-level languages such as Java increase both productivity and portability through language features such as a managed runtime, type safety, and precise exception semantics. Additionally, Java 8 provides parallel stream APIs with lambda expressions to facilitate parallel programming for mainstream users of multi-core CPUs and many-core GPUs. These high-level APIs avoid the complexity of writing natively running parallel programs with OpenMP and CUDA/OpenCL through the Java Native Interface (JNI). The adoption of such high-level programming models offers opportunities for compilers to perform parallel-aware optimizations and code generation. While many prior approaches can generate parallel code for both multi-core CPUs and many-core GPUs from Java and other high-level languages, selecting the preferred computing resource between CPUs and GPUs for individual kernels remains one of the most important challenges, since a variety of factors affecting performance, such as datasets and program features, must be taken into account. This paper explores the possibility of using machine learning to address this challenge. The key idea is to enable a Java runtime to select a preferable hardware device using performance heuristics constructed by supervised machine-learning techniques. For this purpose, when our JIT compiler detects a parallel stream API call, 1) the compiler records features of its computation, such as the parallel loop range and the number of instructions, and 2) our Java runtime uses these features to construct training data. For the results reported in this paper, we constructed a prediction model with support vector machines (SVMs) after obtaining 291 samples by running 11 applications with different data sets and optimization levels. Our Java runtime then uses the SVMs to make predictions for unseen programs. Our experimental results on an IBM POWER8 platform with NVIDIA Tesla GPUs show that our prediction model predicts the faster configuration with up to 99.0% accuracy under 5-fold cross-validation. Based on these results, we conclude that supervised machine learning is a promising approach for building performance heuristics for mapping Java applications onto accelerators.
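The parallel stream APIs the abstract refers to are the standard Java 8 `java.util.stream` facilities. A minimal example of the kind of kernel such a runtime would consider mapping to a CPU or GPU (the specific computation here is illustrative, not one of the paper's 11 benchmarks):

```java
import java.util.stream.IntStream;

public class ParallelStreamExample {
    // Sum of squares 1..n, expressed as a parallel stream with a lambda.
    // The .parallel() call runs the reduction on the common fork-join pool;
    // a GPU-enabled JIT (as in the paper) could instead offload such a kernel.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()
                        .mapToLong(i -> (long) i * i)
                        .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000)); // prints 333833500
    }
}
```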
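To make the runtime-selection idea concrete, here is a hypothetical sketch of the decision the trained model performs: given the features the abstract names (parallel loop range and instruction count), score the kernel and pick a device. The weights, bias, and log normalization below are illustrative placeholders only; the paper trains real SVMs on 291 samples rather than hand-picking coefficients:

```java
public class DevicePredictor {
    // Illustrative linear decision function standing in for the trained SVM.
    // W_RANGE, W_INSNS, and BIAS are invented for this sketch, not from the paper.
    static final double W_RANGE = 0.8, W_INSNS = 0.2, BIAS = -5.0;

    // Features named in the abstract: parallel loop range and instruction count.
    static String predict(long loopRange, long instructionCount) {
        // Log-scale the raw counts so huge loop ranges do not dominate
        // (an assumption of this sketch, not a detail from the paper).
        double score = W_RANGE * Math.log1p(loopRange)
                     + W_INSNS * Math.log1p(instructionCount)
                     + BIAS;
        return score > 0 ? "GPU" : "CPU";
    }

    public static void main(String[] args) {
        System.out.println(predict(10, 5));           // tiny kernel: stays on CPU
        System.out.println(predict(10_000_000, 200)); // large kernel: offloaded to GPU
    }
}
```

The real system would feed the recorded feature vector to an SVM with a non-linear kernel; a linear score is used here only to keep the sketch self-contained.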
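The reported 99.0% accuracy comes from 5-fold cross-validation over the 291 samples. A small sketch of how such a fold partition can be built (the shuffling seed and round-robin assignment are choices made for this example, not details from the paper):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class CrossValidation {
    // Partition sample indices 0..nSamples-1 into k folds of near-equal size.
    // Each fold serves once as the held-out test set during cross-validation.
    static List<List<Integer>> kFolds(int nSamples, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < nSamples; i++) idx.add(i);
        Collections.shuffle(idx, new Random(42)); // fixed seed for reproducibility
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < nSamples; i++) folds.get(i % k).add(idx.get(i));
        return folds;
    }

    public static void main(String[] args) {
        // 291 samples over 5 folds -> sizes 59, 58, 58, 58, 58.
        for (List<Integer> fold : kFolds(291, 5)) System.out.println(fold.size());
    }
}
```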