Matt Martineau, Simon McIntosh-Smith, M. Boulton, W. Gaudin
Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016-03-12
DOI: 10.1145/2883404.2883420 (https://doi.org/10.1145/2883404.2883420)
An Evaluation of Emerging Many-Core Parallel Programming Models
In this work we directly evaluate several emerging parallel programming models (Kokkos, RAJA, OpenACC, and OpenMP 4.0) against the mature CUDA and OpenCL APIs. Each model has been used to port TeaLeaf, a miniature proxy application, or mini-app, that solves the heat conduction equation and belongs to the Mantevo suite of applications. We find that the best performance is achieved with device-tuned implementations but that, in many cases, the performance-portable models are able to solve the same problems to within a 5-20% performance penalty. The models expose varying levels of complexity to the developer, and they all present reasonable performance. We believe that complexity will become the major influencer in the long-term adoption of such models.
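As a rough illustration of the kind of stencil kernel that a heat conduction mini-app spends its time in, the sketch below relaxes a small 2D grid to steady state with Jacobi sweeps of a five-point stencil. It is not TeaLeaf's code and does not reflect the solvers the paper evaluates: the function names (`jacobi_step`, `solve_steady_heat`, `make_grid`), the fixed-boundary setup, the uniform conductivity, and the choice of Jacobi iteration are all assumptions made for illustration. The doubly nested loop in `jacobi_step` stands in for the sort of data-parallel loop that each programming model would express in its own way.

```python
def make_grid(n, hot=1.0):
    """Hypothetical setup: n x n grid, top edge held at `hot`, all else 0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [hot] * n
    return grid


def jacobi_step(u):
    """One Jacobi sweep of the five-point stencil; boundary cells stay fixed."""
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]
    # Every interior cell is independent of the others within a sweep,
    # which is what makes this loop nest a natural target for parallelism.
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            new[j][i] = 0.25 * (
                u[j - 1][i] + u[j + 1][i] + u[j][i - 1] + u[j][i + 1]
            )
    return new


def solve_steady_heat(u, tol=1e-8, max_iters=10_000):
    """Repeat sweeps until the largest per-cell change drops below `tol`.

    Returns the final grid and the number of sweeps performed.
    """
    for it in range(1, max_iters + 1):
        new = jacobi_step(u)
        change = max(
            abs(new[j][i] - u[j][i])
            for j in range(len(u))
            for i in range(len(u[0]))
        )
        u = new
        if change < tol:
            return u, it
    return u, max_iters


if __name__ == "__main__":
    grid, sweeps = solve_steady_heat(make_grid(5))
    print(f"converged after {sweeps} sweeps")
    for row in grid:
        print(" ".join(f"{v:.3f}" for v in row))
```

Pure Python is used here only to keep the sketch self-contained. In a real port the same loop nest would be written once per model, and the comparison in the abstract concerns how much performance each such expression gives up relative to a device-tuned version.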