Porting Optimized GPU Kernels to a Multi-core CPU: Computational Quantum Chemistry Application Example

2011 Symposium on Application Accelerators in High-Performance Computing
Pub Date: 2011-07-19
DOI: 10.1109/SAAHPC.2011.8

Dong Ye, Alexey Titov, V. Kindratenko, Ivan S. Ufimtsev, Todd J. Martinez


Citations: 8

Abstract

We investigate techniques for optimizing a multi-core CPU code back-ported from a highly optimized GPU kernel. We show that common sub-expression elimination and loop unrolling improve code performance on the GPU, but not on the CPU. On the other hand, register reuse and loop merging are effective on the CPU, and in combination they improve the performance of the ported code by 16%.
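The abstract names loop merging and register reuse without showing code, so the following is only a minimal sketch of what those two transformations generally look like, not the authors' kernels. The function names, the weighted-sum workload, and the array layout are all hypothetical, chosen purely for illustration. The baseline makes two passes over the data and loads `w[i]` in each pass; the merged version makes one pass and keeps `w[i]` in a local variable so the compiler can hold it in a register for both accumulations.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical): two separate loops, each loading w[i] from memory. */
void weighted_sums_separate(const double *w, const double *x, const double *y,
                            size_t n, double *sum_x, double *sum_y)
{
    assert(w && x && y && sum_x && sum_y);
    double sx = 0.0, sy = 0.0;
    for (size_t i = 0; i < n; ++i)
        sx += w[i] * x[i];
    for (size_t i = 0; i < n; ++i)
        sy += w[i] * y[i];
    *sum_x = sx;
    *sum_y = sy;
}

/* Merged (hypothetical): one loop; w[i] is loaded once into a local and
 * reused for both accumulations, which encourages register reuse. */
void weighted_sums_merged(const double *w, const double *x, const double *y,
                          size_t n, double *sum_x, double *sum_y)
{
    assert(w && x && y && sum_x && sum_y);
    double sx = 0.0, sy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double wi = w[i];   /* single load, reused twice below */
        sx += wi * x[i];
        sy += wi * y[i];
    }
    *sum_x = sx;
    *sum_y = sy;
}
```

Both functions accumulate each sum in the same order, so they return identical results; any speed difference depends on the compiler, the optimization level, and the memory behavior of the real workload. An optimizing compiler may already perform this fusion on a loop this simple, so the 16% figure in the abstract should not be expected from this toy case.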
