{"title":"扩展OpenMP和OpenSHMEM实现高效异构计算","authors":"Wen-wei Lu, Shilei Tian, Tony Curtis, B. Chapman","doi":"10.1109/PAW-ATM56565.2022.00006","DOIUrl":null,"url":null,"abstract":"Heterogeneous supercomputing systems are becoming mainstream thanks to their powerful accelerators. However, the accelerators’ special memory model and APIs increase the development complexity, and calls for innovative programming model designs. To address this issue, OpenMP has added target offloading for portable accelerator programming, and MPI allows transparent send-receive of accelerator memory buffers. Meanwhile, Partitioned Global Address Space (PGAS) languages like OpenSHMEM are falling behind for heterogeneous computing because their special memory models pose additional challenges.We propose language and runtime interoperability extensions for both OpenMP and OpenSHMEM to enable portable remote access on GPU buffers, with minimal amount of code changes. Our modified runtime systems work in coordination to manage accelerator memory, eliminating the need for staging communication buffers. Compared to the standard implementation, our extensions attain 6x point-to-point latency improvement, 1.3x better collective operation latency, 4.9x random access throughput, and up to 12.5% better performance in strong scaling configurations.","PeriodicalId":231452,"journal":{"name":"2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Extending OpenMP and OpenSHMEM for Efficient Heterogeneous Computing\",\"authors\":\"Wen-wei Lu, Shilei Tian, Tony Curtis, B. Chapman\",\"doi\":\"10.1109/PAW-ATM56565.2022.00006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous supercomputing systems are becoming mainstream thanks to their powerful accelerators. However, the accelerators’ special memory model and APIs increase the development complexity, and calls for innovative programming model designs. To address this issue, OpenMP has added target offloading for portable accelerator programming, and MPI allows transparent send-receive of accelerator memory buffers. Meanwhile, Partitioned Global Address Space (PGAS) languages like OpenSHMEM are falling behind for heterogeneous computing because their special memory models pose additional challenges.We propose language and runtime interoperability extensions for both OpenMP and OpenSHMEM to enable portable remote access on GPU buffers, with minimal amount of code changes. Our modified runtime systems work in coordination to manage accelerator memory, eliminating the need for staging communication buffers. 
Compared to the standard implementation, our extensions attain 6x point-to-point latency improvement, 1.3x better collective operation latency, 4.9x random access throughput, and up to 12.5% better performance in strong scaling configurations.\",\"PeriodicalId\":231452,\"journal\":{\"name\":\"2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM)\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PAW-ATM56565.2022.00006\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PAW-ATM56565.2022.00006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Heterogeneous supercomputing systems are becoming mainstream thanks to their powerful accelerators. However, the accelerators' special memory models and APIs increase development complexity and call for innovative programming model designs. To address this issue, OpenMP has added target offloading for portable accelerator programming, and MPI allows transparent send-receive of accelerator memory buffers. Meanwhile, Partitioned Global Address Space (PGAS) languages like OpenSHMEM are falling behind for heterogeneous computing because their special memory models pose additional challenges. We propose language and runtime interoperability extensions for both OpenMP and OpenSHMEM that enable portable remote access to GPU buffers with minimal code changes. Our modified runtime systems work in coordination to manage accelerator memory, eliminating the need for staging communication buffers. Compared to the standard implementation, our extensions achieve a 6x improvement in point-to-point latency, 1.3x lower collective-operation latency, 4.9x higher random-access throughput, and up to 12.5% better performance in strong-scaling configurations.
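
To make the staging problem concrete, the sketch below shows the conventional pattern the paper targets: computing on the GPU with standard OpenMP target offloading, then round-tripping the result through host memory so that standard OpenSHMEM can communicate it. This is a hypothetical minimal example using only standard OpenMP 4.5+ and OpenSHMEM 1.4 APIs; the paper's actual interoperability extensions are not reproduced here, since the abstract does not specify their syntax.

```c
/*
 * Baseline staged-communication pattern (a sketch, not code from the paper).
 * With standard OpenSHMEM, remote puts must source from host-accessible
 * memory, so GPU-resident data is copied back to the host before the put.
 * The proposed extensions eliminate this host round-trip.
 *
 * Assumed toolchain: an offloading-capable compiler plus an OpenSHMEM
 * library, e.g.  oshcc -fopenmp -fopenmp-targets=nvptx64 staging.c
 */
#include <shmem.h>
#include <stdlib.h>

#define N 1024

int main(void) {
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric heap allocation: remotely accessible, but host-resident. */
    double *sym      = shmem_malloc(N * sizeof(double));
    double *host_buf = malloc(N * sizeof(double));

    /* Compute on the GPU via OpenMP target offloading; map(from:) copies
     * the result back to host memory when the region ends -- this copy is
     * the staging step the paper's extensions aim to remove. */
    #pragma omp target teams distribute parallel for map(from: host_buf[0:N])
    for (int i = 0; i < N; i++)
        host_buf[i] = (double)(me * N + i);

    /* Standard OpenSHMEM put from the staged host buffer to the neighboring
     * PE's symmetric buffer. */
    shmem_putmem(sym, host_buf, N * sizeof(double), (me + 1) % npes);
    shmem_barrier_all();

    free(host_buf);
    shmem_free(sym);
    shmem_finalize();
    return 0;
}
```

Under the extensions described in the abstract, the coordinated OpenMP and OpenSHMEM runtimes would let the communication call operate directly on a GPU buffer (for example, a device pointer obtained through OpenMP's device memory facilities), removing the host copy entirely; the exact mechanism is defined in the paper itself rather than in this abstract.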