A Memory Affinity Analysis of Scientific Applications on NUMA Platforms
Rafael Gauna Trindade, J. F. Lima, A. Charão
2021 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2021-10-01
DOI: https://doi.org/10.1109/sbac-padw53941.2021.00011
Understanding the underlying architecture is essential for scientific applications in general. Non-Uniform Memory Access (NUMA) systems are one example of such a computing environment: they provide a large amount of shared main memory. However, NUMA systems can impose significant access latencies on data communication between distant memory nodes. Parallel applications with a naïve design may suffer significant performance penalties because they lack locality mechanisms. In this paper, we present performance metrics for scientific applications that identify locality problems on NUMA systems, and we show data and thread mapping strategies that mitigate them. Our experiments used four well-known scientific applications: CoMD, LBM, LULESH, and Ondes3D. The experimental results demonstrate that these applications had significant locality problems and that data and thread mapping strategies improved performance on all four.
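The abstract does not say which thread mapping strategies were evaluated. As an illustration only, the sketch below computes two commonly used placements, "compact" and "scatter", over a hypothetical two-node topology. The function name `map_threads`, the policy names, and the topology dictionary are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def map_threads(num_threads: int, nodes: Dict[int, List[int]], policy: str) -> List[int]:
    """Return the core id assigned to each thread (list index = thread id).

    `nodes` maps a NUMA node id to the core ids that belong to it.
    This is an illustrative helper, not the method used in the paper.
    """
    node_ids = sorted(nodes)
    if policy == "compact":
        # Fill one NUMA node completely before moving on to the next,
        # so neighbouring threads share a node (and its local memory).
        order = [core for n in node_ids for core in nodes[n]]
    elif policy == "scatter":
        # Round-robin over nodes, so neighbouring threads land on
        # different nodes and memory traffic is spread across them.
        depth = max(len(nodes[n]) for n in node_ids)
        order = [nodes[n][i] for i in range(depth) for n in node_ids if i < len(nodes[n])]
    else:
        raise ValueError(f"unknown policy: {policy}")
    if num_threads > len(order):
        raise ValueError("more threads than available cores")
    return order[:num_threads]


# Hypothetical machine: 2 NUMA nodes with 4 cores each.
topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
compact = map_threads(4, topology, "compact")
scatter = map_threads(4, topology, "scatter")
print("compact:", compact)
print("scatter:", scatter)
```

With four threads, the compact placement keeps every thread on node 0, while the scatter placement alternates between the two nodes. Which one helps depends on how the application shares data, which is the kind of question the locality metrics in the paper are meant to answer.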