{"title":"预测大型NUMA机器上多线程应用程序的内存带宽和最佳核心分配","authors":"Wei Wang, J. Davidson, M. Soffa","doi":"10.1109/HPCA.2016.7446083","DOIUrl":null,"url":null,"abstract":"Modern NUMA platforms offer large numbers of cores to boost performance through parallelism and multi-threading. However, because performance scalability is limited by available memory bandwidth, the strategy of allocating all cores can result in degraded performance. Consequently, accurately predicting optimal (best performing) core allocations, and executing applications with these allocations are crucial for achieving the best performance. Previous research focused on the prediction of optimal numbers of cores. However, in this paper, we show that, because of the asymmetric NUMA memory configuration and the asymmetric application memory behavior, optimal core allocations are not merely optimal numbers of cores. Additionally, previous studies do not adequately consider NUMA memory resources, which further limits their ability to accurately predict optimal core allocations. In this paper, we present a model, NuCore, which predicts both memory bandwidth usage and optimal core allocations. NuCore considers various memory resources and NUMA asymmetry, and employs Integer Programming to achieve high accuracy and low overhead. Experimental results from real NUMA machines show that the core allocations predicted by NuCore provide 1.27x average speedup over using all cores with only 75.6% cores allocated. NuCore also provides 1.18x and 1.21x average speedups over two state-of-the-art techniques. Our results also show that NuCore faithfully models NUMA memory systems and predicts memory bandwidth usages with only 10% average error.","PeriodicalId":417994,"journal":{"name":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":"{\"title\":\"Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines\",\"authors\":\"Wei Wang, J. Davidson, M. Soffa\",\"doi\":\"10.1109/HPCA.2016.7446083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern NUMA platforms offer large numbers of cores to boost performance through parallelism and multi-threading. However, because performance scalability is limited by available memory bandwidth, the strategy of allocating all cores can result in degraded performance. Consequently, accurately predicting optimal (best performing) core allocations, and executing applications with these allocations are crucial for achieving the best performance. Previous research focused on the prediction of optimal numbers of cores. However, in this paper, we show that, because of the asymmetric NUMA memory configuration and the asymmetric application memory behavior, optimal core allocations are not merely optimal numbers of cores. Additionally, previous studies do not adequately consider NUMA memory resources, which further limits their ability to accurately predict optimal core allocations. In this paper, we present a model, NuCore, which predicts both memory bandwidth usage and optimal core allocations. NuCore considers various memory resources and NUMA asymmetry, and employs Integer Programming to achieve high accuracy and low overhead. Experimental results from real NUMA machines show that the core allocations predicted by NuCore provide 1.27x average speedup over using all cores with only 75.6% cores allocated. NuCore also provides 1.18x and 1.21x average speedups over two state-of-the-art techniques. Our results also show that NuCore faithfully models NUMA memory systems and predicts memory bandwidth usages with only 10% average error.\",\"PeriodicalId\":417994,\"journal\":{\"name\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"45\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2016.7446083\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2016.7446083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines
Modern NUMA platforms offer large numbers of cores to boost performance through parallelism and multi-threading. However, because performance scalability is limited by available memory bandwidth, the strategy of allocating all cores can result in degraded performance. Consequently, accurately predicting optimal (best performing) core allocations, and executing applications with these allocations are crucial for achieving the best performance. Previous research focused on the prediction of optimal numbers of cores. However, in this paper, we show that, because of the asymmetric NUMA memory configuration and the asymmetric application memory behavior, optimal core allocations are not merely optimal numbers of cores. Additionally, previous studies do not adequately consider NUMA memory resources, which further limits their ability to accurately predict optimal core allocations. In this paper, we present a model, NuCore, which predicts both memory bandwidth usage and optimal core allocations. NuCore considers various memory resources and NUMA asymmetry, and employs Integer Programming to achieve high accuracy and low overhead. Experimental results from real NUMA machines show that the core allocations predicted by NuCore provide 1.27x average speedup over using all cores with only 75.6% cores allocated. NuCore also provides 1.18x and 1.21x average speedups over two state-of-the-art techniques. Our results also show that NuCore faithfully models NUMA memory systems and predicts memory bandwidth usages with only 10% average error.