{"title":"SMT-Aware Instantaneous Footprint Optimization","authors":"Probir Roy, Xu Liu, S. Song","doi":"10.1145/2907294.2907308","DOIUrl":null,"url":null,"PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"2 8","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2907294.2907308","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the entire memory hierarchy of a physical core. Without careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging because they typically spawn threads under the Single Program Multiple Data (SPMD) model. Since these threads have similar resource requirements, their contention cannot be easily mitigated through simple thread scheduling. To address this issue, we first conduct a systematic performance evaluation of a wide range of representative HPC and CMP applications on three mainstream SMT architectures and quantify their performance sensitivity to SMT effects. We then introduce a simple scheme for SMT-aware code optimization that aims to reduce memory contention across SMT threads. Finally, we develop a lightweight performance tool, SMTAnalyzer, to identify optimization opportunities in the source code of multithreaded programs. Experiments on three SMT architectures (Intel Xeon, IBM POWER7, and Intel Xeon Phi) demonstrate that the proposed SMT-aware optimization scheme can significantly improve the performance of general HPC applications.