{"title":"表征新兴异质记忆","authors":"Du Shen, Xu Liu, F. Lin","doi":"10.1145/2926697.2926702","DOIUrl":null,"url":null,"abstract":"Heterogeneous memory (HM, also known as hybrid memory) has become popular in emerging parallel architectures due to its programming flexibility and energy efficiency. Unlike the traditional memory subsystem, HM consists of fast and slow components. Usually, the fast memory lacks hardware support, which puts extra burdens on programmers and compilers for explicit data placement. Thus, HM provides both opportunities and challenges with programming parallel codes. It is important to understand how to utilize HM and set expectations on the benefits of HM. Prior work principally uses simulators to study HM, which lacks the analysis on a real hardware. To address this issue, this paper experiments with a real system—the TI KeyStone II—to study HM. We make three contributions. First, we develop a set of parallel benchmarks to characterize the performance and power efficiency of HM. It is the first benchmark suite with OpenMP 4.0 features that is functional on real HM architectures. Second, we build a profiling tool to provide guidance for placing data in HM. Our tool analyzes memory access patterns and provides high-level feedback at the source-code level for optimization. Third, we apply the data placement optimization to our benchmarks and evaluate the effectiveness of HM in boosting performance and saving energy.","PeriodicalId":203550,"journal":{"name":"Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Characterizing emerging heterogeneous memory\",\"authors\":\"Du Shen, Xu Liu, F. Lin\",\"doi\":\"10.1145/2926697.2926702\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous memory (HM, also known as hybrid memory) has become popular in emerging parallel architectures due to its programming flexibility and energy efficiency. Unlike the traditional memory subsystem, HM consists of fast and slow components. Usually, the fast memory lacks hardware support, which puts extra burdens on programmers and compilers for explicit data placement. Thus, HM provides both opportunities and challenges with programming parallel codes. It is important to understand how to utilize HM and set expectations on the benefits of HM. Prior work principally uses simulators to study HM, which lacks the analysis on a real hardware. To address this issue, this paper experiments with a real system—the TI KeyStone II—to study HM. We make three contributions. First, we develop a set of parallel benchmarks to characterize the performance and power efficiency of HM. It is the first benchmark suite with OpenMP 4.0 features that is functional on real HM architectures. Second, we build a profiling tool to provide guidance for placing data in HM. Our tool analyzes memory access patterns and provides high-level feedback at the source-code level for optimization. Third, we apply the data placement optimization to our benchmarks and evaluate the effectiveness of HM in boosting performance and saving energy.\",\"PeriodicalId\":203550,\"journal\":{\"name\":\"Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2926697.2926702\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2926697.2926702","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Heterogeneous memory (HM, also known as hybrid memory) has become popular in emerging parallel architectures due to its programming flexibility and energy efficiency. Unlike the traditional memory subsystem, HM consists of fast and slow components. Usually, the fast memory lacks hardware support, which puts extra burdens on programmers and compilers for explicit data placement. Thus, HM provides both opportunities and challenges with programming parallel codes. It is important to understand how to utilize HM and set expectations on the benefits of HM. Prior work principally uses simulators to study HM, which lacks the analysis on a real hardware. To address this issue, this paper experiments with a real system—the TI KeyStone II—to study HM. We make three contributions. First, we develop a set of parallel benchmarks to characterize the performance and power efficiency of HM. It is the first benchmark suite with OpenMP 4.0 features that is functional on real HM architectures. Second, we build a profiling tool to provide guidance for placing data in HM. Our tool analyzes memory access patterns and provides high-level feedback at the source-code level for optimization. Third, we apply the data placement optimization to our benchmarks and evaluate the effectiveness of HM in boosting performance and saving energy.