Optimizing Memory-Compute Colocation for Irregular Applications on a Migratory Thread Architecture

Thomas B. Rolinger, Christopher D. Krieger, A. Sussman
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021
DOI: 10.1109/IPDPS49936.2021.00015
The movement of data between memory and processors has become a performance bottleneck for many applications. The problem is worse for applications with sparse and irregular memory accesses, as they exhibit weak locality and utilize the cache poorly. As a result, colocating memory and compute is crucial for achieving high performance on irregular applications. There are two paradigms for memory-compute colocation. The first is the conventional approach of moving the data to the compute. The second is to move the compute to the data, which is less conventional and not as well understood. One example is migratory threads, which physically relocate to the compute resource that hosts the data upon a remote access. In this paper, we explore the paradigm of moving compute to the data by optimizing memory-compute colocation for irregular applications on a migratory thread architecture. Our optimization method includes both initial data placement and data replication. We evaluate our optimization on sparse matrix-vector multiply (SpMV) and sparse matrix-matrix multiply (SpGEMM). Our results show speed-ups as high as 4.2x on SpMV and 6x on SpGEMM compared to the default data layout. We also show that our optimization for improving memory-compute colocation applies to both migratory threads and more conventional systems. To this end, we evaluate our optimization approach on a conventional compute cluster using the Chapel programming language, demonstrating speed-ups as high as 18x for SpMV.
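The abstract describes colocation through initial data placement and data replication for sparse kernels. The sketch below is an illustration only, not the paper's implementation: a CSR SpMV in which rows are placed in contiguous blocks across hypothetical memory domains and the dense vector is replicated per domain, so each row's computation reads only colocated data. The domain count, the block partitioning, and all function names are assumptions made for this example.

import numpy as np

def partition_rows(n_rows, n_domains):
    """Split row indices into contiguous blocks, one block per memory domain."""
    base, extra = divmod(n_rows, n_domains)
    bounds = [0]
    for d in range(n_domains):
        bounds.append(bounds[-1] + base + (1 if d < extra else 0))
    return [(bounds[d], bounds[d + 1]) for d in range(n_domains)]

def spmv_colocated(indptr, indices, data, x, n_domains=4):
    """CSR SpMV with block-row placement and a per-domain replica of x.

    Each (hypothetical) domain owns a contiguous block of rows and its own
    copy of the dense vector, so every read in a row's dot product is local
    to the domain that computes that row.
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    x_replicas = [np.array(x, dtype=float) for _ in range(n_domains)]  # data replication
    for d, (lo, hi) in enumerate(partition_rows(n_rows, n_domains)):
        x_local = x_replicas[d]              # reads hit the local replica only
        for i in range(lo, hi):              # rows placed on domain d
            acc = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                acc += data[k] * x_local[indices[k]]
            y[i] = acc
    return y

# Tiny usage example: a 3x3 sparse matrix in CSR form times a dense vector.
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [1.0, 2.0, 3.0, 4.0, 5.0]
x       = [1.0, 1.0, 1.0]
print(spmv_colocated(indptr, indices, data, x, n_domains=2))  # [3. 3. 9.]

On a migratory thread machine the analogous effect comes from threads relocating to the memory that holds the row data; on a conventional cluster (e.g., the Chapel evaluation mentioned above) it comes from distributing row blocks and replicating the dense vector across locales.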