在完全内存延迟隐藏的情况下，最小化nvm写操作的最佳循环平铺

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2022-01-17 DOI:10.1109/ASP-DAC52403.2022.9712532

Rui Xu, E. Sha, Qingfeng Zhuge, Yuhong Song, Jingzhi Lin

{"title":"在完全内存延迟隐藏的情况下，最小化nvm写操作的最佳循环平铺","authors":"Rui Xu, E. Sha, Qingfeng Zhuge, Yuhong Song, Jingzhi Lin","doi":"10.1109/ASP-DAC52403.2022.9712532","DOIUrl":null,"url":null,"abstract":"Non-volatile memory (NVM) is expected to be the second level memory (named remote memory) in two-level memory hierarchy in the future. However, NVM has the limited write endurance, thus it is vital to reduce the number of write operations on NVM. Meanwhile, in two-level memory hierarchy, prefetch is widely used for fetching certain data before it is actually required, to hide the remote memory access latency. In general, large-scale nested loop is the performance bottleneck in one program due to the write operations on NVM caused by the first level memory (named local memory) miss and data reuse. Loop tiling is the key technique for grouping iterations so as to reduce the communication with remote memory used in compiler. In this paper, we propose a new loop tiling approach for minimizing the write operations on NVMs and completely hiding the NVM access latency. Specifically, we introduce a series of theorems to help loop tiling. Then, a legal tile shape and an optimal tile size selection strategy is proposed according to data dependency and local memory capacity. Furthermore, we propose a pipeline scheduling policy to completely hide the remote memory latency. Extensive experiments show that the proposed techniques can reduce write operations on NVMs by 95.1% on average, and NVM latency can be completely hidden.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Optimal Loop Tiling for Minimizing Write Operations on NVMs with Complete Memory Latency Hiding\",\"authors\":\"Rui Xu, E. Sha, Qingfeng Zhuge, Yuhong Song, Jingzhi Lin\",\"doi\":\"10.1109/ASP-DAC52403.2022.9712532\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Non-volatile memory (NVM) is expected to be the second level memory (named remote memory) in two-level memory hierarchy in the future. However, NVM has the limited write endurance, thus it is vital to reduce the number of write operations on NVM. Meanwhile, in two-level memory hierarchy, prefetch is widely used for fetching certain data before it is actually required, to hide the remote memory access latency. In general, large-scale nested loop is the performance bottleneck in one program due to the write operations on NVM caused by the first level memory (named local memory) miss and data reuse. Loop tiling is the key technique for grouping iterations so as to reduce the communication with remote memory used in compiler. In this paper, we propose a new loop tiling approach for minimizing the write operations on NVMs and completely hiding the NVM access latency. Specifically, we introduce a series of theorems to help loop tiling. Then, a legal tile shape and an optimal tile size selection strategy is proposed according to data dependency and local memory capacity. Furthermore, we propose a pipeline scheduling policy to completely hide the remote memory latency. Extensive experiments show that the proposed techniques can reduce write operations on NVMs by 95.1% on average, and NVM latency can be completely hidden.\",\"PeriodicalId\":239260,\"journal\":{\"name\":\"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASP-DAC52403.2022.9712532\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC52403.2022.9712532","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

非易失性存储器(NVM)有望在未来的两级存储器结构中成为第二级存储器(称为远程存储器)。但是，NVM的写持久性有限，因此减少NVM上的写操作数量至关重要。同时，在两级内存结构中，预取被广泛用于在实际需要之前获取某些数据，以隐藏远程内存访问延迟。一般来说，大规模嵌套循环是一个程序的性能瓶颈，这是由于一级内存(称为本地内存)丢失和数据重用导致对NVM的写操作造成的。循环平铺是分组迭代的关键技术，可以减少编译器与远程内存的通信。在本文中，我们提出了一种新的循环平铺方法来最小化NVM上的写操作并完全隐藏NVM的访问延迟。具体来说，我们将引入一系列定理来帮助循环平铺。然后，根据数据依赖性和局部存储器容量，提出了合理的贴图形状和最优的贴图大小选择策略。此外，我们提出了一种管道调度策略来完全隐藏远程内存延迟。大量的实验表明，该技术可以将NVM上的写操作平均减少95.1%，并且可以完全隐藏NVM的延迟。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Optimal Loop Tiling for Minimizing Write Operations on NVMs with Complete Memory Latency Hiding

Non-volatile memory (NVM) is expected to be the second level memory (named remote memory) in two-level memory hierarchy in the future. However, NVM has the limited write endurance, thus it is vital to reduce the number of write operations on NVM. Meanwhile, in two-level memory hierarchy, prefetch is widely used for fetching certain data before it is actually required, to hide the remote memory access latency. In general, large-scale nested loop is the performance bottleneck in one program due to the write operations on NVM caused by the first level memory (named local memory) miss and data reuse. Loop tiling is the key technique for grouping iterations so as to reduce the communication with remote memory used in compiler. In this paper, we propose a new loop tiling approach for minimizing the write operations on NVMs and completely hiding the NVM access latency. Specifically, we introduce a series of theorems to help loop tiling. Then, a legal tile shape and an optimal tile size selection strategy is proposed according to data dependency and local memory capacity. Furthermore, we propose a pipeline scheduling policy to completely hide the remote memory latency. Extensive experiments show that the proposed techniques can reduce write operations on NVMs by 95.1% on average, and NVM latency can be completely hidden.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)

自引率

0.00%

发文量