A Novel Prefetching Scheme for Non-Volatile Cache in the AIoT Processor

2020 5th International Conference on Universal Village (UV). Pub Date: 2020-10-24. DOI: 10.1109/UV50937.2020.9426214

Mao Ni, Lan Chen, Xiaoran Hao, Hao Sun, Chenji Liu, Zhen Zhang, Lesong Wu, Lei Pan

Citations: 3

A Novel Prefetching Scheme for Non-Volatile Cache in the AIoT Processor

Artificial intelligence Internet of Things (AIoT) systems demand more processing capability and lower energy consumption on terminal devices. Increasing cache capacity at the architecture level effectively reduces the cache miss rate and the number of off-chip memory accesses, which improves processor performance. Because SRAM has limited storage density and high leakage current, enlarging an SRAM cache increases chip area and power consumption. Emerging non-volatile memory (NVM), such as spin-transfer torque RAM (STT-RAM), offers characteristics such as bit-width read/write and short read/write latency that make it an attractive option for replacing or augmenting SRAM. However, the asymmetric read/write latency of NVM brings new challenges for cache design. In this paper, we introduce a method for bringing STT-RAM into the cache system of AIoT processors. Experimental results show that replacing the SRAM L2 cache with an STT-RAM L2 cache of the same size improves out-of-order processor performance by up to 10 percent. We first find that, for the STT-RAM cache, a better-performing stream-based data prefetch configuration improves processor performance but also noticeably increases the amount of prefetched data, which in turn hurts performance because STT-RAM's long write latency causes cache congestion. This paper presents ANCP (Adaptive Non-volatile Cache Prefetch), a novel stream-based prefetch method that reduces the number of prefetch writes to STT-RAM. Compared with the best-performing stream-based data prefetch configuration, ANCP reduces the number of prefetches issued by 11 percent on average and yields an additional performance improvement of up to 8 percent for the AIoT processor.
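The abstract does not describe how ANCP works internally, so the following Python sketch is only a toy illustration of the trade-off it names: a stream prefetcher with a larger prefetch degree issues more prefetch fills (each one a write into the cache), and throttling the degree can cut those writes. Everything here is an assumption made for illustration, not the authors' method: the `StreamPrefetcher` class, the `degree`, `window`, and `low_accuracy` knobs, the accuracy-based throttle, the unbounded cache, and the synthetic trace of short sequential streams. Write latency itself is not modeled; the sketch only counts prefetch fills.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class StreamPrefetcher:
    """Toy next-line stream prefetcher (hypothetical, not ANCP itself)."""
    degree: int = 4             # blocks prefetched ahead per trigger (assumed knob)
    adaptive: bool = False      # enable the hypothetical accuracy throttle
    window: int = 32            # prefetch fills per evaluation window (assumed, >= 1)
    low_accuracy: float = 0.5   # halve the degree below this accuracy (assumed)
    cache: Set[int] = field(default_factory=set)    # resident blocks (unbounded)
    pending: Set[int] = field(default_factory=set)  # prefetched, not yet used
    last_block: Optional[int] = None
    misses: int = 0       # demand misses
    issued: int = 0       # prefetch fills, i.e. writes into the cache
    useful: int = 0       # prefetched blocks later used by a demand access
    win_issued: int = 0
    win_useful: int = 0

    def access(self, block: int) -> None:
        if block in self.pending:
            # First demand use of a prefetched block: the prefetch was useful.
            self.pending.discard(block)
            self.useful += 1
            self.win_useful += 1
        elif block not in self.cache:
            # Demand miss: the block is fetched from off-chip memory.
            self.misses += 1
            self.cache.add(block)
        # A stream is detected when two consecutive accesses are sequential.
        if self.last_block is not None and block == self.last_block + 1:
            self._prefetch(block)
        self.last_block = block

    def _prefetch(self, block: int) -> None:
        for nxt in range(block + 1, block + 1 + self.degree):
            if nxt not in self.cache:
                self.cache.add(nxt)      # each fill is one write into the cache
                self.pending.add(nxt)
                self.issued += 1
                self.win_issued += 1
        # Hypothetical throttle: once a window of fills has been issued,
        # halve the degree if too few of them were used.
        if self.adaptive and self.win_issued >= self.window:
            if self.win_useful / self.win_issued < self.low_accuracy:
                self.degree = max(1, self.degree // 2)
            self.win_issued = 0
            self.win_useful = 0


def run(adaptive: bool) -> StreamPrefetcher:
    """Replay a synthetic trace of 200 short streams, 4 sequential blocks each."""
    pf = StreamPrefetcher(degree=4, adaptive=adaptive)
    for base in range(0, 200 * 100, 100):
        for offset in range(4):
            pf.access(base + offset)
    return pf


if __name__ == "__main__":
    for mode in (False, True):
        pf = run(adaptive=mode)
        print(f"adaptive={mode}: misses={pf.misses} "
              f"prefetch_fills={pf.issued} useful={pf.useful}")
```

On this synthetic trace the throttled run issues fewer prefetch fills than the fixed-degree run while incurring the same number of demand misses, because the streams are too short to benefit from a deep prefetch. Real workloads and a real STT-RAM timing model would behave differently, and the paper's reported figures come from its own experiments, not from a model like this one.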

Source journal

2020 5th International Conference on Universal Village (UV)

Self-citation rate

0.00%

Number of published papers