Mao Ni, Lan Chen, Xiaoran Hao, Hao Sun, Chenji Liu, Zhen Zhang, Lesong Wu, Lei Pan
2020 5th International Conference on Universal Village (UV), published 2020-10-24. DOI: 10.1109/UV50937.2020.9426214 (https://doi.org/10.1109/UV50937.2020.9426214)
A Novel Prefetching Scheme for Non-Volatile Cache in the AIoT Processor
Artificial intelligence Internet of Things (AIoT) systems need more processing capability and lower energy consumption on terminal devices. Increasing cache capacity at the architecture level effectively reduces the cache miss rate and the number of off-chip memory accesses, which improves processor performance. However, because SRAM has limited storage density and high leakage current, enlarging an SRAM cache leads to a large chip area and high power consumption. Emerging non-volatile memory (NVM), such as spin-transfer torque RAM (STT-RAM), offers characteristics such as bit-width read/write and short read/write latency, which make it an attractive option for replacing or augmenting SRAM. However, the asymmetric read/write latency of NVM brings new challenges in cache design. In this paper, we introduce a method for bringing STT-RAM into the cache system of AIoT processors. Experimental results show that replacing an SRAM L2 cache with a same-sized STT-RAM L2 cache improves out-of-order processor performance by up to 10 percent. We first find that, for the STT-RAM cache, the better-performing stream-based data prefetch configurations improve processor performance but also markedly increase the amount of prefetched data, which in turn hurts performance because STT-RAM's long write latency causes cache congestion. This paper presents ANCP (Adaptive Non-volatile Cache Prefetch), a novel stream-based prefetch method that reduces the number of prefetch writes to STT-RAM. Compared with the best-performing stream-based data prefetch configuration, ANCP reduces the number of prefetches issued by 11 percent on average and yields an additional performance improvement of up to 8 percent for the AIoT processor.
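The abstract does not describe how ANCP decides which prefetches to suppress, so the following is not the authors' method. It is only a minimal sketch of the general idea of throttling a stream prefetcher, assuming a generic accuracy-based rule: when few prefetched lines are later used, the prefetch degree is halved, which cuts the number of prefetch fills (writes) into the cache. The class `StreamPrefetcher`, its fields, the window size, and the thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class StreamPrefetcher:
    """Toy ascending-stream prefetcher with optional degree throttling.

    Hypothetical illustration only; not the ANCP algorithm from the paper.
    """
    adaptive: bool = True
    degree: int = 4        # lines fetched ahead per trigger
    min_degree: int = 1
    max_degree: int = 4
    window: int = 16       # prefetches issued between accuracy evaluations
    low: float = 0.5       # accuracy below this halves the degree
    high: float = 0.75     # accuracy above this doubles the degree
    issued: int = 0        # total prefetch fills, i.e. writes into the cache
    useful: int = 0        # prefetched lines later hit by a demand access
    last_line: Optional[int] = None
    pending: Set[int] = field(default_factory=set)  # prefetched, not yet used
    _win_issued: int = 0
    _win_useful: int = 0

    def access(self, line: int) -> List[int]:
        """Process one demand access; return the lines newly prefetched."""
        if line in self.pending:
            self.pending.discard(line)
            self.useful += 1
            self._win_useful += 1

        new_lines: List[int] = []
        # Stream detection: this access directly follows the previous one.
        if self.last_line is not None and line == self.last_line + 1:
            for nxt in range(line + 1, line + 1 + self.degree):
                if nxt not in self.pending:
                    self.pending.add(nxt)
                    new_lines.append(nxt)
        self.last_line = line

        self.issued += len(new_lines)
        self._win_issued += len(new_lines)

        # Throttle: re-evaluate the degree once per window of issued prefetches.
        if self.adaptive and self._win_issued >= self.window:
            accuracy = self._win_useful / self._win_issued
            if accuracy < self.low:
                self.degree = max(self.min_degree, self.degree // 2)
            elif accuracy > self.high:
                self.degree = min(self.max_degree, self.degree * 2)
            self._win_issued = self._win_useful = 0
        return new_lines


def run(trace, adaptive):
    """Replay a trace of cache-line addresses through a fresh prefetcher."""
    pf = StreamPrefetcher(adaptive=adaptive)
    for line in trace:
        pf.access(line)
    return pf


# Synthetic trace: 100 short streams of 3 consecutive lines, far apart.
short_runs = [base + i for base in range(0, 100_000, 1000) for i in range(3)]
fixed = run(short_runs, adaptive=False)
throttled = run(short_runs, adaptive=True)
print("fixed:    ", fixed.issued, "issued,", fixed.useful, "useful")
print("throttled:", throttled.issued, "issued,", throttled.useful, "useful")
```

On this synthetic trace of short streams, the throttled prefetcher issues fewer prefetches than the fixed-degree one while covering the same number of demand accesses. The model counts prefetch fills only. It does not simulate write latency, cache capacity, or congestion, so it says nothing about the performance figures reported in the abstract.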