Cache Compression with Efficient in-SRAM Data Comparison

Xiaowei Wang, C. Augustine, E. Nurvitadhi, R. Iyer, Li Zhao, R. Das

2021 IEEE International Conference on Networking, Architecture and Storage (NAS), October 2021. DOI: 10.1109/nas51552.2021.9605440
We present a novel cache compression method that exploits fine-grained data duplication across cache lines. We use the XOR operation of the in-SRAM bit-line computing peripherals to search for compressible data over a wide range of cache locations, reducing data-movement requirements. To reduce decompression latency, we design specialized compression schemes that fetch data with the same parallelism as the original cache, following the architecture of the last-level cache slice. The proposed compression method achieves a 2.05× compression ratio on average (up to 67×) and a 4.73% speedup on average (up to 29%) over the SPEC2006 benchmarks.
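The core idea — XORing a candidate cache line against a reference line so that duplicated words cancel to all-zeros — can be illustrated with a minimal software sketch. This is not the paper's hardware implementation (which performs the XOR in the SRAM bit-line peripherals); the word granularity and function names below are illustrative assumptions.

```python
# Hypothetical software sketch of XOR-based duplicate detection between
# cache lines. In the paper this comparison happens in the SRAM
# peripherals; here it is emulated byte-wise in Python.

WORD_BYTES = 4  # assumed 4-byte word granularity for comparison


def xor_match_words(line: bytes, ref: bytes) -> list[int]:
    """Return indices of words in `line` that XOR to zero against `ref`,
    i.e. duplicated words that a compressor need not store again."""
    assert len(line) == len(ref)
    matches = []
    for i in range(0, len(line), WORD_BYTES):
        a = line[i:i + WORD_BYTES]
        b = ref[i:i + WORD_BYTES]
        # XOR of identical words is all zeros -> word is compressible
        if all(x ^ y == 0 for x, y in zip(a, b)):
            matches.append(i // WORD_BYTES)
    return matches


ref = bytes.fromhex("deadbeef00000000deadbeefcafebabe")
line = bytes.fromhex("deadbeef11111111deadbeefcafebabe")
print(xor_match_words(line, ref))  # -> [0, 2, 3]: words 0, 2, 3 duplicate the reference
```

A compressor using this result would store only word 1 plus a bitmap marking which words were recovered from the reference line, which is the source of the compression ratio the abstract reports.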