2021 IEEE 39th International Conference on Computer Design (ICCD): Latest Publications

Block-LSM: An Ether-aware Block-ordered LSM-tree based Key-Value Storage Engine
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00017
Zehao Chen, Bingzhe Li, Xiaojun Cai, Zhiping Jia, Zhaoyan Shen, Yi Wang, Z. Shao
{"title":"Block-LSM: An Ether-aware Block-ordered LSM-tree based Key-Value Storage Engine","authors":"Zehao Chen, Bingzhe Li, Xiaojun Cai, Zhiping Jia, Zhaoyan Shen, Yi Wang, Z. Shao","doi":"10.1109/ICCD53106.2021.00017","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00017","url":null,"abstract":"Ethereum as one of the largest blockchain systems plays an important role in the distributed ledger, database systems, etc. As more and more blocks are mined, the storage burden of Ethereum is significantly increased. The current Ethereum system uniformly transforms all its data into key-value (KV) items and stores them to the underlying Log-Structure Merged tree (LSM-tree) storage engine ignoring the software semantics. Consequently, it not only exacerbates the write amplification effect of the storage engine but also hurts the performance of Ethereum. In this paper, we proposed a new Ethereum-aware storage model called Block-LSM, which significantly improves the data synchronization of the Ethereum system. Specifically, we first design a shared prefix scheme to transform Ethereum data into ordered KV pairs to alleviate the key range overlaps of different levels in the underlying LSM-tree based storage engine. Moreover, we propose to maintain several semantic-orientated memory buffers to isolate different kinds of Ethereum data. To save space overhead, Block-LSM further aggregates multiple blocks into a group and assigns the same prefix to all KV items from the same block group. Finally, we implement Block-LSM in the real Ethereum environment and conduct a series of experiments. The evaluation results show that Block-LSM significantly reduces up to 3.7× storage write amplification and increases throughput by 3× compared with the original Ethereum design.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114309622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
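To make the shared-prefix idea concrete, here is a minimal Python sketch in the spirit of the scheme described above; the group size, prefix width, and key layout are assumptions for illustration, not the paper's actual encoding.

```python
# Illustrative sketch of a shared-prefix key scheme (assumed parameters, not the
# paper's actual encoding): blocks are aggregated into fixed-size groups, and every
# KV item from the same group shares one monotonically increasing prefix, so keys
# written to the LSM-tree arrive in roughly sorted order and level key ranges
# overlap less.

BLOCKS_PER_GROUP = 64     # assumed group size
PREFIX_BYTES = 4          # assumed prefix width

def make_key(block_number: int, raw_key: bytes) -> bytes:
    """Prepend a group-ordered prefix to the original Ethereum key."""
    group_id = block_number // BLOCKS_PER_GROUP
    prefix = group_id.to_bytes(PREFIX_BYTES, "big")
    return prefix + raw_key

# Keys from blocks in the same group share a prefix and therefore sort together.
k1 = make_key(128, b"account:0xabc")
k2 = make_key(130, b"storage:0xdef")
assert k1[:PREFIX_BYTES] == k2[:PREFIX_BYTES]
```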
Model Synthesis for Communication Traces of System Designs
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00082
Hao Zheng, Md Rubel Ahmed, P. Mukherjee, M. Ketkar, Jin Yang
{"title":"Model Synthesis for Communication Traces of System Designs","authors":"Hao Zheng, Md Rubel Ahmed, P. Mukherjee, M. Ketkar, Jin Yang","doi":"10.1109/ICCD53106.2021.00082","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00082","url":null,"abstract":"Concise and abstract models of system-level behaviors are invaluable in design analysis, testing, and validation. In this paper, we consider the problem of inferring models from communication traces of system-on-chip (SoC) designs. The traces capture communications among different blocks of a system design in terms of messages exchanged. The extracted models characterize the system-level communication protocols governing how blocks exchange messages, and coordinate with each other to realize various system functions. In this paper, the above problem is formulated as a constraint satisfaction problem, which is then fed to a satisfiability modulo theories (SMT) solver. The solutions returned by the SMT solver are used to extract the models that accept the input traces. In the experiments, we demonstrate the proposed approach with traces collected from a transaction-level simulation model of a multicore SoC design and a trace of a more detailed multicore SoC modeled in GEM5.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130297321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
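The abstract does not spell out the paper's actual constraint encoding; the toy Z3 sketch below only illustrates the general flavor of casting trace-to-model inference as constraint satisfaction, where each trace message is assigned to a flow instance under simple ordering constraints. The message names, variables, and constraints are hypothetical.

```python
# Toy illustration (not the paper's encoding) of posing model inference over a
# communication trace as a constraint satisfaction problem for an SMT solver.
# Requires the z3-solver package: pip install z3-solver
from z3 import Int, Solver, sat

trace = ["req", "req", "resp", "resp"]          # a tiny message trace
n = len(trace)

# flow[i] = index of the protocol-flow instance that message i belongs to.
flow = [Int(f"flow_{i}") for i in range(n)]

s = Solver()
for i in range(n):
    s.add(flow[i] >= 0, flow[i] < n)

for i in range(n):
    for j in range(n):
        if i != j and trace[i] == trace[j]:
            s.add(flow[i] != flow[j])           # a flow holds at most one req and one resp
        if trace[i] == "resp" and trace[j] == "req" and i < j:
            s.add(flow[i] != flow[j])           # a response cannot precede its request

if s.check() == sat:
    m = s.model()
    print([m[f] for f in flow])                 # one consistent grouping of the trace
```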
HammerFilter: Robust Protection and Low Hardware Overhead Method for RowHammer
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00043
Kwangrae Kim, Jeonghyun Woo, Junsu Kim, Ki-Seok Chung
{"title":"HammerFilter: Robust Protection and Low Hardware Overhead Method for RowHammer","authors":"Kwangrae Kim, Jeonghyun Woo, Junsu Kim, Ki-Seok Chung","doi":"10.1109/ICCD53106.2021.00043","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00043","url":null,"abstract":"The continuous scaling-down of the dynamic random access memory (DRAM) manufacturing process has made it possible to improve DRAM density. However, it makes small DRAM cells susceptible to electromagnetic interference between nearby cells. Unless DRAM cells are adequately isolated from each other, the frequent switching access of some cells may lead to unintended bit flips in adjacent cells. This phenomenon is commonly referred to as RowHammer. It is often considered a security issue because unusually frequent accesses to a small set of rows generated by malicious attacks can cause bit flips. Such bit flips may also be caused by general applications. Although several solutions have been proposed, most approaches either incur excessive area overhead or exhibit limited prevention capabilities against maliciously crafted attack patterns. Therefore, the goals of this study are (1) to mitigate RowHammer, even when the number of aggressor rows increases and attack patterns become complicated, and (2) to implement the method with a low area overhead.We propose a robust hardware-based protection method for RowHammer attacks with a low hardware cost called HammerFilter, which employs a modified version of the counting bloom filter. It tracks all attacking rows efficiently by leveraging the fact that the counting bloom filter is a space-efficient data structure, and we add an operation, HALF-DELETE, to mitigate the energy overhead. According to our experimental results, the proposed method can completely prevent bit flips when facing artificially crafted attack patterns (five patterns in our experiments), whereas state-of-the-art probabilistic solutions can only mitigate less than 56% of bit flips on average. Furthermore, the proposed method has a much lower area cost compared to existing counter-based solutions (40.6× better than TWiCe and 2.3× better than Graphene).","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133106368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
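As a rough illustration of the data structure the abstract names, the Python sketch below shows a minimal counting Bloom filter with a halving operation standing in for HALF-DELETE; the hash choice, counter count, and activation threshold are assumptions, not HammerFilter's actual hardware parameters.

```python
# Minimal counting-Bloom-filter sketch in the spirit of HammerFilter (assumed
# parameters; not the paper's RTL design). Row activations increment k counters per
# row address; a row whose minimum counter crosses a threshold is treated as a
# potential aggressor. HALF-DELETE is modeled here as periodically halving counters.
import hashlib

class CountingBloomFilter:
    def __init__(self, num_counters=1024, num_hashes=3, threshold=4096):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.threshold = threshold            # assumed activation threshold

    def _indexes(self, row_addr: int):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{row_addr}".encode()).hexdigest()
            yield int(digest, 16) % len(self.counters)

    def record_activation(self, row_addr: int) -> bool:
        """Count one activation; return True if the row looks like an aggressor."""
        for idx in self._indexes(row_addr):
            self.counters[idx] += 1
        return min(self.counters[idx] for idx in self._indexes(row_addr)) >= self.threshold

    def half_delete(self):
        """Halve all counters (software stand-in for the HALF-DELETE operation)."""
        self.counters = [c // 2 for c in self.counters]
```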
Smart-DNN: Efficiently Reducing the Memory Requirements of Running Deep Neural Networks on Resource-constrained Platforms
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00087
Zhenbo Hu, Xiangyu Zou, Wen Xia, Yuhong Zhao, Weizhe Zhang, Donglei Wu
{"title":"Smart-DNN: Efficiently Reducing the Memory Requirements of Running Deep Neural Networks on Resource-constrained Platforms","authors":"Zhenbo Hu, Xiangyu Zou, Wen Xia, Yuhong Zhao, Weizhe Zhang, Donglei Wu","doi":"10.1109/ICCD53106.2021.00087","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00087","url":null,"abstract":"Deep neural networks (DNNs) have gained considerable attention in various real-world applications due to their strong performance in representation learning. However, running a DNN needs tremendous memory resources, which significantly restricts DNN from being applicable on resource-constrained platforms (e.g., IoT, mobile devices, etc.). Lightweight DNNs can accommodate the characteristics of mobile devices, but the hardware resources of mobile or IoT devices are extremely limited, and the resource consumption of lightweight models needs to be further reduced. However, the current neural network compression approaches (i.e., pruning, quantization, knowledge distillation, etc.) works poorly on the lightweight DNNs, which are already simplified. In this paper, we present a novel framework called Smart-DNN, which can efficiently reduce the memory requirements of running DNNs on resource-constrained platforms. Specifically, we slice a neural network into several segments and use SZ error-bounded lossy compression to compress each segment separately while keeping the network structure unchanged. When running a network, we first store the compressed network into memory and then partially decompress the corresponding part layer by layer. According to experimental results on four popular lightweight DNNs (usually used in resource-constrained platforms), Smart-DNN achieves memory saving of 1/10∼1/5, while slightly sacrificing inference accuracy and unchanging the neural network structure with accepted extra runtime overhead.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"22 15","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113976420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
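A minimal sketch of the segment-wise compress/decompress flow described above is given below in Python/NumPy; zlib over uniformly quantized weights stands in for the SZ error-bounded lossy compressor, and the segment boundaries and error bound are assumptions for illustration.

```python
# Sketch of Smart-DNN's segment-wise storage idea. Assumption: zlib over uniformly
# quantized weights stands in for SZ; real SZ enforces the same kind of per-value
# error bound but with a more sophisticated predictor.
import zlib
import numpy as np

ERROR_BOUND = 1e-3   # assumed absolute error bound

def compress_segment(weights: np.ndarray) -> bytes:
    # Uniform quantization guarantees |reconstructed - original| <= ERROR_BOUND.
    q = np.round(weights / (2 * ERROR_BOUND)).astype(np.int32)
    return zlib.compress(q.tobytes())

def decompress_segment(blob: bytes) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int32)
    return q.astype(np.float32) * (2 * ERROR_BOUND)

# Keep the whole model compressed in memory; decompress one segment (e.g., one
# layer's weights) at a time right before it is needed for inference.
layers = [np.random.randn(4096).astype(np.float32) for _ in range(3)]
stored = [compress_segment(w) for w in layers]
for blob, original in zip(stored, layers):
    restored = decompress_segment(blob)
    assert np.max(np.abs(restored - original)) <= ERROR_BOUND + 1e-6
```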
Flexible Instruction Set Architecture for Programmable Look-up Table based Processing-in-Memory
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00022
Mark Connolly, Purab Ranjan Sutradhar, Mark A. Indovina, A. Ganguly
{"title":"Flexible Instruction Set Architecture for Programmable Look-up Table based Processing-in-Memory","authors":"Mark Connolly, Purab Ranjan Sutradhar, Mark A. Indovina, A. Ganguly","doi":"10.1109/ICCD53106.2021.00022","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00022","url":null,"abstract":"Processing in Memory (PIM) is a recent novel computing paradigm that is still in its nascent stage of development. Therefore, there has been an observable lack of standardized and modular Instruction Set Architectures (ISA) for the PIM devices. In this work, we present the design of an ISA which is primarily aimed at a recent programmable Look-up Table (LUT) based PIM architecture. Our ISA performs the three major tasks of i) controlling the flow of data between the memory and the PIM units, ii) reprogramming the LUTs to perform various operations required for a particular application, and iii) executing sequential steps of operation within the PIM device. A microcoded architecture of the Controller/Sequencer unit ensures minimum circuit overhead as well as offers programmability to support any custom operation. We provide a case study of CNN inferences, large matrix multiplications, and bitwise computations on the PIM architecture equipped with our ISA and present performance evaluations based on this setup. We also compare the performances with several other PIM architectures.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124857380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
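To illustrate the microcoded Controller/Sequencer idea from the abstract (not the paper's actual instruction formats), the toy sketch below expands each ISA-level instruction into a fixed sequence of micro-operations; the opcode names and micro-op vocabulary are hypothetical.

```python
# Toy microcoded sequencer sketch (hypothetical opcodes and micro-ops, not the
# paper's ISA). Each ISA instruction is a row in a microcode table; executing it
# means stepping through that row's micro-operations, mirroring the three tasks in
# the abstract: moving data, reprogramming LUTs, and sequencing PIM operations.
MICROCODE = {
    "LOAD_ROW":  ["drive_row_addr", "copy_dram_row_to_pim_buffer"],
    "PROG_LUT":  ["select_lut_bank", "write_lut_contents"],
    "MAC_STEP":  ["lut_lookup_multiply", "accumulate_partial_sum"],
    "STORE_ROW": ["copy_pim_buffer_to_dram_row"],
}

def execute(program, dispatch):
    """Run a list of (opcode, operand) pairs by expanding each into micro-ops."""
    for opcode, operand in program:
        for micro_op in MICROCODE[opcode]:
            dispatch(micro_op, operand)       # hardware would assert control signals here

trace = []
execute([("LOAD_ROW", 0x40), ("MAC_STEP", None)], lambda m, o: trace.append((m, o)))
print(trace)
```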
NIST-Lite: Randomness Testing of RNGs on an Energy-Constrained Platform
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00019
Cheng-Yen Lee, K. Bharathi, Joellen S. Lansford, S. Khatri
{"title":"NIST-Lite: Randomness Testing of RNGs on an Energy-Constrained Platform","authors":"Cheng-Yen Lee, K. Bharathi, Joellen S. Lansford, S. Khatri","doi":"10.1109/ICCD53106.2021.00019","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00019","url":null,"abstract":"Random Number Generators (RNGs) are an essential part of many embedded applications and are used for security, encryption, and built-in test applications. The output of RNGs can be tested for randomness using the well-known NIST statistical test suite. Embedded applications using True Random Number Generators (TRNGs) need to test the randomness of their TRNGs periodically, because their randomness properties can drift over time. Using the full NIST test suite is unpracticed for this purpose, because the full NIST test suite is computationally intensive, and embedded systems (especially real-time systems) often have stringent constraints on the energy and runtime of the programs that are executed on them. In this paper, we propose novel algorithms to select the most effective subset of the NIST test suite, which works within specified runtime and energy budgets. To achieve this, we rank the NIST tests based on multiple metrics, including p-value/Time, p-value/Energy, p-value/Time2 and p-value/Energy2. Based on the total runtime or energy constraint specified by the user, our algorithms proceed to choose a subset of the NIST tests using this rank order. We call this subset of NIST tests as NIST-Lite. Our algorithms also take into account the runtime and energy required to generate the random sequences required (on the same platform) by the NIST-Lite tests. We evaluate the effectiveness of our method against the full NIST test suite (referred to as NIST-Full) and also against a greedily chosen subset of the NIST test suite (referred to as NIST-Greedy). We explore different variants of NIST-Lite. On average, using the same input sequences, the p-value obtained for the 4 best variants of NIST-Lite is 2× and 7× better than the p-value of NIST-Full and NIST-Greedy respectively. NIST-Lite also achieves 158× (204×) runtime (energy) reduction compared to the NIST-Full. Further, we study the performance of NIST-Lite and NIST-Full for deterministic (non-random) input sequences. For such sequences, the pass rate of the NIST-Lite tests is within 16% of the pass rate of NIST-Full on the same sequences, indicating that our NIST-Lite tests have a similar diagnostic ability as NIST-Full.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
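The selection procedure described above can be sketched as a simple ranked, budget-constrained pick. In the Python sketch below, the metric values, test names, and budget are hypothetical, and the real NIST-Lite additionally accounts for the cost of generating the random sequences each selected test consumes.

```python
# Sketch of budgeted test selection in the spirit of NIST-Lite (hypothetical costs
# and scores). Tests are ranked by a value-per-cost metric such as p-value/Time,
# then picked greedily in rank order while total cost stays within the budget.
def select_tests(tests, budget, cost_key="runtime_s"):
    ranked = sorted(tests, key=lambda t: t["p_value"] / t[cost_key], reverse=True)
    chosen, spent = [], 0.0
    for t in ranked:
        if spent + t[cost_key] <= budget:
            chosen.append(t["name"])
            spent += t[cost_key]
    return chosen, spent

# Hypothetical per-test profiles (not measured numbers from the paper).
tests = [
    {"name": "frequency",         "p_value": 0.45, "runtime_s": 0.2},
    {"name": "runs",              "p_value": 0.40, "runtime_s": 0.3},
    {"name": "linear_complexity", "p_value": 0.50, "runtime_s": 5.0},
]
print(select_tests(tests, budget=1.0))   # picks the cheap, high-value tests first
```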
A Comprehensive Exploration of the Parallel Prefix Adder Tree Space
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00030
Teodor-Dumitru Ene, J. Stine
{"title":"A Comprehensive Exploration of the Parallel Prefix Adder Tree Space","authors":"Teodor-Dumitru Ene, J. Stine","doi":"10.1109/ICCD53106.2021.00030","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00030","url":null,"abstract":"Parallel prefix tree adders allow for high- performance computation due to their logarithmic delay. Modern literature focuses on a well-known group of adder tree networks, with adder taxonomies unable to adequately describe intermediary structures. Efforts to explore novel structures focus mainly on the hybridization of these widely-studied networks. This paper presents a method of generating any valid adder tree network by using a set of three, simple, point-targeted transforms. This method allows for possibilities such as the generation and classification of any hybrid or novel architecture, or the incremental refinement of pre-existing structures to better meet performance targets. Synthesis implementation results are presented on the SkyWater 90nm technology.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115846947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
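For background on what these adder trees compute, here is a small Python reference for the generate/propagate prefix operator that every such network evaluates; the Kogge-Stone-style schedule used in the loop is just one well-known point in the design space the paper explores, not the paper's generation method.

```python
# Reference model of the carry-prefix computation underlying parallel prefix adders.
# Each bit position i has (generate, propagate) = (a_i & b_i, a_i ^ b_i); the adder
# tree combines these pairs with the associative prefix operator below. The loop
# uses a Kogge-Stone-style schedule purely as one familiar example of a valid tree.
def prefix_op(less_sig, more_sig):
    g_l, p_l = less_sig
    g_r, p_r = more_sig
    return (g_r | (p_r & g_l), p_r & p_l)

def add(a: int, b: int, width: int = 8) -> int:
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(width)]
    pre = list(gp)
    step = 1
    while step < width:                       # log2(width) combining levels
        nxt = list(pre)
        for i in range(step, width):
            nxt[i] = prefix_op(pre[i - step], pre[i])
        pre, step = nxt, step * 2
    carries = [0] + [g for g, _ in pre[:-1]]  # carry into bit i (carry-in assumed 0)
    s = 0
    for i in range(width):
        s |= ((gp[i][1] ^ carries[i]) & 1) << i
    return s

assert all(add(x, y) == (x + y) & 0xFF for x in range(0, 256, 17) for y in range(0, 256, 13))
```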
Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00092
Zhongming Yu, Guohao Dai, Guyue Huang, Yu Wang, Huazhong Yang
{"title":"Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs","authors":"Zhongming Yu, Guohao Dai, Guyue Huang, Yu Wang, Huazhong Yang","doi":"10.1109/ICCD53106.2021.00092","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00092","url":null,"abstract":"Sampled Dense-Dense Matrix Multiplication (SDDMM) is a core component of many machine learning systems. SDDMM exposes a substantial amount of parallelism that favors throughput-oriented architectures like the GPU. However, accelerating it on GPUs is challenging in two aspects: the poor memory access locality caused by the sparse sampling matrix with the poor parallelism caused by the dot-product reduction of vectors in two dense matrices. To address both challenges, we present PRedS to boost SDDMM efficiency with a suite of Parallel Reduction Scheduling optimizations. PRedS uses Vectorized Coarsen 1-Dimensional Tiling (VCT) to benefit the online locality of loading the dense matrix. PRedS uses Integrated Interleaving Reduction (IIR) to increase thread occupancy in the parallel reduction. PRedS also leverages Warp-Merged Tiling (WMT) to preserve occupancy and parallelism when reducing very long arrays. Enhanced with GPU-intrinsic vectorized memory loading, PRedS achieves a geometric speedup of 29.20× compared to the vendor library. PRedS achieves up to 8.31× speedup over state-of-the-art implementations on the SuiteSparse benchmark.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116570257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
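As a reference for what SDDMM computes, independent of the GPU scheduling techniques above, here is a small NumPy/SciPy sketch: only the positions sampled by the sparse matrix S are computed, each as a dot-product reduction of a row of A with a column of B. It contains none of the paper's tiling or warp-level optimizations.

```python
# Reference SDDMM: C[i, j] = S[i, j] * dot(A[i, :], B[:, j]) only where S is nonzero.
import numpy as np
import scipy.sparse as sp

def sddmm(S: sp.coo_matrix, A: np.ndarray, B: np.ndarray) -> sp.coo_matrix:
    vals = np.empty_like(S.data)
    for k, (i, j) in enumerate(zip(S.row, S.col)):
        vals[k] = S.data[k] * A[i, :].dot(B[:, j])   # one dot-product reduction per sample
    return sp.coo_matrix((vals, (S.row, S.col)), shape=S.shape)

# Small usage example with random data.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
B = rng.standard_normal((8, 5))
S = sp.random(4, 5, density=0.3, random_state=0, format="coo")
C = sddmm(S, A, B)
assert np.allclose(C.toarray(), S.toarray() * (A @ B))   # dense check of the definition
```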
EFM: Elastic Flash Management to Enhance Performance of Hybrid Flash Memory
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00035
Bingzhe Li, Bo Yuan, D. Du
{"title":"EFM: Elastic Flash Management to Enhance Performance of Hybrid Flash Memory","authors":"Bingzhe Li, Bo Yuan, D. Du","doi":"10.1109/ICCD53106.2021.00035","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00035","url":null,"abstract":"NAND-based flash memory has become a prevalent storage media due to its low access latency and high performance. By setting up different incremental step pulse programming (ISPP) values and threshold voltages, the tradeoffs between lifetime and access latency in NAND-based flash memory can be exploited. The existing studies that exploit the tradeoffs by using heuristic algorithms do not consider the dynamically changed access latency due to wearing-out, resulting in low access performance. In this paper, we proposed a new Elastic Flash Management scheme, called EFM, to manage data in hybrid flash memory, which consists of multiple physical regions with different read/write latencies according to their ISPP values and threshold voltages. EFM includes a Long-Term Classifier (LT-Classifier) and a Short-Term Classifier (ST-Classifier) to accurately track dynamically changed workloads by considering current quantitative differences of read/write latencies and workload access patterns. Moreover, a reduced effective wearing management is proposed to prolong the lifetime of flash memory by scheduling write-intensive workloads to the region with a reduced threshold voltage and the lowest write cost. Experimental results indicate that EFM reduces the average read/write latencies by about 54% - 296% and obtain 17.7% lifetime improvement on average compared to the existing studies.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128703823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
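As a loose illustration of the placement decision described above (not EFM's actual LT-Classifier/ST-Classifier logic), the sketch below routes write-hot data to the region configured with the lowest write cost and everything else to the region with the lowest read cost; the region table and threshold are hypothetical values for illustration.

```python
# Hypothetical sketch of latency-aware data placement across hybrid flash regions.
# EFM's real design uses long-term and short-term classifiers and tracks how each
# region's read/write latencies drift with wear; here a single write-frequency
# threshold stands in for that machinery.
REGIONS = {
    # name: (read_cost_us, write_cost_us); assumed values, not measured numbers
    "fast_write": (60, 300),
    "dense":      (50, 900),
}
WRITE_HOT_THRESHOLD = 8   # writes observed in the current window (assumed)

def choose_region(write_count_in_window: int) -> str:
    """Place write-intensive data in the region with the lowest write cost."""
    if write_count_in_window >= WRITE_HOT_THRESHOLD:
        return min(REGIONS, key=lambda r: REGIONS[r][1])   # lowest write latency
    return min(REGIONS, key=lambda r: REGIONS[r][0])       # otherwise favor read latency

assert choose_region(12) == "fast_write"
assert choose_region(2) == "dense"
```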
CROP: FPGA Implementation of High-Performance Polynomial Multiplication in Saber KEM based on Novel Cyclic-Row Oriented Processing Strategy
2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00031
Jiafeng Xie, Pengzhou He, Chiou-Yng Lee
{"title":"CROP: FPGA Implementation of High-Performance Polynomial Multiplication in Saber KEM based on Novel Cyclic-Row Oriented Processing Strategy","authors":"Jiafeng Xie, Pengzhou He, Chiou-Yng Lee","doi":"10.1109/ICCD53106.2021.00031","DOIUrl":"https://doi.org/10.1109/ICCD53106.2021.00031","url":null,"abstract":"The rapid advancement in quantum technology has initiated a new round of post-quantum cryptography (PQC) related exploration. The key encapsulation mechanism (KEM) Saber is an important module lattice-based PQC, which has been selected as one of the PQC finalists in the ongoing National Institute of Standards and Technology (NIST) standardization process. On the other hand, however, efficient hardware implementation of KEM Saber has not been well covered in the literature. In this paper, therefore, we propose a novel cyclic-row oriented processing (CROP) strategy for efficient implementation of the key arithmetic operation of KEM Saber, i.e., the polynomial multiplication. The proposed work consists of three layers of interdependent efforts: (i) first of all, we have formulated the main operation of KEM Saber into desired mathematical forms to be further developed into CROP based algorithms, i.e., the basic version and the advanced higher-speed version; (ii) then, we have followed the proposed CROP strategy to innovatively transfer the derived two algorithms into desired polynomial multiplication structures with the help of a series of algorithm-architecture co-implementation techniques; (iii) finally, detailed complexity analysis and implementation results have shown that the proposed polynomial multiplication structures have better area-time complexities than the state-of-the-art solutions. Specifically, the field-programmable gate array (FPGA) implementation results show that the proposed design, e.g., the basic version has at least less 11.2% area-delay product (ADP) than the best competing one (Cyclone V device). The proposed high-performance polynomial multipliers offer not only efficient operation for output results delivery but also possess low-complexity feature brought by CROP strategy. The outcome of this work is expected to provide useful references for further development and standardization process of KEM Saber.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125940752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
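For context on the arithmetic being accelerated: Saber multiplies polynomials in the ring Z_q[x]/(x^n + 1) with n = 256 and a power-of-two modulus q = 2^13. The schoolbook Python reference below defines that negacyclic product; it is only a functional model and says nothing about the paper's CROP dataflow, which restructures how rows of the operand are processed in hardware.

```python
# Schoolbook reference for multiplication in Z_q[x]/(x^n + 1), the core operation of
# Saber (n = 256, q = 2^13 in the scheme's parameter sets). Functional model only;
# the paper's CROP strategy concerns how such a product is scheduled in hardware.
def poly_mul_negacyclic(a, b, n=256, q=1 << 13):
    assert len(a) == len(b) == n
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q   # wrap-around picks up a minus sign
    return c

# Tiny sanity check with n = 4: x * x^3 = x^4 = -1 in Z_q[x]/(x^4 + 1).
assert poly_mul_negacyclic([0, 1, 0, 0], [0, 0, 0, 1], n=4) == [(1 << 13) - 1, 0, 0, 0]
```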