{"title":"Message from the Program Committee Chairs","authors":"","doi":"10.1109/coolchips52128.2021.9410315","DOIUrl":"https://doi.org/10.1109/coolchips52128.2021.9410315","url":null,"abstract":"On behalf of the SCAM 2003 conference and program committees, we would like to welcome you to this year’s workshop. This is the third Source Code Analysis and Manipulation workshop. While it required a great deal of effort by a large number of people to put together this year’s workshop, this work only serves to underscore the greater effort put forth by Mark Harman in making the first and hence later SCAM workshops a reality. Thank you, Mark. All the committee members have worked hard to ensure that SCAM is a useful and enjoyable occasion. However, there are two members who have worked tirelessly to ensure that this occasion is also affordable! Leon Moonen who has managed to obtain external funding from The Netherlands Organisation for Scientific Research (http://www.nwo.nl) and The Royal Netherlands Academy of Arts and Sciences (http://www.knaw.nl), and Dave Binkley for doing the financing and much, much more. Thanks, Lads! The aim of the SCAM workshop is to bring together researchers and practitioners working on theory, techniques and applications which concern analysis and/or manipulation of the source code of computer systems. It is the source code that contains the only precise description of the behavior of the system. Many conferences and workshops address the applications of source code analysis and manipulation. The aim of SCAM is to focus on the algorithms and tools themselves; what they can achieve; and how they can be improved, refined, and combined. This year we received 43 regular paper submissions for the workshop and were able to select from these 21 excellent papers which cover the broad range of activity in Source Code Analysis and Manipulation. All papers were fully reviewed by three referees for relevance, soundness, and originality. 
Each paper was assigned a rating ranging from A (excellent) to D (poor). Those receiving at least two accepts (A or B rating) appear herein and were included as part of the program. For the accepted papers, 42% received an A rating, 50% a B rating, only 5% a C rating, and a residual 3% received a D rating. Overall this indicates a strong technical program. We would also like to thank our keynote speaker, Chris Verhoef, for his contribution. We would like to take this opportunity to thank the SCAM Program Committee for their hard work and expertise in reviewing the papers. In addition to thanking the authors, reviewers, and Steering Committee for their work in bringing about the third SCAM workshop, we would also like to thank Hans van Vliet, and all those responsible for putting together ICSM 2003, Leon Moonen for his work on local arrangements, Mark Harman and Jianjun Zhao for publicizing the workshop, and Silvio Stefanucci for helping to manage the review process. Thanks are also due to Stacy A. Wagner and Maggie Johnson from the IEEE, and Stephanie Kawada and Thomas Baldwin from the IEEE publications. And last, but not least to Claire Knight for designing the SCAM logo. We hope that you find t","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124883130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LSFQ: A Low Precision Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration","authors":"Zhenshan Bao, Kang Zhan, Wenbo Zhang, Junnan Guo","doi":"10.1109/COOLCHIPS52128.2021.9410327","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410327","url":null,"abstract":"Neural network quantization has become an important research area. Deep networks that run with low-precision operations at inference time offer power and space advantages over high-precision alternatives and can still maintain high accuracy. However, few quantization methods can demonstrate this advantage on a hardware platform, because the design of quantization algorithms rarely takes the actual hardware implementation into account. In this paper, we propose an efficient quantization method for hardware implementation, learnable-parameter soft-clipping fully integer quantization (LSFQ), which includes both weight quantization and activation quantization with learnable clipping parameters. The quantization parameters are optimized automatically by backpropagation to minimize the loss; the BatchNorm layers are then fused into the convolutional layers, and the bias and quantization step size are further quantized. In this way, LSFQ achieves integer-only arithmetic. We evaluate the quantization algorithm on a variety of models, including VGG7 and MobileNet v2, on CIFAR-10 and CIFAR-100. The results show that at 3-bit or 4-bit quantization, the accuracy loss of our method is less than 1% compared with the full-precision network. 
In addition, we design an accelerator for the quantization algorithm and deploy it to the FPGA platform to verify the hardware-awareness of our method.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127062792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
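A minimal Python sketch of the uniform, learnable-step integer quantization that LSFQ builds on; the step size, bit width, and example weights are illustrative assumptions, and the paper's soft-clipping function and backpropagated step learning are omitted:

```python
def quantize(weights, step, n_bits=4):
    """Uniform symmetric quantization with a given step size.

    Clips each weight to the representable integer range, rounds to the
    nearest level, and returns both the integer codes and the
    dequantized values. Illustrative only: the paper's LSFQ uses a
    *soft* clipping function and learns `step` by backpropagation,
    which this sketch omits.
    """
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 7 for signed 4-bit
    codes = [max(-qmax, min(qmax, round(w / step))) for w in weights]
    return codes, [c * step for c in codes]

codes, dequant = quantize([0.31, -0.82, 0.05, 1.40], step=0.2)
# codes -> [2, -4, 0, 7]  (1.40 clips to the 4-bit maximum level)
```

The accuracy/hardware trade-off comes from how tightly `step` and the clipping range track the weight distribution, which is why the paper learns them during training.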
{"title":"Hybrid Network of Packet Switching and STDM in a Multi-FPGA System","authors":"Tomoki Shimizu, Kohe Ito, Kensuke Iizuka, Kazuei Hironaka, H. Amano","doi":"10.1109/COOLCHIPS52128.2021.9410322","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410322","url":null,"abstract":"A multi-FPGA system, the Flow-in-Cloud (FiC) system, is currently being developed as a server for Multi-access Edge Computing (MEC), one of the core technologies of 5G. The FiC system is composed of mid-range FPGAs directly connected by high-speed serial links and works virtually as a single FPGA with huge resources. Since MEC applications are sometimes timing-critical, a Static Time Division Multiplexing (STDM) network has been built on the FiC system. However, the STDM network suffers from extended latency and low utilization of network resources, especially when the network traffic is light. Here, we propose a hybrid router that allows packet switching to use empty slots of the STDM network. Evaluation results from a real system show that packet switching is 2.42 times faster than STDM for an FFT on 8 boards. We also propose and evaluate a dynamic allocation method that changes the switching mode according to the network load.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129169658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
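The hybrid slot allocation described above, where packets borrow empty STDM slots, can be sketched as follows; this is an illustrative model, not the FiC router's actual arbitration logic:

```python
def schedule(slots, stdm_owner, packet_queue):
    """Fill one TDM frame: reserved slots carry their STDM circuit's
    flit; otherwise-empty slots are handed to waiting packets (the
    hybrid idea). `stdm_owner[i]` maps a slot index to its circuit, and
    unreserved slots are simply absent from the map. Sketch only.
    """
    out = []
    queue = list(packet_queue)
    for i in range(slots):
        owner = stdm_owner.get(i)
        if owner is not None:
            out.append(("stdm", owner))    # circuit traffic keeps its slot
        elif queue:
            out.append(("pkt", queue.pop(0)))  # packet reuses an empty slot
        else:
            out.append(("idle", None))
    return out

frame = schedule(4, {0: "A", 2: "B"}, ["p1", "p2"])
```

Under light circuit load most slots are unreserved, so packets see far less waiting than pure STDM, matching the motivation above.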
{"title":"Nonvolatile SRAM Using Fishbone-in-Cage Capacitor in a 180 nm Standard CMOS Process for Zero-Standby and Instant-Powerup Embedded Memory on IoT","authors":"Takaki Urabe, H. Ochi, Kazutoshi Kobayashi","doi":"10.1109/COOLCHIPS52128.2021.9410314","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410314","url":null,"abstract":"In this paper, we propose a nonvolatile SRAM (NVSRAM) using the Fishbone-in-Cage Capacitor (FiCC) fabricated in a 0.18μm CMOS process technology. The FiCC is implemented with ordinary metal wires, similarly to a metal-insulator-metal (MIM) capacitor, and can be fabricated in a standard CMOS process technology. Three transistors and an FiCC are added to a conventional 6-transistor SRAM cell for nonvolatile operation, with a 42% area overhead. Assuming 5 minutes of active time per hour, the proposed NVSRAM reduces power consumption by 61.8% compared with a standard SRAM. The fabricated NVSRAM operates correctly as an SRAM at 100 MHz and performs nonvolatile store and restore operations using the FiCC.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131930770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
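A back-of-the-envelope duty-cycle model illustrates why eliminating standby power pays off at the paper's assumed 5-minutes-per-hour activity; the power values here are placeholders, not the paper's measurements:

```python
def avg_power(p_active, p_standby, active_minutes_per_hour):
    """Duty-cycled average power of a memory that is active part-time.

    Toy model of the zero-standby argument: a conventional SRAM leaks
    while idle, whereas an NVSRAM can store its state in capacitors and
    power off completely. All power numbers are illustrative.
    """
    f = active_minutes_per_hour / 60.0
    return p_active * f + p_standby * (1.0 - f)

# 5 minutes active per hour, normalized active power of 1.0
sram   = avg_power(p_active=1.0, p_standby=0.30, active_minutes_per_hour=5)
nvsram = avg_power(p_active=1.0, p_standby=0.00, active_minutes_per_hour=5)
```

At such low duty cycles the standby term dominates, which is the regime the paper targets for IoT workloads.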
{"title":"Power/Performance/Area Evaluations for Next-Generation HPC Processors using the A64FX Chip","authors":"Eishi Arima, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, M. Sato","doi":"10.1109/COOLCHIPS52128.2021.9410320","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410320","url":null,"abstract":"Future HPC systems, including post-exascale supercomputers, will face severe problems such as the slowing of Moore's law and limits on power supply. To achieve the desired system performance improvement while counteracting these issues, hardware design optimization is a key factor. In this paper, we investigate future directions for SIMD-based processor architectures using the A64FX chip and customized versions of the power/performance/area simulators Gem5 and McPAT. More specifically, based on the A64FX chip, we first customize various energy parameters in the simulators and then evaluate the power and area reductions obtained by scaling the technology node down to 3 nm. Moreover, we investigate the achievable FLOPS improvement at 3 nm when scaling the number of cores, the SIMD width, and the FP pipeline width under power/area constraints. The evaluation results indicate that further SIMD/pipeline width scaling will not help improve FLOPS due to memory system bottlenecks, especially in the L1 data caches and FP register files. 
Based on these observations, we discuss future directions for SIMD-based HPC processors.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131673275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
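The scaling knobs the paper sweeps (core count, SIMD width, FP pipeline width) combine multiplicatively into peak FLOPS, as in this sketch; the configuration numbers are the publicly stated A64FX parameters, used here as an assumption:

```python
def peak_flops(cores, simd_lanes, fp_pipes, ghz, fma=True):
    """Peak FP64 FLOP/s = cores x SIMD lanes x FP pipelines x clock,
    doubled when fused multiply-add counts as two FLOPs. A
    back-of-the-envelope model of the design knobs, not the paper's
    simulator.
    """
    return cores * simd_lanes * fp_pipes * ghz * 1e9 * (2 if fma else 1)

# A64FX: 48 compute cores, 512-bit SVE = 8 FP64 lanes, 2 FMA pipes, 2.2 GHz
a64fx = peak_flops(48, 8, 2, 2.2)   # ~3.38e12, i.e. about 3.4 TFLOP/s FP64
```

Each factor in the product is a candidate for scaling, but, as the abstract notes, widening SIMD or the pipelines only helps while the memory system can feed them.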
{"title":"A Metadata Prefetching Mechanism for Hybrid Memory Architectures","authors":"S. Tsukada, Hikaru Takayashiki, Masayuki Sato, K. Komatsu, Hiroaki Kobayashi","doi":"10.1109/COOLCHIPS52128.2021.9410321","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410321","url":null,"abstract":"A hybrid memory, a main memory consisting of two distinct memory devices, is expected to achieve a good balance between high performance and large capacity. However, unlike a traditional memory, a hybrid memory needs metadata for data management and incurs additional access latency to reference it. To hide this latency, this paper proposes a metadata prefetching mechanism that uses address differences to control prefetching. The evaluation results show that the proposed mechanism increases the metadata hit rate in two-thirds of the examined benchmarks and improves IPC by up to 34%, and by 6% on average.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116333587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
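The idea of steering prefetches by address differences can be illustrated with a toy delta prefetcher; this sketches the general technique, not the paper's hardware mechanism:

```python
from collections import deque

class DeltaPrefetcher:
    """Toy delta (address-difference) prefetcher.

    On each metadata access it records the difference from the previous
    address; when the recent deltas agree, it predicts the next address
    so the metadata can be fetched ahead of the demand access.
    Illustrative only; history depth is an assumption.
    """
    def __init__(self, history=2):
        self.prev = None
        self.deltas = deque(maxlen=history)

    def access(self, addr):
        prediction = None
        if self.prev is not None:
            self.deltas.append(addr - self.prev)
            # A stable stride => confident prefetch of the next address
            if len(self.deltas) == self.deltas.maxlen and len(set(self.deltas)) == 1:
                prediction = addr + self.deltas[-1]
        self.prev = addr
        return prediction

p = DeltaPrefetcher()
p.access(0x100)
p.access(0x140)
predicted = p.access(0x180)   # stride 0x40 confirmed -> 0x1C0
```

Hiding the metadata reference behind such predictions is what converts the extra hybrid-memory lookup latency into the IPC gains quoted above.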
{"title":"An Energy-Efficient Deep Neural Network Training Processor with Bit-Slice-Level Reconfigurability and Sparsity Exploitation","authors":"Donghyeon Han, Dongseok Im, Gwangtae Park, Youngwoo Kim, Seokchan Song, Juhyoung Lee, H. Yoo","doi":"10.1109/COOLCHIPS52128.2021.9410324","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410324","url":null,"abstract":"This paper presents an energy-efficient deep neural network (DNN) training processor with four key features: 1) Layer-wise Adaptive bit-Precision Scaling (LAPS), 2) an In-Out Slice Skipping (IOSS) core, 3) a double-buffered Reconfigurable Accumulation Network (RAN), and 4) a momentum-ADAM unified OPTimizer Core (OPTC). Thanks to bit-slice-level scalability and zero-slice skipping, it shows 5.9× higher energy efficiency than state-of-the-art on-chip-learning processors (OCLPs).","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134034251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
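Bit-slice-level reconfigurability rests on decomposing operands into narrow slices, where all-zero slices can be skipped entirely; a sketch under an assumed 4-bit slice width:

```python
def bit_slices(value, slice_bits=4, n_slices=2):
    """Split an unsigned integer into little-endian bit slices.

    Sketch of the bit-slice representation that slice-level MAC arrays
    operate on: an 8-bit operand becomes two 4-bit slices, and an
    all-zero slice contributes nothing to the product, so it can be
    skipped (the zero-slice-skipping idea). Slice width and count are
    assumptions here, not the processor's exact datapath.
    """
    mask = (1 << slice_bits) - 1
    return [(value >> (i * slice_bits)) & mask for i in range(n_slices)]

slices = bit_slices(0xB7)            # -> [0x7, 0xB], low slice first
busy = [s for s in bit_slices(0x0F) if s]   # the zero high slice is skipped
```

Scaling precision per layer then amounts to choosing how many slices each layer's operands keep, which is the LAPS knob.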
{"title":"In Search of the Performance- and Energy-Efficient CNN Accelerators","authors":"S. Sedukhin, Yoichi Tomioka, Kohei Yamamoto","doi":"10.1109/COOLCHIPS52128.2021.9410350","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410350","url":null,"abstract":"In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure, or shape, of a Tensor Processing Engine (TPE) for CNN acceleration is systematically searched for and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Due to inter-block image data independence, multiple such TPEs can be used for additional CNN acceleration. Moreover, it is shown that the proposed TPE can also be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114204092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
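The shape search starts from the MAC workload a convolution layer presents, since an optimal engine shape tiles that product space to keep every MAC unit busy each cycle; a minimal helper with example dimensions (not taken from the paper):

```python
def conv_macs(out_h, out_w, out_c, k, in_c):
    """Multiply-accumulate count of one convolution layer: each of the
    out_h * out_w * out_c output elements needs a k*k*in_c dot product.
    The engine-shape search trades off how this product space is tiled
    across the 3D array; dimensions below are illustrative.
    """
    return out_h * out_w * out_c * k * k * in_c

# Example: a 3x3 convolution, 64 -> 64 channels, on a 56x56 feature map
macs = conv_macs(56, 56, 64, 3, 64)
```

Dividing such a count by the MACs the array completes per cycle gives the cycle lower bound a well-shaped TPE should approach.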
{"title":"High Performance Multicore SHA-256 Accelerator using Fully Parallel Computation and Local Memory","authors":"Van Dai Phan, H. Pham, T. Tran, Y. Nakashima","doi":"10.1109/COOLCHIPS52128.2021.9410349","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410349","url":null,"abstract":"Integrity checking is indispensable in the current technological age. One of the most popular algorithms for integrity checking is SHA-256. To achieve high performance, many applications implement SHA-256 in hardware. However, the processing rate of SHA-256 is often low due to its large number of computations. Moreover, data must pass through many loop iterations to generate a hash, which requires transferring data multiple times between the accelerator and off-chip memory if local memory is not used. In this paper, an ALU combining fully parallel computation and pipeline stages is proposed to increase the SHA-256 processing rate. Moreover, local memory is attached near the ALU to reduce off-chip memory accesses during the computation iterations. For a high hash rate, we design an SoC-based multicore SHA-256 accelerator. As a result, our proposed accelerator improves throughput by more than 40% and achieves 2× higher hardware efficiency compared with the state-of-the-art design.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130145440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
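SHA-256 compresses each 512-bit block through 64 dependent rounds, so a single message offers little parallelism; hashing many independent messages at once, one per core, is the parallelism a multicore accelerator exploits. A sketch using Python's standard hashlib (not the paper's hardware design):

```python
import hashlib

# Four independent messages can be hashed concurrently because their
# round chains never interact; within one message, each round depends
# on the previous one, which is why the paper pipelines the round ALU
# and keeps intermediate state in local memory instead of off-chip.
messages = [b"message-%d" % i for i in range(4)]
digests = [hashlib.sha256(m).hexdigest() for m in messages]
```

Keeping the message schedule and working variables on-chip removes the repeated accelerator-to-DRAM transfers the abstract identifies as the bottleneck.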
{"title":"Training Low-Latency Spiking Neural Network through Knowledge Distillation","authors":"Sugahara Takuya, Renyuan Zhang, Y. Nakashima","doi":"10.1109/COOLCHIPS52128.2021.9410323","DOIUrl":"https://doi.org/10.1109/COOLCHIPS52128.2021.9410323","url":null,"abstract":"Spiking neural networks (SNNs), which enable greater computational efficiency on neuromorphic hardware, have attracted attention. Existing ANN-SNN conversion methods can effectively convert the weights of a pre-trained ANN model to an SNN. However, state-of-the-art ANN-SNN conversion methods suffer from accuracy loss and high inference latency due to ineffective conversion. To solve this problem, we train a low-latency SNN through knowledge distillation with the Kullback-Leibler divergence (KL divergence). We achieve superior accuracy on CIFAR-100: 74.42% for the VGG16 architecture with 5 timesteps. To the best of our knowledge, our work performs the fastest inference without accuracy loss compared with other state-of-the-art SNN models.","PeriodicalId":103337,"journal":{"name":"2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130469227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
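The distillation objective, a KL divergence between temperature-softened teacher and student outputs, can be sketched as follows; the temperature value and example logits are illustrative assumptions, and the SNN-specific spike dynamics are omitted:

```python
import math

def kl_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    A minimal sketch of the knowledge-distillation objective used to
    transfer an ANN teacher's soft labels to a student network; the
    temperature and logits here are assumptions, not the paper's
    training setup.
    """
    def softened(logits):
        m = max(logits)                       # stabilize the exponentials
        exps = [math.exp((z - m) / temperature) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    p = softened(teacher_logits)
    q = softened(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss = kl_distillation_loss([8.0, 2.0, 1.0], [6.0, 3.0, 2.0])
```

Minimizing this loss pulls the student's output distribution toward the teacher's soft targets, which is what lets the SNN reach high accuracy in only a few timesteps.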