{"title":"A High-performance Deployment Framework for Pipelined CNN Accelerators with Flexible DSE Strategy","authors":"Conghui Luo, Wen-Liang Huang, Dehao Xiang, Yihua Huang","doi":"10.1109/HPEC55821.2022.9926377","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926377","url":null,"abstract":"The pipelined DCNN (Deep Convolutional Neural Network) accelerator can effectively exploit inter-layer parallelism, so it is widely used, e.g., in video stream processing. However, the large volume of intermediate results generated in a pipelined accelerator imposes a considerable burden on the on-chip storage resources of FPGAs. To ease this storage demand, a storage-optimized design space exploration (DSE) method is proposed, at the cost of a slight drop in the computing resource utilization ratio. The experimental results show that the DSE strategy can achieve CE (Computation Engine) utilization ratios of 98.49% and 98.00% on VGG16 and ResNet101, respectively. In addition, the resource optimization strategy can save 27.84% of BRAM resources on VGG16, while the CE utilization ratio drops by only 3.04%. 
An automated deployment framework that is adaptable to different networks with high computing resource utilization ratio is also proposed in this paper, which can achieve workload balancing automatically by optimizing the computing resource allocation of each layer.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125221490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Adaptation of Spiking Networks in a Gradual Changing Environment","authors":"Zaidao Mei, Mark D. Barnell, Qinru Qiu","doi":"10.1109/HPEC55821.2022.9926367","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926367","url":null,"abstract":"Spiking neural networks (SNNs) have drawn broad research interest in recent years due to their high energy efficiency and biological plausibility. They have proven to be competitive in many machine learning tasks. Like all Artificial Neural Network (ANN) machine learning models, SNNs rely on the assumption that the training and testing data are drawn from the same distribution. As the environment changes gradually, the input distribution will shift over time, and the performance of SNNs turns out to be brittle. To this end, we propose a unified framework that can adapt to non-stationary streaming data by exploiting unlabeled intermediate domains, and that fits with the in-hardware SNN learning algorithm Error-modulated STDP. Specifically, we propose a unique self-training framework to generate pseudo labels to retrain the model for intermediate and target domains. In addition, we develop an online-normalization method with an auxiliary neuron to normalize the output of the hidden layers. 
By combining the normalization with self-training, our approach gains average classification improvements of over 10% on MNIST, N-MNIST, and two other datasets.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127184611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Hardware Accelerator Design Generation with Spiral","authors":"Guanglin Xu, J. Hoe, F. Franchetti","doi":"10.1109/HPEC55821.2022.9926413","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926413","url":null,"abstract":"Hardware specialization has become a widely employed technique for approaching higher performance and energy efficiency in computer systems. Yet obtaining efficient custom hardware designs remains a challenging and tedious task, calling for automated approaches. In the past, Spiral has been used for generating high-throughput streaming hardware designs for linear transform kernels. This paper is motivated by an observation that a memory-based iterative computing model may allow us to trade off throughput for algorithmic flexibility. In this paper, we present a hardware generation approach that generates and optimizes algorithms using Spiral's multi-level domain-specific languages (DSLs), targeting a scalar load-store architecture. We have incorporated this approach as a hardware backend into the Spiral system. Our evaluation of this approach on several fundamental kernels shows flexibility with reasonable performance and resource utilization.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"43 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120894069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Design and Implementation of Classic McEliece Post-Quantum Cryptosystem Based on FPGA","authors":"Shaofen Chen, Hai-Tao Lin, Wen-Liang Huang, Yihua Huang","doi":"10.1109/HPEC55821.2022.9926295","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926295","url":null,"abstract":"With the development of the information age, the security of data transmission has attracted more attention. In addition, quantum computers pose a great threat to widely used cryptography algorithms. The Classic McEliece algorithm is a post-quantum algorithm that offers high security and has stood firm against all kinds of attacks for decades. The wide application of the cryptosystem is inseparable from its hardware implementation scheme, so this paper proposes a Classic McEliece implementation scheme based on an FPGA platform. To achieve a balance between resources and speed, a variety of methods to implement the scheme are adopted. First, using the random-access characteristics of the RAM, the clock cycle consumption of the error vector generating module is reduced by 95.1%. Second, multiple computing units are employed inside the module for parallel computing, which reduces the number of computing cycles by about 22.4%. 
Finally, this paper proposes a multiplexing syndrome decoding module; compared to the non-multiplexing scheme, its LUT resource consumption is reduced by about 24.2% and its FF resource consumption is reduced by about 15.4%.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114798524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Software Architecture for EM-Based Radar Signal Processing and Tracking","authors":"Alan Nussbaum, B. Keel, W. Blair, U. Ramachandran","doi":"10.1109/HPEC55821.2022.9926338","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926338","url":null,"abstract":"While a radar tracks the kinematic state (position, velocity, and acceleration) of the target, optimal signal processing requires knowledge of the target's range rate and radial acceleration, which are derived from the tracking function in real time. High-precision tracks are achieved through precise range and angle measurements whose precision is determined by the signal-to-noise ratio (SNR) of the received signal. The SNR is maximized by minimizing the matched filter loss due to uncertainties in the radial velocity and acceleration of the target. In this paper, the Expectation-Maximization (EM) algorithm is proposed as an iterative signal processing scheme for maximizing the SNR by executing enhanced range walk compensation (i.e., correction for errors in the radial velocity and acceleration) in the real-time control loop software architecture. Maintaining a stringent timeline and adhering to latency requirements are essential for real-time sensor signal processing. 
This research aims to examine existing methods and explore new approaches and technologies to mitigate the harmful effects of range walk in tracking radar systems with an EM-Based iterative algorithm and implement the new control loop steering methods in a real-time computing environment.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114840545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Novel In-Memory Computation Algorithms to Address Next-Generation Throughput Constraints on SWaP-Limited Platforms","authors":"Jessica Ray, C. Meiners","doi":"10.1109/HPEC55821.2022.9926297","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926297","url":null,"abstract":"The Department of Defense relies heavily on filtering and selection applications to help manage the overwhelming amount of data constantly received at the tactical edge. Filtering and selection are both latency and throughput constrained, and systems at the tactical edge must heavily optimize their SWaP (size, weight, and power) usage, which can reduce overall computation and memory performance. In-memory computation (IMC) provides a promising solution to the latency and throughput issues, as it helps enable the efficient processing of data as it is received, helping eliminate the memory bottleneck imposed by traditional Von Neumann architectures. In this paper, we discuss a specific type of IMC accelerator known as a Content Addressable Memory (CAM), which effectively operates as a hardware-based associative array, allowing fast lookup and match operations. In particular, we consider ternary CAMs (TCAMs) and their use within string matching, which is an important component of many filtering and selection applications. Despite the benefits gained with TCAMs, designing applications that utilize them remains a difficult task. Straightforward questions, such as “how large should my TCAM be?” and “what is the expected throughput?” are difficult to answer due to the many factors that go into effectively mapping data into a TCAM. This work aims to help answer these types of questions with a new framework called Stardust-Chicken. Stardust-Chicken supports generating and simulating TCAMs, and implements state-of-the-art algorithms and data representations that can effectively map data into TCAMs. 
With Stardust-Chicken, users can explore the tradeoff space that comes with TCAMs and better understand how to utilize them in their applications.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133979966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proposed Empirical Assessment of Remote Workers' Cyberslacking and Computer Security Posture to Assess Organizational Cybersecurity Risks","authors":"Ariel Luna, Y. Levy, G. Simco, Wei Li","doi":"10.1109/HPEC55821.2022.9926394","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926394","url":null,"abstract":"Cyberslacking is conducted by employees who use their companies' equipment and network for personal purposes instead of working during work hours. Cyberslacking has a significant adverse effect on overall employee productivity; however, recently, due to the COVID-19 move to remote working, it also poses a cybersecurity risk to organizations' networks and infrastructure. In this work-in-progress research study, we are developing, validating, and will empirically test a taxonomy to assess an organization's remote workers' risk level of cybersecurity threats. This study includes a three-phased developmental approach in developing the Remote Worker Cyberslacking Security Risk Taxonomy. In collaboration with cybersecurity Subject Matter Experts (SMEs), we use the taxonomy to assess an organization's remote workers' risk level of cybersecurity threats by using actual system indicators of productivity measures to estimate their cyberslacking, along with assessing via organizational information the computer security posture of the remote device being used to access corporate resources. 
Anticipated results from 125 anonymous employees from one organization will then be assessed on the cybersecurity risk taxonomy, and recommendations to the organization's cybersecurity leadership will be provided.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133309912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning for accurate and fast bandgap prediction of solid-state materials","authors":"Shomik Verma, S. Kajale, Rafael Gómez-Bombarelli","doi":"10.1109/HPEC55821.2022.9926355","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926355","url":null,"abstract":"Semi-local DFT tends to vastly underestimate the bandgap of materials. Here we propose a machine learning calibration workflow to improve the accuracy of cheap DFT calculations. We first compile a dataset of 25k materials with PBE and HSE calculations completed. Using this dataset, we benchmark various machine learning architectures and features to determine which results in the highest accuracy. The best technique is able to improve the accuracy of PBE 10-fold. We then expand the generalizability of the model by utilizing active learning to intelligently sample chemical space. Because HSE data is not available for these new materials, we develop an optimized high-throughput parallelized workflow to calculate HSE bandgaps of 10k additional materials. We therefore develop a cheap, accurate, and generalized ML model for bandgap prediction.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124680732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Acceleration of Fully Homomorphic Encryption over the Torus","authors":"Tian Ye, R. Kannan, V. Prasanna","doi":"10.1109/HPEC55821.2022.9926381","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926381","url":null,"abstract":"Fully Homomorphic Encryption over the Torus (TFHE) is a promising approach for secure computing in cloud servers to perform computations directly on encrypted data. However, TFHE has much higher computation complexity than its unencrypted counterpart. In this work, we propose an FPGA accelerator for TFHE computations. We illustrate the effects of an optimization called bootstrapping key unrolling on the tradeoff between performance of bootstrapping and FPGA resource consumption. We customize the data layout of TFHE ciphertext to optimize data access and improve data reuse. We parameterize the FPGA design for TFHE bootstrapping, which can be configured to achieve high performance for different user-specified security requirements and given FPGA resources. We implement our design on a state-of-the-art FPGA and compare it with existing results on CPUs. Our implementation for TFHE bootstrapping achieves 216x improvement in throughput and 16.5x improvement in latency compared with the software baseline on a state-of-the-art CPU server.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128517903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}