{"title":"Work-in-Progress: Scheduler for Collaborated FPGA-GPU-CPU Based on Intermediate Language","authors":"Na Hu, Chao Wang, Xuehai Zhou, Xi Li","doi":"10.1109/CODES-ISSS55005.2022.00008","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00008","url":null,"abstract":"FPGA-GPU-CPU collaboration compromise high performance and low cost in modern computing systems. However, the large mapping space between modules and heterogeneous processors brings complexity to the scheduling algorithm. This paper proposes a uniform-pipeline-based real-time oriented scheduling algorithm and a servant execution-flow model (SEFM) optimized for this scheduler. SEFM at runtime generates the target code from the intermediate language (IL) and scheduler-controlled parameters. The algorithms such as contrast stretching, etc., are accelerated by 1.4-2.7×, 1.9-3.8×, 2.7-10.5× respectively on CPU, GPU, and FPGA over OpenCV baseline. A case study of 3D waveform oscilloscope using scheduling solution on collaborated processors achieves 1.5× resource utilization than the pure FPGA.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128279954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brain-Inspired Hyperdimensional Computing for Ultra-Efficient Edge AI","authors":"H. Amrouch, M. Imani, Xun Jiao, Y. Aloimonos, Cornelia Fermuller, Dehao Yuan, Dongning Ma, H. E. Barkam, P. Genssler, Peter Sutor","doi":"10.1109/CODES-ISSS55005.2022.00017","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00017","url":null,"abstract":"Hyperdimensional Computing (HDC) is rapidly emerging as an attractive alternative to traditional deep learning algorithms. Despite the profound success of Deep Neural Networks (DNNs) in many domains, the amount of computational power and storage that they demand during training makes deploying them in edge devices very challenging if not infeasible. This, in turn, inevitably necessitates streaming the data from the edge to the cloud which raises serious concerns when it comes to availability, scalability, security, and privacy. Further, the nature of data that edge devices often receive from sensors is inherently noisy. However, DNN algorithms are very sensitive to noise, which makes accomplishing the required learning tasks with high accuracy immensely difficult. In this paper, we aim at providing a comprehensive overview of the latest advances in HDC. HDC aims at realizing real-time performance and robustness through using strategies that more closely model the human brain. HDC is, in fact, motivated by the observation that the human brain operates on high-dimensional data representations. In HDC, objects are thereby encoded with high-dimensional vectors which have thousands of elements. In this paper, we will discuss the promising robustness of HDC algorithms against noise along with the ability to learn from little data. 
Further, we will present the outstanding synergy between HDC and beyond von Neumann architectures and how HDC opens doors for efficient learning at the edge due to the ultra-lightweight implementation that it needs, contrary to traditional DNNs.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129741770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: High-Performance Systolic Hardware Accelerator for RBLWE-based Post-Quantum Cryptography","authors":"Tianyou Bao, J. Imaña, Pengzhou He, Jiafeng Xie","doi":"10.1109/CODES-ISSS55005.2022.00009","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00009","url":null,"abstract":"Ring-Binary-Learning-with-Errors (RBLWE)-based post-quantum cryptography (PQC) is a promising scheme suitable for lightweight applications. This paper presents an efficient hardware systolic accelerator for RBLWE-based PQC, targeting high-performance applications. We have briefly given the algorithmic background for the proposed design. Then, we have transferred the proposed algorithmic operation into a new systolic accelerator. Lastly, field-programmable gate array (FPGA) implementation results have confirmed the efficiency of the proposed accelerator.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130060286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Lark: A Learned Secondary Index Toward LSM-tree for Resource-Constrained Embedded Storage Systems","authors":"Jianan Yuan, Hua Liu, Shangyu Wu, Yi-Chien Lin, Tiantian Wang, Chenlin Ma, Rui Mao, Yi Wang","doi":"10.1109/CODES-ISSS55005.2022.00012","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00012","url":null,"abstract":"LSM-tree-based key-value stores are popular in embedded storage systems. With the growing demands of data analysis, the secondary index is created to support non-primary-key lookups. However, the lookup efficiency and space consumption of secondary index remain for further optimization. Inspired by the learned index, this paper presents Lark, a learned secondary index toward LSM-tree for resource-constrained embedded storage systems. Lark employs machine learning to speed up the non-primary-key queries and compress secondary indexes. Our preliminary evaluations show that, in comparison with traditional secondary index schemes, Lark achieves better lookup performance with less space consumption.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115737516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: BloCirNN: An Efficient Software/hardware Codesign Approach for Neural Network Accelerators with Block-Circulant Matrix","authors":"Yu Qin, Lei Gong, Zhendong Zheng, Chao Wang","doi":"10.1109/CODES-ISSS55005.2022.00010","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00010","url":null,"abstract":"Nowadays, the scale of deep neural networks is getting larger and larger. These large-scale deep neural networks are both compute and memory intensive. To overcome these problems, we use block-circulant weight matrices and Fast Fourier Transform (FFT) to compress model and optimize computation. Compared to weight pruning, this method does not suffer from irregular networks. The main contributions of this paper include the implementation of a convolution module and a fully-connected module with High-Level Synthesis (HLS), deployment and performance test on FPGA platform. We use AlexNet as a case study, which demonstrates our design is more efficient than the FPGA2016.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126485637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Toward Energy-efficient Near STT-MRAM Processing Architecture for Neural Networks","authors":"Yueting Li, B. Zhao, Xinyi Xu, Yundong Zhang, Jun Wang, Weisheng Zhao","doi":"10.1109/CODES-ISSS55005.2022.00013","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00013","url":null,"abstract":"The size of parameters in artificial neural network (NN) applications grows quickly from a handful to the GB-level. The data transmission poses a key challenge for NN, and either neuron is removed or data compression reduces pressure on memory access but cannot successfully decrease data traffic. Therefore, we propose the near spin-transfer-torque magnetic random processing architecture for developing energy-efficient NNs. Our approach provides system architects with a preliminary scheme to obtain real-time transmission that near memory controller directly compresses non-zero elements, and encodes the corresponding index depending on the kernel size. Furthermore, it adjusts the number of multiplication accumulators and avoids unnecessary hardware overheads during computation. The preliminary experimental results demonstrated this design verified with weights that currently achieve up to 3.05x speedup and 29.6% power compared with the unoptimized one.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122597916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: HeteroRW: A Generalized and Efficient Framework for Random Walks in Graph Analysis","authors":"Yingxue Gao, Lei Gong, Chao Wang, Xuehai Zhou","doi":"10.1109/CODES-ISSS55005.2022.00011","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00011","url":null,"abstract":"Random walk (RW) is a common graph analysis algorithm that consists of two phases: construction and sampling. The construction phase is responsible for generating the sampling table. The sampling phase contains many walkers which wander through the whole graph to sample. However, RW is notorious for its dynamic and sparse memory access pattern, which makes existing research suffer low throughput and memory bottleneck. In addition, the variety of RW algorithms in different scenarios also brings new design challenges.This paper proposes HeteroRW, a generalized framework to accelerate RWs on FPGAs. HeteroRW first identifies the two phases’ computation characteristics and presents corresponding hardware acceleration designs, respectively. Then, HeteroRW achieves the template-based design to support a variety of RW algorithms. Finally, HeteroRW integrates a novel scheduling layer to partition the input data and perform design space exploration (DSE). 
Experimental results show that HeteroRW achieves 4.3x speedup over the recent FPGA implementation while effectively simplifying the accelerator customization process.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"16 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129924021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Industry-track: Towards Agile Design of Neural Processing Unit","authors":"Binyi Wu, W. Furtner, Bernd Waschneck, C. Mayr","doi":"10.1109/CODES-ISSS55005.2022.00015","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00015","url":null,"abstract":"More and more specialized processors, known as Neural Processing Units (NPUs), have been or are being built for deep neural network inference. Design and optimization of this kind of processor are inseparable from the deep learning ecosystem and corresponding underlying software. This HW/SW co-design requirement poses challenges for designers. Therefore, in this work, we experiment with an agile development method to shorten the development cycles of NPUs. We utilize Chisel for hardware design and develop a custom Chisel backend for generating cycle-accurate simulators with C++/Python APIs. On top of the simulator, we built a Python software stack for software development, performance evaluation, and simulation-based verification. The proposed method is purely software and does not involve real hardware, thus allowing the integration of software agile development methods into digital designs. In the experiments, we show how it helps us identify inherent hardware limitations and how it shortens our development cycles.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128462087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Industry Paper: Surrogate Models for Testing Analog Designs under Limited Budget – a Bandgap Case Study","authors":"R. Bloem, Alberto Larrauri, Roland Lengfeldner, Cristinel Mateis, D. Ničković, Bjorn Ziegler","doi":"10.1109/CODES-ISSS55005.2022.00016","DOIUrl":"https://doi.org/10.1109/CODES-ISSS55005.2022.00016","url":null,"abstract":"Testing analog integrated circuit (IC) designs is notoriously hard. Simulating tens of milliseconds from an accurate transistor level model of a complex analog design can take up to two weeks of computation. Therefore, the number of tests that can be executed during the late development stage of an analog IC can be very limited. We leverage the recent advancements in machine learning (ML) and propose two techniques, artificial neural networks (ANN) and Gaussian processes, to learn a surrogate model from an existing test suite. We then explore the surrogate model with Bayesian optimization to guide the generation of additional tests. We use an industrial bandgap case study to evaluate the two approaches and demonstrate the virtue of Bayesian optimization in efficiently generating complementary tests with constrained effort.","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"591 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122935936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CODES+ISSS 2022 Program Committee","authors":"","doi":"10.1109/codes-isss55005.2022.00006","DOIUrl":"https://doi.org/10.1109/codes-isss55005.2022.00006","url":null,"abstract":"","PeriodicalId":129167,"journal":{"name":"2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116304345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}