2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers, Page 3

Network Intrusion Detection System Using Deep Learning Method with KDD Cup'99 Dataset

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00047

Jesse Jeremiah Tanimu, Mohamed Hamada, Patience Robert, Anish Mahendran

Citations: 1

Accelerating Non-Negative Matrix Factorization on Embedded FPGA with Hybrid Logarithmic Dot-Product Approximation

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00070

Yizhi Chen, Yarib Nevarez, Zhonghai Lu, A. García-Ortiz

Abstract: Non-negative matrix factorization (NMF) is an effective method for dimensionality reduction and sparse decomposition. This method has been of great interest to the scientific community in applications including signal processing, data mining, compression, and pattern recognition. However, NMF incurs high computational costs in terms of performance and energy consumption, which makes it ill-suited to embedded applications. To overcome this limitation, we implement the vector dot-product with hybrid logarithmic approximation as a hardware optimization approach. This technique accelerates floating-point computation, reduces energy consumption, and preserves accuracy. To demonstrate our approach, we employ a design exploration flow using high-level synthesis on an embedded FPGA. Compared with software solutions on an ARM CPU, this hardware implementation accelerates the overall matrix decomposition by $5.597\times$ and reduces energy consumption by $69.323\times$. Log-approximation NMF combined with KNN (k-nearest neighbors) shows only a 2.38% decrease in accuracy on MNIST compared with KNN applied to the matrix after floating-point NMF. Furthermore, compared with a dedicated floating-point accelerator, the logarithmic approximation approach achieves $3.718\times$ acceleration and $8.345\times$ energy reduction. Compared with the fixed-point approach, our approach has an accuracy degradation of 1.93% on MNIST and an accuracy improvement of 28.2% on the FASHION MNIST data set without prior knowledge of the data range. Thus, our approach has better compatibility with the input data range.
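The abstract does not spell out the hybrid scheme, so the following is only a minimal sketch of the general idea of a log-domain dot product. It swaps in the classic Mitchell approximation (a piecewise-linear log2 and antilog), which is an assumption and not the authors' design. The names `mitchell_log2`, `mitchell_mul`, and `approx_dot` are hypothetical, and the sketch assumes non-negative inputs, as in NMF.

```python
import math

def mitchell_log2(x: float) -> float:
    """Piecewise-linear (Mitchell) approximation of log2(x) for x > 0."""
    # frexp gives x = mantissa * 2**exponent with 0.5 <= mantissa < 1,
    # so x = (2 * mantissa) * 2**(exponent - 1) with 2 * mantissa in [1, 2).
    mantissa, exponent = math.frexp(x)
    # log2(1 + f) is approximated by f for f in [0, 1).
    return (exponent - 1) + (2.0 * mantissa - 1.0)

def mitchell_mul(a: float, b: float) -> float:
    """Approximate a * b for non-negative a, b by adding approximate logs."""
    if a == 0.0 or b == 0.0:
        return 0.0
    s = mitchell_log2(a) + mitchell_log2(b)
    e = math.floor(s)
    # Approximate antilog: 2**(e + f) is taken as (1 + f) * 2**e.
    return math.ldexp(1.0 + (s - e), e)

def approx_dot(u, v):
    """Dot product in which every multiplication uses the approximation."""
    return sum(mitchell_mul(a, b) for a, b in zip(u, v))

print(approx_dot([3.0, 5.0, 7.0], [6.0, 2.0, 9.0]))  # exact value is 91.0
```

For this input the sketch returns 86.0 against an exact 91.0. Each multiplication becomes an addition plus shifts, at the cost of an underestimate that is bounded by roughly 11.1% per product for Mitchell's method.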

Citations: 0

Exploration of an Enhanced Scheduling Approach with Feasibility Analysis on a Single CPU System

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00037

Vijayalakshmi Saravanan, Gang Wan, A. Pillai

Abstract: Developing a new scheduling algorithm and analyzing its performance to understand its effect in practice can be a laborious task. CPU scheduling is crucial in achieving the operating system's (OS) design goals. A variety of scheduling algorithms exist in the field, and this paper compares the performance of existing scheduling algorithms by simulating the same bundle of tasks. A variety of algorithms under batch OS and time-sharing OS are considered. Based on this analysis, a novel task scheduling algorithm incorporating the merits of existing algorithms is proposed for a single CPU system. The performance of the various algorithms is compared with the proposed algorithm on the following parameters: throughput, CPU utilization, average turnaround time, waiting time, and response time. Extensive simulation analysis for various bundles of tasks is conducted, and the proposed algorithm is found to outperform the other algorithms in terms of a guaranteed reduction in average response time. Thus, an efficient CPU scheduler is proposed to accommodate varying workloads at run-time, making the best use of the CPU in a particular execution scenario.
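The kind of comparison described in this abstract can be illustrated with a small simulation. This is a minimal sketch, assuming all tasks arrive at time 0 and ignoring context-switch cost. It compares two textbook policies (first-come-first-served and round-robin), not the proposed algorithm, which the abstract does not describe. The function names and the sample burst times are hypothetical.

```python
from collections import deque

def fcfs(bursts):
    """First-come-first-served: return (first_run, completion) per task."""
    t, stats = 0, []
    for burst in bursts:
        stats.append((t, t + burst))
        t += burst
    return stats

def round_robin(bursts, quantum):
    """Round-robin with a fixed time quantum: (first_run, completion) per task."""
    remaining = list(bursts)
    first_run = [None] * len(bursts)
    completion = [0] * len(bursts)
    queue = deque(range(len(bursts)))
    t = 0
    while queue:
        i = queue.popleft()
        if first_run[i] is None:
            first_run[i] = t           # response time is measured here
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)            # not finished: back of the queue
        else:
            completion[i] = t
    return list(zip(first_run, completion))

def averages(bursts, stats):
    """Average (turnaround, waiting, response) time; all arrivals at t = 0."""
    n = len(bursts)
    turnaround = sum(done for _, done in stats) / n
    waiting = sum(done - burst for (_, done), burst in zip(stats, bursts)) / n
    response = sum(start for start, _ in stats) / n
    return turnaround, waiting, response

bursts = [5, 3, 8]  # hypothetical CPU burst times
print("FCFS:", averages(bursts, fcfs(bursts)))
print("RR  :", averages(bursts, round_robin(bursts, quantum=2)))
```

On this sample, round-robin lowers the average response time (2.0 versus about 4.33) while raising the average turnaround time (about 12.33 versus about 9.67), which is the sort of trade-off such a comparison is meant to expose.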

Citations: 1

Digital Computation-in-Memory Design with Adaptive Floating Point for Deep Neural Networks

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00042

Yunhan Yang, Wei Lu, Po-Tsang Huang, Hung-Ming Chen

Abstract: All-digital deep neural network (DNN) accelerators or processors suffer from the Von Neumann bottleneck because of the massive data movement required in DNNs. Computation-in-memory (CIM) can solve this problem by performing the computations in the memory, which reduces data movement. However, analog CIM is susceptible to PVT variations and limited by the analog-digital/digital-analog conversions (ADC/DAC). Most current digital CIM techniques adopt integer operations and the bit-serial method, which limits the throughput to the total number of bits. Moreover, they use an adder tree for accumulation, which causes severe area overhead. In this paper, a folded architecture based on time-division multiplexing is proposed to reduce the area and improve the energy efficiency without reducing the throughput. We quantize and ternarize the adaptive floating point (ADP) format with low bits, which can achieve the same or better accuracy than integer quantization, to reduce the energy cost of calculation and data movement. The proposed technique can improve the overall throughput and energy efficiency by up to 3.83x and 2.19x, respectively, compared to other state-of-the-art digital CIMs that use integer operations.

Citations: 0

Design and FPGA Implementation of Lite Convolutional Neural Network Based Hardware Accelerator for Ocular Biometrics Recognition Technology

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00051

Wei-Che Sun, Chih-Peng Fan, Chung-Bin Wu

Citations: 0

Design and Analysis of a Nano-photonic Processing Unit for Low-Latency Recurrent Neural Network Applications

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00058

Eito Sato, Koji Inoue, Satoshi Kawakami

Abstract: Recurrent neural networks (RNNs) have achieved high performance in inference processing that handles time-series data. Hardware acceleration of RNNs is helpful for tasks where real-time performance is essential, such as speech recognition and stock market prediction. The nano-photonic neural network accelerator is an approach that takes advantage of the high speed, high parallelism, and low power consumption of light to achieve high performance in neural network processing. However, existing methods are inefficient for RNNs due to significant overhead caused by the absence of recursive paths and the immaturity of the model to be designed. Therefore, architectural considerations that take advantage of RNN characteristics are essential for low latency. This paper proposes a fast and low-power processing unit for RNNs that introduces activation functions and recursion processing using optical devices. We clarified the impact of noise on the proposed circuit's calculation accuracy and inference accuracy. The calculation accuracy deteriorated significantly in proportion to the increase in the number of recursions, but the effect on inference accuracy was negligible. We also compared the performance of the proposed circuit to an all-electrical design and a hybrid design that processes the vector-matrix product optically and the recursion electrically. The proposed circuit improves latency by 467x and reduces power consumption by 93.0% compared with the all-electrical design, and improves latency by 7.3x and reduces power consumption by 58.6% compared with the hybrid design.

Citations: 0

A Reconfigurable Design of Flexible-arbitrated Crossbar Interconnects in Multi-core SoC system

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00064

Xuewen He, Yajie Wu, Yichuan Bai, Jie Liu, Li Du, Yuan Du

Citations: 0

Driver Status Monitoring System with Feedback from Fatigue Detection and Lane Line Detection

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00035

Kai Yan, Chaoyue Zhao, Chengkang Shen, Peiyan Wang, Guoqing Wang

Citations: 0

A Message Passing Interface Library for High-Level Synthesis on Multi-FPGA Systems

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00017

Kazuei Hironaka, Kensuke Iizuka, H. Amano

Abstract: One obstacle to application development on multi-FPGA systems with high-level synthesis (HLS) is a lack of support for a programming interface. Implementing and debugging an application on multiple FPGA boards is difficult without a standard interface. Message Passing Interface (MPI) is a standard parallel programming interface commonly used in distributed memory systems. This paper presents a tool-independent MPI library called FiC-MPI that can be used in HLS for multi-FPGA systems in which each FPGA node is connected directly. By using FiC-MPI, a variety of parallel software, including a general-purpose benchmark, can be easily implemented. FiC-MPI was implemented and evaluated on the M-KUBOS cluster consisting of Zynq MPSoC boards connected with a static time-division multiplexing network. By using the FiC-MPI simulator, parallel programs can be debugged before being implemented on real machines. As a case study, the Himeno-BMT benchmark was implemented with FiC-MPI. It achieved 178.7 MFLOPS with a single node and scaled to 643.7 MFLOPS with four nodes and 896.9 MFLOPS with six nodes of the M-KUBOS cluster. Through the implementation, the ease of developing parallel programs with FiC-MPI on multi-FPGA systems was demonstrated.
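The reported Himeno-BMT figures can be turned into speedup and parallel efficiency with a few lines of arithmetic. This is a small sketch that uses only the three MFLOPS values quoted in the abstract. The helper name `scaling` is hypothetical, and efficiency is taken here as speedup divided by node count, which is a common convention and not necessarily the metric used in the paper.

```python
def scaling(mflops_by_nodes):
    """Map node count -> (speedup over one node, parallel efficiency)."""
    base = mflops_by_nodes[1]
    return {
        nodes: (perf / base, perf / base / nodes)
        for nodes, perf in sorted(mflops_by_nodes.items())
    }

# MFLOPS values quoted in the abstract for 1, 4, and 6 nodes.
reported = {1: 178.7, 4: 643.7, 6: 896.9}

for nodes, (speedup, efficiency) in scaling(reported).items():
    print(f"{nodes} node(s): speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

Under that convention the figures correspond to roughly a 3.6x speedup on four nodes (about 90% efficiency) and roughly 5.0x on six nodes (about 84% efficiency).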

Citations: 0

Evaluating the Optimal Self-Checking Carry Propagate Adder for Cryptographic Processor

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2022-12-01 DOI: 10.1109/MCSoC57363.2022.00011

M.A. Akbar, Bo Wang, A. Bermak

Abstract: With the increasing number of invasive attacks, cryptographic processors are becoming more susceptible to failure. Therefore, reliable hardware is becoming increasingly important. Since an adder is a vital component in the hardware design of cryptographic protocols, a reliable adder can significantly reduce vulnerability to invasive attacks. Adders with different architectures have already been widely studied and analyzed, and appropriate types have been proposed based on the application. This paper considers the adder design most suitable for reliable cryptographic operation and investigates the optimal self-checking carry propagate adder design offering the best possible performance in terms of latency, delay, and area. In terms of area versus delay, the self-checking parallel ripple carry adder (PRCA), with 23.4% area overhead compared to the self-checking ripple carry adder (RCA), provides a delay efficiency of 70.31%. However, the area-delay product for 64-bit self-checking designs showed that the hybrid adder is 71.2%, 21.4%, and 37.9% more efficient than the RCA, PRCA, and carry look-ahead adder designs, respectively.

Citations: 0