Tanaka Kohsuke, Yuta Shintomi, Y. Okuyama, Taro Suzuki
{"title":"Design of Reward Functions for RL-based High-Speed Autonomous Driving","authors":"Tanaka Kohsuke, Yuta Shintomi, Y. Okuyama, Taro Suzuki","doi":"10.1109/MCSoC57363.2022.00015","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00015","url":null,"abstract":"We aim to design a reward function for autonomous driving by reinforcement learning for achieving high-speed driving while maintaining training stability for reaching the racetrack's goal. High-speed driving is aggressive, such as running on the road's edge as fast as possible at corners. Thus, creating reinforcement learning agents that drive at high speeds and can reach a goal is difficult in racing competition situations because of running off the road or collisions with other objects. In general, human drivers see the road ahead and make control decisions. Therefore, we design a reward function to consider the road ahead depending on the driving speed. Through experiments in a simulator, we compared our proposed reward function with others proposed in previous works in terms of driving speed and training stability in reaching the goal. As a result of the experiment, our proposed reward function achieves an improvement of lap time by 0.71 seconds (3 %) with only a 4.4 % loss in stability in reaching a goal compared to the most stable reward function proposed in previous work.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127383373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Susheel Ujwal Siddamshetty, Srinivas Boppu, D. Ghosh
{"title":"Efficient Hardware Architecture for Posit Addition/Subtraction","authors":"Susheel Ujwal Siddamshetty, Srinivas Boppu, D. Ghosh","doi":"10.1109/MCSoC57363.2022.00068","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00068","url":null,"abstract":"This paper proposes an efficient architecture for the design of adder/subtractor for the recently developed universal posit number system. Posits are designed as a direct drop-in replacement for IEEE-754 standard floating-point numbers. They provide compelling advantages over floats, such as larger dynamic range, higher accuracy than the same bit width floats, bit-wise identical results across systems, no overflow or underflow, tapered accuracy, and simpler exception handling. The word size $(N)$ and exponent size $(ES)$ define a posit format. It includes a variable exponent, consisting of variable length regime-bits and exponent-bits with a maximum size of up to $ES$ bits. This also leads to a change in the size and position of the mantissa bits. These run-time variations in the length of the regime, exponent, and mantissa fields pose a challenge while designing arithmetic hardware units. Though a few adder/subtractors are proposed in the literature, they are not 100% accurate. However, the proposed design is efficient in performance metrics such as area, delay, and leakage power. Furthermore, our design is 100% accurate and, on average, 15 % more area efficient and 23 % more leakage-power efficient, while having a similar critical path delay, when compared to the recent designs proposed in the literature when synthesized using Cadence's 45 nm standard cell library.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130323948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight End-to-end Network for Wearing Mask Recognition on Low-resolution Images","authors":"Menglei Li, Hongbo Chen, Zixue Cheng","doi":"10.1109/MCSoC57363.2022.00016","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00016","url":null,"abstract":"In realistic scenarios, resolution is still one of the major problems in wearing mask recognition. Due to the large distances between surveillance cameras and human faces, facial images captured by low-power devices usually have low resolution and lead to poor recognition results. To address the above issue, we propose a lightweight end-to-end network to reconstruct Super-resolution (SR) images and achieve wearing mask recognition. Besides, to apply to challenging real applications, we combine hardware devices and software technology to simulate the recognition process of wearing masks in real scenarios. To demonstrate the effectiveness of the method, we comprehensively evaluate our proposed method by comparing it with state-of-the-art methods. The recognition accuracy using super-resolution is 98.44%, outperforming RepVGG-A2 (97.00%) and ResNet34 (93.75%). Moreover, experimental results show that the number of parameters and FLOPs in our recognition model is 9.34 million and 1.85 billion, respectively, both of which outperform traditional CNN methods (20 million+ parameters and 3 billion+ FLOPs). The performance of our recognition system is competitive with state-of-the-art methods in terms of low memory usage and computational complexity, showing that the system can be cost-effectively and widely applied in real-world environments and thus has potential applications in respiratory disease prevention.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133140041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shi Hui Chua, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey
{"title":"Systolic Array Based Convolutional Neural Network Inference on FPGA","authors":"Shi Hui Chua, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey","doi":"10.1109/MCSoC57363.2022.00029","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00029","url":null,"abstract":"Convolutional Neural Networks (CNNs) possess a particular edge over their predecessor, the Multi-Layer Perceptron (MLP). This is due to their weight-sharing feature, which allows a CNN to use fewer parameters for the same number of outputs as compared to the MLP. Systolic arrays capitalize on the weight-sharing property of CNNs to reuse data while performing convolutional operations, in order to reduce the power consumption from memory accesses. A kernel fitting systolic processing element array was designed with only positive multiplication to increase the throughput and power efficiency of the CNN accelerator, while using weight stationary dataflow to achieve data reuse in the systolic array. A cost-optimized lightweight solution is implemented through low-cost FPGA hardware so as to allow for greater accessibility. The CNN accelerator consumes 0.363 W power at 100 MHz operating frequency. A peak throughput of 10.98 GOps/s was achieved with peak performance density of 0.200 GOps/s/DSP and peak power efficiency of 30.26 GOps/s/W. Even with the added support for additional functions, the proposed design achieved up to 1.59x better power efficiency compared to other systolic implementations and up to 6.17x better power efficiency compared to non-systolic implementations.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131252527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-Based Prototype of a Quantum Annealing Simulator for Sparse Ising Model","authors":"H. M. Waidyasooriya, Yuta Ohma, M. Hariyama","doi":"10.1109/MCSoC57363.2022.00039","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00039","url":null,"abstract":"Quantum annealing (QA) is a probabilistic approximation method to find the global optimum of a combinatorial optimization problem. QA is done on quantum annealers such as D-Wave using quantum properties. Since the number of qubits in quantum annealers is limited, it is difficult to use them to solve large-scale real-world problems. Therefore, quantum annealing simulation on digital computers is necessary. In this paper, we discuss an FPGA-based quantum annealing simulator for sparse Ising models. Unlike a fully-connected Ising model, the number of connections among spins in a sparse model is limited. Highly sparse Ising models require a significantly lower amount of computation while allowing more parallel operations. On the other hand, sparsity and the connections among spins are not the same for different Ising models, and it is difficult to propose one specific accelerator architecture for all. We propose a method to automatically generate an application-specific accelerator architecture for a given sparse Ising model. The proposed accelerator fully exploits the parallelism to increase the processing speed. We designed an FPGA prototype of the proposed accelerator and confirmed the correct behavior. In the future, we expect to extend the proposed method to execute large quantum annealing simulations using multiple FPGAs.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116122286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Akihisa Kawabe, Ryuto Haga, Yoichi Tomioka, Y. Okuyama, Jungpil Shin
{"title":"Fake Image Detection Using An Ensemble of CNN Models Specialized For Individual Face Parts","authors":"Akihisa Kawabe, Ryuto Haga, Yoichi Tomioka, Y. Okuyama, Jungpil Shin","doi":"10.1109/MCSoC57363.2022.00021","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00021","url":null,"abstract":"With the rapid advance of deep learning technology, creating human face images with artificial intelligence (AI) is becoming easier. The generated images are approaching a quality that humans cannot distinguish from authentic ones. It is essential to realize an accurate method to detect such fake images to prevent their abuse. In this paper, we propose a fake image detection method using an ensemble of convolutional neural network (CNN) models that focus on deepfake detection of individual face parts. Our results show that a combination of deepfake detection based on different face parts is effective. This idea can be adopted for partially manipulated deepfake images/videos.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122603764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Implementation of an Automatic Color Equalization Algorithm for Real-time Image Enhancement","authors":"Xiang-Yu Chen, Yu-Hsiang Wang, Yao-Song Zhang, Yen-Jui Chen, Shiann-Rong Kuang","doi":"10.1109/MCSoC57363.2022.00036","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00036","url":null,"abstract":"The automatic color equalization (ACE) algorithm is an effective method for color image enhancement, but its computational complexity is extremely high. In this paper, we first modify the ACE algorithm to reduce the computational complexity and realization cost while maintaining good visual quality. Subsequently, an efficient VLSI architecture for the hardware-friendly ACE algorithm is proposed to meet the requirement of real-time image enhancement. FPGA (field-programmable gate array) implementation results show that the proposed architecture can operate at 120 MHz and achieve a throughput of 60 frames/s for 256×256 resolution images using about 1.15k and 1.78k of the FPGA's logic (LUT) and register resources, respectively. Compared with the existing design, the proposed architecture can achieve higher performance with fewer hardware resources and comparable visual quality.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131179326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Different Microarchitectures for Energy-Efficient RISC-V Cores","authors":"J. Kadomoto, H. Irie, S. Sakai","doi":"10.1109/MCSoC57363.2022.00022","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00022","url":null,"abstract":"The increase in Internet of Things (IoT) applications has triggered the development of energy-efficient embedded SoCs that can utilize limited energy sources. Relatively simple general-purpose processor cores are a vital component of SoCs, and optimizing the power consumption, performance, and area is a key issue in the design of SoCs. Therefore, this study quantitatively compared the power, performance, and area of several 32-bit RISC-V cores with different microarchitectures. The simulation evaluations were performed for each processor with different pipeline configurations, with and without a multiplier and divider. The benchmark execution performance of the processors in a register transfer level (RTL) design, as well as the estimated power consumption and area based on logic synthesis and place-and-route using various CMOS process technologies are presented. Based on the results, we provided a brief guideline for the selection of microarchitectures for energy-efficient embedded SoCs.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127046995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Albert Budi Christian, Yu-Hsuan Wu, Chih-Yu Lin, Lan-Da Van, Y. Tseng
{"title":"Radar and Camera Fusion for Object Forecasting in Driving Scenarios","authors":"Albert Budi Christian, Yu-Hsuan Wu, Chih-Yu Lin, Lan-Da Van, Y. Tseng","doi":"10.1109/MCSoC57363.2022.00026","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00026","url":null,"abstract":"In this paper, we propose a sensor fusion architecture that combines data collected by the camera and radars and utilizes radar velocity for road users' trajectory prediction in real-world driving scenarios. This architecture is multi-stage, following the detect-track-predict paradigm. In the detection stage, camera images and radar point clouds are used to detect objects in the vehicle's surroundings by adopting two object detection models. The detected objects are tracked by an online tracking method. We also design a radar association method to extract radar velocity for an object. In the prediction stage, we build a recurrent neural network to process an object's temporal sequence of positions and velocities and predict future trajectories. Experiments on the real-world autonomous driving nuScenes dataset show that the radar velocity mainly affects the center of the bounding box representing the position of an object and thus improves the prediction performance.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116827074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}