{"title":"Intelligent congestion control for NoC architecture in Gem5 simulator","authors":"Smriti Srivastava, M. Shaikh, G. Shivaneetha, Minal Moharir","doi":"10.1109/MCSoC57363.2022.00062","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00062","url":null,"abstract":"Congestion in a network significantly degrades the performance of an NoC because it substantially increases latency and power consumption. Machine learning techniques aid in designing routing methods that keep the network cognizant of the traffic status. This paper presents a congestion-aware Q-routing algorithm based on the Q-learning model of reinforcement learning. The proposed algorithm enhances the performance of an NoC under heavy traffic conditions by routing packets along a less congested path, thereby reducing congestion in the network. This is possible because Q-learning allows the network to keep track of local and non-local congestion by estimating Q-values. The Q-values guide a node in sending a data packet along an optimal path, thereby evading busy routes. Simulation on the gem5 simulator with uniform link latency in the network shows that Q-routing performs better in a high-load environment than the traditional XY and Odd-Even routing methods, with performance gains of 5.73% and 12.73%, respectively. With link latencies randomly assigned to create a practical, congestion-prone scenario, the proposed method outperformed both the XY and Odd-Even routing algorithms, with respective performance gains of 7.38% and 15.19%.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132131733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of Edge-cloud Cooperative CNN Inference on an IoT Platform","authors":"Yuan Wang, H. Shibamura, KuanYi Ng, Koji Inoue","doi":"10.1109/MCSoC57363.2022.00060","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00060","url":null,"abstract":"Since the Internet of Things (IoT) has become more widely used in various industrial situations, Artificial Intelligence (AI) programs, particularly Convolutional Neural Network (CNN) applications, are projected to be implemented on edge devices to meet high-accuracy and huge industry computing needs. Offloading computing-intensive workloads to the cloud is a promising solution for compact energy-constrained edge devices, but it tends to incur significant costs in total execution latency. For flexible and fine-grained offloading, this paper aims to design and implement an edge-cloud cooperative CNN inference framework on an IoT platform by targeting TensorFlow Lite. We have confirmed the implementation's feasibility and accuracy through the verification of implementing LeNet, AlexNet, and VGGNet. Intending to perform high-performance edge-cloud AI executions on the presented IoT platform, we evaluate the performance overhead (total execution latency) of the provided implementation and identify the current bottlenecks of the target platform for enhancing it.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115690788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm to Interconvert SQL and Procedural Visual Queries","authors":"Tomonori Suzuki, Y. Watanobe, Divij G. Singh","doi":"10.1109/MCSoC57363.2022.00048","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00048","url":null,"abstract":"In this paper, we propose an algorithm to convert SQL and procedural languages into each other. The algorithm converts features of SQL, a declarative programming language, that are not evaluated in top-to-bottom evaluation order, to be evaluated in top-to-bottom order. The algorithm also supports SQL-DML (SELECT, INSERT, UPDATE, DELETE). This helps students and inexperienced users who are learning SQL to understand SQL, and helps experienced users to understand nontrivial and difficult-to-understand SQL. It also introduces a system architecture for inter-conversion between SQL and procedural languages. This architecture allows the system to support a variety of RDBMS.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116070292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Image Sensor Fault Detection for Autonomous Vehicles","authors":"Yizhi Chen, Wenyao Zhu, Dejiu Chen, Zhonghai Lu","doi":"10.1109/MCSoC57363.2022.00028","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00028","url":null,"abstract":"Automated driving vehicles show great potential for the near-future market due to their high safety and convenience for drivers and passengers. The reliability of image sensors attracts considerable research interest, as many image sensors are used in autonomous vehicles. We propose an online image sensor fault detection method that detects faults by comparing the historical variances of normal pixels and defective pixels. For faulty pixels without uncertainty, with a detection window of more than 30 frames, we get 100% accuracy and 100% recall on realistic continuous traffic pictures from the KITTI data set. We also explore the influence of uncertainty in faulty pixel values from 0% to 25% and study different fixed thresholds and a dynamic threshold for judgments. A strict threshold of 0.1 has high accuracy (99.16%) but low recall (34.46%) at 15% uncertainty. A loose threshold of 0.3 has relatively high recall (83.78%) but misclassifies too many normal pixels, with 18.17% accuracy at 15% uncertainty. Our dynamic threshold balances accuracy and recall. It gets 100% accuracy and 58.69% recall at 5% uncertainty and 78.38% accuracy and 55.39% recall at 15% uncertainty. Based on the detected damaged-pixel rate, we develop a health score for evaluating the image sensor system intuitively. It can also be helpful for making decisions about replacing cameras.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129503872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Movie Oriented Positive Negative Emotion Classification from EEG Signal using Wavelet transformation and Machine learning Approaches","authors":"Abu Saleh Musa Miah, Jungpil Shin, Md. Al Mehedi Hasan, M. I. Molla, Y. Okuyama, Yoichi Tomioka","doi":"10.1109/MCSoC57363.2022.00014","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00014","url":null,"abstract":"Electroencephalography (EEG) sensors play an important role in developing brain-computer interfaces (BCI) to enhance human-computer interaction (HCI). Nowadays, various types of research work are performed to develop EEG-based HCI systems for controlling and monitoring systems. However, researchers still face challenges in developing such systems due to noise from physiological, internal, and external artefacts. This study proposes a method to find useful electrodes and extract potential information from the brain nerves for the classification of positive or negative emotions. The emotion EEG signals were recorded using 14 electrodes from 30 younger people. Two movies were used for positive and negative emotions. In the proposed method, we first extracted the five-band wavelet transform from the EEG and then calculated the standard deviation (SD), average power (AVP) and mean absolute value (MAV) of the five-band wavelet information. Finally, we applied an extra tree classifier (ETC), random forest (RF), and support vector machine (SVM) to classify the emotion based on the feature vector. Among the three classifiers, ETC achieved higher accuracy on the F3, FC5, T8, FC6, F8, and AF4 electrodes. This indicates that the F3, FC5, T8, FC6, F8, and AF4 electrodes carry potential information for positive-negative emotion classification.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116442959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Performance Asynchronous CNN Accelerator with Early Termination","authors":"Tan Rong Loo, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey","doi":"10.1109/MCSoC57363.2022.00031","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00031","url":null,"abstract":"Convolutional Neural Networks (CNNs), especially very deep networks, are highly computation-intensive, resulting in long delays and high power consumption. The dynamically varying environmental conditions in real-world inference can result in highly complex problems, hence the need for these inefficient deep networks to guarantee satisfactory accuracy at all times. Several studies employ approximation techniques to execute partial computation of the network, in an attempt to reduce computation where it is unnecessary. However, such approaches are still highly sequential in nature, since they still need to run the whole network. This paper proposes an early termination architecture on an already-trained CNN that allows the partial results to be tested midway through the network, reducing computation by terminating the main network when the partial results suffice as the inference result. The first proposal is implemented as a synchronous circuit; however, due to its nature, all memory elements are required to capture data even when no new data is generated. The second proposal uses an asynchronous circuit to significantly reduce power consumption and further speed up the architecture, since an operation need not wait for the slowest critical path in the circuit. The proposed circuits were designed on an FPGA platform. The results of the asynchronous circuit show a nearly 20% increase in speed with about a 12% reduction in power consumption in comparison with the synchronous circuit.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132616390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using scheduling entropy amplification in CUDA/OpenMP code to exhibit non-reproducibility issues","authors":"D. Defour","doi":"10.1109/MCSoC57363.2022.00040","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00040","url":null,"abstract":"Rounding error or cancellation that appears with each floating-point operation, combined with the lack of control over execution order in parallel code, leads to numerical issues such as a lack of numerical reproducibility. To increase the chance of discovering such numerical issues, in this article we propose a simple solution based on an index interposer and an index scrambler to amplify the possible combinations of execution order.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133634664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Critical Signature Assertion and On-the-Fly Recovery for Control Flow Errors in Processors","authors":"Ing-Jer Huang, Yi-Ju Ke, Shih-Jung Pao","doi":"10.1109/MCSoC57363.2022.00052","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00052","url":null,"abstract":"This paper presents a highly effective hybrid control flow error (CFE) detection and recovery mechanism for fault-tolerant instruction set processors. The mechanism consists of two innovations: critical signature assertion (CSA) and on-the-fly recovery (OTFR). The proposed mechanism is evaluated on a commercial 32-bit microcontroller core, Andes N801s. Compared with related work, our approach achieves up to 75% and 221% lower memory size and performance overheads, respectively, and reduces the error correction latency by up to 54%, at the reasonable cost of 3470 gates (+19%), 967 uW (+17%) of power, and a sacrifice of merely 0.3% in fault coverage.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117016547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and High-Performance Sparse Matrix-Vector Multiplication on a Many-Core Array","authors":"Peiyao Shi, Aaron Stillmaker, B. Baas","doi":"10.1109/MCSoC57363.2022.00038","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00038","url":null,"abstract":"Sparse matrix-vector multiplication (SpMV) is a critical operation in scientific computing, engineering, and other applications. Eight functionally-equivalent SpMV implementations are created for a fine-grained many-core platform with independent shared memory modules. These implementations are compared with a general-purpose processor (Intel Core-i7 3720QM) and a graphics processing unit (GPU, NVIDIA Quadro 620) and results are scaled to 32 nm CMOS. The performance (throughput per chip area) for all three platforms is compared when operating on a set of seven unstructured sparse matrices of varying dimensions up to 3.6 billion elements. The many-core implementations show a 54× greater performance than the general-purpose processor, and 40× greater performance than the GPU.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114297881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Realization of IO Physical Memory Protection for RISC-V Systems","authors":"Jien Hau Ng, Chee Hong Ang, Hwa Chaw Law","doi":"10.1109/MCSoC57363.2022.00066","DOIUrl":"https://doi.org/10.1109/MCSoC57363.2022.00066","url":null,"abstract":"Physical memories or RAMs are essential components in a computer system to hold temporary information required for both software and hardware to work properly. When a system's security is compromised (e.g., due to a malicious application), sensitive information held in the memories can be leaked, for example, to “the cloud”. The RISC-V privileged architecture standard adopts a method called Physical Memory Protection (PMP) to segregate a system's memory into regions with different policies and permissions to prevent unprivileged software from accessing unauthorized regions. However, PMP does not prevent malicious software from hijacking an Input/Output (IO) device with Direct Memory Access (DMA) capability to indirectly gain unauthorized access; hence, a similar method commonly termed “IOPMP” is being worked on in the RISC-V community. This paper describes an early implementation of IOPMP and how it is used to protect physical memory regions in a RISC-V system. Then, the potential performance impact of IOPMP is briefly elaborated. There is still work to be done, and this early IOPMP implementation allows various aspects of the protection method, such as its scalability, practicality, and effectiveness, to be studied for future enhancement.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125517505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}