{"title":"NvMISC: Toward an FPGA-Based Emulation Platform for RISC-V and Nonvolatile Memories","authors":"Yuankang Zhao;Salim Ullah;Siva Satyendra Sahoo;Akash Kumar","doi":"10.1109/LES.2023.3299202","DOIUrl":"10.1109/LES.2023.3299202","url":null,"abstract":"The emerging nonvolatile memories (NVMs), such as spin transfer torque random access memory (STT-RAM) and racetrack memory (RTM), offer a promising solution to satisfy the memory and performance requirements of modern applications. Compared to the commonly utilized volatile static random-access memories (SRAMs), the NVMs provide better capacity and energy efficiency. However, many of these NVMs are still in the development phases and require proper evaluation to assess the impact of their use at the system level. Therefore, there is a need to design functional- and cycle-accurate simulators/emulators to evaluate the performance of these memory technologies. To this end, this work focuses on implementing a RISC-V-based emulation platform for evaluating NVMs. The proposed framework provides interfaces to integrate various types of NVMs, with RTMs and STT-RAMs used as test cases. The efficacy of the framework is evaluated by executing benchmark applications.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"170-173"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135699685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-Multiplication Deterministic Hyperdimensional Encoding for Resource-Constrained Devices","authors":"Mehran Shoushtari Moghadam;Sercan Aygun;M. Hassan Najafi","doi":"10.1109/LES.2023.3298732","DOIUrl":"10.1109/LES.2023.3298732","url":null,"abstract":"Hyperdimensional vector processing is a nascent computing approach that mimics the brain structure and offers lightweight, robust, and efficient hardware solutions for different learning and cognitive tasks. For image recognition and classification, hyperdimensional computing (HDC) utilizes the intensity values of captured images and the positions of image pixels. Traditional HDC systems represent the intensity and positions with binary hypervectors of 1K–10K dimensions. The intensity hypervectors are cross-correlated for closer values and uncorrelated for distant values in the intensity range. The position hypervectors are pseudo-random binary vectors generated iteratively for the best classification performance. In this study, we propose a radically new approach for encoding image data in HDC systems. Position hypervectors are no longer needed by encoding pixel intensities using a deterministic approach based on quasi-random sequences. The proposed approach significantly reduces the number of operations by eliminating the position hypervectors and the multiplication operations in the HDC system. Additionally, we suggest a hybrid technique for generating hypervectors by combining two deterministic sequences, achieving higher classification accuracy. 
Our experimental results show up to $102\\times$ reduction in runtime and significant memory-usage savings with improved accuracy compared to a baseline HDC system with conventional hypervector encoding.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"210-213"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135699687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approximate Parallel Annealing Ising Machine for Solving Traveling Salesman Problems","authors":"Qichao Tao;Tingting Zhang;Jie Han","doi":"10.1109/LES.2023.3298739","DOIUrl":"10.1109/LES.2023.3298739","url":null,"abstract":"Annealing-based Ising machines have emerged as high-performance solvers for combinatorial optimization problems (COPs). As a typical COP with constraints imposed on the solution, traveling salesman problems (TSPs) are difficult to solve using conventional methods. To address this challenge, we design an approximate parallel annealing Ising machine (APAIM) based on an improved parallel annealing algorithm. In this design, adders are reused in the local field accumulator units (LAUs) with half-precision floating-point representation of the coefficients in the Ising model. The momentum scaling factor is approximated by a linear, incremental function to save hardware. To improve the solution quality, a buffer-based energy calculation unit selects the best solution among the found candidate results in multiple iterations. Finally, approximate adders are applied in the design for improving the speed of accumulation in the LAUs. The design and synthesis of a 64-spin APAIM show the potential of this methodology in efficiently solving complicated constrained COPs.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"226-229"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135699989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DynaFuse: Dynamic Fusion for Resource Efficient Multimodal Machine Learning Inference","authors":"Hamidreza Alikhani;Anil Kanduri;Pasi Liljeberg;Amir M. Rahmani;Nikil Dutt","doi":"10.1109/LES.2023.3298738","DOIUrl":"10.1109/LES.2023.3298738","url":null,"abstract":"Multimodal machine learning (MMML) applications combine results from different modalities in the inference phase to improve prediction accuracy. Existing MMML fusion strategies use static modality weight assignment, based on the intrinsic value of sensor modalities determined during the training phase. However, input data perturbations in practical scenarios affect the intrinsic value of modalities in the inference phase, lowering prediction accuracy, and draining computational and energy resources. In this letter, we present dynamic fusion (DynaFuse), a framework for dynamic and adaptive fusion of MMML inference to set modality weights, considering run-time parameters of input data quality and sensor energy budgets. We determine the insightfulness of modalities by combining the design-time intrinsic value with the run-time extrinsic value of different modalities to assign updated modality weights, catering to both accuracy requirements and energy conservation demands. 
The DynaFuse approach achieves up to 22% gain in prediction accuracy and an average energy savings of 34% on exemplary MMML applications of human activity recognition and stress monitoring in comparison with state-of-the-art static fusion approaches.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"222-225"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10261977","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135700000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Should We Even Optimize for Execution Energy? Rethinking Mapping for MAGIC Design Style","authors":"Simranjeet Singh;Chandan Kumar Jha;Ankit Bende;Phrangboklang Lyngton Thangkhiew;Vikas Rana;Sachin Patkar;Rolf Drechsler;Farhad Merchant","doi":"10.1109/LES.2023.3298740","DOIUrl":"10.1109/LES.2023.3298740","url":null,"abstract":"Memristor-based logic-in-memory (LiM) has become popular as a means to overcome the von Neumann bottleneck in traditional data-intensive computing. Recently, the memristor-aided logic (MAGIC) design style has gained immense traction for LiM due to its simplicity. However, understanding the energy distribution during the design of logic operations within the memristive memory is crucial in assessing such an implementation’s significance. The current energy estimation methods rely on coarse-grained techniques, which underestimate the energy consumption of MAGIC-styled operations performed on a memristor crossbar. To address this issue, we analyze the energy breakdown in MAGIC operations and propose a solution that utilizes mapping from the SIMPLER MAGIC tool to achieve accurate energy estimation through SPICE simulations. In contrast to existing research that primarily focuses on optimizing execution energy, our findings reveal that the memristor’s initialization energy in the MAGIC design style is, on average, $68\\times$ higher. We demonstrate that this initialization energy significantly dominates the overall energy consumption. 
By highlighting this aspect, we aim to redirect the attention of designers toward developing algorithms and strategies that prioritize optimizations in initializations rather than execution for more effective energy savings.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"230-233"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135699991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Partial Weight Update Techniques for Lightweight On-Device Learning on Tiny Flash-Embedded MCUs","authors":"Jisu Kwon;Daejin Park","doi":"10.1109/LES.2023.3298731","DOIUrl":"10.1109/LES.2023.3298731","url":null,"abstract":"Typical training procedures involve read and write operations for weight updates during backpropagation. However, on-device training on microcontroller units (MCUs) presents two challenges. First, the on-chip SRAM has insufficient capacity to store the weight. Second, the large flash memory, which has a constraint on write access, becomes necessary to accommodate the network for on-device training on MCUs. To tackle these memory constraints, we propose a partial weight update technique based on gradient delta computation. The weights are stored in flash memory, and a part of the weight to be updated is selectively copied to the SRAM from the flash memory. We implemented this approach for training a fully connected network on an on-device MNIST digit classification task using only 20-kB SRAM and 1912-kB flash memory on an MCU. The proposed technique achieves reasonable accuracy with only 18.52% partial weight updates, which is comparable to state-of-the-art results. 
Furthermore, we achieved a reduction of up to 46.9% in the area-power-delay product compared to a commercially available high-performance MCU capable of embedding the entire model parameter, taking into account the area scale factor.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"206-209"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135699689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOCoCAT: Low-Overhead Classification of CAN Bus Attack Types","authors":"Caio Batista de Melo;Nikil Dutt","doi":"10.1109/LES.2023.3299217","DOIUrl":"10.1109/LES.2023.3299217","url":null,"abstract":"Although research has shown vulnerabilities and shortcomings of the controller area network bus (CAN bus) and proposed alternatives, the CAN bus protocol is still the industry standard and present in most vehicles. Due to its vulnerability to potential intruders that can hinder execution or even take control of the vehicles, much work has focused on detecting intrusions on the CAN bus. However, most literature does not provide mechanisms to reason about, or respond to the attacks so that the system can continue to execute safely despite the intruder. This letter proposes a low-overhead methodology to automatically classify intrusions into predefined types once detected. Our framework: 1) groups messages of the same attacks into blocks; 2) extracts relevant features from each block; and 3) predicts the type of attack using a lightweight classifier model. The initial models depicted in this letter show an accuracy of up to 99.16% within the first 50 ms of the attack, allowing the system to quickly react to the intrusion before the malicious actor can conclude their attack. 
We believe this letter lays the groundwork for vehicles to have specialized runtime reactions based on the attack type.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"178-181"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10261979","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135699992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CNN Workloads Characterization and Integrated CPU–GPU DVFS Governors on Embedded Systems","authors":"Meruyert Karzhaubayeva;Aidar Amangeldi;Jurn-Gyu Park","doi":"10.1109/LES.2023.3299335","DOIUrl":"10.1109/LES.2023.3299335","url":null,"abstract":"Dynamic power management (DPM) techniques on mobile systems are indispensable for deep learning (DL) inference optimization, which is mainly performed on battery-based mobile and/or embedded platforms with constrained resources. To this end, we characterize CNN workloads using object detection applications of YOLOv4/-tiny and YOLOv3/-tiny, and then propose integrated CPU–GPU DVFS governor policies that scale integrated pairs of CPU and GPU frequencies to improve energy–delay product (EDP) with negligible inference execution time degradation. Our results show up to 16.7% EDP improvements with negligible (mostly less than 2%) performance degradation using object detection applications on NVIDIA Jetson TX2.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"202-205"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135700345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Swift-CNN: Leveraging PCM Memory’s Fast Write Mode to Accelerate CNNs","authors":"Lokesh Siddhu;Hassan Nassar;Lars Bauer;Christian Hakert;Nils Hölscher;Jian-Jia Chen;Joerg Henkel","doi":"10.1109/LES.2023.3298742","DOIUrl":"10.1109/LES.2023.3298742","url":null,"abstract":"Nonvolatile memories [especially phase change memories (PCMs)] offer scalability and higher density. However, reduced write performance has limited their use as main memory. Researchers have explored using the fast write mode available in PCM to alleviate the challenges. The fast write mode offers lower write latency and energy consumption. However, the fast-written data are retained for a limited time and need to be refreshed. Prior works perform fast writes when the memory is busy and use slow writes to refresh the data during memory idle phases. Such policies do not consider the retention time requirement of a variable and repeat all the writes made during the busy phase. In this work, we suggest a retention-time-aware selection of write modes. As a case study, we use convolutional neural networks (CNNs) and present a novel algorithm, Swift-CNN, that assesses each CNN layer’s memory access behavior and retention time requirement and suggests an appropriate PCM write mode. 
Our results show that Swift-CNN decreases inference and training execution time and memory energy compared to state-of-the-art techniques and achieves execution time close to the ideal (fast write-only) policy.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"234-237"},"PeriodicalIF":1.6,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135700343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Automating FPGA Design Build Flow Using GitLab CI","authors":"Chimezie Eguzo;Benedikt Scherer;Daniel Keßel;Ilja Bekman;Matthias Streun;Mario Schlosser;Stefan van Waasen","doi":"10.1109/LES.2023.3314148","DOIUrl":"10.1109/LES.2023.3314148","url":null,"abstract":"Building and testing software for embedded systems can be challenging, with an impact on delivery time, design reproducibility, and collaboration among project contributors. To accelerate project development, presented here is an automated build flow that utilizes Xilinx PetaLinux and a field-programmable gate array (FPGA) hardware description, and integrates with the GitLab continuous integration and continuous deployment (CI/CD) framework for embedded targets. This build flow automates the complete process of FPGA implementation, PetaLinux configuration, and cross-compilation of software essentials for the target system-on-chip (SoC). The system has been successfully deployed in cross-compiling the control and command toolset for the Positron Emission Tomography scanner (PhenoPET) and the implementation of the message queuing telemetry transport (MQTT) service on a Xilinx Zynq Ultrascale MPSoC. This approach can be easily adapted to other projects with specific requirements.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 2","pages":"227-230"},"PeriodicalIF":1.6,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135400128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}