Dennis Pinto, Jose-María Arnau, Marc Riera, Josep-Llorenç Cruz, Antonio González
{"title":"Mixture-of-Rookies: Saving DNN computations by predicting ReLU outputs","authors":"Dennis Pinto, Jose-María Arnau, Marc Riera, Josep-Llorenç Cruz, Antonio González","doi":"10.1016/j.micpro.2024.105087","DOIUrl":"10.1016/j.micpro.2024.105087","url":null,"abstract":"<div><p>Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast number of computations and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be a zero or a positive number in order to skip the computation of those neurons that will likely output a zero. Our predictor, named <em>Mixture-of-Rookies</em>, combines two inexpensive components. The first one exploits the high linear correlation between binarized (1-bit) and full-precision (8-bit) dot products, whereas the second component clusters together neurons that tend to output zero at the same time. We propose a novel clustering scheme based on analysis of angles, as the sign of the dot product of two vectors depends on the cosine of the angle between them. We implement our hybrid zero-output predictor on top of a state-of-the-art DNN accelerator. 
Experimental results show that our scheme introduces a small area overhead of 5.3% while achieving a speedup of 1.2x and reducing energy consumption by 16.5% on average for a set of diverse DNNs.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105087"},"PeriodicalIF":1.9,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000826/pdfft?md5=f3e30ee4d950e1c93554e32d04ba1b80&pid=1-s2.0-S0141933124000826-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
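The first predictor component described in the abstract above can be illustrated with a minimal sketch. It assumes sign binarization (+1/-1) of both weights and inputs and a hypothetical `threshold` parameter; the function names are illustrative, and the paper's actual binarization, threshold calibration, and clustering component are not reproduced here.

```python
def sign_bit(x):
    """Binarize a value to +1 or -1 (assumed 1-bit encoding, not taken from the paper)."""
    return 1 if x >= 0 else -1


def binarized_dot(weights, inputs):
    """Cheap 1-bit dot product: sum of products of the sign bits."""
    return sum(sign_bit(w) * sign_bit(x) for w, x in zip(weights, inputs))


def predict_relu_zero(weights, inputs, threshold=0):
    """Predict a zero ReLU output when the 1-bit dot product is at or below
    a (hypothetical) threshold, so the full-precision dot product can be skipped."""
    return binarized_dot(weights, inputs) <= threshold


def relu_dot(weights, inputs):
    """Reference full-precision neuron: ReLU applied to the dot product."""
    return max(0, sum(w * x for w, x in zip(weights, inputs)))
```

A predictor of this kind can be wrong in both directions, so in practice the threshold trades skipped work against accuracy loss.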
Raimarius Delgado, Se Yeon Cho, Byoung Wook Choi
{"title":"PyIgH : A unified architecture of IgH EtherCAT Master based on Python considering hard real-time constraints","authors":"Raimarius Delgado , Se Yeon Cho , Byoung Wook Choi","doi":"10.1016/j.micpro.2024.105085","DOIUrl":"10.1016/j.micpro.2024.105085","url":null,"abstract":"<div><p>The increasing demand for rapid application development tools, especially those employing high-level languages such as Python, has underscored the importance of utilizing a wide array of popular libraries while addressing real-time constraints in distributed hardware systems. This paper introduces PyIgH, a unified architecture of an IgH EtherCAT master based on Python, specifically designed to satisfy hard real-time requirements in an EtherCAT network. Implemented as a Python module, PyIgH exposes the functionalities and capabilities of an open-source EtherCAT master, facilitating seamless configuration and control of EtherCAT slave devices within the Python runtime environment. Real-time adaptation of the POSIX library, encapsulated within Python, is also utilized to satisfy the timing requirements of EtherCAT. The feasibility of the proposed approach is verified by analyzing the real-time performance in terms of periodicity and in-controller delay of the EtherCAT control task with a 1 kHz cycle. Experimental results demonstrate that PyIgH is suitable for hard real-time applications and serves as a valid alternative to conventional low-level EtherCAT masters. 
Additionally, a practical application involving motion control of a six-axis collaborative robot showcases consistent performance of PyIgH within a real-time multi-tasking environment.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105085"},"PeriodicalIF":1.9,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full wireless goniometer design with activity recognition for upper and lower limb","authors":"Cemil Keskinoğlu , Ahmet Aydın","doi":"10.1016/j.micpro.2024.105086","DOIUrl":"10.1016/j.micpro.2024.105086","url":null,"abstract":"<div><p>People must move using their lower and upper extremities to complete their work. Depending on how frequently these extremities are used and on factors such as age, genetics, and body weight, their ability may decrease. The joints' range of motion (ROM) is measured to evaluate this decrease. Different systems, such as conventional goniometers, mobile phone applications, and sensor-based systems, can measure the ROM value. Still, it can be challenging to measure this parameter in different situations, such as training, moving activities, etc. A partially wireless goniometer and a companion 3D visualization and control GUI were developed in our previous study. However, it was difficult to mount on limbs at a distance, and it could not be used on both legs to measure the hip angles. Therefore, this study presents a fully wireless goniometer system that can simultaneously measure joint movements in real time and show them in a 3D model for the upper and lower extremities. The angle values required for the ROM were measured with two IMU sensors. Two ESP32s were used as microcontrollers in the system, and a fully wireless system was enabled by transferring data via ESP-NOW and Bluetooth. Thanks to ESP-NOW, the system has less latency compared to other protocols and can transmit data over longer distances. The developed system can also perform activity recognition, which is not available in other goniometers. 
The measurements of the system were compared with a conventional goniometer, and their results were found to be completely correlated <span><math><mrow><mo>(</mo><mrow><msub><mi>ρ</mi><mi>c</mi></msub><mo>=</mo><mn>1</mn></mrow><mo>)</mo></mrow></math></span>.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105086"},"PeriodicalIF":1.9,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141775424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrea Fernández Gallego, Miguel Jiménez Arribas, Iván Gamino del Río, Agustín Martínez Hellín, Manuel Prieto Mateo, Óscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez
{"title":"Count overflow and privilege mode filtering extension implementation on a RISC-V on-board processor","authors":"Andrea Fernández Gallego, Miguel Jiménez Arribas, Iván Gamino del Río, Agustín Martínez Hellín, Manuel Prieto Mateo, Óscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez","doi":"10.1016/j.micpro.2024.105084","DOIUrl":"10.1016/j.micpro.2024.105084","url":null,"abstract":"<div><p><em>RISC-V</em> is a computer architecture that has recently attracted considerable attention due to its advantageous qualities: it is an open instruction set, based on reduced and simple instructions. For this reason, it has become an appealing choice for a wide range of computing applications and has positioned itself as a disruptive force in a wide variety of fields, including those that involve the development of safety-critical software, as in the space sector. The ability to evaluate the activities performed within a processor is of paramount importance in this type of system to ensure the fulfillment of the requirements during space missions. The monitoring of these events inside the processor is managed by an instrument called the Hardware Performance Monitor (HPM). This work shows the implementation of the <em>Sscofpmf</em> extension of the HPM compliant with the <em>RISC-V</em> privileged specification. The paper details the redesign of the existing performance counters from a <em>RISC-V</em> baseline version previously implemented. A comparison of the two versions in terms of resource utilization and power consumption is also provided. As expected, the <em>Sscofpmf</em> extension version has a higher resource utilization. 
Nevertheless, the paper shows that the additional functionalities included in the system have been validated without any changes in the processor clock frequency, so the extension does not introduce any performance overhead.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105084"},"PeriodicalIF":1.9,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000796/pdfft?md5=db2cd71fd8fabeee87eb0b479d1b76cc&pid=1-s2.0-S0141933124000796-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141699792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
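To make the count-overflow and privilege-mode filtering described above concrete, the following is a small sketch of decoding the control bits that the Sscofpmf extension adds to an `mhpmevent` CSR value. The bit positions (OF at bit 63, then MINH, SINH, UINH, VSINH, VUINH down to bit 58) are stated here as an assumption to be checked against the RISC-V specification; the function names are hypothetical and unrelated to the paper's hardware design.

```python
# Assumed Sscofpmf bit layout of a 64-bit mhpmevent value (verify against the spec).
FIELDS = {"OF": 63, "MINH": 62, "SINH": 61, "UINH": 60, "VSINH": 59, "VUINH": 58}


def decode_mhpmevent(value):
    """Return the overflow flag and per-mode inhibit bits as booleans."""
    return {name: bool((value >> bit) & 1) for name, bit in FIELDS.items()}


def counting_enabled(value, mode):
    """True if the counter is allowed to increment in the given privilege mode.

    mode is one of "M", "S", "U", "VS", "VU" (hypothetical helper convention).
    """
    return not decode_mhpmevent(value)[mode + "INH"]
```

For example, a value with only MINH set would count events in every mode except machine mode.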
{"title":"Formal timing analysis of gate-level digital circuits using model checking","authors":"Qurat-ul Ain, Osman Hasan","doi":"10.1016/j.micpro.2024.105083","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105083","url":null,"abstract":"<div><p>Due to the continuous reduction in transistor sizes governed by Moore’s law, digital devices have become smaller and more complex, resulting in an enormous rise in delay variations. Therefore, there is a dire need for precise and rigorous timing analysis to overcome anomalies during the timing analysis. Timings of digital circuits can be verified using various simulation or static timing analysis (STA) based tools, but they provide estimated results due to their inherent non-exhaustive nature or report timing paths corresponding to non-existent functional paths, respectively. Formal verification provides complete and sound analysis results and has widely been used for the functional verification of digital circuits, but its application in the timing analysis domain is somewhat limited. We present a generic framework to perform formal timing analysis of digital circuits with the help of the Uppaal model checker. The given digital circuit, along with its timing parameters in the form of a state transition diagram, is modeled using timed automata in the Uppaal model checker. Timing delays are calculated from corresponding technology parameters, and Quartus Prime Pro is used to obtain the information about the circuits’ paths. In order to make the analysis scalable, we also propose a novel path partitioning technique and compare its results with complete path analysis and traditional STA. The formal model is verified with the help of properties to assess the timing characteristics, like time period of a clock, critical path, and propagation delay of the considered circuit. 
Modeling and verification of ISCAS-85 and ISCAS-89 benchmark circuits is presented for illustration purposes.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105083"},"PeriodicalIF":1.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a low-area hardware architecture to predict early signs of sudden cardiac arrests","authors":"Anusaka Gon, Atin Mukherjee","doi":"10.1016/j.micpro.2024.105082","DOIUrl":"10.1016/j.micpro.2024.105082","url":null,"abstract":"<div><p>Sudden cardiac arrest (SCA) results in an unexpected and untimely death within minutes, and its early prediction can alert cardiac patients to a timely medical diagnosis. To detect early symptoms of an SCA, the detection and classification of ventricular tachycardias (VT) are of utmost importance. In this work, a low-area yet highly accurate hardware architecture for VT classification is proposed based on the detection of premature ventricular contraction (PVC) beats. After pre-processing of the ECG signals using a wavelet-based pre-processing unit, a characteristics-matching algorithm is used to detect the PVC beats, and a low-complexity adaptive decision-based logic classifier is used to classify them into four types of VTs, namely monomorphic, polymorphic, non-sustained VT (NSVT), and sustained VT (SVT). FPGA verification of the hardware architecture for the VT classifier using the Nexys 4 DDR Artix-7 board utilizes 10.4 % of the total available resources and displays the type of VT and the number of PVCs detected to help in determining the severity of SCA and the need for medical attention. The ASIC implementation of the proposed PVC-based VT classification using the SCL 180 nm CMOS technology results in an area overhead of 0.02 mm<sup>2</sup> and a power consumption of 3.47 μW for a high accuracy rate of 98.2 %. 
When compared to the existing CA detection systems for wearable devices, the proposed one consumes the least area while achieving high detection rates.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105082"},"PeriodicalIF":1.9,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141414119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automated consistency management approach for a privacy-aware electric vehicle architecture","authors":"Jonathan Stancke, Christian Plappert, Lukas Jäger","doi":"10.1016/j.micpro.2024.105074","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105074","url":null,"abstract":"<div><p>Modern vehicles contain a number of highly connected embedded systems that generate, store, and process information and exchange it with their environment. Since a large part of this information is privacy-critical, privacy laws such as the GDPR of the European Union apply to it. In this work, we evaluate the privacy-criticality of exemplary data and data flows of the electric driving domain on a reference architecture. We categorize the ECUs of the architecture based on the criticality of the data they process and propose measures and technologies as building blocks that provide adequate privacy protection according to the requirements given by the GDPR.</p><p>To ensure that all requirements are met by the reference architecture, we propose a more principled solution that simplifies the mapping between an architecture and the measures. 
For this purpose, we propose an architecture description template in JSON and an algorithm for automated consistency checks that outputs the measures and the security extension needed per Electronic Control Unit (ECU) to comply with derived privacy requirements.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105074"},"PeriodicalIF":2.6,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000693/pdfft?md5=e4034fe6211d68785c24aa81ea2401f7&pid=1-s2.0-S0141933124000693-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141333389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving performance of simultaneous multithreading CPUs using autonomous control of speculative traces","authors":"Ryan F. Ortiz, Wei-Ming Lin","doi":"10.1016/j.micpro.2024.105073","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105073","url":null,"abstract":"<div><p>Simultaneous Multithreading (SMT) allows a processor to concurrently execute multiple independent threads while sharing certain data path components to reduce resource waste. Speculative execution allows these processors to take advantage of Instruction-Level Parallelism, but the penalty for a misspeculation includes wasted shared resources and wasted clock cycles. In this paper we show that an average of 13 % of instructions are flushed as a result of incorrect predictions. These flushed-out instructions could have potentially taken up shared resources which other non-speculative threads could have used. This paper proposes a technique that can dynamically adjust how many speculative instructions a thread can rename and decode, aiming to diminish the waste of the shared resources. Our simulation results show that, with the proposed technique, the average flushed-out instruction rate is reduced by 23 % and average throughput is improved by 13 %.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105073"},"PeriodicalIF":2.6,"publicationDate":"2024-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141242952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancements on IoT and AI applied to Pneumology","authors":"Enrico Cambiaso , Sara Narteni , Ilaria Baiardini , Fulvio Braido , Alessia Paglialonga , Maurizio Mongelli","doi":"10.1016/j.micpro.2024.105062","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105062","url":null,"abstract":"<div><p>The objective of this work is the design of a technological platform for remote monitoring of patients with Chronic Obstructive Pulmonary Disease (COPD). The concept of the framework is a breakthrough in the state of medical, scientific and technological art, aimed at engaging patients in the treatment plan and supporting interaction with healthcare professionals. The proposed platform is able to support a new paradigm for the management of patients with COPD, by integrating clinical data and parameters monitored in daily life using Artificial Intelligence algorithms. Therefore, the doctor is provided with a dynamic picture of the disease and its impact on lifestyle and vice versa, and can thus plan more personalized diagnostics, therapeutics, and social interventions. This strategy allows for a more effective organization of access to outpatient care and therefore a reduction of emergencies and hospitalizations because exacerbations of the disease can be better prevented and monitored. 
Hence, it can result in improvements in patients’ quality of life and lower costs for the healthcare system.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105062"},"PeriodicalIF":2.6,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000577/pdfft?md5=04b32d737cc9dd247636adf8505b415a&pid=1-s2.0-S0141933124000577-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141090483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A two stage pipeline architecture for hardware implementation of multi-level decomposition of 1-D framelet transform","authors":"Kasetty Praveen Kumar, Aniruddha Kanhe","doi":"10.1016/j.micpro.2024.105064","DOIUrl":"10.1016/j.micpro.2024.105064","url":null,"abstract":"<div><p>In this paper a two stage pipeline architecture for computation of multilevel decomposition of framelet transform is proposed. To handle the problem of perfect reconstruction, an area efficient symmetric extension router is used that duplicates the appropriate number of data samples of input signal at the boundary followed by reflection about the symmetry axis. In addition, to reduce the period and number of clock cycles required for computing the framelet transform, the inter-stage and intrastage pipeline of the computational units is maximized. The inter-stage pipelining is obtained by distributing the various levels of decomposition among the computational units of two stages, and a synchronization mechanism is adopted to reduce the total number of clock cycles. Similarly, the intrastage pipelining is achieved by using the pipeline registers such that the clock period is limited to the delay of multiplier and accumulator (MAC) circuit of the finite-impulse response (FIR) filter. To validate the feasibility and functionality of the proposed hardware architecture, the design is implemented on Artix7 XC7A100TCSG324-1 field-programmable gate array (FPGA) for the case of framelet transform with one low-pass and two high-pass filters. 
The proposed architecture is able to operate at a maximum clock frequency of 112 MHz.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105064"},"PeriodicalIF":2.6,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141138587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
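The symmetric extension step mentioned in the abstract above (duplicating boundary samples by reflection about the symmetry axis) can be sketched in software as follows. This is a minimal illustration assuming half-sample symmetric extension, where the edge sample itself is repeated; the abstract does not say which symmetric variant the hardware router uses, and `symmetric_extend` and `pad` are hypothetical names.

```python
def symmetric_extend(signal, pad):
    """Extend a 1-D signal by `pad` samples on each side using
    half-sample symmetric reflection (the edge sample is repeated)."""
    samples = list(signal)
    if not 0 <= pad <= len(samples):
        raise ValueError("pad must be between 0 and len(signal)")
    left = samples[:pad][::-1]                  # mirror of the first `pad` samples
    right = samples[len(samples) - pad:][::-1]  # mirror of the last `pad` samples
    return left + samples + right
```

Extending the input this way lets an FIR filter run over the boundaries without assuming zeros outside the signal; the required `pad` depends on the filter length.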