{"title":"Formal timing analysis of gate-level digital circuits using model checking","authors":"Qurat-ul Ain, Osman Hasan","doi":"10.1016/j.micpro.2024.105083","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105083","url":null,"abstract":"<div><p>Due to the continuous reduction in the transistors sizing ruled by the Moore’s law, digital devices have become smaller, and more complex resulting in an enormous rise in the delay variations. Therefore, there is a dire need of precise and rigorous timing analysis to overcome anomalies during the timing analysis. Timings of digital circuits can be verified using various simulation or static timing analysis (STA) based tools but they provide estimated results due to their inherent in-exhaustive nature or report timing paths corresponding to non-existent functional paths, respectively. Formal verification provides complete and sound analysis results and has widely been used for the functional verification of digital circuits but its application in the timing analysis domain is somewhat limited. We present a generic framework to perform formal timing analysis of digital circuits with the help of Uppaal model-checker. The given digital circuit along with its timing parameters in the form of state transition diagram are modeled using timed automata in the Uppaal model checker. Timing delays are calculated from corresponding technology parameters, and Quartus Prime Pro is used to obtain the information about the circuits’ paths. In order to make the analysis scalable, we also propose a novel path partitioning technique and compare its results with complete path analysis and traditional STA. The formal model is verified with the help of properties to assess the timing characteristics, like time period of a clock, critical path, and propagation delay of the considered circuit. Modeling and verification of ISCAS-85 and ISCAS-89 benchmark circuits is presented for illustration purposes.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105083"},"PeriodicalIF":1.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a low-area hardware architecture to predict early signs of sudden cardiac arrests","authors":"Anusaka Gon, Atin Mukherjee","doi":"10.1016/j.micpro.2024.105082","DOIUrl":"10.1016/j.micpro.2024.105082","url":null,"abstract":"<div><p>Sudden cardiac arrest (SCA) results in an unexpected and untimely death within minutes, and its early prediction can alert cardiac patients to a timely medical diagnosis. To detect early symptoms of an SCA, the detection and classification of ventricular tachycardias (VT) are of utmost importance. In this work, a low-area yet highly accurate hardware architecture for VT classification is proposed based on the detection of premature ventricular contraction (PVC) beats. After pre-processing of the ECG signals using a wavelet-based pre-processing unit, a characteristics-matching algorithm is used to detect the PVC beats, and a low-complexity adaptive decision-based logic classifier is used to classify them into four types of VTs, namely monomorphic, polymorphic, non-sustained VT (NSVT), and sustained VT (SVT). FPGA verification of the hardware architecture for the VT classifier using the Nexys 4 DDR Artix-7 board utilizes 10.4 % of the total available resources and displays the type of VT and the number of PVCs detected to help in determining the severity of SCA and the need for medical attention. The ASIC implementation of the proposed PVC-based VT classification using the SCL 180 nm CMOS technology results in an area overhead of 0.02 mm<sup>2</sup> and a power consumption of 3.47 μW for a high accuracy rate of 98.2 %. When compared to the existing CA detection systems for wearable devices, the proposed one consumes the least area while achieving high detection rates.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105082"},"PeriodicalIF":1.9,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141414119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automated consistency management approach for a privacy-aware electric vehicle architecture","authors":"Jonathan Stancke, Christian Plappert, Lukas Jäger","doi":"10.1016/j.micpro.2024.105074","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105074","url":null,"abstract":"<div><p>Modern vehicles contain a number of highly connected embedded systems that generate, store, and process information and exchange it with their environment. Since a large part of this information is privacy-critical, privacy laws such as the GDPR of the European Union apply to it. In this work, we evaluate the privacy-criticality of exemplary data and data flows of the electric driving domain on a reference architecture. We categorize the ECUs of the architecture based on the criticality of the data they process and propose measures and technologies as building blocks that provide adequate privacy protection according to the requirements given by the GDPR.</p><p>To ensure that all requirements are met by the reference architecture, we propose a more principled solution that simplifies the mapping between an architecture and the measures. For this purpose, we propose an architecture description template in JSON and an algorithm for automated consistency checks that outputs the measures and the security extension needed per Electronic Control Unit (ECU) to comply with derived privacy requirements.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"109 ","pages":"Article 105074"},"PeriodicalIF":2.6,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000693/pdfft?md5=e4034fe6211d68785c24aa81ea2401f7&pid=1-s2.0-S0141933124000693-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141333389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving performance of simultaneous multithreading CPUs using autonomous control of speculative traces","authors":"Ryan F. Ortiz, Wei-Ming Lin","doi":"10.1016/j.micpro.2024.105073","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105073","url":null,"abstract":"<div><p>Simultaneous Multithreading (SMT) allows for a processor to concurrently execute multiple independent threads while sharing certain data path components to optimize resource waste. Speculative execution allows for these processors to take advantage of Instruction-Level Parallelism but the penalty for a miss speculation includes the wasting of resources amongst these shared resources where clock cycles are wasted at a time. In this paper we show that an average of 13 % of instructions are flushed as a result of incorrect predictions. These flushed out instructions could have potentially taken up shared resources which other non-speculative threads could have used. This paper proposes a technique that can dynamically adjust how many speculative instructions a thread can rename and decode aiming to diminish the waste of the shared resources. Our simulation results show, with the proposed technique, that the average flushed out instruction rate is reduced by 23 % and average throughput is improved by 13 %.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105073"},"PeriodicalIF":2.6,"publicationDate":"2024-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141242952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancements on IoT and AI applied to Pneumology","authors":"Enrico Cambiaso , Sara Narteni , Ilaria Baiardini , Fulvio Braido , Alessia Paglialonga , Maurizio Mongelli","doi":"10.1016/j.micpro.2024.105062","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105062","url":null,"abstract":"<div><p>The objective of this work is the design of a technological platform for remote monitoring of patients with Chronic Obstructive Pulmonary Disease (COPD). The concept of the framework is a breakthrough in the state of medical, scientific and technological art, aimed at engaging patients in the treatment plan and supporting interaction with healthcare professionals. The proposed platform is able to support a new paradigm for the management of patients with COPD, by integrating clinical data and parameters monitored in daily life using Artificial Intelligence algorithms. Therefore, the doctor is provided with a dynamic picture of the disease and its impact on lifestyle and vice versa, and can thus plan more personalized diagnostics, therapeutics, and social interventions. This strategy allows for a more effective organization of access to outpatient care and therefore a reduction of emergencies and hospitalizations because exacerbations of the disease can be better prevented and monitored. Hence, it can result in improvements in patients’ quality of life and lower costs for the healthcare system.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105062"},"PeriodicalIF":2.6,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000577/pdfft?md5=04b32d737cc9dd247636adf8505b415a&pid=1-s2.0-S0141933124000577-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141090483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A two stage pipeline architecture for hardware implementation of multi-level decomposition of 1-D framelet transform","authors":"Kasetty Praveen Kumar, Aniruddha Kanhe","doi":"10.1016/j.micpro.2024.105064","DOIUrl":"10.1016/j.micpro.2024.105064","url":null,"abstract":"<div><p>In this paper a two stage pipeline architecture for computation of multilevel decomposition of framelet transform is proposed. To handle the problem of perfect reconstruction, an area efficient symmetric extension router is used that duplicates the appropriate number of data samples of input signal at the boundary followed by reflection about the symmetry axis. In addition, to reduce the period and number of clock cycles required for computing the framelet transform, the inter-stage and intrastage pipeline of the computational units is maximized. The inter-stage pipelining is obtained by distributing the various levels of decomposition among the computational units of two stages, and a synchronization mechanism is adopted to reduce the total number of clock cycles. Similarly, the intrastage pipelining is achieved by using the pipeline registers such that the clock period is limited to the delay of multiplier and accumulator (MAC) circuit of the finite-impulse response (FIR) filter. To validate the feasibility and functionality of the proposed hardware architecture, the design is implemented on Artix7 XC7A100TCSG324-1 field-programmable gate array (FPGA) for the case of framelet transform with one low-pass and two high-pass filters. The proposed architecture is able to operate at a maximum clock frequency of 112 MHz.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105064"},"PeriodicalIF":2.6,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141138587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iman Firmansyah , Yoshiki Yamaguchi , Tsutomu Maruyama , Yuta Matsuura , Zhang Heming , Shin Kawai , Hajime Nobuhara
{"title":"FPGA-based stereo matching for crop height measurement using monocular camera","authors":"Iman Firmansyah , Yoshiki Yamaguchi , Tsutomu Maruyama , Yuta Matsuura , Zhang Heming , Shin Kawai , Hajime Nobuhara","doi":"10.1016/j.micpro.2024.105063","DOIUrl":"10.1016/j.micpro.2024.105063","url":null,"abstract":"<div><p>We have proposed a hardware-accelerated drone to analyze the condition of farmland right then and there; as a first step, we report that the proposed system can take crop height measurements with high accuracy using a monocular camera. The proposed three-dimensional farmland is generated using stereo matching, where a drone with a monocular camera can extend the parallax distance as the length between two positions when taking a ground image. This means that our approach can improve the accuracy of a reconstructed 3D farmland. In addition, toward real-time computation and low power consumption, the proposed hardware design accelerates image processing efficiently. Thus, to achieve this, we propose a strategy that combines the semi-global matching (SGM) with single path direction and a sum of absolute difference (SAD) with reduced disparity searching length. For example, a semi-global matching (SGM) was employed to smooth the disparity map result before checking the consistency, where the scan line was performed in one direction, from left to right, to speed up the computation time. The experimental result shows that the computation time performed by Xilinx Zynq ZCU102 FPGA achieves 0.77 s for the stereo data set images with 1536 × 1024 pixels resolution. To meet the real-time application and reduce the FPGA resources toward lower power consumption, the experiment discusses reducing the disparity searching length for the SAD computation. In our experiment, the execution time is less than 40 milliseconds, and the circuit volume is around 9,500 LUTs, equivalent to a small-size FPGA. Finally, we also estimated the object's height; a value of 0.43 m was estimated for the object with a physical height of 0.45 m. Meanwhile, for the object with a physical height of 0.65 m, a value of 0.63 m was estimated.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105063"},"PeriodicalIF":2.6,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141053941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpSAVE: Eviction Based Scheme for Efficient Optical Network-on-Chip","authors":"Uzmat Ul Nisa, Janibul Bashir","doi":"10.1016/j.micpro.2024.105061","DOIUrl":"10.1016/j.micpro.2024.105061","url":null,"abstract":"<div><p>For on-chip networks, nanophotonics has been considered a strong alternative owing to its high speed (due to low latency) and high bandwidth (due to wavelength division multiplexing). However, the major hurdle in the adoption of nanophotonic-based on-chip networks is their high static power consumption. Various proposals are there in the literature which try to reduce the static power consumption either by modulating the laser or by allowing the on-chip stations to share the photonic channels. In this paper, we propose <em>OpSAVE</em>— an optical NoC that combines the above two strategies to effectively reduce static power consumption. It proposes a superior prediction mechanism based on the eviction details from the private caches. It explains how shared channels can be used to dynamically balance the load and at the same time handle mispredictions. It allows the optical stations to share both the power and the available bandwidth to increase their utilization. Moreover, <em>OpSAVE</em> proposes to use a double pumping strategy to improve the system performance. We compared our scheme with the state-of-the-art proposals in this domain and the results show that our scheme consumes 4.4X less optical power and at the same time improves the performance by nearly 28%. In the evaluation, we have considered the multicore benchmarks from the Splash and Parsec benchmark suites.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108 ","pages":"Article 105061"},"PeriodicalIF":2.6,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141052214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low latency FPGA implementation of NTT for Kyber","authors":"Mohamed Saoudi, Akram Kermiche, Omar Hocine Benhaddad, Nadir Guetmi, Boufeldja Allailou","doi":"10.1016/j.micpro.2024.105059","DOIUrl":"10.1016/j.micpro.2024.105059","url":null,"abstract":"<div><p>This paper presents an FPGA implementation of Number Theoretic Transform (NTT) for the Kyber Post-Quantum Cryptographic (PQC) standard. NTT is the slowest process within Kyber thus a large number of efforts has been conducted to enhance its computational efficiency. Leveraging parallelism and dedicated multipliers, our design achieves state-of-the-art latency, performing NTT/INTT in just 0.4/<span><math><mrow><mn>0</mn><mo>.</mo><mn>5</mn><mspace></mspace><mi>μ</mi><mi>s</mi></mrow></math></span>, surpassing existing designs by at least 3.75/3 times. The proposed design is implemented on the cost-effective Artix-7 FPGA.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"107 ","pages":"Article 105059"},"PeriodicalIF":2.6,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140792938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ExTern: Boosting RISC-V core performance using ternary encoding","authors":"Farhad EbrahimiAzandaryani, Dietmar Fey","doi":"10.1016/j.micpro.2024.105058","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105058","url":null,"abstract":"<div><p>This paper presents an effective <span><math><mi>μ</mi></math></span>-architectural design method, called ExTern, to enhance the performance of a RISC-V processor experiencing computation bottlenecks. ExTern involves integrating Canonical Signed Digit (CSD) representation, a ternary number system enabling carry/borrow-free addition/subtraction in constant time <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mn>1</mn><mo>)</mo></mrow></mrow></math></span>, into the RISC-V processor, particularly into the execution stage. Furthermore, it adopts an extended six-stage pipeline architecture to maximize employed encoding benefits, leading to more improvement in overall execution time and throughput. Despite the presence of optimized circuits, such as fast carry chain (CARRY4) modules for binary encoding on FPGA, the customized processor applying ExTern, RISC-VT, showcases remarkable improvement in computation performance. Experimental results demonstrate a 34.3% (12.2%) improvement in working frequency leading to a lower 31% (11.5%) execution time and a 32% (12%) increase in throughput compared to a State-of-the-Art open-source five(six)-stage RISC-V processor.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"107 ","pages":"Article 105058"},"PeriodicalIF":2.6,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S014193312400053X/pdfft?md5=5219c364add625230da3e174054a963d&pid=1-s2.0-S014193312400053X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140620839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}