{"title":"A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA","authors":"Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab","doi":"10.1016/j.micpro.2024.105037","DOIUrl":"10.1016/j.micpro.2024.105037","url":null,"abstract":"<div><p>Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proved their effectiveness in solving complex problems in many different applications and multiple fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block used is based on a probability-based sigmoid function approximation block (P-SFA) combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to use the least amount of hardware resources and area while maintaining a high recognition accuracy. The proposed AF module in this work consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design has achieved a recognition accuracy of 97.84 % on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and a recognition accuracy of 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs, a logical critical path delay of 8.853 ns. 
The power consumption of the P-SFA block is 6 mW and the PRT block is 5 mW.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105037"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139919876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CNC: A lightweight architecture for Binary Ring-LWE based PQC","authors":"Shaik Ahmadunnisa, Sudha Ellison Mathe","doi":"10.1016/j.micpro.2024.105044","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105044","url":null,"abstract":"<div><p>In lattice-based cryptography, Ring Learning with Errors (RLWE) is a computationally hard cryptographic problem, comprising three basic mechanisms i.e., key generation, encryption, and decryption. Binary Ring Learning with Error (BRLWE), a new variant of RLWE has been proposed recently to reduce the key size and computational complexity compared to previous RLWE-based schemes. Based on this BRLWE scheme, efficient hardware architectures have been obtained in recent works for lightweight applications. The key operation involved in this scheme is <span><math><mrow><mi>A</mi><mi>B</mi><mo>+</mo><mi>C</mi></mrow></math></span> , where <span><math><mi>A</mi></math></span> and <span><math><mi>C</mi></math></span> are integer polynomials and <span><math><mi>B</mi></math></span> is a binary polynomial. This paper proposes an efficient hardware architecture for BRLWE-based scheme targeted for lightweight applications. The architecture computes the arithmetic operation <span><math><mrow><mi>A</mi><mi>B</mi><mo>+</mo><mi>C</mi></mrow></math></span>, which includes polynomial multiplication and addition over the polynomial ring <span><math><mrow><msub><mrow><mi>Z</mi></mrow><mrow><mi>q</mi></mrow></msub><mo>/</mo><mrow><mo>(</mo><msup><mrow><mi>x</mi></mrow><mrow><mi>n</mi></mrow></msup><mo>+</mo><mn>1</mn><mo>)</mo></mrow></mrow></math></span>. The proposed architecture is applied in two conditions, fixed and variable values of <span><math><mi>q</mi></math></span>. 
Experimental results show the architecture proposed has 50% less Area-Delay Product (ADP) and 20% less Power-Delay Product (PDP) compared to the recently reported work for <span><math><mrow><mi>n</mi><mo>=</mo><mn>256</mn></mrow></math></span>.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105044"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140309393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MOSAIC: Maximizing ResOurce Sharing in Behavioral Application SpecIfic ProCessors","authors":"Qilin Si, Benjamin Carrion Schafer","doi":"10.1016/j.micpro.2024.105039","DOIUrl":"10.1016/j.micpro.2024.105039","url":null,"abstract":"<div><p>This work presents a method that can quickly determine which hardware accelerators (HWaccs) should be mapped together onto an Application-Specific Instruction Set Processor (ASIP), such that the resources shared among them are maximized. This work in particular targets HWaccs generated from untimed behavioral descriptions for High-Level Synthesis (HLS). Although HLS is a single process synthesis method, our approach is able to force resource sharing among the HWaccs by combining their behavioral descriptions together into a single description based on their potential to share resources. These shared resources include functional units (FUs) like multipliers, adders, and dividers, and also registers. In particular, our proposed flow leads up to 48% in area savings and on average 30%. Because an exhaustive enumeration of all possible combinations can lead to long runtimes, we propose a fast heuristic that leads to comparable results (only 6% worse on average), while being much faster (on average 500<span><math><mo>×</mo></math></span>).</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105039"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139919853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IoT-Edge technology based cloud optimization using artificial neural networks","authors":"Amjad Rehman , Tanzila Saba , Khalid Haseeb , Teg Alam , Gwanggil Jeon","doi":"10.1016/j.micpro.2024.105049","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105049","url":null,"abstract":"<div><p>In recent decades, artificial intelligence techniques have been adopted for many real-time applications. The Internet of Things (IoT) network comprises many sensing devices and physical objects for information gathering and further transmission. In addition to being sent to the receiving nodes, the collected data also needs to be received promptly. Also, many solutions have been proposed for IoT-based embedded systems using edge computing but they are not fully protected against unidentified communication threats. In such circumstances, such systems decrease the trust ratio, and communication performance is compromised. In this research, we describe an optimization model based on IoT-edged technology that incorporates cloud computational intelligence. Furthermore, edge nodes employ artificial intelligence algorithms to provide the optimal outcome for selecting trustworthy forwarded data and lengthen the connected time for smart devices. Firstly, the edge devices extract useful information from the IoT nodes, and accordingly, it provides a decision module based on optimization computing. Secondly, utilizing cryptographic approaches, edge technology secures the multi-layers of the IoT system and ensures data privacy with integrity. 
Finally, the proposed model is tested, and its performance is verified against other related studies in terms of energy consumption, packet delivery ratio, and data delay.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105049"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140351221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hand-held GPU accelerated device for multiclass classification of X-ray images using CNN model","authors":"K.G. Satheeshkumar , V. Arunachalam , S. Deepika","doi":"10.1016/j.micpro.2024.105046","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105046","url":null,"abstract":"<div><p>Chest X-ray (CXR) images are the primary investigation aid for many lung diseases and their follow-ups. For diagnosis of SARS-CoV-2, RT–PCR test and chest Computed Tomography (CT) are commonly used but both face false negatives for ruling out the infection. So, there is a demanding need for developing a system combined with Artificial Intelligence (AI) and CXR imaging to detect COVID-19 patients to avoid its spread. Here, a robust and efficient handheld device is proposed. It uses the computational power of the Graphics Processing Unit (GPU) and pre-trained deep learning models for analyzing the CXR images. A Resnet-50 CNN model is deployed on an NVIDIA Jetson Nano GPU module for the real-time classification of COVID-19, Tuberculosis, and Normal using CXR images. The device can perform real-time classification of CXR images from a portable X-ray machine and classify the image into one of the above categories. For the extensive training, a database of 680 COVID-19, 1230 Tuberculosis, and 1050 normal CXR images are extracted by combining several global databases like Kaggle, SIRM, RSNA, and Radiopaedia. The classification accuracy, precision, and loss rate were 0.9879, 0.9758, and 0.0196 respectively and our model would improve with larger data sets. 
The highly accurate, high-performance GPU device plays a significant and far-reaching role in COVID-19 diagnosis using chest X-rays, which could help triage the health system and combat the outbreak of COVID-19.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105046"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140537103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A light-weight neuromorphic controlling clock gating based multi-core cryptography platform","authors":"Pham-Khoi Dong , Khanh N. Dang , Duy-Anh Nguyen , Xuan-Tu Tran","doi":"10.1016/j.micpro.2024.105040","DOIUrl":"10.1016/j.micpro.2024.105040","url":null,"abstract":"<div><p>While speeding up cryptography tasks can be accomplished by using a multi-core architecture to parallelize computation, one of the major challenges is optimizing power consumption. In principle, depending on the computation workload, individual cores can be turned off to save power during operation. However, too few active cores may lead to computational bottlenecks. In this work, we propose a novel platform named Spike-MCryptCores: a low-power multi-core AES platform with a neuromorphic controller. The proposed Spike-MCryptCores platform is composed of multiple AES cores, each core is equipped with a clock-gating scheme for reducing its power consumption while being idle. To optimize the power consumption of the whole platform, we use a neuromorphic controller. Therefore, a comprehensive framework to generate a data set, train the neural network, and produce hardware configuration for the Spiking Neural Network (SNN), a brain-inspired computing paradigm, is also presented in this paper. Moreover, Spike-MCryptCores integrates the hardware SNN inside its architecture to support low-cost and low-latency adaptations. The results show that implemented SNN controller occupies only 2.3 % of the overall area cost while providing the ability to reduce power consumption significantly. The lightweight SNN controller model is trained and tested with up to 95 % accuracy. The maximum difference between the predicted number of cores and the ideal one from the label is one unit only. 
Under 24 test scenarios, an SNN controller with clock gating helps Spike-MCryptCores reduce power consumption by 48.6 % on average, by 67 % in the best-case scenario, and by 39 % in the worst-case scenario.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105040"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139987897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and evaluation of low power and area efficient approximate Booth multipliers for error tolerant applications","authors":"Vishal Gundavarapu , P. Gowtham , A. Anita Angeline , P. Sasipriya","doi":"10.1016/j.micpro.2024.105036","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105036","url":null,"abstract":"<div><p>Approximate computing is an innovative design methodology to reduce the design complexity with an improvement in power efficiency, performance and area by compromising on the requirement of accuracy. In this paper, 8-bit approximate Booth multipliers are proposed based on the approximate Radix-4 modified Booth encoding algorithm and approximate compressors for partial product accumulation to produce the final products. Two approximate Probability Based Booth Encoders (PBBE-1 and PBBE-2) have been proposed and used in the Booth multipliers. Error parameters have been measured and compared with the existing approximate Booth multipliers. An exact Booth multiplier of a novel design from the literature has also been implemented for comparison purposes. The proposed approximate multipliers are then used in applications like image multiplication and IIR bi-quad filtering to prove their performance. Simulation results prove that the proposed Booth multipliers outperform the existing approximate Booth multipliers in terms of power and area with better accuracy. Synthesis results show that the proposed Multiplier 6 is the most efficient, with a 56 % power consumption improvement and a 47 % area improvement when compared to the exact multiplier. 
All the simulations are carried out using Cadence® Genus with 180 nm CMOS process technology.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105036"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139908537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retraction notice to the articles published in the Special Issue Signal Processing from “Microprocessors and Microsystems”","authors":"","doi":"10.1016/j.micpro.2024.105043","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105043","url":null,"abstract":"","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105043"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000383/pdfft?md5=b791a7c7e5a9bb52a68a4f6dceabab14&pid=1-s2.0-S0141933124000383-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140134565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved algorithm for the estimation of the root mean square value as an optimal solution for commercial measurement equipment","authors":"Marina Bulat, Stefan Mirković, Nemanja Gazivoda, Dragan Pejić, Marjan Urekar, Boris Antić","doi":"10.1016/j.micpro.2024.105042","DOIUrl":"10.1016/j.micpro.2024.105042","url":null,"abstract":"<div><p>This paper demonstrates that direct changes in the algorithm for the estimation of the root mean square value of a voltage signal of an arbitrary waveform can lead to improved performances and lower measurement uncertainty of commercially available instruments without requiring any upgrade of their existing hardware. The research conducted and presented here is an original contribution to the development of estimation techniques and mathematical models for measurement oriented purposes regardless of the number of samples in the given period relying on mathematical calculation of the equal complexity as in the methods already in use. The theoretical approach examines the problem of numerical integration focusing on modified Simpson's 1/3 rule and modified Simpson's 3/8 rule used for the purpose of the estimation of the root mean square value when a small number of samples per period is available. It highlights the limitations of Simpson's 1/3 rule and Simpson's 3/8 rule, and shows that the newly proposed algorithm is optimal with respect to measurement accuracy and precision even in cases when the ratio of the sampling frequency and the signal's fundamental frequency is low. 
All theoretical results have been validated experimentally.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105042"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140153526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retraction notice to the articles published in the Special issue Smart Agri from “Microprocessors and Microsystems”","authors":"","doi":"10.1016/j.micpro.2024.105038","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105038","url":null,"abstract":"","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105038"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000334/pdfft?md5=6970ffb236c663c09734849d8c298072&pid=1-s2.0-S0141933124000334-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139993273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}