Juan Encinas, Alfonso Rodríguez, Andrés Otero, Eduardo de la Torre
{"title":"Data-driven modeling of reconfigurable multi-accelerator systems under dynamic workloads","authors":"Juan Encinas, Alfonso Rodríguez, Andrés Otero, Eduardo de la Torre","doi":"10.1016/j.micpro.2024.105050","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105050","url":null,"abstract":"<div><p>Reconfigurable multi-accelerator systems used as computing offloading platforms in edge-cloud continuum scenarios usually have to deal with highly dynamic workloads and operating conditions. In order to properly take advantage of their parallel processing capabilities and increase execution performance for a given workload, these systems need to continuously adapt their configuration (i.e., number and type of accelerators) at run time. When working at the edge, additional requirements such as energy efficiency must also be met. In this paper, Machine Learning techniques are applied to extract predictive models of the execution of different combinations of hardware accelerators on a reconfigurable multi-accelerator platform, aiming at satisfying the previously mentioned continuous optimization needs. One of the key benefits of the proposed approach is that its data-driven models can transparently estimate the impact of the complex interactions between hardware accelerators due to run-time resource contention among them and with the rest of the system, as opposed to traditional modeling approaches that cannot include that information in an easy and scalable way (e.g., analytical models). The proposed models are complemented with a complete infrastructure to generate, execute and monitor dynamic workloads in FPGA-based systems. This infrastructure has been used to (i) quantitatively analyze resource contention in reconfigurable multi-accelerator systems and (ii) produce the training and evaluation datasets for the ML-based models using annotated power consumption and execution performance traces. 
Experimental results obtained with a reconfigurable multi-accelerator platform based on the ARTICo<sup>3</sup> framework running the MachSuite benchmarks show that the proposed modeling approach is highly effective, with a relative prediction error of less than 5% on average for both power consumption and execution performance. Results also show that the ML-based models achieve high accuracy levels when predicting the impact of resource contention and accelerator interaction on both metrics, with a mean relative prediction error of less than 0.6% and a standard deviation below 4.7% for the worst case.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"107 ","pages":"Article 105050"},"PeriodicalIF":2.6,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000450/pdfft?md5=a52d32f5fafee4bda56df513540d6eb8&pid=1-s2.0-S0141933124000450-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140545665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Santos, E. Mendes, J. Carvalho, F. Alves, J. Azevedo, J. Cabral
{"title":"Hardware accelerated Active Noise Cancellation system using Haar wavelets","authors":"P. Santos , E. Mendes , J. Carvalho , F. Alves , J. Azevedo , J. Cabral","doi":"10.1016/j.micpro.2024.105047","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105047","url":null,"abstract":"<div><p>Active Noise Cancellation (<em>ANC</em>) systems are widely used to mitigate unwanted noises in several applications, such as automotive environments and high-end headsets. Multi-Channel (<em>MC</em>) <em>ANC</em> systems have shown promise in creating improved silent zones. Typically, these systems are implemented on <em>FPGA</em> platforms due to the systolic nature and granularity of optimization of these devices. This article describes the design, implementation, and evaluation of a wavelet-based <em>MC ANC</em> Filtered-x Normalized Least Mean Square (<em>FxNLMS</em>) on an <em>FPGA</em> platform.</p><p>The use of wavelet transform enables the decomposition of complex noise signals into spectrally more compact signals (i.e., easier to process). In this work, for each decomposed signal, an independent <em>NLMS</em> is applied. The system implements 64 parallel <em>NLMS</em> with 1000 coefficients. Additionally, the static <em>FIR</em> filters employed for secondary and tertiary path estimations are of the 2047th order. The system adopts an integer arithmetic architecture and operates at a sampling rate of 47.97 kHz. To assess the performance of the wavelet-based approach, benchmark tests were conducted by comparing it against a similar implementation without the wavelet transform. The evaluation was performed using noise reduction (<em>NR</em>) tests with spectrally rich (20 Hz to 10 kHz) and high dynamic range noises. 
The experimental setup involved two error microphones and two secondary sources.</p><p>The results show that the wavelet-based version has overall better performance than the traditional implementation, particularly in the higher frequency band of the spectrum (1 kHz to 8 kHz). For instance, in the case of city ambient noise (a realistic noise with high dynamic range), the relative <em>NR</em> achieved was 8.23 dB.</p><p>To the authors’ knowledge, this is the first time that a wavelet-based <em>MC ANC</em> has been implemented and field-tested on an <em>FPGA</em> platform. Moreover, the obtained results show that the novel approach is better at reducing complex noises than the traditional implementation without wavelets.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"107 ","pages":"Article 105047"},"PeriodicalIF":2.6,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000425/pdfft?md5=694a4b8ef90eac68e2e659134a17a6f8&pid=1-s2.0-S0141933124000425-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140539189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASIC design of power and area efficient programmable FIR filter using optimized Urdhva-Tiryagbhyam Multiplier for impedance cardiography","authors":"Sudhanshu Janwadkar, Rasika Dhavse","doi":"10.1016/j.micpro.2024.105048","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105048","url":null,"abstract":"<div><p>Impedance cardiography (ICG) is a rapidly growing non-invasive cardiac health monitoring approach. Synchronous detection of ICG requires an FIR filter to remove the high-frequency carrier signal. Low power consumption and compact area are critical considerations in the design of portable biomedical systems. This paper proposes a novel product quantization-based optimization strategy for the Urdhva Tiryagbhyam Sutra-based multiplier architecture. This paper presents an ASIC design of a low-power and low-area 64th-order programmable FIR filter architecture using the optimized Urdhva Tiryagbhyam Multiplier. The programmable architecture empowers medical practitioners to select the carrier frequency at which the ICG analysis will be performed. The elimination of redundant multipliers from the design based on the filter coefficients is demonstrated. The programmable Vedic FIR filter architecture (described in VHDL) is implemented on the Basys-3 FPGA board for rapid prototyping. The RTL-to-GDSII flow has been completed using Cadence digital design and sign-off tools for the SCL-180 nm technology. 
The results indicate that the proposed filter architecture occupies 41.33% less area and consumes 42.16% less power than the contemporary designs described in the literature.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"107 ","pages":"Article 105048"},"PeriodicalIF":2.6,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140641142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amjad Rehman, Tanzila Saba, Khalid Haseeb, Teg Alam, Gwanggil Jeon
{"title":"IoT-Edge technology based cloud optimization using artificial neural networks","authors":"Amjad Rehman, Tanzila Saba, Khalid Haseeb, Teg Alam, Gwanggil Jeon","doi":"10.1016/j.micpro.2024.105049","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105049","url":null,"abstract":"<div><p>In recent decades, artificial intelligence techniques have been adopted for many real-time applications. The Internet of Things (IoT) network comprises many sensing devices and physical objects for information gathering and further transmission. The collected data must not only be sent to the receiving nodes but also be received promptly. Also, many solutions have been proposed for IoT-based embedded systems using edge computing, but they are not fully protected against unidentified communication threats. In such circumstances, the trust ratio of these systems decreases and communication performance is compromised. In this research, we describe an optimization model based on IoT-edge technology that incorporates cloud computational intelligence. Furthermore, edge nodes employ artificial intelligence algorithms to provide the optimal outcome for selecting trustworthy forwarded data and to lengthen the connected time for smart devices. Firstly, the edge devices extract useful information from the IoT nodes and, accordingly, provide a decision module based on optimization computing. Secondly, utilizing cryptographic approaches, edge technology secures the multiple layers of the IoT system and ensures data privacy with integrity. 
Finally, the proposed model is tested and its performance is verified against other related studies in terms of energy consumption, packet delivery ratio, and data delay.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105049"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140351221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hand-held GPU accelerated device for multiclass classification of X-ray images using CNN model","authors":"K.G. Satheeshkumar, V. Arunachalam, S. Deepika","doi":"10.1016/j.micpro.2024.105046","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105046","url":null,"abstract":"<div><p>Chest X-ray (CXR) images are the primary investigation aid for many lung diseases and their follow-ups. For the diagnosis of SARS-CoV-2, the RT–PCR test and chest Computed Tomography (CT) are commonly used, but both suffer from false negatives when ruling out the infection. Hence, there is a pressing need to develop a system that combines Artificial Intelligence (AI) with CXR imaging to detect COVID-19 patients and avoid the spread of the disease. Here, a robust and efficient handheld device is proposed. It uses the computational power of the Graphics Processing Unit (GPU) and pre-trained deep learning models for analyzing the CXR images. A ResNet-50 CNN model is deployed on an NVIDIA Jetson Nano GPU module for the real-time classification of CXR images as COVID-19, Tuberculosis, or Normal. The device can perform real-time classification of CXR images from a portable X-ray machine and classify each image into one of the above categories. For extensive training, a database of 680 COVID-19, 1230 Tuberculosis, and 1050 normal CXR images was assembled by combining several global databases such as Kaggle, SIRM, RSNA, and Radiopaedia. The classification accuracy, precision, and loss rate were 0.9879, 0.9758, and 0.0196, respectively, and the model is expected to improve with larger data sets. 
The highly accurate, high-performance GPU device can play a far-reaching role in COVID-19 diagnosis using chest X-rays, which could be beneficial for triage in the health system and for combating the outbreak of COVID-19.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105046"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140537103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CNC: A lightweight architecture for Binary Ring-LWE based PQC","authors":"Shaik Ahmadunnisa, Sudha Ellison Mathe","doi":"10.1016/j.micpro.2024.105044","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105044","url":null,"abstract":"<div><p>In lattice-based cryptography, Ring Learning with Errors (RLWE) is a computationally hard cryptographic problem, comprising three basic mechanisms, i.e., key generation, encryption, and decryption. Binary Ring Learning with Errors (BRLWE), a new variant of RLWE, has been proposed recently to reduce the key size and computational complexity compared to previous RLWE-based schemes. Based on this BRLWE scheme, efficient hardware architectures have been obtained in recent works for lightweight applications. The key operation involved in this scheme is <span><math><mrow><mi>A</mi><mi>B</mi><mo>+</mo><mi>C</mi></mrow></math></span>, where <span><math><mi>A</mi></math></span> and <span><math><mi>C</mi></math></span> are integer polynomials and <span><math><mi>B</mi></math></span> is a binary polynomial. This paper proposes an efficient hardware architecture for a BRLWE-based scheme targeted at lightweight applications. The architecture computes the arithmetic operation <span><math><mrow><mi>A</mi><mi>B</mi><mo>+</mo><mi>C</mi></mrow></math></span>, which includes polynomial multiplication and addition over the polynomial ring <span><math><mrow><msub><mrow><mi>Z</mi></mrow><mrow><mi>q</mi></mrow></msub><mo>/</mo><mrow><mo>(</mo><msup><mrow><mi>x</mi></mrow><mrow><mi>n</mi></mrow></msup><mo>+</mo><mn>1</mn><mo>)</mo></mrow></mrow></math></span>. The proposed architecture is applied under two conditions: fixed and variable values of <span><math><mi>q</mi></math></span>. 
Experimental results show that the proposed architecture has a 50% lower Area-Delay Product (ADP) and a 20% lower Power-Delay Product (PDP) than the recently reported work for <span><math><mrow><mi>n</mi><mo>=</mo><mn>256</mn></mrow></math></span>.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105044"},"PeriodicalIF":2.6,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140309393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Be My Guesses: The interplay between side-channel leakage metrics","authors":"Julien Béguinot , Wei Cheng , Sylvain Guilley , Olivier Rioul","doi":"10.1016/j.micpro.2024.105045","DOIUrl":"10.1016/j.micpro.2024.105045","url":null,"abstract":"<div><p>In a theoretical context of side-channel attacks, optimal bounds between success rate, guessing entropy and statistical distance are derived with a simple majorization (Schur-concavity) argument. They are further theoretically refined for different versions of the classical Hamming weight leakage model, in particular assuming a priori equiprobable secret keys and additive white Gaussian measurement noise. Closed-form expressions and numerical computation are given. A study of the impact of the choice of the substitution box with respect to side-channel resistance reveals that its nonlinearity tends to homogenize the expressivity of success rate, guessing entropy and statistical distance. The intriguing approximate relation between guessing entropy and success rate <span><math><mrow><mi>G</mi><mi>E</mi><mo>=</mo><mn>1</mn><mo>/</mo><mi>S</mi><mi>R</mi></mrow></math></span> is observed in the case of 8-bit bytes and low noise. 
The exact relation between guessing entropy, statistical distance and alphabet size <span><math><mrow><mi>G</mi><mi>E</mi><mo>=</mo><mfrac><mrow><mi>M</mi><mo>+</mo><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>−</mo><mfrac><mrow><mi>M</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mi>S</mi><mi>D</mi></mrow></math></span> for deterministic leakages and equiprobable keys is proved.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"107 ","pages":"Article 105045"},"PeriodicalIF":2.6,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140401449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retraction notice to the articles published in the Special Issue Signal Processing from “Microprocessors and Microsystems”","authors":"","doi":"10.1016/j.micpro.2024.105043","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105043","url":null,"abstract":"","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105043"},"PeriodicalIF":2.6,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000383/pdfft?md5=b791a7c7e5a9bb52a68a4f6dceabab14&pid=1-s2.0-S0141933124000383-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140134565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marina Bulat, Stefan Mirković, Nemanja Gazivoda, Dragan Pejić, Marjan Urekar, Boris Antić
{"title":"An improved algorithm for the estimation of the root mean square value as an optimal solution for commercial measurement equipment","authors":"Marina Bulat, Stefan Mirković, Nemanja Gazivoda, Dragan Pejić, Marjan Urekar, Boris Antić","doi":"10.1016/j.micpro.2024.105042","DOIUrl":"10.1016/j.micpro.2024.105042","url":null,"abstract":"<div><p>This paper demonstrates that direct changes in the algorithm for the estimation of the root mean square value of a voltage signal of an arbitrary waveform can lead to improved performance and lower measurement uncertainty in commercially available instruments without requiring any upgrade of their existing hardware. The research conducted and presented here is an original contribution to the development of estimation techniques and mathematical models for measurement-oriented purposes, regardless of the number of samples in the given period, relying on mathematical calculations of the same complexity as the methods already in use. The theoretical approach examines the problem of numerical integration, focusing on the modified Simpson's 1/3 rule and the modified Simpson's 3/8 rule used to estimate the root mean square value when a small number of samples per period is available. It highlights the limitations of Simpson's 1/3 rule and Simpson's 3/8 rule, and shows that the newly proposed algorithm is optimal with respect to measurement accuracy and precision even in cases when the ratio of the sampling frequency to the signal's fundamental frequency is low. 
All theoretical results have been validated experimentally.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105042"},"PeriodicalIF":2.6,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140153526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Indoor localization using device sensors: A threat to privacy","authors":"Hitesh Verma, Smita Naval, Balaprakasa Rao Killi, Vinod P.","doi":"10.1016/j.micpro.2024.105041","DOIUrl":"10.1016/j.micpro.2024.105041","url":null,"abstract":"<div><p>The localization techniques used in today’s smartphones are mainly based on the Global Positioning System (GPS). However, GPS sensors cannot work properly in indoor and underground locations. Therefore, many applications utilize device sensors such as the accelerometer, gyrometer, and magnetometer for indoor localization. In this paper, we present a misuse case of how device sensors can be used to compromise the privacy of a user by geo-tracking. We propose an attack model through which the user location can be compromised without using the GPS sensors. The proposed attack model comprises two stages. The first stage consists of deploying the malicious application on the users’ smartphones and gathering data from various sensors in the background. The collected sensor data is uploaded to the malicious cloud server set up by the adversary. The second stage consists of pre-processing the sensor data received from the malicious cloud server and plotting the user’s trajectory onto a graph in real time. The proposed attack model is evaluated by developing two applications. The victim application tracks the location, direction, and trajectory of the user without any location permission from the user. The proposed model achieves an accuracy of 98% without using special infrastructure or a separate training phase. 
Further, we discuss three mitigation schemes that Android developers can adopt in order to protect the user’s privacy.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105041"},"PeriodicalIF":2.6,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140087111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}