Microprocessors and Microsystems: Latest Articles

SIMIL: SIMple Issue Logic for GPUs
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 111, Article 105105 | Pub Date: 2024-10-09 | DOI: 10.1016/j.micpro.2024.105105
Rodrigo Huerta, José-Lorenzo Cruz, Jose-Maria Arnau, Antonio González
Abstract: GPU architectures have become popular for executing general-purpose programs. In particular, they are among the most efficient architectures for machine learning applications, which are some of the most popular and demanding workloads today.
This paper presents SIMIL (SIMple Issue Logic for GPUs), an architectural modification to the issue stage that replaces scoreboards with a Dependence Matrix to track dependencies among instructions and avoid data hazards. We show that a Dependence Matrix is more effective when source operands are used repeatedly, which is common in many applications. In addition, with minor extensions a Dependence Matrix can also support simple out-of-order issue. Evaluations on an NVIDIA Tesla V100-like GPU show that SIMIL provides a speedup of up to 2.39× on some machine learning programs and 1.31× on average across various benchmarks, while reducing energy consumption by 12.81% with only 1.5% area overhead. We also show that SIMIL outperforms a recently proposed out-of-order issue approach that uses register renaming.
Citations: 0
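The abstract does not describe SIMIL's actual matrix layout, so the following is only a generic, textbook-style model of a dependence matrix, under our own assumptions (the class and method names are hypothetical, and only read-after-write dependences are tracked). Each issue-queue entry owns a row with one bit per older in-flight producer; an entry is ready when its row is empty, and a completing producer clears its column. Note that a repeated source register costs a single bit, whereas a per-register scoreboard is checked once per operand.

```python
class DependenceMatrix:
    """Toy issue-queue dependence matrix (hypothetical sketch, RAW dependences only)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.valid = [False] * size   # entry holds an in-flight instruction
        self.dest = [None] * size     # destination register written by each entry
        # dep[i][j] is True while entry i waits for the result of entry j
        self.dep = [[False] * size for _ in range(size)]

    def insert(self, dest: str, srcs: list) -> int:
        slot = self.valid.index(False)   # raises ValueError if the queue is full
        wanted = set(srcs)               # repeated sources collapse into one bit per producer
        for j in range(self.size):
            self.dep[slot][j] = self.valid[j] and self.dest[j] in wanted
        self.valid[slot] = True
        self.dest[slot] = dest
        return slot

    def ready(self) -> list:
        # an entry may issue once no producer bit remains set in its row
        return [i for i in range(self.size) if self.valid[i] and not any(self.dep[i])]

    def complete(self, slot: int) -> None:
        # producer wrote back: free its entry and clear its column for every consumer
        self.valid[slot] = False
        self.dest[slot] = None
        for i in range(self.size):
            self.dep[i][slot] = False


dm = DependenceMatrix(4)
dm.insert("r1", ["r2", "r3"])   # slot 0
dm.insert("r4", ["r1", "r1"])   # slot 1 waits on slot 0 (one bit for both uses of r1)
dm.insert("r5", ["r2"])         # slot 2 is independent
print(dm.ready())               # [0, 2]
dm.complete(0)
print(dm.ready())               # [1, 2]
```

For simplicity, a consumer waits on every in-flight writer of a register it reads; a real design would track only the youngest producer and would also handle write-after-write and write-after-read hazards.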
A hardware architecture for single and multiple ellipse detection using genetic algorithms and high-level synthesis tools
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 111, Article 105106 | Pub Date: 2024-10-09 | DOI: 10.1016/j.micpro.2024.105106
Francisco J. Iñiguez-Lomeli, Carlos H. Garcia-Capulin, Horacio Rostro-Gonzalez
Abstract: Ellipse detection techniques are often developed and validated in software environments, neglecting the computational efficiency and resource constraints that are critical in embedded systems. Programmable logic devices, notably Field Programmable Gate Arrays (FPGAs), have emerged as indispensable assets for enhancing performance and accelerating a wide range of processing applications. In terms of computational efficiency, hardware implementations can tailor the required arithmetic to each application using fixed-point representation. This enables faster computation with adequate accuracy and lower resource and energy consumption than software implementations, which rely on higher clock speeds and therefore tend to consume more resources and energy. Hardware solutions are also portable and suitable for resource-constrained and battery-powered applications. This study introduces a novel hardware architecture, in the form of an intellectual property (IP) core, that uses a genetic algorithm to detect single and multiple ellipses in digital images. In general, genetic algorithms have been shown to give better results than traditional methods such as the Hough Transform and Random Sample Consensus (RANSAC), particularly in accuracy, flexibility, and robustness. Our genetic algorithm randomly selects five edge points from the image under test; each set of five points forms an individual representing a candidate ellipse. The fitness evaluation function determines whether the candidate ellipse actually exists in the image space. The core is designed with Vitis High-Level Synthesis (HLS), a tool that converts C or C++ functions into Register-Transfer Level (RTL) code such as VHDL and Verilog. The ellipse detection system was implemented and tested on the PYNQ-Z1, a low-cost development board built around the Xilinx Zynq-7000 System-on-Chip (SoC). PYNQ, an open-source framework, integrates programmable logic with a dual-core ARM Cortex-A9 processor and allows the onboard SoC processor to be programmed in Python. Experimental results on synthetic and real images, some of them noisy, show that the IP core is well suited to resource-constrained embedded systems, achieving high performance and accuracy rates above 99% in most cases. This research aims to advance hardware-accelerated ellipse detection for demanding real-time applications while minimizing resource consumption.
Citations: 0
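The abstract says each individual is built from five randomly chosen edge points and then scored by a fitness function, but it does not give the formulas. One plausible software model, assumed here rather than taken from the paper, fits the unique conic Ax² + Bxy + Cy² + Dx + Ey + F = 0 through the five points, rejects it unless it is an ellipse (B² − 4AC < 0), and scores it by the fraction of edge pixels lying close to it. All function names and the tolerance `tol` are hypothetical, the GA loop (selection, crossover, mutation) is omitted, and floating point is used for clarity where a hardware core would more likely use fixed-point arithmetic.

```python
import random


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0                      # singular (or numerically so)
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d                          # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d


def conic_through(points):
    """(A, B, C, D, E, F) of the conic through five points in general position."""
    rows = [[x * x, x * y, y * y, x, y, 1.0] for x, y in points]
    # cofactor expansion of the 6x6 determinant whose first row is the monomial vector
    return [(-1) ** j * det([[r[k] for k in range(6) if k != j] for r in rows])
            for j in range(6)]


def is_ellipse(c):
    return c[1] * c[1] - 4 * c[0] * c[2] < 0


def fitness(c, edge_points, tol=0.05):
    """Fraction of edge points whose normalized algebraic residual is below tol."""
    norm = sum(v * v for v in c) ** 0.5
    if norm == 0 or not is_ellipse(c):
        return 0.0
    a, b, cc, d, e, f = (v / norm for v in c)
    hits = sum(1 for x, y in edge_points
               if abs(a * x * x + b * x * y + cc * y * y + d * x + e * y + f) < tol)
    return hits / len(edge_points)


def random_individual(edge_points, rng=random):
    """An individual: five distinct edge points drawn at random."""
    return rng.sample(edge_points, 5)
```

For example, the five points (5, 0), (0, 5), (−5, 0), (0, −5), (3, 4) yield coefficients proportional to (1, 0, 1, 0, 0, −25), the circle x² + y² = 25, which passes the ellipse test.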
Tuning high-level synthesis SpMV kernels in Alveo FPGAs
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105104 | Pub Date: 2024-10-01 | DOI: 10.1016/j.micpro.2024.105104
Federico Favaro, Ernesto Dufrechou, Juan P. Oliver, Pablo Ezzatti
Abstract: Sparse Matrix-Vector Multiplication (SpMV) is an essential operation in scientific and engineering fields, with applications in areas such as finite element analysis, image processing, and machine learning. To address the need for faster and more energy-efficient computing, this paper investigates the acceleration of SpMV on Field-Programmable Gate Arrays (FPGAs), using High-Level Synthesis (HLS) for design simplicity. Our study focuses on the AMD-Xilinx Alveo U280 FPGA and assesses the performance of the SpMV kernel from the Vitis Libraries, the state of the art in SpMV acceleration on FPGAs. We explore kernel modifications, the transition to single precision, and varying partition sizes, and show how these changes affect execution time. We also investigate matrix preprocessing techniques, including Reverse Cuthill-McKee (RCM) reordering and a hybrid sparse storage format, to improve efficiency. Our findings show that the performance of FPGA-accelerated SpMV depends on matrix characteristics, and that smaller partition sizes and specific preprocessing techniques deliver notable performance improvements. By selecting the best results from these experiments, we achieved execution-time speedups of up to 3.2×. This study advances the understanding of FPGA-accelerated SpMV, providing insight into the key factors that affect performance and into potential avenues for further improvement.
Citations: 0
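As a point of reference for the operations named in this abstract, the plain software sketch below computes y = A·x for a matrix stored in Compressed Sparse Row (CSR) format, and computes a Reverse Cuthill-McKee (RCM) permutation, which reorders a structurally symmetric matrix to reduce its bandwidth. It is not the Vitis Libraries kernel; the function names are our own, and this RCM variant starts each connected component from a minimum-degree node rather than a pseudo-peripheral one.

```python
from collections import deque


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for A in CSR format (values, column indices, row pointers)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):   # nonzeros of row i
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def rcm_order(adj):
    """Reverse Cuthill-McKee ordering of an undirected graph (adjacency lists).

    Returns perm with perm[new_index] = old_index.
    """
    n = len(adj)
    degree = [len(a) for a in adj]
    visited = [False] * n
    order = []
    for start in sorted(range(n), key=lambda v: degree[v]):  # covers every component
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:                                # breadth-first traversal
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: degree[u]):  # low degree first
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]                              # the "reverse" in RCM


def bandwidth(adj, perm):
    """Largest |row - col| over nonzeros after applying the permutation."""
    pos = {old: new for new, old in enumerate(perm)}
    return max((abs(pos[v] - pos[w]) for v in range(len(adj)) for w in adj[v]), default=0)
```

For instance, a path graph with edges 0-3, 3-1, 1-4, 4-2 has bandwidth 3 in its original numbering, and `rcm_order` returns [2, 4, 1, 3, 0], which brings the bandwidth down to 1.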
SLOPE: Safety LOg PEripherals implementation and software drivers for a safe RISC-V microcontroller unit
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105103 | Pub Date: 2024-09-19 | DOI: 10.1016/j.micpro.2024.105103
Francesco Cosimi, Antonio Arena, Sergio Saponara, Paolo Gai
Abstract: This manuscript focuses on the main safety issues of a mixed-criticality system running multiple concurrent tasks. We are concerned with guaranteeing Freedom from Interference between concurrent partitions and with respecting the Worst-Case Execution Time (WCET) of tasks. We are also interested in evaluating resource budgeting and in studying system behavior when random hardware failures occur. In this paper we present a set of Safety LOg PEripherals (SLOPE): a Performance Monitoring Unit (PMU), an Execution Tracing Unit (ETU), an Error Management Unit (EMU), a Time Management Unit (TMU), and a Data Log Unit (DLU); we then propose an implementation of SLOPE on a single-core RISC-V architecture. These peripherals collect software and hardware information about execution and, when needed, trigger recovery actions to mitigate potentially dangerous misbehavior. We show results of the hardware implementation and of software testing of the units with a dedicated software library. For the PMU, we standardized the software layer according to the embedded Performance Application Programming Interface (ePAPI) and compared its functionality with bare-metal use of the library. To test the ETU, we compared hardware simulation results with software results to determine whether the internal hardware buffers may overflow during tracing. In conclusion, the designed devices provide new instruments for investigating RISC-V systems and can generate execution profiles for safety-related tasks.
Citations: 0
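The ETU experiment asks whether internal trace buffers can overflow during tracing. A hypothetical cycle-level model of that question is sketched below; the buffer depth, drain rate, and event trace are made-up parameters, not SLOPE's, and the function name is our own. Each cycle, the trace events produced are pushed into a bounded FIFO, which then drains a fixed number of entries, and the model reports the peak occupancy and the first cycle, if any, at which events would be lost.

```python
def simulate_trace_fifo(events_per_cycle, depth, drain_per_cycle):
    """Return (peak_occupancy, first_overflow_cycle or None) for a bounded trace FIFO."""
    occupancy = 0
    peak = 0
    first_overflow = None
    for cycle, produced in enumerate(events_per_cycle):
        occupancy += produced                       # trace events captured this cycle
        if occupancy > depth:
            if first_overflow is None:
                first_overflow = cycle
            occupancy = depth                       # excess events are dropped
        peak = max(peak, occupancy)
        occupancy = max(0, occupancy - drain_per_cycle)  # sink drains entries
    return peak, first_overflow


# A steady stream of 2 events/cycle into a 2-entry FIFO drained at 1 entry/cycle
print(simulate_trace_fifo([2, 2, 2], depth=2, drain_per_cycle=1))  # (2, 1)
```

Comparing such a software model against hardware simulation is one way to cross-check buffer sizing; bursty traces are the interesting case, since the average rate alone can hide short overflows.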
RED-SEA Project: Towards a new-generation European interconnect
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105102 | Pub Date: 2024-09-16 | DOI: 10.1016/j.micpro.2024.105102
Maria Engracia Gomez, Julio Sahuquillo, Andrea Biagioni, Nikos Chrysos, Damien Berton, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, Piero Vicini, Roberto Ammendola, Carlotta Chiarini, Chiara De Luca, Fabrizio Capuani, Adrián Castelló, Jose Duro, Simon Pickartz
Abstract: RED-SEA is an H2020 EuroHPC project whose main objective is to prepare a new-generation European interconnect capable of powering the upcoming EU Exascale systems. The interconnect is intended to be economically viable and technologically efficient, leveraging European interconnect technology (BXI) together with standard, mature technology (Ethernet), previous EU-funded initiatives, and open standards and compatible APIs.
To achieve this objective, the RED-SEA project is organized around four key pillars: (i) co-design of the network architecture and the interconnect with workload requirements, aiming to optimize the fit with the other EuroHPC projects and with the EPI processors; (ii) development of a high-performance, low-latency, seamless bridge with Ethernet; (iii) efficient network resource management, including congestion and Quality-of-Service; and (iv) end-to-end functions implemented at the network edges.
This paper presents the key achievements and results for each pillar at the project's midterm, on the way to the final project objective. In this regard we highlight: (i) the definition of the network requirements and architecture, as well as a list of benchmarks and applications; (ii) in addition to progress on the initially planned IPs, the evolution of the BXI3 architecture to support Ethernet natively at a low level, reducing complexity with advantages in cost optimization and power consumption; (iii) the congestion characterization of target applications and proposals to reduce this congestion by optimizing collective communication primitives, injection throttling, and adaptive routing; and (iv) low-latency, high-message-rate endpoint functions and their connection with new open technologies.
Citations: 0
Recent advances in Machine Learning based Advanced Driver Assistance System applications
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105101 | Pub Date: 2024-09-12 | DOI: 10.1016/j.micpro.2024.105101
Guner Tatar, Salih Bayar, Ihsan Cicek, Smail Niar
Abstract: In recent years, growing traffic in modern cities has created demand for new technology to support drivers and protect passengers and other parties involved in transportation. Thanks to rapid technological progress and innovation, many Advanced Driver Assistance Systems (A/DAS) based on Machine Learning (ML) algorithms have emerged to meet the increasing demand for practical A/DAS applications. Fast and accurate execution of A/DAS algorithms is essential for preventing loss of life and property. High-speed hardware accelerators are vital for processing the large volume of data captured by increasingly sophisticated sensors and for executing the complex mathematical models of modern deep learning (DL) algorithms. One of the fundamental challenges in this new era is designing energy-efficient and portable ML-enabled platforms that provide driver assistance and safety in vehicles. This article presents recent progress in ML-driven A/DAS technology to offer new insights for researchers. We cover standard ML models and optimization approaches based on widely accepted open-source frameworks that are used extensively in A/DAS applications. We also highlight related articles on ML and its sub-branches, neural networks (NNs), and DL, and we report implementation issues, benchmarking problems, and potential challenges for future research. Popular embedded hardware platforms used to implement A/DAS applications, such as Field Programmable Gate Arrays (FPGAs), central processing units (CPUs), graphics processing units (GPUs), and Application Specific Integrated Circuits (ASICs), are also compared in terms of performance and resource utilization. We examine the hardware and software development environments used to implement A/DAS applications and report their advantages and disadvantages. We provide performance comparisons for common A/DAS tasks such as traffic sign recognition, road and lane detection, vehicle and pedestrian detection, driver behavior, and multi-tasking. Given current research dynamics, A/DAS will remain one of the most popular application fields in vehicular transportation in the near future.
Citations: 0
Proactive deadlock prevention based on traffic classification sub-graphs for triplet-based NoC TriBA-cNoC
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105091 | Pub Date: 2024-08-31 | DOI: 10.1016/j.micpro.2024.105091
Karim Soliman, Shi Feng, Ruan Shengqiang, Chunfeng Li
Abstract: Network topology and routing algorithms are pivotal design decisions that profoundly affect the performance of Network-on-Chip (NoC) systems. As core counts rise, so does competition for shared resources, which highlights the need for carefully designed routing algorithms that avoid deadlocks and ensure network efficiency. This research builds on the Triplet-Based Architecture (TriBA) and its Distributed Minimal Routing Algorithm (DM4T) to overcome the limitations of previous approaches. Although DM4T outperforms previous routing algorithms, its deterministic nature and the potential for circular dependencies during routing can lead to deadlocks and congestion. This work therefore addresses these vulnerabilities while retaining the performance benefits of TriBA and DM4T. It introduces a novel approach that combines a proactive deadlock prevention mechanism with Intermediate Adjacent Shortest Path Routing (IASPR). This combination guarantees both deadlock-free and livelock-free routing, ensuring reliable communication within the network. The key to this integration is a flow-model-based technique for categorizing data transfers. This technique prevents the formation of circular dependencies and also reduces redundant distance calculations during routing. As a result, the proposed approach improves both routing latency and throughput. To assess the performance of TriBA network topologies under varying configurations, extensive simulations were carried out on TriBA networks with 9 and 27 nodes, using the DM4T and IASPR routing algorithms and the proactive deadlock prevention method. The gem5 simulator, with the Garnet 3.0 network model and a standalone protocol for synthetic traffic, was used for simulations at high injection rates, covering diverse synthetic traffic patterns and applications from the PARSEC benchmark suite. The simulations showed reductions in average latency of 40.17% and 34.05% compared with the lookup-table approach and DM4T, respectively, and increases in average throughput of 7.48% and 5.66%, respectively.
Citations: 0
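The abstract attributes deadlock freedom to preventing circular dependencies. A standard way to check that property for a deterministic routing function is to build the channel dependency graph (an edge c1 → c2 whenever a packet holding channel c1 next requests c2) and verify that it is acyclic. The sketch below does only that generic check; it does not reproduce the paper's traffic-classification sub-graphs, and the function names and example routes are our own.

```python
from collections import defaultdict


def channel_dependencies(routes):
    """Map each directed channel (u, v) to the channels requested right after it."""
    deps = defaultdict(set)
    for path in routes:
        links = list(zip(path, path[1:]))        # node path -> sequence of channels
        for c1, c2 in zip(links, links[1:]):
            deps[c1].add(c2)
    return deps


def has_cycle(deps):
    """Iterative three-colour depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)                    # every channel starts WHITE
    for root in list(deps):
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(deps[root]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:                      # all successors explored
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:            # back edge: circular wait possible
                return True
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(deps.get(nxt, ()))))
    return False


# Three nodes in a ring, each sending two hops clockwise: the classic circular wait
ring_routes = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(has_cycle(channel_dependencies(ring_routes)))       # True
print(has_cycle(channel_dependencies(ring_routes[:2])))   # False
```

Removing any one of the three ring routes (or, in a real design, assigning it to a different class of channels) breaks the cycle.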
A novel lightweight multi-factor authentication scheme for MQTT-based IoT applications
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105088 | Pub Date: 2024-08-30 | DOI: 10.1016/j.micpro.2024.105088
Manasha Saqib, Ayaz Hassan Moon
Abstract: Current authentication solutions for the Internet of Things (IoT) are either inadequate or computationally intensive, given the resource-constrained nature of IoT devices. This challenges researchers to devise efficient ways to embed an important security tenet such as authentication. In IoT, the most popular machine-to-machine communication protocol at the application layer is Message Queuing Telemetry Transport (MQTT). However, MQTT inherently lacks security functions such as authentication, authorization, confidentiality, access control, and data integrity, which is unacceptable for IoT-driven mission-critical applications connected over public networks. In such situations, security is hardened with a transport layer security protocol such as TLS, which entails significant computational overhead. This paper presents a novel scheme that enhances MQTT security through lightweight multi-factor authentication based on elliptic curve cryptography (ECC). The proposed scheme uses a low-cost signature and a fuzzy extractor to correct errors in imprinted biometrics in noisy environments. The scheme achieves mutual authentication, generates a securely agreed-upon session key for secret communication, and guarantees perfect forward secrecy. Furthermore, a rigorous informal security analysis shows that the proposed scheme resists cryptographic attacks, including known session critical attacks. An empirical study was also carried out to assess the effectiveness of the proposed scheme in the Cooja simulation environment.
Citations: 0
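The abstract mentions ECC-based mutual authentication with an agreed session key but does not give the protocol. The building block most such schemes share is elliptic-curve Diffie-Hellman (ECDH), sketched below on the textbook toy curve y² = x³ + 2x + 2 over GF(17), whose generator (5, 1) has order 19. This is only an illustration under our own assumptions: the curve is far too small to be secure, hashing the shared point with SHA-256 stands in for a proper key-derivation function, and the paper's signature, fuzzy extractor, and biometric factors are not modeled.

```python
import hashlib

# Toy curve y^2 = x^3 + 2x + 2 over GF(17); G = (5, 1) generates a group of order 19.
# Illustration only: far too small to offer any security.
FIELD_P, CURVE_A = 17, 2
G = (5, 1)


def ec_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None                                            # P + (-P)
    if p1 == p2:
        lam = (3 * x1 * x1 + CURVE_A) * pow(2 * y1 % FIELD_P, -1, FIELD_P)  # tangent
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P)             # chord
    lam %= FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)


def ec_mul(k, point):
    """Right-to-left double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def session_key(own_secret, peer_public):
    """Hash the shared point (stand-in for a real key-derivation function)."""
    shared = ec_mul(own_secret, peer_public)
    return hashlib.sha256(repr(shared).encode()).hexdigest()


alice_secret, bob_secret = 3, 7
alice_pub, bob_pub = ec_mul(alice_secret, G), ec_mul(bob_secret, G)
print(session_key(alice_secret, bob_pub) == session_key(bob_secret, alice_pub))  # True
```

Both sides arrive at 21·G = 2·G = (6, 3) without ever transmitting their secrets; forward secrecy in real protocols comes from using fresh ephemeral secrets for each session.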
Test generation algorithm for QCA circuits targeting novel defects and its corresponding fault models
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 110, Article 105090 | Pub Date: 2024-08-30 | DOI: 10.1016/j.micpro.2024.105090
Vaishali Dhare, Usha Mehta
Abstract: Given the scaling limitations of current Complementary Metal Oxide Semiconductor (CMOS) technology, Quantum-dot Cellular Automata (QCA) is emerging as one of the alternatives. Because QCA operates at the molecular scale, defects are more likely to occur. QCA-oriented defect models, the corresponding fault models, and test generation therefore require substantial development. In this paper, a test generation algorithm for QCA combinational circuits is proposed. The FAN (fan-out-oriented) test generation algorithm is extended to QCA. The proposed Automatic Test Pattern Generator (ATPG) for QCA targets the set of Single Stuck-at Faults (SSF) produced by novel Multiple Missing Cells (MMC) defects. It is based on QCA-oriented test generation properties and guided by the proposed testability measures.
To check the effectiveness of the proposed ATPG, the MCNC benchmark circuits are synthesized into QCA using the proposed synthesis algorithms. The ATPG is developed in C++ and tested on the MCNC benchmark circuits. The ATPG-generated test vectors are further validated at the QCA device level to demonstrate their correctness, using the QCADesigner-E tool for the device-level implementation of the MCNC benchmark circuits.
Citations: 0
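In QCA, the majority gate MAJ(a, b, c) is the basic logic primitive, and AND and OR are obtained by fixing one input to 0 or 1. The sketch below finds a test vector for a single stuck-at fault by exhaustive search over a tiny majority-gate netlist. It is a brute-force stand-in for illustration, not the FAN-based ATPG the paper extends, and the netlist and fault names are hypothetical.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return int(a + b + c >= 2)


def circuit(inputs, stuck=None):
    """Hypothetical netlist: n1 = MAJ(a, b, c); out = MAJ(n1, d, 0), i.e. AND(n1, d).

    `stuck` optionally forces one net to a constant, e.g. ("n1", 0) for n1 stuck-at-0.
    """
    a, b, c, d = inputs
    n1 = maj(a, b, c)
    if stuck and stuck[0] == "n1":
        n1 = stuck[1]
    out = maj(n1, d, 0)
    if stuck and stuck[0] == "out":
        out = stuck[1]
    return out


def find_test(fault, n_inputs=4):
    """First input vector whose fault-free and faulty outputs differ, else None."""
    for vec in product((0, 1), repeat=n_inputs):
        if circuit(vec) != circuit(vec, stuck=fault):
            return vec
    return None


print(find_test(("n1", 0)))   # (0, 1, 1, 1): drives n1 to 1 and propagates it via d = 1
```

Exhaustive search needs up to 2ⁿ circuit evaluations, which is why structural algorithms such as FAN instead search the input space with implication and backtracing.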
Optimized k-Nearest neighbors search implementation on resource-constrained FPGA platforms
IF 1.9 | CAS Zone 4, Computer Science
Microprocessors and Microsystems, Vol. 109, Article 105089 | Pub Date: 2024-08-10 | DOI: 10.1016/j.micpro.2024.105089
Sandra Djosic, Milica Jovanovic, Goran Lj. Djordjevic
Abstract: The k-Nearest Neighbors (kNN) algorithm is a fundamental machine learning classification technique with wide-ranging applications. Among the various kNN implementation choices, FPGA-based heterogeneous systems have gained popularity because of the FPGA's inherent parallelism, energy efficiency, and reconfigurability. However, implementing kNN on resource-constrained embedded FPGA platforms, whose limited programmable resources are typically shared among several application-specific hardware units, requires a kNN accelerator architecture that balances high performance, hardware efficiency, and flexibility. To address this challenge, we present a kNN hardware accelerator unit that optimizes resource utilization by using sequential (accumulation-based) rather than pipelined/parallel distance computation. The proposed architecture incorporates two key algorithmic optimizations that reduce the iteration count of the sequential distance computation loop: a dynamic lower bound that enables early termination of the distance computation, and an online element selection that maximizes partial distance growth per iteration. We further enhance the accelerator's performance by incorporating multiple optimized sequential distance computation units, each dedicated to processing a segment of the training dataset. Our experiments demonstrate that the proposed approach is scalable and therefore applicable to various hardware platforms and resource constraints. In particular, when implemented on an AMD Zynq device, the proposed single-core kNN accelerator occupies only 5% of the FPGA's resources while delivering a speedup of 3 to 5 times over the kNN software implementation running on the accompanying ARM A9 processor. For the 8-core kNN accelerator, resource utilization is 30%, and the speedup ranges from 25 to 35 times.
Citations: 0
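The early-termination idea in this abstract resembles classic partial-distance search: while a squared distance is accumulated one element at a time, the partial sum is a lower bound on the final distance, so the computation for a training vector can stop as soon as it reaches the current k-th best distance. The sketch below models only that early termination in software; the paper's online element selection and multi-core partitioning are not reproduced, and the function name and multiply-accumulate counter are our own.

```python
import heapq


def knn_partial_distance(train, query, k):
    """k nearest neighbours (squared Euclidean) with early termination.

    Returns (sorted list of (distance, index), number of multiply-accumulate steps).
    """
    heap = []    # max-heap via negated distances: (-dist, index) of the best k so far
    macs = 0
    for idx, x in enumerate(train):
        # current k-th best distance; nothing can be pruned until k candidates exist
        threshold = -heap[0][0] if len(heap) == k else float("inf")
        acc = 0
        for q, v in zip(query, x):
            diff = q - v
            acc += diff * diff       # sequential, accumulation-based distance
            macs += 1
            if acc >= threshold:     # partial sum already rules x out
                break
        else:                        # loop ran to completion: x joins the best k
            if len(heap) < k:
                heapq.heappush(heap, (-acc, idx))
            else:
                heapq.heapreplace(heap, (-acc, idx))
    return sorted((-neg, idx) for neg, idx in heap), macs


train = [(0, 0), (1, 1), (5, 5), (2, 2), (10, 10)]
print(knn_partial_distance(train, (0, 0), 2))   # ([(0, 0), (2, 1)], 7)
```

Here the last three training vectors are each rejected after a single multiply-accumulate, so 7 steps are needed instead of the 10 a full computation would take.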