{"title":"A Reconfigurable Coarse-to-Fine Approach for the Execution of CNN Inference Models in Low-Power Edge Devices","authors":"Auangkun Rangsikunpum, Sam Amiri, Luciano Ost","doi":"10.1049/cdt2/6214436","DOIUrl":"https://doi.org/10.1049/cdt2/6214436","url":null,"abstract":"<div>\u0000 <p>Convolutional neural networks (CNNs) have evolved into essential components for a wide range of embedded applications due to their outstanding efficiency and performance. To efficiently deploy CNN inference models on resource-constrained edge devices, field programmable gate arrays (FPGAs) have become a viable processing solution because of their unique hardware characteristics, enabling flexibility, parallel computation and low-power consumption. In this regard, this work proposes an FPGA-based dynamic reconfigurable coarse-to-fine (C2F) inference of CNN models, aiming to increase power efficiency and flexibility. The proposed C2F approach first coarsely classifies related input images into superclasses and then selects the appropriate fine model(s) to recognise and classify the input images according to their bespoke categories. Furthermore, the proposed architecture can be reprogrammed to the original model using partial reconfiguration (PR) in case the typical classification is required. To efficiently utilise different fine models on low-cost FPGAs with area minimisation, ZyCAP-based PR is adopted. Results show that our approach significantly improves the classification process when object identification of only one coarse category of interest is needed. 
This approach can reduce energy consumption and inference time by up to 27.2% and 13.2%, respectively, which can greatly benefit resource-constrained applications.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2/6214436","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"E-Commerce Logistics Software Package Tracking and Route Planning and Optimization System of Embedded Technology Based on the Intelligent Era","authors":"Dan Zhang, Zhiyang Jia","doi":"10.1049/2024/6687853","DOIUrl":"https://doi.org/10.1049/2024/6687853","url":null,"abstract":"<div>\u0000 <p>In the Internet era, the e-commerce industry has risen, its development scale continues to expand, cross-border e-commerce (CBEC) has also been born, and it is now in the stage of sustainable development. The rapid development of CBEC also needs the strong support of logistics, the two are inseparable, and today, the development scale of CBEC is constantly expanding. The existing e-commerce logistics (ECL) model is also gradually unable to meet the increasingly diverse needs of users, and new logistics models need to be actively explored. To change this situation, this paper carried out a specific analysis of CBEC logistics model, and applied embedded technology to ECL, which also built a logistics tracking system. At the same time, combined with the ant colony algorithm, the paper carried out experimental research on the logistics package distribution route planning problem. From the experimental results, in terms of average delivery time, the algorithm’s result was 25.95 hr, while the traditional algorithm was 32.53 hr; in terms of average distribution freight cost, the algorithm’s result was 163.3 yuan, while the traditional algorithm was 257.7 yuan; in terms of average distribution cost, this algorithm’s result was 131.53 yuan, while the traditional algorithm was 211.68 yuan. 
To sum up, this algorithm could effectively optimize the distribution route of logistics packages and improve the efficiency of package transportation.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6687853","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142429696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Configurable Accelerator for CNN-Based Remote Sensing Object Detection on FPGAs","authors":"Yingzhao Shao, Jincheng Shang, Yunsong Li, Yueli Ding, Mingming Zhang, Ke Ren, Yang Liu","doi":"10.1049/2024/4415342","DOIUrl":"https://doi.org/10.1049/2024/4415342","url":null,"abstract":"<div>\u0000 <p>Convolutional neural networks (CNNs) have been widely used in satellite remote sensing. However, satellites in orbit with limited resources and power consumption cannot meet the storage and computing power requirements of current million-scale artificial intelligence models. This paper proposes a new generation of highly flexible and intelligent CNN hardware accelerator for satellite remote sensing in order to make its computing carrier more lightweight and efficient. A data quantization scheme for INT16 or INT8 is designed based on the idea of dynamic fixed-point numbers and is applied to different scenarios. The operation mode of the systolic array is divided into channel blocks, and the calculation method is optimized to increase the utilization of on-chip computing resources and enhance the calculation efficiency. An RTL-level CNN field programmable gate array accelerator with microinstruction sequence scheduling data flow is then designed. The hardware framework is built upon the Xilinx VC709. 
The results show that, under INT16 or INT8 precision, the system achieves remarkable throughput in most convolutional layers of the network, with an average performance of 153.14 giga operations per second (GOPS) or 301.52 GOPS, which is close to the system’s peak performance, taking full advantage of the platform’s parallel computing capabilities.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/4415342","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141435679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A FPGA Accelerator of Distributed A3C Algorithm with Optimal Resource Deployment","authors":"Fen Ge, Guohui Zhang, Ziyu Li, Fang Zhou","doi":"10.1049/2024/7855250","DOIUrl":"https://doi.org/10.1049/2024/7855250","url":null,"abstract":"<div>\u0000 <p>The asynchronous advantage actor-critic (A3C) algorithm is widely regarded as one of the most effective and powerful algorithms among various deep reinforcement learning algorithms. However, the distributed and asynchronous nature of the A3C algorithm brings increased algorithm complexity and computational requirements, which not only leads to an increased training cost but also amplifies the difficulty of deploying the algorithm on resource-limited field programmable gate array (FPGA) platforms. In addition, the resource wastage problem caused by the distributed training characteristics of A3C algorithms and the resource allocation problem affected by the imbalance between the computational amount of inference and training need to be carefully considered when designing accelerators. In this paper, we introduce a deployment strategy designed for distributed algorithms aimed at enhancing the resource utilization of hardware devices. Subsequently, an FPGA architecture is constructed specifically for accelerating the inference and training processes of the A3C algorithm. The experimental results show that our proposed deployment strategy reduces resource consumption by 62.5% and decreases the number of agents waiting for training by 32.2%, and the proposed A3C accelerator achieves speedups of 1.83× and 2.39× over a CPU (Intel i9-13900K) and a GPU (NVIDIA RTX 4090), respectively, with lower power consumption. 
Furthermore, our design shows superior resource efficiency compared to existing works.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/7855250","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141246095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient RTL Design for a Wearable Brain–Computer Interface","authors":"Tahereh Vasei, Mohammad Ali Saber, Alireza Nahvy, Zainalabedin Navabi","doi":"10.1049/2024/5596468","DOIUrl":"10.1049/2024/5596468","url":null,"abstract":"<div>\u0000 <p>This article proposes an efficient and accurate embedded motor imagery-based brain–computer interface (MI-BCI) that meets the requirements for wearable and real-time applications. To achieve a suitable accuracy considering hardware constraints, we explore BCI transducer algorithms, among which Infinite impulse response (IIR) filter, common spatial pattern, and support vector machine are used to <i>preprocess</i>, <i>extract features</i>, and <i>classify data</i>, respectively. With our hardware implementation of these tasks, we have achieved an accuracy of 77%. Our system is designed at register transfer level (RTL) targeting an ASIC implementation, which significantly decreases power consumption, latency, and area compared to the state-of-the-art (SoA) architectures for embedded BCI systems. To this end, we fold IIR filters using time-shared and RAM-based techniques and use hardware-friendly algorithms for the implementation of other tasks. The RTL design is realized on 45 nm CMOS technology consuming 4 mW power and 0.25 mm<sup>2</sup> area, which outperforms the SoA platforms for embedded BCI systems. 
To further illustrate the outperformance of our design, the proposed architecture is implemented on a Virtex-7 field programmable gate array as a prototyping platform, consuming 6 <i>μ</i>J of energy with 1.52% area utilization.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5596468","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140257988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Shrink and Shard Architecture Design for Blockchain Storage Efficiency","authors":"Daniel Soesanto, Igi Ardiyanto, Teguh Bharata Adji","doi":"10.1049/2024/2280828","DOIUrl":"10.1049/2024/2280828","url":null,"abstract":"<div>\u0000 <p>One of the problems in the blockchain is the formation of increasingly large data (big data) because each block must store all the transactions it makes. With the problem of the appearance of extensive data (big data), many studies aim to maintain the data in small amounts. This research combines a sorting data technique and a proper compression technique to obtain efficient data storage on the blockchain. The result of this research is a blockchain platform called Adaptive Shrink and Shard Blockchain (AS<sup>2</sup>BC), which conceptually and computationally can minimize the use of storage space in the blockchain up to 22 times smaller.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/2280828","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140443803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerated and Highly Correlated ASIC Synthesis of AI Hardware Subsystems Using CGP","authors":"H. C. Prashanth, Madhav Rao","doi":"10.1049/2024/6623637","DOIUrl":"10.1049/2024/6623637","url":null,"abstract":"<div>\u0000 <p>Unconventional functions, including activation functions and power functions, are extremely hard to realize, primarily due to the difficulty in arriving at the hierarchical design. The hierarchical design allows the synthesis tool to map the functionality with that of standard cells employed through the regular ASIC synthesis flow. For conventional functions, the hierarchical design is structured and then supplied to the synthesis flow, whereas, for unconventional functions, the same method is not reliable, since the current synthesis method does not offer any design-space exploration scheme to arrive at an easy-to-realize design entity. The unconventional functions either take a long synthesis run-time or additional efforts are spent in restructuring the hierarchical design for the desired function to synthesizable ones. Cartesian genetic programming (CGP) not only allows custom logic gates to be incorporated for synthesizing the hierarchical design but also aids in the design-space exploration for the targeted function through the custom gates. The CGP configuration evolves difficult-to-realize complex functions with multiple solutions, and filtering through desired Pareto-optimal requirements offers a unique hierarchical design. Incorporating CGP-derived hierarchical designs into the traditional synthesis flow is instrumental for implementing and evaluating higher-order designs comprising nonlinear functional constructs. Six activation functions and power functions that fall in the category of unconventional functions are realized by the CGP method using custom cells to demonstrate the capability. 
Further, the hierarchical design of these unconventional functions is flattened and compared with the same function that is directly synthesized using basic gates. The CGP-derived synthesis method reports 3× less synthesis time for realizing the complex functions at the hierarchical level compared to the synthesis using basic gate cells. Hardware characteristics and error metrics are also investigated for the CGP realized complex functions and are made freely available for further usage to the research and designers’ community.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"2024 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6623637","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140486715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-objective digital circuit block optimisation based on cell mapping in an industrial electronic design automation flow","authors":"Linan Cao, Simon J. Bale, Martin A. Trefzer","doi":"10.1049/cdt2.12062","DOIUrl":"https://doi.org/10.1049/cdt2.12062","url":null,"abstract":"<p>Modern electronic design automation (EDA) tools can handle the complexity of state-of-the-art electronic systems by decomposing them into smaller blocks or cells, introducing different levels of abstraction and staged design flows. However, throughout each independently optimised design step, overheads and inefficiencies can accumulate in the resulting overall design. Performing design-specific optimisation from a more global viewpoint requires more time due to the larger search space but has the potential to provide solutions with improved performance. In this work, a fully-automated, multi-objective (MO) EDA flow is introduced to address this issue. It specifically tunes drive strength mapping, prior to physical implementation, through MO population-based search algorithms. Designs are evaluated with respect to their power, performance and area (PPA). The proposed approach is aimed at digital circuit optimisation at the block level, where it is capable of expanding the design space and offers a set of trade-off solutions for different case-specific utilisation. We have applied the proposed multi-objective electronic design automation flow (MOEDA) framework to ISCAS-85 and EPFL benchmark circuits by using a commercial 65 nm standard cell library. 
The experimental results demonstrate how the MOEDA flow enhances the solutions initially generated by the standard digital flow and how simultaneously a significant improvement in PPA metrics is achieved.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"17 3-4","pages":"180-194"},"PeriodicalIF":1.2,"publicationDate":"2023-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12062","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50144641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and analysis of a novel fast adder using logical effort method","authors":"Hamid Tavakolaee, Gholamreza Ardeshir, Yasser Baleghi","doi":"10.1049/cdt2.12063","DOIUrl":"https://doi.org/10.1049/cdt2.12063","url":null,"abstract":"<p>Addition, as one of the fundamental math operations, is applied widely in Very-large-scale integration systems and digital signal processing, such that the computational speed of a system depends mainly on the computational speed of its adders. There are various types of digital adders based on different methods. A novel adder is proposed which performs addition based on a path with a fewer number of levels, and, hence, with higher computational speed and lower power consumption. The goal and innovation is to design a structured fast adder that has a block that can be expanded to higher bits, and in this design, the calculation speed and power consumption of the proposed circuit are optimal. Each proposed adder circuit has several levels, and the formulae of each level are stated. Each level of the circuit is designed with a number of multiplexers and OR gates. The performance of the proposed adder has been investigated and evaluated in two parts of mathematical calculations and simulation, and it has also been compared with other existing fast adders, such as ripple carry adder, carry skip adder, carry select adder, carry look ahead adder and prefix kogge-stone in cases of 8, 16, 32 and 64 bits. 
The results show that the proposed adder performs well compared to other adders in terms of the power consumption, power delay product and delay area product metrics.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"17 3-4","pages":"195-208"},"PeriodicalIF":1.2,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12063","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50137809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstructing a lightweight security protocol in the radio-frequency identification systems","authors":"Alireza Abdellahi Khorasgani, Mahdi Sajadieh, Mohammad Rouhollah Yazdani","doi":"10.1049/cdt2.12064","DOIUrl":"https://doi.org/10.1049/cdt2.12064","url":null,"abstract":"<p>Nowadays, the Internet of things (IoT) has extensively found its way into everyday life, raising the alarm regarding data security and user privacy. However, IoT devices have numerous limitations that inhibit the implementation of optimal cost-effective security solutions. In recent years, researchers have proposed a small number of RFID-based (radio-frequency identification) security solutions for the IoT. The use of RFID to secure IoT systems is growing rapidly, for it provides small-scale efficient security mechanisms. Due to the importance of privacy and security in IoT systems, Chuang and Tu have proposed a lightweight authentication protocol using XCor operation. The purpose is to investigate the security of the mentioned protocol and to show the problems of XCor operations used in this protocol. The authors reveal its vulnerability to various attacks, such as tag impersonation, reader impersonation and de-synchronisation attacks. To solve the problems of the Chuang protocol, a secure authentication protocol that uses the lightweight Plr operation is proposed. A formal security analysis of this protocol is performed based on the BAN (Burrows-Abadi-Needham) logic. Furthermore, a comparison was drawn between the proposed protocol and the existing similar protocols in terms of performance evaluation. 
The comparison will reveal that the proposed protocol is both lightweight and highly secure.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"17 3-4","pages":"209-223"},"PeriodicalIF":1.2,"publicationDate":"2023-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12064","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50132670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}