{"title":"In memory computation using quantum-dot cellular automata","authors":"Mrinal Goswami, Jayanta Pal, Mayukh Roy Choudhury, Pritam P. Chougule, Bibhash Sen","doi":"10.1049/iet-cdt.2020.0008","DOIUrl":"https://doi.org/10.1049/iet-cdt.2020.0008","url":null,"abstract":"Conventional computing systems face enormous pressure to cope with the rising demand for computing speed. In the search for high-speed computing in the nano-scale era, it is essential to explore viable alternatives that overcome the physical limits of complementary metal-oxide-semiconductor (CMOS) technology. In that direction, processing-in-memory (PIM) is gaining importance because it keeps computation as close as possible to memory. It promises to improve on the latencies of the conventional stored-program concept by embedding storage and data computation in a single unit. The bit-storing and processing capability of the Akers array provides the foundation of PIM. Quantum-dot cellular automata (QCA), in turn, is emerging as a promising nanoelectronic technology to replace CMOS and deliver fast devices. This work presents a novel PIM concept that embeds the Akers array in QCA to achieve high-speed computing at the nano-scale. A QCA implementation of universal logic using the Akers array demonstrates its processing power and potential. A universal function is used to test the effectiveness of the proposed PIM cell. The performance evaluation indicates the efficacy of QCA PIM over the conventional von Neumann architecture.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"336-343"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2020.0008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72190064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient parallelisation of the packet classification algorithms on multi-core central processing units using multi-threading application program interfaces","authors":"Mahdi Abbasi, Milad Rafiee","doi":"10.1049/iet-cdt.2019.0118","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0118","url":null,"abstract":"The categorisation of network packets according to multiple parameters such as sender and receiver addresses is called packet classification. Packet classification lies at the core of Software-Defined Networking (SDN)-based network applications. Due to the increasing speed of network traffic, there is an urgent need for packet classification at higher speeds. Although it is possible to accelerate packet classification algorithms through hardware implementation, this solution imposes high costs and offers limited development capacity. On the other hand, current software methods to solve this problem are relatively slow. A practical solution to this problem is to parallelise packet classification using multi-core processors. In this study, the Thread, parallel patterns library (PPL), open multi-processing (OpenMP), and threading building blocks (TBB) libraries are examined and implemented to parallelise three packet classification algorithms, i.e. tuple space search, tuple pruning search, and hierarchical tree. According to the results, the type of algorithm and rulesets may influence the performance of parallelisation libraries. In general, the TBB-based method shows the best performance among the parallelisation libraries owing to its work-stealing mechanism and can accelerate the classification process up to 8.3 times on a system with a quad-core processor.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"313-321"},"PeriodicalIF":1.2,"publicationDate":"2020-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0118","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72147016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High throughput and area-efficient FPGA implementation of AES for high-traffic applications","authors":"Karim Shahbazi, Seok-Bum Ko","doi":"10.1049/iet-cdt.2019.0179","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0179","url":null,"abstract":"This study presents a high throughput field-programmable gate array (FPGA) implementation of advanced encryption standard-128 (AES-128). AES is a well-known symmetric key encryption algorithm that offers high security against various attacks and is widely used in different applications. The main goal of this study is to design a cryptosystem with high throughput and high FPGA efficiency (FPGA-Eff) for high-traffic applications. To achieve high throughput, loop-unrolling and inner and outer pipelining techniques are employed. In AES, substitution bytes (Sub-Bytes) is one of the costly functions: it occupies a large number of resources and has a large delay. To reduce the area of Sub-Bytes, a new-affine-transformation, which is the combination of inverse isomorphic and affine transformation, is proposed and employed. Besides that, AES has been modified according to the proposed architecture. For the first nine rounds, Shift-Rows and Sub-Bytes have been exchanged, and Shift-Rows is merged with Add-Round-Key. To equalise the latency between stages, Mix-Columns is divided into two different stages. AES is implemented in counter mode on Xilinx Virtex-5 using VHDL. The proposed implementation achieves a throughput of 79.7 Gbps, FPGA-Eff of 13.3 Mbps/slice, and frequency of 622.4 MHz. Compared to the state-of-the-art work, the proposed design has improved data throughput by 8.02% and FPGA-Eff by 22.63%.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"344-352"},"PeriodicalIF":1.2,"publicationDate":"2020-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0179","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72147015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI implementation of anti-notch lattice structure for identification of exon regions in Eukaryotic genes","authors":"Vikas Pathak, Satyasai Jagannath Nanda, Amit Mahesh Joshi, Sitanshu Sekhar Sahu","doi":"10.1049/iet-cdt.2019.0086","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0086","url":null,"abstract":"In a Eukaryotic gene, identification of exon regions is crucial for protein formation. The periodic-3 property of exon regions has been used for their identification. An anti-notch infinite impulse response (IIR) filter is mostly employed to recognise this periodic-3 property. The lattice structure realisation of the anti-notch IIR filter requires less hardware than direct form-II structures. In this study, a hardware implementation of the IIR anti-notch filter lattice structure is carried out on a Zynq-series (Zybo board) field programmable gate array (FPGA). The performance of the hardware design has been improved using techniques like retiming, pipelining and unfolding and finally assessed on various Eukaryotic genes. The hardware implementation reduces the time frame to analyse the DNA sequence of Eukaryotic genes for protein formation, which plays a significant role in detecting individual diseases from genetic reports. The performance evaluation is carried out in the MATLAB simulation environment, and the results are found to be similar. Application-specific integrated circuit (ASIC) implementation of the anti-notch filter lattice structure is also carried out on CADENCE-RTL compiler. It is observed that the FPGA implementation is 31 to 34 times faster and the ASIC implementation is 58 to 64 times faster compared to the results generated by the MATLAB platform with similar prediction accuracy.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 5","pages":"217-229"},"PeriodicalIF":1.2,"publicationDate":"2020-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2019.0086","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72160389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lower complexity error location detection block of adjacent error correcting decoder for SRAMs","authors":"Raj Kumar Maity, Sayan Tripathi, Jagannath Samanta, Jaydeb Bhaumik","doi":"10.1049/iet-cdt.2019.0268","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0268","url":null,"abstract":"Multiple cell upsets (MCUs) caused by radiation are an important issue for the reliability of embedded static random access memories (SRAMs). Multiple random and adjacent error correcting codes have been extensively employed for several years to protect stored data in SRAMs against MCUs. A compact and fast error correcting codec is desirable in most of these applications. In this study, simplified expressions for the error location detection (ELD) block of single error correction-double error detection-double adjacent error correction (SEC-DED-DAEC) and single error correction-double error detection-triple adjacent error correction (SEC-DED-TAEC) decoders have been obtained by employing Karnaugh maps. The conventional SEC-DED-DAEC and SEC-DED-TAEC decoders have been designed and implemented on both field-programmable gate array (FPGA) and ASIC platforms using these simplified ELD expressions. On the FPGA platform, the proposed designs for SEC-DED-DAEC and SEC-DED-TAEC decoders achieve a 1.37–28.40% improvement in area and a maximum 14.74% improvement in delay compared to existing designs, whereas the ASIC-based designs provide a 2.20–26.81% reduction in area and a 0.30–28.96% reduction in delay compared to existing related works. The proposed design can therefore be considered an efficient alternative to traditional adjacent error correcting decoders in resource-constrained applications.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 5","pages":"210-216"},"PeriodicalIF":1.2,"publicationDate":"2020-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2019.0268","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72160390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerated LiDAR data processing algorithm for self-driving cars on the heterogeneous computing platform","authors":"Wei Li, Jun Liang, Yunquan Zhang, Haipeng Jia, Lin Xiao, Qing Li","doi":"10.1049/iet-cdt.2019.0166","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0166","url":null,"abstract":"In recent years, light detection and ranging (LiDAR) has been widely used in the field of self-driving cars, and the LiDAR data processing algorithm is the core algorithm used for environment perception in self-driving cars. At the same time, self-driving cars place high demands on the real-time performance of the LiDAR data processing algorithm. The LiDAR point cloud is characterised by its high density and uneven distribution, which poses a severe challenge in the implementation and optimisation of data processing algorithms. In view of the distribution characteristics of LiDAR data and the characteristics of the data processing algorithm, this study completes the implementation and optimisation of the LiDAR data processing algorithm on an NVIDIA Tegra X2 computing platform and greatly improves the real-time performance of LiDAR data processing algorithms. The experimental results show that compared with an Intel® Core™ i7 industrial personal computer, the optimised algorithm improves feature extraction by nearly 4.5 times, obstacle clustering by nearly 3.5 times, and the performance of the whole algorithm by 2.3 times.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 5","pages":"201-209"},"PeriodicalIF":1.2,"publicationDate":"2020-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0166","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72158194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design topologies with dual-Vth and dual-Tox assignment in 16 nm CMOS technology","authors":"Smita Singhal, Anu Mehra, Upendra Tripathi","doi":"10.1049/iet-cdt.2018.5211","DOIUrl":"https://doi.org/10.1049/iet-cdt.2018.5211","url":null,"abstract":"This study presents different topologies for the assignment of dual threshold voltage and dual gate oxide thickness in 16 nm complementary metal-oxide-semiconductor technology. The objective is to optimise the circuit in terms of static power dissipation, delay, and power-delay-product (pdp). Topologies namely direct, grouping, and divide-by-2 are simulated for and conventional 1-bit full adder circuits. Results of the proposed topologies are compared with some of the existing techniques of leakage reduction, i.e. dual-Vth, dual-Tox, and supply switching with ground collapse (SSGC). 1-bit full adder circuit using direct topology reduces static power to 99.98, 96.71, and 95.86% as compared to static power in dual-Vth, dual-Tox, and SSGC techniques, respectively. The pdp of the circuit is significantly improved using proposed topologies. Thus, these topologies can be used for low power and high-performance applications with no area overhead.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 4","pages":"176-186"},"PeriodicalIF":1.2,"publicationDate":"2020-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2018.5211","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72169499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-core hardware realisation of the quasi maximum likelihood PPS estimator","authors":"Nevena R. Brnović, Veselin N. Ivanović, Igor Djurović, Marko Simeunović","doi":"10.1049/iet-cdt.2019.0114","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0114","url":null,"abstract":"A multi-core hardware realisation of the quasi maximum likelihood algorithm, the state-of-the-art estimator of polynomial phase signals (PPSs), is proposed in this study. The developed multiple-clock-cycle realisation is suitable for real-time implementation. To prove this, the proposed design is implemented on a field programmable gate array circuit. The hardware realisation is tested and verified on PPSs corrupted with various amounts of Gaussian noise. The obtained results are compared with software simulations, showing an excellent match between the proposed system-based and the software-based outputs.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 5","pages":"187-192"},"PeriodicalIF":1.2,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0114","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72161844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area and power-efficient variable-length fast Fourier transform for MR-OFDM physical layer of IEEE 802.15.4-g","authors":"Ganjikunta Ganesh Kumar, Subhendu K. Sahoo","doi":"10.1049/iet-cdt.2018.5260","DOIUrl":"https://doi.org/10.1049/iet-cdt.2018.5260","url":null,"abstract":"The authors present a novel 16/32/64/128-point single-path delay feedback pipeline fast Fourier transform (FFT) architecture targeting the multi-rate and multi-regional orthogonal frequency division multiplexing (MR-OFDM) physical layer of IEEE 802.15.4-g. The proposed FFT architecture employs a mixed-radix algorithm to significantly reduce the number of complex multipliers. It utilises a configurable complex constant multiplier structure instead of a fixed constant multiplier to efficiently conduct , , and twiddle factor multiplication. A hardware-sharing mechanism has also been formulated to reduce the memory space requirements of the proposed 16/32/64/128-point FFT computation scheme. The proposed design is implemented in Xilinx Virtex-5 and Altera's field-programmable gate array devices. For the computation of 128-point FFT, the proposed mixed-radix FFT architecture significantly reduces the hardware cost in comparison with existing FFT architecture. The proposed FFT architecture is also implemented in 90 nm complementary metal-oxide-semiconductor technology with a supply voltage of 1 V. Post-synthesis results reveal that the design is efficient in terms of gate count and power consumption, compared to earlier reported designs. The proposed variable-length FFT architecture has a gate count of 22.3K and consumes 3.832 mW with a word length of 12 bits, and is well suited to the IEEE 802.15.4-g standard.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 5","pages":"193-200"},"PeriodicalIF":1.2,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2018.5260","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72161843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ternary DDCVSL: a combined dynamic logic style for standard ternary logic with single power source","authors":"Nooshin Azimi, Reza Faghih Mirzaee, Keivan Navi, Amir Masoud Rahmani","doi":"10.1049/iet-cdt.2019.0216","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0216","url":null,"abstract":"Every logic style has certain advantages for a specific application. Therefore, it is essential to introduce and investigate different logic styles. Differential cascode voltage switch logic (DCVSL), with its inherent redundancy, is known to be an ideal logic style for error detection applications. This study combines ternary static DCVSL (SDCVSL) with dynamic logic (DL) to realise ternary dynamic DCVSL (DDCVSL) by means of a single power source. At first, it is shown why the static-to-dynamic conversion method used in binary logic fails to operate correctly in ternary logic. Then, two solutions are given. Static power dissipation and switching activity are particularly dealt with in the second proposed ternary DDCVSL to reduce power consumption. The new designs are simulated and tested using the HSPICE simulator and the 32 nm Stanford carbon nanotube field effect transistor model. Simulation results and comparisons with a vast range of conventional and state-of-the-art competitors show prominence and great potential for the new ternary circuit methodology. For example, the authors' second proposed ternary DDCVSL AND/NAND has 19.7, 37.4, and 60.5% higher performance than some famous static ternary logic styles such as CMOS-like, SDCVSL, and pseudo N-type, respectively, in terms of energy consumption.","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 4","pages":"166-175"},"PeriodicalIF":1.2,"publicationDate":"2020-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2019.0216","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71968646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}