{"title":"Real-time speech enhancement using optimised empirical mode decomposition and non-local means estimation","authors":"Sagar Reddy Vumanthala, Bikshalu Kalagadda","doi":"10.1049/iet-cdt.2020.0034","DOIUrl":"https://doi.org/10.1049/iet-cdt.2020.0034","url":null,"abstract":"<div>\u0000 <p>In this study, the authors present a novel speech enhancement method by exploring the benefits of non-local means (NLM) estimation and optimised empirical mode decomposition (OEMD) adopting cubic-spline interpolation. The optimal parameters responsible for improving the performance are estimated using the path-finder algorithm. At first, the noisy speech signal is decomposed into many scaled signals called intrinsic-mode functions (IMFs) through the use of a temporary decomposition method is called sifting process in OEMD approach. The obtained IMFs are processed by NLM estimation technique in terms of non-local similarities present in each IMF, to reduce the ill-effects caused by interfering noise. The proposed NLM-based method is effective to eliminate the noise of less-frequency. Each IMF contains essential information about the signals, on some scale or frequency band. Field programmable gate array architecture is implemented on a Xilinx ISE 14.5 and the result of the proposed method offers good performance with a high signal-to-noise ratio (SNR) and low mean-square error compared to other approaches. The performance evolution is carried out for different speech signals taken from the TIMIT database and noises taken from the NOISEX-92 database in different SNR stages of 0, 5 and 10 dB, respectively.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"290-298"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2020.0034","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71984674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rectilinear routing algorithm for crosstalk minimisation in 2D and 3D IC","authors":"Khokan Mondal, Subhajit Das, Tuhina Samanta","doi":"10.1049/iet-cdt.2020.0010","DOIUrl":"https://doi.org/10.1049/iet-cdt.2020.0010","url":null,"abstract":"<div>\u0000 <p>The coupling capacitance and inductance of 2D and 3D integrated circuit (IC) interconnects in deep sub-micron technology has been increased due to reduced coupling distance in such a way that their magnitudes become comparable to the area and fringing capacitance of an interconnect. This leads to an increasing risk of failure due to unintentional noise and a need for accurate noise assessment. Incorrect noise estimation could either result in defects in circuit design if the design resources are understated or it will end up with a waste of overestimation resources. In this study, a crosstalk noise model for coupled RLC on-chip interconnects has been demonstrated. Subsequently, a novel time-efficient method is proposed to estimate and optimise the crosstalk noise precisely. The proposed method calculates coupling noise as well as optimises crosstalk noise, which has been validated using SPICE. Besides the estimation of crosstalk noise for 2D interconnect, this study also estimates the crosstalk noise for through-silicon-via (TSV), which is used to connect different dies vertically in a 3D IC. Under high-frequency operation, effects of signal rise time, TSV structure (height of the TSV), substrate resistivity and the guarding TSV termination on crosstalk noise have also been studied in this work.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"263-271"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2020.0010","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71984669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated planning for finding alternative bug traces","authors":"Rajib Lochan Jana, Soumyajit Dey, Arijit Mondal, Pallab Dasgupta","doi":"10.1049/iet-cdt.2019.0283","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0283","url":null,"abstract":"<div>\u0000 <p>Bug traces serve as references for patching a microprocessor design after a bug has been found. Unless the root cause of a bug has been detected and patched, variants of the bug may return through alternative bug traces, following a different sequence of micro-architectural events. To avoid such a situation, the verification engineer must think of every possible way in which the bug may return, which is a complex problem for a modern microprocessor. This study proposes a methodology which gleans high-level descriptions of the micro-architectural steps and uses them in an artificial Intelligence planning framework to find alternative pathways through which a bug may return. The plans are then translated to simulation test cases which explore these potential bug scenarios. The planning tool essentially automates the task of the verification engineer towards exploring possible alternative sequences of micro-architectural steps that may allow a bug to return. The proposed methodology is demonstrated in three case studies.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"322-335"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0283","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72157106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimised HEVC encoder intra-only configuration","authors":"Nejmeddine Bahri, Randa Khemiri","doi":"10.1049/iet-cdt.2019.0197","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0197","url":null,"abstract":"<div>\u0000 <p>High-efficiency video coding (HEVC) is the latest video coding standard aimed to reduce the bitrate by half for the same video quality compared to H.264/AVC. This encoding performance makes HEVC more suitable for high-definition video applications. However, this performance is coupled with a high-computational complexity, which makes it hard to achieve real-time video encoding with a classic embedded processor. Multicore technology of programmable processors could be a very promising solution to overcome this computational complexity. Moreover, software optimisations by proposing fast algorithms for the most complex functions could also be an efficient solution to speed up the encoding process. In this context, this study presents a fast mode decision algorithm for the intra prediction module. This algorithm aims to reduce the number of intra prediction modes to be tested instead of performing a full intra mode search. Experimental results for all-Intra configuration show that the proposed fast intra mode decision allows saving up to 46.79% of the intra prediction time in average. Encoding performance in terms of video quality and bitrate is not significantly affected.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"256-262"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0197","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71984672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient VLSI architectures of lifting based 3D discrete wavelet transform","authors":"M. Mohamed Asan Basiri","doi":"10.1049/iet-cdt.2020.0038","DOIUrl":"https://doi.org/10.1049/iet-cdt.2020.0038","url":null,"abstract":"<div>\u0000 <p>Discrete wavelet transform (DWT) is widely used in the image and video compression due to its high compression ratio and resolution. This study proposes efficient very large scale integration (VLSI) architectures of lifting based 3D-DWT using (5,3) and (9,7) Daubechies wavelets. The advantage of these proposed architectures is the absence of storage buffer in between the row, column, and temporal processes. Also, five and nine numbers of frames of the 3D signal can be processed in parallel using the proposed (5,3) and (9,7) lifting based DWTs, respectively. Due to this parallelism and the elimination of storage buffers, the throughput of the proposed design is greater than other existing techniques. The authors have implemented all the existing and proposed 3D-DWTs using 45 nm CMOS library with Cadence and Artix-7 FPGA with Xilinx Vivado. The synthesis results show that the proposed designs achieve significant improvement in throughput than various existing designs. For example, the proposed (9,7) lifting based 3D-DWT achieves 85.4% of improvement in the throughput than the conventional design.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"247-255"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2020.0038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71984673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards IP integration on SoC: a case study of high-throughput and low-cost wrapper design on a novel IBUS architecture","authors":"Xiaokun Yang, Shi Sha, Ishaq Unwala, Jiang Lu","doi":"10.1049/iet-cdt.2019.0090","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0090","url":null,"abstract":"<div>\u0000 <p>To integrate third-party intellectual properties (IPs) into a new system-on-chip (SoC) architecture is a big challenge. Therefore, this study first presents a new bus protocol named as integrated bus (IBUS), and more important, a configurable bus wrapper for connecting AXI3-interfaced IPs into IBUS is further proposed, aiming to finding the optimal balance between bus efficiency and resource cost in terms of field-programming gate array slice count, bus transfer latency, and energy consumption. As a case study, the authors implemented three IBUS wrappers for integrating three AXI3-interfaced verification IPs into an IBUS SoC. Experimental results show that their proposed work achieves a higher valid data throughput ( in the block test and in the cipher test) compared with the designs on conventional bridge-based SoC integration, as well as a large reduction in the normalised slice-time-power (18.73% in the block benchmark and 23.45% in the cipher benchmark) when setting the same weights of slice number, data transfer latency, and energy dissipation.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"353-362"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0090","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72157107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mrinal Goswami, Jayanta Pal, Mayukh Roy Choudhury, Pritam P. Chougule, Bibhash Sen
{"title":"In memory computation using quantum-dot cellular automata","authors":"Mrinal Goswami, Jayanta Pal, Mayukh Roy Choudhury, Pritam P. Chougule, Bibhash Sen","doi":"10.1049/iet-cdt.2020.0008","DOIUrl":"https://doi.org/10.1049/iet-cdt.2020.0008","url":null,"abstract":"<div>\u0000 <p>The conventional computing system has been facing enormous pressure to cope with the uprising demand for computing speed in today's world. In search of high-speed computing in the nano-scale era, it becomes the utmost necessity to explore a viable alternative to overcome the challenges of the physical limit of complementary-metal-oxide-semiconductor (CMOS). Towards that direction, the processing-in-memory (PIM) is advancing its importance as it keeps the computation as adjacent as possible to memory. It promises to outperform the latencies of the conventional stored-program concept by embedding storage and data computation in a single unit. On the other hand, the bit storing and processing capability of Akers array provides the foundation of PIM. Again, quantum-dot cellular automata (QCA) emerges as a promising nanoelectronic to put back CMOS to give fast-paced devices at the nanoelectronics era. This work presents a novel PIM concept, embedding Akers array in QCA to achieve high-speed computing at the nano-scale era. QCA implementation of universal logic utilizing Akers array signifies its processing power and puts forth its potentials. A universal function is considered for testing the effectiveness of the proposed PIM cell. The performance evaluation indicates the efficacy of QCA PIM over the conventional Von Neumann architecture.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"336-343"},"PeriodicalIF":1.2,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2020.0008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72190064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient parallelisation of the packet classification algorithms on multi-core central processing units using multi-threading application program interfaces","authors":"Mahdi Abbasi, Milad Rafiee","doi":"10.1049/iet-cdt.2019.0118","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0118","url":null,"abstract":"<div>\u0000 <p>The categorisation of network packets according to multiple parameters such as sender and receiver addresses is called packet classification. Packet classification lies at the core of Software-Defined Networking (SDN)-based network applications. Due to the increasing speed of network traffic, there is an urgent need for packet classification at higher speeds. Although it is possible to accelerate packet classification algorithms through hardware implementation, this solution imposes high costs and offers limited development capacity. On the other hand, current software methods to solve this problem are relatively slow. A practical solution to this problem is to parallelise packet classification using multi-core processors. In this study, the Thread, parallel patterns library (PPL), open multi-processing (OpenMP), and threading building blocks (TBB) libraries are examined and implemented to parallelise three packet classification algorithms, i.e. tuple space search, tuple pruning search, and hierarchical tree. According to the results, the type of algorithm and rulesets may influence the performance of parallelisation libraries. In general, the TBB-based method shows the best performance among parallelisation libraries due to using a theft mechanism and can accelerate the classification process up to 8.3 times on a system with a quad-core processor.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"313-321"},"PeriodicalIF":1.2,"publicationDate":"2020-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0118","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72147016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High throughput and area-efficient FPGA implementation of AES for high-traffic applications","authors":"Karim Shahbazi, Seok-Bum Ko","doi":"10.1049/iet-cdt.2019.0179","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0179","url":null,"abstract":"<div>\u0000 <p>This study presents a high throughput field-programmable gate array (FPGA) implementation of advanced encryption standard-128 (AES-128). AES is a well-known symmetric key encryption algorithm with high security against different attacks that are widely used in different applications. The main goal of this study is to design a high throughput and FPGA efficiency (FPGA-Eff) cryptosystem for high-traffic applications. To achieve high throughput, loop-unrolling, inner and outer pipelining techniques are employed. In AES, substitution bytes (Sub-Bytes) is one of the costly functions that occupy a large number of resources and has a large delay. To reduce the area of Sub-Bytes, new-affine-transformation, which is the combination of inverse isomorphic and affine transformation, is proposed and employed. Besides that, AES has been modified according to the proposed architecture. For the first nine rounds, Shift-Rows and Sub-Bytes have been exchanged, and Shift-Rows is merged with Add-Round-Key. To make an equal latency between stages, Mix-Columns is divided into two different stages. AES is implemented in counter mode on Xilinx Virtex-5 using VHDL. The proposed implementation achieves a throughput of 79.7 Gbps, FPGA-Eff of 13.3 Mbps/slice, and frequency of 622.4 MHz. Compared to the state-of-the-art work, the proposed design has improved data throughput by 8.02% and FPGA-Eff by 22.63%.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 6","pages":"344-352"},"PeriodicalIF":1.2,"publicationDate":"2020-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cdt.2019.0179","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72147015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI implementation of anti-notch lattice structure for identification of exon regions in Eukaryotic genes","authors":"Vikas Pathak, Satyasai Jagannath Nanda, Amit Mahesh Joshi, Sitanshu Sekhar Sahu","doi":"10.1049/iet-cdt.2019.0086","DOIUrl":"https://doi.org/10.1049/iet-cdt.2019.0086","url":null,"abstract":"<div>\u0000 <p>In a Eukaryotic gene, identification of exon regions is crucial for protein formation. The periodic-3 property of exon regions has been used for its identification. An anti-notch infinite impulse response (IIR) filter is mostly employed to recognise this periodic-3 property. The lattice structure realisation of anti-notch IIR filter requires less hardware over direct from-II structures. In this study, a hardware implementation of IIR anti-notch filter lattice structure is carried out on Zynq-series (Zybo board) field programmable gate array (FPGA). The performance of hardware design has been improved using techniques like retiming, pipelining and unfolding and finally assessed on various Eukaryotic genes. The hardware implementation reduces the time frame to analyse the DNA sequence of Eukaryotic genes for protein formation, which plays a significant role in detecting individual diseases from genetic reports. Here, the performance evaluation is carried out in MATLAB simulation environment and the results are found similar. Application-specific integrated circuit (ASIC) implementation of the anti-notch filter lattice structure is also carried out on CADENCE-RTL compiler. It is observed that the FPGA implementation is 31 to 34 times faster and ASIC implementation is 58 to 64 times faster compared to the results generated by MATLAB platform with similar prediction accuracy.</p>\u0000 </div>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"14 5","pages":"217-229"},"PeriodicalIF":1.2,"publicationDate":"2020-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1049/iet-cdt.2019.0086","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72160389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}