Lennart Bamberg;Ardalan Najafi;Alberto Garcia-Ortiz
{"title":"Exploiting Neural-Network Statistics for Low-Power DNN Inference","authors":"Lennart Bamberg;Ardalan Najafi;Alberto Garcia-Ortiz","doi":"10.1109/OJCAS.2024.3388210","DOIUrl":"10.1109/OJCAS.2024.3388210","url":null,"abstract":"Specialized compute blocks have been developed for efficient neural-network (NN) execution. However, due to the vast amount of data and parameter movements, the interconnects and on-chip memories form another bottleneck, impairing power and performance. This work addresses this bottleneck by contributing a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of the data and parameters of neural networks. Our approach reduces the power consumption of the logic, interconnect, and memory blocks used for data storage and movements by up to 80% for state-of-the-art benchmarks while providing additional power savings for the compute blocks by up to 39%. These power improvements are achieved with no loss of accuracy and negligible hardware cost.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"178-188"},"PeriodicalIF":0.0,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10498075","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140587370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices","authors":"Ci-Hao Wu;Tian-Sheuan Chang","doi":"10.1109/OJCAS.2024.3387849","DOIUrl":"10.1109/OJCAS.2024.3387849","url":null,"abstract":"Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure restricts model compression potential, resulting in greater complexity and reduced hardware efficiency. Additionally, these models are not tailored for streaming and low-power applications. Addressing these challenges, this paper proposes a low-power streaming speech enhancement accelerator through model and hardware optimization. The proposed high-performance model is optimized for hardware execution with the co-design of model compression and target application, reducing model size by 93.9% through the proposed domain-aware and streaming-aware pruning techniques. The required latency is further reduced with batch normalization-based transformers. Additionally, we employed softmax-free attention, complemented by an extra batch normalization, facilitating simpler hardware design. The tailored hardware accommodates these diverse computing patterns by breaking them down into element-wise multiplication and accumulation (MAC). This is achieved through a 1-D processing array, utilizing configurable SRAM addressing, thereby minimizing hardware complexities and simplifying zero skipping. Using the TSMC 40 nm CMOS process, the final implementation requires merely 207.8K gates and 53.75 KB SRAM. 
It consumes only 8.08 mW for real-time inference at a 62.5 MHz frequency.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"128-140"},"PeriodicalIF":0.0,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10496994","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140587361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vassilis Alimisis;Dimitrios G. Arnaoutoglou;Emmanouil Anastasios Serlis;Argyro Kamperi;Konstantinos Metaxas;George A. Kyriacou;Paul P. Sotiriadis
{"title":"A Radar-Based System for Detection of Human Fall Utilizing Analog Hardware Architectures of Decision Tree Model","authors":"Vassilis Alimisis;Dimitrios G. Arnaoutoglou;Emmanouil Anastasios Serlis;Argyro Kamperi;Konstantinos Metaxas;George A. Kyriacou;Paul P. Sotiriadis","doi":"10.1109/OJCAS.2024.3407663","DOIUrl":"10.1109/OJCAS.2024.3407663","url":null,"abstract":"A fall-detection system was implemented utilizing a 2.45 GHz continuous wave radar along with power-efficient and fully-analog integrated classifier architectures. The Power Burst Curve and the effective acceleration were derived from the short time Fourier transform, and then processed by the analog classifier. The proposed classifier architectures are based on different approximations of the Decision tree classification model. The architectures consist of three main building blocks: sigmoid function circuit, analog multiplier and an argmax operator circuit. To assess the hardware design, a thorough analysis is performed, comparing it to commonly used analog classifiers while exploiting the extracted data. The architectures were trained using Python and were compared to software-based classifiers. 
The circuit designs were executed using TSMC’s 90 nm CMOS process technology and the Cadence IC Suite was employed for tasks including design, schematic implementation, and post-layout simulations.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"224-242"},"PeriodicalIF":0.0,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10542293","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Javad Karimi;Menghe Jin;Catherine Dehollain;Alexandre Schmid
{"title":"A Wireless Power Conversion Chain With Fully On-Chip Automatic Resonance Tuning System for Biomedical Implants","authors":"Mohammad Javad Karimi;Menghe Jin;Catherine Dehollain;Alexandre Schmid","doi":"10.1109/OJCAS.2024.3382355","DOIUrl":"10.1109/OJCAS.2024.3382355","url":null,"abstract":"This paper presents a wireless power conversion system designed for biomedical implants, with integrated automatic resonance tuning. The automatic tuning mechanism improves power transfer efficiency (PTE) by finely tuning the resonant frequency of the power link and maximizing the rectified voltage. This adjustment ensures robust and reliable remote powering, even in the face of environmental changes and process variations, while also minimizing tissue exposure to power. On-chip switched array capacitors are connected in parallel with the resonant capacitor, and the system identifies the optimal switched capacitor combination for the highest rectified voltage by iterating over each of them. The proposed system is implemented and fabricated in standard 180 nm CMOS technology, with a total area of 0.339 mm², and its operation is verified. The measurement results demonstrate that this system provides tolerance to mismatches equivalent to up to 75 pF of capacitance variation in the LC tank (±15% LC variation in this design). The system offers a PTE enhancement from 9.1% to 30.2% in case of high LC variation, and the tuning control consumes 154.7 μW of power during resonance tuning. 
Moreover, the power conversion chain delivers an optimized rectified voltage along with a regulated voltage of 1.8 V.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"117-127"},"PeriodicalIF":0.0,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10481676","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140324283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Clustering Using Hyperdimensional Computing","authors":"Lulu Ge;Keshab K. Parhi","doi":"10.1109/OJCAS.2024.3381508","DOIUrl":"10.1109/OJCAS.2024.3381508","url":null,"abstract":"This paper addresses the clustering of data in the hyperdimensional computing (HDC) domain. In prior work, an HDC-based clustering framework, referred to as HDCluster, has been proposed. However, the performance of the existing HDCluster is not robust. The performance of HDCluster is degraded as the hypervectors for the clusters are chosen at random during the initialization step. To overcome this bottleneck, we assign the initial cluster hypervectors by exploring the similarity of the encoded data, referred to as query hypervectors. Intra-cluster hypervectors have a higher similarity than inter-cluster hypervectors. Harnessing the similarity results among query hypervectors, this paper proposes four HDC-based clustering algorithms: similarity-based k-means, equal bin-width histogram, equal bin-height histogram, and similarity-based affinity propagation. Experimental results illustrate that: (i) Compared to the existing HDCluster, our proposed HDC-based clustering algorithms can achieve better accuracy, more robust performance, fewer iterations, and less execution time. Similarity-based affinity propagation outperforms the other three HDC-based clustering algorithms on eight datasets by 2% ~ 38% in clustering accuracy. (ii) Even for one-pass clustering, i.e., without any iterative update of the cluster hypervectors, our proposed algorithms can provide more robust clustering accuracy than HDCluster. (iii) Over eight datasets, five out of eight can achieve higher or comparable accuracy when projected onto the hyperdimensional space. 
Traditional clustering is more desirable than HDC when the number of clusters, k, is large.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"102-116"},"PeriodicalIF":0.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10480378","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140314400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manish Srivastava;Alessandro Ferro;Aleksandr Sidun;José M. De La Rosa;Kilian O’Donoghue;Pádraig Cantillon-Murphy;Daniel O’Hare
{"title":"A Small-Area 2nd-Order Adder-Less Continuous-Time ΔΣ Modulator With Pulse Shaping FIR DAC for Magnetic Sensing","authors":"Manish Srivastava;Alessandro Ferro;Aleksandr Sidun;José M. De La Rosa;Kilian O’Donoghue;Pádraig Cantillon-Murphy;Daniel O’Hare","doi":"10.1109/OJCAS.2024.3378653","DOIUrl":"10.1109/OJCAS.2024.3378653","url":null,"abstract":"This work presents a small-area 2nd-order continuous-time ΔΣ Modulator (CTΔΣM) with a single low dropout regulator (LDO) serving as both the power supply for the CTΔΣM and reference voltage buffer. The CTΔΣM is used for digitising very low amplitude signals in applications such as magnetic tracking for image-guided and robotic surgery. A cascade of integrators in a feed-forward architecture implemented with an adder-less architecture has been proposed to minimise the silicon area. In addition, a novel continuous-time pulse-shaped digital-to-analog converter (CT-PS DAC) is proposed for excess loop delay (ELD) compensation to simplify the current drive requirements of the reference voltage buffer. This enables a single low-dropout (LDO) voltage regulator to generate both power supply and V_ref for the DAC. The circuit has been designed in 65-nm CMOS technology, achieving a peak 82-dB SNDR and 91-dB DR within a signal bandwidth of 20 kHz and the CTΔΣM consumes 300 μW of power when clocked at 10.24 MHz. 
The CTΔΣM achieves a state-of-the-art area of 0.07 mm².","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"42-54"},"PeriodicalIF":0.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475189","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140170286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StrideHD: A Binary Hyperdimensional Computing System Utilizing Window Striding for Image Classification","authors":"Dehua Liang;Jun Shiomi;Noriyuki Miura;Hiromitsu Awano","doi":"10.1109/OJCAS.2024.3401028","DOIUrl":"10.1109/OJCAS.2024.3401028","url":null,"abstract":"Hyperdimensional computing (HDC) is a brain-inspired learning approach for efficient and fast learning on today’s embedded devices. HDC first encodes all data points to high-dimensional vectors called hypervectors and then efficiently performs the classification task using a well-defined set of operations. Although HDC achieves reasonable performance in several practical tasks, it comes with huge memory requirements since each data point must be stored in a very long vector having thousands of bits. To alleviate this problem, we propose a novel HDC architecture, called StrideHD. By utilizing window striding in image classification, StrideHD enables an HDC system to be trained and tested using binary hypervectors and achieves high accuracy with fast training speed and significantly fewer hardware resources. StrideHD encodes data points to distributed binary hypervectors and eliminates the expensive Channel item Memory (CiM) and item Memory (iM) in the encoder, which significantly reduces the required hardware cost for inference. Our evaluation also shows that compared with two popular HD algorithms, the single-pass StrideHD model achieves a 27.6× and 8.2× reduction in inference memory cost without hurting the classification accuracy, while the iterative mode further provides 8.7× memory efficiency. 
Under the same inference memory cost, single-pass StrideHD achieves an average accuracy improvement of 13.56% over the single-pass baseline HD, which is comparable even to the costly iterative baseline HD models. As an extension, the iterative retraining mode of StrideHD provides an average accuracy improvement of 11.33% over its single-pass mode and requires fewer iterations than the baseline HD algorithms. The hardware implementation also demonstrates that StrideHD achieves reductions of over 9.9× and 28.8× in area and power, respectively, compared with the baseline.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"211-223"},"PeriodicalIF":0.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10530353","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141064078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Frequency Heterogeneity on Mutually Synchronized Spatially Distributed 24 GHz PLLs","authors":"Christian Hoyer;Jens Wagner;Frank Ellinger","doi":"10.1109/OJCAS.2024.3396336","DOIUrl":"10.1109/OJCAS.2024.3396336","url":null,"abstract":"This research analyzes the mutual self-organized synchronization of phase-locked loops (PLLs) in the presence of variations in the free-running frequency of a PLL. In contrast to traditional synchronization methods that rely on a reference signal, this study investigates the synchronization dynamics that arise solely from the interactions of PLL nodes within a network. Previous research has proposed theoretical frameworks that can predict the synchronized states of such designs. However, these frameworks do not account for the dynamic behavior that occurs during initial synchronization. To address this gap, this work proposes a constraint that refines the understanding of initial synchronization. The results of this analysis show that there is a maximum detuning between free-running frequencies up to which synchronization is possible. Furthermore, this analysis indicates that detuning not only affects the range of time delays at which stable synchronized states emerge between PLL nodes, but also limits the allowable range of initial phase differences for stable synchronization. In the cases studied, a frequency difference of 1.56% reduces the probability of achieving stable synchronized states through self-organized synchronization to 73.5%, while no stable synchronization can be achieved at a frequency difference greater than 2.65%. The study underscores the critical importance of operating ranges when implementing mutual coupling. In particular, all PLL nodes must have overlapping lock ranges to achieve stable synchronization. 
It also emphasizes the need for accurate analysis of hold and lock ranges in relation to the time delays between coupled PLL nodes.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"199-210"},"PeriodicalIF":0.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10517955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140836500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Welcome to the 5th Volume of the Open Journal of Circuits and Systems","authors":"Nicole McFarlane","doi":"10.1109/OJCAS.2024.3358107","DOIUrl":"https://doi.org/10.1109/OJCAS.2024.3358107","url":null,"abstract":"Welcome to the 5th volume of the Open Journal of Circuits and Systems (OJCAS). The Circuits and Systems Society’s Gold Open Access Journal is maturing, welcoming more submissions and getting our first impact factor score. I welcome our new Associate Editor in Chief, Alex James of Digital University Kerala in Trivandrum India to help mature the journal even more. As the journal matures, it is important to note that OJCAS covers all the topics of the society with the only exception being that it is open access. This means we hold submissions to the same quality standard as the other IEEE Journals. As soon as the paper is accepted, the paper is immediately available on IEEE Xplore and freely available to all researchers across the globe. In order to cover the cost of hosting the papers, as well as minimal editing and formatting, the article processing charges are indeed higher than traditional journals. Fortunately, many institutions have open access funds to cover this purpose and some funding agencies in certain countries mandate that research funded by those agencies be freely available to the public. In addition, IEEE has a waiver policy for authors from low and lower-middle income countries. 
More facts about open access for IEEE can be found at https://open.ieee.org/about/faqs/.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"1-1"},"PeriodicalIF":0.0,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10423924","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139704434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GC-Like LDPC Code Construction and its NN-Aided Decoder Implementation","authors":"Yu-Lun Hsu;Li-Wei Liu;Yen-Chin Liao;Hsie-Chia Chang","doi":"10.1109/OJCAS.2024.3363043","DOIUrl":"10.1109/OJCAS.2024.3363043","url":null,"abstract":"The trade-off between decoding performance and hardware costs has been a long-standing challenge in Low-Density Parity Check (LDPC) decoding. Based on model-driven methodology, the Neural Network-Aided Variable Weight Min-Sum (NN-aided vwMS) algorithm is proposed to address this dilemma in this paper. Our approach not only eliminates the second minimum value in the check node update process to reduce hardware complexity but also, with a fast-convergent shuffled scheduling method proposed to enhance convergence speed, maintains decoding performance similar to that of the traditional normalized min-sum algorithm. Unlike existing model-driven methodologies, which are suitable only for short codes, a Globally-Coupled Like (GC-like) LDPC code construction is presented to enable efficient training with simplified neural networks for longer LDPC codes. To demonstrate the capability of the NN-aided vwMS algorithm with the fast-convergent shuffled scheduling method, a GC-like (9126,8197) LDPC decoder is implemented for NAND flash applications, achieving a 6.56 Gbps throughput with a core area of 0.58 mm² in a TSMC 40-nm CMOS process, and average power consumption of 288 mW at a frame error rate of 2.64 × 10⁻⁵ at 4.5 dB. 
Our decoder architecture achieves a superior normalized throughput-to-area ratio of 11.31 Gbps/mm², demonstrating a 2.4× improvement over previous works.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"5 ","pages":"189-198"},"PeriodicalIF":0.0,"publicationDate":"2024-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10423290","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139956583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}