{"title":"Sparsity-Oriented MRAM-Centric Computing for Efficient Neural Network Inference","authors":"Jia-Le Cui;Yanan Guo;Juntong Chen;Bo Liu;Hao Cai","doi":"10.1109/TETC.2023.3326312","DOIUrl":"10.1109/TETC.2023.3326312","url":null,"abstract":"Near-memory computing (NMC) and in-memory computing (IMC) paradigms are important in non-von Neumann architectures. Spin-transfer torque magnetic random access memory (STT-MRAM) is considered a promising candidate to realize both NMC and IMC for resource-constrained applications. In this work, two MRAM-centric computing frameworks are proposed: triple-skipping NMC (TS-NMC) and analog-multi-bit-sparsity IMC (AMS-IMC). The TS-NMC exploits the sparsity of activations and weights to implement a write-read-calculation triple-skipping computing scheme by utilizing a sparse flag generator. The AMS-IMC, with a reconfigured computing bit-cell and flag generator, accommodates bit-level activation sparsity in the computing. The STT-MRAM array and its peripheral circuits are implemented with an industrial 28-nm CMOS design-kit and an MTJ compact model. The triple-skipping scheme can reduce memory access energy consumption by 51.5× when processing zero vectors, compared to processing non-zero vectors. The energy efficiency of AMS-IMC is improved by 5.9× and 1.5× (with 75% input sparsity) as compared to the conventional NMC framework and existing analog IMC framework. 
Verification results show that TS-NMC and AMS-IMC achieved 98.6% and 97.5% inference accuracy in MNIST classification, with energy consumption of 14.2 nJ/pattern and 12.7 nJ/pattern, respectively.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"97-108"},"PeriodicalIF":5.9,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135210898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Indexing Schemes for K-Dominant Skyline Analytics on Uncertain Edge-IoT Data","authors":"Chuan-Chi Lai;Hsuan-Yu Lin;Chuan-Ming Liu","doi":"10.1109/TETC.2023.3326295","DOIUrl":"10.1109/TETC.2023.3326295","url":null,"abstract":"Skyline queries typically search a Pareto-optimal set from a given data set to solve the corresponding multiobjective optimization problem. As the number of criteria increases, the skyline presumes excessive data items, which yield a meaningless result. To address this curse of dimensionality, we proposed a $k$-dominant skyline in which the number of skyline members was reduced by relaxing the restriction on the number of dimensions, considering the uncertainty of data. Specifically, each data item was associated with a probability of appearance, which represented the probability of becoming a member of the $k$-dominant skyline. As data items appear continuously in data streams, the corresponding $k$-dominant skyline may vary with time. Therefore, an effective and rapid mechanism of updating the $k$-dominant skyline becomes crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI) and All Indexing (AI), for $k$-dominant skyline in distributed edge-computing environments, where irrelevant data items can be effectively excluded from the compute to reduce the processing duration. Furthermore, the proposed schemes were validated with extensive experimental simulations. 
The experimental results demonstrated that the proposed MI and AI schemes reduced the computation time by approximately 13% and 56%, respectively, compared with the existing method.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"878-890"},"PeriodicalIF":5.1,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135058351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Ternary Logic Circuits Optimized by Ternary Arithmetic Algorithms","authors":"Guangchao Zhao;Zhiwei Zeng;Xingli Wang;Abdelrahman G. Qoutb;Philippe Coquet;Eby G. Friedman;Beng Kang Tay;Mingqiang Huang","doi":"10.1109/TETC.2023.3321050","DOIUrl":"10.1109/TETC.2023.3321050","url":null,"abstract":"Multi-valued logic (MVL) circuits, especially the ternary logic circuits, have attracted great attention in recent years due to their higher information density than binary logic systems. However, the basic construction method for MVL circuit standard cells and the CMOS fabrication possibility/compatibility issues are still to be addressed. In this work, we propose various ternary arithmetic circuits (adders and multipliers) with embedded ternary arithmetic algorithms to improve the efficiency. First, ternary cycling gates are designed to optimize both the arithmetic algorithms and logic circuits of ternary adders. Second, an optimized ternary Boolean truth table is used to reduce circuit complexity. Third, high-speed ternary Wallace tree multipliers are implemented with a task-dividing policy. Significant improvements in propagation delay and power-delay-product (PDP) have been achieved as compared with previous works. In particular, the ternary full adder shows 11 aJ PDP at 0.5 GHz, which is the best result among all the reported works using the same simulation platform. An average PDP improvement of 36.8% in the ternary multiplier is also achieved. 
Furthermore, the proposed methods have been successfully explored using standard CMOS 180nm silicon devices, indicating its great potential for the practical application of ternary computing in the near future.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"826-839"},"PeriodicalIF":5.1,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135058269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Diversities to Model the Reliability of Two-Version Machine Learning Systems","authors":"Fumio Machida","doi":"10.1109/TETC.2023.3322563","DOIUrl":"10.1109/TETC.2023.3322563","url":null,"abstract":"The N-version machine learning system (MLS) is an architectural approach to reduce error outputs from a system by redundant configuration using multiple machine learning (ML) modules. Improved system reliability achieved by N-version MLSs inherently depends on how diverse ML models are employed and how diverse input data sets are given. However, neither error input spaces of individual ML models nor input data distributions are obtainable in practice, which is a fundamental barrier to understanding the reliability improvement by N-version architectures. In this paper, we introduce two diversity measures quantifying the similarities of ML models’ capabilities and the interdependence of input data sets causing errors, respectively. The defined measures are used to formulate the reliability of an elemental N-version MLS called dependent double-modules double-inputs MLS. The system is assumed to fail when two ML modules output errors simultaneously for the same classification task. The reliabilities of different architecture options for this MLS are comprehensively analyzed through a compact matrix representation form of the proposed reliability model. The theoretical analysis and numerical results show that the architecture exploiting two diversities achieves preferable reliability under reasonable assumptions. 
Intuitive relations between diversity parameters and architecture reliabilities are also demonstrated through numerical examples.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"810-825"},"PeriodicalIF":5.1,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136303218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Invasive Reverse Engineering of One-Hot Finite State Machines Using Scan Dump Data","authors":"Zhaoxuan Dong;Aijiao Cui;Hao Lu","doi":"10.1109/TETC.2023.3322299","DOIUrl":"10.1109/TETC.2023.3322299","url":null,"abstract":"A finite-state machine (FSM) typically serves as the core control unit of a chip or a system. As a high-level design, the FSM has also been exploited to build multiple secure designs, as it is deemed hard to discern the FSM structure from the netlist or physical design. However, these secure designs no longer hold once the FSM structure is reverse engineered. Reverse engineering an FSM not only gives access to the control scheme of a design, but also poses a severe threat to those FSM-based secure designs. As the one-hot encoding FSM is widely adopted in various circuit designs, this paper proposes a non-invasive method to reverse engineer the one-hot encoding FSM. The data dumped from the scan chain during chip operation is first collected. The scan data is then used to identify all the candidate sets of state registers which satisfy two necessary conditions for one-hot state registers. The association relationships between the candidate registers and data registers are further evaluated to identify the unique target set of state registers. The transitions among FSM states are finally retrieved based on the scan dump data from those identified state registers. 
The experimental results on the benchmark circuits of different size show that this proposed method can identify all one-hot state registers exactly and the transitions can be retrieved at a high accuracy while the existing methods cannot achieve a satisfactory correct detection rate for one-hot encoding FSM.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"795-809"},"PeriodicalIF":5.1,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136257360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Neural Architecture Search With Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices","authors":"Alessio Burrello;Matteo Risso;Beatrice Alessandra Motetti;Enrico Macii;Luca Benini;Daniele Jahier Pagliari","doi":"10.1109/TETC.2023.3322033","DOIUrl":"10.1109/TETC.2023.3322033","url":null,"abstract":"The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. 
The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring non-inferior accuracy on state-of-the-art hand-tuned deep neural networks for TinyML.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"780-794"},"PeriodicalIF":5.1,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136207718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Chaotic Maps-Based Privacy-Preserving Distributed Deep Learning for Incomplete and Non-IID Datasets","authors":"Irina Arévalo;Jose L. Salmeron","doi":"10.1109/TETC.2023.3320758","DOIUrl":"10.1109/TETC.2023.3320758","url":null,"abstract":"Federated Learning is a machine learning approach that enables the training of a deep learning model among several participants with sensitive data that wish to share their own knowledge without compromising the privacy of their data. In this research, the authors employ a secured Federated Learning method with an additional layer of privacy and propose a method for addressing the non-IID challenge. Moreover, differential privacy is compared with chaotic-based encryption as a layer of privacy. The experimental approach assesses the performance of the federated deep learning model with differential privacy using both IID and non-IID data. In each experiment, the Federated Learning process improves the average performance metrics of the deep neural network, even in the case of non-IID data.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"357-367"},"PeriodicalIF":5.9,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136002626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memristive Crossbar Array-Based Adversarial Defense Using Compression","authors":"Bijay Raj Paudel;Spyros Tragoudas","doi":"10.1109/TETC.2023.3319659","DOIUrl":"10.1109/TETC.2023.3319659","url":null,"abstract":"This article shows that Memristive Crossbar Array (MCA)-based neuromorphic architectures provide a robust defense against adversarial attacks due to the stochastic behavior of memristors. Furthermore, it shows that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the effect of inter-chip process variations on adversarial robustness using the proposed MCA implementation and studies the effect of on-chip training. It shows that adversarial attacks do not uniformly affect the classification accuracy of different chips. Experimental evidence using a variety of datasets and attack models supports the impact of MCA-based neuromorphic architectures and compression-based preprocessing implemented using MCA on defending against adversarial attacks. It is also experimentally shown that the on-chip training results in high resiliency to adversarial attacks in all chips.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"864-877"},"PeriodicalIF":5.1,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135914085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling Coflows by Online Identification in Data Center Network","authors":"Chang Ruan;Jianxin Wang;Wanchun Jiang;Tao Zhang","doi":"10.1109/TETC.2023.3315512","DOIUrl":"10.1109/TETC.2023.3315512","url":null,"abstract":"Recently, many scheduling schemes leverage coflows to improve the communication performance of jobs in distributed application frameworks deployed in data center networks, such as MapReduce and Spark. Most of them require application modification to obtain the coflow information such as the coflow ID. The latest work CODA suggests non-intrusively extracting coflow information via an identification method. However, the method depends on the historical traffic information, which may cause the identification accuracy to decrease a lot when traffic varies. To tackle the problem, we present SOCI for Scheduling coflows by the Online Coflow Identification. By observing that flows in a coflow typically communicate with a master process for starting and ending in the up-to-date distributed application frameworks, SOCI uses this characteristic for the online coflow identification. Given identification errors are inevitable, the coflow scheduler in SOCI adopts a Selectively Late Binding (SLB) mechanism, which associates the misclassified flows with coflows according to the estimation on the impact of this association on the average Coflow Completion Time (CCT). 
The trace-driven simulations show that SOCI can reduce CCT by up to 1.23× compared to CODA when the identification accuracy decreases and is comparable to schemes without coflow identification.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"11 4","pages":"1057-1069"},"PeriodicalIF":5.9,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135843617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memristive Crossbar Array-Based Computing Framework via DWT for Biomedical Image Enhancement","authors":"Kumari Jyoti;Mohit Kumar Gautam;Sanjay Kumar;Sai Sushma;Ram Bilas Pachori;Shaibal Mukherjee","doi":"10.1109/TETC.2023.3318303","DOIUrl":"10.1109/TETC.2023.3318303","url":null,"abstract":"Here, we report the fabrication of Y₂O₃-based memristive crossbar array (MCA) by utilizing dual ion beam sputtering system, which shows high cyclic stability in the resistive switching behavior. Further, the obtained experimental results are validated with an analytical MCA based model, which exhibits extremely well fitting with the corresponding experimental data. Moreover, the experimentally validated analytical model is further used for biomedical image analysis, specifically computed tomography (CT) scan and magnetic resonance imaging (MRI) images by utilizing the 2-dimensional image decomposition technique. The different levels of decomposition are used for different threshold values which help to analyze the quality of the reconstructed image in terms of peak signal-to-noise ratio, structural similarity index and mean square error. For the MRI and CT scan images, at the first decomposition level, the data compression ratio of 21.01%, and 47.81% with Haar and 18.82%, and 46.05% with biorthogonal wavelet are obtained. Furthermore, the impact of brightness is also analyzed which shows a sufficient increment in the quality of output image by 103.72% and 18.59% for CT scan and MRI image, respectively for Haar wavelet. 
The proposed MCA based model for image processing is a novel approach to reduce the computation time and storage for biomedical engineering.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"766-779"},"PeriodicalIF":5.1,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135800896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}