{"title":"Area and Power Efficient FFT/IFFT Processor for FALCON Post-Quantum Cryptography","authors":"Ghada Alsuhli;Hani Saleh;Mahmoud Al-Qutayri;Baker Mohammad;Thanos Stouraitis","doi":"10.1109/TETC.2024.3407124","DOIUrl":"10.1109/TETC.2024.3407124","url":null,"abstract":"Quantum computing is an emerging technology on the verge of reshaping industries, while simultaneously challenging existing cryptographic algorithms. FALCON, a recent standard quantum-resistant digital signature, presents a challenging hardware implementation due to its extensive non-integer polynomial operations, necessitating FFT over the ring <inline-formula><tex-math>$\\mathbb {Q}[x]/(x^{n}+1)$</tex-math></inline-formula>. This paper introduces an ultra-low-power and compact processor tailored for FFT/IFFT operations over the ring for efficient FALCON implementation. The proposed processor incorporates various optimization techniques, including twiddle factor compression and conflict-free scheduling. In an ASIC implementation using a 22 nm GF process, the proposed processor demonstrates an area occupancy of 0.15 mm<inline-formula><tex-math>$^{2}$</tex-math></inline-formula> and a power consumption of 12.6 mW/28.1 mW at an operating frequency of 167 MHz/500 MHz for the non-pipelined/pipelined version of the processor. Since a hardware implementation of FFT/IFFT over the ring is currently non-existent, the execution time achieved by this processor is compared to the reference software implementation of FFT/IFFT of FALCON on a Raspberry Pi 4 with Cortex-A72, where the proposed pipelined processor achieves a speedup up to 3.8×. Furthermore, in comparison to dedicated state-of-the-art hardware accelerators for classic FFT, the pipelined architecture occupies 42% less area and consumes 64% less power, on average. 
The quantified speedup in the context of FALCON suggests that the proposed hardware design offers a promising solution for the efficient implementation of FALCON.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 2","pages":"423-437"},"PeriodicalIF":5.1,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Layer Personalized Federated Learning for Mitigating Biases in Student Predictive Analytics","authors":"Yun-Wei Chu;Seyyedali Hosseinalipour;Elizabeth Tenorio;Laura Cruz;Kerrie Douglas;Andrew S. Lan;Christopher G. Brinton","doi":"10.1109/TETC.2024.3407716","DOIUrl":"10.1109/TETC.2024.3407716","url":null,"abstract":"Conventional methods for student modeling, which involve predicting grades based on measured activities, struggle to provide accurate results for minority/ underrepresented student groups due to data availability biases. In this paper, we propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology that optimizes inference accuracy over different layers of student grouping criteria, such as by course and by demographic subgroups within each course. In our approach, personalized models for individual student subgroups are derived from a global model, which is trained in a distributed fashion via meta-gradient updates that account for subgroup heterogeneity while preserving modeling commonalities that exist across the full dataset. The evaluation of the proposed methodology considers case studies of two popular downstream student modeling tasks, knowledge tracing and outcome prediction, which leverage multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums) in model training. Experiments on three real-world online course datasets show significant improvements achieved by our approach over existing student modeling benchmarks, as evidenced by an increased average prediction quality and decreased variance across different student subgroups. 
Visual analysis of the resulting students’ knowledge state embeddings confirms that our personalization methodology extracts activity patterns clustered into different student subgroups, consistent with the performance enhancements we obtain over the baselines.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 2","pages":"451-466"},"PeriodicalIF":5.1,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing Throughput and Fair Execution of Multi-DNN Workloads on Heterogeneous Embedded Devices","authors":"Andreas Karatzas;Iraklis Anagnostopoulos","doi":"10.1109/TETC.2024.3407055","DOIUrl":"10.1109/TETC.2024.3407055","url":null,"abstract":"The rise of Deep Neural Networks (DNNs) has resulted in complex workloads employing multiple DNNs concurrently. This trend introduces unique challenges related to workload distribution, particularly in heterogeneous embedded systems. Current run-time managers struggle to efficiently utilize all computing components on these platforms, resulting in two major problems. First, the system throughput deteriorates due to contention on the computing resources. Second, not all DNNs are affected equally, leading to inconsistent performance levels across different models. To address these challenges, we introduce FairBoost, a framework for efficient and fair multi-DNN inference on heterogeneous embedded systems. FairBoost employs Reinforcement Learning (RL) to efficiently manage multi-DNN workloads. Additionally, it incorporates a novel numerical representation of DNN layers via a Vector Quantized Variational Auto-Encoder (VQ-VAE). Finally, it enables knowledge transfer to similar heterogeneous embedded systems without retraining and/or fine-tuning. Experimental evaluation of FairBoost over 18 DNNs and various multi-DNN scenarios shows an average throughput/fairness improvement of <inline-formula><tex-math>$\\times 3.24$</tex-math></inline-formula>. 
Additionally, FairBoost facilitates knowledge transfer from the initial platform, Orange Pi 5, to a new system, Odroid N2+, without any retraining or fine-tuning achieving similar gains.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 2","pages":"409-422"},"PeriodicalIF":5.1,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory","authors":"Mahdi Zahedi;Taha Shahroodi;Carlos Escuin;Georgi Gaydadjiev;Stephan Wong;Said Hamdioui","doi":"10.1109/TETC.2024.3406628","DOIUrl":"10.1109/TETC.2024.3406628","url":null,"abstract":"Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on energy and computing power. Contrary to conventional neural networks using floating-point datatypes, BNNs use binarized weights and activations to reduce memory and computation requirements. Memristors, emerging non-volatile memory devices, show great potential as a target implementation platform for BNNs by integrating storage and compute units. However, the efficiency of this hardware highly depends on how the network is mapped and executed on these devices. In this paper, we propose an efficient implementation of XNOR-based BNN to maximize parallelization. In this implementation, costly analog-to-digital converters are replaced with sense amplifiers with custom reference(s) to generate activation values. Besides, a novel mapping is introduced to minimize the overhead of data communication between convolution layers mapped to different memristor crossbars. This comes with extensive analytical and simulation-based analysis to evaluate the implication of different design choices considering the accuracy of the network. 
The results show that our approach achieves up to <inline-formula><tex-math>$5\\times$</tex-math></inline-formula> energy-saving and <inline-formula><tex-math>$100\\times$</tex-math></inline-formula> improvement in latency compared to baselines.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 2","pages":"395-408"},"PeriodicalIF":5.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Janus: A Trusted Execution Environment Approach for Attack Detection in Industrial Robot Controllers","authors":"Stefano Longari;Jacopo Jannone;Mario Polino;Michele Carminati;Andrea Zanchettin;Mara Tanelli;Stefano Zanero","doi":"10.1109/TETC.2024.3390435","DOIUrl":"10.1109/TETC.2024.3390435","url":null,"abstract":"In the last few decades, technological progress has led to a spike in the adoption of robots by the manufacturing industry. With the new “Industry 4.0” paradigm, companies strive to automate their production processes by interconnecting and integrating different industrial systems. The resulting increase in complexity contributes to a larger attack surface and paves the way for novel attacks. In the context of cyber-physical systems, consequences include economic and physical damage, as well as harm to human workers. In this article, we present Janus, a novel monitoring mechanism for industrial robot controllers that exploits the trusted execution environment (TEE) to guarantee the integrity of the attack detection algorithm even in case the controller's software is compromised, while not requiring external hardware for its detection process. In particular, we use the state observers strategy for detecting low-level controller (LLC) attacks. We assess our approach by testing it against various attacks, identifying those that are simpler to detect and pinpointing the more elusive ones, which are mostly detected nonetheless. 
Finally, we demonstrate that our approach does not add significant computation overheads.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"185-195"},"PeriodicalIF":5.1,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10508318","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140801415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HARPOCRATES: An Approach Towards Efficient Encryption of Data-at-Rest","authors":"Md Rasid Ali;Debranjan Pal;Abhijit Das;Dipanwita Roy Chowdhury","doi":"10.1109/TETC.2024.3387558","DOIUrl":"10.1109/TETC.2024.3387558","url":null,"abstract":"This paper proposes a new block cipher called HARPOCRATES, which is different from traditional SPN, Feistel, or ARX designs. The new design structure that we use is called the substitution convolution network. The novelty of the approach lies in that the substitution function does not use fixed S-boxes. Instead, it uses a key-driven lookup table storing a permutation of all 8-bit values. If the lookup table is sufficiently randomly shuffled, the round sub-operations achieve good confusion and diffusion to the cipher. While designing the cipher, the security, cost, and performances are balanced, keeping the requirements of encryption of data-at-rest in mind. The round sub-operations are massively parallelizable and designed such that a single active bit may make the entire state (an <inline-formula><tex-math>$8 \\times 16$</tex-math></inline-formula> binary matrix) active in one round. We analyze the security of the cipher against linear, differential, and impossible differential cryptanalysis. The cipher's resistance against many other attacks like algebraic attacks, structural attacks, and weak keys is also shown. We implemented the cipher in software and hardware and found that the software implementation of the cipher results in better throughput than many well-known ciphers. 
Although HARPOCRATES is appropriate for the encryption of data-at-rest, it is also well-suited in data-in-transit environments.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"173-184"},"PeriodicalIF":5.1,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140612836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LP-Star: Embedding Longest Paths into Star Networks With Large-Scale Missing Edges Under an Emerging Assessment Model","authors":"Xiao-Yan Li;Jou-Ming Chang","doi":"10.1109/TETC.2024.3387119","DOIUrl":"10.1109/TETC.2024.3387119","url":null,"abstract":"Star networks play an essential role in designing parallel and distributed systems. With the massive growth of faulty edges and the widespread applications of the longest paths and cycles, it is crucial to embed the longest fault-free paths and cycles in edge-faulty networks. However, the traditional fault model allows a concentrated distribution of faulty edges and thus can only tolerate faults that depend on the minimum degree of the network vertices. This article introduces an improved fault model called the partitioned fault model, which is an emerging assessment model for fault tolerance. Based on this model, we first explore the longest fault-free paths and cycles by proving the edge fault-tolerant Hamiltonian laceability, edge fault-tolerant strongly Hamiltonian laceability, and edge fault-tolerant Hamiltonicity in the <inline-formula><tex-math>$n$</tex-math></inline-formula>-dimensional star network <inline-formula><tex-math>$S_{n}$</tex-math></inline-formula>. Furthermore, based on the theoretical proof, we give an <inline-formula><tex-math>$O(nN)$</tex-math></inline-formula> algorithm to construct the longest fault-free paths in star networks based on the partitioned fault model, where <inline-formula><tex-math>$N$</tex-math></inline-formula> is the number of vertices in <inline-formula><tex-math>$S_{n}$</tex-math></inline-formula>. 
We also make comparisons to show that our result of edge fault tolerance has exponentially improved other known results.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"147-161"},"PeriodicalIF":5.1,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140612684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bio-Inspired Implementation of a Sparse-Learning Spike-Based Hippocampus Memory Model","authors":"Daniel Casanueva-Morato;Alvaro Ayuso-Martinez;Juan P. Dominguez-Morales;Angel Jimenez-Fernandez;Gabriel Jimenez-Moreno","doi":"10.1109/TETC.2024.3387026","DOIUrl":"10.1109/TETC.2024.3387026","url":null,"abstract":"The brain is capable of solving complex problems simply and efficiently, far surpassing modern computers. In this regard, neuromorphic engineering focuses on mimicking the basic principles that govern the brain in order to develop systems that achieve such computational capabilities. Within this field, bio-inspired learning and memory systems are still a challenge to be solved, and this is where the hippocampus is involved. It is the region of the brain that acts as a short-term memory, allowing the learning and storage of information from all the sensory nuclei of the cerebral cortex and its subsequent recall. In this work, we propose a novel bio-inspired hippocampal memory model with the ability to learn memories, recall them from a fragment of itself (cue) and even forget memories when trying to learn others with the same cue. This model has been implemented on SpiNNaker using Spiking Neural Networks, and a set of experiments were performed to demonstrate its correct operation. 
This work presents the first simulation implemented on a special-purpose hardware platform for Spiking Neural Networks of a fully functional bio-inspired spike-based hippocampus memory model, paving the road for the development of future more complex neuromorphic systems.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"119-133"},"PeriodicalIF":5.1,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10502330","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140612832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One-Spike SNN: Single-Spike Phase Coding With Base Manipulation for ANN-to-SNN Conversion Loss Minimization","authors":"Sangwoo Hwang;Jaeha Kung","doi":"10.1109/TETC.2024.3386893","DOIUrl":"10.1109/TETC.2024.3386893","url":null,"abstract":"As spiking neural networks (SNNs) are event-driven, energy efficiency is higher than conventional artificial neural networks (ANNs). Since SNN delivers data through discrete spikes, it is difficult to use gradient methods for training, limiting its accuracy. To keep the accuracy of SNNs similar to ANN counterparts, pre-trained ANNs are converted to SNNs (ANN-to-SNN conversion). During the conversion, encoding activations of ANNs to a set of spikes in SNNs is crucial for minimizing the conversion loss. In this work, we propose a single-spike phase coding as an encoding scheme that minimizes the number of spikes to transfer data between SNN layers. To minimize the encoding error due to single-spike approximation in phase coding, threshold shift and base manipulation are proposed. Without any additional retraining or architectural constraints on ANNs, the proposed conversion method does not lose inference accuracy (0.58% on average) verified on three convolutional neural networks (CNNs) with CIFAR and ImageNet datasets. In addition, graph convolutional networks (GCNs) are converted to SNNs successfully with an average accuracy loss of 0.90%. Most importantly, the energy efficiency of our SNN improves by 4.6<inline-formula><tex-math>$\\sim\\!\\!17.3\\times$</tex-math></inline-formula> compared to the ANN baseline.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"162-172"},"PeriodicalIF":5.1,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140612834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FakeTracer: Catching Face-Swap DeepFakes via Implanting Traces in Training","authors":"Pu Sun;Honggang Qi;Yuezun Li;Siwei Lyu","doi":"10.1109/TETC.2024.3386960","DOIUrl":"10.1109/TETC.2024.3386960","url":null,"abstract":"Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the original face in a video with a generated face of the target identity while retaining consistent facial attributes such as expression and orientation. Due to the high privacy of faces, the misuse of this technique can raise severe social concerns, drawing tremendous attention to defend against DeepFakes recently. In this article, we describe a new proactive defense method called FakeTracer to expose face-swap DeepFakes via implanting traces in training. Compared to general face-synthesis DeepFake, the face-swap DeepFake is more complex as it involves identity change, is subjected to the encoding-decoding process, and is trained unsupervised, increasing the difficulty of implanting traces into the training phase. To effectively defend against face-swap DeepFake, we design two types of traces, sustainable trace (STrace) and erasable trace (ETrace), to be added to training faces. During the training, these manipulated faces affect the learning of the face-swap DeepFake model, enabling it to generate faces that only contain sustainable traces. In light of these two traces, our method can effectively expose DeepFakes by identifying them. 
Extensive experiments corroborate the efficacy of our method on defending against face-swap DeepFake.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"134-146"},"PeriodicalIF":5.1,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140612835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}