Charles Gouert;Dimitris Mouris;Nektarios Georgios Tsoutsos
{"title":"Juliet: A Configurable Processor for Computing on Encrypted Data","authors":"Charles Gouert;Dimitris Mouris;Nektarios Georgios Tsoutsos","doi":"10.1109/TC.2024.3416752","DOIUrl":"10.1109/TC.2024.3416752","url":null,"abstract":"Fully homomorphic encryption (FHE) has become progressively more viable in the years since its original inception in 2009. At the same time, leveraging state-of-the-art schemes in an efficient way for general computation remains prohibitively difficult for the average programmer. In this work, we introduce a new design for a fully homomorphic processor, dubbed Juliet, to enable faster operations on encrypted data using the state-of-the-art TFHE and cuFHE libraries for both CPU and GPU evaluation. To improve usability, we define an expressive assembly language and instruction set architecture (ISA) judiciously designed for end-to-end encrypted computation. We demonstrate Juliet's capabilities with a broad range of realistic benchmarks including cryptographic algorithms, such as the lightweight ciphers Simon and Speck, as well as logistic regression (LR) inference and matrix multiplication.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2335-2349"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Fault-Tolerant Path Embedding for 3D Torus Network Using Locally Faulty Blocks","authors":"Weibei Fan;Fu Xiao;Mengjie Lv;Lei Han;Shui Yu","doi":"10.1109/TC.2024.3416695","DOIUrl":"https://doi.org/10.1109/TC.2024.3416695","url":null,"abstract":"3D tori are significant interconnection architectures in building supercomputers and parallel computing systems. Due to the rapid growth of edge faults and the crucial role of path structures in large-scale distributed systems, fault-tolerant path embedding and correlated issues have drawn widespread research attention. However, existing path embedding methods are based on traditional fault models, allowing all faults to be near the same node, so they usually only focus on theoretical proof and achieve only linear fault tolerance with respect to the dimension $n$. In order to improve the fault tolerance of the 3D torus, we first propose a novel conditional fault model called the Locally Faulty Block model (LFB model). On the basis of this model, the Hamiltonian paths with large-scale edge defects in the torus are investigated. After that, we construct a Hamiltonian path embedding algorithm, HP-LFB, for the torus with $O(N)$ under the LFB model, where $N$ is the number of nodes in the torus. Furthermore, we present an adaptive routing algorithm, HoeFA, which is based on the method of distance vector to limit the use of virtual channels (VCs). We also make a comparison with state-of-the-art schemes, indicating that our scheme improves the overall results. 
The experiments indicate that HP-LFB can sustain the dynamic degradation of the success rate of establishing Hamiltonian paths when the added faulty edges exceed the fault tolerance.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2305-2319"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141966295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuxuan Chen;Zhen Zhang;Yuhui Deng;Geyong Min;Lin Cui
{"title":"A Combined Trend Virtual Machine Consolidation Strategy for Cloud Data Centers","authors":"Yuxuan Chen;Zhen Zhang;Yuhui Deng;Geyong Min;Lin Cui","doi":"10.1109/TC.2024.3416734","DOIUrl":"10.1109/TC.2024.3416734","url":null,"abstract":"Virtual machine (VM) consolidation strategies are widely used in cloud data centers (CDC) to optimize resource utilization and reduce total energy consumption. Although existing strategies consider current and future resource utilization, the impact of sudden bursts in historical resource utilization on the hosts has been underestimated in uncertain future periods. Insufficient analysis of historical resource utilization may increase the risk of host overloading and Service Level Agreement Violation (SLAV). By defining historical and future trends based on resource utilization, we propose a novel combined trend VM consolidation (CTVMC) strategy which can effectively reduce energy consumption and SLAV. The VMs with the largest combined trend are selected for migration to prevent host overloading. Based on the temporal locality and prediction technique, CTVMC then employs the past, present, and future resource utilization to filter candidate hosts, and identifies the most complementary host to place VM using combined trends. We conduct extensive simulation experiments with PlanetLab Trace and Google Cluster Trace in the CloudSim simulator. Compared with the well-known strategies, CTVMC strategy using the PlanetLab Trace can reduce the number of migrations by over 72.39%, SLAV by over 75.85%, and ESV (a combined metric that judges the trade-off between energy consumption and SLAV) by over 81.54%. 
According to the Google Cluster Trace, our strategy can reduce the number of migrations by over 61.51%, SLAV by over 37.37%, and ESV by over 35.30%.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2150-2164"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Ding;Wei Tong;Yu Hua;Zhangyu Chen;Xueliang Wei;Dan Feng
{"title":"Enabling Reliable Memory-Mapped I/O With Auto-Snapshot for Persistent Memory Systems","authors":"Bo Ding;Wei Tong;Yu Hua;Zhangyu Chen;Xueliang Wei;Dan Feng","doi":"10.1109/TC.2024.3416683","DOIUrl":"10.1109/TC.2024.3416683","url":null,"abstract":"Persistent memory (PM) is promising to be the next-generation storage device with better I/O performance. Since the traditional I/O path is too lengthy to drive PM featuring low latency and high bandwidth, prior works proposed memory-mapped I/O (MMIO) to shorten the I/O path to PM. However, native MMIO directly maps files into the user address space, which puts files at risk of being corrupted by scribbles and non-atomic I/O interfaces, causing serious reliability issues. To address these issues, we propose RMMIO, an efficient user-space library that provides reliable MMIO for PM systems. RMMIO provides atomic I/O interfaces and lightweight snapshots to ensure the reliability of MMIO. Compared with existing schemes, RMMIO mitigates additional writes and extra software overheads caused by reliability guarantees, thus achieving MMIO-like performance. In addition, we also propose an automatic snapshot with efficient memory management for RMMIO to minimize data loss incurred by reliability issues. The experimental results of microbenchmarks show that RMMIO achieves 8.49x and 2.31x higher throughput than ext4-DAX and the state-of-the-art MMIO-based scheme, respectively, while ensuring data reliability. 
The real-world application accelerated by RMMIO achieves at most 7.06x higher throughput than that of ext4-DAX.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2290-2304"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highly Evasive Targeted Bit-Trojan on Deep Neural Networks","authors":"Lingxin Jin;Wei Jiang;Jinyu Zhan;Xiangyu Wen","doi":"10.1109/TC.2024.3416705","DOIUrl":"10.1109/TC.2024.3416705","url":null,"abstract":"Bit-Trojan attacks based on Bit-Flip Attacks (BFAs) have emerged as severe threats to Deep Neural Networks (DNNs) deployed in safety-critical systems since they can inject Trojans during the model deployment stage without accessing training supply chains. Existing works are mainly devoted to improving the executability of Bit-Trojan attacks, while largely ignoring evasiveness concerns. In this paper, we propose a highly Evasive Targeted Bit-Trojan (ETBT) with evasiveness improvements from three aspects, i.e., reducing the number of bit-flips (improving executability), smoothing activation distribution, and reducing accuracy fluctuation. Specifically, key neuron extraction is utilized to identify essential neurons from DNNs precisely and decouple the key neurons between different classes, thus improving the evasiveness regarding accuracy fluctuation and executability. Additionally, activation-constrained trigger generation is devised to eliminate the differences between activation distributions of Trojaned and clean models, which enhances evasiveness from the perspective of activation distribution. Finally, a constrained target-bit search strategy is designed to reduce the number of bit-flips, which directly benefits the evasiveness of ETBT. Benchmark-based experiments are conducted to evaluate the superiority of ETBT. Compared with existing works, ETBT can significantly improve evasiveness-relevant performances with much lower computation overheads, better robustness, and generalizability. 
Our code is released at https://github.com/bluefier/ETBT.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2350-2363"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hiding in Plain Sight: Adversarial Attack via Style Transfer on Image Borders","authors":"Haiyan Zhang;Xinghua Li;Jiawei Tang;Chunlei Peng;Yunwei Wang;Ning Zhang;Yingbin Miao;Ximeng Liu;Kim-Kwang Raymond Choo","doi":"10.1109/TC.2024.3416761","DOIUrl":"10.1109/TC.2024.3416761","url":null,"abstract":"Deep Convolution Neural Networks (CNNs) have become the cornerstone of image classification, but the emergence of adversarial image attacks brings serious security risks to CNN-based applications. As a local perturbation attack, the border attack can achieve high success rates by only modifying the pixels around the border of an image, which is a novel attack perspective. However, existing border attacks have shortcomings in stealthiness and are easily detected. In this article, we propose a novel stealthy border attack method based on deep feature alignment. Specifically, we propose a deep feature alignment algorithm based on style transfer to guarantee the stealthiness of adversarial borders. The algorithm takes the deep feature difference between the adversarial and the original borders as the stealthiness loss and thus ensures good stealthiness of the generated adversarial images. To ensure high attack success rates simultaneously, we apply cross entropy to design the targeted attack loss and use margin loss as well as Leaky ReLU to design the untargeted attack loss. Experiments show that the structural similarity between the generated adversarial images and the original images is 8.8% higher than the state-of-art border attack method, indicating that our proposed adversarial images have better stealthiness. 
At the same time, the success rate of our attack in the face of defense methods is much higher, which is about four times that of the state-of-art border attack under the adversarial training defense.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 10","pages":"2405-2419"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qilin Hu;Yan Ding;Chubo Liu;Keqin Li;Kenli Li;Albert Y. Zomaya
{"title":"CBANA: A Lightweight, Efficient, and Flexible Cache Behavior Analysis Framework","authors":"Qilin Hu;Yan Ding;Chubo Liu;Keqin Li;Kenli Li;Albert Y. Zomaya","doi":"10.1109/TC.2024.3416747","DOIUrl":"10.1109/TC.2024.3416747","url":null,"abstract":"Cache miss analysis has become one of the most important means of improving the execution performance of a program. Generally, the approaches for analyzing cache misses can be categorized into dynamic analysis and static analysis. The former collects sampling statistics during program execution but is limited by its need for specialized hardware support and incurs expensive execution overhead. The latter avoids these limitations but faces two challenges: inaccurate execution path prediction and inefficient analysis resulting from the explosion of the program state graph. To overcome these challenges, we propose CBANA, an LLVM- and process address space-based lightweight, efficient, and flexible cache behavior analysis framework. CBANA significantly improves the prediction accuracy of the execution path with awareness of inputs. To improve analysis efficiency and utilize the program preprocessing, CBANA refactors loop structures to reduce search space and dynamically splices intermediate results to reduce unnecessary or redundant computations. CBANA also supports configurable hardware parameter settings, and decouples the module of cache replacement policy from other modules. Thus, its flexibility is established. We evaluate CBANA by using the popular open benchmark PolyBench, graph workloads, and our synthetic workloads with good and poor data locality. Compared with the popular dynamic cache analysis tools Perf and Valgrind, the cache miss gap is less than 3.79% and 2.74% respectively with over ten thousand data accesses for the synthetic workloads, and the time reduction is up to 92.38% and 97.51% for the multiple-path workloads. 
Compared with the popular static cache analysis tool Heptane, CBANA achieves a time reduction of 97.71% while ensuring accuracy at the same time.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2262-2274"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Support Vector Machine for Classifying Noisy Data","authors":"Jiaye Li;Yangding Li;Jiagang Song;Jian Zhang;Shichao Zhang","doi":"10.1109/TC.2024.3416619","DOIUrl":"10.1109/TC.2024.3416619","url":null,"abstract":"Noisy data is ubiquitous in quantum computers, greatly affecting the performance of various algorithms. However, existing quantum support vector machine models are not equipped with anti-noise ability, and often deliver low performance when learning accurate hyperplane normal vectors from noisy data. To address this issue, an anti-noise quantum support vector machine algorithm is developed in this paper. Specifically, a weight factor is first embedded into the hinge loss, so as to construct the objective function of the anti-noise support vector machine. Then, an alternative iterative optimization strategy and a quantum circuit are designed for solving the objective function, aiming to obtain the normal vector and intercept of the hyperplane that finally divides the data. Finally, the classification and anti-noise effect of the algorithm are verified on artificial and public datasets. Experimental results show that the proposed algorithm is efficient and maintains stable accuracy on noisy data.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2233-2247"},"PeriodicalIF":3.6,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yun-Chen Lo;Jun-Shen Wu;Chia-Chun Wang;Yu-Chih Tsai;Chih-Chen Yeh;Wen-Chien Ting;Ren-Shuo Liu
{"title":"ISSA: Architecting CNN Accelerators Using Input-Skippable, Set-Associative Computing-in-Memory","authors":"Yun-Chen Lo;Jun-Shen Wu;Chia-Chun Wang;Yu-Chih Tsai;Chih-Chen Yeh;Wen-Chien Ting;Ren-Shuo Liu","doi":"10.1109/TC.2024.3404060","DOIUrl":"10.1109/TC.2024.3404060","url":null,"abstract":"Among several emerging architectures, computing in memory (CIM), which features in-situ analog computation, is a potential solution to the data movement bottleneck of the Von Neumann architecture for artificial intelligence (AI). Interestingly, more strengths of CIM significantly different from in-situ analog computation are not widely known yet. In this work, we point out that mutually stationary vectors (MSVs), which can be maximized by introducing associativity to CIM, are another inherent power unique to CIM. By MSVs, CIM exhibits significant freedom to dynamically vectorize the stored data (e.g., weights) to perform agile computation using the dynamically formed vectors. We have designed and realized an SA-CIM silicon prototype and corresponding architecture and acceleration schemes in the TSMC 28 nm process. More specifically, the contributions of this paper are fivefold: 1) We identify MSVs as new features that can be exploited to improve the current performance and energy challenges of the CIM-based hardware. 2) We propose SA-CIM to enhance MSVs (input-reordering flexibility) for skipping the zeros, small values, and sparse vectors. 3) We propose channel swapping to enhance the zero-skipping technique. 4) We propose a transposed systolic dataflow to efficiently conduct conv3$\\times$3 while being capable of exploiting input-skipping schemes. 5) We propose a design flow to search for optimal aggressive skipping scheme setups while satisfying the accuracy loss constraint. 
The proposed ISSA architecture improves the throughput by $1.91\\times$ to $2.97\\times$ speedup and the energy efficiency by $2.5\\times$ to $4.2\\times$.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2136-2149"},"PeriodicalIF":3.6,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobility-Aware Utility Maximization in Digital Twin-Enabled Serverless Edge Computing","authors":"Jing Li;Song Guo;Weifa Liang;Jianping Wang;Quan Chen;Wenchao Xu;Kang Wei;Xiaohua Jia","doi":"10.1109/TC.2024.3388897","DOIUrl":"10.1109/TC.2024.3388897","url":null,"abstract":"Driven by data and models, the digital twin technique presents a new concept of optimizing system design, process monitoring, decision-making and more, through performing comprehensive virtual-reality interaction and continuous mapping. By introducing serverless computing to Mobile Edge Computing (MEC) environments, the emerging serverless edge computing paradigm facilitates the communication-efficient digital twin services, and promises agile, fine-grained and cost-efficient provisioning of limited edge resources, where serverless functions are implemented by containers in cloudlets (edge servers). However, the nonnegligible cold start delay of containers deteriorates the responsiveness of digital twin services dramatically and the perceived user service experience. In this paper, we investigate delay-sensitive query service provisioning in digital twin-empowered serverless edge computing by considering user mobility. With digital twins of users deployed in the remote cloud, referred to as primary digital twins, we deploy their digital twin replicas based on serverless functions in cloudlets to mitigate the query service delay while enhancing user service satisfaction that is expressed as a utility function. We study two optimization problems with the aim of maximizing the accumulative utility gain: the digital twin replica placement problem per time slot, and the dynamic digital twin replica placement problem over a finite time horizon. We first formulate an Integer Linear Program (ILP) solution for the digital twin replica placement problem when the problem size is small; otherwise, we propose an approximation algorithm for the problem with a provable approximation ratio. 
We then design an online algorithm for the dynamic digital twin replica placement problem, and a performance-guaranteed online algorithm for a special case of the problem by assuming each user issues a query at each time slot. Finally, we evaluate the performance of the proposed algorithms for placing digital twin replicas in MEC networks through simulations. The results demonstrate the proposed algorithms are promising, outperforming their counterparts.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1837-1851"},"PeriodicalIF":3.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}