{"title":"Information Sharing in Multi-Tenant Metaverse via Intent-Driven Multicasting","authors":"Yu Qiu;Min Chen;Weifa Liang;Lejun Ai;Dusit Niyato","doi":"10.1109/TC.2025.3603720","DOIUrl":"https://doi.org/10.1109/TC.2025.3603720","url":null,"abstract":"A multi-tenant metaverse enables multiple users in a common virtual world to interact with each other online. Information sharing will occur when interactions between a user and the environment are multicast to other users by an interactive metaverse (IM) service. However, ineffective information-sharing strategies intensify competitions among users for limited resources in networks, and fail to interpret optimization intent prompts conveyed in high-level natural languages, ultimately diminishing user immersion. In this paper, we explore reliable information sharing in a multi-tenant metaverse with time-varying resource capacities and costs, where IM services are unreliable and alter the volumes of data processed by them, while the service provider dynamically adjusts global intent to minimize multicast delays and costs. To this end, we first formulate the information sharing problem as a Markov decision process and show its NP-hardness. Then, we propose a learning-based system GTP, which combines the proximal policy optimization reinforcement learning with feature extraction networks, including graph attention network and gated recurrent unit, and a Transformer encoder for multi-feature comparison to process a sequence of incoming multicast requests without the knowledge of future arrival information. The GTP operates through three modules: a deployer that allocates primary and backup IM services across the network to minimize a weighted goal of server computation costs and communication distances between users and services, an intent extractor that dynamically infers provider intent conveyed in natural language, and a router that constructs on-demand multicast routing trees adhering to users, the provider, and network constraints. We finally conduct theoretical and empirical analysis on the proposed algorithms for the system. Experimental results show that the proposed algorithms are promising, and superior to their comparison baseline algorithms.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 11","pages":"3763-3777"},"PeriodicalIF":3.8,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145248083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Computation-Quantized Training Framework to Generate Accuracy Lossless QNNs for One-Shot Deployment in Embedded Systems","authors":"Xingzhi Zhou;Wei Jiang;Jinyu Zhan;Lingxin Jin;Lin Zuo","doi":"10.1109/TC.2025.3603732","DOIUrl":"https://doi.org/10.1109/TC.2025.3603732","url":null,"abstract":"Quantized Neural Networks (QNNs) have received increasing attention, since they can enrich intelligent applications deployed on embedded devices with limited resources, such as mobile devices and AIoT systems. Unfortunately, the numerical and computational discrepancies between training systems (i.e., servers) and deployment systems (e.g., embedded ends) may lead to large accuracy drop for QNNs in real deployments. We propose a Computation-Quantized Training Framework (CQTF), which simulates deployment-time fixed-point computation during training to enable one-shot, lossless deployment. The training procedure of CQTF is built upon a well-formulated quantization-specific numerical representation that quantifies both numerical and computational discrepancies between training and deployment. Leveraging this representation, forward propagation executes all computations in quantization mode to simulate deployment-time inference, while backward propagation identifies and mitigates gradient vanishing through an efficient floating-point gradient update scheme. Benchmark-based experiments demonstrate the efficiency of our approach, which can achieve no accuracy loss from training to deployment. Compared with existing five frameworks, the deployed accuracy of CQTF can be improved by up to 18.41%.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 11","pages":"3818-3831"},"PeriodicalIF":3.8,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145248076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Bin Packing With Heterogeneous Dependent Bins for Regionless in Geo-Distributed Clouds","authors":"Yinuo Li;Jin-Kao Hao;Liwei Song","doi":"10.1109/TC.2025.3602297","DOIUrl":"https://doi.org/10.1109/TC.2025.3602297","url":null,"abstract":"Cloud service providers use geo-distributed datacenters to provide resources and services to clients located in different regions. However, uneven population density leads to unbalanced development of geo-distributed datacenters and cloud service providers face a shortage of land resources to further develop datacenters in densely populated regions. Thus, it is a real challenge for cloud service providers to meet the increasing demand from clients in affluent regions with saturated resources and to better utilize underutilized data centers in other regions. To address this challenge, we study an online resource allocation problem in geo-distributed clouds, whose goal is to assign each user request upon arrival to an appropriate geographic cloud region to minimize the resulting peak utilization of resource pools with different cost coefficients. To this end, we formulate the problem as a dynamic bin packing problem with heterogeneous dependent bins where user requests correspond to items to be packed and heterogeneous cloud resources are bins. To solve this online problem with high uncertainty, we propose a simulation based memetic algorithm to generate robust offline proactive policies based on historical data, which enable fast decision making for online packing. Our experiments based on realistic data show that the proposed approach leads to a reduction in total costs of up to 15% compared to the current practice, while being much faster for decision making compared to a popular online method.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 11","pages":"3596-3608"},"PeriodicalIF":3.8,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145248078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Indirect Methodology Based on Execution Traces for Grading Functional Test Programs","authors":"Francesco Angione;Paolo Bernardi;Andrea Calabrese;Lorenzo Cardone;Stefano Quer;Claudia Bertani;Vincenzo Tancorre","doi":"10.1109/TC.2025.3600005","DOIUrl":"https://doi.org/10.1109/TC.2025.3600005","url":null,"abstract":"Developing functional test programs for hardware testing is time-consuming and experience-wise. A functional test program’s quality is usually assessed only through expensive fault simulation campaigns during early development. This paper presents indirect quality measurements of fault detection capabilities of functional test programs to reduce the total cost of fault simulation in the early development stages. We present a methodology that analyzes the instruction trace generated by running functional test programs on-chip and building its control and dataflow graph. We use the graph to identify potential flaws that affect the program’s fault detection capabilities. We present different graph-based techniques to measure the programs’ quality indirectly. By exploiting standard debugging formats, we individuate instructions in the source code that affect the graph-based measurements. We perform experiments on an automotive device manufactured by STMicroelectronics, running functional test programs of different natures. Our results show that our metric allows test engineers to develop better functional test programs without basing their development solely on fault simulation campaigns.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 11","pages":"3582-3595"},"PeriodicalIF":3.8,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145248033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Efficient Delay-Bounded Dependent Task Offloading With Service Caching at Edges","authors":"Yu Liang;Sheng Zhang;Jie Wu","doi":"10.1109/TC.2025.3598749","DOIUrl":"https://doi.org/10.1109/TC.2025.3598749","url":null,"abstract":"We are now embracing an era of edge computing and artificial intelligence, and the combination of the two has spawned a new field of research called edge intelligence. Massive amounts of data is generated at the edge of network, which relies on artificial intelligence to realize its potential. Meanwhile, artificial intelligence is able to flourish when processing diverse edge data. However, the computation and storage resources of edge servers are not unlimited. For some large-scale intelligent applications, it is difficult to meet their service quality requirements by directly offloading the entire application to a nearby server for processing. Due to the heterogeneity of server resources in edge environments, how to balance the workload among edge servers to provide better services also becomes complicated. The goal of this paper is to minimize the total cost of offloading large-scale applications consisting of many dependent tasks in an edge system. We formulate the Dependent task Offloading with Service Caching (DOSC) problem, which is proved to be NP-hard. A dynamic planning-based algorithm is introduced to solve fixed-DOSC, in which some services are pre-configured on the edge server, and other services can not be downloaded from the remote cloud. We also present a theoretical analysis on the performance guarantee of the dynamic planning-based algorithm. Then, we propose a near-optimal algorithm using the Gibbs sampling to solve the general DOSC problem. Testbed experiments and trace-driven simulations are conducted to verify the performance of our algorithm. Our algorithm, shown to be the most effective in terms of cost, considers both service caching and task dependencies when task offloading in comparison to other baseline algorithms.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 11","pages":"3568-3581"},"PeriodicalIF":3.8,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145248074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TeeRollup: Efficient Rollup Design Using Heterogeneous TEE","authors":"Xiaoqing Wen;Quanbi Feng;Hanzheng Lyu;Jianyu Niu;Yinqian Zhang;Chen Feng","doi":"10.1109/TC.2025.3596698","DOIUrl":"https://doi.org/10.1109/TC.2025.3596698","url":null,"abstract":"Rollups have emerged as a promising approach to improving blockchains’ scalability by offloading transaction execution off-chain. Existing rollup solutions either leverage complex zero-knowledge proofs or optimistically assume execution correctness unless challenged. However, these solutions suffer from high gas costs and significant withdrawal delays, hindering their adoption in decentralized applications. This paper introduces <sc>TeeRollup</small>, an efficient rollup protocol that leverages Trusted Execution Environments (TEEs) to achieve both low gas costs and short withdrawal delays. Sequencers (<italic>i.e.</i>, system participants) execute transactions within TEEs and upload signed execution results to the blockchain with confidential keys of TEEs. Unlike most TEE-assisted blockchain designs, <sc>TeeRollup</small> adopts a practical threat model where the integrity and availability of TEEs may be compromised. To address these issues, we first introduce a distributed system of sequencers with heterogeneous TEEs, ensuring system security even if a certain proportion of TEEs are compromised. Second, we propose a challenge mechanism to solve the redeemability issue caused by TEE unavailability. Furthermore, <sc>TeeRollup</small> incorporates Data Availability Providers (DAPs) to reduce on-chain storage overhead and uses a laziness penalty mechanism to regulate DAP behavior. We implement a prototype of <sc>TeeRollup</small> in Golang, using the Ethereum test network, Sepolia. Our experimental results indicate that <sc>TeeRollup</small> outperforms zero-knowledge rollups (ZK-rollups), reducing on-chain verification costs by approximately 86% and withdrawal delays to a few minutes.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 10","pages":"3546-3558"},"PeriodicalIF":3.8,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelization Strategies for DeepMD-Kit Using OpenMP: Enhancing Efficiency in Machine Learning-Based Molecular Simulations","authors":"Qi Du;Feng Wang;Chengkun Wu","doi":"10.1109/TC.2025.3595078","DOIUrl":"https://doi.org/10.1109/TC.2025.3595078","url":null,"abstract":"DeepMD-kit enables deep learning-based molecular dynamics (MD) simulations that require efficient parallelization to leverage modern HPC architectures. In this work, we optimize DeepMD-kit using advanced OpenMP strategies to improve scalability and computational efficiency on an ARMv8 processor-based server. Our optimizations include data parallelism for neural network inference, force calculation acceleration, NUMA-aware memory management, and synchronization reductions, leading to up to <inline-formula><tex-math>$4.1boldsymbol{times}$</tex-math></inline-formula> speedup and 82% higher memory bandwidth efficiency compared to the baseline implementation. Strong scaling analysis demonstrates superlinear speedup at mid-range core counts, with improved workload balancing and vectorized computations. However, challenges remain at ultra-large scales due to increasing synchronization overhead.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 10","pages":"3534-3545"},"PeriodicalIF":3.8,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EC2P: Cost-Effective Cross-Chain Payments via Hubs Resisting the Abort Attack","authors":"Danlei Xiao;Shaobo Xu;Chuan Zhang;Licheng Wang;Xiulong Liu;Liehuang Zhu","doi":"10.1109/TC.2025.3590960","DOIUrl":"https://doi.org/10.1109/TC.2025.3590960","url":null,"abstract":"Cross-chain technology facilitates the interoperability among isolated blockchains, where users can transfer and exchange coins. While the heterogeneity between Turing-complete (TC) blockchains like Ethereum and non-Turing-complete (NTC) blockchains like Bitcoin presents a significant challenge for cross-chain transactions. Payment Channel Hubs (PCHs) offer a promising solution for enabling TC-NTC cross-chain payments with high throughput and low confirmation delays. However, existing schemes still face two key challenges: (i) significant computation and communication overhead for variable-amount payment, and (ii) limited unlinkability, i.e., vulnerable to the abort attack. This paper proposes EC2P, the first TC-NTC cross-chain PCH that achieves variable-amount payment unlinkability while resisting the abort attack and minimizing reliance on non-interactive zero-knowledge (NIZK) proofs. EC2P introduces two protocols: the NTC-to-TC and TC-to-NTC payment protocols. The NTC-to-TC payment protocol replaces the traditional puzzle-promise and puzzle-solve paradigm with a semi-blind approach, where only one side is blinded and the blinded side’s interactions are eliminated. This achieves unlinkability and resists the abort attack without NIZK. The TC-to-NTC payment protocol enhances the paradigm by utilizing Turing-complete functionality to constrain the inability to carry out an abort attack. Through rigorous security analysis, we show that EC2P is secure and variable-amount payment unlinkable while resisting the abort attack. We implement EC2P on Ethereum and Bitcoin test networks. Our evaluation demonstrates that EC2P outperforms both in terms of communication and computation overhead and reduces communication costs by 3 orders of magnitude compared to existing variable-amount methods.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 10","pages":"3504-3518"},"PeriodicalIF":3.8,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRECIOUS: Approximate Real-Time Computing in MLC-MRAM Based Heterogeneous CMPs","authors":"Sangeet Saha;Shounak Chakraborty;Sukarn Agarwal;Magnus Själander;Klaus McDonald-Maier","doi":"10.1109/TC.2025.3590809","DOIUrl":"https://doi.org/10.1109/TC.2025.3590809","url":null,"abstract":"Enhancing quality of service (QoS) in approximate-computing (AC) based real-time systems, without violating power limits is becoming increasingly challenging due to contradictory constraints, i.e., power consumption and time criticality, as multicore computing platforms are becoming heterogeneous. To fulfill these constraints and optimise system QoS, AC tasks should be judiciously mapped on such platforms. However, prior approaches rarely considered the problem of AC task deployment on heterogeneous platforms. Moreover, the majority of prior approaches typically neglect the runtime architectural phenomena, which can be accounted for along with the approximation tolerance of the applications to enhance the QoS. We present <italic>PRECIOUS</i>, a novel hybrid offline-online approach that first <italic>schedules AC real-time</i> tasks on a <italic>heterogeneous multicore</i> with an objective to maximise QoS and determines the appropriate cluster for each task constrained by a system-wide power limit, deadline, and task-dependency. At runtime, <italic>PRECIOUS</i> introduces novel architectural techniques for the AC tasks, where tasks are executed on a heterogeneous platform equipped with <italic>multilevel-cell (MLC)-MRAM</i> based last-level cache to improve energy efficiency and performance by prudentially leveraging storage density of MLC-MRAM while ameliorating associated high write latency and write energy. Our novel block management for the MLC-MRAM cache further improves performance of the system, which we exploit opportunistically to enhance system QoS, and turn off processor cores during the dynamically generated slacks. <italic>PRECIOUS-Offline</i> achieves up to 76% QoS for a specific task-set, surpassing prior art, whereas <italic>PRECIOUS-Online</i> enhances QoS by 9.0% by reducing cache miss-rate by 19% on a 64-core heterogeneous system without incurring any energy overhead over a conventional MRAM based cache design.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 10","pages":"3476-3489"},"PeriodicalIF":3.8,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Radix/Mixed-Radix NTT Multiplication Algorithm/Architecture Co-Design Over Fermat Modulus","authors":"Yile Xing;Guangyan Li;Zewen Ye;Ryan W. L. Luk;Donglong Chen;Hong Yan;Ray C. C. Cheung","doi":"10.1109/TC.2025.3590972","DOIUrl":"https://doi.org/10.1109/TC.2025.3590972","url":null,"abstract":"Polynomial multiplication using Number Theoretic Transform (NTT) is crucial in lattice-based post-quantum cryptography (PQC) and fully homomorphic encryption (FHE), with modulus <inline-formula><tex-math>$q$</tex-math></inline-formula> significantly affecting performance. Fermat moduli of the form <inline-formula><tex-math>$2^{2^{n}}+1$</tex-math></inline-formula>, such as 65537, offer efficiency gains due to simplified modular reduction and powers-of-2 twiddle factors in NTT. While Fermat moduli have been directly applied or explored for incorporation into existing schemes, Fermat NTT-based polynomial multiplication designs remain underexplored in fully exploiting the benefits of Fermat moduli. This work presents a high-radix/mixed-radix NTT architecture tailored for Fermat moduli, which improves the utilization of the powers-of-2 twiddle factors in large transform sizes. In most cases, our design achieves a 30%–85% reduction in DSP area-time product (ATP) and a 70%–100% reduction in BRAM ATP compared to state-of-the-art designs with smaller or equivalent modulus, while maintaining competitive LUT and FF ATP, underscoring the potential of Fermat NTT-based polynomial multipliers in lattice-based cryptography.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 10","pages":"3519-3533"},"PeriodicalIF":3.8,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}