IEEE Transactions on Computers: Latest Articles (Page 6)

A High-Intensity Solution of Hardware Accelerator for Sparse and Redundant Computations in Semantic Segmentation Models

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-07-02 DOI: 10.1109/TC.2025.3585354

Jiahui Huang;Zhan Li;Yuxian Jiang;Zhihan Zhang;Hao Wang;Sheng Chang

Abstract: The rapid development of artificial intelligence (AI) has met people's personalized needs. However, with the increase of data capacities and computing requirements, the imbalance between large-scale data transmission and limited network bandwidth has become increasingly prominent. To improve the speed of embedded systems, real-time intelligent computing is gradually moving from the cloud to the edge. Traditional FPGA-based AI accelerators mainly utilize a PE architecture, but the low computing throughput and resource utilization make it difficult to meet the power requirements of edge AI application scenarios such as image segmentation. In recent years, AI accelerators based on streaming architecture have become a trend, and it is necessary to customize high-performance streaming accelerators for specific segmentation algorithms. In this paper, we design a high-intensity pixel-level fully pipelined accelerator with customized strategies to eliminate the sparse and redundant computations in specific algorithms of semantic segmentation, which significantly improves the accelerator's computing throughput and hardware resource utilization. On a Xilinx FPGA, our acceleration of two typical semantic segmentation networks, ESPNet and DeepLabV3, achieves optimized throughputs of 171.3 GOPS and 1324.8 GOPS, and computing efficiency of 9.26 and 9.01, respectively. It provides the possibility of hardware deployment in real-time applications with high computing intensity.

Volume 74, Issue 9, Pages 3129-3142

Citations: 0

Reconfigurable Intelligent Surface Assisted UAV-MCS Based on Transformer Enhanced Deep Reinforcement Learning

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-07-02 DOI: 10.1109/TC.2025.3585361

Qianqian Wu;Qiang Liu;Ying He;Zefan Wu

Abstract: Mobile crowd sensing (MCS) is an emerging paradigm that enables participants to collaborate on various sensing tasks. UAVs are increasingly integrated into MCS systems to provide more reliable, accurate and cost-effective sensing services. However, optimizing UAV trajectories and communication efficiency, especially under non-line-of-sight (NLoS) channel conditions, remains a significant challenge. This paper proposes TRAIL, a Transformer-enhanced deep reinforcement learning (DRL) algorithm. TRAIL aims to jointly optimize UAV trajectories and Reconfigurable Intelligent Surface (RIS) phase shifts to maximize data throughput while minimizing UAV energy consumption. The optimization problem is modeled as a Markov Decision Process (MDP), where the Transformer architecture captures long-term dependencies in UAV trajectories, and these features are input into a Double Deep Q-Network with Prioritized Experience Replay (PER-DDQN) to guide the agent in learning the optimal strategy. Simulation results demonstrate that TRAIL significantly outperforms state-of-the-art methods in both data throughput and energy efficiency.

Volume 74, Issue 9, Pages 3143-3155

Citations: 0
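
The TRAIL abstract names a Double Deep Q-Network with Prioritized Experience Replay (PER-DDQN) as its learning component. As a rough illustration only, the sketch below shows the textbook Double DQN target and proportional replay priorities in plain Python. The function names, `alpha`, `eps`, and the toy Q-values are assumptions made for this example and are not taken from the paper; the Transformer feature extractor is omitted.

```python
def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Textbook Double DQN target: the online network picks the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def replay_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) is proportional to (|delta_i| + eps) ** alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


# Toy values (hypothetical): three candidate actions in the next state.
target = double_dqn_target(1.0, 0.9, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0], done=False)
probs = replay_probabilities([0.1, 2.0, 0.5])
```

Decoupling action selection from action evaluation is what distinguishes Double DQN from plain DQN; transitions with larger temporal-difference error are sampled more often under the proportional scheme.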

MDC+: A Cooperative Approach to Memory-Efficient Fork-Based Checkpointing for In-Memory Database Systems

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-30 DOI: 10.1109/TC.2025.3584520

Cheolgi Min;Jiwoong Park;Heon Young Yeom;Hyungsoo Jung

Abstract: Consistent checkpointing is critical for in-memory databases (IMDBs), but its resource-intensive nature poses challenges for small- and medium-sized deployments in cloud environments, where memory utilization directly affects operational costs. Although traditional fork-based checkpointing offers merits in terms of performance and implementation simplicity, it incurs a considerable rise in memory footprint during checkpointing, particularly under update-intensive workloads. Memory provisioning emerges as a practical remedy to handle peak demands without compromising performance, albeit with potential concerns related to memory over-provisioning. In this article, we propose MDC+, a memory-efficient fork-based checkpointing scheme designed to maintain a reasonable memory footprint during checkpointing by leveraging collaboration among an IMDB, a user-level memory allocator, and the operating system. We explore two key techniques within the checkpointing scheme: (1) memory dump-based checkpointing, which enables early memory release, and (2) hint-based segregated memory allocation, which isolates immutable and updatable data to minimize page duplication. Our evaluation demonstrates that MDC+ significantly lowers peak memory footprint during checkpointing without affecting throughput or checkpointing time.

Volume 74, Issue 9, Pages 3059-3071

Citations: 0
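
The MDC+ abstract argues that separating immutable from updatable data reduces page duplication while a forked child holds the snapshot. The toy model below illustrates why, under simplified assumptions: a page holds four objects, every write to a page shared with the child copies that page once, and the object labels `hot` and `cold` are hypothetical stand-ins for allocator hints. It is not the paper's allocator.

```python
PAGE_SIZE = 4  # objects per page; a toy value, real pages are typically 4 KiB


def pages_copied(layout, updated):
    """Count distinct pages touched by writes while a forked child holds the snapshot.
    Under copy-on-write, each such page is duplicated once."""
    return len({index // PAGE_SIZE for index, obj in enumerate(layout) if obj in updated})


# 16 objects; every fourth one is updatable ("hot"), the rest are immutable ("cold").
objects = [("hot" if i % 4 == 0 else "cold", i) for i in range(16)]
updated = {obj for obj in objects if obj[0] == "hot"}

interleaved = objects                                      # allocation order mixes both kinds
segregated = sorted(objects, key=lambda o: o[0] != "hot")  # hot objects packed together

copies_interleaved = pages_copied(interleaved, updated)
copies_segregated = pages_copied(segregated, updated)
```

In this toy layout the interleaved allocation dirties all four pages, while the segregated one confines the same writes to a single page.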

Dynamic DPU Offloading and Computational Resource Management in Heterogeneous Systems

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-30 DOI: 10.1109/TC.2025.3584501

Zhaoyang Huang;Yanjie Tan;Yifu Zhu;Huailiang Tan;Keqin Li

Abstract: DPU offloading has emerged as a promising way to enhance data processing efficiency and free up host CPU resources. However, unsuitable offloading may overwhelm the hardware and hurt overall system performance. It is still unclear how to make full use of the shared hardware resources and select optimal execution units for each tenant application. In this paper, we propose DORM, a dynamic DPU offloading and resource management architecture for multi-tenant cloud environments with CPU-DPU heterogeneous platforms. The primary goal of DORM is to minimize host resource consumption and maximize request processing efficiency. By establishing a joint optimization model for offloading decisions and resource allocation, we abstract the problem into a mixed integer programming model. To reduce the complexity of solving the model, we decompose it into two subproblems: a 0-1 integer programming model for offloading decision-making and a convex optimization problem for fine-grained resource allocation. In addition, DORM presents an orchestrator agent to detect load changes and dynamically adjust the scheduling strategy. Experimental results demonstrate that DORM significantly improves resource efficiency, reducing host CPU core usage by up to 83.3%, increasing per-core throughput by up to 4.61x, and lowering latency by up to 58.5% compared to baseline systems.

Volume 74, Issue 9, Pages 3046-3058

Citations: 0
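
The DORM abstract describes a 0-1 integer program for the offloading decision. A minimal sketch of that kind of problem follows, assuming a deliberately simplified objective (minimize host CPU cost of the applications left on the host) and a single DPU capacity constraint. The parameter names and numbers are hypothetical, and exhaustive enumeration stands in for a real solver; the paper's actual model is not given in the abstract.

```python
from itertools import product


def best_offloading(host_cost, dpu_load, dpu_capacity):
    """Enumerate all 0-1 assignments (1 = offload to DPU) and return
    (host cost, assignment) with the lowest host cost that fits the DPU."""
    best = None
    for x in product((0, 1), repeat=len(host_cost)):
        load = sum(l for l, xi in zip(dpu_load, x) if xi)
        if load > dpu_capacity:
            continue  # infeasible: DPU would be overwhelmed
        cost = sum(c for c, xi in zip(host_cost, x) if not xi)
        if best is None or cost < best[0]:
            best = (cost, x)
    return best


# Three tenant applications with made-up costs.
result = best_offloading(host_cost=[4, 3, 2], dpu_load=[5, 3, 2], dpu_capacity=5)
```

Here offloading the second and third applications leaves a host cost of 4, which beats offloading the first application alone (host cost 5). Enumeration is exponential in the number of applications, so it only serves to make the decision variables concrete.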

BHerd: Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-27 DOI: 10.1109/TC.2025.3583827

Ping Luo;Xiaoge Deng;Ziqing Wen;Tao Sun;Dongsheng Li

Abstract: In the domain of computer architecture, Federated Learning (FL) is a paradigm of distributed machine learning in edge systems. However, the systems' Non-Independent and Identically Distributed (Non-IID) data negatively affect the convergence efficiency of the global model, since only a subset of these data samples is beneficial for accelerating model convergence. In pursuit of this subset, a reliable approach involves determining a measure of validity to rank the samples within the dataset. In this paper, we propose the BHerd strategy, which selects a beneficial herd of local gradients to accelerate the convergence of the FL model. Specifically, we map the distribution of the local dataset to the local gradients and use the Herding strategy to obtain a permutation of the set of gradients, where the gradients earlier in the permutation are closer to the average of the set of gradients. The top portion of these gradients is selected and sent to the server for global aggregation. We conduct experiments on different datasets, models, and scenarios by building a prototype system, and experimental results demonstrate that our BHerd strategy is effective in selecting beneficial local gradients to mitigate the effects brought by the Non-IID dataset.

Volume 74, Issue 9, Pages 2977-2990

Citations: 0
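
The BHerd abstract relies on a herding permutation of local gradients followed by selecting the top portion. The sketch below uses one common greedy herding rule, assumed here for illustration: at each step, pick the remaining gradient that keeps the sum of centered, already-selected gradients closest to zero, so every prefix average stays near the full average. The function names and the `fraction` parameter are hypothetical, and the paper's exact procedure may differ.

```python
def herding_order(grads):
    """Greedy herding: return a permutation of indices whose prefixes
    have averages close to the mean of all gradients."""
    n, d = len(grads), len(grads[0])
    mean = [sum(g[k] for g in grads) / n for k in range(d)]
    remaining = list(range(n))
    running = [0.0] * d  # sum of centered gradients selected so far
    order = []
    while remaining:
        def score(i):
            return sum((running[k] + grads[i][k] - mean[k]) ** 2 for k in range(d))
        best = min(remaining, key=score)
        order.append(best)
        remaining.remove(best)
        running = [running[k] + grads[best][k] - mean[k] for k in range(d)]
    return order


def select_top(grads, fraction):
    """Keep the leading fraction of the herding permutation (at least one gradient)."""
    order = herding_order(grads)
    return order[: max(1, int(len(grads) * fraction))]


grads = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [3.0, 3.0]]
order = herding_order(grads)
selected = select_top(grads, 0.5)
```

With these toy vectors, the first pick is the gradient nearest the mean, and the second pick is the large gradient that pulls the prefix average back toward the mean.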

An Energy-Efficient and Privacy-Aware MEC-Enabled IoMT Health Monitoring System

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-20 DOI: 10.1109/TC.2025.3576944

Xiaolu Cheng;Xiaoshuang Xing;Wei Li;Hong Xue;Tong Can

Abstract: Advancements in the Internet of Medical Things (IoMT) have made remote patient monitoring increasingly viable. However, challenges persist in safeguarding sensitive data, optimizing resources, and addressing the energy constraints of patient devices. This paper presents a health monitoring framework integrating Mobile Edge Computing (MEC) and sixth-generation (6G) technologies, structured into internal Medical Body Area Networks (int-MBANs) and external communications beyond MBANs (ext-MBANs). For int-MBANs, the proposed OptiBand algorithm optimizes energy consumption, extends device standby time, and considers message timeliness and medical criticality. A key innovation of OptiBand is its incorporation of the standby time of patients' devices into the resource allocation strategy to address real-world patient needs. For ext-MBANs, the DynaMEC algorithm dynamically balances energy efficiency, privacy protection, latency, and fairness, even under varying patient scales. A latency-aware scheduling mechanism is also introduced to guarantee timely completion of emergency tasks. Theoretical analysis and experimental results confirm the feasibility, convergence, and optimality of both algorithms. These characteristics and advantages of the proposed system make remote patient monitoring through IoMT more feasible and effective.

Volume 74, Issue 9, Pages 2936-2949

Citations: 0

Efficient Static Schedules for Fault-Tolerant Transmissions on Shared Media

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-04 DOI: 10.1109/TC.2025.3576908

Scott Sirri;Zhe Wang;Netanel Raviv;Jeremy Fineman;Kunal Agrawal

Abstract: Shared communication media are widely used in many applications, including safety-critical applications. However, noise and transient errors can cause transmission failures. We consider the problem of designing and minimizing the length of fault-tolerant static schedules for transmitting messages in these media, provided the number of errors falls below some upper bound. To transmit n messages in a medium while tolerating a maximum of f faults, prior work had shown how to construct schedules which had a fault tolerance overhead of $nf/2$. In this paper, we provide an efficient constructive algorithm for producing a schedule for n messages with total length $n + O(f^{2} \log^{2} n)$ that can tolerate f medium errors. We also provide an algorithm for randomly generating fault-tolerant schedules with length $n + O(f \log(f) \log(n))$ as well as a technique for quickly verifying these on reasonably small inputs.

Volume 74, Issue 9, Pages 2882-2895

Citations: 0
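
To give a feel for how the overhead terms in this abstract scale, the short calculation below compares $nf/2$ with $f^{2}\log^{2} n$ and $f\log(f)\log(n)$ for one sample input. This is only an order-of-magnitude sketch: the constants hidden by the O-notation are unknown and are set to 1 here, the logarithm base is assumed to be 2, and the values of `n` and `f` are arbitrary.

```python
import math


def prior_overhead(n, f):
    """Overhead of the prior construction cited in the abstract: n*f/2."""
    return n * f / 2


def constructive_overhead(n, f):
    """f^2 * log^2(n) with the hidden constant assumed to be 1."""
    return f ** 2 * math.log2(n) ** 2


def randomized_overhead(n, f):
    """f * log(f) * log(n) with the hidden constant assumed to be 1."""
    return f * math.log2(f) * math.log2(n)


n, f = 10 ** 6, 3  # arbitrary sample input
overheads = (prior_overhead(n, f), constructive_overhead(n, f), randomized_overhead(n, f))
```

The point of the comparison is that the prior overhead grows linearly in n, while the two new bounds grow only polylogarithmically in n.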

High-Performance In-Memory Bayesian Inference With Multi-Bit Ferroelectric FET

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-04 DOI: 10.1109/TC.2025.3576941

Chao Li;Xuchu Huang;Zhicheng Xu;Bo Wen;Ruibin Mao;Min Zhou;Thomas Kämpfe;Kai Ni;Can Li;Xunzhao Yin;Cheng Zhuo

Abstract: Conventional neural network-based machine learning algorithms often encounter difficulties in data-limited scenarios or where interpretability is critical. Conversely, Bayesian inference-based models excel with reliable uncertainty estimates and explainable predictions. Recently, many in-memory computing (IMC) architectures have achieved exceptional computing capacity and efficiency for neural network tasks by leveraging emerging non-volatile memory (NVM) technologies. However, their application in Bayesian inference remains limited because the operations in Bayesian inference differ substantially from those in neural networks. In this article, we introduce a compact in-memory Bayesian inference engine with high efficiency and performance utilizing a multi-bit ferroelectric field-effect transistor (FeFET). This design encodes a Bayesian model within a compact FeFET-based crossbar by mapping quantized probabilities to discrete FeFET states. Consequently, the crossbar's outputs naturally represent the output posteriors of the Bayesian model. Our design facilitates efficient Bayesian inference, accommodating various input types and probability precisions, without additional calculation circuitry. As the first FeFET-based in-memory Bayesian inference engine, our design demonstrates a notable storage density of 26.32 Mb/mm² and a computing efficiency of 581.40 TOPS/W in a representative Bayesian classification task, indicating a 10.7×/43.4× compactness/efficiency improvement compared to the state-of-the-art alternative. Utilizing the proposed Bayesian inference engine, we develop a feature selection system that efficiently addresses a representative NP-hard optimization problem, showcasing our design's capability and potential to enhance various Bayesian inference-based applications. Test results suggest that our design identifies the essential features, enhancing the model's performance while reducing its complexity, surpassing the latest implementation in operation speed and algorithm efficiency by 2.9×/2.0×, respectively.

Volume 74, Issue 9, Pages 2923-2935

Citations: 0
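
The FeFET abstract describes mapping quantized probabilities to discrete cell states so that crossbar outputs represent posteriors. The software sketch below mimics that idea under several assumptions that are not stated in the abstract: likelihoods are quantized on a log scale to 2-bit levels, class priors are uniform, and a column output is modeled as the sum of the selected cell levels (a naive Bayes score in the log domain). All names and numbers are hypothetical.

```python
import math


def quantize(p, bits=2, p_min=0.01):
    """Map a probability to one of 2**bits integer levels on a log scale
    (highest level for p = 1, level 0 for p <= p_min)."""
    levels = 2 ** bits - 1
    p = min(max(p, p_min), 1.0)
    return round(levels * (1 - math.log(p) / math.log(p_min)))


def program_crossbar(likelihoods, bits=2):
    """likelihoods[c][f][v] = P(feature f takes value v | class c)."""
    return {c: [[quantize(p, bits) for p in row] for row in rows]
            for c, rows in likelihoods.items()}


def infer(crossbar, x):
    """Sum the selected cell levels per class column and return the best class."""
    scores = {c: sum(cells[f][v] for f, v in enumerate(x))
              for c, cells in crossbar.items()}
    return max(scores, key=scores.get), scores


likelihoods = {
    "A": [[0.95, 0.05], [0.2, 0.8]],
    "B": [[0.2, 0.8], [0.7, 0.3]],
}
crossbar = program_crossbar(likelihoods)
label, scores = infer(crossbar, (0, 1))
```

Because the levels approximate log-probabilities, adding them approximates multiplying the likelihoods, which is why a simple accumulation can rank the classes.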

SAL-PIM: A Subarray-Level Processing-in-Memory Architecture With LUT-Based Linear Interpolation for Transformer-Based Text Generation

IF 3.8, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-04 DOI: 10.1109/TC.2025.3576935

Wontak Han;Hyunjun Cho;Donghyuk Kim;Joo-Young Kim

Abstract: Text generation is a compelling sub-field of natural language processing, aiming to generate human-readable text from input words. Although many deep learning models have been proposed, the recent emergence of transformer-based large language models has advanced academic research and industry development, showing remarkable qualitative results in text generation. In particular, decoder-only generative models, such as the generative pre-trained transformer (GPT), are widely used for text generation, with two major computational stages: summarization and generation. Unlike the summarization stage, which can process the input tokens in parallel, the generation stage is difficult to accelerate due to its sequential generation of output tokens through iteration. Moreover, each iteration requires reading the whole model with little data reuse opportunity. Therefore, the workload of transformer-based text generation is severely memory-bound, making the external memory bandwidth the system bottleneck. In this paper, we propose a subarray-level processing-in-memory (PIM) architecture named SAL-PIM, the first HBM-based PIM architecture for the end-to-end acceleration of transformer-based text generation. With optimized data mapping schemes for different operations, SAL-PIM utilizes higher internal bandwidth by integrating multiple subarray-level arithmetic logic units (S-ALUs) next to memory subarrays. To minimize the area overhead for S-ALUs, it uses shared MACs, leveraging the slow clock frequency of commands for the same bank. In addition, a few subarrays in the bank are used as look-up tables (LUTs) to handle non-linear functions in PIM, supporting multiple addressing to select sections for linear interpolation. Lastly, the channel-level arithmetic logic unit (C-ALU) is added in the buffer die of HBM to perform the accumulation and reduce-sum operations of data across multiple banks, completing end-to-end inference on PIM. To validate the SAL-PIM architecture, we built a cycle-accurate simulator based on Ramulator. We also implemented SAL-PIM's logic units in 28-nm CMOS technology and scaled the results to DRAM technology to verify its feasibility. We measured the end-to-end latency of SAL-PIM when it runs various text generation workloads on the GPT-2 medium model (with 345 million parameters), in which the input and output token numbers vary from 32 to 128 and from 1 to 256, respectively. As a result, with 4.81% area overhead, SAL-PIM achieves up to 4.72× speedup (1.83× on average) over the Nvidia Titan RTX GPU running the Faster Transformer framework.

Volume 74, Issue 9, Pages 2909-2922

Citations: 0
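
The SAL-PIM abstract mentions look-up tables that select sections for linear interpolation of non-linear functions. A minimal software sketch of LUT-based linear interpolation follows, using GELU as an example non-linearity. The choice of GELU, the input range, the section count, and all function names are assumptions for illustration; the abstract does not specify them.

```python
import math


def gelu(x):
    """Exact GELU, used here only to fill the table and to measure error."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_lut(fn, lo, hi, sections):
    """Sample fn at the boundaries of equally wide sections over [lo, hi]."""
    step = (hi - lo) / sections
    return [fn(lo + i * step) for i in range(sections + 1)], lo, step


def lut_interpolate(lut, x):
    """Select the section containing x and interpolate linearly inside it."""
    table, lo, step = lut
    hi = lo + step * (len(table) - 1)
    x = min(max(x, lo), hi)                 # clamp inputs outside the table range
    pos = (x - lo) / step
    idx = min(int(pos), len(table) - 2)     # section index
    frac = pos - idx                        # position within the section
    return table[idx] + frac * (table[idx + 1] - table[idx])


lut = build_lut(gelu, -4.0, 4.0, 64)
approx = lut_interpolate(lut, 1.3)
```

Each evaluation needs only two table reads, one multiply, and two additions, which is the property that makes this approach attractive when arithmetic units are scarce. Accuracy is controlled by the number of sections.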

Serving MoE Models on Resource-Constrained Edge Devices via Dynamic Expert Swapping

IF 3.6, Zone 2 (Computer Science)

IEEE Transactions on Computers Pub Date : 2025-06-03 DOI: 10.1109/TC.2025.3575905

Rui Kong;Yuanchun Li;Weijun Wang;Linghe Kong;Yunxin Liu

Abstract: Mixture of experts (MoE) is a popular technique in deep learning that improves model capacity with conditionally-activated parallel neural network modules (experts). However, serving MoE models in resource-constrained latency-critical edge scenarios is challenging due to the significantly increased model size and complexity. In this paper, we first analyze the behavior pattern of MoE models in continuous inference scenarios, which leads to three key observations about the expert activations, including temporal locality, exchangeability, and skippable computation. Based on these observations, we introduce PC-MoE, an inference framework for resource-constrained continuous MoE model serving. The core of PC-MoE is a new data structure, Parameter Committee, that intelligently maintains a subset of important experts in use to reduce resource consumption. To evaluate the effectiveness of PC-MoE, we conduct experiments using state-of-the-art MoE models on common computer vision and natural language processing tasks. The results demonstrate optimal trade-offs between resource consumption and model accuracy achieved by PC-MoE. For instance, on object detection tasks with the Swin-MoE model, our approach can reduce memory usage and latency by 42.34% and 18.63% with only 0.10% accuracy degradation.

Volume 74, Issue 8, Pages 2799-2811

Citations: 0
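
The PC-MoE abstract observes temporal locality in expert activations and keeps only a subset of experts resident. The sketch below illustrates the locality argument with a plain least-recently-used cache of expert weights. This is a stand-in policy chosen for simplicity, not the Parameter Committee structure itself, whose details the abstract does not give; the class name, `capacity`, and the request trace are hypothetical.

```python
from collections import OrderedDict


class ExpertCache:
    """Fixed-capacity set of resident experts with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # expert id -> placeholder for its weights
        self.loads = 0                 # number of times weights had to be swapped in

    def request(self, expert_id):
        """Return True on a hit; on a miss, load the expert and evict the oldest one."""
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)
            return True
        self.loads += 1
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)
        self.resident[expert_id] = object()
        return False


cache = ExpertCache(capacity=2)
trace = [0, 1, 0, 0, 2, 0, 1]  # router decisions over consecutive inputs
hits = sum(cache.request(e) for e in trace)
```

When consecutive inputs tend to activate the same experts, most requests hit the resident set and few weight transfers are needed, which is the effect a smaller memory footprint depends on.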