IEEE Transactions on Computers: Latest Articles

ToEx: Accelerating Generation Stage of Transformer-Based Language Models via Token-Adaptive Early Exit
IF 3.6 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-21 | DOI: 10.1109/TC.2024.3404051
Myeonggu Kang;Junyoung Park;Hyein Shin;Jaekang Shin;Lee-Sup Kim
{"title":"ToEx: Accelerating Generation Stage of Transformer-Based Language Models via Token-Adaptive Early Exit","authors":"Myeonggu Kang;Junyoung Park;Hyein Shin;Jaekang Shin;Lee-Sup Kim","doi":"10.1109/TC.2024.3404051","DOIUrl":"10.1109/TC.2024.3404051","url":null,"abstract":"Transformer-based language models have recently gained popularity in numerous natural language processing (NLP) applications due to their superior performance compared to traditional algorithms. These models involve two execution stages: summarization and generation. The generation stage accounts for a significant portion of the total execution time due to its auto-regressive property, which necessitates considerable and repetitive off-chip accesses. Consequently, our objective is to minimize off-chip accesses during the generation stage to expedite transformer execution. To achieve the goal, we propose a token-adaptive early exit (ToEx) that generates output tokens using fewer decoders, thereby reducing off-chip accesses for loading weight parameters. Although our approach has the potential to minimize data communication, it brings two challenges: 1) inaccurate self-attention computation, and 2) significant overhead for exit decision. To overcome these challenges, we introduce a methodology that facilitates accurate self-attention by lazily performing computations for previously exited tokens. Moreover, we mitigate the overhead of exit decision by incorporating a lightweight output embedding layer. We also present a hardware design to efficiently support the proposed work. Evaluation results demonstrate that our work can reduce the number of decoders by 2.6\u0000<inline-formula><tex-math>$times$</tex-math></inline-formula>\u0000 on average. Accordingly, it achieves 3.2\u0000<inline-formula><tex-math>$times$</tex-math></inline-formula>\u0000 speedup on average compared to transformer execution without our work.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2248-2261"},"PeriodicalIF":3.6,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
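To make the token-adaptive early-exit idea concrete, the following minimal Python sketch runs decoder layers one at a time and stops as soon as a lightweight output head is confident enough. The confidence threshold, the exit_head callable, and the toy layers are illustrative assumptions rather than ToEx's actual design, which also handles lazy self-attention for already-exited tokens and a dedicated hardware datapath.

    import numpy as np

    def generate_token(hidden, decoder_layers, exit_head, threshold=0.9):
        # Run decoder layers one by one; after each layer a lightweight output
        # head estimates the next-token distribution, and the loop exits early
        # once the top-1 confidence clears the threshold, skipping the remaining
        # layers and the off-chip loads of their weights.
        probs, depth = None, 0
        for depth, layer in enumerate(decoder_layers, start=1):
            hidden = layer(hidden)
            logits = exit_head(hidden)
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            if probs.max() >= threshold:
                break
        return int(np.argmax(probs)), depth  # chosen token id, layers actually used

    # Toy usage: six tanh "decoder layers" and a random linear output head.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 100))
    layers = [lambda h: np.tanh(h) for _ in range(6)]
    token, layers_used = generate_token(rng.normal(size=16), layers, lambda h: h @ W)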
Relieving Write Disturbance for Phase Change Memory With RESET-Aware Data Encoding
IF 3.6 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-21 | DOI: 10.1109/TC.2024.3398490
Ronglong Wu;Zhirong Shen;Jianqiang Chen;Chengshuo Zheng;Zhiwei Yang;Jiwu Shu
{"title":"Relieving Write Disturbance for Phase Change Memory With RESET-Aware Data Encoding","authors":"Ronglong Wu;Zhirong Shen;Jianqiang Chen;Chengshuo Zheng;Zhiwei Yang;Jiwu Shu","doi":"10.1109/TC.2024.3398490","DOIUrl":"10.1109/TC.2024.3398490","url":null,"abstract":"The write disturbance (WD) problem is becoming increasingly severe in PCM due to the continuous scaling down of memory technology. Previous studies have attempted to transform WD-vulnerable data patterns of the new data to alleviate the WD problem. However, through a wide spectrum of real-world benchmarks, we have discovered that simply transforming WD-vulnerable data patterns does not proportionally reduce (or may even increase) WD errors. To address this issue, we present \u0000<monospace>ResEnc</monospace>\u0000, a RESET-aware data encoding scheme that reduces RESET operations to mitigate the WD problem in both wordlines and bitlines for PCM. It dynamically establishes a mask word for each block for data encoding and adaptively selects an appropriate encoding granularity based on the diverse write patterns. \u0000<monospace>ResEnc</monospace>\u0000 finally reassigns the mask words of unchanged blocks to changed blocks for exploring a further reduction of WD errors. Extensive experiments show that \u0000<monospace>ResEnc</monospace>\u0000 can reduce 16.8-87.0% of WD errors, shorten 5.6-39.6% of write latency, and save 7.0-43.1% of write energy for PCM.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"1939-1952"},"PeriodicalIF":3.6,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
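The mask-word encoding described above can be pictured with a small sketch. Below, each word is stored either raw or XOR-ed with the block's mask, whichever causes fewer 1-to-0 (RESET-inducing) bit transitions against what is currently stored. The 32-bit width, the single flag bit, and the RESET model are simplifying assumptions; ResEnc's granularity selection and mask reassignment are not shown.

    def count_resets(old_word, new_word, width=32):
        # In this simplified model a RESET corresponds to overwriting a stored
        # 1 bit with a 0 bit.
        return bin(old_word & ~new_word & ((1 << width) - 1)).count("1")

    def encode_word(old_stored, data, mask, width=32):
        # Pick the representation (raw, or XOR-ed with the block's mask word)
        # that triggers fewer RESETs; a 1-bit flag tells the decoder which one.
        raw, masked = data, data ^ mask
        if count_resets(old_stored, masked, width) < count_resets(old_stored, raw, width):
            return masked, 1
        return raw, 0

    def decode_word(stored, flag, mask):
        return stored ^ mask if flag else stored

    stored, mask = 0xFFFF00FF, 0x0F0F0F0F
    enc, flag = encode_word(stored, 0x12345678, mask)
    assert decode_word(enc, flag, mask) == 0x12345678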
Decentralized Task Offloading in Edge Computing: An Offline-to-Online Reinforcement Learning Approach
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-19 | DOI: 10.1109/TC.2024.3377912
Hongcai Lin;Lei Yang;Hao Guo;Jiannong Cao
{"title":"Decentralized Task Offloading in Edge Computing: An Offline-to-Online Reinforcement Learning Approach","authors":"Hongcai Lin;Lei Yang;Hao Guo;Jiannong Cao","doi":"10.1109/TC.2024.3377912","DOIUrl":"10.1109/TC.2024.3377912","url":null,"abstract":"Decentralized task offloading among cooperative edge nodes has been a promising solution to enhance resource utilization and improve users’ Quality of Experience (QoE) in edge computing. However, current decentralized methods, such as heuristics and game theory-based methods, either optimize greedily or depend on rigid assumptions, failing to adapt to the dynamic edge environment. Existing DRL-based approaches train the model in a simulation and then apply it in practical systems. These methods may perform poorly because of the divergence between the practical system and the simulated environment. Other methods that train and deploy the model directly in real-world systems face a cold-start problem, which will reduce the users’ QoE before the model converges. This paper proposes a novel offline-to-online DRL called (O2O-DRL). It uses the heuristic task logs to warm-start the DRL model offline. However, offline and online data have different distributions, so using offline methods for online fine-tuning will ruin the policy learned offline. To avoid this problem, we use on-policy DRL to fine-tune the model and prevent value overestimation. We evaluate O2O-DRL with other approaches in a simulation and a Kubernetes-based testbed. The performance results show that O2O-DRL outperforms other methods and solves the cold-start problem.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1603-1615"},"PeriodicalIF":3.7,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140170080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
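The offline-to-online recipe, warm-starting from heuristic logs and then fine-tuning on-policy, can be sketched with a tiny softmax offloading policy. The linear model, feature sizes, and REINFORCE-style update below are illustrative assumptions; the paper's actual networks, reward definition, and overestimation safeguards are more involved.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    class LinearPolicy:
        # Tiny softmax policy mapping a task/state feature vector to an edge node.
        def __init__(self, n_features, n_actions):
            self.W = np.zeros((n_features, n_actions))

        def probs(self, s):
            return softmax(s @ self.W)

        def warm_start(self, states, actions, lr=0.1, epochs=50):
            # Offline phase: behavior-clone the heuristic task logs (supervised).
            for _ in range(epochs):
                for s, a in zip(states, actions):
                    p = self.probs(s)
                    self.W -= lr * np.outer(s, p - np.eye(len(p))[a])

        def reinforce_step(self, s, a, reward, lr=0.01):
            # Online phase: on-policy policy-gradient update on observed rewards.
            p = self.probs(s)
            self.W += lr * reward * np.outer(s, np.eye(len(p))[a] - p)

    policy = LinearPolicy(n_features=4, n_actions=3)
    states, actions = rng.normal(size=(32, 4)), rng.integers(0, 3, size=32)
    policy.warm_start(states, actions)                 # offline warm start
    policy.reinforce_step(states[0], actions[0], 1.0)  # one online fine-tuning step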
HPDK: A Hybrid PM-DRAM Key-Value Store for High I/O Throughput
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-18 | DOI: 10.1109/TC.2024.3377914
Bihui Liu;Zhenyu Ye;Qiao Hu;Yupeng Hu;Yuchong Hu;Yang Xu;Keqin Li
{"title":"HPDK: A Hybrid PM-DRAM Key-Value Store for High I/O Throughput","authors":"Bihui Liu;Zhenyu Ye;Qiao Hu;Yupeng Hu;Yuchong Hu;Yang Xu;Keqin Li","doi":"10.1109/TC.2024.3377914","DOIUrl":"10.1109/TC.2024.3377914","url":null,"abstract":"This paper explores the design of an architecture that replaces Disk with Persistent Memory (PM) to achieve the highest I/O throughput in Log-Structured Merge Tree (LSM-Tree) based key-value stores (KVS). Most existing LSM-Tree based KVSs use PM as an intermediate or smoothing layer, which fails to fully exploit PM's unique advantages to maximize I/O throughput. However, due to PM's distinct characteristics, such as byte addressability and short erasure time, simply replacing existing storage with PM does not yield optimal I/O performance. Furthermore, LSM-Tree based KVSs often face slow read performance. To tackle these challenges, this paper presents HPDK, a hybrid PM-DRAM KVS that combines level compression for LSM-Trees in PM with a B\u0000<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula>\u0000-tree based in-memory search index in DRAM, resulting in high write and read throughput. HPDK also employs a key-value separation design and a live-item rate-based dynamic merge method to reduce the volume of PM writes. We implement and evaluate HPDK using a real PM drive, and our extensive experiments show that HPDK provides 1.25-11.8 and 1.47-36.4 times higher read and write throughput, respectively, compared to other state-of-the-art LSM-Tree based approaches.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1575-1587"},"PeriodicalIF":3.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140170171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
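Key-value separation with a DRAM-resident index over a PM value log, as described above, can be illustrated in a few lines of Python. A dict stands in for HPDK's B+-tree and a bytearray for the PM log; the live-item-rate helper hints at when a dynamic merge would reclaim space. All of this is a simplified sketch under those assumptions, not HPDK's on-PM layout.

    class HybridKVStore:
        def __init__(self):
            self.index = {}            # DRAM: key -> (offset, length) in the value log
            self.pm_log = bytearray()  # PM: append-only value log (key-value separation)
            self.live_bytes = 0

        def put(self, key, value: bytes):
            if key in self.index:                 # the old value becomes garbage
                self.live_bytes -= self.index[key][1]
            self.index[key] = (len(self.pm_log), len(value))
            self.pm_log += value
            self.live_bytes += len(value)

        def get(self, key):
            offset, length = self.index[key]
            return bytes(self.pm_log[offset:offset + length])

        def live_item_rate(self):
            # Fraction of the log still referenced by the index; segments whose
            # rate drops below a threshold would be candidates for a merge.
            return self.live_bytes / len(self.pm_log) if self.pm_log else 1.0

    store = HybridKVStore()
    store.put(b"k1", b"hello")
    store.put(b"k1", b"world")         # supersedes the first value
    assert store.get(b"k1") == b"world" and store.live_item_rate() == 0.5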
Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-18 | DOI: 10.1109/TC.2024.3377890
David Mallasén;Alberto A. Del Barrio;Manuel Prieto-Matias
{"title":"Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing","authors":"David Mallasén;Alberto A. Del Barrio;Manuel Prieto-Matias","doi":"10.1109/TC.2024.3377890","DOIUrl":"10.1109/TC.2024.3377890","url":null,"abstract":"The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show promise in delivering even higher accuracy in such computations. In this work, we explore the native use of 64-bit posits in a series of numerical benchmarks and compare their timing performance, accuracy and hardware cost to IEEE 754 doubles. In addition, we also study the conjugate gradient method for numerically solving systems of linear equations in real-world applications. For this, we extend the PERCIVAL RISC-V core and the Xposit custom RISC-V extension with posit64 and quire operations. Results show that posit64 can obtain up to 4 orders of magnitude lower mean square error than doubles. This leads to a reduction in the number of iterations required for convergence in some iterative solvers. However, leveraging the quire accumulator register can limit the order of some operations such as matrix multiplications. Furthermore, detailed FPGA and ASIC synthesis results highlight the significant hardware cost of 64-bit posit arithmetic and quire. Despite this, the large accuracy improvements achieved with the same memory bandwidth suggest that posit arithmetic may provide a potential alternative representation for scientific computing.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1472-1485"},"PeriodicalIF":3.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10473215","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
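The quire mentioned above is an exact accumulator: products are summed without intermediate rounding and rounded only once at the end. The sketch below imitates that effect with Python's Fraction type and contrasts it with step-by-step float32 accumulation; it is a conceptual illustration of deferred rounding, not posit64 arithmetic itself.

    from fractions import Fraction
    import numpy as np

    def dot_rounded(xs, ys, dtype=np.float32):
        # Conventional accumulation: the running sum is rounded after every step.
        acc = dtype(0)
        for x, y in zip(xs, ys):
            acc = dtype(acc + dtype(x) * dtype(y))
        return float(acc)

    def dot_deferred(xs, ys):
        # Quire-style accumulation: sum the products exactly, round once at the end.
        return float(sum(Fraction(float(x)) * Fraction(float(y)) for x, y in zip(xs, ys)))

    rng = np.random.default_rng(1)
    xs, ys = rng.normal(size=10_000), rng.normal(size=10_000)
    print(dot_rounded(xs, ys), dot_deferred(xs, ys))   # the rounded sum drifts from the exact one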
A Dynamic Adaptive Framework for Practical Byzantine Fault Tolerance Consensus Protocol in the Internet of Things
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-18 | DOI: 10.1109/TC.2024.3377921
Chunpei Li;Wangjie Qiu;Xianxian Li;Chen Liu;Zhiming Zheng
{"title":"A Dynamic Adaptive Framework for Practical Byzantine Fault Tolerance Consensus Protocol in the Internet of Things","authors":"Chunpei Li;Wangjie Qiu;Xianxian Li;Chen Liu;Zhiming Zheng","doi":"10.1109/TC.2024.3377921","DOIUrl":"10.1109/TC.2024.3377921","url":null,"abstract":"The Practical Byzantine Fault Tolerance (PBFT) protocol-supported blockchain can provide decentralized security and trust mechanisms for the Internet of Things (IoT). However, the PBFT protocol is not specifically designed for IoT applications. Consequently, adapting PBFT to the dynamic changes of an IoT environment with incomplete information represents a challenge that urgently needs to be addressed. To this end, we introduce DA-PBFT, a PBFT dynamic adaptive framework based on a multi-agent architecture. DA-PBFT divides the dynamic adaptive process into two sub-processes: optimality-seeking and optimization decision-making. During the optimality-seeking process, a PBFT optimization model is constructed based on deep reinforcement learning. This model is designed to generate PBFT optimization strategies for consensus nodes. In the optimization decision-making process, a PBFT optimization decision consensus mechanism is constructed based on the Borda count method. This mechanism ensures consistency in PBFT optimization decisions within an environment characterized by incomplete information. Furthermore, we designed a dynamic adaptive incentive mechanism to explore the Nash equilibrium conditions and security aspects of DA-PBFT. The experimental results demonstrate that DA-PBFT is capable of achieving consistency in PBFT optimization decisions within an environment of incomplete information, thereby offering robust and efficient transaction throughput for IoT applications.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1669-1682"},"PeriodicalIF":3.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
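The Borda count used for the optimization-decision consensus is easy to state in code: each node submits a ranking of candidate PBFT configurations, every candidate earns points according to its position, and the highest total wins. The candidate names below are placeholders; vote collection, quorum handling, and the incentive mechanism from the paper are not modeled.

    from collections import defaultdict

    def borda_winner(rankings):
        # A candidate ranked r-th in a list of n candidates earns n - 1 - r points.
        scores = defaultdict(int)
        for ranking in rankings:
            n = len(ranking)
            for r, candidate in enumerate(ranking):
                scores[candidate] += n - 1 - r
        return max(scores, key=scores.get), dict(scores)

    # Three consensus nodes rank three candidate configurations.
    winner, tally = borda_winner([["cfg_A", "cfg_B", "cfg_C"],
                                  ["cfg_B", "cfg_A", "cfg_C"],
                                  ["cfg_A", "cfg_C", "cfg_B"]])
    assert winner == "cfg_A"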
An Integrated FPGA Accelerator for Deep Learning-Based 2D/3D Path Planning
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-18 | DOI: 10.1109/TC.2024.3377895
Keisuke Sugiura;Hiroki Matsutani
{"title":"An Integrated FPGA Accelerator for Deep Learning-Based 2D/3D Path Planning","authors":"Keisuke Sugiura;Hiroki Matsutani","doi":"10.1109/TC.2024.3377895","DOIUrl":"10.1109/TC.2024.3377895","url":null,"abstract":"Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address this, we propose P3Net (PointNet-based Path Planning Networks), a lightweight deep-learning-based method for 2D/3D path planning, and design an IP core (P3NetCore) targeting FPGA SoCs (Xilinx ZCU104). P3Net improves the algorithm and model architecture of the recently-proposed MPNet. P3Net employs an encoder with a PointNet backbone and a lightweight planning network in order to extract robust point cloud features and sample path points from a promising region. P3NetCore is comprised of the fully-pipelined point cloud encoder, batched bidirectional path planner, and parallel collision checker, to cover most part of the algorithm. On the 2D (3D) datasets, P3Net with the IP core runs 30.52–186.36x and 7.68–143.62x (15.69–93.26x and 5.30–45.27x) faster than ARM Cortex CPU and Nvidia Jetson while only consuming 0.255W (0.809W), and is up to 1278.14x (455.34x) power-efficient than the workstation. P3Net improves the success rate by up to 28.2% and plans a near-optimal path, leading to a significantly better tradeoff between computation and solution quality than MPNet and the state-of-the-art sampling-based methods.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1442-1456"},"PeriodicalIF":3.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474486","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
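A bidirectional planner of the kind accelerated by P3NetCore alternately grows a path from the start and one from the goal, asking a sampler for the next waypoint and checking segments for collisions. In the sketch below, the learned planning network is replaced by a naive "step toward the other tree" sampler and the collision checker is a straight-line test on an occupancy grid; both are stand-in assumptions, not the P3Net models.

    import numpy as np

    def collision_free(p, q, occupancy, steps=20):
        # Sample the straight segment p -> q against a boolean occupancy grid.
        for t in np.linspace(0.0, 1.0, steps):
            x, y = (1 - t) * p + t * q
            if occupancy[int(round(x)), int(round(y))]:
                return False
        return True

    def bidirectional_plan(start, goal, occupancy, sample_fn, max_iters=200):
        # Grow partial paths from both ends, extending them alternately with
        # waypoints proposed by sample_fn; join them once the gap is collision-free.
        fwd, bwd = [np.asarray(start, float)], [np.asarray(goal, float)]
        for i in range(max_iters):
            tree, other = (fwd, bwd) if i % 2 == 0 else (bwd, fwd)
            proposal = sample_fn(tree[-1], other[-1])
            if collision_free(tree[-1], proposal, occupancy):
                tree.append(proposal)
            if collision_free(fwd[-1], bwd[-1], occupancy):
                return fwd + bwd[::-1]
        return None

    grid = np.zeros((32, 32), dtype=bool)                     # empty 32x32 map
    naive = lambda cur, target: cur + 0.3 * (target - cur)    # placeholder for the neural sampler
    path = bidirectional_plan([1.0, 1.0], [30.0, 30.0], grid, naive)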
Reordering and Compression for Hypergraph Processing
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-18 | DOI: 10.1109/TC.2024.3377915
Yu Liu;Qi Luo;Mengbai Xiao;Dongxiao Yu;Huashan Chen;Xiuzhen Cheng
{"title":"Reordering and Compression for Hypergraph Processing","authors":"Yu Liu;Qi Luo;Mengbai Xiao;Dongxiao Yu;Huashan Chen;Xiuzhen Cheng","doi":"10.1109/TC.2024.3377915","DOIUrl":"10.1109/TC.2024.3377915","url":null,"abstract":"Hypergraphs are applicable to various domains such as social contagion, online groups, and protein structures due to their effective modeling of multivariate relationships. However, the increasing size of hypergraphs has led to high computation costs, necessitating efficient acceleration strategies. Existing approaches often require consideration of algorithm-specific issues, making them difficult to directly apply to arbitrary hypergraph processing tasks. In this paper, we propose a compression-array acceleration strategy involving hypergraph reordering to improve memory access efficiency, which can be applied to various hypergraph processing tasks without considering the algorithm itself. We introduce a new metric called closeness to optimize the ordering of vertices and hyperedges in the one-dimensional array representation. Moreover, we present an \u0000<inline-formula><tex-math>$frac{1}{2w}$</tex-math></inline-formula>\u0000-approximation algorithm to obtain the optimal ordering of vertices and hyperedges. We also develop an efficient update mechanism for dynamic hypergraphs. Our extensive experiments demonstrate significant improvements in hypergraph processing performance, reduced cache misses, and reduced memory footprint. Furthermore, our method can be integrated into existing hypergraph processing frameworks, such as Hygra, to enhance their performance.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1486-1499"},"PeriodicalIF":3.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
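The reordering step can be pictured with a greedy stand-in: place vertices so that those sharing many hyperedges receive neighboring positions in the one-dimensional array, which is what improves memory-access locality. The paper's closeness metric and its approximation guarantee are not reproduced here; the heuristic below is only an illustrative assumption.

    from collections import defaultdict

    def reorder_vertices(hyperedges, n_vertices):
        # Greedily append the unplaced vertex that shares the most hyperedges
        # with the vertex placed last, so co-occurring vertices end up adjacent.
        incident = defaultdict(set)            # vertex -> ids of hyperedges containing it
        for e_id, edge in enumerate(hyperedges):
            for v in edge:
                incident[v].add(e_id)

        order, placed, current = [], set(), 0
        while len(order) < n_vertices:
            if current in placed:
                current = next(v for v in range(n_vertices) if v not in placed)
            order.append(current)
            placed.add(current)
            remaining = [v for v in range(n_vertices) if v not in placed]
            if not remaining:
                break
            current = max(remaining, key=lambda v: len(incident[v] & incident[current]))
        return order  # order[i] is the original id of the vertex stored at position i

    print(reorder_vertices([[0, 2, 4], [1, 3], [2, 4]], 5))   # -> [0, 2, 4, 1, 3]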
Prefender: A Prefetching Defender Against Cache Side Channel Attacks as a Pretender
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-18 | DOI: 10.1109/TC.2024.3377891
Luyi Li;Jiayi Huang;Lang Feng;Zhongfeng Wang
{"title":"Prefender: A Prefetching Defender Against Cache Side Channel Attacks as a Pretender","authors":"Luyi Li;Jiayi Huang;Lang Feng;Zhongfeng Wang","doi":"10.1109/TC.2024.3377891","DOIUrl":"10.1109/TC.2024.3377891","url":null,"abstract":"Cache side channel attacks are increasingly alarming in modern processors due to the recent emergence of Spectre and Meltdown attacks. A typical attack performs intentional cache access and manipulates cache states to leak secrets by observing the victim's cache access patterns. Different countermeasures have been proposed to defend against both general and transient execution based attacks. Despite their effectiveness, they mostly trade some level of performance for security, or have restricted security scope. In this paper, we seek an approach to enforcing security while maintaining performance. We leverage the insight that attackers need to access cache in order to manipulate and observe cache state changes for information leakage. Specifically, we propose \u0000<sc>Prefender</small>\u0000, a secure prefetcher that learns and predicts attack-related accesses for prefetching the cachelines to simultaneously help security and performance. Our results show that \u0000<sc>Prefender</small>\u0000 is effective against several cache side channel attacks while maintaining or even improving performance for SPEC CPU 2006 and 2017 benchmarks.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1457-1471"},"PeriodicalIF":3.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
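The abstract's core observation, that an attacker must itself access the cache to prime and probe it, can be illustrated with a toy pattern-triggered prefetcher: once a constant-stride, probe-like sequence is confirmed, the next few lines are prefetched so later demand accesses hit regardless of what the victim touched, blurring the timing signal. The stride detector, depth, and 64-byte line size below are simplifying assumptions; Prefender's actual learning-based predictors are different.

    from collections import deque

    LINE = 64  # assumed cache-line size in bytes

    class StridePrefetcher:
        def __init__(self, depth=4, confirm=3):
            self.history = deque(maxlen=confirm)  # last few accessed addresses
            self.depth = depth                    # number of lines to prefetch ahead

        def access(self, addr):
            self.history.append(addr)
            if len(self.history) < self.history.maxlen:
                return []
            h = list(self.history)
            strides = {b - a for a, b in zip(h, h[1:])}
            if len(strides) != 1 or 0 in strides:
                return []                         # no confirmed constant stride
            stride = strides.pop()
            return [((addr + stride * i) // LINE) * LINE
                    for i in range(1, self.depth + 1)]

    pf = StridePrefetcher()
    for a in range(0x10000, 0x10000 + 5 * 4096, 4096):   # attacker-style 4 KiB-stride scan
        targets = pf.access(a)                            # line-aligned prefetch targets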
ElasticDNN: On-Device Neural Network Remodeling for Adapting Evolving Vision Domains at Edge
IF 3.7 | CAS Tier 2 | Computer Science
IEEE Transactions on Computers | Pub Date: 2024-03-14 | DOI: 10.1109/TC.2024.3375608
Qinglong Zhang;Rui Han;Chi Harold Liu;Guoren Wang;Lydia Y. Chen
{"title":"ElasticDNN: On-Device Neural Network Remodeling for Adapting Evolving Vision Domains at Edge","authors":"Qinglong Zhang;Rui Han;Chi Harold Liu;Guoren Wang;Lydia Y. Chen","doi":"10.1109/TC.2024.3375608","DOIUrl":"10.1109/TC.2024.3375608","url":null,"abstract":"Executing deep neural networks (DNN) based vision tasks on edge devices encounters challenging scenarios of significant and continually evolving data domains (e.g. background or subpopulation shift). With limited resources, the state-of-the-art domain adaptation (DA) methods either cause high training overheads on large DNN models, or incur significant accuracy losses when adapting small/compressed models in an online fashion. The inefficient resource scheduling among multiple applications further degrades their overall model accuracy. In this paper, we present ElasticDNN, a framework that enables online DNN remodeling for applications encountering evolving domain drifts at edge. Its first key component is the master-surrogate DNN models, which can dynamically generate a small surrogate DNN by retaining and training the large master DNN's most relevant regions pertinent to the new domain. The second novelty of ElasticDNN is the filter-grained resource scheduling, which allocates GPU resources based on online accuracy estimation and DNN remodeling of co-running applications. We fully implement ElasticDNN and demonstrate its effectiveness through extensive experiments. The results show that, compared to existing online DA methods using the same model sizes, ElasticDNN improves accuracy by 23.31% and reduces adaption time by 35.67x. In the more challenging multi-application scenario, ElasticDNN improves accuracy by an average of 25.91%.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 6","pages":"1616-1630"},"PeriodicalIF":3.7,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140155045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
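Generating a small surrogate from a large master model can be sketched as filter selection: score each convolutional filter on new-domain samples and retain only the most relevant ones. The L1-style activation score and the keep ratio below are illustrative assumptions; ElasticDNN's actual relevance criterion, surrogate retraining loop, and filter-grained GPU scheduling are not shown.

    import numpy as np

    def select_surrogate_filters(activations, keep_ratio=0.25):
        # activations: (batch, filters, H, W) from the master layer on new-domain data.
        scores = np.abs(activations).mean(axis=(0, 2, 3))   # per-filter relevance score
        k = max(1, int(len(scores) * keep_ratio))
        return np.sort(np.argsort(scores)[::-1][:k])        # indices of retained filters

    def build_surrogate_weights(master_weights, keep_idx):
        # Slice the master layer's (out_ch, in_ch, kH, kW) weights down to the
        # retained output filters; the next layer's input channels follow suit.
        return master_weights[keep_idx]

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(8, 64, 14, 14))
    keep = select_surrogate_filters(acts)                   # 16 of 64 filters retained
    surrogate_w = build_surrogate_weights(rng.normal(size=(64, 32, 3, 3)), keep)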