IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems: Latest Articles

NDPGNN: A Near-Data Processing Architecture for GNN Training and Inference Acceleration
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3446871
Haoyang Wang; Shengbing Zhang; Xiaoya Fan; Zhao Yang; Meng Zhang
Vol. 43, No. 11, pp. 3997-4008
Abstract: Graph neural networks (GNNs) require a large number of fine-grained memory accesses, which results in inefficient use of bandwidth resources. In this article, we introduce a near-data processing architecture tailored for GNN acceleration, named NDPGNN. NDPGNN provides different operating modes to meet the acceleration needs of various GNN frameworks while ensuring the configurability and scalability of the system. NDPGNN takes advantage of data locality to repeatedly distribute and reuse data, thereby reducing memory access requirements, and further improves memory access efficiency by combining a subgraph sparse node scheduling strategy with intermediate result reuse. We use data packaging to provide a higher effective data ratio for long-distance data transmission, thereby improving the utilization of the system's limited bandwidth resources. Compared with the previous method, NDPGNN improves system performance by 5.68 times while reducing energy consumption overhead by 8.49 times.
Citations: 0
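Why data packaging raises the effective data ratio can be shown with simple arithmetic. The sketch below is an illustration under assumed sizes, not NDPGNN's packet format: it assumes a hypothetical link where every packet carries a fixed 16-byte header, each fine-grained access moves 8 bytes, and a packaged transfer needs a 2-byte index per item so the receiver can unpack it.

```python
# Hypothetical sizes in bytes; none of these values come from the paper.
HEADER = 16  # fixed per-packet header
ITEM = 8     # payload of one fine-grained access (e.g. one feature slice)
INDEX = 2    # per-item tag needed to unpack a packaged transfer


def effective_ratio(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of transmitted bytes that carry useful data."""
    return payload_bytes / (payload_bytes + overhead_bytes)


def unpacked_ratio(n_items: int) -> float:
    # Every access travels in its own packet with its own header.
    return effective_ratio(n_items * ITEM, n_items * HEADER)


def packed_ratio(n_items: int) -> float:
    # All accesses share one header; each carries only a small index.
    return effective_ratio(n_items * ITEM, HEADER + n_items * INDEX)


for n in (1, 8, 32):
    print(n, round(unpacked_ratio(n), 3), round(packed_ratio(n), 3))
```

With these assumed sizes, sending one access per packet pins the ratio at 1/3 no matter how much data moves, while packaging 32 accesses raises it to about 0.76. A single packaged item is slightly worse than an unpackaged one, so packaging only pays off when enough accesses are batched together.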
Balancing Security and Efficiency: System-Informed Mitigation of Power-Based Covert Channels
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3438999
Jeferson González-Gómez; Mohammed Bakr Sikal; Heba Khdr; Lars Bauer; Jörg Henkel
Vol. 43, No. 11, pp. 3395-3406
Abstract: As the digital landscape continues to evolve, the security of computing systems has become a critical concern. Power-based covert channels (e.g., thermal covert channels (TCCs)) exploit system resources to transmit information in a hidden or unintended manner. They have recently been studied as an effective mechanism for leaking information between malicious entities by modulating CPU power. In response, dynamic voltage and frequency scaling (DVFS) has been widely used as a countermeasure that mitigates TCCs by directly disrupting the communication between the actors. Although this technique has proven effective in neutralizing such attacks, it introduces significant performance and energy penalties that are particularly detrimental to energy-constrained embedded systems. In this article, we propose several system-informed countermeasures to power-based covert channels, drawn from the heuristic and machine learning (ML) domains. Our proposed techniques leverage task migration and DVFS to jointly mitigate the channels and maximize energy efficiency. Our extensive experimental evaluation on two commercial platforms, 1) the NVIDIA Jetson TX2 and 2) the Jetson Orin, shows that our approach significantly improves the overall energy efficiency of the system compared to the state-of-the-art solution while nullifying the attack at all times.
Citations: 0
Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory Architectures
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3447216
Shanmukha Mangadahalli Siddaramu; Ali Nezhadi; Mahta Mayahinia; Seyedehmaryam Ghasemi; Mehdi B. Tahoori
Vol. 43, No. 11, pp. 3744-3755
Abstract: The computation-in-nonvolatile-memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays, enabling effective computation through multirow memristor reading and sensing. In this context, the conventional design of memory decoders must be modified accordingly for efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, employing a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, using an extended NVM-CiM-capable gem5 simulator to assess the impact of these decoders on the mapping of CiM-friendly applications and on the resulting system performance, particularly in facilitating rapid and efficient activation of multirow memory configurations. This holistic analysis allows us to identify the bottlenecks and requirements on the application side and to adjust the decoder design accordingly. Our analysis reveals that hybrid decoders significantly decrease latency and power consumption compared to other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder's row-selection flexibility: reducing additional system-level data movement, even at the expense of decoder performance, can substantially improve the overall efficiency of NVM-CiM systems.
Citations: 0
Ghostbuster: A Software Approach for Reducing Ghosting Effect on Electrophoretic Displays
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3446711
Tao Hu; Menglong Cui; Mingsong Lv; Tao Yang; Yiyang Zhou; Qingxu Deng; Chun Jason Xue; Nan Guan
Vol. 43, No. 11, pp. 3780-3791
Abstract: Electrophoretic displays (EPDs), also known as e-paper, offer a paper-like visual experience by reflecting ambient light, which distinguishes them from traditional LCD or LED displays. They are favored for their eye comfort, energy efficiency, and material flexibility, which make them appealing for a wide range of embedded devices, including eReaders, smartphones, tablets, and wearables. However, EPDs face a significant challenge: the fast refresh rate needed to maintain acceptable display performance introduces a pronounced ghosting effect. This effect causes noticeable color discrepancies between the displayed and source images, harming the user experience and hindering the broader use of EPDs in devices that display dynamic content. This article proposes a software-based solution to the ghosting issue in EPDs. Our approach develops analytical models to predict the occurrence of ghosting effects and adjusts the source images to counteract the anticipated color deviations, thereby reducing perceivable ghosting on the display. Experimental evaluation on real-world EPDs validates the effectiveness of our proposed approach in reducing the ghosting effect.
Citations: 0
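The abstract does not give the analytical ghosting models, so the following is only a minimal sketch of the general "predict, then pre-compensate" idea under an assumed model: each displayed gray level is taken to be a linear mix of the new source value and the previous frame, with a hypothetical residual fraction `alpha`. Inverting that model gives a source frame whose predicted output matches the target, clamped to the panel's drivable range.

```python
from typing import List


def predict_displayed(source: List[float], previous: List[float],
                      alpha: float) -> List[float]:
    """Hypothetical linear ghost model: a fraction alpha of the previous
    frame lingers in every pixel after a fast refresh."""
    return [(1 - alpha) * s + alpha * p for s, p in zip(source, previous)]


def precompensate(target: List[float], previous: List[float], alpha: float,
                  lo: float = 0.0, hi: float = 255.0) -> List[float]:
    """Choose a source frame whose predicted output equals the target,
    clamped to the gray levels the panel can actually drive."""
    adjusted = []
    for t, p in zip(target, previous):
        s = (t - alpha * p) / (1 - alpha)     # invert the linear model
        adjusted.append(min(hi, max(lo, s)))  # cannot overshoot the panel range
    return adjusted


prev_frame = [100.0, 200.0]
target_frame = [150.0, 150.0]
src = precompensate(target_frame, prev_frame, alpha=0.2)
print(src, predict_displayed(src, prev_frame, alpha=0.2))
```

The clamp is what limits any scheme of this kind: a full black-to-white transition would need a source value beyond 255 to cancel the residue, so some ghosting can remain on the largest transitions.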
High-Performance Remote Data Persisting for Key-Value Stores via Persistent Memory Region
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3442992
Yongping Luo; Peiquan Jin; Xiaoliang Wang; Zhaole Chu; Kuankuan Guo; Jinhui Guo
Vol. 43, No. 11, pp. 3828-3839
Abstract: Key-value stores (KVStores), such as LevelDB and Redis, have been widely used in real-world production environments. To guarantee data durability and availability, traditional KVStores suffer from high write latency, mainly caused by long network and data-persisting times. To solve this problem, this article presents a novel data-persisting path for KVStores that allows remote clients to persist data to the KVStore server with µs-level latency. The novelty of this study is threefold. First, we propose PMRDirect, which uses a persistent memory region (PMR) in the NVM Express standard to construct a direct data-persisting path from the RDMA network interface card (NIC) to the PMR inside an SSD. Second, to showcase PMRDirect in KVStores, we developed a new access stack called PMRAccess, which enables remote clients to access existing KVStores and provides durability for each write request. Specifically, PMRAccess includes a low-latency RDMA-based messaging mode and chunk-based PMR management to reduce write latency and improve system throughput. Finally, we conducted extensive experiments to evaluate our proposals. We first compared PMRDirect with several remote data-persisting paths to show its effectiveness. Then, we evaluated PMRAccess on two KVStores: LibCuckoo (an in-memory KVStore) and LevelDB (an in-storage KVStore). The results showed that PMRAccess outperformed the SSD-based access stack by up to 6.1× in write throughput and 36× in write tail latency, and achieved 1.7× higher write throughput and 0.59× lower write tail latency than the PMEM-based access stack. Further, in a system-to-system comparison between the PMRAccess-integrated LibCuckoo and Redis, our proposal achieved up to 13× higher throughput and 40× lower write latency than Redis.
Citations: 0
TPE-Det: A Tamper-Proof External Detector via Hardware Traces Analysis Against IoT Malware
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3444712
Ziming Zhao; Zhaoxuan Li; Tingting Li; Fan Zhang
Vol. 43, No. 11, pp. 3455-3466
Abstract: With the widespread use of Internet of Things (IoT) devices, malware detection has become a hot spot for both academic and industrial communities. A series of solutions based on system calls, system logs, or hardware performance counters achieve promising results. However, such internal monitors are easily tampered with, especially by adaptive adversaries. In addition, existing system log records are typically voluminous, resulting in data explosion problems. In this article, we present TPE-Det, a side-channel-based external monitor that addresses these issues. Specifically, TPE-Det leverages the serial peripheral interface bus to extract on-chip traces and uses a recovery pipeline to reconstruct operating logs. This external monitor is imperceptible to adversaries and tamper-proof. The restored logs mainly consist of file operation commands, which are lightweight compared to complete records. We then deploy a series of machine learning models over statistical, sequence, and graph features to identify malware. Empirical evaluation shows that, compared to state-of-the-art methods, our proposal is tamper-proof and achieves high detection accuracy with low time and space overhead.
Citations: 0
EMI: Energy Management Meets Imputation in Wearable IoT Devices
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3448379
Dina Hussein; Nuzhat Yamin; Ganapati Bhat
Vol. 43, No. 11, pp. 3792-3803
Abstract: Wearable and Internet of Things (IoT) devices are becoming popular in several applications, such as health monitoring, wide-area sensing, and digital agriculture. These devices are energy-constrained due to limited battery capacities. As such, IoT devices harvest energy from the environment and manage it to prolong system operation. The stochastic nature of ambient energy, coupled with small battery sizes, may leave insufficient energy for obtaining data from all sensors. As a result, sensors must either be duty cycled or subsampled to meet the energy budget. However, machine learning (ML) models for these applications are typically trained under the assumption that data from all sensors are available, which leads to a loss in accuracy. To overcome this, we propose a novel approach that combines data imputation with energy management (EM). Data imputation substitutes missing data with appropriate values so that complete sensor data are available for application processing, while EM makes energy budget decisions on the devices. We use the energy budget to obtain complete data from as many sensors as possible and turn off the other sensors instead of duty cycling all sensors. We then apply a low-overhead imputation technique for the unavailable sensors and feed the imputed values to the ML models. Evaluations with six diverse datasets show that the proposed EM-with-imputation approach achieves 25% to 55% higher accuracy than duty cycling or subsampling without using additional energy.
Citations: 0
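The two steps described in the abstract (spend the budget on fully sampling as many sensors as possible, then impute the switched-off ones) can be sketched as follows. This is a hypothetical illustration, not the EMI algorithm: the sensor names, per-sensor energy costs, and priority order are made up, and last-observation-carried-forward stands in for the paper's unspecified low-overhead imputation technique.

```python
from typing import Dict, List, Optional


def select_sensors(costs: Dict[str, float], priority: List[str],
                   budget: float) -> List[str]:
    """Fully power as many sensors as the budget allows, in priority order;
    the remaining sensors are switched off rather than duty cycling all of them."""
    chosen, spent = [], 0.0
    for name in priority:
        if spent + costs[name] <= budget:
            chosen.append(name)
            spent += costs[name]
    return chosen


def impute(sample: Dict[str, Optional[float]],
           last_seen: Dict[str, float]) -> Dict[str, float]:
    """Fill each missing reading (None) with the last observed value so the
    ML model always receives a complete feature vector (stand-in imputer)."""
    filled = {}
    for name, value in sample.items():
        if value is None:
            filled[name] = last_seen.get(name, 0.0)  # 0.0 if never observed
        else:
            filled[name] = value
            last_seen[name] = value  # remember for future gaps
    return filled
```

For example, with hypothetical costs `{"acc": 2.0, "gyro": 3.0, "ppg": 4.0}`, priority `["ppg", "acc", "gyro"]`, and a budget of 6.5, `select_sensors` keeps `ppg` and `acc` and turns `gyro` off. `impute` then fills the missing `gyro` value from its last reading.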
MaskedHLS: Domain-Specific High-Level Synthesis of Masked Cryptographic Designs
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3447223
Nilotpola Sarma; Anuj Singh Thakur; Chandan Karfa
Vol. 43, No. 11, pp. 3973-3984
Abstract: Designing and synthesizing masked cryptographic hardware implementations that are secure against power side-channel attacks (PSCAs) in the presence of glitches is a challenging task. High-level synthesis (HLS) is a promising technique for generating masked hardware directly from masked software, and it offers opportunities for design space exploration. However, conventional HLS tools make modifications that break the PSCA-security guarantee provided by masking, resulting in an insecure register transfer level (RTL) design. Moreover, existing HLS tools cannot place registers at designated locations or balance parallel paths in a masked cryptographic design. Both are necessary to stop the propagation of glitches that may compromise PSCA security. This article introduces a domain-specific HLS tool tailored to obtain a PSCA-secure masked hardware implementation directly from a masked software implementation. The tool places registers at the specific locations required by glitch-robust masking gadgets, resulting in a secure RTL. Furthermore, it automatically balances parallel paths and reduces latency while preserving the PSCA security guaranteed by masking. Experimental results with the PRESENT cipher's S-box and Canright's AES S-box, each masked with four state-of-the-art gadgets, show that MaskedHLS produces RTLs with, on average, 73.9% fewer registers and 45.7% lower latency than manual register insertion. The PSCA security of MaskedHLS-generated RTLs is also demonstrated with the test vector leakage assessment (TVLA).
Citations: 0
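TVLA is commonly run as a fixed-versus-random Welch's t-test on power traces: any sample point where |t| exceeds 4.5 is flagged as potential leakage. The abstract does not state the exact TVLA configuration used for MaskedHLS, so the following is a minimal first-order sketch of the standard test using only the Python standard library. The trace data in the usage note are synthetic.

```python
import math
from statistics import mean, variance
from typing import List, Sequence

TVLA_THRESHOLD = 4.5  # conventional |t| threshold used in TVLA


def welch_t(fixed: Sequence[float], random: Sequence[float]) -> float:
    """Welch's t-statistic between fixed-input and random-input
    measurements taken at one sample point of the traces."""
    m1, m2 = mean(fixed), mean(random)
    v1, v2 = variance(fixed), variance(random)  # sample variances
    return (m1 - m2) / math.sqrt(v1 / len(fixed) + v2 / len(random))


def leaks(fixed_traces: List[List[float]],
          random_traces: List[List[float]]) -> List[int]:
    """Return the sample indices whose |t| exceeds the TVLA threshold."""
    flagged = []
    for i in range(len(fixed_traces[0])):
        t = welch_t([tr[i] for tr in fixed_traces],
                    [tr[i] for tr in random_traces])
        if abs(t) > TVLA_THRESHOLD:
            flagged.append(i)
    return flagged
```

Consider a synthetic case in which the two trace sets share the same distribution at sample 0 but differ strongly at sample 1. Here `leaks` reports only index 1. A design passes the test when no index is flagged.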
Domain-Adaptive Online Active Learning for Real-Time Intelligent Video Analytics on Edge Devices
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3453188
Michele Boldo; Mirco De Marchi; Enrico Martini; Stefano Aldegheri; Nicola Bombieri
Vol. 43, No. 11, pp. 4105-4116 | Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745828
Abstract: Deep learning (DL) for intelligent video analytics is increasingly pervasive across application domains ranging from healthcare to Industry 5.0. A significant trend involves deploying DL models on edge devices with limited resources. Techniques such as pruning, quantization, and early exit have demonstrated the feasibility of real-time inference at the edge by compressing and optimizing deep neural networks (DNNs). However, adapting pretrained models to new and dynamic scenarios remains a significant challenge. While solutions such as domain adaptation, active learning (AL), and teacher-student knowledge distillation (KD) help address this challenge, they often rely on cloud or well-equipped computing platforms for fine-tuning. In this study, we propose a framework for domain-adaptive online AL of DNN models tailored for intelligent video analytics on resource-constrained devices. Our framework employs a KD approach in which both the teacher and the student models are deployed on the edge device. To determine when to retrain the student DNN without ground truth or cloud-based teacher inference, our model uses singular value decomposition of the input data. It identifies key data frames and efficiently retrains the student using teacher inference executed at the edge, with the aim of preventing model overfitting. We evaluate the framework through two case studies, 1) human pose estimation and 2) car object detection, both implemented on an NVIDIA Jetson NX device.
Citations: 0
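The abstract says only that singular value decomposition of the input data determines when to retrain. This hypothetical sketch shows one way such a trigger could work and should not be read as the authors' criterion. It treats a frame as a matrix, computes the share of its energy captured by the largest singular value (via power iteration on AᵀA, so no external library is needed), and requests retraining when that share drifts from a reference value by more than an assumed threshold.

```python
import math
from typing import List

Matrix = List[List[float]]


def top_singular_value(a: Matrix, iters: int = 100) -> float:
    """Largest singular value of a, via power iteration on A^T A."""
    n = len(a[0])
    v = [1.0] * n
    for _ in range(iters):
        av = [sum(row[j] * v[j] for j in range(n)) for row in a]          # A v
        w = [sum(a[i][j] * av[i] for i in range(len(a))) for j in range(n)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    av = [sum(row[j] * v[j] for j in range(n)) for row in a]
    return math.sqrt(sum(x * x for x in av))  # ||A v|| for unit v


def energy_share(a: Matrix) -> float:
    """Fraction of the frame's energy (squared Frobenius norm) in the top singular value."""
    fro2 = sum(x * x for row in a for x in row)
    if fro2 == 0.0:
        return 0.0
    s = top_singular_value(a)
    return s * s / fro2


def should_retrain(reference_share: float, frame: Matrix,
                   threshold: float = 0.2) -> bool:
    """Hypothetical trigger: retrain when the spectrum shifts too far from the reference."""
    return abs(energy_share(frame) - reference_share) > threshold
```

As a check, the rank-1 frame `[[1, 2], [2, 4]]` has all of its energy in one singular value (σ₁ = 5), while the identity splits it evenly. Using the former as the reference, the identity triggers retraining and a scaled copy of the reference does not.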
NebulaFL: Self-Organizing Efficient Multilayer Federated Learning Framework With Adaptive Load Tuning in Heterogeneous Edge Systems
IF 2.7 | Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3443715
Zirui Lian; Jing Cao; Qianyue Cao; Weihong Liu; Zongwei Zhu; Xuehai Zhou
Vol. 43, No. 11, pp. 3358-3369
Abstract: As a promising edge intelligence technology, federated learning (FL) enables Internet of Things (IoT) devices to train models collaboratively while preserving data privacy and security. Recently, hierarchical FL (HFL) has been designed to support distributed training in the intricate hierarchical structure of IoT. However, coarse-grained hierarchical schemes usually fail to adapt fully to the hierarchical environment, leading to high training latency. Meanwhile, highly heterogeneous communication and computation delays caused by device diversity (system heterogeneity), together with decentralized data distribution caused by decentralized device placement (data heterogeneity), exacerbate these challenges. This article proposes NebulaFL, a dual-heterogeneity-aware multilayer FL framework that supports efficient distributed training in IoT scenarios. NebulaFL introduces a multilayer architecture organization scheme that adapts to complex hierarchical, heterogeneous scenarios. Specifically, through a finer-grained division of the HFL hierarchy, hybrid synchronous-asynchronous training is implemented at both the global system level and the local device-layer level. More importantly, to adaptively build a heterogeneity-aware hierarchical training architecture, NebulaFL accounts for dual heterogeneity in its organization scheme to determine the optimal placement of devices in a multilayer environment. To further improve training efficiency, NebulaFL employs an augmented multiarmed bandit technique based on reinforcement learning, which adjusts the device-layer training load by evaluating dynamic training utility and convergence uncertainty feedback. Experiments demonstrate that NebulaFL achieves up to a 15.68× speedup and a 23.94% increase in training accuracy compared to the latest and classic approaches.
Citations: 0
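In a multiarmed bandit formulation of load tuning, each arm is a candidate training-load level, and the reward is the observed training utility. NebulaFL uses an augmented bandit that also weighs convergence uncertainty, and the abstract does not give its details. The sketch below therefore substitutes the plain, textbook UCB1 algorithm, driven by a hypothetical simulator whose per-level mean utilities are invented for illustration.

```python
import math
import random
from typing import List


class UCB1:
    """Plain UCB1 bandit; each arm is one candidate training-load level."""

    def __init__(self, n_arms: int):
        self.counts: List[int] = [0] * n_arms
        self.values: List[float] = [0.0] * n_arms  # running mean reward per arm
        self.t = 0

    def select(self) -> int:
        self.t += 1
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm  # play every arm once before using the bound
        scores = [v + math.sqrt(2 * math.log(self.t) / c)
                  for v, c in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(rounds: int = 2000, seed: int = 0) -> UCB1:
    """Hypothetical environment: mean utility of load levels such as 1, 2, 4, 8 local epochs."""
    rng = random.Random(seed)
    true_utility = [0.3, 0.5, 0.8, 0.4]
    bandit = UCB1(len(true_utility))
    for _ in range(rounds):
        arm = bandit.select()
        reward = true_utility[arm] + rng.gauss(0.0, 0.1)  # noisy utility feedback
        bandit.update(arm, reward)
    return bandit
```

After a few thousand rounds, the bandit concentrates its pulls on the level with the highest mean utility (index 2 in this made-up setting) while still occasionally probing the others. That exploration-exploitation balance is why bandits suit load tuning when utility feedback is noisy.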