arXiv - CS - Distributed, Parallel, and Cluster Computing: Latest Publications

DiReDi: Distillation and Reverse Distillation for AIoT Applications
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.08308
Chen Sun, Qing Tong, Wenshuang Yang, Wenqi Zhang
{"title":"DiReDi: Distillation and Reverse Distillation for AIoT Applications","authors":"Chen Sun, Qing Tong, Wenshuang Yang, Wenqi Zhang","doi":"arxiv-2409.08308","DOIUrl":"https://doi.org/arxiv-2409.08308","url":null,"abstract":"Typically, the significant efficiency can be achieved by deploying different\u0000edge AI models in various real world scenarios while a few large models manage\u0000those edge AI models remotely from cloud servers. However, customizing edge AI\u0000models for each user's specific application or extending current models to new\u0000application scenarios remains a challenge. Inappropriate local training or fine\u0000tuning of edge AI models by users can lead to model malfunction, potentially\u0000resulting in legal issues for the manufacturer. To address aforementioned\u0000issues, this paper proposes an innovative framework called \"DiReD\", which\u0000involves knowledge DIstillation & REverse DIstillation. In the initial step, an\u0000edge AI model is trained with presumed data and a KD process using the cloud AI\u0000model in the upper management cloud server. This edge AI model is then\u0000dispatched to edge AI devices solely for inference in the user's application\u0000scenario. When the user needs to update the edge AI model to better fit the\u0000actual scenario, the reverse distillation (RD) process is employed to extract\u0000the knowledge: the difference between user preferences and the manufacturer's\u0000presumptions from the edge AI model using the user's exclusive data. Only the\u0000extracted knowledge is reported back to the upper management cloud server to\u0000update the cloud AI model, thus protecting user privacy by not using any\u0000exclusive data. The updated cloud AI can then update the edge AI model with the\u0000extended knowledge. Simulation results demonstrate that the proposed \"DiReDi\"\u0000framework allows the manufacturer to update the user model by learning new\u0000knowledge from the user's actual scenario with private data. The initial\u0000redundant knowledge is reduced since the retraining emphasizes user private\u0000data.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
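The DiReDi entry above builds on standard knowledge distillation (KD) from a large cloud model to a small edge model, followed by a reverse-distillation step on the device. The sketch below shows only the forward KD step under common soft-label distillation assumptions; the architectures, temperature, and loss weighting are illustrative choices, not details from the paper, and the reverse-distillation step is not shown.

# Minimal soft-label knowledge distillation step (cloud teacher -> edge student).
# All architectures and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden, out_dim):
    return nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

teacher = mlp(28 * 28, 512, 10)   # stand-in for the large cloud AI model
student = mlp(28 * 28, 32, 10)    # stand-in for the small edge AI model
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

def kd_step(x, y, T=4.0, alpha=0.7):
    """One KD step: blend the soft-label KL term with the hard-label CE term."""
    with torch.no_grad():
        t_logits = teacher(x)                      # teacher runs in inference mode
    s_logits = student(x)
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(s_logits, y)
    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.randn(64, 1, 28, 28)                     # "presumed" training batch
y = torch.randint(0, 10, (64,))
print(kd_step(x, y))

The paper's reverse-distillation step would run in the opposite direction on the deployed student, extracting only the knowledge that differs from the manufacturer's presumptions.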
DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.07734
Kangyang Luo, Shuai Wang, Yexuan Fu, Renrong Shao, Xiang Li, Yunshi Lan, Ming Gao, Jinlong Shu
{"title":"DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning","authors":"Kangyang Luo, Shuai Wang, Yexuan Fu, Renrong Shao, Xiang Li, Yunshi Lan, Ming Gao, Jinlong Shu","doi":"arxiv-2409.07734","DOIUrl":"https://doi.org/arxiv-2409.07734","url":null,"abstract":"Federated Learning (FL) is a distributed machine learning scheme in which\u0000clients jointly participate in the collaborative training of a global model by\u0000sharing model information rather than their private datasets. In light of\u0000concerns associated with communication and privacy, one-shot FL with a single\u0000communication round has emerged as a de facto promising solution. However,\u0000existing one-shot FL methods either require public datasets, focus on model\u0000homogeneous settings, or distill limited knowledge from local models, making it\u0000difficult or even impractical to train a robust global model. To address these\u0000limitations, we propose a new data-free dual-generator adversarial distillation\u0000method (namely DFDG) for one-shot FL, which can explore a broader local models'\u0000training space via training dual generators. DFDG is executed in an adversarial\u0000manner and comprises two parts: dual-generator training and dual-model\u0000distillation. In dual-generator training, we delve into each generator\u0000concerning fidelity, transferability and diversity to ensure its utility, and\u0000additionally tailor the cross-divergence loss to lessen the overlap of dual\u0000generators' output spaces. In dual-model distillation, the trained dual\u0000generators work together to provide the training data for updates of the global\u0000model. At last, our extensive experiments on various image classification tasks\u0000show that DFDG achieves significant performance gains in accuracy compared to\u0000SOTA baselines.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
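The DFDG abstract above mentions training two data-free generators and a cross-divergence loss that keeps their output spaces from overlapping. The sketch below shows one simple way to express such a penalty (rewarding divergence between a classifier's predictions on the two generators' samples); it is an interpretation for illustration only, and the generator architecture, the classifier, and the exact loss form are assumptions rather than the paper's definitions.

# Two noise-to-sample generators plus a toy "cross-divergence"-style penalty
# that pushes their output distributions (as seen by a classifier) apart.
# Architectures and the exact loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, z_dim=64, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())
    def forward(self, z):
        return self.net(z)

def cross_divergence_penalty(classifier, x1, x2):
    """Negative KL between the predictive distributions on the two generators'
    batches: minimizing it encourages the generators to cover different
    regions of the classifier's decision space."""
    log_p1 = F.log_softmax(classifier(x1), dim=1)
    p2 = F.softmax(classifier(x2), dim=1).detach()
    return -F.kl_div(log_p1, p2, reduction="batchmean")

g1, g2 = Generator(), Generator()
classifier = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
penalty = cross_divergence_penalty(classifier, g1(z1), g2(z2))
print(penalty.item())   # would be added to each generator's fidelity/diversity losses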
A Study on Asynchronous Vote-based Blockchains
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.08161
Yibin Xu, Jianhua Shao, Tijs Slaats, Boris Düdder, Yongluan Zhou
{"title":"A Study on Asynchronous Vote-based Blockchains","authors":"Yibin Xu, Jianhua Shao, Tijs Slaats, Boris Düdder, Yongluan Zhou","doi":"arxiv-2409.08161","DOIUrl":"https://doi.org/arxiv-2409.08161","url":null,"abstract":"Vote-based blockchains construct a state machine replication (SMR) system\u0000among participating nodes, using Byzantine Fault Tolerance (BFT) consensus\u0000protocols to transition from one state to another. Currently, they rely on\u0000either synchronous or partially synchronous networks with leader-based\u0000coordination or costly Asynchronous Common Subset (ACS) protocols in\u0000asynchronous settings, making them impractical for large-scale asynchronous\u0000applications. To make Asynchronous SMR scalable, this paper proposes a emph{validated\u0000strong} BFT consensus model that allows leader-based coordination in\u0000asynchronous settings. Our BFT consensus model offers the same level of\u0000tolerance as binary byzantine agreement but does not demand consistency among\u0000honest nodes before they vote. An SMR using our model allows nodes to operate\u0000in different, tentative, but mutually exclusive states until they eventually\u0000converge on the same state. We propose an asynchronous BFT protocol for\u0000vote-based blockchains employing our consensus model to address several\u0000critical challenges: how to ensure that nodes eventually converge on the same\u0000state across voting rounds, how to assure that a blockchain will steadily\u0000progress through epochs while reaching consensus for previous epochs, and how\u0000to maintain robust byzantine fault tolerance. Our protocol greatly reduces message complexity and is the first one to\u0000achieve linear view changes without relying on threshold signatures. We prove\u0000that an asynchronous blockchain built on our protocol can operate with the\u0000emph{same} simplicity and efficiency as partially synchronous blockchains\u0000built on, e.g. HotStuff-2. This facilitates deploying asynchronous blockchains\u0000across large-scale networks.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
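As context for the vote-based SMR setting described above, the sketch below shows the generic BFT quorum rule: with n >= 3f + 1 nodes, a block commits once 2f + 1 distinct nodes have voted for it. It is a textbook illustration, not the paper's validated-strong consensus protocol, which additionally handles tentative states, epochs, and asynchronous view changes.

# Generic BFT quorum counting for a vote-based blockchain: with n >= 3f + 1
# nodes, a block commits once 2f + 1 distinct nodes have voted for it.
# This is a textbook illustration, not the protocol proposed in the paper.
from collections import defaultdict

class VoteCollector:
    def __init__(self, n_nodes, f):
        assert n_nodes >= 3 * f + 1, "BFT safety needs n >= 3f + 1"
        self.quorum = 2 * f + 1
        self.votes = defaultdict(set)          # block_id -> set of voter ids

    def add_vote(self, block_id, voter_id):
        """Record one vote; return True once block_id has reached a quorum."""
        self.votes[block_id].add(voter_id)
        return len(self.votes[block_id]) >= self.quorum

collector = VoteCollector(n_nodes=4, f=1)
print(collector.add_vote("block-42", 0))       # False
print(collector.add_vote("block-42", 1))       # False
print(collector.add_vote("block-42", 2))       # True: 3 = 2f + 1 votes reached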
Cooperative Inference with Interleaved Operator Partitioning for CNNs
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.07693
Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li
{"title":"Cooperative Inference with Interleaved Operator Partitioning for CNNs","authors":"Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li","doi":"arxiv-2409.07693","DOIUrl":"https://doi.org/arxiv-2409.07693","url":null,"abstract":"Deploying deep learning models on Internet of Things (IoT) devices often\u0000faces challenges due to limited memory resources and computing capabilities.\u0000Cooperative inference is an important method for addressing this issue,\u0000requiring the partitioning and distributive deployment of an intelligent model.\u0000To perform horizontal partitions, existing cooperative inference methods take\u0000either the output channel of operators or the height and width of feature maps\u0000as the partition dimensions. In this manner, since the activation of operators\u0000is distributed, they have to be concatenated together before being fed to the\u0000next operator, which incurs the delay for cooperative inference. In this paper,\u0000we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models.\u0000By partitioning an operator based on the output channel dimension and its\u0000successive operator based on the input channel dimension, activation\u0000concatenation becomes unnecessary, thereby reducing the number of communication\u0000connections, which consequently reduces cooperative inference de-lay. Based on\u0000IOP, we further present a model segmentation algorithm for minimizing\u0000cooperative inference time, which greedily selects operators for IOP pairing\u0000based on the inference delay benefit harvested. Experimental results\u0000demonstrate that compared with the state-of-the-art partition approaches used\u0000in CoEdge, the IOP strategy achieves 6.39% ~ 16.83% faster acceleration and\u0000reduces peak memory footprint by 21.22% ~ 49.98% for three classical image\u0000classification models.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
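The IOP idea above can be seen concretely with two successive convolutions: if the first is split along output channels and the second along input channels with the same split, each device's two shards compose directly and only a final sum is needed. The sketch below checks this equivalence numerically; the tensor shapes and the two-way split are assumptions for illustration.

# Interleaved partitioning of two successive convolutions: conv1 is split along
# its OUTPUT channels and conv2 along its INPUT channels using the same split,
# so each device runs its two shards back-to-back and the full result is just
# the sum of the partial outputs, with no activation concatenation in between.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)                   # input feature map
w1 = torch.randn(8, 3, 3, 3)                    # conv1: 3 -> 8 channels
w2 = torch.randn(4, 8, 3, 3)                    # conv2: 8 -> 4 channels

# Monolithic reference: conv1 -> ReLU -> conv2
ref = F.conv2d(F.relu(F.conv2d(x, w1, padding=1)), w2, padding=1)

# Device 0 holds conv1 output channels 0..3 and conv2 input channels 0..3;
# device 1 holds channels 4..7 of both. ReLU is element-wise, so it can be
# applied locally on each shard.
partials = []
for lo, hi in [(0, 4), (4, 8)]:
    a = F.relu(F.conv2d(x, w1[lo:hi], padding=1))           # local conv1 shard
    partials.append(F.conv2d(a, w2[:, lo:hi], padding=1))   # local conv2 partial
out = partials[0] + partials[1]                 # one reduction replaces concatenation

print(torch.allclose(ref, out, atol=1e-4))      # True: partial sums match

The single final addition plays the role of one reduction, which is what lets interleaved partitioning avoid the activation concatenation that channel- or height/width-partitioning schemes require.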
Data Backup System with No Impact on Business Processing Utilizing Storage and Container Technologies
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.07081
Satoru Watanabe
{"title":"Data Backup System with No Impact on Business Processing Utilizing Storage and Container Technologies","authors":"Satoru Watanabe","doi":"arxiv-2409.07081","DOIUrl":"https://doi.org/arxiv-2409.07081","url":null,"abstract":"Data backup is a core technology for improving system resilience to system\u0000failures. Data backup in enterprise systems is required to minimize the impacts\u0000on business processing, which can be categorized into two factors: system\u0000slowdown and downtime. To eliminate system slowdown, asynchronous data copy\u0000(ADC) technology is prevalent, which copies data asynchronously with original\u0000data updates. However, the ADC can collapse backup data when applied to\u0000enterprise systems with multiple resources. Then, the demonstration system\u0000employed consistency group technology, which makes the order of data updates\u0000the same between the original and backup data. In addition, we developed a\u0000container platform operator to unravel the complicated correspondence between\u0000storage volumes and applications. The operator automates the configuration of\u0000the ADC with the setting of consistency groups. We integrated the storage and\u0000container technologies into the demonstration system, which can eliminate both\u0000system slowdown and downtime.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
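The abstract above hinges on consistency groups: the backup side must apply writes to all related volumes in the same order in which they were issued on the primary, even though the asynchronous copy link may deliver them out of order. The sketch below illustrates that ordering rule with a global sequence number; it is a simplification for illustration, not the demonstrated system's storage implementation.

# A consistency group applies writes to its backup volumes only as a contiguous,
# globally ordered prefix, so multi-volume backups never see reordered updates.
# The sequence-number scheme and in-memory "volumes" are illustrative assumptions.
import heapq

class ConsistencyGroup:
    def __init__(self, volume_ids):
        self.backup = {v: {} for v in volume_ids}   # volume -> {block: data}
        self.pending = []                           # min-heap ordered by seqno
        self.next_seq = 0                           # next sequence number to apply

    def enqueue(self, seqno, volume, block, data):
        """Writes may arrive out of order over the asynchronous copy link."""
        heapq.heappush(self.pending, (seqno, volume, block, data))

    def apply_in_order(self):
        """Apply only a contiguous prefix of writes, preserving primary order."""
        while self.pending and self.pending[0][0] == self.next_seq:
            _, volume, block, data = heapq.heappop(self.pending)
            self.backup[volume][block] = data
            self.next_seq += 1

cg = ConsistencyGroup(["db-data", "db-log"])
cg.enqueue(1, "db-data", 7, "row v2")   # arrives before the write with seqno 0
cg.enqueue(0, "db-log", 3, "commit tx")
cg.apply_in_order()
print(cg.backup)                        # both writes applied, in primary order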
FreeRide: Harvesting Bubbles in Pipeline Parallelism
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.06941
Jiashu Zhang, Zihan Pan, Yiming (Molly) Xu, Khuzaima Daudjee, Sihang Liu
{"title":"FreeRide: Harvesting Bubbles in Pipeline Parallelism","authors":"Jiashu ZhangYiming, Zihan PanYiming, MollyYiming, Xu, Khuzaima Daudjee, Sihang Liu","doi":"arxiv-2409.06941","DOIUrl":"https://doi.org/arxiv-2409.06941","url":null,"abstract":"The occurrence of bubbles in pipeline parallelism is an inherent limitation\u0000that can account for more than 40% of the large language model (LLM) training\u0000time and is one of the main reasons for the underutilization of GPU resources\u0000in LLM training. Harvesting these bubbles for GPU side tasks can increase\u0000resource utilization and reduce training costs but comes with challenges.\u0000First, because bubbles are discontinuous with various shapes, programming side\u0000tasks becomes difficult while requiring excessive engineering effort. Second, a\u0000side task can compete with pipeline training for GPU resources and incur\u0000significant overhead. To address these challenges, we propose FreeRide, a\u0000system designed to harvest bubbles in pipeline parallelism for side tasks.\u0000FreeRide provides programmers with interfaces to implement side tasks easily,\u0000manages bubbles and side tasks during pipeline training, and controls access to\u0000GPU resources by side tasks to reduce overhead. We demonstrate that FreeRide\u0000achieves 7.8% average cost savings with a negligible overhead of about 1% in\u0000training LLMs while serving model training, graph analytics, and image\u0000processing side tasks.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
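The FreeRide abstract above is about running side tasks only inside pipeline bubbles and yielding the GPU back when a stage resumes. The sketch below shows the scheduling idea at its simplest: an interruptible side task that is driven only while a bubble's time budget lasts. The generator-based interface, the durations, and the CPU sleep standing in for GPU work are all assumptions, not FreeRide's actual API.

# Conceptual sketch: drive a side task during pipeline "bubbles" (idle GPU
# intervals) and stop as soon as the bubble's time budget is spent.
import time

def side_task():
    """An interruptible side task expressed as a generator: each step is a
    small unit of work, so it can be paused at bubble boundaries."""
    step = 0
    while True:
        time.sleep(0.001)     # stand-in for a small unit of GPU work
        step += 1
        yield step

def run_during_bubble(task, bubble_seconds):
    """Advance the side task until the bubble's time budget runs out."""
    deadline = time.monotonic() + bubble_seconds
    steps = 0
    while time.monotonic() < deadline:
        steps = next(task)
    return steps

task = side_task()
for bubble in (0.02, 0.05, 0.01):       # bubble lengths observed per iteration
    done = run_during_bubble(task, bubble)
    # ... the pipeline stage resumes here with full GPU priority ...
print(f"side task progressed to step {done}")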
HERL: Tiered Federated Learning with Adaptive Homomorphic Encryption using Reinforcement Learning
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.07631
Jiaxang Tang, Zeshan Fayyaz, Mohammad A. Salahuddin, Raouf Boutaba, Zhi-Li Zhang, Ali Anwar
{"title":"HERL: Tiered Federated Learning with Adaptive Homomorphic Encryption using Reinforcement Learning","authors":"Jiaxang Tang, Zeshan Fayyaz, Mohammad A. Salahuddin, Raouf Boutaba, Zhi-Li Zhang, Ali Anwar","doi":"arxiv-2409.07631","DOIUrl":"https://doi.org/arxiv-2409.07631","url":null,"abstract":"Federated Learning is a well-researched approach for collaboratively training\u0000machine learning models across decentralized data while preserving privacy.\u0000However, integrating Homomorphic Encryption to ensure data confidentiality\u0000introduces significant computational and communication overheads, particularly\u0000in heterogeneous environments where clients have varying computational\u0000capacities and security needs. In this paper, we propose HERL, a Reinforcement\u0000Learning-based approach that uses Q-Learning to dynamically optimize encryption\u0000parameters, specifically the polynomial modulus degree, $N$, and the\u0000coefficient modulus, $q$, across different client tiers. Our proposed method\u0000involves first profiling and tiering clients according to the chosen clustering\u0000approach, followed by dynamically selecting the most suitable encryption\u0000parameters using an RL-agent. Experimental results demonstrate that our\u0000approach significantly reduces the computational overhead while maintaining\u0000utility and a high level of security. Empirical results show that HERL improves\u0000utility by 17%, reduces the convergence time by up to 24%, and increases\u0000convergence efficiency by up to 30%, with minimal security loss.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
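HERL's core loop, as described above, is an RL agent picking encryption parameters ($N$, $q$) per client tier. The sketch below is a toy tabular, bandit-style version of that selection loop; the parameter grid, the tiering, the reward function, and all hyperparameters are invented for illustration and do not reflect the paper's state or reward design.

# Toy Q-learning-style selection of homomorphic-encryption parameters per
# client tier. Action grid, reward model, and hyperparameters are assumptions.
import random
from collections import defaultdict

ACTIONS = [(4096, 109), (8192, 218), (16384, 438)]   # (poly modulus degree N, total q bits)
TIERS = ["low", "mid", "high"]                        # client capability tiers

q_table = defaultdict(float)                          # (tier, action index) -> value
alpha, eps = 0.1, 0.2                                 # learning rate, exploration rate

def reward(tier, action):
    """Toy reward: larger parameters raise security/utility but cost more
    compute, and the cost hits weaker tiers harder."""
    n, q_bits = action
    cost = {"low": 2.0, "mid": 1.0, "high": 0.5}[tier] * (n / 4096)
    return q_bits / 109 - cost

for _ in range(2000):                                 # one-step (bandit-style) episodes
    tier = random.choice(TIERS)
    if random.random() < eps:
        a = random.randrange(len(ACTIONS))            # explore
    else:
        a = max(range(len(ACTIONS)), key=lambda i: q_table[(tier, i)])  # exploit
    q_table[(tier, a)] += alpha * (reward(tier, ACTIONS[a]) - q_table[(tier, a)])

for tier in TIERS:
    best = max(range(len(ACTIONS)), key=lambda i: q_table[(tier, i)])
    print(tier, "->", ACTIONS[best])                  # weaker tiers settle on smaller parameters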
Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.07232
Chayanon Wichitrnithed, Woo-Sun Yang, Yun (Helen) He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste
{"title":"Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee","authors":"ChayanonNamo, WichitrnithedHelen, Woo-Sun-YangHelen, YunHelen, He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste","doi":"arxiv-2409.07232","DOIUrl":"https://doi.org/arxiv-2409.07232","url":null,"abstract":"Currently, the Weather Research and Forecasting model (WRF) utilizes shared\u0000memory (OpenMP) and distributed memory (MPI) parallelisms. To take advantage of\u0000GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the\u0000computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM)\u0000microphysical scheme to NVIDIA GPUs using OpenMP device offloading directives.\u0000To facilitate this process, we explore a workflow for optimization which uses\u0000both runtime profilers and a static code inspection tool Codee to refactor the\u0000subroutine. We observe a 2.08x overall speedup for the CONUS-12km thunderstorm\u0000test case.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Distributed Convolutional Neural Network Training on Mobile and Edge Clusters
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.09083
Pranav Rama, Madison Threadgill, Andreas Gerstlauer
{"title":"Distributed Convolutional Neural Network Training on Mobile and Edge Clusters","authors":"Pranav Rama, Madison Threadgill, Andreas Gerstlauer","doi":"arxiv-2409.09083","DOIUrl":"https://doi.org/arxiv-2409.09083","url":null,"abstract":"The training of deep and/or convolutional neural networks (DNNs/CNNs) is\u0000traditionally done on servers with powerful CPUs and GPUs. Recent efforts have\u0000emerged to localize machine learning tasks fully on the edge. This brings\u0000advantages in reduced latency and increased privacy, but necessitates working\u0000with resource-constrained devices. Approaches for inference and training in\u0000mobile and edge devices based on pruning, quantization or incremental and\u0000transfer learning require trading off accuracy. Several works have explored\u0000distributing inference operations on mobile and edge clusters instead. However,\u0000there is limited literature on distributed training on the edge. Existing\u0000approaches all require a central, potentially powerful edge or cloud server for\u0000coordination or offloading. In this paper, we describe an approach for\u0000distributed CNN training exclusively on mobile and edge devices. Our approach\u0000is beneficial for the initial CNN layers that are feature map dominated. It is\u0000based on partitioning forward inference and back-propagation operations among\u0000devices through tiling and fusing to maximize locality and expose communication\u0000and memory-aware parallelism. We also introduce the concept of layer grouping\u0000to further fine-tune performance based on computation and communication\u0000trade-off. Results show that for a cluster of 2-6 quad-core Raspberry Pi3\u0000devices, training of an object-detection CNN provides a 2x-15x speedup with\u0000respect to a single core and up to 8x reduction in memory usage per device, all\u0000without sacrificing accuracy. Grouping offers up to 1.5x speedup depending on\u0000the reference profile and batch size.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
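The approach above partitions the feature-map-dominated early CNN layers spatially across devices, with tiling and fusing to preserve locality. The sketch below shows the basic tiling step for a single 3x3 convolution split height-wise across two devices with a one-row halo; the shapes, the device count, and the halo handling are illustrative assumptions rather than the paper's partitioning scheme.

# Height-wise tiling of a convolution across two devices: each device gets its
# tile of the input plus a one-row halo from its neighbor, convolves locally,
# and the stitched outputs equal the monolithic result. Fusing several early
# layers per tile (as described above) simply requires a deeper halo.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)                   # input feature map
w = torch.randn(16, 3, 3, 3)                    # a 3x3 convolution, padding=1
ref = F.conv2d(x, w, padding=1)                 # monolithic reference

# Device 0: rows 0..15 plus one halo row below; zero-pad left/right/top only.
top = F.conv2d(F.pad(x[:, :, 0:17], (1, 1, 1, 0)), w)
# Device 1: rows 16..31 plus one halo row above; zero-pad left/right/bottom only.
bottom = F.conv2d(F.pad(x[:, :, 15:32], (1, 1, 0, 1)), w)

out = torch.cat([top, bottom], dim=2)           # stitch the two output tiles
print(out.shape, torch.allclose(ref, out, atol=1e-5))   # (1, 16, 32, 32) True

Each device only ever exchanges the thin halo rows with its neighbor, which keeps communication proportional to tile boundaries rather than to whole feature maps.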
D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-10 DOI: arxiv-2409.09079
Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu
{"title":"D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks","authors":"Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu","doi":"arxiv-2409.09079","DOIUrl":"https://doi.org/arxiv-2409.09079","url":null,"abstract":"Graph Neural Network (GNN) models on streaming graphs entail algorithmic\u0000challenges to continuously capture its dynamic state, as well as systems\u0000challenges to optimize latency, memory, and throughput during both inference\u0000and training. We present D3-GNN, the first distributed, hybrid-parallel,\u0000streaming GNN system designed to handle real-time graph updates under online\u0000query setting. Our system addresses data management, algorithmic, and systems\u0000challenges, enabling continuous capturing of the dynamic state of the graph and\u0000updating node representations with fault-tolerance and optimal latency,\u0000load-balance, and throughput. D3-GNN utilizes streaming GNN aggregators and an\u0000unrolled, distributed computation graph architecture to handle cascading graph\u0000updates. To counteract data skew and neighborhood explosion issues, we\u0000introduce inter-layer and intra-layer windowed forward pass solutions.\u0000Experiments on large-scale graph streams demonstrate that D3-GNN achieves high\u0000efficiency and scalability. Compared to DGL, D3-GNN achieves a significant\u0000throughput improvement of about 76x for streaming workloads. The windowed\u0000enhancement further reduces running times by around 10x and message volumes by\u0000up to 15x at higher parallelism.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
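D3-GNN, as summarized above, keeps node representations up to date as graph updates stream in. The sketch below shows the smallest version of that idea: an incremental mean aggregator that updates only the affected node's running sum when an edge arrives. The distributed dataflow, windowing, and fault-tolerance machinery of D3-GNN are not modeled, and all names here are illustrative.

# Incremental (streaming) neighborhood aggregation: when an edge u -> v arrives,
# v's mean-of-neighbors embedding is refreshed from a running sum and count
# instead of being recomputed over all of v's neighbors.
import torch

class StreamingMeanAggregator:
    def __init__(self, num_nodes, dim):
        self.feat = torch.randn(num_nodes, dim)     # node features (assumed static here)
        self.sum = torch.zeros(num_nodes, dim)      # running neighbor-feature sums
        self.count = torch.zeros(num_nodes, 1)      # running neighbor counts

    def add_edge(self, u, v):
        """Process a streamed edge u -> v: only v's aggregate changes."""
        self.sum[v] += self.feat[u]
        self.count[v] += 1

    def representation(self, v):
        """Mean aggregation; a full GNN layer would follow with a linear + nonlinearity."""
        if self.count[v] == 0:
            return self.feat[v]
        return self.sum[v] / self.count[v]

agg = StreamingMeanAggregator(num_nodes=5, dim=8)
for u, v in [(0, 3), (1, 3), (4, 3)]:       # edges arriving on the stream
    agg.add_edge(u, v)
print(agg.representation(3).shape)          # torch.Size([8])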