IEEE Transactions on Parallel and Distributed Systems: Latest Articles

Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-25 DOI: 10.1109/TPDS.2024.3506625
Pengwei Wang;Junye Qiao;Yuying Zhao;Zhijun Ding
Abstract: Edge storage offers low-latency services to users. However, due to strained edge resources and high costs, enterprises must choose the data that most warrant placement at the edge and place it in the right location. In practice, data exhibit temporal and spatial properties, and variability, which have a significant impact on their placement, but have been largely ignored in research. To address this, we introduce the concept of data temperature, which considers data characteristics over time and space. To consider the influence of spatial relevance among different regions for placing data, inspired by PageRank, we present a model using data temperature to assess the regional value of data, which effectively leverages collaboration within the edge storage system. We also propose a regional value-based algorithm (RVA) that minimizes cost while meeting user response time requirements. By taking into account the correlation between regions, the RVA can achieve lower latency than current methods when creating an equal or even smaller number of replicas. Experimental results validate the efficacy of the proposed method in terms of latency, success rate, and cost efficiency.
Volume 36, Issue 2, pp. 185-196. Journal Article.
Citations: 0
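Illustrative sketch (not the authors' model): the abstract above says only that regional value is "inspired by PageRank" and built on data temperature. One plausible reading, assumed here, is a personalized-PageRank iteration in which each region's local temperature acts as the restart distribution and value flows along links between neighboring regions. The function name `regional_values`, the inputs, and the damping factor 0.85 are all hypothetical.

```python
def regional_values(temperature, neighbors, damping=0.85, iters=200, tol=1e-12):
    """Personalized-PageRank-style scores over regions (hypothetical model).

    temperature: dict region -> non-negative local data temperature (assumed input)
    neighbors:   dict region -> list of adjacent regions (assumed input)
    Returns a dict region -> score; the scores sum to 1.
    """
    regions = list(temperature)
    total = sum(temperature.values())
    if total <= 0:
        raise ValueError("at least one region needs a positive temperature")
    # Normalized temperature is both the starting point and the restart vector.
    base = {r: temperature[r] / total for r in regions}
    value = dict(base)
    for _ in range(iters):
        nxt = {r: (1.0 - damping) * base[r] for r in regions}
        for r in regions:
            outs = neighbors.get(r, [])
            if outs:
                # A region passes its damped value equally to its neighbors.
                share = damping * value[r] / len(outs)
                for n in outs:
                    nxt[n] += share
            else:
                # A region without neighbors returns its value via the restart vector.
                for n in regions:
                    nxt[n] += damping * value[r] * base[n]
        delta = sum(abs(nxt[r] - value[r]) for r in regions)
        value = nxt
        if delta < tol:
            break
    return value


# Three regions in a line: B borders both A and C, so it collects value from both.
scores = regional_values(
    {"A": 1.0, "B": 1.0, "C": 1.0},
    {"A": ["B"], "B": ["A", "C"], "C": ["B"]},
)
print(scores)
```

With equal temperatures, the middle region scores highest purely because of its position, which is the kind of spatial correlation the abstract refers to.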
DegaFL: Decentralized Gradient Aggregation for Cross-Silo Federated Learning
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-18 DOI: 10.1109/TPDS.2024.3501581
Jialiang Han;Yudong Han;Xiang Jing;Gang Huang;Yun Ma
Abstract: Federated learning (FL) is an emerging promising paradigm of privacy-preserving machine learning (ML). An important type of FL is cross-silo FL, which enables a moderate number of organizations to cooperatively train a shared model by keeping confidential data locally and aggregating gradients on a central parameter server. However, the central server may be vulnerable to malicious attacks or software failures in practice. To address this issue, in this paper, we propose DegaFL, a novel decentralized gradient aggregation approach for cross-silo FL. DegaFL eliminates the central server by aggregating gradients on each participant, and maintains and synchronizes gradients of only the current training round. Besides, we propose AdaAgg to adaptively aggregate correct gradients from honest nodes and use HotStuff to ensure the consistency of the training round number and gradients among all nodes. Experimental results show that DegaFL defends against common threat models with minimal accuracy loss, and achieves up to 50× reduction in storage overhead and up to 13× reduction in network overhead, compared to state-of-the-art decentralized FL approaches.
Volume 36, Issue 2, pp. 212-225. Journal Article.
Citations: 0
Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-18 DOI: 10.1109/TPDS.2024.3501292
Shuai Lin;Rui Wang;Yongkun Li;Yinlong Xu;John C. S. Lui
Abstract: Distributed graph analysis usually partitions a large graph into multiple small-sized subgraphs and distributes them into a cluster of machines for computing. Therefore, graph partitioning plays a crucial role in distributed graph analysis. However, the widely used existing graph partitioning schemes balance only in one dimension (number of edges or vertices) or incur a large number of edge cuts, so they degrade the performance of distributed graph analysis. In this article, we propose a novel graph partition scheme BPart and two enhanced algorithms BPart-C and BPart-S to achieve a balanced partition for both vertices and edges, and also reduce the number of edge cuts. Besides, we also propose a neighbor-aware caching scheme to further reduce the number of edge cuts so as to improve the efficiency of distributed graph analysis. Our experimental results show that BPart-C and BPart-S can achieve a better balance in both dimensions (the number of vertices and edges), and meanwhile reducing the number of edge cuts, compared to multiple existing graph partitioning algorithms, i.e., Chunk-V, Chunk-E, Fennel, and Hash. We also integrate these partitioning algorithms into two popular distributed graph systems, KnightKing and Gemini, to validate their impact on graph analysis efficiency. Results show that both BPart-C and BPart-S can significantly reduce the total running time of various graph applications by up to 60% and 70%, respectively. In addition, the neighbor-aware caching scheme can further improve the performance by up to 24%.
Volume 36, Issue 2, pp. 133-149. Journal Article.
Citations: 0
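Illustrative sketch (not BPart itself): the abstract above judges a partition on three quantities, namely vertex balance, edge balance, and the number of edge cuts. The snippet below only measures those quantities for a given assignment; it does not implement any of the partitioning algorithms named in the abstract. The function name, the convention of charging each edge to its source vertex's partition, and the imbalance definition (largest load divided by average load) are assumptions made for the example.

```python
from collections import Counter


def partition_stats(edges, part_of, num_parts):
    """Measure a vertex-to-partition assignment on a directed edge list.

    edges:     list of (src, dst) pairs
    part_of:   dict vertex -> partition id in range(num_parts)
    Imbalance is max load / average load, so 1.0 means perfectly balanced.
    """
    vertex_load = Counter(part_of.values())
    # Assumption: an edge is stored with (and counted for) its source vertex.
    edge_load = Counter(part_of[u] for u, _ in edges)
    cuts = sum(1 for u, v in edges if part_of[u] != part_of[v])

    def imbalance(load, total):
        return max(load.get(p, 0) for p in range(num_parts)) / (total / num_parts)

    return {
        "vertex_imbalance": imbalance(vertex_load, len(part_of)),
        "edge_imbalance": imbalance(edge_load, len(edges)),
        "edge_cuts": cuts,
    }


# Vertex 0 is a hub. Splitting vertices evenly (3 and 3) leaves edges uneven (3 and 1).
edges = [(0, 1), (0, 2), (0, 3), (4, 5)]
part_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(partition_stats(edges, part_of, num_parts=2))
# {'vertex_imbalance': 1.0, 'edge_imbalance': 1.5, 'edge_cuts': 1}
```

The example shows why balancing one dimension is not enough on skewed graphs: a perfectly vertex-balanced split can still put most of the edges on one machine.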
Spreeze: High-Throughput Parallel Reinforcement Learning Framework
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-14 DOI: 10.1109/TPDS.2024.3497986
Jing Hou;Guang Chen;Ruiqi Zhang;Zhijun Li;Shangding Gu;Changjun Jiang
Abstract: The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively burdensome communication frameworks hinder the attainment of the hardware's limit for final throughput and training effects on a single desktop. In this article, we propose Spreeze, a lightweight parallel framework for RL that efficiently utilizes a single desktop hardware resource to approach the throughput limit. We asynchronously parallelize the experience sampling, network update, performance evaluation, and visualization operations, and employ multiple efficient data transmission techniques to transfer various types of data between processes. The framework can automatically adjust the parallelization hyperparameters based on the computing ability of the hardware device in order to perform efficient large-batch updates. Based on the characteristics of the “Actor-Critic” RL algorithm, our framework uses dual GPUs to independently update the network of actors and critics in order to further improve throughput. Simulation results show that our framework can achieve up to 15,000 Hz experience sampling and 370,000 Hz network update frame rate using only a personal desktop computer, which is an order of magnitude higher than other mainstream parallel RL frameworks, resulting in a 73% reduction of training time. Our work on fully utilizing the hardware resources of a single desktop computer is fundamental to enabling efficient large-scale distributed RL training.
Volume 36, Issue 2, pp. 282-292. Journal Article.
Citations: 0
The Impact of Service Demand Variability on Data Center Performance
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-14 DOI: 10.1109/TPDS.2024.3497792
Diletta Olliaro;Adityo Anggraito;Marco Ajmone Marsan;Simonetta Balsamo;Andrea Marin
Abstract: Modern data centers feature an extensive array of cores that handle quite a diverse range of jobs. Recent traces, shared by leading cloud data center enterprises like Google and Alibaba, reveal that the constant increase in data center services and computational power is accompanied by a growing variability in service demand requirements. The number of cores needed for a job can vary widely, ranging from one to several thousands, and the number of seconds a core is held by a job can span more than five orders of magnitude. In this context of extreme variability, the policies governing the allocation of cores to jobs play a crucial role in the performance of data centers. It is widely acknowledged that the First-In First-Out (FIFO) policy tends to underutilize available computing capacity due to the varying magnitudes of core requests. However, the impact of the extreme variability in service demands on job waiting and response times, that has been deeply investigated in traditional queuing models, is not as well understood in the case of data centers, as we will show. To address this issue, we investigate the dynamics of a data center cluster through analytical models in simple cases, and discrete event simulations based on real data. Our findings emphasize the significant impact of service demand variability, both in terms of requested cores and service times, and allow us to provide insight for enhancing data center performance. In particular, we show how data center performance can be improved thanks to the control of the interplay between service and waiting times through the assignment of cores to jobs.
Volume 36, Issue 2, pp. 120-132. Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10753043
Citations: 0
Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-08 DOI: 10.1109/TPDS.2024.3494879
Haoyu Liao;Tong-yu Liu;Jianmei Guo;Bo Huang;Dingyu Yang;Jonathan Ding
Abstract: The article focuses on an understudied yet fundamental problem: existing methods typically average the utilization of multiple hardware threads to evaluate the available CPU resources. However, the approach could underestimate the actual usage of the underlying physical core for Simultaneous Multi-Threading (SMT) processors, leading to an overestimation of remaining resources. The overestimation propagates from microarchitecture to operating systems and cloud schedulers, which may misguide scheduling decisions, exacerbate CPU overcommitment, and increase Service Level Agreement (SLA) violations. To address the potential overestimation problem, we propose an SMT-aware and purely data-driven approach named Remaining CPU (RCPU) that reserves more CPU resources to restrict CPU overcommitment and prevent SLA violations. RCPU requires only a few modifications to the existing cloud infrastructures and can be scaled up to large data centers. Extensive evaluations in the data center proved that RCPU contributes to a reduction of SLA violations by 18% on average for 98% of all latency-sensitive applications. Under a benchmarking experiment, we prove that RCPU increases the accuracy by 69% in terms of Mean Absolute Error (MAE) compared to the state-of-the-art.
Volume 36, Issue 1, pp. 67-83. Journal Article.
Citations: 0
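Illustrative sketch (not RCPU): the abstract above argues that averaging the utilization of SMT sibling threads can understate how busy the shared physical core is. The toy below makes that concrete under a deliberately simple assumption: the core counts as busy in any time slot in which at least one of its two hardware threads is busy. The sampled busy flags and both function names are hypothetical, and RCPU's actual data-driven estimator is not described in the abstract.

```python
def thread_average(busy_a, busy_b):
    """Mean utilization across two sibling hardware threads (the common estimate)."""
    return (sum(busy_a) + sum(busy_b)) / (2 * len(busy_a))


def core_busy_fraction(busy_a, busy_b):
    """Fraction of time slots in which the shared physical core is doing any work."""
    return sum(1 for a, b in zip(busy_a, busy_b) if a or b) / len(busy_a)


# Eight sampled time slots per thread; 1 means the thread was running in that slot.
thread_a = [1, 1, 1, 1, 0, 0, 0, 0]
thread_b = [0, 0, 0, 0, 1, 1, 0, 0]

print(thread_average(thread_a, thread_b))      # 0.375
print(core_busy_fraction(thread_a, thread_b))  # 0.75
```

Here the averaged figure suggests 62.5% of the core is free, while the core is idle in only 25% of the slots. The gap is largest when the threads' busy periods do not overlap.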
Balanced Splitting: A Framework for Achieving Zero-Wait in the Multiserver-Job Model
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-07 DOI: 10.1109/TPDS.2024.3493631
Jonatha Anselmi;Josu Doncel
Abstract: We present a new framework for designing nonpreemptive and job-size oblivious scheduling policies in the multiserver-job queueing model. The main requirement is to identify a static and balanced sub-partition of the server set and ensure that the servers in each set of that sub-partition can only handle jobs of a given class and in a first-come first-served order. A job class is determined by the number of servers to which it has exclusive access during its entire execution and the probability distribution of its service time. This approach aims to reduce delays by preventing small jobs from being blocked by larger ones that arrived first, and it is particularly beneficial when the job size variability intra resp. inter classes is small resp. large. In this setting, we propose a new scheduling policy, Balanced-Splitting. In our main results, we provide a sufficient condition for the stability of Balanced-Splitting and show that the resulting queueing probability, i.e., the probability that an arriving job needs to wait for processing upon arrival, vanishes in both the subcritical (the load is kept fixed to a constant less than one) and critical (the load approaches one from below) many-server limiting regimes. Crucial to our analysis is a connection with the M/GI/s/s queue and Erlang’s loss formula, which allows our analysis to rely on fundamental results from queueing theory. Numerical simulations show that the proposed policy performs better than several preemptive/nonpreemptive size-aware/oblivious policies in various practical scenarios. This is also confirmed by simulations running on real traces from High Performance Computing (HPC) workloads. The delays induced by Balanced-Splitting are also competitive with those induced by state-of-the-art policies such as First-Fit-SRPT and ServerFilling-SRPT, though our approach has the advantage of not requiring preemption, nor the knowledge of job sizes.
Volume 36, Issue 1, pp. 43-54. Journal Article.
Citations: 0
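Illustrative sketch (the classical formula only, not the paper's analysis): the abstract above connects its results to the M/GI/s/s queue and Erlang's loss formula. For s servers and offered load a (arrival rate times mean service time), the blocking probability is B(s, a) = (a^s / s!) / sum_{k=0..s} a^k / k!, and in the M/GI/s/s queue it depends on the service-time distribution only through its mean. The snippet evaluates it with the standard numerically stable recurrence; the function name and the example loads are choices made for this illustration.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Erlang loss (Erlang B) probability for `servers` servers at `offered_load` erlangs.

    Uses the recurrence B(0) = 1, B(k) = a * B(k-1) / (k + a * B(k-1)),
    which avoids computing large factorials and powers directly.
    """
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


# Keep the per-server load fixed at 0.8 and grow the system.
for s in (10, 100, 1000):
    print(s, erlang_b(s, 0.8 * s))
```

Running the loop shows the blocking probability shrinking as the number of servers grows at a fixed load below one, which is the many-server behavior that the abstract's subcritical regime refers to.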
Edge Data Deduplication Under Uncertainties: A Robust Optimization Approach
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-07 DOI: 10.1109/TPDS.2024.3493959
Ruikun Luo;Qiang He;Mengxi Xu;Feifei Chen;Song Wu;Jing Yang;Yuan Gao;Hai Jin
Abstract: The emergence of mobile edge computing (MEC) in distributed systems has sparked increased attention toward edge data management. A conflict arises from the disparity between limited edge resources and the continuously expanding data requests for data storage, making the reduction of data storage costs a critical objective. Despite the extensive studies of edge data deduplication as a data reduction technique, existing deduplication methods encounter numerous challenges in MEC environments. These challenges stem from disparities between edge servers and cloud data center edge servers, as well as uncertainties such as user mobility, leading to insufficient robustness in deduplication decision-making. Consequently, this paper presents a robust optimization-based approach for the edge data deduplication problem. By accounting for uncertainties including the number of data requirements and edge server failures, we propose two distinct solving algorithms: uEDDE-C, a two-stage algorithm based on column-and-constraint generation, and uEDDE-A, an approximation algorithm to address the high computation overhead of uEDDE-C. Our method facilitates efficient data deduplication in volatile edge network environments and maintains robustness across various uncertain scenarios. We validate the performance and robustness of uEDDE-C and uEDDE-A through theoretical analysis and experimental evaluations. The extensive experimental results demonstrate that our approach significantly reduces data storage cost and data retrieval latency while ensuring reliability in real-world MEC environments.
Volume 36, Issue 1, pp. 84-95. Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10747105
Citations: 0
Ripple: Enabling Decentralized Data Deduplication at the Edge
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-07 DOI: 10.1109/TPDS.2024.3493953
Ruikun Luo;Qiang He;Feifei Chen;Song Wu;Hai Jin;Yun Yang
Abstract: With its advantages in ensuring low data retrieval latency and reducing backhaul network traffic, edge computing is becoming a backbone solution for many latency-sensitive applications. An increasingly large number of data is being generated at the edge, stretching the limited capacity of edge storage systems. Improving resource utilization for edge storage systems has become a significant challenge in recent years. Existing solutions attempt to achieve this goal through data placement optimization, data partitioning, data sharing, etc. These approaches overlook the data redundancy in edge storage systems, which produces substantial storage resource wastage. This motivates the need for an approach for data deduplication at the edge. However, existing data deduplication methods rely on centralized control, which is not always feasible in practical edge computing environments. This article presents Ripple, the first approach that enables edge servers to deduplicate their data in a decentralized manner. At its core, it builds a data index for each edge server, enabling them to deduplicate data without central control. With Ripple, edge servers can 1) identify data duplicates; 2) remove redundant data without violating data retrieval latency constraints; and 3) ensure data availability after deduplication. The results of trace-driven experiments conducted in a testbed system demonstrate the usefulness of Ripple in practice. Compared with the state-of-the-art approach, Ripple improves the deduplication ratio by up to 16.79% and reduces data retrieval latency by an average of 60.42%.
Volume 36, Issue 1, pp. 55-66. Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10747114
Citations: 0
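Illustrative sketch (not Ripple's index): the abstract above lists identifying duplicates as the first capability a deduplicating edge server needs. A common way to do that, assumed here, is to index data items by a content hash so that identical payloads map to the same key regardless of name or location. For brevity the snippet builds one index over all servers in a single process, whereas Ripple is described as keeping an index per server without central control. All names and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_index(servers):
    """Map content digest -> list of (server, item name) holding that content.

    servers: dict server name -> dict item name -> payload bytes
    """
    index = defaultdict(list)
    for server, items in servers.items():
        for name, payload in items.items():
            digest = hashlib.sha256(payload).hexdigest()
            index[digest].append((server, name))
    return index


def duplicates(index):
    """Keep only digests stored in more than one place."""
    return {d: sorted(locs) for d, locs in index.items() if len(locs) > 1}


servers = {
    "edge-1": {"clip.mp4": b"video-bytes", "tiles.bin": b"map-bytes"},
    "edge-2": {"promo.mp4": b"video-bytes"},
}
for digest, locations in duplicates(build_index(servers)).items():
    print(digest[:12], locations)
```

Finding duplicates is only the first step. Deciding which copy to drop while still meeting retrieval latency and availability requirements is the harder part, and the abstract does not give enough detail to sketch it.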
EdgeHydra: Fault-Tolerant Edge Data Distribution Based on Erasure Coding
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-11-07 DOI: 10.1109/TPDS.2024.3493034
Qiang He;Guobiao Zhang;Jiawei Wang;Ruikun Luo;Xiaohai Dai;Yuchong Hu;Feifei Chen;Hai Jin;Yun Yang
Abstract: In the edge computing environment, app vendors can distribute popular data from the cloud to edge servers to provide low-latency data retrieval. A key problem is how to distribute these data from the cloud to edge servers cost-effectively. Under current schemes, a file is divided into some data blocks for parallel transmissions from the cloud to target edge servers. Edge servers can then combine received data blocks to reconstruct the file. While this method expedites the data distribution process, it presents potential drawbacks. It is sensitive to transmission delays and transmission failures caused by runtime exceptions like network fluctuations and server failures. This paper presents EdgeHydra, the first edge data distribution scheme that tackles this challenge through fault tolerance based on erasure coding. Under EdgeHydra, a file is encoded into data blocks and parity blocks for parallel transmission from the cloud to target edge servers. An edge server can reconstruct the file upon the receipt of a sufficient number of these blocks without having to wait for all the blocks in transmission. It also innovatively employs a leaderless block supplement mechanism to ensure the receipt of sufficient blocks for individual target edge servers. These improve the robustness of the data distribution process significantly. Extensive experiments show that EdgeHydra can tolerate delays and failures in individual transmission links effectively, outperforming the state-of-the-art scheme by up to 50.54% in distribution time.
Volume 36, Issue 1, pp. 29-42. Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10746622
Citations: 0
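Illustrative sketch (not EdgeHydra's code): the abstract above describes encoding a file into data blocks plus parity blocks so that a receiver can rebuild the file from a sufficient subset. The simplest instance of that idea, used here as a stand-in, is k data blocks with a single XOR parity block, which tolerates the loss or delay of any one block. The abstract does not say which erasure code EdgeHydra uses, and production systems commonly use codes with several parity blocks. Function names and the sample payload are hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k zero-padded blocks and append one XOR parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    parity = bytes(block_len)
    for block in blocks:
        parity = xor_blocks(parity, block)
    return blocks + [parity]


def decode(received: dict, k: int, original_len: int) -> bytes:
    """Rebuild the data from any k of the k+1 blocks.

    received maps block index -> bytes; indices 0..k-1 are data, index k is parity.
    """
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("need at least k of the k+1 blocks")
    blocks = dict(received)
    if missing:
        # XOR of the parity with the surviving data blocks yields the lost block.
        rebuilt = bytes(len(received[k]))
        for block in received.values():
            rebuilt = xor_blocks(rebuilt, block)
        blocks[missing[0]] = rebuilt
    return b"".join(blocks[i] for i in range(k))[:original_len]


payload = b"edge data distribution"
coded = encode(payload, k=4)
# Pretend block 2 is still in flight; the other four blocks are enough.
arrived = {i: block for i, block in enumerate(coded) if i != 2}
print(decode(arrived, k=4, original_len=len(payload)) == payload)
```

The receiver finishes as soon as any four of the five blocks arrive, so one slow or failed link no longer delays the whole transfer.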