Performance & Reliability Oriented Combined File, Capacity Allocation on Distributed Systems

Proceeding of 13th IEEE Annual International Phoenix Conference on Computers and Communications

Pub Date: 1994-04-12

DOI: 10.1109/PCCC.1994.504127

A. Kumar, S. Ahuja


Citations: 3

Abstract

Distributed Computing Systems (DCSs) have the potential for high reliability. When the topology of a DCS is fixed, its reliability depends mainly on the allocation of various resources. Files are among the important resources to be allocated on a DCS. Another significant problem is the allocation of capacities to the links of a DCS such that the performance (throughput) is maximized and a cost constraint is satisfied. In this paper, we develop a performance-based, reliability-oriented file and capacity allocation scheme for distributed systems. In this scheme, capacities are allocated to the links of the DCS and files are allocated to its nodes such that the overall effectiveness of executing a program that requires files from remote node(s) is maximized. A Genetic Algorithm (GA) based approach is used to solve this problem. The effectiveness of the GA-based approach is demonstrated by comparing its results with those obtained by exhaustive search of the problem state space. The paper also studies the impact of the Probability of Crossover (Pc) and the Probability of Mutation (Pm) on the results obtained using the GA.

1. The File and Capacity Allocation Problem

The reliability of executing various applications (programs) in a Distributed Computing System (DCS) depends on the topology of the DCS, the program locations, and the allocation of files on the DCS [1]. System reliability can also be improved by introducing redundancy into the system [2,3]. If the topology of a DCS is fixed, then the overall reliability of executing various applications depends mainly on the file allocation on the DCS [2]. If there are n processing nodes and m data files in the DCS, then the total number of possible assignments is n^m. Thus the optimal allocation of files on the processing nodes is a problem of exponential complexity [2].
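The growth of the search space can be illustrated with a short sketch. It assumes each file is stored on exactly one node (no replication), which is the reading under which n nodes and m files give n^m assignments. The function names and the toy objective in the usage note are hypothetical and are not taken from the paper.

```python
from itertools import product


def count_assignments(n_nodes, m_files):
    """Number of ways to place m files on n nodes, one node per file."""
    return n_nodes ** m_files


def enumerate_assignments(n_nodes, m_files):
    """Yield every assignment as a tuple; entry i is the node holding file i."""
    return product(range(n_nodes), repeat=m_files)


def exhaustive_best(n_nodes, m_files, objective):
    """Return the assignment that maximizes a caller-supplied objective.

    The objective is a placeholder for whatever cost function the
    problem defines (reliability, throughput, response time, ...).
    """
    return max(enumerate_assignments(n_nodes, m_files), key=objective)
```

For n = 3 nodes and m = 4 files this enumerates 3^4 = 81 assignments; each additional file multiplies the count by n, which is why exhaustive search is only practical as a baseline on small instances.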
The file allocation problem is formulated in terms of cost factors and constraints, the objective being to allocate files such that the cost factor(s) are minimized or maximized and all the constraints are satisfied [2]. The cost factor(s) are represented by a cost function. The nature of the cost function and the constraints is problem dependent. For example, system reliability or throughput is important in communication networks, while response time is extremely important in real-time systems.

*This research is supported in part by the President's Grant at the University of Louisville. Sanjay P. Ahuja: Department of Mathematical Sciences, SUNY at Oneonta, Oneonta, NY 13820. 0-7803-1814-5/94 $4.00 © 1994 IEEE.

Conventional measures such as TR [4], the survivability index [5], and computer network reliability (CNR) [6] compute the fault tolerance of a DCS from a network component failure point of view only; the effect of file distribution on DCS reliability is ignored. The parameters Distributed Program Reliability (DPR) [7] and Distributed System Reliability (DSR) [7] take into account the effect of file distribution on DCS reliability and hence are better suited to evaluating the reliability of program execution. The problem of allocating files on a DCS such that the DPR/DSR is maximized has been solved in [2]. Note, however, that the DPR/DSR reliability parameters do not take into consideration the capacity of the communication links or the total installed system capacity. The traffic requirement of the system is also ignored. It is therefore implicitly assumed that all the links of the DCS are always capable of carrying the required flow. This is an invalid assumption, since the links of a DCS have a finite, preassigned capacity.

The problem of channel (link) capacity allocation is another problem of significance; it deals with allocating capacity to the different links of a DCS whose topology is already fixed.
The objective of this problem is to allocate capacity to the links of the DCS such that the throughput is maximized and the constraints are met. One possible constraint is that the total cost of installing capacity on the various links of the DCS must not exceed a specified upper bound.

Both of these problems, i.e., file allocation with reliability optimization and capacity allocation, have been solved individually in the literature [2]. However, when files are allocated on a DCS such that DPR/DSR is maximized, the capacity allocation of the links is ignored, even though the link capacities have a significant impact on the overall effectiveness of the DCS. Conversely, when the capacity assignment problem is solved such that the throughput is maximized, the effect of file allocation on the throughput is ignored. Such single-criterion optimization does not yield a DCS with a truly optimal file or capacity assignment. These two criteria need to be optimized together to yield a file and capacity assignment that maximizes the overall effectiveness of the DCS. One way to accomplish this would be to perform a multi-criteria optimization. However, the newly developed parameter, Average Distributed Program Throughput [8], lends itself very well to solving this problem as it takes into
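The combined search described in the abstract can be sketched as a small genetic algorithm over a single chromosome that holds both the file placement and the link capacities. This is a minimal illustration under stated assumptions and not the authors' implementation. The chromosome layout, tournament selection, single-point crossover, and the rejection of over-budget chromosomes are all assumed here. The `fitness` argument is a hypothetical stand-in for the paper's objective, whose definition is not part of this excerpt.

```python
import random


def total_cost(capacities, unit_cost):
    """Cost of installing the chosen capacity on every link."""
    return sum(c * u for c, u in zip(capacities, unit_cost))


def run_ga(fitness, n_nodes, m_files, cap_levels, n_links, unit_cost, budget,
           pc=0.8, pm=0.05, pop_size=30, generations=50, seed=0):
    """Search file placements and link capacities together.

    Chromosome = [node of file 0, ..., node of file m-1, cap of link 0, ...].
    pc and pm play the roles of the crossover and mutation probabilities.
    """
    rng = random.Random(seed)
    length = m_files + n_links

    def random_gene(i):
        # File genes pick a node; capacity genes pick an allowed level.
        return rng.randrange(n_nodes) if i < m_files else rng.choice(cap_levels)

    def score(ch):
        # Assumed constraint handling: over-budget chromosomes score -inf.
        if total_cost(ch[m_files:], unit_cost) > budget:
            return float("-inf")
        return fitness(ch[:m_files], ch[m_files:])

    def pick():
        # Binary tournament selection over the current population.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [[random_gene(i) for i in range(length)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick()[:], pick()[:]
            if rng.random() < pc:  # single-point crossover with probability pc
                cut = rng.randrange(1, length)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(length):
                    if rng.random() < pm:  # per-gene mutation with probability pm
                        child[i] = random_gene(i)
                nxt.append(child)
        pop = nxt[:pop_size]
        best = max(pop + [best], key=score)  # keep the best chromosome seen so far
    return best, score(best)
```

On an instance small enough to enumerate, the returned chromosome can be compared against an exhaustive search, and `pc` and `pm` can be varied to observe their effect on solution quality, mirroring the comparison the abstract describes.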

