Mining for statistical models of availability in large-scale distributed systems: An empirical study of SETI@home

2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems. Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5367061

B. Javadi, Derrick Kondo, J. Vincent, David P. Anderson


Citations: 74

Abstract



In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example, stationary versus non-stationary behavior) and fits different models (for example, Exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modelled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real large-scale Internet-distributed system, namely SETI@home. We find that about 34% of hosts exhibit availability that is a truly random process, and that these hosts can often be modelled accurately with a few distinct distributions from different families. We believe that this characterization is fundamental in the design of stochastic scheduling algorithms across large-scale systems where host availability is uncertain.
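To make the idea of fitting "different models" to availability data concrete, the sketch below fits the three distribution families named in the abstract (Exponential, Weibull, Pareto) to a list of interval lengths by maximum likelihood and ranks them by AIC. It is not the paper's procedure, which the abstract does not specify. The function names, the use of AIC as the ranking criterion, the bisection solver for the Weibull shape, and the synthetic sample are all assumptions made for illustration.

```python
import math
import random


def _check(xs):
    if len(xs) < 2 or min(xs) <= 0:
        raise ValueError("need at least two strictly positive interval lengths")


def fit_exponential(xs):
    """Exponential MLE: rate = n / sum(x). Returns (params, log-likelihood)."""
    _check(xs)
    n, total = len(xs), sum(xs)
    rate = n / total
    return {"rate": rate}, n * math.log(rate) - rate * total


def fit_pareto(xs):
    """Pareto (type I) MLE: scale = min(x), shape = n / sum(ln(x / scale))."""
    _check(xs)
    n, scale = len(xs), min(xs)
    shape = n / sum(math.log(x / scale) for x in xs)  # fails if all x are equal
    loglik = (n * math.log(shape) + n * shape * math.log(scale)
              - (shape + 1) * sum(math.log(x) for x in xs))
    return {"shape": shape, "scale": scale}, loglik


def fit_weibull(xs, lo=0.01, hi=50.0, iters=100):
    """Weibull MLE: bisection on the shape's score equation, then closed-form scale."""
    _check(xs)
    n, top = len(xs), max(xs)
    ys = [x / top for x in xs]  # rescale so powers cannot overflow
    mean_log = sum(math.log(y) for y in ys) / n

    def score(k):  # increasing in k; its root is the MLE shape
        pw = [y ** k for y in ys]
        return (sum(p * math.log(y) for p, y in zip(pw, ys)) / sum(pw)
                - 1.0 / k - mean_log)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = top * (sum(y ** k for y in ys) / n) ** (1.0 / k)
    loglik = (n * math.log(k) - n * k * math.log(scale)
              + (k - 1) * sum(math.log(x) for x in xs)
              - sum((x / scale) ** k for x in xs))
    return {"shape": k, "scale": scale}, loglik


def best_model(xs):
    """Rank candidates by AIC = 2 * (number of parameters) - 2 * log-likelihood."""
    fits = {
        "exponential": fit_exponential(xs),
        "weibull": fit_weibull(xs),
        "pareto": fit_pareto(xs),
    }
    aic = {name: 2 * len(params) - 2 * ll for name, (params, ll) in fits.items()}
    name = min(aic, key=aic.get)
    return name, fits[name][0], aic


# Synthetic heavy-tailed sample standing in for one host's availability
# interval lengths; this is NOT SETI@home trace data.
random.seed(0)
sample = [random.paretovariate(1.5) for _ in range(5000)]
name, params, aic = best_model(sample)
print(name, params)
```

Because the Exponential is the Weibull with shape 1, the Weibull log-likelihood can never be lower than the Exponential one, which is why the sketch compares penalized scores rather than raw likelihoods. A goodness-of-fit test on the selected model would be a natural next step, but it is left out here.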

