A resiliency model for high performance infrastructure based on logical encapsulation
James J. Moore, C. Kesselman
IEEE International Symposium on High-Performance Parallel Distributed Computing, 2012-06-18
DOI: 10.1145/2287076.2287118 (https://doi.org/10.1145/2287076.2287118)
Citations: 2
Abstract
An emerging trend in distributed systems is the creation of dynamically provisioned, heterogeneous high-performance platforms that co-allocate virtualized computing with network-attached storage volumes offering NAS- and SAN-level data services. These high-performance computing environments support parallel applications performing traditional file system operations. As with any parallel platform, the ability to continue computation in the face of component failures is an important characteristic. Achieving resiliency in heterogeneous environments presents unique challenges and opportunities not found in homogeneous aggregations of computing resources. We present a logical encapsulation model for heterogeneous high-performance infrastructure, which enables a reactive resiliency approach for federations of virtual machines and externally hosted physical storage volumes. Asynchronous state capture and restoration models are presented for individual resources and are then composed into non-blocking resiliency models for logical encapsulations. Our evaluation demonstrates that the methodology offers greater overall flexibility and significant performance improvements compared to current resiliency approaches in virtualized distributed execution environments.