Scalable and Resilient Workflow Executions on Production Distributed Computing Infrastructures

2012 11th International Symposium on Parallel and Distributed Computing. Pub Date: 2012-06-25. DOI: 10.1109/ISPDC.2012.24

J. R. Balderrama, Tram Truong Huu, J. Montagnat


Citations: 12

Abstract



Despite the growing interest in grid and cloud infrastructures among scientific communities, and the availability of such facilities at large scale, achieving high performance in production environments remains challenging due to at least four factors: the low reliability of very large-scale distributed computing infrastructures, the performance overhead induced by shared facilities, the difficulty of obtaining a fair balance of all user jobs in such a heterogeneous environment, and the complexity of deploying large-scale distributed applications. Together, these difficulties make infrastructure exploitation complex and often limited to experts. This paper introduces a pragmatic solution to these four issues, based on a service-oriented methodology, the reuse of existing middleware services, and the joint exploitation of local and distributed computing resources. Emphasis is placed on the ease of use of the integrated environment. Results on an actual neuroscience application show the impact of the environment setup on reliability and performance. Recommendations and best practices are derived from this experiment.
