在非专用网格和云资源上构建在线特定领域计算服务:超级链接在线体验

M. Silberstein
{"title":"在非专用网格和云资源上构建在线特定领域计算服务:超级链接在线体验","authors":"M. Silberstein","doi":"10.1109/CCGrid.2011.46","DOIUrl":null,"url":null,"abstract":"Linkage analysis is a statistical method used by geneticists in everyday practice for mapping disease-susceptibility genes in the study of complex diseases. An essential first step in the study of genetic diseases, linkage computations may require years of CPU time. The recent DNA sampling revolution enabled unprecedented sampling density, but made the analysis even more computationally demanding. In this paper we describe a high performance online service for genetic linkage analysis, called Super link-online. The system enables anyone with Internet access to submit genetic data and analyze it as easily and quickly as if using a supercomputer. The analyses are automatically parallelized and executed on tens of thousands distributed CPUs in multiple clouds and grids. The first version of the system, which employed up to 3,000 CPUs in UW Madison and Technion Condor pools, has been successfully used since 2006 by hundreds of geneticists worldwide, with over 40 citations in the genetics literature. Here we describe the second version, which substantially improves the scalability and performance of first: it uses over 45,000 non-dedicated hosts, in 10different grids and clouds, including EC2 and the Superlink@Technion community grid. Improved system performance is obtained through a virtual grid hierarchy with dynamic load balancing and multi-grid overlay via the Grid Bot system, parallel pruning of short tasks for overhead minimization, and cost-efficient use of cloud resources in reliability-critical execution periods. These enhancements enabled execution of many previously infeasible analyses, which can now be completed within a few hours. The new version of the system, in production since 2009, has completed over 6500 different runs of over 10 million tasks, with total consumption of 420 CPU years.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Building an Online Domain-Specific Computing Service over Non-dedicated Grid and Cloud Resources: The Superlink-Online Experience\",\"authors\":\"M. Silberstein\",\"doi\":\"10.1109/CCGrid.2011.46\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Linkage analysis is a statistical method used by geneticists in everyday practice for mapping disease-susceptibility genes in the study of complex diseases. An essential first step in the study of genetic diseases, linkage computations may require years of CPU time. The recent DNA sampling revolution enabled unprecedented sampling density, but made the analysis even more computationally demanding. In this paper we describe a high performance online service for genetic linkage analysis, called Super link-online. The system enables anyone with Internet access to submit genetic data and analyze it as easily and quickly as if using a supercomputer. The analyses are automatically parallelized and executed on tens of thousands distributed CPUs in multiple clouds and grids. The first version of the system, which employed up to 3,000 CPUs in UW Madison and Technion Condor pools, has been successfully used since 2006 by hundreds of geneticists worldwide, with over 40 citations in the genetics literature. Here we describe the second version, which substantially improves the scalability and performance of first: it uses over 45,000 non-dedicated hosts, in 10different grids and clouds, including EC2 and the Superlink@Technion community grid. Improved system performance is obtained through a virtual grid hierarchy with dynamic load balancing and multi-grid overlay via the Grid Bot system, parallel pruning of short tasks for overhead minimization, and cost-efficient use of cloud resources in reliability-critical execution periods. These enhancements enabled execution of many previously infeasible analyses, which can now be completed within a few hours. The new version of the system, in production since 2009, has completed over 6500 different runs of over 10 million tasks, with total consumption of 420 CPU years.\",\"PeriodicalId\":376385,\"journal\":{\"name\":\"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-05-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGrid.2011.46\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2011.46","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

摘要

连锁分析是遗传学家在复杂疾病研究中用于绘制疾病易感基因的一种统计方法。在遗传疾病研究中必不可少的第一步,连锁计算可能需要数年的CPU时间。最近的DNA采样革命实现了前所未有的采样密度,但使分析对计算的要求更高。在本文中,我们描述了一个高性能的在线服务,遗传连锁分析,称为超级链接在线。该系统使任何能上网的人都能提交基因数据并进行分析,就像使用超级计算机一样简单快捷。分析自动并行化,并在多个云和网格中的数万个分布式cpu上执行。该系统的第一个版本,在威斯康星大学麦迪逊分校和以色列理工学院的秃鹫池中使用了多达3000个cpu,自2006年以来,已成功地被全球数百名遗传学家使用,在遗传学文献中引用了40多次。这里我们描述第二个版本,它大大提高了第一个版本的可伸缩性和性能:它在10个不同的网格和云中使用超过45,000个非专用主机,包括EC2和Superlink@Technion社区网格。通过通过grid Bot系统实现动态负载平衡和多网格覆盖的虚拟网格层次结构,并行修剪短任务以实现开销最小化,以及在可靠性关键执行期间经济高效地使用云资源,从而提高系统性能。这些增强功能支持执行许多以前不可行的分析,这些分析现在可以在几个小时内完成。该系统的新版本自2009年投入生产以来,已经完成了6500多个不同的运行,执行了超过1000万个任务,总消耗CPU年为420年。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Building an Online Domain-Specific Computing Service over Non-dedicated Grid and Cloud Resources: The Superlink-Online Experience
Linkage analysis is a statistical method used by geneticists in everyday practice for mapping disease-susceptibility genes in the study of complex diseases. An essential first step in the study of genetic diseases, linkage computations may require years of CPU time. The recent DNA sampling revolution enabled unprecedented sampling density, but made the analysis even more computationally demanding. In this paper we describe a high performance online service for genetic linkage analysis, called Super link-online. The system enables anyone with Internet access to submit genetic data and analyze it as easily and quickly as if using a supercomputer. The analyses are automatically parallelized and executed on tens of thousands distributed CPUs in multiple clouds and grids. The first version of the system, which employed up to 3,000 CPUs in UW Madison and Technion Condor pools, has been successfully used since 2006 by hundreds of geneticists worldwide, with over 40 citations in the genetics literature. Here we describe the second version, which substantially improves the scalability and performance of first: it uses over 45,000 non-dedicated hosts, in 10different grids and clouds, including EC2 and the Superlink@Technion community grid. Improved system performance is obtained through a virtual grid hierarchy with dynamic load balancing and multi-grid overlay via the Grid Bot system, parallel pruning of short tasks for overhead minimization, and cost-efficient use of cloud resources in reliability-critical execution periods. These enhancements enabled execution of many previously infeasible analyses, which can now be completed within a few hours. The new version of the system, in production since 2009, has completed over 6500 different runs of over 10 million tasks, with total consumption of 420 CPU years.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信