Rationale and Strategy for a 21st Century Scientific Computing Architecture: the Case for Using Commercial Symmetric Multiprocessors as Supercomputers

W. Johnston
DOI: 10.1142/S0129053397000131
Journal: Int. J. High Speed Comput. (Journal Article, published 1997-09-01)
Citations: 10

Abstract

In this paper we argue that the next generation of supercomputers will be based on tight-knit clusters of symmetric multiprocessor systems in order to: (i) provide higher capacity at lower cost; (ii) enable easy future expansion, and (iii) ease the development of computational science applications. This strategy involves recognizing that the current vector supercomputer user community divides (roughly) into two groups, each of which will benefit from this approach: One, the "capacity" users (who tend to run production codes aimed at solving the science problems of today) will get better throughput than they do today by moving to large symmetric multiprocessor systems (SMPs), and a second group, the "capability" users (who tend to be developing new computational science techniques) will invest the time needed to get high performance from cluster-based parallel systems. In addition to the technology-based arguments for the strategy, we believe that it also supports a vision for a revitalization of scientific computing. This vision is that an architecture based on commodity components and computer science innovation will: (i) enable very scalable high performance computing to address the high-end computational science requirements; (ii) provide better throughput and a more productive code development environment for production supercomputing; (iii) provide a path to integration with the laboratory and experimental sciences, and (iv) be the basis of an on-going collaboration between the scientific community, the computing industry, and the research computer science community in order to provide a computing environment compatible with production codes and dynamically increasing in both hardware and software capability and capacity. 
We put forward the thesis that the current level of hardware performance and sophistication of the software environment found in commercial symmetric multiprocessor (SMP) systems, together with advances in distributed systems architectures, make clusters of SMPs one of the highest-performance, most cost-effective approaches to computing available today. The current capacity users of the C90-like system will be served in such an environment by having more of several critical resources than the current environment provides: much more CPU time per unit of real time, larger memory per node and much larger memory per cluster; and the capability users are served by an MPP-like performance and an architecture that enables continuous growth into the future. In addition to these primary arguments, secondary advantages of SMP clusters include: the ability to replicate this sort of system in smaller units to provide identical computing environments at the home sites and laboratories of scientific users; the future potential for using the global Internet for interconnecting large clusters at a central facility with smaller clusters at other sites to form a very high capability system; and a rapidly growing base of supporting commercial software. The arguments made to support this thesis are as follows: (1) Workstation vendors are increasingly turning their attention to parallelism in order to run increasingly complex software in their commercial product lines. The pace of development by the "workstation" manufacturers due to their very large investment in research and development for hardware and software is so rapid that the special-purpose research aimed at just the high-performance market is no longer able to produce significant advantages over the mass-market products. We illustrate this trend and analyze its impact on the current performance of SMPs relative to vector supercomputers. 
(2) Several factors also suggest that "clusters" of SMPs will shortly outperform traditional MPPs for reasons similar to those mentioned above. The mass-produced network architectures and components being used to interconnect SMP clusters are experiencing technology and capability growth trends similar to commodity computing systems. This is due to the economic drivers of the merging of computing and telecommunications technology, and the greatly increased demand for high-bandwidth data communication. Very-high-speed general-purpose networks are now being produced for a large market, and the technology is experiencing the same kinds of rapid advances as workstation processor technology. The engineering required to build MPPs from special-purpose networks that are integrated in special ways with commercial microprocessors is costly and requires long engineering lead times. This results in delivered MPPs with less capable processors than are being delivered in workstations at the same time. (3) Commercial software now exists that provides integrated, MPP-style code development and system management for clusters of SMPs, and software architectures and components that will provide even more homogeneous views of clusters of SMPs are now emerging from several academic research groups. We propose that the next-generation scientific supercomputer center be built from clusters of SMPs, and suggest a strategy for an initial 50 Gflop configuration and incremental increases thereafter to reach a teraflop by just after the turn of the century. While this cluster uses what is called "network of workstations" technology, the individual nodes are, in and of themselves, powerful systems that typically have several gigaflops of CPU and several gigabytes of memory. The risks of this approach are analyzed, and found to be similar to those of MPPs. 
That is, the risks are primarily in software issues that are similar for SMPs and MPPs: namely, in the provision of a homogeneous view of a distributed memory system. The argument is made that the capacity of today's large SMPs, taken together with already existing distributed systems software, will provide a versatile and powerful computational science environment. We also address the issues of application availability and code conversion to this new environment even if the homogeneous cluster software environment does not mature as quickly as expected. The throughput of the proposed SMP cluster architecture is substantial. The job mix is more easily load balanced because of the substantially greater memory size of the proposed cluster implementation as compared to a typical C90. The larger memory allows more jobs to be in the active schedule queue (in memory, waiting to execute), and the larger "local" disk capacity of the cluster provides more storage for the data and results of executing jobs.
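The growth path the abstract proposes (an initial 50 Gflop configuration, scaled incrementally to a teraflop) can be sanity-checked with simple arithmetic. The per-node peak used below is an assumption — the abstract says only that each node has "several gigaflops" of CPU:

```python
import math

def nodes_needed(target_gflops, node_gflops=4.0):
    """SMP nodes required to reach a target aggregate peak.

    node_gflops is an assumed per-node peak; the paper specifies
    only "several gigaflops" per node.
    """
    return math.ceil(target_gflops / node_gflops)

# Initial 50 Gflop configuration vs. the teraflop (1000 Gflop) goal.
print(nodes_needed(50))    # -> 13 nodes at an assumed 4 Gflops each
print(nodes_needed(1000))  # -> 250 nodes
```

At node sizes in this assumed range, reaching the teraflop target is a matter of scaling out the same commodity building block, which is the incremental-growth argument made above.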
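The closing throughput claim — that a larger memory lets more jobs sit in the active schedule queue — can be illustrated with a minimal sketch. The job mix and memory sizes below are hypothetical, chosen only to contrast a C90-class memory with a cluster's much larger aggregate memory:

```python
def resident_jobs(job_sizes_gb, memory_gb):
    """Count jobs (admitted in queue order) that fit in memory at once."""
    used = 0.0
    count = 0
    for size in job_sizes_gb:
        if used + size > memory_gb:
            break  # the next job must wait outside the active queue
        used += size
        count += 1
    return count

# Hypothetical job mix (memory footprints in GB).
jobs = [1.5, 0.5, 2.0, 1.0, 3.0, 0.5]

print(resident_jobs(jobs, 2.0))   # -> 2: small memory, most jobs wait
print(resident_jobs(jobs, 32.0))  # -> 6: large memory, all jobs resident
```

With more jobs resident at once, the scheduler has more candidates to run at every instant, which is why the larger memory makes the job mix easier to load balance.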