{"title":"Rationale and Strategy for a 21st Century Scientific Computing Architecture: the Case for Using Commercial Symmetric Multiprocessors as Supercomputers","authors":"W. Johnston","doi":"10.1142/S0129053397000131","DOIUrl":null,"url":null,"abstract":"In this paper we argue that the next generation of supercomputers will be based on tight-knit clusters of symmetric multiprocessor systems in order to: (i) provide higher capacity at lower cost; (ii) enable easy future expansion, and (iii) ease the development of computational science applications. This strategy involves recognizing that the current vector supercomputer user community divides (roughly) into two groups, each of which will benefit from this approach: One, the \"capacity\" users (who tend to run production codes aimed at solving the science problems of today) will get better throughput than they do today by moving to large symmetric multiprocessor systems (SMPs), and a second group, the \"capability\" users (who tend to be developing new computational science techniques) will invest the time needed to get high performance from cluster-based parallel systems. In addition to the technology-based arguments for the strategy, we believe that it also supports a vision for a revitalization of scientific computing. This vision is that an architecture based on commodity components and computer science innovation will: (i) enable very scalable high performance computing to address the high-end computational science requirements; (ii) provide better throughput and a more productive code development environment for production supercomputing; (iii) provide a path to integration with the laboratory and experimental sciences, and (iv) be the basis of an on-going collaboration between the scientific community, the computing industry, and the research computer science community in order to provide a computing environment compatible with production codes and dynamically increasing in both hardware and software capability and capacity. We put forward the thesis that the current level of hardware performance and sophistication of the software environment found in commercial symmetric multiprocessor (SMP) systems, together with advances in distributed systems architectures, make clusters of SMPs one of the highest-performance, most cost-effective approaches to computing available today. The current capacity users of the C90-like system will be served in such an environment by having more of several critical resources than the current environment provides: much more CPU time per unit of real time, larger memory per node and much larger memory per cluster; and the capability users are served by an MPP-like performance and an architecture that enables continuous growth into the future. In addition to these primary arguments, secondary advantages of SMP clusters include: the ability to replicate this sort of system in smaller units to provide identical computing environments at the home sites and laboratories of scientific users; the future potential for using the global Internet for interconnecting large clusters at a central facility with smaller clusters at other sites to form a very high capability system; and a rapidly growing base of supporting commercial software. The arguments made to support this thesis are as follows: (1) Workstation vendors are increasingly turning their attention to parallelism in order to run increasingly complex software in their commercial product lines. 
The pace of development by the \"workstation\" manufacturers due to their very-large investment in research and development for hardware and software is so rapid that the special-purpose research aimed at just the high-performance market is no longer able to produce significant advantages over the mass-market products. We illustrate this trend and analyze its impact on the current performance of SMPs relative to vector supercomputers. (2) Several factors also suggest that \"clusters\" of SMPs will shortly out-perform traditional MPPs for reasons similar to those mentioned above. The mass-produced network architectures and components being used to interconnect SMP clusters are experiencing technology and capability growth trends similar to commodity computing systems. This is due to the economic drivers of the merging of computing and telecommunications technology, and the greatly increased demand for high bandwidth data communication. Very-high-speed general-purpose networks are now being produced for a large market, and the technology is experiencing the same kinds of rapid advances as workstation processor technology. The engineering required to build MPPs from special-purpose networks that are integrated in special ways with commercial microprocessors is costly and requires long engineering lead times. This results in delivered MPPs with less capable processors than are being delivered in workstations at the same time. (3) Commercial software now exists that provides integrated, MPP-style code development and system management for clusters of SMPs, and software architectures and components that will provide even more homogeneous views of clusters of SMPs are now emerging from several academic research groups. We propose that the next-generation scientific supercomputer center be built from clusters of SMPs, and suggest a strategy for an initial 50 Gflop configuration and incremental increases thereafter to reach a teraflop by just after the turn of the century. While this cluster uses what is called \"network of workstations\" technology, the individual nodes are, in and of themselves, powerful systems that typically have several gigaflops of CPU and several gigabytes of memory. The risks of this approach are analyzed, and found to be similar to those of MPPs. That is, the risks are primarily in software issues that are similar for SMPs and MPPs: namely, in the provision of a homogenous view of a distributed memory system. The argument is made that the capacity of today's large SMPs, taken together with already existing distributed systems software, will provide a versatile and powerful computational science environment. We also address the issues of application availability and code conversion to this new environment even if the homogeneous cluster software environment does not mature as quickly as expected. The throughput of the proposed SMP cluster architecture is substantial. The job mix is more easily load balanced because of the substantially greater memory size of the proposed cluster implementation as compared to a typical C90. The larger memory allows more jobs to be in the active schedule queue (in memory waiting to execute), and the larger \"local\" disk capacity of the cluster allows more data and results storage area for executing jobs.","PeriodicalId":270006,"journal":{"name":"Int. J. 
High Speed Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. High Speed Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129053397000131","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 10
Abstract
In this paper we argue that the next generation of supercomputers will be based on tight-knit clusters of symmetric multiprocessor (SMP) systems in order to: (i) provide higher capacity at lower cost; (ii) enable easy future expansion; and (iii) ease the development of computational science applications. This strategy involves recognizing that the current vector supercomputer user community divides (roughly) into two groups, each of which will benefit from this approach: the "capacity" users (who tend to run production codes aimed at solving the science problems of today) will get better throughput than they do today by moving to large SMPs, and the "capability" users (who tend to be developing new computational science techniques) will invest the time needed to get high performance from cluster-based parallel systems.

In addition to the technology-based arguments for the strategy, we believe that it also supports a vision for a revitalization of scientific computing. This vision is that an architecture based on commodity components and computer science innovation will: (i) enable very scalable high-performance computing to address high-end computational science requirements; (ii) provide better throughput and a more productive code development environment for production supercomputing; (iii) provide a path to integration with the laboratory and experimental sciences; and (iv) be the basis of an on-going collaboration between the scientific community, the computing industry, and the computer science research community in order to provide a computing environment that is compatible with production codes and that grows dynamically in both hardware and software capability and capacity.

We put forward the thesis that the current level of hardware performance and the sophistication of the software environment found in commercial SMP systems, together with advances in distributed systems architectures, make clusters of SMPs one of the highest-performance, most cost-effective approaches to computing available today. The current capacity users of a C90-like system will be served in such an environment by having more of several critical resources than the current environment provides: much more CPU time per unit of real time, larger memory per node, and much larger memory per cluster. The capability users are served by MPP-like performance and an architecture that enables continuous growth into the future. In addition to these primary arguments, secondary advantages of SMP clusters include: the ability to replicate this sort of system in smaller units to provide identical computing environments at the home sites and laboratories of scientific users; the future potential for using the global Internet to interconnect large clusters at a central facility with smaller clusters at other sites to form a very high capability system; and a rapidly growing base of supporting commercial software.

The arguments made to support this thesis are as follows. (1) Workstation vendors are increasingly turning their attention to parallelism in order to run increasingly complex software in their commercial product lines. Because of their very large investment in hardware and software research and development, the pace of development by the "workstation" manufacturers is so rapid that special-purpose research aimed only at the high-performance market is no longer able to produce significant advantages over mass-market products. We illustrate this trend and analyze its impact on the current performance of SMPs relative to vector supercomputers. (2) Several factors also suggest that "clusters" of SMPs will shortly outperform traditional MPPs, for reasons similar to those mentioned above. The mass-produced network architectures and components being used to interconnect SMP clusters are experiencing technology and capability growth trends similar to those of commodity computing systems. This is due to the economic drivers of the merging of computing and telecommunications technology and the greatly increased demand for high-bandwidth data communication. Very-high-speed general-purpose networks are now being produced for a large market, and the technology is experiencing the same kinds of rapid advances as workstation processor technology. The engineering required to build MPPs from special-purpose networks that are integrated in special ways with commercial microprocessors is costly and requires long lead times. The result is that delivered MPPs have less capable processors than the workstations being delivered at the same time. (3) Commercial software now exists that provides integrated, MPP-style code development and system management for clusters of SMPs, and software architectures and components that will provide even more homogeneous views of SMP clusters are now emerging from several academic research groups.

We propose that the next-generation scientific supercomputer center be built from clusters of SMPs, and we suggest a strategy for an initial 50 Gflop configuration with incremental increases thereafter to reach a teraflop just after the turn of the century. While this cluster uses what is called "network of workstations" technology, the individual nodes are, in and of themselves, powerful systems that typically have several gigaflops of CPU and several gigabytes of memory. The risks of this approach are analyzed and found to be similar to those of MPPs. That is, the risks are primarily in software issues that are similar for SMP clusters and MPPs: namely, in providing a homogeneous view of a distributed memory system. The argument is made that the capacity of today's large SMPs, taken together with already existing distributed systems software, will provide a versatile and powerful computational science environment. We also address the issues of application availability and code conversion to this new environment even if the homogeneous cluster software environment does not mature as quickly as expected.

The throughput of the proposed SMP cluster architecture is substantial. The job mix is more easily load balanced because of the substantially greater memory size of the proposed cluster implementation as compared to a typical C90. The larger memory allows more jobs to be in the active schedule queue (in memory, waiting to execute), and the larger "local" disk capacity of the cluster provides more storage for the data and results of executing jobs.
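The programming model implied by a cluster of SMPs (shared memory within a node, message passing between nodes) can be sketched in a few lines. The example below is a hypothetical illustration using MPI and OpenMP, which the paper does not prescribe; it simply shows one way a "capability" user's code might exploit both levels of parallelism while the more homogeneous cluster software described in argument (3) matures.

    /* Hypothetical hybrid sketch: one MPI process per SMP node, OpenMP
     * threads sharing memory within the node.  MPI and OpenMP are assumed
     * here for illustration only. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N_PER_NODE 1000000   /* assumed per-node slice of a global array */

    int main(int argc, char **argv)
    {
        int rank, nodes;
        double local = 0.0, global = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nodes);

        /* Within one SMP node: threads cooperate through shared memory. */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < N_PER_NODE; i++) {
            double x = (double)rank * N_PER_NODE + i;
            local += x * x;       /* stand-in for a real per-element kernel */
        }

        /* Across the cluster: nodes combine partial results by message passing. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of squares across %d nodes = %e\n", nodes, global);

        MPI_Finalize();
        return 0;
    }

Built with something like "mpicc -fopenmp" and launched with one process per node, each process maps onto one SMP and its threads onto that node's processors, which is the division of labor the cluster-of-SMPs architecture encourages.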
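As a rough illustration of the scale the proposed growth path implies, the short program below works out node counts from the figures in the abstract. The 4 Gflops-per-node value is an assumption chosen for illustration; the abstract says only that each SMP node delivers "several gigaflops".

    #include <stdio.h>

    int main(void)
    {
        /* Assumed per-node peak; the abstract says only "several gigaflops". */
        const double gflops_per_node = 4.0;
        const double initial_gflops  = 50.0;    /* proposed initial configuration */
        const double target_gflops   = 1000.0;  /* one-teraflop goal */

        printf("initial 50 Gflop config: about %.0f nodes\n",
               initial_gflops / gflops_per_node);
        printf("1 Tflop config:          about %.0f nodes\n",
               target_gflops / gflops_per_node);
        return 0;
    }

Under this assumption the initial configuration is on the order of a dozen nodes and the teraflop goal on the order of a few hundred, consistent with the incremental expansion the strategy describes.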