Working Sets, Cache Sizes, And Node Granularity Issues For Large-scale Multiprocessors

E. Rothberg, J. Singh, Anoop Gupta
{"title":"大型多处理器的工作集、缓存大小和节点粒度问题","authors":"E. Rothberg, J. Singh, Anoop Gupta","doi":"10.1109/ISCA.1993.698542","DOIUrl":null,"url":null,"abstract":"The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines?\nIn this paper, we explore the above questions based on the characteristics of five important classes of large-scale parallel scientific applications. We first show that all the applications have a hierarchy of well-defined per-processor working sets, whose size, performance impact and scaling characteristics can help determine how large different levels of a multiprocessor's cache hierarchy should be. Then, we use these working sets together with certain other important characteristics of the applications—such as communication to computation ratios, concurrency, and load balancing behavior—to reflect upon the broader question of the granularity of processing nodes in high-performance multiprocessors.\nWe find that very small caches whose sizes do not increase with the problem or machine size are adequate for all but two of the application classes. Even in the two exceptions, the working sets scale quite slowly with problem size, and the cache sizes needed for problems that will be run in the foreseeable future are small. We also find that relatively fine-grained machines, with large numbers of processors and quite small amounts of memory per processor, are appropriate for all the applications.","PeriodicalId":410022,"journal":{"name":"Proceedings of the 20th Annual International Symposium on Computer Architecture","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1993-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"118","resultStr":"{\"title\":\"Working Sets, Cache Sizes, And Node Granularity Issues For Large-scale Multiprocessors\",\"authors\":\"E. Rothberg, J. Singh, Anoop Gupta\",\"doi\":\"10.1109/ISCA.1993.698542\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines?\\nIn this paper, we explore the above questions based on the characteristics of five important classes of large-scale parallel scientific applications. We first show that all the applications have a hierarchy of well-defined per-processor working sets, whose size, performance impact and scaling characteristics can help determine how large different levels of a multiprocessor's cache hierarchy should be. 
Then, we use these working sets together with certain other important characteristics of the applications—such as communication to computation ratios, concurrency, and load balancing behavior—to reflect upon the broader question of the granularity of processing nodes in high-performance multiprocessors.\\nWe find that very small caches whose sizes do not increase with the problem or machine size are adequate for all but two of the application classes. Even in the two exceptions, the working sets scale quite slowly with problem size, and the cache sizes needed for problems that will be run in the foreseeable future are small. We also find that relatively fine-grained machines, with large numbers of processors and quite small amounts of memory per processor, are appropriate for all the applications.\",\"PeriodicalId\":410022,\"journal\":{\"name\":\"Proceedings of the 20th Annual International Symposium on Computer Architecture\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1993-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"118\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th Annual International Symposium on Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISCA.1993.698542\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCA.1993.698542","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 118

Abstract

The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines?

In this paper, we explore the above questions based on the characteristics of five important classes of large-scale parallel scientific applications. We first show that all the applications have a hierarchy of well-defined per-processor working sets, whose size, performance impact and scaling characteristics can help determine how large different levels of a multiprocessor's cache hierarchy should be. Then, we use these working sets together with certain other important characteristics of the applications—such as communication to computation ratios, concurrency, and load balancing behavior—to reflect upon the broader question of the granularity of processing nodes in high-performance multiprocessors.

We find that very small caches whose sizes do not increase with the problem or machine size are adequate for all but two of the application classes. Even in the two exceptions, the working sets scale quite slowly with problem size, and the cache sizes needed for problems that will be run in the foreseeable future are small. We also find that relatively fine-grained machines, with large numbers of processors and quite small amounts of memory per processor, are appropriate for all the applications.
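The working-set hierarchy the abstract describes can be made concrete with a small simulation. The sketch below is a minimal illustration under assumed parameters, not the paper's methodology: it runs a fully associative LRU cache over a synthetic reference trace and sweeps the cache capacity. The miss rate drops sharply once each working set fits in the cache, so the "knees" of the miss-rate-versus-size curve estimate the working-set sizes. The functions `lru_miss_rate` and `synthetic_trace`, and all trace parameters, are hypothetical.

```python
# Illustrative sketch (not from the paper): estimate working-set sizes by
# sweeping the capacity of a simulated fully associative LRU cache and
# looking for knees in the miss-rate curve. The trace is synthetic.

from collections import OrderedDict
import random


def lru_miss_rate(trace, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark block most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(trace)


def synthetic_trace(n_refs=200_000, inner=64, outer=4096, seed=0):
    """Two nested working sets: a small hot set referenced 90% of the time,
    contained in a larger set touched for the remaining 10% (assumed mix)."""
    rng = random.Random(seed)
    return [rng.randrange(inner) if rng.random() < 0.9 else rng.randrange(outer)
            for _ in range(n_refs)]


if __name__ == "__main__":
    trace = synthetic_trace()
    for capacity in (16, 32, 64, 128, 256, 1024, 4096, 8192):
        print(f"{capacity:5d} blocks: miss rate = {lru_miss_rate(trace, capacity):.4f}")
```

With these assumed parameters the miss rate should fall steeply near 64 blocks (the hot inner set) and again near 4096 blocks (the outer set), mirroring the kind of hierarchy of well-defined per-processor working sets the authors identify in real applications.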