了解共享虚拟内存系统上的应用程序性能

L. Iftode, J. Singh, Kai Li
{"title":"了解共享虚拟内存系统上的应用程序性能","authors":"L. Iftode, J. Singh, Kai Li","doi":"10.1145/232973.232987","DOIUrl":null,"url":null,"abstract":"Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics.We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches---Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC)---with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at leant up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.","PeriodicalId":415354,"journal":{"name":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"80","resultStr":"{\"title\":\"Understanding Application Performance on Shared Virtual Memory Systems\",\"authors\":\"L. Iftode, J. Singh, Kai Li\",\"doi\":\"10.1145/232973.232987\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics.We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches---Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC)---with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at leant up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.\",\"PeriodicalId\":415354,\"journal\":{\"name\":\"23rd Annual International Symposium on Computer Architecture (ISCA'96)\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1996-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"80\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"23rd Annual International Symposium on Computer Architecture (ISCA'96)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/232973.232987\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/232973.232987","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 80

摘要

许多研究者为共享虚拟内存(SVM)系统提出了有趣的协议,并证明了并行程序的性能改进。然而,对于支持向量机系统在不同类别应用中的性能潜力,目前还没有明确的认识。本文通过对一系列应用程序的性能进行详细的研究,并结合应用程序的特点对其进行理解,从而开始填补这一空白。我们首先对应用程序中固有的数据共享模式进行了简要分类,以及它们如何与系统粒度交互以产生与支持向量机系统相关的通信模式。然后,我们使用详细的仿真来比较两种支持向量机方法——延迟发布一致性(LRC)和自动更新发布一致性(AURC)——彼此之间的性能以及与全硬件CC-NUMA方法的性能。我们将研究性能如何受到问题大小、机器大小、关键系统参数和使用优化程度较低的程序实现的影响。我们发现SVM确实可以在至少32个处理器的系统中执行一些重要的应用程序。然而,与CC-NUMA系统相比,应用程序之间的性能差异更大,获得良好并行性能所需的问题规模要大得多。在我们的系统假设下,硬件辅助的AURC系统往往比全软件的LRC表现得好得多,特别是在使用实际的缓存层次结构时。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Understanding Application Performance on Shared Virtual Memory Systems
Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics.We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches---Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC)---with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at leant up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信