On the Evaluation of Dense Chip-Multiprocessor Architectures

Francisco J. Villa, M. Acacio, José M. García
{"title":"On the Evaluation of Dense Chip-Multiprocessor Architectures","authors":"Francisco J. Villa, M. Acacio, José M. García","doi":"10.1109/ICSAMOS.2006.300804","DOIUrl":null,"url":null,"abstract":"Chip-multiprocessors (CMPs) have been revealed as the most promising way of making efficient use of current improvements in integration scale. Nowadays, commercial CMP releases integrate at most 8 processor cores onto the chip. However, 16 or more processor cores are expected to be offered in near future dense-CMP (D-CMP) systems. In this way, these architectures impose new design restrictions, and some topics, such as the cache-coherence problem, must be reviewed. In this paper we present an exhaustive performance evaluation of two recently proposed D-CMP architectures, making special emphasis on the solution to the cache-coherence problem that each one of them introduces. The shared bus fabric architecture (SBF) features a snoop cache-coherence protocol and is based on a high-performance bus fabric interconnection network. The second architecture follows a directory-based approach and integrates a bi-dimensional mesh as the interconnection network. Our results show that the performance achieved by the SBF architecture is hard-limited by the bandwidth restrictions of the bus fabric. On the other hand, the directory-based architecture outperforms the SBF one, but presents some performance inefficiencies due to the additional indirection that the directory structure stored in the L2 cache level introduces","PeriodicalId":204190,"journal":{"name":"2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAMOS.2006.300804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Chip-multiprocessors (CMPs) have been revealed as the most promising way of making efficient use of current improvements in integration scale. Nowadays, commercial CMP releases integrate at most 8 processor cores onto the chip. However, 16 or more processor cores are expected to be offered in near future dense-CMP (D-CMP) systems. In this way, these architectures impose new design restrictions, and some topics, such as the cache-coherence problem, must be reviewed. In this paper we present an exhaustive performance evaluation of two recently proposed D-CMP architectures, making special emphasis on the solution to the cache-coherence problem that each one of them introduces. The shared bus fabric architecture (SBF) features a snoop cache-coherence protocol and is based on a high-performance bus fabric interconnection network. The second architecture follows a directory-based approach and integrates a bi-dimensional mesh as the interconnection network. Our results show that the performance achieved by the SBF architecture is hard-limited by the bandwidth restrictions of the bus fabric. On the other hand, the directory-based architecture outperforms the SBF one, but presents some performance inefficiencies due to the additional indirection that the directory structure stored in the L2 cache level introduces
密集芯片-多处理器体系结构的评价
芯片多处理器(cmp)已被揭示为最有希望有效利用当前集成规模改进的方法。如今,商用CMP版本最多在芯片上集成8个处理器内核。然而,在不久的将来,密集cmp (D-CMP)系统预计将提供16个或更多的处理器内核。通过这种方式,这些体系结构施加了新的设计限制,并且必须回顾一些主题,例如缓存一致性问题。在本文中,我们对最近提出的两种D-CMP架构进行了详尽的性能评估,特别强调了它们各自引入的缓存一致性问题的解决方案。共享总线结构(SBF)以snoop缓存一致性协议为特征,基于高性能总线结构互连网络。第二种体系结构采用基于目录的方法,并集成了一个二维网格作为互连网络。结果表明,SBF架构的性能受到总线结构带宽限制的限制。另一方面,基于目录的体系结构优于SBF体系结构,但是由于存储在L2缓存级别的目录结构引入了额外的间接性,因此存在一些性能低下的问题
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信