On the Evaluation of Dense Chip-Multiprocessor Architectures

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation
Publication date: 2006-07-17
DOI: 10.1109/ICSAMOS.2006.300804

Francisco J. Villa, M. Acacio, José M. García


Citations: 4


On the Evaluation of Dense Chip-Multiprocessor Architectures

Chip-multiprocessors (CMPs) have emerged as the most promising way to make efficient use of continuing improvements in integration scale. At present, commercial CMPs integrate at most 8 processor cores on a chip; however, near-future dense CMP (D-CMP) systems are expected to offer 16 or more cores. These architectures impose new design constraints, and some topics, such as the cache-coherence problem, must be revisited. In this paper we present an exhaustive performance evaluation of two recently proposed D-CMP architectures, placing special emphasis on the solution to the cache-coherence problem that each of them introduces. The shared bus fabric (SBF) architecture uses a snoop-based cache-coherence protocol built on a high-performance bus fabric interconnection network. The second architecture follows a directory-based approach and uses a two-dimensional mesh as its interconnection network. Our results show that the performance of the SBF architecture is hard-limited by the bandwidth of the bus fabric. The directory-based architecture, on the other hand, outperforms SBF, but suffers some performance inefficiencies due to the additional indirection introduced by the directory structure, which is stored at the L2 cache level.
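
The trade-off summarized above (one broadcast bus whose bandwidth every core shares, versus point-to-point mesh traffic that must first visit a directory) can be made concrete with a toy cost model. This is a minimal sketch, not the paper's simulator or methodology: the 4×4 mesh, row-major tile numbering, X-Y routing, a directory co-located with a distributed L2 "home" bank, three-hop forwarding for a block held dirty in a remote L1, and the miss-rate and bus-occupancy figures are all hypothetical assumptions chosen for illustration.

```python
# Toy cost model (hypothetical): contrasts directory indirection on a 2D mesh
# with bus saturation under snooping. All parameters are illustrative assumptions.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mesh:
    """Hypothetical k x k 2D mesh; tile i sits at (i % k, i // k)."""
    k: int

    def coords(self, tile: int) -> tuple[int, int]:
        return tile % self.k, tile // self.k

    def hops(self, src: int, dst: int) -> int:
        # Dimension-order (X-Y) routing: hop count equals the Manhattan distance.
        (x1, y1), (x2, y2) = self.coords(src), self.coords(dst)
        return abs(x1 - x2) + abs(y1 - y2)


def directory_three_hop(mesh: Mesh, requester: int, home: int, owner: int) -> int:
    """Critical-path router hops for a read miss to a block held dirty in a
    remote L1: requester -> home L2 bank (directory lookup) -> owner -> requester."""
    return (mesh.hops(requester, home)
            + mesh.hops(home, owner)
            + mesh.hops(owner, requester))


def direct_two_hop(mesh: Mesh, requester: int, owner: int) -> int:
    """Hypothetical lower bound with no directory indirection:
    requester -> owner -> requester."""
    return 2 * mesh.hops(requester, owner)


def bus_utilization(cores: int, misses_per_kcycle: float,
                    bus_cycles_per_miss: float) -> float:
    """Fraction of bus cycles consumed if every miss is a serialized broadcast
    on a single shared bus; a value >= 1.0 means the bus is saturated."""
    return cores * (misses_per_kcycle / 1000.0) * bus_cycles_per_miss


if __name__ == "__main__":
    mesh = Mesh(k=4)  # 16 tiles, one core and one L2 bank each (assumed)
    # Owner (tile 1) is adjacent to the requester (tile 0), but home is tile 15.
    print(directory_three_hop(mesh, requester=0, home=15, owner=1))  # 12
    print(direct_two_hop(mesh, requester=0, owner=1))                # 2
    # Same assumed per-core load: 10 misses per 1000 cycles, 4 bus cycles each.
    for n in (16, 32):
        print(n, f"{bus_utilization(n, 10, 4):.2f}")  # "16 0.64", then "32 1.28"
```

With these assumed numbers, a miss whose home bank sits in the far corner of the mesh spends 12 router hops on the critical path, versus 2 if the owner could be reached directly; and doubling the core count from 16 to 32 at the same per-core load pushes a single shared bus past saturation, whereas a mesh adds links as it grows. This only mirrors the qualitative pattern the abstract reports; it does not reproduce the paper's results.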

