Lluc Alvarez, Ramon Bertran Monfort, Marc González, X. Martorell, N. Navarro, E. Ayguadé
IEEE International Symposium on High-Performance Parallel Distributed Computing, published 2011-06-08. DOI: 10.1145/1996130.1996169 (https://doi.org/10.1145/1996130.1996169)
Design space exploration for aggressive core replication schemes in CMPs
Chip multiprocessors (CMPs) are the dominant architecture today. Current CMP designs vary widely in their number of cores and in their memory subsystems, because they are used across a wide spectrum of domains, each with its own design goals. This paper studies different chip configurations in terms of number of cores, size of the shared L3 cache, and off-chip bandwidth requirements, in order to find the most efficient design for High Performance Computing applications. Results show that CMP schemes that reduce the shared L3 cache to make room for additional cores achieve speedups of up to 3.31x over a baseline architecture.