Improving single-thread fetch performance on a multithreaded processor
Authors: J. Moure, R. B. García, Dolores Rexachs, E. Luque
Published in: Proceedings Euromicro Symposium on Digital Systems Design, 2001-09-04
DOI: https://doi.org/10.1109/DSD.2001.952344
Multithreaded processors, by simultaneously exploiting both the thread-level parallelism and the instruction-level parallelism of applications, achieve a higher instructions-per-cycle rate than single-thread processors. On a multi-thread workload, a clustered organization maximizes performance. On a single-thread workload, however, all but one of the clusters are idle, which degrades single-thread performance significantly. Using clustered multi-thread performance as a baseline, we propose and analyze several mechanisms and policies that improve single-thread execution by exploiting the existing hardware without a significant loss in multi-thread performance. We focus on the fetch unit, which is arguably the most performance-critical stage. Essentially, we analyze three ways of exploiting the idle fetch clusters: allowing a single thread to access its neighbor clusters, using the idle fetch clusters to provide multiple-path execution, or using them to widen the effective single-thread fetch block.