{"title":"使用多路径执行处理TLS系统中的分支","authors":"Polychronis Xekalakis, Marcelo H. Cintra","doi":"10.1109/HPCA.2010.5416632","DOIUrl":null,"url":null,"abstract":"Thread-Level Speculation (TLS) has been proposed to facilitate the extraction of parallel threads from sequential applications. Most prior work on TLS has focused on architectural features directly related to supporting the main TLS operations. In this work we, instead, investigate how a common microarchitectural feature, namely branch prediction, interacts with TLS. We show that branch prediction for TLS is even more important than it is for sequential execution. Unfortunately, branch prediction for TLS systems is also inherently harder. Code partitioning and re-executions of squashed threads pollute the branch history making it harder for predictors to be accurate. We thus propose to augment the hardware, so as to accommodate Multi-Path Execution (MP) within the existing TLS protocol. Under the MP execution model, all paths following a number of hard-to-predict conditional branches are followed simultaneously. MP execution thus removes branches that would have been otherwise mispredicted, helping in this way the core to exploit more ILP. We show that, with only minimal hardware support, one can combine these two execution models into a unified one. Experimental results show that our combined execution model achieves speedups of up to 23.2%, with an average of 9.2%, over an existing state-of-the-art TLS system and speedups of up to 138 %, with an average of 28.2%, when compared with MP execution for a subset of the SPEC2000 Int benchmark suite.","PeriodicalId":368621,"journal":{"name":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","volume":"124 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Handling branches in TLS systems with Multi-Path Execution\",\"authors\":\"Polychronis Xekalakis, Marcelo H. Cintra\",\"doi\":\"10.1109/HPCA.2010.5416632\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Thread-Level Speculation (TLS) has been proposed to facilitate the extraction of parallel threads from sequential applications. Most prior work on TLS has focused on architectural features directly related to supporting the main TLS operations. In this work we, instead, investigate how a common microarchitectural feature, namely branch prediction, interacts with TLS. We show that branch prediction for TLS is even more important than it is for sequential execution. Unfortunately, branch prediction for TLS systems is also inherently harder. Code partitioning and re-executions of squashed threads pollute the branch history making it harder for predictors to be accurate. We thus propose to augment the hardware, so as to accommodate Multi-Path Execution (MP) within the existing TLS protocol. Under the MP execution model, all paths following a number of hard-to-predict conditional branches are followed simultaneously. MP execution thus removes branches that would have been otherwise mispredicted, helping in this way the core to exploit more ILP. We show that, with only minimal hardware support, one can combine these two execution models into a unified one. Experimental results show that our combined execution model achieves speedups of up to 23.2%, with an average of 9.2%, over an existing state-of-the-art TLS system and speedups of up to 138 %, with an average of 28.2%, when compared with MP execution for a subset of the SPEC2000 Int benchmark suite.\",\"PeriodicalId\":368621,\"journal\":{\"name\":\"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture\",\"volume\":\"124 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2010.5416632\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2010.5416632","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Handling branches in TLS systems with Multi-Path Execution
Thread-Level Speculation (TLS) has been proposed to facilitate the extraction of parallel threads from sequential applications. Most prior work on TLS has focused on architectural features directly related to supporting the main TLS operations. In this work we, instead, investigate how a common microarchitectural feature, namely branch prediction, interacts with TLS. We show that branch prediction for TLS is even more important than it is for sequential execution. Unfortunately, branch prediction for TLS systems is also inherently harder. Code partitioning and re-executions of squashed threads pollute the branch history making it harder for predictors to be accurate. We thus propose to augment the hardware, so as to accommodate Multi-Path Execution (MP) within the existing TLS protocol. Under the MP execution model, all paths following a number of hard-to-predict conditional branches are followed simultaneously. MP execution thus removes branches that would have been otherwise mispredicted, helping in this way the core to exploit more ILP. We show that, with only minimal hardware support, one can combine these two execution models into a unified one. Experimental results show that our combined execution model achieves speedups of up to 23.2%, with an average of 9.2%, over an existing state-of-the-art TLS system and speedups of up to 138 %, with an average of 28.2%, when compared with MP execution for a subset of the SPEC2000 Int benchmark suite.