Boosting CPU Performance using Pipelined Branch and Jump Folding Hardware with Turbo Module
Mong Tee Sim
2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), published 2021-12-01
DOI: 10.1109/MCSoC51149.2021.00060 (https://doi.org/10.1109/MCSoC51149.2021.00060)
Citations: 0
Abstract
The new generation of embedded applications demands both high performance and energy efficiency. This paper presents a new hardware design that supports architecture-level thread isolation, together with logic to fold branch and jump instructions and a Turbo module, thereby reducing the number of instructions flowing through the CPU without causing pipeline stalls. By pipelining the branch and jump folding logic across multiple threads of execution, the hardware can operate continuously at peak CPU speed, and it lowers power consumption by reducing the number of microcontrollers required in the system. We show that this novel technique accelerates system performance, raising instructions per cycle (IPC) to as much as 1.36, and to as much as 1.823 with the Turbo module, without requiring any extra programming effort from developers. We used Dhrystone, CoreMark, and ten selected benchmark metrics to validate the performance and functionality of our system.
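To give a feel for how folding can push instructions per cycle above 1, here is a minimal back-of-the-envelope sketch. It is not the paper's model: it assumes a hypothetical single-issue pipeline that never stalls (so it completes one pipeline slot per cycle) and assumes that a folded branch or jump occupies no slot at all. The function names `effective_ipc` and `folded_fraction_for` are made up for illustration, and the sketch says nothing about how the Turbo module works.

```python
def effective_ipc(folded_fraction: float, base_ipc: float = 1.0) -> float:
    """IPC measured over all program instructions when a fraction of them
    (folded branches/jumps) consume no pipeline slot.

    Assumption: the pipeline itself sustains `base_ipc` slots per cycle
    with no stalls; folded instructions are counted as executed but cost
    zero cycles.
    """
    if not 0.0 <= folded_fraction < 1.0:
        raise ValueError("folded_fraction must be in [0, 1)")
    return base_ipc / (1.0 - folded_fraction)


def folded_fraction_for(target_ipc: float, base_ipc: float = 1.0) -> float:
    """Inverse of effective_ipc: the fraction of instructions that would
    have to be folded to reach `target_ipc` under the same assumptions."""
    if target_ipc < base_ipc:
        raise ValueError("target_ipc must be at least base_ipc")
    return 1.0 - base_ipc / target_ipc


if __name__ == "__main__":
    # Under this toy model, what share of instructions would need folding
    # to reach the IPC of 1.36 quoted in the abstract?
    print(f"{folded_fraction_for(1.36):.3f}")
```

Under these assumptions, an IPC of 1.36 corresponds to roughly a quarter of the dynamic instruction stream being folded away; the real hardware may reach its figures through different mechanisms.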