Data Prefetching on In-order Processors

2018 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2018-07-01 DOI:10.1109/HPCS.2018.00061

Cristobal Ortega, Victor Garcia, Miquel Moretó, Marc Casas, Roxana Rusitoru

Citations: 4

Abstract

Data Prefetching on In-order Processors

Low-power processors have attracted attention due to their energy efficiency. A large market, such as the mobile one, relies on these processors for this very reason. Even High Performance Computing (HPC) systems are starting to consider low-power processors as a way to achieve exascale performance within 20 MW; however, they must meet the right performance/Watt balance. Current low-power processors contain in-order cores, which cannot re-order instructions to avoid data dependency-induced stalls. Whilst this helps reduce the chip's total power consumption, it brings several challenges. Due to the evolving performance gap between memory and processor, memory is a significant bottleneck. Because in-order cores cannot re-order instructions, they are memory-latency bound, something data prefetching can help alleviate by ensuring data is readily available. In this work, we perform an exhaustive analysis of the data prefetching techniques available in state-of-the-art in-order cores. We analyze 5 static prefetchers and 2 dynamic aggressiveness and destination mechanisms applied to 3 data prefetchers on a set of HPC mini- and proxy-applications running on in-order processors. We show that next-line prefetching can achieve nearly top performance with reasonable bandwidth consumption when throttled, whilst neighbor prefetchers were found to be best overall.
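
To make the idea of next-line prefetching concrete, the following is a minimal toy sketch in Python. It is not the simulator, the prefetchers, or the throttling mechanisms evaluated in the paper. The names `TinyCache`, `run_stream`, and `prefetch_degree`, the 64-byte line size, and the 16-line fully associative LRU cache are all assumptions made for illustration. The model counts demand misses and issued prefetches only; it ignores latency and timeliness, so it shows the miss/traffic trade-off rather than real performance.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value for this sketch)


class TinyCache:
    """Toy fully associative LRU cache with a next-line prefetcher.

    Hypothetical model for illustration only; prefetch_degree is the
    number of sequential lines fetched after each demand miss.
    """

    def __init__(self, num_lines, prefetch_degree=0):
        self.num_lines = num_lines
        self.prefetch_degree = prefetch_degree
        self.lines = OrderedDict()  # line address -> True, kept in LRU order
        self.misses = 0
        self.prefetches_issued = 0

    def _fill(self, line):
        """Insert a line, or refresh its LRU position if already present."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = True

    def access(self, addr):
        """Perform one demand access; return True on a hit."""
        line = addr // LINE_SIZE
        hit = line in self.lines
        if not hit:
            self.misses += 1
        self._fill(line)
        if not hit:
            # Next-line prefetching: on a miss to line X, also fetch
            # X+1 .. X+degree unless they are already cached.
            for k in range(1, self.prefetch_degree + 1):
                if line + k not in self.lines:
                    self.prefetches_issued += 1
                    self._fill(line + k)
        return hit


def run_stream(prefetch_degree, num_bytes=4096, stride=8):
    """Walk a sequential array and return (demand misses, prefetches issued)."""
    cache = TinyCache(num_lines=16, prefetch_degree=prefetch_degree)
    for addr in range(0, num_bytes, stride):
        cache.access(addr)
    return cache.misses, cache.prefetches_issued


if __name__ == "__main__":
    for degree in (0, 1, 3):
        print(degree, run_stream(degree))
```

On this purely sequential stream every prefetched line is later used, so raising `prefetch_degree` only converts demand misses into prefetches. On irregular access patterns a higher degree would fetch lines that are never touched, which is the kind of wasted bandwidth that throttling an aggressive prefetcher is meant to limit.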

Source journal

2018 International Conference on High Performance Computing & Simulation (HPCS)

Self-citation rate

0.00%

Publication volume