Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors

Proceedings Eighth International Symposium on High Performance Computer Architecture. Pub Date: 2002-02-02. DOI: 10.1109/HPCA.2002.995697

Marcelo H. Cintra, J. Torrellas


Citations: 90

Abstract

With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation, it squashes the offending threads and resumes execution. Unfortunately, frequent squashing cripples performance. This paper proposes a new framework of hardware mechanisms to eliminate most squashes due to data dependences in multiprocessors. The framework works by learning and predicting violations, and applying delayed disambiguation, value prediction, and stall and release. The framework is suited for directory-based multiprocessors that track memory accesses at the system level with the coarse granularity of memory lines. Simulations of a 16-processor machine show that the framework is very effective. By adding our framework to a speculative CC-NUMA with 64-byte memory lines, we speed up applications by an average of 4.3 times. Moreover, the resulting system is even 23% faster than a machine that tracks memory accesses at the fine granularity of words, a sophisticated system that is not compatible with mainstream cache coherence protocols.
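The idea of learning violations and then stalling instead of speculating can be illustrated with a small software sketch. This is not the paper's hardware design: the `ViolationPredictor` class, its saturating counter, the threshold, and the `run_loads` replay function are all hypothetical names and choices made for illustration. Only the 64-byte line size and the general "learn, predict, then stall and release" behavior come from the abstract.

```python
from collections import defaultdict

LINE_BYTES = 64  # line size from the abstract; everything else here is assumed


class ViolationPredictor:
    """Toy predictor: one saturating counter per memory line (hypothetical)."""

    def __init__(self, threshold=1, max_count=3):
        self.threshold = threshold
        self.max_count = max_count
        self.counters = defaultdict(int)

    @staticmethod
    def line_of(addr):
        # Accesses are tracked at line granularity, not word granularity.
        return addr // LINE_BYTES

    def record_violation(self, addr):
        # Learn: bump the counter for the line, saturating at max_count.
        line = self.line_of(addr)
        self.counters[line] = min(self.max_count, self.counters[line] + 1)

    def predicts_violation(self, addr):
        return self.counters[self.line_of(addr)] >= self.threshold


def run_loads(predictor, conflicting_loads):
    """Replay loads that would each violate a dependence if issued speculatively.

    Returns (squashes, stalls).
    """
    squashes = stalls = 0
    for addr in conflicting_loads:
        if predictor.predicts_violation(addr):
            # Predicted violation: wait for the predecessor's store (stall and release).
            stalls += 1
        else:
            # Unpredicted violation: the thread is squashed and the predictor learns.
            squashes += 1
            predictor.record_violation(addr)
    return squashes, stalls


result = run_loads(ViolationPredictor(), [0x1000, 0x1008, 0x1040, 0x1000])
```

In this replay, `0x1000` and `0x1008` fall in the same 64-byte line, so the squash on the first access already trains the predictor for the second. Only the first access to each of the two distinct lines is squashed; the remaining two accesses stall instead.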
