The Interaction of Software Prefetching with ILP Processors in Shared-Memory Systems

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture. Pub Date: 1997-06-01. DOI: 10.1145/264107.264158

Parthasarathy Ranganathan, Vijay S. Pai, Hazim Abdel-Shafi, S. Adve


Citations: 42

Abstract

Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work has shown that memory latency remains a significant performance bottleneck for shared-memory multiprocessor systems built of such processors.

This paper provides the first study of the effectiveness of software-controlled non-binding prefetching in shared-memory multiprocessors built of state-of-the-art ILP-based processors. We find that software prefetching results in significant reductions in execution time (12% to 31%) for three out of five applications on an ILP system. However, compared to a previous-generation system, software prefetching is significantly less effective in reducing the memory stall component of execution time on an ILP system. Consequently, even after adding software prefetching, memory stall time accounts for over 30% of the total execution time in four out of five applications on our ILP system.

This paper also investigates the interaction of software prefetching with memory consistency models on ILP-based multiprocessors. In particular, we seek to determine whether software prefetching can equalize the performance of sequential consistency (SC) and release consistency (RC). We find that even with software prefetching, for three out of five applications, RC provides a significant reduction in execution time (15% to 40%) compared to SC.
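For readers unfamiliar with the technique, the following is a minimal sketch of what software-controlled non-binding prefetching looks like in source code. It is an illustration only and is not taken from the paper: the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` built-in. The prefetch is a hint that moves a cache line closer to the processor without binding a value to a register, so the line stays subject to cache coherence and the later load still observes the current value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. A real value would be tuned
   to the memory latency and loop-body cost of the target machine. */
#define PREFETCH_DISTANCE 16

/* Sum an array while issuing a read prefetch a fixed distance ahead of the
   element currently being consumed. The name is illustrative only. */
long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read; third argument 3 =
               high temporal locality. This is only a hint: it does not
               change the result of the computation. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

The prefetch distance is the main tuning knob in a sketch like this: too short and the data has not arrived when the load issues, too long and the line may be evicted or invalidated before it is used.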
