{"title":"Using idle workstations to implement predictive prefetching","authors":"Jasmine Y. Q. Wang, J. Ong, Y. Coady, M. Feeley","doi":"10.1109/HPDC.2000.868638","DOIUrl":null,"url":null,"abstract":"The benefits of Markov-based predictive prefetching have been largely overshadowed by the overhead required to produce high-quality predictions. While both theoretical and simulation results for prediction algorithms appear promising, substantial limitations exist in practice. This outcome can be partially attributed to the fact that practical implementations ultimately make compromises in order to reduce overhead. These compromises limit the level of algorithm complexity, the variety of access patterns and the granularity of trace data that the implementation supports. This paper describes the design and implementation of GMS-3P (Global Memory System with Parallel Predictive Prefetching), an operating system kernel extension that offloads prediction overhead to idle network nodes. GMS-3P builds on the GMS global memory system, which pages to and from remote workstation memory. In GMS-3P, the target node sends an online trace of an application's page faults to an idle node that is running a Markov-based prediction algorithm. The prediction node then uses GMS to prefetch pages to the target node from the memory of other workstations in the network. 
Our preliminary results show that predictive prefetching can reduce the remote-memory page fault time by 60% or more and that, by offloading prediction overhead to an idle node, GMS-3P can reduce this improved latency by between 24% and 44%, depending on the Markov model order.","PeriodicalId":400728,"journal":{"name":"Proceedings the Ninth International Symposium on High-Performance Distributed Computing","volume":"248 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings the Ninth International Symposium on High-Performance Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPDC.2000.868638","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8
Abstract
The benefits of Markov-based predictive prefetching have been largely overshadowed by the overhead required to produce high-quality predictions. While both theoretical and simulation results for prediction algorithms appear promising, substantial limitations exist in practice. This outcome can be partially attributed to the fact that practical implementations ultimately make compromises in order to reduce overhead. These compromises limit the level of algorithm complexity, the variety of access patterns and the granularity of trace data that the implementation supports. This paper describes the design and implementation of GMS-3P (Global Memory System with Parallel Predictive Prefetching), an operating system kernel extension that offloads prediction overhead to idle network nodes. GMS-3P builds on the GMS global memory system, which pages to and from remote workstation memory. In GMS-3P, the target node sends an online trace of an application's page faults to an idle node that is running a Markov-based prediction algorithm. The prediction node then uses GMS to prefetch pages to the target node from the memory of other workstations in the network. Our preliminary results show that predictive prefetching can reduce the remote-memory page fault time by 60% or more and that, by offloading prediction overhead to an idle node, GMS-3P can reduce this improved latency by between 24% and 44%, depending on the Markov model order.
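The abstract describes a target node streaming its page-fault trace to an idle prediction node that runs a Markov-based prediction algorithm. As an illustration of the core idea (not the paper's GMS-3P implementation), the following minimal sketch shows an online order-k Markov predictor over a page-fault trace: it counts context-to-next-page transitions as faults arrive and predicts the most likely next page for the current context. The class and method names are hypothetical.

```python
from collections import defaultdict, Counter

class MarkovPrefetchPredictor:
    """Illustrative order-k Markov predictor over a page-fault trace.

    This is a simplified sketch of the general technique; GMS-3P's actual
    prediction algorithm and data structures are not reproduced here.
    """

    def __init__(self, order=1):
        self.order = order
        # Maps a context (tuple of the last `order` faulting pages)
        # to a count of which page faulted next.
        self.transitions = defaultdict(Counter)
        self.history = []

    def record_fault(self, page):
        """Update the model online with one faulting page ID."""
        if len(self.history) >= self.order:
            ctx = tuple(self.history[-self.order:])
            self.transitions[ctx][page] += 1
        self.history.append(page)

    def predict(self):
        """Return the most likely next page for the current context, or None."""
        if len(self.history) < self.order:
            return None
        ctx = tuple(self.history[-self.order:])
        counts = self.transitions.get(ctx)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

In a GMS-3P-like setting, this predictor would run on the idle node, consuming the trace sent by the target node; raising `order` captures longer access patterns at the cost of more state and training data, which matches the abstract's observation that the latency reduction (24% to 44%) depends on the Markov model order.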