Workshop on Memory Performance Issues: Latest Publications

Memory coherence activity prediction in commercial workloads
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054949
Stephen Somogyi, T. Wenisch, N. Hardavellas, Jangwoo Kim, A. Ailamaki, B. Falsafi
Abstract: Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications on distributed shared memory multiprocessors. Important commercial applications also show sensitivity to coherence latency, which will become more acute in the future as technology scales. It is therefore important to investigate prediction of memory coherence activity in the context of commercial workloads. This paper studies a trace-based Downgrade Predictor (DGP) for predicting last stores to shared cache blocks, and a pattern-based Consumer Set Predictor (CSP) for predicting subsequent readers. We evaluate this class of predictors for the first time on commercial applications and demonstrate that our DGP correctly predicts 47%-76% of last stores. Memory sharing patterns in commercial workloads are inherently non-repetitive; hence CSP cannot attain high coverage. We perform an opportunity study of a DGP enhanced through competitive underlying predictors, and in commercial and scientific applications demonstrate the potential to increase coverage by up to 14%.
Citations: 41
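The abstract's trace-based Downgrade Predictor can be pictured as a table that learns which sequences of store instructions historically ended in a downgrade, and fires when the same sequence recurs. The sketch below is a deliberately minimal illustration under that assumption; the class names and training interface are invented, and the paper's actual predictor is more involved.

```python
# Minimal illustrative sketch of a trace-based Downgrade Predictor (DGP).
# Idea: record the trace of store PCs seen for a block; when a downgrade
# occurs, remember that trace as a "last store" signature. A later store
# whose accumulated trace matches a known signature is predicted to be
# the block's last store. All names here are hypothetical.

class DowngradePredictor:
    def __init__(self):
        self.history = {}        # block address -> store PCs since last downgrade
        self.signatures = set()  # traces previously observed to end in a downgrade

    def on_store(self, block, pc):
        """Record a store; return True if it is predicted to be the last store."""
        trace = self.history.setdefault(block, [])
        trace.append(pc)
        return tuple(trace) in self.signatures

    def on_downgrade(self, block):
        """Train: the trace accumulated for this block ended in a downgrade."""
        trace = self.history.pop(block, [])
        if trace:
            self.signatures.add(tuple(trace))

dgp = DowngradePredictor()
dgp.on_store(0x40, pc=1)
dgp.on_store(0x40, pc=2)
dgp.on_downgrade(0x40)            # learn the signature (1, 2)
dgp.on_store(0x40, pc=1)
assert dgp.on_store(0x40, pc=2)   # same trace recurs -> predicted last store
```

A repetitive store trace is exactly what makes DGP work on these workloads; the abstract's observation that *sharing* patterns (who reads next) are non-repetitive is why the analogous CSP table cannot attain high coverage.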
Micro-architecture techniques in the Intel® E8870 scalable memory controller
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054948
F. Briggs, S. Chittor, Kai Cheng
Abstract: This paper describes several selected micro-architectural tradeoffs and optimizations for the scalable memory controller of the Intel E8870 chipset architecture, which supports scalable coherent multiprocessor systems of 2 to 16 processors and a point-to-point Scalability Port (SP) protocol. The scalable memory controller applies a number of micro-architecture techniques to reduce local and remote idle and loaded latencies. The performance optimizations were achieved within the constraints of maintaining functional correctness while reducing implementation complexity and cost. High-bandwidth point-to-point interconnects and distributed memory are expected to become more common in future platforms supporting powerful multi-core processors, and the techniques discussed in this paper will be applicable to the scalable memory controllers those platforms need. These techniques have been proven in production systems for Itanium® II processor platforms.
Citations: 7
On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054955
G. Surendra, Subhasish Banerjee, S. Nandy
Abstract: Reducing the number of data cache accesses improves performance, port efficiency, and bandwidth, and motivates the use of single-ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as a target application and study the effectiveness of two techniques in reducing data cache traffic: (i) prefetching data from the cache into local buffers in the processor core, and (ii) load Instruction Reuse (IR). The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor, with the aim of determining whether these techniques are viable for the programmable pattern-matching engines found in many network processors. We find that IR is the most generic and efficient technique, reducing cache traffic by up to 60%. However, a combination of prefetching and IR with application-specific tuning performs as well as, and sometimes better than, IR alone.
Citations: 1
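Load Instruction Reuse, as described in the abstract, satisfies a load from a small buffer instead of the L1 data cache when the same load has executed before with the same operands. The toy model below shows the mechanism under simplifying assumptions (keyed on PC and address, invalidated by stores); the structure and names are illustrative, not the paper's implementation.

```python
# Illustrative sketch of load Instruction Reuse (IR): a buffer keyed by
# (load PC, effective address) returns a previously loaded value without
# touching the L1 data cache, provided no intervening store to that
# address has invalidated the entry. Purely a hypothetical simplification.

class ReuseBuffer:
    def __init__(self):
        self.entries = {}   # (pc, addr) -> cached load value
        self.hits = 0       # loads satisfied without a cache access
        self.accesses = 0   # total loads seen

    def load(self, pc, addr, cache):
        self.accesses += 1
        key = (pc, addr)
        if key in self.entries:
            self.hits += 1                 # reuse: L1 access avoided
            return self.entries[key]
        value = cache[addr]                # fall back to the data cache
        self.entries[key] = value
        return value

    def store(self, addr, value, cache):
        cache[addr] = value
        # a store invalidates any reuse entries for that address
        self.entries = {k: v for k, v in self.entries.items() if k[1] != addr}

cache = {0x100: 7}
rb = ReuseBuffer()
rb.load(pc=10, addr=0x100, cache=cache)   # miss: goes to the cache
rb.load(pc=10, addr=0x100, cache=cache)   # reused: no cache access
assert rb.hits == 1
```

Pattern-matching loops like Snort's repeatedly load the same table entries with the same addresses, which is why such a buffer can filter a large fraction of L1 traffic.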
Compiler-optimized usage of partitioned memories
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054959
L. Wehmeyer, Urs Helmig, P. Marwedel
Abstract: In order to meet the requirements concerning both performance and energy consumption in embedded systems, new memory architectures are being introduced. Besides the well-known use of caches in the memory hierarchy, processor cores today also include small on-chip memories called scratchpad memories, whose usage is controlled not by hardware but by the programmer or the compiler. Techniques for utilizing these scratchpads have been known for some time. Some newer processors provide more than one scratchpad, making it necessary to enhance the workflow so that this more complex memory architecture can be utilized efficiently. In this work, we present an energy model and an ILP formulation to optimally assign memory objects to different partitions of scratchpad memory at compile time, achieving energy savings of up to 22% compared to previous approaches.
Citations: 46
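The assignment problem the abstract formulates as an ILP has a simple shape: each memory object goes into exactly one partition, each partition has a capacity, and the objective is total access energy. The toy below solves a tiny instance by exhaustive search just to make the objective and constraints concrete; a real compiler would hand the same formulation to an ILP solver, and all sizes, energies, and names here are invented.

```python
# Toy version of the scratchpad-partition assignment problem: place memory
# objects into scratchpad partitions (or main memory) to minimize access
# energy, subject to each partition's capacity. Exhaustive search stands in
# for the ILP solver; all numbers below are illustrative assumptions.

from itertools import product

objects = {"buf": (512, 100), "lut": (256, 60), "stack": (256, 40)}
# object -> (size in bytes, access count)
partitions = {"spm0": (512, 1.0), "spm1": (256, 1.2), "dram": (float("inf"), 10.0)}
# partition -> (capacity in bytes, energy per access, arbitrary units)

names = list(objects)
best = None
for assign in product(partitions, repeat=len(names)):
    used = {p: 0 for p in partitions}
    energy = 0.0
    for obj, part in zip(names, assign):
        size, accesses = objects[obj]
        used[part] += size
        energy += accesses * partitions[part][1]
    if all(used[p] <= partitions[p][0] for p in partitions):
        if best is None or energy < best[0]:
            best = (energy, dict(zip(names, assign)))

energy, placement = best
assert placement["buf"] == "spm0"   # the hottest object fills the cheapest SPM
assert placement["lut"] == "spm1"   # next-hottest object takes the second SPM
```

The energy savings the paper reports come from exactly this effect: with multiple partitions, the solver can keep more hot objects in cheap on-chip memory than a single-scratchpad approach can.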
Understanding the effects of wrong-path memory references on processor performance
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054951
O. Mutlu, Hyesoon Kim, D. N. Armstrong, Y. Patt
Abstract: High-performance out-of-order processors spend a significant portion of their execution time on the incorrect program path even though they employ aggressive branch prediction algorithms. Although memory references generated on the wrong path do not change the architectural state of the processor, they can affect the arrangement of data in the memory hierarchy. This paper examines the effects of wrong-path memory references on processor performance. We show that these references significantly affect a processor's IPC (instructions per cycle): not modeling them can lead to errors of up to 10% in IPC estimates for the SPEC2000 integer benchmarks, and 7 out of 12 benchmarks experience an error greater than 2%. In general, the error in IPC increases with increasing memory latency and instruction window size. We find that wrong-path references are usually beneficial for performance, because they prefetch data that will be used by later correct-path references. L2 cache pollution is found to be the most significant negative effect of wrong-path references. Code examples are shown to provide insight into how wrong-path references affect performance.
Citations: 27
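The paper's central observation, that wrong-path loads usually act as prefetches while occasionally polluting the cache, can be seen in a few lines with a toy LRU cache. The model below is purely illustrative (a real memory hierarchy distinguishes L1/L2 and squashes wrong-path state differently); the point is only that a squashed load still installs a block.

```python
# Toy LRU cache illustrating both effects of wrong-path loads: a load issued
# down a mispredicted path installs a block that a later correct-path load
# hits on (prefetch effect), but its fill may also evict a useful block
# (pollution effect). Capacities and addresses are invented.

class Cache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = []         # LRU order: least recent first

    def access(self, addr):
        """Return True on hit; install the block on a miss (evicting LRU)."""
        hit = addr in self.blocks
        if hit:
            self.blocks.remove(addr)
        elif len(self.blocks) >= self.capacity:
            self.blocks.pop(0)   # eviction: where pollution comes from
        self.blocks.append(addr)
        return hit

c = Cache()
c.access(0xA0)         # wrong-path load: a miss, but it installs the block
assert c.access(0xA0)  # later correct-path load now hits: prefetch effect
```

A simulator that discards these accesses would model the second load as a miss, which is exactly the kind of IPC-estimate error the abstract quantifies.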
A case for multi-level main memory
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054944
M. Ekman, P. Stenström
Abstract: Current trends suggest that the number of memory chips per processor chip will increase by at least a factor of ten within seven years. This will make the cost of DRAM, and the space and power it consumes, a serious problem. The main question raised in this research is how cost, size, and power consumption can be reduced by transforming traditional flat main-memory systems into a multi-level hierarchy. We make the case for a multi-level main-memory hierarchy by proposing and evaluating an implementation that enables aggressive use of memory compression, sharing of memory resources among computers, and dynamic power management of unused regions of memory. This paper presents the key design strategies to make this happen. We evaluate our implementation using complete runs of applications from the SPEC 2K suite, SpecJBB, and SAP, typical desktop and server applications. We show that only 30% of the memory resources typically needed must be accessible at DRAM speed, whereas the rest can be accessed at a speed an order of magnitude slower. The resulting performance overhead is shown to be only 1.2% on average.
Citations: 18
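The abstract's claim that a slow second memory level costs only ~1.2% hinges on how rarely references actually reach it once the fast level filters most of them. A back-of-the-envelope average-latency model makes the arithmetic visible; the latencies and the fraction of references reaching the slow level below are invented for illustration, not the paper's measurements.

```python
# Back-of-the-envelope model for a two-level main memory: average latency
# is a weighted mix of the fast (DRAM-speed) and slow (compressed/shared)
# levels. All numbers are illustrative assumptions; "slow_fraction" is the
# fraction of references that reach the slow level, not the 30% capacity
# figure from the abstract.

def avg_latency(fast_ns, slow_ns, slow_fraction):
    """Average memory latency with a two-level main memory."""
    return (1 - slow_fraction) * fast_ns + slow_fraction * slow_ns

flat = avg_latency(60, 60, 0.0)       # conventional all-DRAM memory
tiered = avg_latency(60, 600, 0.01)   # assume 1% of references reach the 10x-slower level
assert tiered / flat < 1.1            # small average penalty despite a 10x-slower tier
```

The design works because hot pages concentrate in the fast level, so a tier that is an order of magnitude slower is touched rarely enough that its latency barely moves the average.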
Evaluating kilo-instruction multiprocessors
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054953
M. Galluzzi, R. Beivide, Valentin Puente, J. Gregorio, A. Cristal, M. Valero
Abstract: The ever-increasing gap between processor and memory speeds has a very negative impact on performance. One possible solution to this problem is the Kilo-instruction processor, a recently proposed architecture able to hide large memory latencies by keeping thousands of instructions in flight. Current multiprocessor systems must also deal with this increasing memory latency while facing another source of latency: communication among processors. In this paper, we propose using Kilo-instruction processors as computing nodes for small-scale CC-NUMA multiprocessors, and we evaluate what we appropriately call Kilo-instruction Multiprocessors. Such systems appear to achieve very good performance while exhibiting two interesting behaviors. First, the great number of in-flight instructions allows the system to hide not only the latencies of local memory accesses but also the communication latencies inherent in remote memory accesses. Second, the significant pressure imposed by many in-flight instructions translates into very high contention for the interconnection network, which indicates that more effort must be devoted to designing routers capable of managing high traffic levels.
Citations: 3
A low-power memory hierarchy for a fully programmable baseband processor
Workshop on Memory Performance Issues · Pub Date: 2004-06-20 · DOI: 10.1145/1054943.1054957
W. Raab, Hans-Martin Blüthgen, U. Ramacher
Abstract: Future terminals for wireless communication must not only support multiple standards but also execute several of them concurrently. To meet these requirements, flexibility and ease of programming are increasingly important criteria for integrated circuits for digital baseband processing, while the power consumption and area of such devices remain as critical as in the past. This paper presents the architecture of a fully programmable system-on-chip for digital signal processing in the baseband of contemporary and upcoming wireless communication standards. Particular focus is given to the memory hierarchy of the multi-processor system and the measures taken to minimize the power it dissipates. The reduction in the power consumption of the entire chip is estimated at 28% compared to a straightforward approach.
Citations: 5