Workshop on Memory Performance Issues: Latest Papers

Addressing mode driven low power data caches for embedded processors
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054961
R. Peri, John Fernando, R. Kolagotla
Abstract: The size and speed of first-level caches and SRAMs of embedded processors continue to increase in response to demands for higher performance. In power-sensitive devices like PDAs and cellular handsets, decreasing power consumption while increasing performance is desirable. Contemporary caches typically exploit locality in memory access patterns but do not exploit locality information encoded in the addressing modes used to access memory. We present two schemes that use locality information inherent in memory addressing modes to reduce the power consumption of the cache or SRAM nearest to the processor. The level-0 data buffer scheme introduces a set of data buffers controlled by the addressing mode to eliminate over a third of all reads to the next level of memory (cache or SRAM). These buffers can also reduce the load-use penalty in processors with long load pipelines. The address register tag-buffer scheme exploits the addressing mode to reduce tag array look-ups in set-associative first-level caches.
Citations: 0
A study of performance impact of memory controller features in multi-processor server environment
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054954
C. Natarajan, Bruce Christenson, F. Briggs
Abstract: With the growing imbalance between processor and memory performance, it becomes more and more important to optimize memory controller features to obtain the maximum possible performance out of the memory subsystem. This paper presents a study of the performance impact of several memory controller features in multi-processor (MP) server environments that use a DDR/DDR2 based memory subsystem. The results from our studies show that significant performance improvements can be obtained by carefully optimizing the memory controller features. For instance, one of our studies shows that in a system with an in-order shared bus connecting the CPUs and memory controller, an intelligent read-to-write switching memory controller feature can provide the same order of benefit as doubling the number of interleaved memory ranks. Another study shows that a delayed write scheduling feature yields much lower average loaded read latency across a wider range of throughput.
Citations: 80
The Opie compiler from row-major source to Morton-ordered matrices
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054962
Steven T. Gabriel, David S. Wise
Abstract: The Opie Project aims to develop a compiler to transform C codes written for row-major matrix representation into equivalent codes for Morton-order matrix representation, and to apply its techniques to other languages. Accepting a possible reduction in performance, we seek to compile a library of usable code to support future development of new algorithms better suited to Morton-ordered matrices. This paper reports the formalism behind the OPIE compiler for C; its status, now compiling several standard Level-2 and Level-3 linear algebra operations; and a demonstration of a breakthrough reflected in a huge reduction of L1, L2, and TLB misses. Overall performance improves on the Intel Xeon architecture.
Citations: 11
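To make the row-major versus Morton-order distinction in the Opie abstract concrete, here is a minimal Python sketch of the two index computations. It is not taken from the Opie compiler: the function names are hypothetical, and the bit convention (column bits in even positions, row bits in odd positions, indices limited to 16 bits) is an assumption chosen for illustration.

```python
def part1by1(x: int) -> int:
    """Spread the low 16 bits of x so a zero bit sits between each original bit."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_index(row: int, col: int) -> int:
    """Interleave the bits of row and col (assumed convention: col in even bits)."""
    return (part1by1(row) << 1) | part1by1(col)


def row_major_index(row: int, col: int, n_cols: int) -> int:
    """Conventional C-style offset of element (row, col)."""
    return row * n_cols + col


if __name__ == "__main__":
    # Print both offsets for every element of a 4x4 matrix.
    for r in range(4):
        for c in range(4):
            print(r, c, row_major_index(r, c, 4), morton_index(r, c))
```

Under this convention each aligned 2x2 block occupies four consecutive offsets, which is the kind of locality a Morton layout is meant to provide at every power-of-two block size.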
Cache organizations for clustered microarchitectures
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054950
José González, Fernando Latorre, Antonio González
Abstract: Clustered microarchitectures are an effective organization to deal with the problem of wire delays and complexity by partitioning some of the processor resources. The organization of the data cache is a key factor in these processors due to its effect on cache miss rate and inter-cluster communications. This paper investigates alternative designs of the data cache: centralized, distributed, replicated, and physically distributed cache architectures are analyzed. Results show similar average performance but significant performance variations depending on the application features, especially cache miss ratio and communications. In addition, we propose a novel instruction steering scheme in order to reduce communications. This scheme conditionally stalls the dispatch of instructions depending on the occupancy of the clusters, whenever the current instruction cannot be steered to the cluster holding most of the inputs. This new steering outperforms traditional schemes. Results show an average speedup of 5% and up to 15% for some applications.
Citations: 18
An analytical model for software-only main memory compression
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054958
I. Tuduce, T. Gross
Abstract: Many applications with large data spaces that cannot run on a typical workstation (due to page faults) call for techniques to expand the effective memory size. One such technique is memory compression. Understanding which applications, under what conditions, can benefit from main memory compression is complicated due to various tradeoffs and the dynamic characteristics of applications. For instance, a large area to store compressed data increases the effective memory size considerably but also decreases the amount of memory that can hold uncompressed data. This paper presents an analytical model that states the conditions for a compressed-memory system to yield performance improvements. Parameters of the model are the compression algorithm efficiency, the amount of data being compressed, and the application memory access pattern. Such a model can be used by an operating system to compute the size of the compressed-memory level that can improve an application's performance.
Citations: 6
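The capacity trade-off described in the memory compression abstract above can be illustrated with a small piece of arithmetic. The sketch below is not the paper's analytical model: the function name `effective_capacity`, its parameters, and the assumption of a single uniform compression ratio are all hypothetical simplifications used only to show how growing the compressed area raises the total amount of data held while shrinking the uncompressed area.

```python
def effective_capacity(total, compressed_region, compression_ratio):
    """Return how much uncompressed data fits in memory of size `total`.

    compressed_region: part of `total` set aside to hold compressed pages.
    compression_ratio: uncompressed size / compressed size (assumed uniform, >= 1).
    All sizes share one unit (for example megabytes).
    """
    if not 0 <= compressed_region <= total:
        raise ValueError("compressed_region must lie between 0 and total")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be at least 1")
    uncompressed_region = total - compressed_region
    # Data held directly plus data held in compressed form.
    return uncompressed_region + compressed_region * compression_ratio


if __name__ == "__main__":
    # Illustrative numbers only: 1024 MB of memory, ratio 2.
    for region in (0, 256, 512, 1024):
        print(region, 1024 - region, effective_capacity(1024, region, 2))
```

With these illustrative numbers, reserving 256 MB leaves 768 MB directly accessible while the total data held rises to 1280 MB. Whether that helps depends on the access pattern, which this sketch deliberately leaves out.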
A low cost, multithreaded processing-in-memory system
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054946
J. Brockman, Shyamkumar Thoziyoor, Shannon K. Kuntz, P. Kogge
Abstract: This paper discusses die cost vs. performance tradeoffs for a PIM system that could serve as the memory system of a host processor. For an increase of less than twice the cost of a commodity DRAM part, it is possible to realize a performance speedup of nearly a factor of 4 on irregular applications. This cost efficiency derives from developing a custom multithreaded processor architecture and implementation style that is well-suited for embedding in a memory. Specifically, it takes advantage of the low latency and high row bandwidth both to simplify processor design (reducing area) and to improve processing throughput. To support our claims of cost and performance, we have used simulation and analysis of existing chips, and have also designed and fully implemented a prototype chip, PIM Lite.
Citations: 39
A compressed memory hierarchy using an indirect index cache
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054945
Erik G. Hallnor, S. Reinhardt
Abstract: The large and growing impact of memory hierarchies on overall system performance compels designers to investigate innovative techniques to improve memory-system efficiency. We propose and analyze a memory hierarchy that increases both the effective capacity of memory structures and the effective bandwidth of interconnects by storing and transmitting data in compressed form. Caches play a key role in hiding memory latencies. However, cache sizes are constrained by die area and cost. A cache's effective size can be increased by storing compressed data, if the storage unused by a compressed block can be allocated to other blocks. We use a modified Indirect Index Cache to allocate variable amounts of storage to different blocks, depending on their compressibility. By coupling our compressed cache design with a similarly compressed main memory, we can easily transfer data between these structures in a compressed state, increasing the effective memory bus bandwidth. This optimization further improves performance when bus bandwidth is critical. Our simulation results, using the SPEC CPU2000 benchmarks, show that our design increases performance by up to 225% on some benchmarks while degrading performance in general by no more than 2%, other than a 12% decrease on a single benchmark. Compressed bus transfers alone account for up to 80% of this improvement, with the remainder coming from increased effective cache capacity. As memory latencies increase, our design becomes even more beneficial.
Citations: 46
SCIMA-SMP: on-chip memory processor architecture for SMP
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054960
C. Takahashi, Masaaki Kondo, T. Boku, D. Takahashi, Hiroshi Nakamura, M. Sato
Abstract: In this paper, we propose a processor architecture with programmable on-chip memory for a high-performance SMP (symmetric multi-processor) node, named SCIMA-SMP (Software Controlled Integrated Memory Architecture for SMP), with the intent of closing the performance gap between a processor and off-chip memory. With special instructions that enable explicit data transfer between on-chip memory and off-chip memory, this architecture lets the application program control the timing and granularity of data transfers, and the SMP bus is utilized efficiently compared with a traditional cache-only architecture. Through performance evaluation based on clock-level simulation of various HPC applications, we confirmed that this architecture largely reduces bus access cycles by avoiding redundant data transfers and controlling the granularity of data movement between on-chip and off-chip memory.
Citations: 0
A localizing directory coherence protocol
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054947
Collin McCurdy, C. Fischer
Abstract: User-controllable coherence revives the idea of cooperation between software and hardware in an attempt to bridge the gap between efficient small-scale shared memory machines and massive distributed memory machines. It proposes a new multiprocessor architecture which has both a global address-space and multiple processor-local address-spaces, with new memory instructions and a new coherence protocol to manage the dual address-spaces. The purpose of this paper is twofold. First, we solidify the semantics of instruction set extensions that enable "localization" (the act of moving data from the global address-space to a processor's local address-space), thus clearly defining the requirements for a localizing coherence protocol. Second, we demonstrate the feasibility of localizing coherence by describing the workings of a full-scale directory-based protocol that we have implemented and tested using an existing protocol specification tool.
Citations: 4
Scalable cache memory design for large-scale SMT architectures
Workshop on Memory Performance Issues Pub Date : 2004-06-20 DOI: 10.1145/1054943.1054952
M. Mudawar
Abstract: The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment. It also has a complex design to maintain cache coherence across all levels. Furthermore, this cache hierarchy is not suitable for future large-scale SMT processors, which will demand high bandwidth instruction and data caches with a large number of ports. This paper suggests eliminating the cache hierarchy and replacing it with one-level caches for instruction and data. Multiple instruction caches can be used in parallel to scale the instruction fetch bandwidth and the overall cache capacity. A one-level data cache can be split into a number of block-interleaved cache banks to serve multiple memory requests in parallel. An interconnect is used to connect the data cache ports to the different cache banks, thus increasing the data cache access time. This paper shows that large-scale SMTs can tolerate long data cache hit times. It also shows that small line buffers can enhance the performance and reduce the required number of ports to the banked data cache memory.
Citations: 6
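The block-interleaved banking mentioned in the SMT cache abstract above can be sketched in a few lines of Python. This is a generic illustration of block interleaving, not the paper's design: the helper names and the default block size (64 bytes) and bank count (8) are assumptions picked for the example.

```python
from collections import defaultdict


def bank_of(address: int, block_size: int = 64, num_banks: int = 8) -> int:
    """Map a byte address to a bank: consecutive blocks go to consecutive banks."""
    block_number = address // block_size
    return block_number % num_banks


def group_by_bank(addresses, block_size: int = 64, num_banks: int = 8):
    """Group same-cycle requests by bank; a bank holding several requests is contended."""
    groups = defaultdict(list)
    for address in addresses:
        groups[bank_of(address, block_size, num_banks)].append(address)
    return dict(groups)


if __name__ == "__main__":
    # Addresses 0 and 512 fall in bank 0 and would contend; 64 goes to bank 1.
    print(group_by_bank([0, 64, 512]))
```

Requests that land in different banks can be served in parallel, while requests that map to the same bank have to be serialized or absorbed by some other mechanism.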