Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture: Latest Publications

Implementation of atomic primitives on distributed shared memory multiprocessors
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386540
Maged M. Michael, M. Scott
Abstract: In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch_and_Φ, compare_and_swap, load_linked, and store_conditional on large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alternative hardware implementations of these primitives, and then analyze the performance of these implementations for various data sharing patterns. Our results indicate that good overall performance can be obtained by implementing compare_and_swap in the cache controllers, and by providing an additional instruction to load an exclusive copy of a cache line.
Citations: 29

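As an aside on the primitives themselves: the sketch below, in plain C11, shows the semantics of one fetch_and_Φ instance (fetch_and_add) emulated with compare_and_swap. It is illustrative only and says nothing about the hardware implementations the paper evaluates.

```c
/* Software illustration of a fetch_and_Phi primitive (here, fetch_and_add)
 * built from compare_and_swap, using C11 atomics. This sketches the
 * semantics only; the paper studies hardware implementations of these
 * primitives in cache controllers, not this software emulation. */
#include <stdatomic.h>
#include <stdio.h>

int fetch_and_add(atomic_int *addr, int delta)
{
    int old = atomic_load(addr);
    /* Retry until the compare_and_swap succeeds: the classic CAS loop.
     * On failure, 'old' is reloaded with the value currently in memory. */
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
    }
    return old;                      /* value observed before the update */
}

int main(void)
{
    atomic_int counter = 0;
    int before = fetch_and_add(&counter, 5);
    printf("before=%d after=%d\n", before, atomic_load(&counter));
    return 0;
}
```
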
DASC cache
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386548
André Seznec
Abstract: For many microprocessors, cache hit time determines the clock cycle. On the other hand, the cache miss penalty (measured in instruction issue delays) keeps growing. Reconciling a low cache miss ratio with a low cache hit time is therefore an important issue. When caches are virtually indexed, the operating system (or some specific hardware) has to manage data consistency of caches and memory. Unfortunately, reconciling physical indexing of the cache with a low cache hit time is very difficult. In this paper, we propose the Direct-mapped Access Set-associative Check (DASC) cache to address both difficulties. In a DASC cache, the cache array is direct-mapped, so the cache hit time is low; the tag array, however, is set-associative, and the external miss ratio of a DASC cache is the same as the miss ratio of a set-associative cache. When the size of one associativity degree of the tag array is tied to the minimum page size, a virtually indexed but physically tagged DASC cache correctly handles all difficulties associated with cache consistency. Trace-driven simulations show that, for cache sizes in the range of 16 to 64 Kbytes and page sizes in the range of 4 to 8 Kbytes, a DASC cache is a valuable trade-off allowing fast cache hit time and low cache miss ratio while cache consistency management is performed by hardware.
Citations: 20

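To make the access/check split concrete, here is a rough C sketch of such a lookup under my own assumptions about sizes and layout (64-byte lines, a 4-way tag array, invented names); it is not Seznec's exact design.

```c
/* Rough sketch of a DASC-style lookup: the data array is accessed
 * direct-mapped with the virtual index, while a set-associative,
 * physically tagged tag array is checked in parallel. All names and
 * constants are illustrative assumptions, not the paper's design. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE   64u                    /* bytes per cache line              */
#define SETS   64u                    /* tag-array sets                    */
#define WAYS   4u                     /* tag-array associativity           */
#define SLOTS  (SETS * WAYS)          /* direct-mapped data-array slots    */

struct dasc {
    uint32_t tag[SETS][WAYS];         /* physical tags                     */
    bool     valid[SETS][WAYS];
    uint8_t  data[SLOTS][LINE];       /* way w of set s holds slot s+w*SETS */
};

enum result { FAST_HIT, SLOW_HIT, MISS };

enum result dasc_lookup(const struct dasc *c, uint32_t vaddr, uint32_t paddr)
{
    uint32_t set    = (vaddr / LINE) % SETS;         /* page-offset bits   */
    uint32_t va_way = (vaddr / LINE / SETS) % WAYS;   /* slot read speculatively */
    uint32_t ptag   = paddr / LINE / SETS;            /* physical tag       */

    for (uint32_t w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == ptag) {
            /* Block is present; it is a fast hit only if it sits in the
             * slot that the direct-mapped data access already read. */
            return (w == va_way) ? FAST_HIT : SLOW_HIT;
        }
    }
    return MISS;      /* same external-miss behaviour as a 4-way cache */
}

int main(void)
{
    static struct dasc c;                              /* all entries invalid */
    printf("%d\n", dasc_lookup(&c, 0x1240, 0x5240));   /* prints 2 (MISS)     */
    return 0;
}
```
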
Non-consistent dual register files to reduce register pressure
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386558
J. Llosa, M. Valero, E. Ayguadé
Abstract: The continuous growth in the instruction-level parallelism offered by microprocessors requires a large register file and a large number of ports to access it. This paper presents the non-consistent dual register file, an alternative implementation and management of the register file. Non-consistent dual register files support the bandwidth demands and the high register requirements while penalizing neither access time nor implementation cost. The proposal is evaluated for software-pipelined loops and compared against a unified register file. Empirical results show improvements in performance and a noticeable reduction in the density of memory traffic due to a reduction of the spill code. Spill code can in general increase the minimum initiation interval and decrease loop performance. Additional improvements can be obtained when operations are scheduled with the proposed register file organization in mind.
Citations: 30

A VLSI architecture for computing the tree-to-tree distance
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386530
R. Sastry, N. Ranganathan
Abstract: The distance between two labeled ordered trees α and β is the minimum-cost sequence of editing operations (insertions, deletions, and substitutions) needed to transform α into β such that the predecessor-descendant relation between nodes and the ordering of nodes is not changed. Approximate tree matching has applications in genetic sequence comparison, scene analysis, error recovery and correction in programming languages, and cluster analysis. Edit distance determination is a computationally intensive task, and the design of special-purpose hardware could result in a significant speed-up. This paper describes in detail a VLSI architecture for computing the edit distance between arbitrary ordered trees, based on a parallel, systolic realization of the dynamic programming algorithm proposed by S.Y. Lu (1979). This architecture represents a significant improvement over that described by Sastry and Ranganathan (1994), which restricted the type of trees that could be processed. Two partitioning strategies to process trees of arbitrary sizes and structures on a fixed-size implementation in multiple passes are proposed and analyzed.
Citations: 3

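Lu's tree-to-tree algorithm is too long to reproduce here; as a stand-in, the sketch below shows the classic string edit-distance dynamic program (insertions, deletions, substitutions), the one-dimensional analogue of the recurrence that such systolic arrays parallelize. It is not the tree algorithm itself.

```c
/* Classic edit-distance dynamic program over strings (insert, delete,
 * substitute), shown as a simpler stand-in for the tree-to-tree distance
 * recurrence discussed above; it is not Lu's tree algorithm. Assumes
 * inputs no longer than MAXN characters. */
#include <stdio.h>
#include <string.h>

#define MAXN 128

int edit_distance(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    int d[MAXN + 1][MAXN + 1];

    for (size_t i = 0; i <= n; i++) d[i][0] = (int)i;   /* delete all of a */
    for (size_t j = 0; j <= m; j++) d[0][j] = (int)j;   /* insert all of b */

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            int sub = d[i-1][j-1] + (a[i-1] != b[j-1]);  /* substitution */
            int del = d[i-1][j]   + 1;                   /* deletion     */
            int ins = d[i][j-1]   + 1;                   /* insertion    */
            int best = sub < del ? sub : del;
            d[i][j] = best < ins ? best : ins;
        }
    }
    return d[n][m];
}

int main(void)
{
    printf("%d\n", edit_distance("kitten", "sitting"));  /* prints 3 */
    return 0;
}
```
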
Massively parallel array processor for logic, fault, and design error simulation
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386529
Y. Hur, S. Szygenda, E. S. Fehr, G. Ott, Sungho Kang
Abstract: Digital logic, fault, and error simulation of large VLSI circuits is one of the most compute-intensive tasks in digital systems analysis. This paper describes a massively parallel special-purpose array processor, or hardware accelerator, for digital logic, fault, and error simulation. Hardware simulation is a viable approach for simulation of large systems, since simulation time increases rapidly as a function of the size and complexity of the systems to be simulated. In order to reduce cost and achieve high performance, a massively parallel array processor and new algorithms have been introduced. By executing an efficient and direct model of the design on the PE array, the architecture can provide high performance, similar to prototyping. Simulation results show that the hardware accelerator is orders of magnitude faster than software simulation.
Citations: 2

U-cache: a cost-effective solution to synonym problem
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386538
Jesung Kim, Sang Lyul Min, Sanghoon Jeon, ByoungChul Ahn, Deog-Kyoon Jeong, Chong-Sang Kim
Abstract: This paper proposes a cost-effective solution to the synonym problem. In the proposed solution, a minimal hardware addition guarantees correctness while its software counterpart helps improve performance. The key to the solution is the addition of a small physically indexed cache called the U-cache. The U-cache maintains the reverse translation information of only those cache blocks that belong to un-aligned virtual pages, where aligned means that the lower bits of the virtual page number match those of the corresponding physical page number. A U-cache, even with only one entry, ensures correct handling of synonyms. A simple software optimization, in the form of page alignment, helps improve performance. Performance evaluation based on ATUM traces shows that a U-cache with only a few entries performs almost as well as (and in some cases outperforms) a fully-configured hardware-based solution when more than 95% of the pages are aligned.
Citations: 2

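The alignment condition is easy to state in code. A minimal sketch with assumed constants (4-Kbyte pages and a 64-Kbyte cache index; neither figure is taken from the paper):

```c
/* "Aligned" in the sense used above: the low bits of the virtual page
 * number match those of the physical page number, so the cache index bits
 * above the page offset agree for virtual and physical addresses.
 * Page size and index width are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12u                          /* 4-Kbyte pages (assumed)  */
#define INDEX_BITS  16u                          /* 64-Kbyte index (assumed) */
#define COLOR_BITS  (INDEX_BITS - PAGE_SHIFT)    /* page-number bits that
                                                    reach the cache index    */

bool page_is_aligned(uint64_t vaddr, uint64_t paddr)
{
    uint64_t vpn  = vaddr >> PAGE_SHIFT;
    uint64_t ppn  = paddr >> PAGE_SHIFT;
    uint64_t mask = (1u << COLOR_BITS) - 1;
    return (vpn & mask) == (ppn & mask);         /* no synonym hazard if equal */
}

int main(void)
{
    printf("%d\n", page_is_aligned(0x0000f000, 0x0004f000)); /* 1: colors match  */
    printf("%d\n", page_is_aligned(0x0000f000, 0x00052000)); /* 0: colors differ */
    return 0;
}
```
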
Efficient and balanced adaptive routing in two-dimensional meshes
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386550
Jatin Upadhyay, Vara Varavithya, P. Mohapatra
Abstract: In this paper, we present the new concept of a region of adaptivity with respect to various routing algorithms in wormhole networks. Using this concept, we demonstrate that previously proposed routing algorithms, though more adaptive, cause an uneven workload in the network which limits the performance improvement. It is observed that a balanced distribution of traffic has a greater impact on system performance than the adaptivity or efficiency of the algorithm. Based on these motivating factors, we present a new fully adaptive routing algorithm for two-dimensional meshes using one extra virtual channel. The algorithm is more efficient in terms of the number of paths it offers between the source and the destination, and it also distributes the network load more evenly and symmetrically. Simulation results are presented and compared with the results of previously proposed algorithms. It is shown that the proposed algorithm results in much better performance in terms of average network latency and throughput.
Citations: 31

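For readers unfamiliar with adaptivity in meshes, the sketch below shows only the basic idea of choosing among productive output directions at a node; the paper's algorithm additionally prescribes virtual-channel usage for deadlock freedom, which is not modeled here.

```c
/* Minimal sketch of adaptivity in a 2-D mesh: a packet at (cx,cy) headed
 * for (dx,dy) may use any output that reduces the remaining distance.
 * Only the path choice is shown; virtual-channel rules are omitted. */
#include <stdio.h>

enum dir { EAST, WEST, NORTH, SOUTH };

/* Fills 'out' with the productive directions and returns how many there are. */
int productive_dirs(int cx, int cy, int dx, int dy, enum dir out[2])
{
    int n = 0;
    if (dx > cx)      out[n++] = EAST;
    else if (dx < cx) out[n++] = WEST;
    if (dy > cy)      out[n++] = NORTH;
    else if (dy < cy) out[n++] = SOUTH;
    return n;   /* 2 while both offsets are nonzero, then 1, then 0 */
}

int main(void)
{
    enum dir d[2];
    int n = productive_dirs(1, 1, 3, 4, d);
    printf("%d productive directions\n", n);   /* prints 2 */
    return 0;
}
```
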
Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386542
D. Panda
Abstract: This paper presents a new approach to implementing fast barrier synchronization in wormhole k-ary n-cubes. The novelty lies in using multidestination messages instead of traditional single-destination messages. Two different multidestination worm types, gather and broadcasting, are introduced to implement the report and wake-up phases of barrier synchronization, respectively. Algorithms for complete and arbitrary-set barrier synchronization are presented using these new worms. It is shown that complete barrier synchronization in a k-ary n-cube system with e-cube routing can be implemented with 2n communication start-ups, as compared to the 2n log2(k) start-ups needed with unicast-based message passing. For the arbitrary-set barrier, an interesting trend is observed where the synchronization cost keeps reducing beyond a certain number of participating nodes.
Citations: 49

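The start-up counts quoted above are easy to compare numerically; the snippet below just evaluates both expressions for an example 8-ary 3-cube (the configuration is my choice, not the paper's).

```c
/* Start-up counts quoted above for a complete barrier on a k-ary n-cube:
 * 2n with multidestination worms versus 2n*log2(k) with unicast messages.
 * The 8-ary 3-cube below is only an example configuration. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    int k = 8, n = 3;                       /* 8-ary 3-cube: 512 nodes   */
    double multidest = 2.0 * n;             /* gather + broadcast phases */
    double unicast   = 2.0 * n * log2(k);   /* unicast-based barrier     */
    printf("multidestination: %.0f start-ups, unicast: %.0f start-ups\n",
           multidest, unicast);             /* 6 vs 18                   */
    return 0;
}
```
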
Implementing register interlocks in parallel-pipeline, multiple instruction queue, superscalar processors
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386559
S. Weiss
Abstract: A dependence on data, control, or resources might cause one instruction to become stalled in a pipeline stage waiting for a preceding instruction to produce a result or release a resource. The pipeline control hardware checks for dependences and prevents the instruction from going to the next pipeline stage if a dependence occurs. We refer to this hardware as interlock logic. The amount and complexity of the interlock logic required to support a ten-plus instruction issue bandwidth is a major concern in the design of the pipeline control hardware. We look specifically at register interlocks in the context of a parallel pipeline with separate dispatch and issue phases, a generalization of the pipeline organization implemented by a number of prominent recent superscalar processors. We describe four implementations of the register interlock logic and a comparison based on the number of logic levels. We also present a high-bandwidth implementation of table-based register renaming.
Citations: 5

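As a behavioural illustration of what register interlock logic decides (not one of the four hardware implementations the paper compares), a scoreboard-style check with made-up structures:

```c
/* Functional sketch of a register interlock: stall an instruction whose
 * source register still has a write outstanding. The structures here are
 * invented for illustration; the paper studies hardware realizations of
 * this check, not this software model. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct instr { int dst, src1, src2; };

static uint64_t pending;                 /* bit r set: register r busy   */

static bool can_issue(struct instr i)
{
    uint64_t need = (1ull << i.src1) | (1ull << i.src2);
    return (pending & need) == 0;        /* interlock if any source busy */
}

static void issue(struct instr i) { pending |=  (1ull << i.dst); }
static void writeback(int dst)    { pending &= ~(1ull << dst);   }

int main(void)
{
    struct instr load = { 3, 1, 1 };     /* r3 <- mem[r1] */
    struct instr add  = { 4, 3, 2 };     /* r4 <- r3 + r2 */

    issue(load);
    printf("add can issue: %d\n", can_issue(add));  /* 0: waits on r3 */
    writeback(3);
    printf("add can issue: %d\n", can_issue(add));  /* 1              */
    return 0;
}
```
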
Abstracting network characteristics and locality properties of parallel systems
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386555
A. Sivasubramaniam, A. Singla, U. Ramachandran, H. Venkateswaran
Abstract: Abstracting features of parallel systems is a technique that has traditionally been used in theoretical and analytical models for program development and performance evaluation. We explore the use of abstractions in execution-driven simulators in order to speed up simulation. In particular, we evaluate abstractions for the interconnection network and locality properties of parallel systems in the context of simulating cache-coherent shared memory (CC-NUMA) multiprocessors. We use the recently proposed LogP model to abstract the network. We abstract locality by modeling a cache at each processing node in the system which is maintained coherent, without modeling the overheads associated with coherence maintenance. Such an abstraction tries to capture the true communication characteristics of the application without modeling any hardware-induced artifacts. Using a suite of applications and three network topologies simulated on a novel simulation platform, we show that the latency overhead modeled by LogP is fairly accurate. On the other hand, the contention overhead can become pessimistic when the applications display sufficient communication locality. Our abstraction for data locality closely models the behavior of the target system over the chosen range of applications. The simulation model which incorporated these abstractions was around 250-300% faster than the simulation of the target machine.
Citations: 10

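For context on the network abstraction: LogP charges each message a send overhead o, a network latency L, and a receive overhead o, and enforces a minimum gap g between successive sends from one processor. The sketch below evaluates that cost model with made-up parameter values; it is not the paper's simulator.

```c
/* LogP-style message cost: overhead o at sender and receiver, latency L in
 * the network, and a minimum gap g between successive sends. Parameter
 * values below are illustrative, not taken from the paper. */
#include <stdio.h>

double one_message(double L, double o)
{
    return o + L + o;                 /* send overhead + latency + receive */
}

double n_messages(int n, double L, double o, double g)
{
    double issue_gap = g > o ? g : o; /* sender limited by max(g, o)       */
    return (n - 1) * issue_gap + one_message(L, o);
}

int main(void)
{
    double L = 10.0, o = 2.0, g = 4.0;        /* cycles, made-up values    */
    printf("1 msg:  %.1f cycles\n", one_message(L, o));        /* 14.0     */
    printf("8 msgs: %.1f cycles\n", n_messages(8, L, o, g));   /* 42.0     */
    return 0;
}
```
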