K. Gharachorloo, D. Lenoski, J. Laudon, Phillip B. Gibbons, Anoop Gupta, J. Hennessy
{"title":"Memory consistency and event ordering in scalable shared-memory multiprocessors","authors":"K. Gharachorloo, D. Lenoski, J. Laudon, Phillip B. Gibbons, Anoop Gupta, J. Hennessy","doi":"10.1145/285930.285997","DOIUrl":"https://doi.org/10.1145/285930.285997","url":null,"abstract":"A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, with the discussion concentrating on issues relevant to scalable architectures.","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114344582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers","authors":"N. Jouppi","doi":"10.1109/ISCA.1990.134547","DOIUrl":"https://doi.org/10.1109/ISCA.1990.134547","url":null,"abstract":"Hardware techniques for improving the performance of caches are presented. Miss caching places a small, fully associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a 1-cycle miss penalty. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching in that it loads the small fully associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data are placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. An extension to the basic stream buffer, called a multiway stream buffer, is introduced.","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116764427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new approach to fast control of r^2 × r^2 3-stage Benes networks of r × r crossbar switches","authors":"A. Youssef, B. Arden","doi":"10.1109/ISCA.1990.134507","DOIUrl":"https://doi.org/10.1109/ISCA.1990.134507","url":null,"abstract":"The authors introduce an approach to fast control of N × N three-stage Benes networks with r × r crossbar switches as building blocks. The approach consists of setting the leftmost column of switches to an appropriately chosen configuration so that the network becomes self-routed while still able to realize a given family of permutations. This approach requires that, for any given family of permutations, a configuration for the leftmost column be found. Such a family is called compatible, and the configuration of the leftmost column is called the compatibility factor. Compatibility is characterized, and a technique to determine compatibility and the compatibility factor is developed and applied to Ω-realizable permutations, the permutations needed to emulate a hypercube, and the families of permutations required by FFT, bitonic sorting, tree computations, multidimensional mesh and torus computations, and multigrid computations. An O(log^2 N) time routing algorithm for the three-stage Benes is also developed. Finally, since only three compatibility factors are required by the preceding families of permutations, it is proposed that the first column be replaced by three multiplexed connections yielding a self-routing network with strong communication capabilities.","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127543284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}