ASPLOS VI latest papers, page 3

Compiler optimizations for improving data locality

ASPLOS VI Pub Date : 1994-11-01 DOI: 10.1145/195473.195557

S. Carr, K. McKinley, C. Tseng

Abstract: In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this paper, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. We demonstrate that these program transformations are useful for optimizing many programs.

To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments with kernels illustrate that our model and algorithm can select and achieve the best performance. For over thirty complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches; however, our optimizations significantly improved several programs.
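The effect of loop permutation on cache-line reuse mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's cost model: it simply counts misses in a small fully associative LRU cache while a row-major array is traversed in two loop orders. The function name `count_line_misses` and all sizes (line length, cache capacity, array shape) are assumptions chosen for illustration.

```python
def count_line_misses(n_rows, n_cols, order, line_elems=8, cache_lines=4):
    """Count cache misses for one full traversal of a row-major array.

    order is "ij" (row index in the outer loop) or "ji" (column index in
    the outer loop). The cache is a hypothetical fully associative LRU
    cache with `cache_lines` lines of `line_elems` elements each.
    """
    if order == "ij":
        accesses = ((i, j) for i in range(n_rows) for j in range(n_cols))
    elif order == "ji":
        accesses = ((i, j) for j in range(n_cols) for i in range(n_rows))
    else:
        raise ValueError("order must be 'ij' or 'ji'")

    cache = []   # least recently used line first, most recent last
    misses = 0
    for i, j in accesses:
        # Row-major layout: element (i, j) lives at offset i * n_cols + j.
        line = (i * n_cols + j) // line_elems
        if line in cache:
            cache.remove(line)      # hit: refresh its LRU position
        else:
            misses += 1
            if len(cache) == cache_lines:
                cache.pop(0)        # evict the least recently used line
        cache.append(line)
    return misses


if __name__ == "__main__":
    for order in ("ij", "ji"):
        print(order, count_line_misses(64, 64, order))
```

With these assumed sizes, the "ij" order touches each cache line once, while the "ji" order strides across rows and evicts every line before it can be reused. Permuting the loops so the inner loop walks contiguous memory is the kind of reorganization a locality-driven compiler would aim for.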

Citations: 332

Interleaving: a multithreading technique targeting multiprocessors and workstations

ASPLOS VI Pub Date : 1994-11-01 DOI: 10.1145/195473.195576

J. Laudon, Anoop Gupta, M. Horowitz

Citations: 127

Separating data and control transfer in distributed operating systems

ASPLOS VI Pub Date : 1994-11-01 DOI: 10.1145/195473.195481

C. Thekkath, H. Levy, Ed Lazowska

Abstract: Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in throughput, much reduced latency, greater scalability, and greatly increased reliability, when compared to current LANs such as Ethernet.

We believe that these new network and processor technologies will permit tighter coupling of distributed systems at the hardware level, and that distributed systems software should be designed to benefit from that tighter coupling. In this paper, we propose an alternative way of structuring distributed systems that takes advantage of a communication model based on remote network access (reads and writes) to protected memory segments.

A key feature of the new structure, directly supported by the communication model, is the separation of data transfer and control transfer. This is in contrast to the structure of traditional distributed systems, which are typically organized using message passing or remote procedure call (RPC). In RPC-style systems, data and control are inextricably linked: all RPCs must transfer both data and control, even if the control transfer is unnecessary.

We have implemented our model on DECstation hardware connected by an ATM network. We demonstrate how separating data transfer and control transfer can eliminate unnecessary control transfers and facilitate tighter coupling of the client and server. This has the potential to increase performance and reduce server load, which supports scaling in the face of an increasing number of clients. For example, for a small set of file server operations, our analysis shows a 50% decrease in server load when we switched from a communications mechanism requiring both control transfer and data transfer, to an alternative structure based on pure data transfer.
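The distinction the abstract draws between RPC-style access and pure data transfer can be sketched with a toy in-process model. This is only an illustration under stated assumptions, not the authors' implementation: `Segment`, `Server`, `rpc_read`, and `remote_read` are hypothetical names, there is no real network, and the counter merely stands in for control transfers into server code.

```python
class Segment:
    """Hypothetical protected memory segment exported by a server."""

    def __init__(self, data, allowed_clients):
        self.data = dict(data)
        self.allowed_clients = set(allowed_clients)

    def read(self, client, key):
        # Protection check attached to the segment itself.
        if client not in self.allowed_clients:
            raise PermissionError(client)
        return self.data[key]


class Server:
    """Toy server that counts how often its own code is invoked."""

    def __init__(self, segment):
        self.segment = segment
        self.handler_invocations = 0   # stands in for control transfers

    def rpc_read(self, client, key):
        # RPC style: every read transfers control to a server procedure.
        self.handler_invocations += 1
        return self.segment.read(client, key)


def remote_read(segment, client, key):
    # Pure data transfer: the client reads the exported segment directly,
    # so no server procedure runs in this model.
    return segment.read(client, key)


if __name__ == "__main__":
    seg = Segment({"block0": b"hello"}, allowed_clients={"client-a"})
    srv = Server(seg)
    srv.rpc_read("client-a", "block0")
    remote_read(seg, "client-a", "block0")
    print("server handler invocations:", srv.handler_invocations)
```

In this model only the RPC path increments the server's counter, which is the sense in which removing unnecessary control transfers can reduce server load. The model says nothing about how large that reduction is on real hardware.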

Citations: 91

Reducing branch costs via branch alignment

ASPLOS VI Pub Date : 1994-11-01 DOI: 10.1145/195473.195553

B. Calder, D. Grunwald

Citations: 123

Reactive synchronization algorithms for multiprocessors

ASPLOS VI Pub Date : 1994-11-01 DOI: 10.1145/195473.195490

B. Lim

Abstract: Synchronization algorithms that are efficient across a wide range of applications and operating conditions are hard to design because their performance depends on unpredictable run-time factors. The designer of a synchronization algorithm has a choice of protocols to use for implementing the synchronization operation. For example, candidate protocols for locks include test-and-set protocols and queueing protocols. Frequently, the best choice of protocols depends on the level of contention: previous research has shown that test-and-set protocols for locks outperform queueing protocols at low contention, while the opposite is true at high contention.

This paper investigates reactive synchronization algorithms that dynamically choose protocols in response to the level of contention. We describe reactive algorithms for spin locks and fetch-and-op that choose among several shared-memory and message-passing protocols. Dynamically choosing protocols presents a challenge: a reactive algorithm needs to select and change protocols efficiently, and has to allow for the possibility that multiple processes may be executing different protocols at the same time. We describe the notion of consensus objects that the reactive algorithms use to preserve correctness in the face of dynamic protocol changes.

Experimental measurements demonstrate that reactive algorithms perform close to the best static choice of protocols at all levels of contention. Furthermore, with mixed levels of contention, reactive algorithms outperform passive algorithms with fixed protocols, provided that contention levels do not change too frequently. Measurements of several parallel applications show that reactive algorithms result in modest performance gains for spin locks and significant gains for fetch-and-op.
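The idea of choosing a lock protocol in response to contention can be sketched as a small selection policy. This is a simplified illustration, not the algorithm from the paper: it omits the locks themselves and the consensus objects, and the class name `ReactivePolicy`, the thresholds `high` and `low`, and the `window` hysteresis are all assumptions introduced here.

```python
class ReactivePolicy:
    """Select a lock protocol from the contention seen on recent acquisitions.

    Starts with test-and-set (favoured at low contention) and switches to a
    queueing protocol only after `window` consecutive high-contention
    observations, and back after `window` consecutive low ones.
    """

    def __init__(self, high=4, low=1, window=3):
        self.protocol = "test-and-set"
        self.high = high        # waiters at or above this suggest queueing
        self.low = low          # waiters at or below this suggest test-and-set
        self.window = window    # observations required before switching
        self.streak = 0         # consecutive observations favouring a switch

    def observe(self, waiters):
        """Record one contention sample and return the protocol to use."""
        if self.protocol == "test-and-set":
            wants_switch = waiters >= self.high
        else:
            wants_switch = waiters <= self.low

        self.streak = self.streak + 1 if wants_switch else 0
        if self.streak >= self.window:
            self.protocol = (
                "queue" if self.protocol == "test-and-set" else "test-and-set"
            )
            self.streak = 0
        return self.protocol


if __name__ == "__main__":
    policy = ReactivePolicy()
    for waiters in [0, 0, 5, 5, 5, 6, 0, 0, 0]:
        print(waiters, policy.observe(waiters))
```

The hysteresis window reflects the abstract's caveat that reactive algorithms help only when contention levels do not change too frequently: a policy that switched on every sample would pay the switching cost without settling on either protocol.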

Citations: 127

Avoiding conflict misses dynamically in large direct-mapped caches

ASPLOS VI Pub Date : 1994-11-01 DOI: 10.1145/195473.195527

B. Bershad, Dennis Lee, T. Romer, J. B. Chen

Citations: 191