ASPLOS XI最新文献

筛选
英文 中文
Scalable selective re-execution for EDGE architectures EDGE架构的可伸缩选择性重新执行
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024408
R. Desikan, S. Sethumadhavan, D. Burger, S. Keckler
{"title":"Scalable selective re-execution for EDGE architectures","authors":"R. Desikan, S. Sethumadhavan, D. Burger, S. Keckler","doi":"10.1145/1024393.1024408","DOIUrl":"https://doi.org/10.1145/1024393.1024408","url":null,"abstract":"Pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines. Selective re-execution is a technique that can reduce the penalty of mis-speculations by re-executing only instructions affected by the mis-speculation, instead of all instructions. In this paper we introduce a new selective re-execution mechanism that exploits the properties of a dataflow-like Explicit Data Graph Execution (EDGE) architecture to support efficient mis-speculation recovery, while scaling to window sizes of thousands of instructions with high performance. This distributed selective re-execution (DSRE) protocol permits multiple speculative waves of computation to be traversing a dataflow graph simultaneously, with a commit wave propagating behind them to ensure correct execution. We evaluate one application of this protocol to provide efficient recovery for load-store dependence speculation. Unlike traditional dataflow architectures which resorted to single-assignment memory semantics, the DSRE protocol combines dataflow execution with speculation to enable high performance and conventional sequential memory semantics. Our experiments show that the DSRE protocol results in an average 17% speedup over the best dependence predictor proposed to date, and obtains 82% of the performance possible with a perfect oracle directing the issue of loads.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123816017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Coherence decoupling: making use of incoherence 相干解耦:利用不相干
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024406
Jaehyuk Huh, Jichuan Chang, D. Burger, G. Sohi
{"title":"Coherence decoupling: making use of incoherence","authors":"Jaehyuk Huh, Jichuan Chang, D. Burger, G. Sohi","doi":"10.1145/1024393.1024406","DOIUrl":"https://doi.org/10.1145/1024393.1024406","url":null,"abstract":"This paper explores a new technique called coherence decoupling, which breaks a traditional cache coherence protocol into two protocols: a Speculative Cache Lookup (SCL) protocol and a safe, backing coherence protocol. The SCL protocol produces a speculative load value, typically from an invalid cache line, permitting the processor to compute with incoherent data. In parallel, the coherence protocol obtains the necessary coherence permissions and the correct value. Eventually, the speculative use of the incoherent data can be verified against the coherent data. Thus, coherence decoupling can greatly reduce --- if not eliminate --- the effects of false sharing. Furthermore, coherence decoupling can also reduce latencies incurred by true sharing. SCL protocols reduce those latencies by speculatively writing updates into invalid lines, thereby increasing the accuracy of speculation, without complicating the simple, underlying coherence protocol that guarantees correctness.The performance benefits of coherence decoupling are evaluated using a full-system simulator and a mix of commercial and scientific benchmarks. Our results show that 40% to 90% of all coherence misses can be speculated correctly, and therefore their latencies partially or fully hidden. This capability results in performance improvements ranging from 3% to over 16%, in most cases where the latencies of coherence misses have an effect on performance.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128449534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 77
An ultra low-power processor for sensor networks 用于传感器网络的超低功耗处理器
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024397
Virantha N. Ekanayake, IV ClintonKelly, R. Manohar
{"title":"An ultra low-power processor for sensor networks","authors":"Virantha N. Ekanayake, IV ClintonKelly, R. Manohar","doi":"10.1145/1024393.1024397","DOIUrl":"https://doi.org/10.1145/1024393.1024397","url":null,"abstract":"We present a novel processor architecture designed specifically for use in low-power wireless sensor-network nodes. Our sensor network asynchronous processor (SNAP/LE) is based on an asynchronous data-driven 16-bit RISC core with an extremely low-power idle state, and a wakeup response latency on the order of tens of nanoseconds. The processor instruction set is optimized for sensor-network applications, with support for event scheduling, pseudo-random number generation, bitfield operations, and radio/sensor interfaces. SNAP/LE has a hardware event queue and event coprocessors, which allow the processor to avoid the overhead of operating system software (such as task schedulers and external interrupt servicing), while still providing a straightforward programming interface to the designer. The processor can meet performance levels required for data monitoring applications while executing instructions with tens of picojoules of energy.We evaluate the energy consumption of SNAP/LE with several applications representative of the workload found in data-gathering wireless sensor networks. We compare our architecture and software against existing platforms for sensor networks, quantifying both the software and hardware benefits of our approach.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131550466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 137
Compiler orchestrated prefetching via speculation and predication 编译器通过推测和预测编排预取
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024416
R. Rabbah, Hariharan Sandanagobalane, M. Ekpanyapong, W. Wong
{"title":"Compiler orchestrated prefetching via speculation and predication","authors":"R. Rabbah, Hariharan Sandanagobalane, M. Ekpanyapong, W. Wong","doi":"10.1145/1024393.1024416","DOIUrl":"https://doi.org/10.1145/1024393.1024416","url":null,"abstract":"This paper introduces a compiler orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. We focus the scope of the optimization on specific subsets of the program dependence graph that succinctly characterize the memory access pattern of both regular array-based applications and irregular pointer-intensive programs. We illustrate how program embedded precomputation via speculative execution can accurately predict and effectively prefetch future memory references with negligible overhead. The proposed techniques reduce the total running time of seven SPEC benchmarks and two OLDEN benchmarks by 27% on an Itanium 2 processor. The improvements are in addition to several state-of-the-art optimizations including software pipelining and data prefetching. In addition, we use cycle-accurate simulations to identify important and lightweight architectural innovations that further mitigate the memory system bottleneck. In particular, we focus on the notoriously challenging class of pointer-chasing applications, and demonstrate how they may benefit from a novel scheme of it sentineled prefetching. Our results for twelve SPEC benchmarks demonstrate that 45% of the processor stalls that are caused by the memory system are avoidable. The techniques in this paper can effectively mask long memory latencies with little instruction overhead, and can readily contribute to the performance of processors today.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128734015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
Programming with transactional coherence and consistency (TCC) 使用事务一致性和一致性(TCC)进行编程
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1037949.1024395
Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Michael K. Chen, C. Kozyrakis, K. Olukotun
{"title":"Programming with transactional coherence and consistency (TCC)","authors":"Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Michael K. Chen, C. Kozyrakis, K. Olukotun","doi":"10.1145/1037949.1024395","DOIUrl":"https://doi.org/10.1145/1037949.1024395","url":null,"abstract":"Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fundamental unit of parallel work, communication and coherence. As each transaction completes, it writes all of its newly produced state to shared memory atomically, while restarting other processors that have speculatively read stale data. With this mechanism, a TCC-based system automatically handles data synchronization correctly, without programmer intervention. To gain the benefits of TCC, programs must be decomposed into transactions. We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism. With these constructs, writing correct parallel programs requires only small, incremental changes to correct sequential programs. The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121153244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 145
HOIST: a system for automatically deriving static analyzers for embedded systems HOIST:用于自动生成嵌入式系统静态分析器的系统
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024410
J. Regehr, A. Reid
{"title":"HOIST: a system for automatically deriving static analyzers for embedded systems","authors":"J. Regehr, A. Reid","doi":"10.1145/1024393.1024410","DOIUrl":"https://doi.org/10.1145/1024393.1024410","url":null,"abstract":"Embedded software must meet conflicting requirements such as be-ing highly reliable, running on resource-constrained platforms, and being developed rapidly. Static program analysis can help meet all of these goals. People developing analyzers for embedded object code face a difficult problem: writing an abstract version of each instruction in the target architecture(s). This is currently done by hand, resulting in abstract operations that are both buggy and im-precise. We have developed Hoist: a novel system that solves these problems by automatically constructing abstract operations using a microprocessor (or simulator) as its own specification. With almost no input from a human, Hoist generates a collection of C func-tions that are ready to be linked into an abstract interpreter. We demonstrate that Hoist generates abstract operations that are cor-rect, having been extensively tested, sufficiently fast, and substan-tially more precise than manually written abstract operations. Hoist is currently limited to eight-bit machines due to costs exponential in the word size of the target architecture. It is essential to be able to analyze software running on these small processors: they are important and ubiquitous, with many embedded and safety-critical systems being based on them.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129629058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 53
Devirtualizable virtual machines enabling general, single-node, online maintenance 可反虚拟化的虚拟机,支持一般、单节点、在线维护
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024419
David E. Lowell, Yasushi Saito, Eileen J. Samberg
{"title":"Devirtualizable virtual machines enabling general, single-node, online maintenance","authors":"David E. Lowell, Yasushi Saito, Eileen J. Samberg","doi":"10.1145/1024393.1024419","DOIUrl":"https://doi.org/10.1145/1024393.1024419","url":null,"abstract":"Maintenance is the dominant source of downtime at high availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two shortcomings that have prevented its broad acceptance. First, cluster-style maintenance over many nodes is typically performed a few nodes at a time, mak-ing maintenance slow and often impractical. Second, cluster-style maintenance does not work on single-node systems, despite the fact that their unavailability during maintenance can be painful for organizations. In this paper, we propose a novel technique for online maintenance that uses virtual machines to provide maintenance on single nodes, allowing parallel maintenance over multiple nodes, and online maintenance for standalone servers. We present the Microvisor, our prototype virtual machine system that is custom tailored to the needs of online maintenance. Unlike general purpose virtual machine environments that induce continual 10-20% over-head, the Microvisor virtualizes the hardware only during periods of active maintenance, letting the guest OS run at full speed most of the time. Unlike past attempts at virtual machine optimization, we do not compromise OS transparency. We instead give up generality and tailor our virtual machine system to the minimum needs of online maintenance, eschewing features, such as I/O and memory virtualization, that it does not strictly require. The result is a very thin virtual machine system that induces only 5.6% CPU overhead when virtualizing the hardware, and zero CPU overhead when devirtualized. Using the Microvisor, we demonstrate an online OS upgrade on a live, single-node web server, reducing downtime from one hour to less than one minute.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129758835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 91
D-SPTF: decentralized request distribution in brick-based storage systems D-SPTF:基于砖的存储系统中的分散请求分发
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024399
Christopher R. Lumb, Richard A. Golding
{"title":"D-SPTF: decentralized request distribution in brick-based storage systems","authors":"Christopher R. Lumb, Richard A. Golding","doi":"10.1145/1024393.1024399","DOIUrl":"https://doi.org/10.1145/1024393.1024399","url":null,"abstract":"Distributed Shortest-Positioning Time First (D-SPTF) is a request distribution protocol for decentralized systems of storage servers. D-SPTF exploits high-speed interconnects to dynamically select which server, among those with a replica, should service each read request. In doing so, it simultaneously balances load, exploits the aggregate cache capacity, and reduces positioning times for cache misses. For network latencies expected in storage clusters (e.g., 10--200μs), D-SPTF performs as well as would a hypothetical centralized system with the same collection of CPU, cache, and disk resources. Compared to popular decentralized approaches, D-SPTF achieves up to 65% higher throughput and adapts more cleanly to heterogenous server capabilities.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130816371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 38
Spatial computation 空间计算
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024396
M. Budiu, Girish Venkataramani, Tiberiu Chelcea, S. Goldstein
{"title":"Spatial computation","authors":"M. Budiu, Girish Venkataramani, Tiberiu Chelcea, S. Goldstein","doi":"10.1145/1024393.1024396","DOIUrl":"https://doi.org/10.1145/1024393.1024396","url":null,"abstract":"This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units.In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient.In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs; (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x worst-case).","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134111392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 150
Secure program execution via dynamic information flow tracking 通过动态信息流跟踪安全程序执行
ASPLOS XI Pub Date : 2004-10-07 DOI: 10.1145/1024393.1024404
Edward Suh, Jaewook Lee, Srini Devadas, David Zhang
{"title":"Secure program execution via dynamic information flow tracking","authors":"Edward Suh, Jaewook Lee, Srini Devadas, David Zhang","doi":"10.1145/1024393.1024404","DOIUrl":"https://doi.org/10.1145/1024393.1024404","url":null,"abstract":"We present a simple architectural mechanism called dynamic information flow tracking that can significantly improve the security of computing systems with negligible performance overhead. Dynamic information flow tracking protects programs against malicious software attacks by identifying spurious information flows from untrusted I/O and restricting the usage of the spurious information.Every security attack to take control of a program needs to transfer the program's control to malevolent code. In our approach, the operating system identifies a set of input channels as spurious, and the processor tracks all information flows from those inputs. A broad range of attacks are effectively defeated by checking the use of the spurious values as instructions and pointers.Our protection is transparent to users or application programmers; the executables can be used without any modification. Also, our scheme only incurs, on average, a memory overhead of 1.4% and a performance overhead of 1.1%.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134245097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 830
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信