Proceedings of the 19th international conference on Architectural support for programming languages and operating systems: latest articles (page 3)

Session details: Parallelism II

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/3260931

B. Falsafi

Citations: 0

Sapper: a language for hardware-level security policy enforcement

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541947

Xun Li, Vineeth Kashyap, J. Oberg, Mohit Tiwari, Vasanth Ram Rajarathinam, R. Kastner, T. Sherwood, B. Hardekopf, F. Chong

Abstract: Privacy and integrity are important security concerns. These concerns are addressed by controlling information flow, i.e., restricting how information can flow through a system. Most proposed systems that restrict information flow make the implicit assumption that the hardware used by the system is fully "correct" and that the hardware's instruction set accurately describes its behavior in all circumstances. The truth is more complicated: modern hardware designs defy complete verification; many aspects of the timing and ordering of events are left totally unspecified; and implementation bugs present themselves with surprising frequency. In this work we describe Sapper, a novel hardware description language for designing security-critical hardware components. Sapper seeks to address these problems by using static analysis at compile-time to automatically insert dynamic checks in the resulting hardware that provably enforce a given information flow policy at execution time. We present Sapper's design and formal semantics along with a proof sketch of its security. In addition, we have implemented a compiler for Sapper and used it to create a non-trivial secure embedded processor with many modern microarchitectural features. We empirically evaluate the resulting hardware's area and energy overhead and compare them with alternative designs.
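As a rough illustration of what a dynamic information-flow check does, the sketch below tags each value with a security level and blocks any store that would move data from a higher level to a lower one. It is not Sapper's syntax, semantics, or compiler output: the two-point lattice and the names `Level`, `Labeled`, `join`, `add`, and `checked_store` are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Assumed two-point lattice: LOW may flow to HIGH, never the reverse."""
    LOW = 0
    HIGH = 1


@dataclass(frozen=True)
class Labeled:
    """A value paired with the level of the data it was derived from."""
    value: int
    level: Level


def join(a: Level, b: Level) -> Level:
    """Least upper bound of two levels in the lattice."""
    return max(a, b)


def add(x: Labeled, y: Labeled) -> Labeled:
    """The result is as sensitive as its most sensitive input."""
    return Labeled(x.value + y.value, join(x.level, y.level))


def checked_store(dest_level: Level, src: Labeled, old: int) -> int:
    """Dynamic check: store only if src is allowed to flow to the destination.

    On a violation the destination keeps its old contents (hypothetical policy).
    """
    if src.level <= dest_level:
        return src.value
    return old


secret = Labeled(41, Level.HIGH)
public = Labeled(1, Level.LOW)
mixed = add(secret, public)                          # tainted by the HIGH input
low_reg = checked_store(Level.LOW, mixed, old=0)     # blocked, stays 0
high_reg = checked_store(Level.HIGH, mixed, old=0)   # allowed
```

In the abstract's terms, the check inside `checked_store` stands in for logic that a compiler would insert automatically; here it is written by hand purely to show the effect.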

Citations: 117

Leveraging the short-term memory of hardware to diagnose production-run software failures

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541973

Joy Arulraj, Guoliang Jin, Shan Lu

Abstract: Failures caused by software bugs are widespread in production runs, causing severe losses for end users. Unfortunately, diagnosing production-run failures is challenging. Existing work cannot satisfy privacy, run-time overhead, diagnosis capability, and diagnosis latency requirements all at once. This paper designs a low overhead, low latency, privacy preserving production-run failure diagnosis system based on two observations. First, short-term memory of program execution is often sufficient for failure diagnosis, as many bugs have short propagation distances. Second, maintaining a short-term memory of execution is much cheaper than maintaining a record of the whole execution. Following these observations, we first identify an existing hardware unit, Last Branch Record (LBR), that records the last few taken branches to help diagnose sequential bugs. We then propose a simple hardware extension, Last Cache-coherence Record (LCR), to record the last few cache accesses with specified coherence states and hence help diagnose concurrency bugs. Finally, we design LBRA and LCRA to automatically locate failure root causes using LBR and LCR. Our evaluation uses 31 real-world sequential and concurrency bug failures from 18 representative open-source software. The results show that with just 16 record entries, LBR and LCR enable our system to automatically locate the root causes for 27 out of 31 failures, with less than 3% run-time overhead. As our system does not rely on sampling,

Citations: 24

Data-parallel finite-state machines

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541988

Todd Mytkowicz, Madan Musuvathi, Wolfram Schulte

Citations: 82

Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2654822.2563369

D. Wood

Abstract: The field of computing has struggled since its inception with the tension between specialization and generalization. Specialized architectures, programming languages, and system software promise better performance (across many metrics, including efficiency, productivity, etc.) for workloads that match their specialization objective. General-purpose architectures, languages, and system software sacrifice extremes of performance for specific workloads, seeking acceptable performance across a much wider range. While specialized alternatives have always had their place, general-purpose architectures, languages, and system software have dominated main-stream computing systems for the past several decades. But with Dennard scaling already gone and the end of Moore's Law looming, some have argued that general-purpose computing platforms must naturally give way to specialization. In this debate, two teams of highly-opinionated experts will debate the proposition that specialized architectures, languages, and system software should largely supplant general-purpose alternatives within the next decade. Arguments in favor of specialization include energy efficiency in the post-Dennard scaling era, performance scaling in the post-Moore's law era, and improvements in programmer productivity. Arguments against include the large investment needed to create specialized hardware and software components, lack of tools and interfaces to create reusable components, the semantic gap from overspecialization, and security vulnerabilities and general correctness issues due to interoperation of specialized components.

Citations: 1

Disengaged scheduling for fair, protected access to fast computational accelerators

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541963

Konstantinos Menychtas, Kai Shen, M. Scott

Abstract: Today's operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control that guarantees fairness and Disengaged Fair Queueing that is effective in limiting resource idleness, but probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct device access across our evaluation scenarios.
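The idea of a timeslice scheduler with overuse control can be sketched as a toy simulation. This is only one reading of the abstract, not the authors' implementation: the function `disengaged_timeslice`, its parameters, and the "skip a turn to repay overuse" rule are assumptions made for illustration. Each application gets the device in round-robin order; because a running request is not preempted, an application may hold the device longer than its slice, and the excess is recorded as debt that costs it later turns.

```python
from collections import deque


def disengaged_timeslice(demands, slice_len, rounds):
    """Simulate round-robin timeslices with a simple overuse penalty.

    demands[app] is how long the app occupies the device once granted a turn
    (it may exceed slice_len, since requests are assumed non-preemptible).
    Returns the total device time consumed by each app.
    """
    used = {app: 0 for app in demands}
    debt = {app: 0 for app in demands}   # overuse carried from earlier turns
    order = deque(demands)
    for _ in range(rounds):
        app = order[0]
        order.rotate(-1)                 # next app moves to the front
        if debt[app] >= slice_len:
            # Overuse control: forfeit this turn to repay one slice of debt.
            debt[app] -= slice_len
            continue
        ran = demands[app]
        used[app] += ran
        debt[app] += max(0, ran - slice_len)
    return used


# A well-behaved app and one that holds the device three times too long.
usage = disengaged_timeslice({"fair": 10, "greedy": 30}, slice_len=10, rounds=12)
print(usage)
```

In this run the greedy application is granted the device less often, so over twelve turns both applications end up with the same total device time.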

Citations: 63

VSwapper: a memory swapper for virtualized environments

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541969

Nadav Amit, Dan Tsafrir, A. Schuster

Abstract: The number of guest virtual machines that can be consolidated on one physical host is typically limited by the memory size, motivating memory overcommitment. Guests are given a choice to either install a "balloon" driver to coordinate the overcommitment activity, or to experience degraded performance due to uncooperative swapping. Ballooning, however, is not a complete solution, as hosts must still fall back on uncooperative swapping in various circumstances. Additionally, ballooning takes time to accommodate change, and so guests might experience degraded performance under changing conditions. Our goal is to improve the performance of hosts when they fall back on uncooperative swapping and/or operate under changing load conditions. We carefully isolate and characterize the causes for the associated poor performance, which include various types of superfluous swap operations, decayed swap file sequentiality, and ineffective prefetch decisions upon page faults. We address these problems by implementing VSwapper, a guest-agnostic memory swapper for virtual environments that allows efficient, uncooperative overcommitment. With inactive ballooning, VSwapper yields up to an order of magnitude performance improvement. Combined with ballooning, VSwapper can achieve up to double the performance under changing load conditions.

Citations: 59

High-performance fractal coherence

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541982

G. Voskuilen, T. N. Vijaykumar

Abstract: Bugs in cache coherence protocols can cause system failures. Despite many advances, verification runs into state explosion for even moderately-sized systems. As multicores' core counts increase, coherence verifiability continues to be a key problem. A recent proposal, called fractal coherence, avoids the state explosion problem by applying the idea of observational equivalence between a larger system and its smaller sub-systems. A fractal protocol for a larger system is verified by design if a minimal sub-system is verified completely. While fractal coherence is a significant step forward, there are two shortcomings: (1) Architectural limitation: To achieve fractal coherence's logical hierarchy, TreeFractal, the specific fractal protocol, employs a tree architecture where each miss traverses many levels up and down the tree and each level redundantly holds its sub-trees' coherence tags. (2) Protocol restrictions: TreeFractal imposes a restriction on responses to read requests that forces read requests to obtain clean blocks from the nearest sharer even if the shared L2 or L3 is faster. These limitations impose significant performance and coherence tag state overheads. In this paper, we propose architectural support for coherence protocols to achieve scalable performance and verifiability. To address the architectural limitation, we propose FlatFractal, a directory-based architecture which decouples fractal coherence's logical hierarchy from the architecture and eliminates redundant tag state. To address the protocol restriction, we propose a simple change to the protocol that, while preserving observational equivalence, allows read requests to obtain the blocks from the shared L2 or L3. Our simulations show that for 16 cores, FlatFractal performs, on average, 57% better than TreeFractal and within 3% of a conventional directory.

Citations: 9

Fence-free work stealing on bounded TSO processors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541987

Adam Morrison, Y. Afek

Citations: 25

Low-level detection of language-level data races with LARD

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541955

Benjamin P. Wood, L. Ceze, D. Grossman

Abstract: Researchers have proposed always-on data-race exceptions as a way to avoid the ill effects of data races, but slow performance of accurate dynamic data-race detection remains a barrier to the adoption of always-on data-race exceptions. Proposals for accurate low-level (e.g., hardware) data-race detection have the potential to reduce this performance barrier. This paper explains why low-level data-race detectors are wrong for programs written in high-level languages (e.g., Java): they miss true data races and report false data races in these programs. To bring the benefits of low-level data-race detection to high-level languages, we design low-level abstractable race detection (LARD), an extension of the interface between low-level data-race detectors and run-time systems that enables accurate language-level data-race detection using low-level detection mechanisms. We implement accurate LARD data-race exception support for Java, coupling a modified Jikes RVM Java virtual machine and a simulated hardware race detector. We evaluate our detector's accuracy against an accurate dynamic Java data-race detector and other low-level race detectors without LARD, showing that naive accurate low-level data-race detectors suffer from many missed and false language-level races in practice, and that LARD prevents this inaccuracy.

Citations: 29