Proceedings of the 19th international conference on Architectural support for programming languages and operating systems: latest papers, page 7

Session details: Keynote

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/3260919

A. Davis

Citations: 0

Session details: Compilers, optimization, and co-design

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/3260932

K. Pingali

Citations: 0

Session details: Keynote

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/3260926

R. Balasubramonian

Citations: 0

Inside Windows Azure: the challenges and opportunities of a cloud operating system

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2654822.2560008

B. Calder

Citations: 3

Session details: Heterogeneous computing

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/3260925

Debbie Marr

Citations: 0

Paraprox: pattern-based approximation for data parallel applications

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541948

M. Samadi, D. Jamshidi, Janghaeng Lee, S. Mahlke

Abstract: Approximate computing is an approach where reduced accuracy of results is traded off for increased speed, throughput, or both. Loss of accuracy is not permissible in all computing domains, but there are a growing number of data-intensive domains where the output of programs need not be perfectly correct to provide useful results or even noticeable differences to the end user. These soft domains include multimedia processing, machine learning, and data mining/analysis. An important challenge with approximate computing is transparency to insulate both software and hardware developers from the time, cost, and difficulty of using approximation. This paper proposes a software-only system, Paraprox, for realizing transparent approximation of data-parallel programs that operates on commodity hardware systems. Paraprox starts with a data-parallel kernel implemented using OpenCL or CUDA and creates a parameterized approximate kernel that is tuned at runtime to maximize performance subject to a target output quality (TOQ) that is supplied by the user. Approximate kernels are created by recognizing common computation idioms found in data-parallel programs (e.g., Map, Scatter/Gather, Reduction, Scan, Stencil, and Partition) and substituting approximate implementations in their place. Across a set of 13 soft data-parallel applications with at most 10% quality degradation, Paraprox yields an average performance gain of 2.7x on a NVIDIA GTX 560 GPU and 2.5x on an Intel Core i7 quad-core processor compared to accurate execution on each platform.
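
The abstract above describes a parameterized approximate kernel that is tuned against a user-supplied target output quality (TOQ). The following is a minimal Python sketch of that general idea for a Reduction pattern only, not Paraprox's actual implementation: the sampling-and-scaling approximation, the relative-error quality metric, and the names `approx_sum`, `quality`, and `tune_skip` are all assumptions made for illustration.

```python
def exact_sum(data):
    """Accurate reduction used as the quality reference."""
    return sum(data)


def approx_sum(data, skip):
    """Approximate reduction (assumed scheme): add every `skip`-th element
    and scale the partial sum up to the full input length."""
    sampled = data[::skip]
    return sum(sampled) * (len(data) / len(sampled))


def quality(exact, approx):
    """Output quality in percent (assumed metric): 100 means identical."""
    if exact == 0:
        return 100.0 if approx == 0 else 0.0
    return max(0.0, 100.0 * (1.0 - abs(approx - exact) / abs(exact)))


def tune_skip(data, toq, candidates=(8, 4, 2, 1)):
    """Try the most aggressive skip factor first and return the first one
    whose quality on this calibration input meets the TOQ."""
    reference = exact_sum(data)
    for skip in candidates:
        if quality(reference, approx_sum(data, skip)) >= toq:
            return skip
    return 1  # fall back to accurate execution


if __name__ == "__main__":
    calibration = list(range(1, 1001))
    for toq in (90.0, 99.5, 100.0):
        print(toq, tune_skip(calibration, toq))
```

A larger skip factor does less work but loses more accuracy, so the tuner here simply walks from the cheapest candidate toward exact execution until the quality target is met.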

Citations: 215

Price theory based power management for heterogeneous multi-cores

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2014-02-24 DOI: 10.1145/2541940.2541974

Thannirmalai Somu Muthukaruppan, A. Pathania, T. Mitra

Citations: 93

Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 2013-07-01 DOI: 10.1145/2541940.2541942

Bharath Pichai, Lisa R. Hsu, A. Bhattacharjee

Abstract: The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. To this end, we are the first to explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) for address translation in unified heterogeneous systems. We show the performance challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1 parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15% of runtime). We presume this initial design leaves room for improvement but anticipate that our bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this fruitful area.
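
The abstract above refers to TLBs and page table walkers for address translation. As a rough illustration of what a TLB lookup with a fallback page table walk involves, here is a small Python sketch. It is a generic software model, not the paper's GPU MMU design: the 4 KiB page size, the LRU replacement policy, the dictionary standing in for the page table walk, and the class name `TLB` are all assumptions.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class TLB:
    """Tiny LRU translation cache in front of a page table (assumed model)."""

    def __init__(self, entries, page_table):
        if entries < 1:
            raise ValueError("TLB needs at least one entry")
        self.entries = entries
        self.page_table = page_table  # virtual page number -> physical frame
        self.cache = OrderedDict()    # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Return the physical address for a virtual address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # The dictionary lookup stands in for a page table walk.
            self.cache[vpn] = self.page_table[vpn]
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    tlb = TLB(entries=2, page_table={0: 5, 1: 7, 2: 9})
    for addr in (0x10, 4096 + 4, 0x20, 2 * 4096, 4096):
        print(hex(addr), "->", hex(tlb.translate(addr)))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```

Every miss in this model pays for a walk, which is why access patterns that touch many distinct pages in a short window put pressure on a small TLB.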

Citations: 152

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems Pub Date : 1900-01-01 DOI: 10.1145/2541940

Citations: 3