Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation: Latest Publications

Dynamic trace-based analysis of vectorization potential of applications
Justin Holewinski, R. Ramamurthi, Mahesh Ravishankar, Naznin Fauzia, L. Pouchet, A. Rountev, P. Sadayappan
DOI: 10.1145/2254064.2254108 (https://doi.org/10.1145/2254064.2254108)
Published: 2012-06-11
Abstract: Recent hardware trends with GPUs and the increasing vector lengths of SSE-like ISA extensions for multicore CPUs imply that effective exploitation of SIMD parallelism is critical for achieving high performance on emerging and future architectures. A vast majority of existing applications were developed without any attention by their developers towards effective vectorizability of the codes. While developers of production compilers such as GNU gcc, Intel icc, PGI pgcc, and IBM xlc have invested considerable effort and made significant advances in enhancing automatic vectorization capabilities, these compilers still cannot effectively vectorize many existing scientific and engineering codes. It is therefore of considerable interest to analyze existing applications to assess the inherent latent potential for SIMD parallelism, exploitable through further compiler advances and/or via manual code changes. In this paper we develop an approach to infer a program's SIMD parallelization potential by analyzing the dynamic data-dependence graph derived from a sequential execution trace. By considering only the observed run-time data dependences for the trace, and by relaxing the execution order of operations to allow any dependence-preserving reordering, we can detect potential SIMD parallelism that may otherwise be missed by more conservative compile-time analyses. We show that for several benchmarks our tool discovers regions of code within computationally-intensive loops that exhibit high potential for SIMD parallelism but are not vectorized by state-of-the-art compilers. We present several case studies of the use of the tool, both in identifying opportunities to enhance the transformation capabilities of vectorizing compilers, as well as in pointing to code regions to manually modify in order to enable auto-vectorization and performance improvement by existing compilers.
Citations: 62
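
The core relaxation step can be pictured with a small sketch (an illustration of the idea, not the paper's tool): build the dynamic data-dependence graph from a recorded trace and place each operation at the earliest level consistent with its dependences; operations that share a level have no dependence path between them and are candidates for one SIMD step. The `Op` record and the trace format below are hypothetical.

```python
from collections import defaultdict

class Op:
    """One dynamic instance of an operation from a sequential execution trace."""
    def __init__(self, op_id, reads, writes):
        self.op_id = op_id      # position in the trace
        self.reads = reads      # set of memory locations read
        self.writes = writes    # set of memory locations written

def simd_levels(trace):
    """Relax the sequential order: keep only the observed data dependences
    (flow, anti, output) and place each op at the earliest level at which all
    of its predecessors have completed (ASAP scheduling).  Ops that share a
    level are candidates for one SIMD step."""
    level = {}
    last_writer = {}                            # addr -> op_id of the most recent write
    readers_since_write = defaultdict(list)     # addr -> ops that read it since that write

    for op in trace:
        preds = set()
        for a in op.reads:                      # flow dependence: read after write
            if a in last_writer:
                preds.add(last_writer[a])
        for a in op.writes:
            if a in last_writer:                # output dependence: write after write
                preds.add(last_writer[a])
            preds.update(readers_since_write[a])  # anti dependence: write after read
        level[op.op_id] = 1 + max((level[p] for p in preds), default=-1)
        for a in op.writes:
            last_writer[a] = op.op_id
            readers_since_write[a] = []
        for a in op.reads:
            readers_since_write[a].append(op.op_id)

    groups = defaultdict(list)
    for op_id, lvl in level.items():
        groups[lvl].append(op_id)
    return dict(groups)                         # level -> ops that could run in lockstep

# Toy trace for a[i] = b[i] + c[i], i = 0..3: the four adds land on level 0.
trace = [Op(i, reads={("b", i), ("c", i)}, writes={("a", i)}) for i in range(4)]
print(simd_levels(trace))   # {0: [0, 1, 2, 3]}
```
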
A compiler framework for extracting superword level parallelism
Jun Liu, Yuanrui Zhang, Ohyoung Jang, W. Ding, M. Kandemir
DOI: 10.1145/2254064.2254106 (https://doi.org/10.1145/2254064.2254106)
Published: 2012-06-11
Abstract: SIMD (single-instruction multiple-data) instruction set extensions are quite common today in both high performance and embedded microprocessors, and enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance savings are possible when SLP is exploited, placing SIMD instructions in an application code manually can be very difficult and error-prone. In this paper, we propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, of which the primary goals are to increase SIMD parallelism and, more importantly, capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory space such that the price of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2% over a state-of-the-art SLP optimization algorithm.
Citations: 48
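
A minimal sketch of the statement-grouping phase, assuming statements of the simple form dst[i] = lhs[i] op rhs[i] (the names `Stmt` and `group_superwords` and the fixed vector width are illustrative, not the paper's framework): isomorphic statements with adjacent indices are packed into one superword statement; scheduling, reuse analysis, and the data layout optimization are omitted.

```python
from dataclasses import dataclass

VEC_WIDTH = 4   # lanes per superword, e.g. 4 x 32-bit values in a 128-bit register

@dataclass
class Stmt:
    """A scalar statement of the form dst[idx] = lhs[idx] OP rhs[idx]."""
    op: str
    dst: str
    lhs: str
    rhs: str
    idx: int

def group_superwords(stmts):
    """Statement grouping: collect isomorphic statements (same operator and
    arrays) whose indices are contiguous and emit one superword statement per
    full pack."""
    packs = []
    by_shape = {}
    for s in sorted(stmts, key=lambda s: (s.op, s.dst, s.lhs, s.rhs, s.idx)):
        pack = by_shape.setdefault((s.op, s.dst, s.lhs, s.rhs), [])
        if pack and s.idx != pack[-1].idx + 1:
            pack.clear()                      # not contiguous: start a new pack
        pack.append(s)
        if len(pack) == VEC_WIDTH:
            base = pack[0].idx
            packs.append(f"{s.dst}[{base}:{base+VEC_WIDTH}] = "
                         f"{s.lhs}[{base}:{base+VEC_WIDTH}] {s.op} "
                         f"{s.rhs}[{base}:{base+VEC_WIDTH}]")
            pack.clear()
    return packs

stmts = [Stmt("+", "a", "b", "c", i) for i in range(8)]
for superword in group_superwords(stmts):
    print(superword)
# a[0:4] = b[0:4] + c[0:4]
# a[4:8] = b[4:8] + c[4:8]
```
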
Dynamic synthesis for relaxed memory models
Feng Liu, Nayden Nedev, Nedyalko Prisadnikov, Martin T. Vechev, Eran Yahav
DOI: 10.1145/2254064.2254115 (https://doi.org/10.1145/2254064.2254115)
Published: 2012-06-11
Abstract: Modern architectures implement relaxed memory models which may reorder memory operations or execute them non-atomically. Special instructions called memory fences are provided, allowing control of this behavior. To implement a concurrent algorithm for a modern architecture, the programmer is forced to manually reason about subtle relaxed behaviors and figure out ways to control these behaviors by adding fences to the program. Not only is this process time consuming and error-prone, but it has to be repeated every time the implementation is ported to a different architecture. In this paper, we present the first scalable framework for handling real-world concurrent algorithms running on relaxed architectures. Given a concurrent C program, a safety specification, and a description of the memory model, our framework tests the program on the memory model to expose violations of the specification, and synthesizes a set of necessary ordering constraints that prevent these violations. The ordering constraints are then realized as additional fences in the program. We implemented our approach in a tool called DFence based on LLVM and used it to infer fences in a number of concurrent algorithms. Using DFence, we perform the first in-depth study of the interaction between fences in real-world concurrent C programs, correctness criteria such as sequential consistency and linearizability, and memory models such as TSO and PSO, yielding many interesting observations. We believe that this is the first tool that can handle programs at the scale and complexity of a lock-free memory allocator.
Citations: 78
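
A toy rendition of the synthesis step only, not DFence itself: assume testing under the memory model has already produced a set of violating reorderings, each annotated with the program points at which a fence would prevent it, and pick a small set of fence locations that covers all of them. The greedy covering below is a stand-in for the paper's constraint synthesis, and the violation format is made up.

```python
def synthesize_fences(violations):
    """Each violation is the set of program points (instruction indices) at
    which inserting a fence would prevent that violating reordering.  Greedily
    pick points that cover the most remaining violations until none are left;
    the result is a (not necessarily minimal) fence set that forbids them all."""
    remaining = [set(v) for v in violations]
    fences = set()
    while remaining:
        candidates = set().union(*remaining)
        best = max(candidates, key=lambda p: sum(p in v for v in remaining))
        fences.add(best)
        remaining = [v for v in remaining if best not in v]
    return fences

# Three bad reorderings observed while testing under a TSO-like model; the
# first can be killed by a fence after instruction 2 or after 5, and so on.
violations = [{2, 5}, {2, 7}, {7}]
print(sorted(synthesize_fences(violations)))   # e.g. [2, 7]
```
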
JANUS: exploiting parallelism via hindsight
Omer Tripp, R. Manevich, J. Field, Shmuel Sagiv
DOI: 10.1145/2254064.2254083 (https://doi.org/10.1145/2254064.2254083)
Published: 2012-06-11
Abstract: This paper addresses the problem of reducing unnecessary conflicts in optimistic synchronization. Optimistic synchronization must ensure that any two concurrently executing transactions that commit are properly synchronized. Conflict detection is an approximate check for this condition. For efficiency, the traditional approach to conflict detection conservatively checks that the memory locations mutually accessed by two concurrent transactions are accessed only for reading. We present JANUS, a parallelization system that performs conflict detection by considering sequences of operations and their composite effect on the system's state. This is done efficiently, such that the runtime overhead due to conflict detection is on a par with that of write-conflict-based detection. In certain common scenarios, this mode of refinement dramatically improves the precision of conflict detection, thereby reducing the number of false conflicts. Our empirical evaluation of JANUS shows that this precision gain reduces the abort rate by an order of magnitude (22x on average), and achieves a speedup of up to 2.5x, on a suite of real-world benchmarks where no parallelism is exploited by the standard approach.
Citations: 17
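
The precision gain can be illustrated with a hand-rolled commutativity check, which is not JANUS's actual refinement but shows the same intuition: rather than flagging any overlapping writes, compare the composite effect of the two transactions' operation sequences applied in either order and report a conflict only if the final states differ.

```python
import copy

def apply_ops(state, ops):
    """Apply a sequence of operations (functions from state to state) in order."""
    for op in ops:
        state = op(state)
    return state

def conflict(state, ops_a, ops_b):
    """Write-conflict detection would always flag two transactions that touch the
    same location.  Sequence-based detection instead asks whether their composite
    effects commute on the current state."""
    ab = apply_ops(apply_ops(copy.deepcopy(state), ops_a), ops_b)
    ba = apply_ops(apply_ops(copy.deepcopy(state), ops_b), ops_a)
    return ab != ba

# Two transactions that both increment a shared counter: a classic false
# conflict for write-based detection, but their composite effects commute.
inc = lambda st: {**st, "count": st["count"] + 1}
print(conflict({"count": 0}, [inc], [inc]))   # False -> no abort needed

# A transaction that doubles and one that increments do not commute.
dbl = lambda st: {**st, "count": st["count"] * 2}
print(conflict({"count": 1}, [inc], [dbl]))   # True  -> genuine conflict
```
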
RockSalt: better, faster, stronger SFI for the x86
Greg Morrisett, Gang Tan, Joseph Tassarotti, Jean-Baptiste Tristan, Edward Gan
DOI: 10.1145/2254064.2254111 (https://doi.org/10.1145/2254064.2254111)
Published: 2012-06-11
Abstract: Software-based fault isolation (SFI), as used in Google's Native Client (NaCl), relies upon a conceptually simple machine-code analysis to enforce a security policy. But for complicated architectures such as the x86, it is all too easy to get the details of the analysis wrong. We have built a new checker that is smaller, faster, and has a much reduced trusted computing base when compared to Google's original analysis. The key to our approach is automatically generating the bulk of the analysis from a declarative description which we relate to a formal model of a subset of the x86 instruction set architecture. The x86 model, developed in Coq, is of independent interest and should be usable for a wide range of machine-level verification tasks.
Citations: 141
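
A drastically simplified sketch of the kind of policy an SFI checker enforces, over a made-up toy instruction stream rather than decoded x86: every computed jump must be immediately preceded by a masking instruction that confines the target to the sandbox. RockSalt generates its real checker from a declarative description tied to an x86 model formalized in Coq; the sketch below only mirrors the shape of the check, and the mask constant is an assumption.

```python
SANDBOX_MASK = 0x0FFF_FFE0   # toy policy: targets inside a 256 MB region, 32-byte aligned

def sfi_check(instrs):
    """instrs is a list of (mnemonic, operand) pairs in a toy ISA.  Accept the
    code only if every computed jump is immediately preceded by an AND of the
    same register with SANDBOX_MASK, so the jump cannot escape the sandbox."""
    for i, (mnem, operand) in enumerate(instrs):
        if mnem == "jmp_reg":
            guarded = (i > 0
                       and instrs[i - 1][0] == "and_imm"
                       and instrs[i - 1][1] == (operand, SANDBOX_MASK))
            if not guarded:
                return False, f"unguarded indirect jump at {i}"
    return True, "ok"

good = [("and_imm", ("eax", SANDBOX_MASK)), ("jmp_reg", "eax")]
bad  = [("mov", ("eax", 0xDEADBEEF)), ("jmp_reg", "eax")]
print(sfi_check(good))   # (True, 'ok')
print(sfi_check(bad))    # (False, 'unguarded indirect jump at 1')
```
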
Automated error diagnosis using abductive inference
Işıl Dillig, Thomas Dillig, A. Aiken
DOI: 10.1145/2254064.2254087 (https://doi.org/10.1145/2254064.2254087)
Published: 2012-06-11
Abstract: When program verification tools fail to verify a program, either the program is buggy or the report is a false alarm. In this situation, the burden is on the user to manually classify the report, but this task is time-consuming, error-prone, and does not utilize facts already proven by the analysis. We present a new technique for assisting users in classifying error reports. Our technique computes small, relevant queries presented to a user that capture exactly the information the analysis is missing to either discharge or validate the error. Our insight is that identifying these missing facts is an instance of the abductive inference problem in logic, and we present a new algorithm for computing the smallest and most general abductions in this setting. We perform the first user study to rigorously evaluate the accuracy and effort involved in manual classification of error reports. Our study demonstrates that our new technique is very useful for improving both the speed and accuracy of error report classification. Specifically, our approach improves classification accuracy from 33% to 90% and reduces the time programmers take to classify error reports from approximately 5 minutes to under 1 minute.
Citations: 100
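
The underlying abduction query can be shown in propositional miniature, with entirely invented predicates: find the smallest set of candidate facts chi such that facts ∧ chi is satisfiable and facts ∧ chi entails the goal the analysis could not prove. The paper works over richer logics and also maximizes generality; the brute-force version below only demonstrates the shape of the problem.

```python
from itertools import combinations, product

def models(var_names):
    """Enumerate all truth assignments over the given variables."""
    for values in product([False, True], repeat=len(var_names)):
        yield dict(zip(var_names, values))

def entails(premise, conclusion, var_names):
    return all(conclusion(m) for m in models(var_names) if premise(m))

def satisfiable(formula, var_names):
    return any(formula(m) for m in models(var_names))

def abduce(facts, goal, candidates, var_names):
    """Return the smallest set of candidate facts chi such that facts & chi is
    satisfiable and facts & chi entails the goal."""
    for size in range(len(candidates) + 1):
        for chi in combinations(candidates, size):
            combined = lambda m, chi=chi: facts(m) and all(f(m) for _, f in chi)
            if satisfiable(combined, var_names) and entails(combined, goal, var_names):
                return [name for name, _ in chi]
    return None

# The analysis has proved: p holds, and (q and r) implies safe.  It cannot
# prove "safe" on its own; abduction identifies exactly what to ask the user.
VARS = ["p", "q", "r", "safe"]
facts = lambda m: m["p"] and ((not (m["q"] and m["r"])) or m["safe"])
goal = lambda m: m["safe"]
candidates = [("q", lambda m: m["q"]), ("r", lambda m: m["r"])]
print(abduce(facts, goal, candidates, VARS))   # ['q', 'r']
```
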
Understanding and detecting real-world performance bugs
Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, Shan Lu
DOI: 10.1145/2254064.2254075 (https://doi.org/10.1145/2254064.2254075)
Published: 2012-06-11
Abstract: Developers frequently use inefficient code sequences that could be fixed by simple patches. These inefficient code sequences can cause significant performance degradation and resource waste, referred to as performance bugs. Meager increases in single-threaded performance in the multi-core era and increasing emphasis on energy efficiency call for more effort in tackling performance bugs. This paper conducts a comprehensive study of 110 real-world performance bugs that are randomly sampled from five representative software suites (Apache, Chrome, GCC, Mozilla, and MySQL). The findings of this study provide guidance for future work to avoid, expose, detect, and fix performance bugs. Guided by our characteristics study, efficiency rules are extracted from 25 patches and are used to detect performance bugs. 332 previously unknown performance problems are found in the latest versions of MySQL, Apache, and Mozilla applications, including 219 performance problems found by applying rules across applications.
Citations: 346
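
One way to picture a patch-derived efficiency rule, using invented rule names and a deliberately shallow textual match (the paper's rules come from real patches and are checked with far more context): flag a call whose result is loop-invariant but is re-evaluated on every iteration, such as strlen() in a loop condition.

```python
import re

# Each rule is an (identifier, description, pattern) triple distilled from a
# patch that fixed a performance bug; the rules below are invented examples.
RULES = [
    ("loop-invariant-strlen",
     "strlen() of a loop-invariant string evaluated in the loop condition",
     re.compile(r"for\s*\([^;]*;\s*[^;]*\bstrlen\s*\(", re.S)),
    ("string-concat-in-loop",
     "repeated strcat() inside a loop body",
     re.compile(r"for\s*\([^)]*\)\s*\{[^}]*\bstrcat\s*\(", re.S)),
]

def check(source):
    """Apply every efficiency rule to the source text and report matches."""
    return [(rule_id, desc) for rule_id, desc, pattern in RULES
            if pattern.search(source)]

buggy = """
for (i = 0; i < strlen(s); i++) {
    total += s[i];
}
"""
for rule_id, desc in check(buggy):
    print(f"{rule_id}: {desc}")   # loop-invariant-strlen: strlen() of a ...
```
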
Synthesising graphics card programs from DSLs
Luke Cartey, Rune B. Lyngsø, O. Moor
DOI: 10.1145/2254064.2254080 (https://doi.org/10.1145/2254064.2254080)
Published: 2012-06-11
Abstract: Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions. However, this increase can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture. In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies we identify sets - partitions - of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this with three case studies, showing significant performance improvement over equivalent CPU methods, and similar efficiency to hand-tuned GPU implementations.
Citations: 8
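
A hand-written illustration of the partitioning idea on the kind of recurrence such methods target (dynamic-programming tables as in sequence alignment), not the tool's polyhedral code generator: if cell (i, j) depends only on (i-1, j), (i, j-1), and (i-1, j-1), then all cells with equal i + j are mutually independent and form one partition that a GPU could compute in a single parallel step.

```python
from collections import defaultdict

def wavefront_partitions(n, m, deps=((-1, 0), (0, -1), (-1, -1))):
    """Partition the cells of an n x m dynamic-programming table into sets of
    mutually independent elements.  For the given dependence offsets every
    dependence strictly decreases i + j, so grouping by i + j is safe: no cell
    depends on another cell in its own partition."""
    assert all(di + dj < 0 for di, dj in deps), "offsets must reduce i + j"
    partitions = defaultdict(list)
    for i in range(n):
        for j in range(m):
            partitions[i + j].append((i, j))
    return [partitions[k] for k in sorted(partitions)]

# Each inner list could be dispatched as one parallel GPU step, with every
# cell in the list computed by its own thread.
for step, cells in enumerate(wavefront_partitions(3, 3)):
    print(step, cells)
# 0 [(0, 0)]
# 1 [(0, 1), (1, 0)]
# 2 [(0, 2), (1, 1), (2, 0)]
# 3 [(1, 2), (2, 1)]
# 4 [(2, 2)]
```
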
Session details: Performance analysis
Amer Diwan
DOI: 10.1145/3250580 (https://doi.org/10.1145/3250580)
Published: 2012-06-11
Citations: 0

Test-case reduction for C compiler bugs
J. Regehr, Yang Chen, Pascal Cuoq, E. Eide, Chucky Ellison, Xuejun Yang
DOI: 10.1145/2254064.2254104 (https://doi.org/10.1145/2254064.2254104)
Published: 2012-06-11
Abstract: To report a compiler bug, one must often find a small test case that triggers the bug. The existing approach to automated test-case reduction, delta debugging, works by removing substrings of the original input; the result is a concatenation of substrings that delta cannot remove. We have found this approach less than ideal for reducing C programs because it typically yields test cases that are too large or even invalid (relying on undefined behavior). To obtain small and valid test cases consistently, we designed and implemented three new, domain-specific test-case reducers. The best of these is based on a novel framework in which a generic fixpoint computation invokes modular transformations that perform reduction operations. This reducer produces outputs that are, on average, more than 25 times smaller than those produced by our other reducers or by the existing reducer that is most commonly used by compiler developers. We conclude that effective program reduction requires more than straightforward delta debugging.
Citations: 255
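
The fixpoint-of-modular-transformations structure can be mocked up in a few lines, with a placeholder token-dropping pass and an invented interestingness oracle standing in for "still triggers the compiler bug"; the paper's actual reducers apply source-aware C transformations rather than anything this crude.

```python
def reduce_test_case(test, passes, interesting):
    """Repeatedly run every transformation pass; keep a variant whenever it is
    smaller and still 'interesting' (still triggers the bug).  Stop when a full
    round of passes makes no progress, i.e. at a fixpoint."""
    changed = True
    while changed:
        changed = False
        for transform in passes:
            for variant in transform(test):
                if len(variant) < len(test) and interesting(variant):
                    test = variant
                    changed = True
    return test

# Placeholder pass over a token list: drop one token at a time.  Real reducer
# passes transform C source with syntactic and semantic awareness.
def drop_one_token(tokens):
    for i in range(len(tokens)):
        yield tokens[:i] + tokens[i + 1:]

passes = [drop_one_token]

# Invented oracle: the "bug" triggers whenever the marker token is present.
interesting = lambda toks: "BUGGY_EXPR" in toks

original = ["int", "x", ";", "int", "y", "=", "BUGGY_EXPR", ";",
            "void", "f", "(", ")", "{", "}"]
print(reduce_test_case(original, passes, interesting))   # ['BUGGY_EXPR']
```
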