Proceedings of the Second Workshop on Optimizing Stencil Computations: Latest Papers

Improving Parallelism of Recursive Stencil Computations without Sacrificing Cache Performance

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686752

Yuan Tang, R. You, Haibin Kan, Jesmin Jahan Tithi, P. Ganapathi, R. Chowdhury

Abstract: The state-of-the-art "trapezoidal decomposition algorithm" for stencil computations on modern multicore machines uses recursive divide-and-conquer (DAC) to achieve asymptotically optimal cache complexity cache-obliviously. But the same DAC approach restricts parallelism by introducing artificial dependencies among subtasks in addition to those arising from the defining stencil equations. As a result, the trapezoidal decomposition algorithm has suboptimal parallelism. In this paper we present a variant of the parallel trapezoidal decomposition algorithm, called "cache-oblivious wavefront" (COW), that starts execution of recursive subtasks earlier than the start time prescribed by the original algorithm without violating any real dependencies implied by the underlying recurrences, thus reducing serialization due to artificial dependencies. The reduction in serialization leads to an improvement in parallelism. Moreover, since we do not change the DAC-based decomposition of tasks used in the original algorithm, cache performance does not suffer.

We provide experimental measurements of absolute running times, burdened span measured by Cilkview, and L1/L2 cache misses measured by PAPI to validate our claims.

Citations: 9
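
For readers unfamiliar with the baseline that the abstract above builds on, here is a minimal sequential sketch of trapezoidal decomposition in the style of Frigo and Strumpen's cache-oblivious 1D stencil algorithm. It assumes a 3-point averaging (Jacobi-style) stencil, fixed boundary values, and two time-level buffers; it is not the COW algorithm from the paper, and the names `walk1`, `make_kernel`, `run_trapezoidal`, and `run_naive` are illustrative. A trapezoid is described by its time range [t0, t1) and by left and right edges x0, x1 with slopes dx0, dx1; wide trapezoids are cut in space, tall ones in time.

```python
def make_kernel(u):
    """Return kernel(t, x), which computes level t+1 at x from level t.

    u holds two buffers; time level t lives in u[t % 2].
    The update is a 3-point average (an assumed example stencil).
    """
    def kernel(t, x):
        src, dst = u[t % 2], u[(t + 1) % 2]
        dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
    return kernel


def walk1(kernel, t0, t1, x0, dx0, x1, dx1):
    """Visit every (t, x) with t0 <= t < t1 and
    x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0), in dependency order."""
    dt = t1 - t0
    if dt == 1:
        # Base case: a single row of the space-time trapezoid.
        for x in range(x0, x1):
            kernel(t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Space cut along a line of slope -1 through xm;
            # the left piece must finish before the right piece starts.
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk1(kernel, t0, t1, x0, dx0, xm, -1)
            walk1(kernel, t0, t1, xm, -1, x1, dx1)
        else:
            # Time cut: lower half first, then upper half with shifted edges.
            s = dt // 2
            walk1(kernel, t0, t0 + s, x0, dx0, x1, dx1)
            walk1(kernel, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


def run_trapezoidal(init, steps):
    # Both buffers start as copies of init, so the fixed boundary values
    # at index 0 and len(init) - 1 are valid at every time level.
    u = [list(init), list(init)]
    walk1(make_kernel(u), 0, steps, 1, 0, len(init) - 1, 0)
    return u[steps % 2]


def run_naive(init, steps):
    # Reference: a plain time loop over the interior points.
    u = [list(init), list(init)]
    kernel = make_kernel(u)
    for t in range(steps):
        for x in range(1, len(init) - 1):
            kernel(t, x)
    return u[steps % 2]
```

Both runners should return identical lists for the same input, because every point is computed by the same expression from the same inputs; only the visiting order differs, and that order is what gives the recursive version its cache behavior. If this recursion were parallelized directly, the right piece of a space cut would wait for the entire left piece even though only the points near the cut feed it, which appears to be the kind of artificial dependency the abstract describes.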

Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686749

P. Basu, Samuel Williams, Brian Van Straalen, L. Oliker, Mary W. Hall

Citations: 4

Stencils in Scientific Computations

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686756

A. Dubey

Abstract: Stencils occur in many areas, but they are ubiquitous in scientific computing. They range from simple Jacobi iterations to the extremely complex ones used in the solution of highly nonlinear partial differential equations (PDEs). High-level programming languages typically used to implement scientific software do not provide explicit support for stencils, which forces each implementation to make its own choices about expressing specifics such as dimensionality, data layout, order of access, and order of operations. These choices often hide optimization opportunities from compilers. There have been attempts to provide abstractions for simpler stencils, and they have met with success in some areas, but multiphysics scientific applications present challenges that simple stencil abstractions cannot meet. The applications may have hierarchy, non-uniformity, or both in their discretizations, which cannot be expressed by stencils describing uniform discretizations. The physics operators being applied may be nonlinear, which demands composability of stencils. As the order of the solution method increases, the size and reach of the stencil also increase, and there may be conditions that require applying the stencil to an arbitrary subset of the discretized points. Finally, if an update involves multiple steps, intermediate results need to be managed. AMR Shift Calculus (Phil Colella and Brian Van Straalen 2014) provides a generalized abstraction that addresses many of these concerns. It expresses stencil computations as a collection of shift operations combined with associated weights, which can be applied to a specified collection of discretized points.

The shift calculus also addresses hierarchy in the discretization, and defines operators on stencils that allow more complex stencils to be composed from simpler ones. Because the shift calculus makes it possible to express the computation concisely and precisely, it avoids the problem of false dependencies. Additionally, the composability of the stencil operators exposes opportunities for loop or even function fusion, and exposes to the compiler the granularity at which intermediate values are held, creating more optimization opportunities. The included slide presentation is organized in five sections. The first section gives examples of discretizations ranging from the simple Poisson equation to the complex compressible Navier-Stokes (CNS) equations, and addresses the level of abstraction needed to express the computations on these discretizations. The second section outlines several challenges that are unique to scientific applications, and the ways in which many abstractions that have proved useful elsewhere fail to work for scientific computing. The third section describes the AMR shift calculus, with emphasis on features that are typically not found in other stencil-based abstractions but are necessary for solving complex PDEs…

Citations: 5
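
To make the "shift operations combined with associated weights" idea from the abstract above concrete, here is a minimal sketch of a constant-coefficient stencil represented as a map from shift vectors to weights, with addition, composition, and application to an explicit collection of points. The class `Stencil` and its methods are hypothetical names for illustration; this is not the API of any AMR shift calculus implementation, and it ignores hierarchy, variable coefficients, and nonlinear operators.

```python
from collections import defaultdict


def _shift(p, s):
    """Offset grid index p by shift vector s (tuples of equal length)."""
    return tuple(a + b for a, b in zip(p, s))


class Stencil:
    """Hypothetical sketch: a linear stencil as {shift tuple: weight}."""

    def __init__(self, terms):
        # Drop zero weights so that .terms of equivalent stencils compare equal.
        self.terms = {s: w for s, w in terms.items() if w != 0}

    def __add__(self, other):
        # Sum of two stencils: weights of matching shifts are added.
        out = defaultdict(int)
        for s, w in list(self.terms.items()) + list(other.terms.items()):
            out[s] += w
        return Stencil(out)

    def compose(self, other):
        # (self o other) applies `other` first, then `self`:
        # shifts add and weights multiply.
        out = defaultdict(int)
        for s1, w1 in self.terms.items():
            for s2, w2 in other.terms.items():
                out[_shift(s1, s2)] += w1 * w2
        return Stencil(out)

    def apply(self, field, points):
        # Evaluate the stencil at each requested point; `field` maps
        # grid index -> value and must cover every shifted index.
        return {p: sum(w * field[_shift(p, s)] for s, w in self.terms.items())
                for p in points}


# Usage: compose forward and backward differences into a 1D Laplacian.
forward = Stencil({(1,): 1, (0,): -1})    # u(i+1) - u(i)
backward = Stencil({(0,): 1, (-1,): -1})  # u(i) - u(i-1)
laplacian = forward.compose(backward)     # u(i+1) - 2u(i) + u(i-1)
u = {(i,): i * i for i in range(6)}
result = laplacian.apply(u, [(i,) for i in range(1, 5)])
# -> {(1,): 2, (2,): 2, (3,): 2, (4,): 2}
```

Because composition only manipulates (shift, weight) pairs, the composed stencil is a single object that could be applied in one pass, which is one way to read the abstract's remark that composability exposes fusion opportunities.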

StenSAL: A Single Assignment Language for Relentlessly Executing Explicit Stencil Algorithms

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686747

Lucas A. Wilson, Jeffery von Ronne

Citations: 2

HLSF: A High-Level, C++-Based Framework for Stencil Computations on Accelerators

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686751

Fabian Dütsch, K. Djelassi, Michael Haidl, S. Gorlatch

Citations: 7

An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686750

L. Truong, Chick Markley, A. Fox

Abstract: The SEJITS framework supports creating embedded domain-specific languages (DSELs) and code generators, a pair of which is called a specializer, with much less effort than creating a full DSL compiler (typically just a few hundred lines of code). SEJITS' main benefit is allowing application writers to stay entirely in high-level languages such as Python by using specialized Python functions (that is, functions written in one of the Python-embedded DSELs) to generate code that runs at native speed. One existing SEJITS DSEL is Sepya [10], a Python DSEL for stencil computations that generates OpenMP and Cilk+ code competitive with existing DSL compilers such as Pochoir and Halide. We extend Sepya to generate OpenCL code targeting GPUs and, in the process, extend SEJITS with support for meta-specializers, whose job is to enable and optimize the composition of existing specializers written by third parties. In this work, we demonstrate meta-specialization by detecting and removing extraneous data copies to and from the GPU when composing multiple specializer calls (stencil and non-stencil). We also explore variants of loop fusion to further improve the performance of these composed operations. The generated stencil code is 20x faster than SciPy and competitive with existing stencil DSELs on realistic code excerpts. Since meta-specializers must compose and optimize specializers created by third parties, we extend SEJITS with support for meta-specializer hooks, allowing existing specializers to be incrementally enabled for meta-specialization without breaking backward compatibility.

The Sepya and SEJITS extensions together extend the range of platforms for which highly optimized code can be generated, and open new possibilities for optimizing the composition of existing specializers.

Citations: 1
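
The copy-elimination idea in the abstract above can be illustrated with a toy peephole pass over a data-transfer plan. This is a hypothetical sketch, not SEJITS or Sepya code: it assumes each specializer call, run in isolation, copies its buffer to the device, runs a kernel, and copies the result back. When a device-to-host copy is immediately followed by a host-to-device copy of the same buffer, the device copy is still current and both transfers can be dropped. The names `Op`, `naive_plan`, and `remove_redundant_copies` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str       # "h2d" (host to device), "kernel", or "d2h" (device to host)
    buf: str        # name of the buffer the op touches
    name: str = ""  # kernel name; empty for transfers


def naive_plan(kernels, buf="data"):
    """Each specializer call on its own: copy in, run, copy out."""
    plan = []
    for k in kernels:
        plan += [Op("h2d", buf), Op("kernel", buf, k), Op("d2h", buf)]
    return plan


def remove_redundant_copies(plan):
    """Drop d2h/h2d round trips of the same buffer with nothing in between.

    In this toy plan, adjacency means no host code ran between the two
    transfers, so the device copy is still current.
    """
    out = []
    for op in plan:
        if (op.kind == "h2d" and out and out[-1].kind == "d2h"
                and out[-1].buf == op.buf):
            out.pop()  # cancel the preceding copy-out
            continue   # and skip this copy-in
        out.append(op)
    return out


plan = naive_plan(["stencil", "scale", "stencil"])
optimized = remove_redundant_copies(plan)
print([op.kind for op in optimized])
# -> ['h2d', 'kernel', 'kernel', 'kernel', 'd2h']
```

A real meta-specializer would also have to know whether host code reads the intermediate results between calls; this sketch assumes it does not, which is why only adjacent round trips are removed.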

Proceedings of the Second Workshop on Optimizing Stencil Computations

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745

Saman P. Amarasinghe, S. Kamil, P. Sadayappan

Citations: 0

Trace-Driven Memory Access Pattern Recognition in Computational Kernels

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686748

Eunjung Park, Christos Kartsaklis, T. Janjusic, John Cavazos

Citations: 5

Nanoblock Unroll: Towards the Automatic Generation of Stencil Codes with the Optimal Performance

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686746

T. Muranushi, Keigo Nitadori, J. Makino

Citations: 2