Proceedings of the Conference on High Performance Graphics 2009: Latest Articles

Understanding the efficiency of ray traversal on GPUs

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572792

Timo Aila, S. Laine

Citations: 488

A parallel algorithm for construction of uniform grids

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572773

Javor Kalojanov, P. Slusallek

Citations: 105

Stream compaction for deferred shading

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572797

Jared Hoberock, Victor Lu, Yuntao Jia, J. Hart

Citations: 31

Scaling of 3D game engine workloads on modern multi-GPU systems

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572776

Jordi Roca Monfort, Mark Grossman

Citations: 18

Faster incoherent rays: Multi-BVH ray stream tracing

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572793

John A. Tsakok

Abstract: High fidelity rendering via ray tracing requires tracing incoherent rays for global illumination and other secondary effects. Recent research shows that the performance benefits of fast packet traversal schemes that exploit high coherence are lost when coherency is low, due to inefficient use of the CPU's SIMD units. To solve this problem, methods have been proposed that try to extract the remaining coherency from secondary rays through ray sorting, reordering and streaming. Another category of traversal methods has also been proposed that ignores coherency altogether and uses a higher-order tree branching factor while tracing a single ray at a time. These single-ray methods not only target applications with incoherent rays but also scale with larger SIMD widths. This paper combines ideas from both categories to form a new traversal method that extracts coherency from a group of rays through simple filtering while still providing fast single-ray traversal when no coherency is present. The new algorithm does not depend on packets, which cleanly decouples traversal from shading, and it scales to larger SIMD widths. Results show that overall performance benefits are obtained on a current generation CPU architecture.
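The "simple filtering" idea in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: it only shows, under assumed names (`ray_hits_box`, `filter_ray_stream`) and a plain scalar slab test, how a stream of active ray indices might be narrowed to those that hit one node's bounding box before descending further. SIMD processing and the multi-way BVH itself are left out.

```python
def ray_hits_box(origin, direction, box_min, box_max, t_max=float("inf")):
    """Scalar slab test of one ray against an axis-aligned box (illustrative)."""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near <= t_far


def filter_ray_stream(rays, active, box_min, box_max):
    """Keep only the active ray indices whose rays hit the box.

    `rays` is a list of (origin, direction) tuples; `active` is a list of
    indices into it. Both are hypothetical data layouts for this sketch.
    """
    return [i for i in active
            if ray_hits_box(rays[i][0], rays[i][1], box_min, box_max)]


rays = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),   # heads into the box
    ((0.0, 5.0, 0.0), (1.0, 0.0, 0.0)),   # passes above the box
    ((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),  # heads away from the box
]
survivors = filter_ray_stream(rays, [0, 1, 2], (1.0, -1.0, -1.0), (2.0, 1.0, 1.0))
```

When the surviving list is short, a traversal could fall back to tracing each ray on its own, which is the trade-off the abstract describes.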

Citations: 44

Efficient depth peeling via bucket sort

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572779

Fang Liu, Meng-Cheng Huang, Xuehui Liu, E. Wu

Abstract: In this paper we present an efficient algorithm for multi-layer depth peeling via bucket sort of fragments on the GPU, which makes it possible to capture up to 32 layers simultaneously with correct depth ordering in a single geometry pass. We exploit multiple render targets (MRT) as storage and construct a bucket array of size 32 per pixel. Each bucket can hold only one fragment and can be concurrently updated using the MAX/MIN blending operation. During rasterization, the depth range of each pixel location is divided uniformly into consecutive subintervals, and a linear bucket sort is performed so that fragments within each subinterval are routed into the corresponding bucket. In a following fullscreen shader pass, the bucket array can be accessed sequentially to get the sorted fragments for further applications. Collisions happen when more than one fragment is routed to the same bucket, which can be alleviated by a multi-pass approach. We also develop a two-pass approach to further reduce collisions, namely adaptive bucket depth peeling. In the first geometry pass, the depth range is redivided into non-uniform subintervals according to the depth distribution, so that there is only one fragment within each subinterval. In the following bucket sorting pass, only one fragment is routed into each bucket and collisions are substantially reduced. Our algorithm shows up to a 32-fold speedup over classical depth peeling, especially for large scenes with high depth complexity, and the experimental results are visually faithful to the ground truth. It also requires no pre-sorting of geometry or post-sorting of fragments, and is free of read-modify-write (RMW) hazards.
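The uniform bucket routing described in the abstract above can be sketched on the CPU for a single pixel. This is a minimal illustration, not the authors' GPU code: the function names are hypothetical, the render-target packing is omitted, and a collision is resolved by keeping the larger depth, which stands in for a MAX blending update.

```python
def bucket_index(depth, z_near, z_far, num_buckets=32):
    """Map a depth in [z_near, z_far] to the index of its uniform subinterval."""
    t = (depth - z_near) / (z_far - z_near)
    # Clamp so that depth == z_far lands in the last bucket.
    return min(int(t * num_buckets), num_buckets - 1)


def bucket_sort_fragments(depths, z_near, z_far, num_buckets=32):
    """Route one pixel's fragment depths into buckets holding one fragment each.

    Returns the depths read back in bucket order plus the number of
    collisions (fragments that arrived at an already occupied bucket).
    """
    buckets = [None] * num_buckets
    collisions = 0
    for d in depths:
        i = bucket_index(d, z_near, z_far, num_buckets)
        if buckets[i] is None:
            buckets[i] = d
        else:
            collisions += 1
            buckets[i] = max(buckets[i], d)  # MAX-style update keeps one fragment
    ordered = [d for d in buckets if d is not None]
    return ordered, collisions


# Fragments arrive in arbitrary order; 0.5 and 0.52 share bucket 16.
ordered, collisions = bucket_sort_fragments([0.9, 0.1, 0.5, 0.52], 0.0, 1.0)
```

Reading the buckets in index order yields depth-sorted fragments without an explicit sort. The collision counter shows why the abstract proposes multi-pass and adaptive (non-uniform) subdivision as remedies.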

Citations: 72

CFU: multi-purpose configurable filtering unit for mobile multimedia applications on graphics hardware

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572775

Chih-Hao Sun, K. Lok, You-Ming Tsao, Chia-Ming Chang, Shao-Yi Chien

Abstract: To increase the capability of mobile GPUs in image and video processing, this paper proposes a multi-purpose configurable filtering unit (CFU), a new configurable unit for image filtering on a stream processing architecture. The CFU is located in the texture unit of a GPU and can efficiently execute many kinds of filtering operations by directly accessing a multi-bank texture cache and specially designed data paths. The proposed CFU supports the following forms of programmability. First, programmers can select different sampling point windows. Second, the arithmetic type of the filter can be chosen: in addition to the original texture filtering functions and finite impulse response (FIR) filters, morphological operations from computer vision are also embedded in the CFU. Furthermore, programmers can define the weighting coefficients of FIR filters and morphological operations. Simulation results show that, on average, compared with a conventional texture unit, the CFU reduces processing time by 25.35% in H.264/AVC motion compensation and by 58.6% in video segmentation.
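The three kinds of programmability listed in the abstract above (sampling window, arithmetic type, coefficients) can be mimicked in a short software sketch. It says nothing about the hardware design. The function name, the mode strings, the edge clamping, and the additive treatment of coefficients in the morphological modes are all assumptions made for illustration.

```python
def configurable_filter(image, offsets, mode, weights=None):
    """Apply a filter whose window, arithmetic type and weights are selectable.

    image   : 2D list of numbers
    offsets : list of (dy, dx) sampling positions relative to each pixel
    mode    : "fir" (weighted sum), "dilate" (max) or "erode" (min)
    weights : one coefficient per offset; defaults to 1.0 for "fir" and
              0.0 (a flat structuring element) for the morphological modes
    """
    height, width = len(image), len(image[0])
    if weights is None:
        default = 1.0 if mode == "fir" else 0.0
        weights = [default] * len(offsets)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = []
            for dy, dx in offsets:
                # Clamp coordinates at the image border.
                sy = min(max(y + dy, 0), height - 1)
                sx = min(max(x + dx, 0), width - 1)
                samples.append(image[sy][sx])
            if mode == "fir":
                out[y][x] = sum(w * s for w, s in zip(weights, samples))
            elif mode == "dilate":
                out[y][x] = max(s + w for w, s in zip(weights, samples))
            elif mode == "erode":
                out[y][x] = min(s - w for w, s in zip(weights, samples))
            else:
                raise ValueError(f"unknown mode: {mode}")
    return out


row = [[1, 2, 3]]
window = [(0, -1), (0, 0), (0, 1)]  # horizontal 3-tap window
smoothed = configurable_filter(row, window, "fir", [0.25, 0.5, 0.25])
dilated = configurable_filter(row, window, "dilate")
eroded = configurable_filter(row, window, "erode")
```

The same sampling loop serves every mode. Only the reduction step changes, which is one way to picture why a single configurable unit could cover both FIR and morphological filters.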

Citations: 8

Efficient ray traced soft shadows using multi-frusta tracing

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572791

Carsten Benthin, I. Wald

Abstract: Ray tracing has long been considered superior to rasterization because of its ability to trace arbitrary rays, which allows it to simulate virtually any physical light transport effect. Yet, to look plausible, effects such as soft shadows typically require extraordinary numbers of rays. This makes the prospects of real-time performance rather remote. Rasterization, in contrast, has a record of producing such effects in real time by employing specialized and approximate solutions for individual effects. Though ray tracing may still be the right choice for effects like reflections and refractions, using specialized solutions for certain important effects also makes sense for a ray tracer. In this paper, we propose a special solution for ray tracing soft shadows that is particularly targeted at Intel's Larrabee architecture. We use a specialized form of frustum tracing that traces multiple frusta of specialized "light-weight" shadow packets in parallel, while generating rays within each frustum on demand. The technique can easily be integrated into any packet ray tracer, and fits well into the wide SIMD and cache-size constraints of the Larrabee architecture. Our technique reaches rates of up to several dozen million rays per second per Larrabee core, outperforming traditional packet techniques by up to 6x. This high performance, combined with a simple light-weight illumination filtering step, enables real-time soft shadows for game-like scenes.

Citations: 21

Parallel view-dependent tessellation of Catmull-Clark subdivision surfaces

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572785

Anjul Patney, Mohamed S. Ebeida, John Douglas Owens

Abstract: We present a strategy for performing view-adaptive, crack-free tessellation of Catmull-Clark subdivision surfaces entirely on programmable graphics hardware. Our scheme extends the concept of breadth-first subdivision, which up to this point has only been applied to parametric patches. Mesh representations designed for a CPU often involve pointer-based structures and irregular per-element storage, neither of which is well suited to GPU execution. To solve this problem, we use a simple yet effective data structure for representing a subdivision mesh, and design a careful algorithm to update the mesh in a completely parallel manner. We demonstrate that in spite of the complexities of the subdivision procedure, real-time tessellation to pixel-sized primitives can be done. Our implementation does not rely on any approximation of the limit surface, and avoids both subdivision cracks and T-junctions in the subdivided mesh. Using the approach in this paper, we are able to perform real-time subdivision for several static as well as animated models. Rendering performance is scalable for increasingly complex models.

Citations: 45

Spatial splits in bounding volume hierarchies

Proceedings of the Conference on High Performance Graphics 2009 Pub Date : 2009-08-01 DOI: 10.1145/1572769.1572771

Martin Stich, Heiko Friedrich, Andreas Dietrich

Citations: 158