Efficient depth peeling via bucket sort

Proceedings of the Conference on High Performance Graphics 2009. Pub Date: 2009-08-01. DOI: 10.1145/1572769.1572779

Fang Liu, Meng-Cheng Huang, Xuehui Liu, E. Wu


Citations: 72

Abstract

In this paper we present an efficient algorithm for multi-layer depth peeling via bucket sort of fragments on the GPU, which makes it possible to capture up to 32 layers simultaneously with correct depth ordering in a single geometry pass. We exploit multiple render targets (MRT) as storage and construct a bucket array of size 32 per pixel. Each bucket can hold only one fragment and can be concurrently updated using the MAX/MIN blending operation. During rasterization, the depth range of each pixel location is divided uniformly into consecutive subintervals, and a linear bucket sort is performed so that the fragments within each subinterval are routed into the corresponding bucket. In a subsequent fullscreen shader pass, the bucket array can be accessed sequentially to obtain the sorted fragments for further applications. Collisions happen when more than one fragment is routed to the same bucket, which can be alleviated by a multi-pass approach. We also develop a two-pass approach to further reduce collisions, namely adaptive bucket depth peeling. In the first geometry pass, the depth range is redivided into non-uniform subintervals according to the depth distribution to make sure that there is only one fragment within each subinterval. In the following bucket sorting pass, only one fragment is routed into each bucket, and collisions are substantially reduced. Our algorithm shows up to a 32-times speedup over classical depth peeling, especially for large scenes with high depth complexity, and the experimental results are visually faithful to the ground truth. It also requires neither pre-sorting of geometry nor post-sorting of fragments, and is free of read-modify-write (RMW) hazards.
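
The uniform bucket routing described in the abstract can be illustrated with a small CPU-side sketch for a single pixel. This is not the authors' shader code: the function names `bucket_index` and `peel_pixel`, the `EMPTY` sentinel, and the choice to model the blending step with a plain `max` are all assumptions made for illustration. It shows only how fragments arriving in arbitrary order end up depth-sorted by bucket position, and how a collision loses a fragment.

```python
import math

NUM_BUCKETS = 32    # bucket array size per pixel, as stated in the abstract
EMPTY = -math.inf   # hypothetical sentinel marking an unoccupied bucket


def bucket_index(depth, z_near, z_far, num_buckets=NUM_BUCKETS):
    """Map a depth in [z_near, z_far] to the index of its uniform subinterval."""
    if z_far <= z_near:
        raise ValueError("z_far must be greater than z_near")
    t = (depth - z_near) / (z_far - z_near)
    # A depth exactly at z_far would map to num_buckets, so clamp it.
    return min(int(t * num_buckets), num_buckets - 1)


def peel_pixel(depths, z_near, z_far, num_buckets=NUM_BUCKETS):
    """Route one pixel's fragments into buckets and read them back in order.

    Returns (layers, collisions): the surviving depths in front-to-back
    order, and the number of fragments that landed in an occupied bucket.
    """
    buckets = [EMPTY] * num_buckets
    collisions = 0
    for d in depths:  # arrival order is arbitrary, as during rasterization
        k = bucket_index(d, z_near, z_far, num_buckets)
        if buckets[k] != EMPTY:
            collisions += 1
        # Stand-in for MAX blending: the update is order-independent,
        # so no read-modify-write ordering between fragments is needed.
        buckets[k] = max(buckets[k], d)
    # Stand-in for the fullscreen pass: scan buckets sequentially.
    layers = [d for d in buckets if d != EMPTY]
    return layers, collisions


if __name__ == "__main__":
    print(peel_pixel([0.9, 0.1, 0.5, 0.3], 0.0, 1.0))
    print(peel_pixel([0.50, 0.51], 0.0, 1.0))
```

In the second call both fragments fall into the same subinterval, so only one survives. That is the collision case that the abstract's multi-pass and adaptive variants are meant to reduce.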



