{"title":"Fast minimum spanning tree for large graphs on the GPU","authors":"Vibhav Vineet, P. Harish, Suryakant Patidar, P J Narayanan","doi":"10.1145/1572769.1572796","DOIUrl":"https://doi.org/10.1145/1572769.1572796","url":null,"abstract":"Graphics Processor Units are used for many general purpose processing due to high compute power available on them. Regular, data-parallel algorithms map well to the SIMD architecture of current GPU. Irregular algorithms on discrete structures like graphs are harder to map to them. Efficient data-mapping primitives can play crucial role in mapping such algorithms onto the GPU. In this paper, we present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Borůvka's approach for undirected graphs. We implement it using scalable primitives such as scan, segmented scan and split. The irregular steps of supervertex formation and recursive graph construction are mapped to primitives like split to categories involving vertex ids and edge weights. We obtain 30 to 50 times speedup over the CPU implementation on most graphs and 3 to 10 times speedup over our previous GPU implementation. We construct the minimum spanning tree on a 5 million node and 30 million edge graph in under 1 second on one quarter of the Tesla S1070 GPU.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"347 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116493687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating shadow rays using volumetric occluders and modified kd-tree traversal","authors":"Peter Djeu, S. Keely, W. Hunt","doi":"10.1145/1572769.1572781","DOIUrl":"https://doi.org/10.1145/1572769.1572781","url":null,"abstract":"Monte Carlo ray tracing remains a simple and elegant method for generating robust shadows. This approach, however, is often hampered by the time needed to evaluate the numerous shadow ray queries required to generate a high-quality image. We propose the use of volumetric occluders stored within a kd-tree in order to accelerate shadow rays cast on a closed, watertight mesh. Intersection with a volumetric occluder is much cheaper than intersection with mesh geometry, although performing these intersections requires modification to the traversal order through the kd-tree. We propose two such modifications, both of which enable the use of volumetric occluders for cheap shadow ray termination. We also propose using a software-managed cache to store and reuse volumetric occluders for even earlier termination. Our approach provides a performance improvement of up to 2.0x for our test scenes while producing images identical to those produced by the unaccelerated baseline.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126050039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient stream compaction on wide SIMD many-core architectures","authors":"M. Billeter, Ola Olsson, Ulf Assarsson","doi":"10.1145/1572769.1572795","DOIUrl":"https://doi.org/10.1145/1572769.1572795","url":null,"abstract":"Stream compaction is a common parallel primitive used to remove unwanted elements in sparse data. This allows highly parallel algorithms to maintain performance over several processing steps and reduces overall memory usage. For wide SIMD many-core architectures, we present a novel stream compaction algorithm and explore several variations thereof. Our algorithm is designed to maximize concurrent execution, with minimal use of synchronization. Bandwidth and auxiliary storage requirements are reduced significantly, which allows for substantially better performance. We have tested our algorithms using CUDA on a PC with an NVIDIA GeForce GTX280 GPU. On this hardware, our reference implementation provides a 3x speedup over previous published algorithms.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123880548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image space gathering","authors":"A. Robison, P. Shirley","doi":"10.1145/1572769.1572784","DOIUrl":"https://doi.org/10.1145/1572769.1572784","url":null,"abstract":"Soft shadows, glossy reflections and depth of field are valuable effects for realistic rendering and are often computed using distribution ray tracing (DRT). These \"blurry\" effects often need not be accurate and are sometimes simulated by blurring an image with sharper effects, such as blurring hard shadows to simulate soft shadows. One of the most effective examples of such a blurring algorithm is percentage closer soft shadows (PCSS). That technique, however, does not naturally extend to shadows generated in image space, such as those computed by a ray tracer, nor does it extend to glossy reflections or depth of field. This limitation can be overcome by generalizing PCSS to be phrased in terms of a gather from image space textures implemented with cross bilateral filtering. This paper demonstrates a framework to create visually compelling and phenomenologically accurate approximations of DRT effects based on repeatedly gathering from bilaterally weighted image space texture samples. These gathering and filtering operations are well supported by modern parallel architectures, enabling this technique to run at interactive rates.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121642798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded function composition","authors":"T. Whitted, J. Kajiya, Erik Ruf, Ray Bittner","doi":"10.1145/1572769.1572777","DOIUrl":"https://doi.org/10.1145/1572769.1572777","url":null,"abstract":"A low-level graphics processor is assembled from a collection of hardwired functions of screen coordinates embedded directly in the display. Configuration of these functions is controlled by a buffer containing parameters delivered to the processor on-the-fly during display scan. The processor is modular and scalable in keeping with the demands of large, high resolution displays.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128199922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective and adaptive supersampling for real-time ray tracing","authors":"Bongjun Jin, I. Ihm, Byungjoon Chang, Chanmin Park, Won-Jong Lee, Seokyoon Jung","doi":"10.1145/1572769.1572788","DOIUrl":"https://doi.org/10.1145/1572769.1572788","url":null,"abstract":"While supersampling is an essential element for high quality rendering, high sampling rates, routinely employed in offline rendering, are still considered quite burdensome for real-time ray tracing. In this paper, we propose a selective and adaptive supersampling technique aimed at the development of a real-time ray tracer on today's many-core processors. For efficient utilization of very precious computing time, this technique explores both image---space and object---space attributes, which can be easily gathered during the ray tracing computation, minimizing rendering artifacts by cleverly distributing ray samples to rendering elements according to priorities that are selectively set by a user. Our implementation on the current GPU demonstrates that the presented algorithm makes high sampling rates as effective as 9 to 16 samples per pixel more affordable than before for real-time ray tracing.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132388387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-accelerated global illumination by image space photon mapping","authors":"M. McGuire, D. Luebke","doi":"10.1145/1572769.1572783","DOIUrl":"https://doi.org/10.1145/1572769.1572783","url":null,"abstract":"We describe an extension to photon mapping that recasts the most expensive steps of the algorithm -- the initial and final photon bounces -- as image-space operations amenable to GPU acceleration. This enables global illumination for real-time applications as well as accelerating it for offline rendering. Image Space Photon Mapping (ISPM) rasterizes a light-space bounce map of emitted photons surviving initial-bounce Russian roulette sampling on a GPU. It then traces photons conventionally on the CPU. Traditional photon mapping estimates final radiance by gathering photons from a k-d tree. ISPM instead scatters indirect illumination by rasterizing an array of photon volumes. Each volume bounds a filter kernel based on the a priori probability density of each photon path. These two steps exploit the fact that initial path segments from point lights and final ones into a pinhole camera each have a common center of projection. An optional step uses joint bilateral upsampling of irradiance to reduce the fill requirements of rasterizing photon volumes. ISPM preserves the accurate and physically-based nature of photon mapping, supports arbitrary BSDFs, and captures both high- and low-frequency illumination effects such as caustics and diffuse color interreflection. An implementation on a consumer GPU and 8-core CPU renders highquality global illumination at up to 26 Hz at HD (1920x1080) resolution, for complex scenes containing moving objects and lights.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"359 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133489116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Morphological antialiasing","authors":"A. Reshetov","doi":"10.1145/1572769.1572787","DOIUrl":"https://doi.org/10.1145/1572769.1572787","url":null,"abstract":"We present a new algorithm that creates plausibly antialiased images by looking for certain patterns in an original image and then blending colors in the neighborhood of these patterns according to a set of simple rules. We construct these rules to work as a post-processing step in ray tracing applications, allowing approximate, yet fast and robust antialiasing. The algorithm works for any rendering technique and scene complexity. It does not require casting any additional rays and handles all possible effects, including reflections and refractions.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123955537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A directionally adaptive edge anti-aliasing filter","authors":"K. Iourcha, Jason C. Yang, Andrew Pomianowski","doi":"10.1145/1572769.1572789","DOIUrl":"https://doi.org/10.1145/1572769.1572789","url":null,"abstract":"The latest generation of graphics hardware provides direct access to multisample anti-aliasing (MSAA) rendering data. By taking advantage of these existing pixel subsample values, an intelligent reconstruction filter can be computed using programmable GPU shader units. This paper describes an adaptive anti-aliasing (AA) filter for real-time rendering on the GPU. Improved quality is achieved by using information from neighboring pixel samples to compute both an approximation of the gradient of primitive edges and the final pixel color.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122617533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-parallel rasterization of micropolygons with defocus and motion blur","authors":"K. Fatahalian, Edward Luong, S. Boulos, K. Akeley, W. Mark, P. Hanrahan","doi":"10.1145/1572769.1572780","DOIUrl":"https://doi.org/10.1145/1572769.1572780","url":null,"abstract":"Current GPUs rasterize micropolygons (polygons approximately one pixel in size) inefficiently. We design and analyze the costs of three alternative data-parallel algorithms for rasterizing micropolygon workloads for the real-time domain. First, we demonstrate that efficient micropolygon rasterization requires parallelism across many polygons, not just within a single polygon. Second, we produce a data-parallel implementation of an existing stochastic rasterization algorithm by Pixar, which is able to produce motion blur and depth-of-field effects. Third, we provide an algorithm that leverages interleaved sampling for motion blur and camera defocus. This algorithm outperforms Pixar's algorithm when rendering objects undergoing moderate defocus or high motion and has the added benefit of predictable performance.","PeriodicalId":163044,"journal":{"name":"Proceedings of the Conference on High Performance Graphics 2009","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124372952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}