针对放射治疗中大量射线的gpu优化体积射线追踪

ACM Trans. Embed. Comput. Syst. Pub Date : 2013-12-01 DOI:10.1145/2539036.2539038

Bo Zhou, K. Xiao, D. Chen, X. Hu

{"title":"针对放射治疗中大量射线的gpu优化体积射线追踪","authors":"Bo Zhou, K. Xiao, D. Chen, X. Hu","doi":"10.1145/2539036.2539038","DOIUrl":null,"url":null,"abstract":"Ray tracing within a uniform grid volume is a fundamental process invoked frequently by many applications, especially radiation-dose calculation methods in radiotherapy. However, the conflicting features between the GPU memory architecture and the memory-accessing patterns of volume ray tracing lead to inefficient usage of GPU memory bandwidth and waste of capability of modern GPUs. To improve the ray tracing performance on GPU, we propose a lookup-table-based ray tracing method which is specially optimized towards the GPU memory system for processing a massive number of rays. The proposed method is based on a key observation that many of these applications normally involves a massive number of rays, but their ray tracing may not need to follow a specific execution order. Therefore, we divide the 3D space into many regions (called pyramids) and group together the rays falling into the same pyramid. For each ray group, the volume is rotated and resampled for their raytracing. This divide-and-rotate strategy allows the memory access of the ray tracing process to adopt a table-lookup approach and leads to better memory coalescing on GPU. Our proposed method was thoroughly evaluated in four volume setups with randomly-generated rays. The collapsed-cone convolution/superposition (CCCS) dose calculation method is also implemented with/without the proposed approach to verify the feasibility of our method. Compared with the direct GPU implementation of the popular 3DDDA algorithm, our method provides a speedup in the range of 1.91--2.94X for the volume settings we used. Major performance factors, including ray origins, volume size, and pyramid size, are also analyzed. The proposed technique was also found to be able to give a speedup of 1.61--2.17X over the original GPU implementation of the CCCS algorithm. Our experiment results indicate that the proposed approach is capable of offering better coalesced memory access which eventually boosts the raytracing performance on GPU. Moreover, our approach is conceptually simple and can be readily included into various applications.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"816 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"GPU-optimized volume ray tracing for massive numbers of rays in radiotherapy\",\"authors\":\"Bo Zhou, K. Xiao, D. Chen, X. Hu\",\"doi\":\"10.1145/2539036.2539038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Ray tracing within a uniform grid volume is a fundamental process invoked frequently by many applications, especially radiation-dose calculation methods in radiotherapy. However, the conflicting features between the GPU memory architecture and the memory-accessing patterns of volume ray tracing lead to inefficient usage of GPU memory bandwidth and waste of capability of modern GPUs. To improve the ray tracing performance on GPU, we propose a lookup-table-based ray tracing method which is specially optimized towards the GPU memory system for processing a massive number of rays. The proposed method is based on a key observation that many of these applications normally involves a massive number of rays, but their ray tracing may not need to follow a specific execution order. Therefore, we divide the 3D space into many regions (called pyramids) and group together the rays falling into the same pyramid. For each ray group, the volume is rotated and resampled for their raytracing. This divide-and-rotate strategy allows the memory access of the ray tracing process to adopt a table-lookup approach and leads to better memory coalescing on GPU. Our proposed method was thoroughly evaluated in four volume setups with randomly-generated rays. The collapsed-cone convolution/superposition (CCCS) dose calculation method is also implemented with/without the proposed approach to verify the feasibility of our method. Compared with the direct GPU implementation of the popular 3DDDA algorithm, our method provides a speedup in the range of 1.91--2.94X for the volume settings we used. Major performance factors, including ray origins, volume size, and pyramid size, are also analyzed. The proposed technique was also found to be able to give a speedup of 1.61--2.17X over the original GPU implementation of the CCCS algorithm. Our experiment results indicate that the proposed approach is capable of offering better coalesced memory access which eventually boosts the raytracing performance on GPU. Moreover, our approach is conceptually simple and can be readily included into various applications.\",\"PeriodicalId\":183677,\"journal\":{\"name\":\"ACM Trans. Embed. Comput. Syst.\",\"volume\":\"816 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Trans. Embed. Comput. Syst.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2539036.2539038\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Trans. Embed. Comput. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2539036.2539038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

均匀网格体积内的射线追踪是许多应用中经常用到的基本过程，特别是放射治疗中的辐射剂量计算方法。然而，GPU内存架构与体射线追踪的内存访问模式之间的冲突导致了GPU内存带宽的低效使用和现代GPU性能的浪费。为了提高在GPU上的光线跟踪性能，我们提出了一种针对GPU存储系统处理大量光线进行优化的基于查询表的光线跟踪方法。所提出的方法是基于一个关键的观察，即许多这些应用程序通常涉及大量的光线，但它们的光线跟踪可能不需要遵循特定的执行顺序。因此，我们将三维空间划分为许多区域(称为金字塔)，并将落入同一金字塔的光线组合在一起。对于每个射线组，体积被旋转并重新采样以进行光线跟踪。这种分割和旋转策略允许光线追踪过程的内存访问采用表查找方法，并导致GPU上更好的内存合并。我们提出的方法在随机产生的射线的四个体积设置中进行了彻底的评估。采用/不采用本文提出的方法实现了坍缩锥卷积/叠加(CCCS)剂量计算方法，以验证本文方法的可行性。与流行的3DDDA算法的直接GPU实现相比，我们的方法为我们使用的音量设置提供了1.91- 2.94X的加速范围。主要的性能因素，包括射线起源、体积大小和金字塔大小，也进行了分析。与CCCS算法的原始GPU实现相比，所提出的技术也能够提供1.61- 2.17X的加速。我们的实验结果表明，该方法能够提供更好的合并内存访问，最终提高GPU上的光线跟踪性能。此外，我们的方法在概念上很简单，可以很容易地包含到各种应用程序中。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GPU-optimized volume ray tracing for massive numbers of rays in radiotherapy

Ray tracing within a uniform grid volume is a fundamental process invoked frequently by many applications, especially radiation-dose calculation methods in radiotherapy. However, the conflicting features between the GPU memory architecture and the memory-accessing patterns of volume ray tracing lead to inefficient usage of GPU memory bandwidth and waste of capability of modern GPUs. To improve the ray tracing performance on GPU, we propose a lookup-table-based ray tracing method which is specially optimized towards the GPU memory system for processing a massive number of rays. The proposed method is based on a key observation that many of these applications normally involves a massive number of rays, but their ray tracing may not need to follow a specific execution order. Therefore, we divide the 3D space into many regions (called pyramids) and group together the rays falling into the same pyramid. For each ray group, the volume is rotated and resampled for their raytracing. This divide-and-rotate strategy allows the memory access of the ray tracing process to adopt a table-lookup approach and leads to better memory coalescing on GPU. Our proposed method was thoroughly evaluated in four volume setups with randomly-generated rays. The collapsed-cone convolution/superposition (CCCS) dose calculation method is also implemented with/without the proposed approach to verify the feasibility of our method. Compared with the direct GPU implementation of the popular 3DDDA algorithm, our method provides a speedup in the range of 1.91--2.94X for the volume settings we used. Major performance factors, including ray origins, volume size, and pyramid size, are also analyzed. The proposed technique was also found to be able to give a speedup of 1.61--2.17X over the original GPU implementation of the CCCS algorithm. Our experiment results indicate that the proposed approach is capable of offering better coalesced memory access which eventually boosts the raytracing performance on GPU. Moreover, our approach is conceptually simple and can be readily included into various applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Trans. Embed. Comput. Syst.

自引率

0.00%

发文量