Real-Time Volume Rendering on Shared Memory Multiprocessors Using the Shear-Warp Factorization

P. Lacroute
DOI: 10.1145/218327.218331 (https://doi.org/10.1145/218327.218331)
Venue: Proceedings of the IEEE Symposium on Parallel Rendering
Published: 1995-12-01
Citations: 121

Abstract

This paper presents a new parallel volume rendering algorithm that can render 256³ voxel medical data sets at over 10 Hz and 128³ voxel data sets at over 30 Hz on a 16-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently reported shear-warp serial volume rendering algorithm, which is over five times faster than previous serial algorithms. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets.

CR Categories: D.1.3 [Concurrent Programming]: Parallel Programming; I.3.3 [Computer Graphics]: Picture/Image Generation--Display Algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism.
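The abstract names dynamic load balancing and a task partition with few synchronization events, but does not describe how they work. The sketch below is only one plausible reading, not the paper's implementation. It assumes the image is split into contiguous scanline chunks, that idle workers pull the next chunk from a shared queue, and that the only barrier is a single join at the end of the frame. `render_scanlines`, `render_frame`, `chunk_size`, and the per-row "pixel" value are all hypothetical names and placeholders introduced for illustration.

```python
import threading
from queue import Queue, Empty


def render_scanlines(first, last):
    """Hypothetical stand-in for compositing scanlines [first, last).

    Returns (row, value) pairs; the value y * y is a placeholder, not a pixel.
    """
    return [(y, y * y) for y in range(first, last)]


def render_frame(num_scanlines, chunk_size, num_workers):
    """Render one frame by letting workers pull scanline chunks on demand."""
    # Assumed task partition: contiguous chunks of scanlines.
    tasks = Queue()
    for first in range(0, num_scanlines, chunk_size):
        tasks.put((first, min(first + chunk_size, num_scanlines)))

    image = [None] * num_scanlines
    chunks_done = [0] * num_workers  # how many chunks each worker processed

    def worker(worker_id):
        while True:
            try:
                first, last = tasks.get_nowait()  # dynamic assignment
            except Empty:
                return  # no work left for this frame
            for y, value in render_scanlines(first, last):
                image[y] = value  # chunks cover disjoint rows, so no lock is needed
            chunks_done[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the only synchronization point per frame in this sketch
    return image, chunks_done


if __name__ == "__main__":
    image, chunks_done = render_frame(num_scanlines=256, chunk_size=8, num_workers=4)
    print("rows rendered:", sum(v is not None for v in image))
    print("chunks per worker:", chunks_done)
```

Because of CPython's global interpreter lock, this illustrates the scheduling pattern only and will not show a real speedup. The chunk size sets the trade-off: larger chunks mean fewer queue operations, while smaller chunks even out the load when some scanlines cost more than others.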