Data Distribution Method for Fast Giga-scale Hologram Generation on a Multi-GPU Cluster

T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai
Published in: Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems
Publication date: 2018-07-23
DOI: 10.1145/3231104.3231105 (https://doi.org/10.1145/3231104.3231105)
Citations: 1

Abstract

The 3D holographic display has long been anticipated as a future human interface because it does not require users to wear special devices. However, in addition to the slow progress of display device technology, its heavy computational requirements have prevented such displays from being realized. A recent study indicates that objects and holograms of several giga-pixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first propose a new data distribution method that uses a basic FFT-based O(N log N) computation but needs no inter-node communication during the computation on a multi-GPU cluster. We then implement the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations achieve an 11.52-fold speed-up over the original single-node code. Furthermore, the multi-node optimizations, using 8 nodes with 2 GPUs per node, achieve an execution time of 4.28 s for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92-fold speed-up over sequential CPU processing using a conventional FFT-based algorithm.
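
The abstract names only "a basic FFT-based O(N log N) computation" and does not say which diffraction formulation the authors use. As a rough illustration of what such a computation can look like, the sketch below assumes the angular spectrum method, one common FFT-based way to propagate a wavefront from an object plane to a hologram plane. The function names, the 8 x 8 grid, and the wavelength, pixel pitch, and distance are all hypothetical choices made for this example, not values from the paper, and the paper's data distribution scheme and GPU optimizations are not reproduced here. The hand-written radix-2 FFT is used only to keep the sketch free of dependencies; a practical implementation would call an optimized FFT library.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]
    if inverse:
        # Normalize the inverse transform by the total number of samples.
        n = len(a) * len(a[0])
        out = [[v / n for v in r] for r in out]
    return out


def fft_freqs(n, pitch):
    """Spatial frequencies (cycles per metre) in FFT output order."""
    return [(k if k < n // 2 else k - n) / (n * pitch) for k in range(n)]


def propagate(field, wavelength, pitch, distance):
    """Angular spectrum propagation of a complex 2D field (assumed method)."""
    ny, nx = len(field), len(field[0])
    fx, fy = fft_freqs(nx, pitch), fft_freqs(ny, pitch)
    spectrum = fft2(field)
    for j in range(ny):
        for i in range(nx):
            arg = 1.0 / wavelength ** 2 - fx[i] ** 2 - fy[j] ** 2
            if arg > 0:
                # Propagating component: apply a pure phase delay.
                phase = 2j * math.pi * distance * math.sqrt(arg)
                spectrum[j][i] *= cmath.exp(phase)
            else:
                # Evanescent component: discarded in this sketch.
                spectrum[j][i] = 0j
    return fft2(spectrum, inverse=True)


if __name__ == "__main__":
    # Hypothetical toy object: a single point source on an 8 x 8 grid.
    obj = [[0j] * 8 for _ in range(8)]
    obj[4][4] = 1 + 0j
    holo = propagate(obj, wavelength=532e-9, pitch=8e-6, distance=0.05)
    print(sum(abs(v) ** 2 for row in holo for v in row))
```

Each call costs two 2D FFTs plus one pass over the spectrum, which is where the O(N log N) behaviour comes from. At giga-pixel sizes a single transform of this kind would not fit on one device, which is the situation the paper's data distribution method is meant to handle.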