T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai
{"title":"多gpu集群上快速生成千兆级全息图的数据分布方法","authors":"T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai","doi":"10.1145/3231104.3231105","DOIUrl":null,"url":null,"abstract":"The 3D holographic display has long been expected as a future human interface as it does not require users to wear special devices. However, in addition to the delay of display device technology, its heavy computation requirement prevents the realization of such displays. A recent study says that objects and holograms with several giga-pixels should be processed in real time for the realization of high resolution and wide view angle. To this problem, first, we have proposed a new data distribution method that utilizes a basic FFT-based O(N log N) computation but does not need any inter-node communications during the computation on a multi-GPU cluster. Then, we have implemented the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations attain 11.52 times speed-up from the original single node code. Further, multi-node optimizations using 8 nodes, 2 GPUs per node, attain the execution time of 4.28 sec. for generating 1.6 giga-pixel hologram from 3.2 giga-pixel object. It means 237.92 times speed-up of the sequential processing by CPU using a conventional FFT-based algorithm.","PeriodicalId":164914,"journal":{"name":"Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Data Distribution Method for Fast Giga-scale Hologram Generation on a Multi-GPU Cluster\",\"authors\":\"T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai\",\"doi\":\"10.1145/3231104.3231105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The 3D holographic display has long been expected as a future human interface as it does not require users to wear special devices. However, in addition to the delay of display device technology, its heavy computation requirement prevents the realization of such displays. A recent study says that objects and holograms with several giga-pixels should be processed in real time for the realization of high resolution and wide view angle. To this problem, first, we have proposed a new data distribution method that utilizes a basic FFT-based O(N log N) computation but does not need any inter-node communications during the computation on a multi-GPU cluster. Then, we have implemented the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations attain 11.52 times speed-up from the original single node code. Further, multi-node optimizations using 8 nodes, 2 GPUs per node, attain the execution time of 4.28 sec. for generating 1.6 giga-pixel hologram from 3.2 giga-pixel object. It means 237.92 times speed-up of the sequential processing by CPU using a conventional FFT-based algorithm.\",\"PeriodicalId\":164914,\"journal\":{\"name\":\"Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems\",\"volume\":\"118 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3231104.3231105\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3231104.3231105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Data Distribution Method for Fast Giga-scale Hologram Generation on a Multi-GPU Cluster
The 3D holographic display has long been expected as a future human interface as it does not require users to wear special devices. However, in addition to the delay of display device technology, its heavy computation requirement prevents the realization of such displays. A recent study says that objects and holograms with several giga-pixels should be processed in real time for the realization of high resolution and wide view angle. To this problem, first, we have proposed a new data distribution method that utilizes a basic FFT-based O(N log N) computation but does not need any inter-node communications during the computation on a multi-GPU cluster. Then, we have implemented the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations attain 11.52 times speed-up from the original single node code. Further, multi-node optimizations using 8 nodes, 2 GPUs per node, attain the execution time of 4.28 sec. for generating 1.6 giga-pixel hologram from 3.2 giga-pixel object. It means 237.92 times speed-up of the sequential processing by CPU using a conventional FFT-based algorithm.