{"title":"Remote interactive visualization and analysis (RIVA) using parallel supercomputers","authors":"Peggy Li, W. Duquette, D. Curkendall","doi":"10.1145/218327.218340","DOIUrl":"https://doi.org/10.1145/218327.218340","url":null,"abstract":"JPL's Remote Interactive Visualization and Analysis System (RIVA) is described in detail. RIVA's kernel is a highly scalable perspective renderer tailored especially for the demands of large datasets beyond the sensible reach of workstations. The algorithmic details of this renderer are described, particularly the aspects key to achieving the algorithm's overall scalability. The paper summarizes the performance achieved for machine sizes up to more than 500 nodes and for initial input image/terrain bases of up to a gigabyte. The RIVA system integrates workstation graphics, massively parallel computing technology, and gigabit communication networks to provide a flexible interactive environment for scientific data perusal, analysis and visualization. Early experience with using RIVA to interactively explore multivariate datasets is reported and some example results given.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"600 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116302136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronization for a multi-port frame buffer on a mesh-connected multicomputer","authors":"Bin Wei, Gordon Stoll, D. Clark, E. Felten, Kai Li, P. Hanrahan","doi":"10.1145/218327.218341","DOIUrl":"https://doi.org/10.1145/218327.218341","url":null,"abstract":"Parallel rendering on multicomputers involves the parallelization of geometry processing, rasterization and composition. A known approach to support the back end of parallel rendering on multicomputers is to connect a multi-port frame buffer directly to the multicomputer routing network to take advantage of the aggregate bandwidth available on the network. However, a multi-port frame buffer design raises the question of how to synchronize the processors with the frame buffer in order to perform global control operations. The challenge is to provide a simple and efficient synchronization algorithm that requires minimal hardware support. This paper describes a software-based solution to the synchronization problem for a multi-port frame buffer on the Paragon mesh routing network. Simulations on the Paragon multicomputer show that our algorithm is indeed efficient.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130938549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A load balanced SIMD polygon renderer","authors":"S. Whitman","doi":"10.1145/218327.218338","DOIUrl":"https://doi.org/10.1145/218327.218338","url":null,"abstract":"This document describes a parallel polygon rendering algorithm designed for a SIMD supercomputer architecture. The overall algorithm can be implemented on any SIMD machine; in this paper, we expand upon an implementation which is specific to the Princeton Engine, a product of the David Sarnoff Research Center. The algorithm has a number of improvements over previously developed SIMD renderers in that load balancing is considered and designed into the algorithm with minimal overhead.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130047094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time volume rendering on shared memory multiprocessors using the shear-warp factorization","authors":"P. Lacroute","doi":"10.1145/218327.218331","DOIUrl":"https://doi.org/10.1145/218327.218331","url":null,"abstract":"This paper presents a new parallel volume rendering algorithm that can render 256^3 voxel medical data sets at over 10 Hz and 128^3 voxel data sets at over 30 Hz on a 16-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently reported shear-warp serial volume rendering algorithm which is over five times faster than previous serial algorithms. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets. CR Categories: D.1.3 [Concurrent Programming]: Parallel Programming; I.3.3 [Computer Graphics]: Picture/Image Generation--Display Algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122536402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast volume rendering using an efficient, scalable parallel formulation of the shear-warp algorithm","authors":"M. Amin, A. Grama, Vineet Singh","doi":"10.1145/218327.218330","DOIUrl":"https://doi.org/10.1145/218327.218330","url":null,"abstract":"This paper presents a fast and scalable parallel algorithm for volume rendering and its implementation on distributed-memory parallel computers. This parallel algorithm is based on the shear-warp algorithm of Lacroute and Levoy. Coupled with optimizations that exploit coherence in the volume and image space, the shear-warp algorithm is currently acknowledged to be the fastest sequential volume rendering algorithm. We have designed a memory efficient parallel formulation of this algorithm that (1) drastically reduces communication requirements by using a novel data partitioning scheme and (2) improves multi-frame performance with an adaptive load-balancing technique. All the optimizations of the Lacroute-Levoy algorithm are preserved in the parallel formulation. The paper also provides an analytical model of performance for the parallel formulation that shows that it is possible to sustain excellent performance across a wide range of practical problem sizes and number of processors. Our implementation, running on a 128 processor TMC CM-5 distributed-memory parallel computer, renders a 256 voxel medical data set at 12 frames/sec.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131455012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the IEEE symposium on Parallel rendering","authors":"S. Uselton, M. Cox, C. Wittenbrink","doi":"10.1145/218327","DOIUrl":"https://doi.org/10.1145/218327","url":null,"abstract":"","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130250752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient parallel global illumination using density estimation","authors":"David Zareski, B. Wade, Philip M. Hubbard, P. Shirley","doi":"10.1145/218327.218336","DOIUrl":"https://doi.org/10.1145/218327.218336","url":null,"abstract":"This paper presents a multi-computer, parallel version of the recently-proposed \"Density Estimation\" (DE) global illumination method, designed for computing solutions of environments with high geometric complexity (as many as hundreds of thousands of initial surfaces). In addition to the diffuse inter-reflections commonly handled by conventional radiosity methods, this new method can also handle energy transport involving arbitrary non-diffuse surfaces. Output can either be Gouraud-shaded elements for interactive walkthroughs, or ray-traced images for higher quality still frames. The key difference of the DE algorithm from conventional radiosity, in terms of its ability to parallelize efficiently, is its microscopic view of energy transport, which avoids the O(n^2) pairwise surface interactions of most previous macroscopic radiosity algorithms (i.e., those without clustering). Parallel DE is implemented as two separate parallel programs which perform different phases of the DE method. The first program performs the particle-tracing phase, and the second performs the density-estimation and meshing phases. Each parallel program consists of a single master task and multiple worker tasks executing on separate workstations connected over a local area network. Communication is performed using the PVM software package and a shared file system. The goal of this effort is to provide a near-linear speedup for solutions to existing environment models using tens of processors. The parallel efficiency of the first program has been measured to be above 90% for as many as 16 workers, and the parallel efficiency of the second program has been measured to be above 70% for as many as 12 workers.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121403029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load balancing for a parallel radiosity algorithm","authors":"W. Stürzlinger, G. Schaufler, J. Volkert","doi":"10.1145/218327.218335","DOIUrl":"https://doi.org/10.1145/218327.218335","url":null,"abstract":"The radiosity method models the interaction of light between diffuse surfaces, thereby accurately predicting global illumination effects. Due to the high computational effort to calculate the transfer of light between surfaces and the memory requirements for the scene description, a distributed, parallelized version of the algorithm is needed for scenes consisting of thousands of surfaces. We present several load distribution schemes for such a parallel algorithm which includes progressive refinement and adaptive subdivision for fast solutions of high quality. The load is distributed before the calculations in a static way. During the computation the load is redistributed dynamically to make up for individual differences in processor loads. The dynamic load balancing scheme never generates more data packets than the original algorithm and avoids overloading processors through actions taken by the scheme.","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"146 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129867802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image composition methods for sort-last polygon rendering on 2-D mesh architectures","authors":"Tong-Yee Lee, C. Raghavendra, J. Nicholas","doi":"10.1145/218327.218337","DOIUrl":"https://doi.org/10.1145/218327.218337","url":null,"abstract":"In this paper, a new sort-last parallel polygon rendering implementation is given for 2-D mesh message-passing architectures such as the Intel Delta and Paragon. Our implementation provides a very fast rendering rate for extremely large sets of polygons, a requirement of scientific visualization, CAD/CAM, and many other applications. We implement and evaluate our scheme on the Intel Delta parallel computer at Caltech. Using 512 processors to render Eric Haines's SPD standard scenes, our scheme achieves a rendering rate of 2.8-4.0 million triangles/second. Keywords: Polygon Rendering, SPD, Delta, Load Balancing","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127372748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation results and analysis of a parallel progressive radiosity","authors":"P. Guitton, J. Roman, Gilles Subrenat","doi":"10.1145/218327.218334","DOIUrl":"https://doi.org/10.1145/218327.218334","url":null,"abstract":"The quality of synthetic images depends, first, on the quality of the modelling of the three-dimensional scenes to visualize; more numerous are the geometrical and optical details, more realistic are the resulting images. Unfortunately, such scene descriptions need a big amount of memory, as well as a long time of computation. In order to deal with these restrictions, we propose a parallel implementation for an extended stochastic progressive radiosity method, where form factors are computed with a ray tracing scheme, on a network of processors with a distributed memory and a message passing mechanism. Our program has already treated very big scenes (more than one million patches for example).","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127800653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}