{"title":"GASS: a data movement and access service for wide area computing systems","authors":"J. Bester, Ian T Foster, C. Kesselman, J. Tedesco, S. Tuecke","doi":"10.1145/301816.301839","DOIUrl":"https://doi.org/10.1145/301816.301839","url":null,"abstract":"In wide area computing, programs frequently execute at sites that are distant from their data. Data access mechanisms are required that place limited functionality demands on an application or host system yet permit high-performance implementations. To address these requirements, we propose a data movement and access service called Global Access to Secondary Storage (GASS). This service defines a global name space via Uniform Resource Locators and allows applications to access remote files via standard I/O interfaces. High performance is achieved by incorporating default data movement strategies that are specialized for I/O patterns common in wide area applications and by providing support for programmer management of data movement. GASS forms part of the Globus toolkit, a set of services for high-performance distributed computing. GASS itself makes use of Globus services for security and communication, and other Globus components use GASS services for executable staging and real-time remote monitoring. Application experiences demonstrate that the library has practical utility.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115462559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Round-like behavior in multiple disks on a bus","authors":"Rakesh D. Barve, Phillip B. Gibbons, B. Hillyer, Yossi Matias, Elizabeth A. M. Shriver, J. Vitter","doi":"10.1145/301816.301821","DOIUrl":"https://doi.org/10.1145/301816.301821","url":null,"abstract":"In modern I/O architectures, multiple disk drives are attached to each I/O bus. Under I/O-intensive workloads, the disk latency for a request can be overlapped with the disk latency and data transfers of requests to other disks, potentidly resulting in an aggregate I/O throughput at nearly bus bandwidth. This paper reports on a performance impairment that results from a previously unknown form of convoy behavior in disk I/O, which we call munds. In rounds, independent requests to distinct disks convoy, so that each disk services one request before any disk services its next re quest. We analyze log tiles to describe read performance of multiple Seagate Wren-7 disks that share a SCSI bus under a heavy workload, demonstrating the rounds behavior and quantifying its performance impact.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126825309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On implementing MPI-IO portably and with high performance","authors":"R. Thakur, W. Gropp, E. Lusk","doi":"10.1145/301816.301826","DOIUrl":"https://doi.org/10.1145/301816.301826","url":null,"abstract":"We discuss the issues involved in implementing MPI-IO portably on multiple machines and file systems and also achieving high performance. One way to implement MPI-IO portably is to implement it on top of the basic Unix I/O functions (open, lseek, read, write, and close), which are themselves portable. We argue that this approach has limitations in both functionality and performance. We instead advocate an implementation approach that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems. We have used such an approach to develop a high-performance, portable MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, and file preallocation. We describe how we implemented each of these features on various machines and file systems. The machines we consider are the HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, SGI Origin2000, and networks of workstations; and the file systems we consider are HP HFS, IBM PIOFS, Intel PFS, NEC SFS, SGI XFS, NFS, and any general Unix file system (UFS). We also present our thoughts on how a file system can be designed to better support MPI-IO. 
We provide a list of features desired from a file system that would help in implementing MPI-IO correctly and with high performance.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"268 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127774274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster I/O with River: making the fast case common","authors":"Remzi H. Arpaci-Dusseau, Eric Anderson, N. Treuhaft, D. Culler, J. Hellerstein, D. Patterson, K. Yelick","doi":"10.1145/301816.301823","DOIUrl":"https://doi.org/10.1145/301816.301823","url":null,"abstract":"We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case — even in the face of nonuniformities in hardware, software, and workload. River is based on two simple design features: a high-performance distributed queue, and a storage redundancy mechanism called graduated declustering. We have implemented a number of data-intensive applications on River, which validate our design with near-ideal performance in a variety of non-uniform performance scenarios.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"268 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132311735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The impact of spatial layout of jobs on parallel I/O performance","authors":"Jens Mache, V. Lo, M. Livingston, Sharad Garg","doi":"10.1145/301816.301830","DOIUrl":"https://doi.org/10.1145/301816.301830","url":null,"abstract":"Input/Output is a big obstacle to effective use of tenflopsscale computing systems, Motivated by earlier parallel I/O meaurements on an Intel TFLOPS machine, we conduct studies to determine the sensitivity of parallel I/O performance on multi-progmmmed mesh-connected machines with respect to number of I/O nodes, number of compute nodes, network link bandwidth, I/O node bandwidth, spatial layout of jobs, and read or write demands of applications. Our extensive simulations and analytical modeling yield important insights into the limitations on parallel I/O performance due to network contention, and into the possible gains in parallel I/O performance that can be achieved by tuning the spatial layout of jobs. Applying these results, we devise a new processor allocation strategy that is sensitive to parallel I/O traffic and the resulting network contention. In performance evaluations driven by synthetic workloads and by a real workload trace captured at the San Diego Supercomputing Center, the new strategy improves the average response time of parallel I/O intensive jobs by up to a factor of 4.5.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133685949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thread scheduling for out-of-core applications with memory server on multicomputers","authors":"Yuanyuan Zhou, Limin Wang, D. Clark, Kai Li","doi":"10.1145/301816.301833","DOIUrl":"https://doi.org/10.1145/301816.301833","url":null,"abstract":"Out-of-core applications perform poorly in paged virtual memory (VM) systems because demand paging involves slow disk I/O accesses. Much research has been done on reducing the I/O overhead in such applications by either reducing the number of I/Os or lowering the cost of each I/O operation. In this paper, we investigate a method that combines finegrained threading with a memory server model to improve the performance of out-of-core applications on multicomputers. The memory server model decreases the average cost of I/O operations by paging to remote memory, while the fine-grained thread scheduling reduces the number of I/O accesses by improving the data locality of applications. We have evaluated this method on an Intel Paragon with 7 applications. Our results show that the memory server system performs better than the VM disk paging by a factor of 5 for sequential applications and a factor of 1.5 to 2.2 for parallel applications. The fine-grained threading alone improves the VM disk paging performance by a factor of 10 and 1.2 to 3 respectively for sequential and parallel applications. 
Overall, the combination of these two techniques outperforms the VM disk paging by more than a factor of 12 for sequential applications and a factor of 3 to 6 for parallel applications.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114517934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient input and output for scientific simulations","authors":"S. Kuo, M. Winslett, Yong Cho, Jonghyun Lee, Ying Chen","doi":"10.1145/301816.301828","DOIUrl":"https://doi.org/10.1145/301816.301828","url":null,"abstract":"Large simulations which run for hundreds of hours on paralle l computers often periodically generate snapshots of states, wh ich are later post-processed to visualize the simulated physical p henomenon. For many applications, fast I/O during post-processing, wh ich is dependent on an efficient organization of data on disk, is as i mportant as minimizing computation-time I/O. In this paper we pr opose optimizations to support efficient parallel I/O for scienti fic simulations and subsequent visualizations. We present an orderin g mechanism to linearize data on disk, a performance model to help t o choose a proper stripe unit size, and a scheduling algorithm to inimize communication contention. Our experiments on an IBM S P show that the combination of these strategies provides a 2025% performance boost.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125993000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart file objects: a remote file access paradigm","authors":"J. Weissman","doi":"10.1145/301816.301842","DOIUrl":"https://doi.org/10.1145/301816.301842","url":null,"abstract":"This paper describes o new scheme for remote file access called Smart File Objects (SFO). The SF0 is an object-oriented application-specific file access paradigm designed to attack the bottleneck imposed by high Iotency low bandwidth networks such (IS wide-area and wireless networks. The SF0 uses application and network inform&on to adaptively prefetch needed data in pnmllel with the aecurion of the application. The SF0 con offer additional advantages such (IS non-blocking I/O, bulk I/O, improved& access APIs, and increased relinbiliry We describe the SF0 concept, a prototype implementation in the Mentat system, and the results obtained with o distributed gene sequence application running across the Internet and vBNS. The results show the P otential of the SF0 opprooch to improve application performance.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124945440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design issues of a cooperative cache with no coherence problems","authors":"Toni Cortes, S. Girona, Jesús Labarta","doi":"10.1145/266220.266224","DOIUrl":"https://doi.org/10.1145/266220.266224","url":null,"abstract":"In this paper, we examine some of the important problems observed in the design of cooperative caches. Solutions to the coherence, load-balancing and fault-tolerance problems are presented. These solutions have been implemented as a part of PAFS, a parallel/distributed file system, and its performance has been compared to the one achieved by xFS. Using the comparison results, we have observed that the proposed ideas not only solve the main problems of cooperative caches, but also increase the overall system performance. Although the solutions presented in this paper were targeted to a parallel machine, reasonable good results have also been obtained for networks of workstations.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129659295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)","authors":"T. Cormen, J. Wegmann, D. Nicol","doi":"10.1145/266220.266227","DOIUrl":"https://doi.org/10.1145/266220.266227","url":null,"abstract":"This paper extends an earlier out-of-core Fast Fourier Transform (FFT) method for a uniprocessor with the Parallel Disk Model (PDM) to use multiple processors. Four out-of-core multiprocessor methods are examined. Operationally, these methods di er in the size of minibutter y\" computed in memory and how the data are organized on the disks and in the distributed memory of the multiprocessor. The methods also perform di ering amounts of I/O and communication. Two of them have the remarkable property that even though they are computing the FFT on a multiprocessor, all interprocessor communication occurs outside the mini-butter y computations. Performance results on a small workstation cluster indicate that except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butter y computations require approximately 86% of the time of those that do. Moreover, the faster methods are much easier to implement.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129893301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}