Workshop on I/O in Parallel and Distributed Systems最新文献_第3页

Performance of the gallery parallel file system 库并行文件系统的性能

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236038

N. Nieuwejaar, D. Kotz

引用次数: 15

ENWRICH: a compute-processor write caching scheme for parallel file systems ENWRICH:一种并行文件系统的计算机处理器写缓存方案

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236034

A. Purakayastha, C. Ellis, D. Kotz

{"title":"ENWRICH: a compute-processor write caching scheme for parallel file systems","authors":"A. Purakayastha, C. Ellis, D. Kotz","doi":"10.1145/236017.236034","DOIUrl":"https://doi.org/10.1145/236017.236034","url":null,"abstract":"Many parallel scientific applications need high-performance I/O. Unfortunately, end-to-end parallel-I/O performance has not been able to keep up with substantial improvements in parallel-I/O hardware because of poor parallel file-system software. Many radical changes, both at the interface level and the implementation level, have recently been proposed. One such proposed interface is {em collective I/O}, which allows parallel jobs to request transfer of large contiguous objects in a single request, thereby preserving useful semantic information that would otherwise be lost if the transfer were expressed as per-processor non-contiguous requests. Kotz has proposed {em disk-directed I/O} as an efficient implementation technique for collective-I/O operations, where the compute processors make a single collective data-transfer request, and the I/O processors thereafter take full control of the actual data transfer, exploiting their detailed knowledge of the disk-layout to attain substantially improved performance. Recent parallel file-system usage studies show that writes to write-only files are a dominant part of the workload. Therefore, optimizing writes could have a significant impact on overall performance. In this paper, we propose ENWRICH, a compute-processor write-caching scheme for write-only files in parallel file systems. ENWRICH combines low-overhead write caching at the compute processors with high performance disk-directed I/O at the I/O processors to achieve both low latency and high bandwidth. This combination facilitates the use of the powerful disk-directed I/O technique independent of any particular choice of interface. By collecting writes over many files and applications, ENWRICH lets the I/O processors optimize disk I/O over a large pool of requests. We evaluate our design via simulated implementation and show that ENWRICH achieves high performance for various configurations and workloads.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122601500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 21

Tuning the performance of I/O-intensive parallel applications 调优I/ o密集型并行应用程序的性能

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236027

A. Acharya, Mustafa Uysal, R. Bennett, Assaf Mendelson, M. Beynon, J. Hollingsworth, J. Saltz, A. Sussman

引用次数: 98

Efficient data-parallel files via automatic mode detection 高效的数据并行文件通过自动模式检测

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236025

J. Moore, P. Hatcher, M. J. Quinn

引用次数: 3

Scalable message passing in Panda 熊猫中可伸缩的消息传递

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236042

Ying Chen, M. Winslett, K. Seamons, S. Kuo, Yong-Woon Cho, M. Subramaniam

{"title":"Scalable message passing in Panda","authors":"Ying Chen, M. Winslett, K. Seamons, S. Kuo, Yong-Woon Cho, M. Subramaniam","doi":"10.1145/236017.236042","DOIUrl":"https://doi.org/10.1145/236017.236042","url":null,"abstract":"To provide high performance for applications with a wide variety of i/o requirements and to support many different parallel platforms, the design of a parallel i/o system must provide for efficient utilization of available bandwidth both for disk traffic and for message passing. In this paper we discuss the message-passing scalability of the server-directed i/o architecture of Panda, a library for synchronized i/o of multidimensional arrays on parallel platforms. We show how to improve i/o performance in situations where messagepassing is a bottleneck, by combining the server-directed i/o strategy for highly efficient use of available disk bandwidth with new mechanisms to minimize internal communication and computation overhead in Panda. We present experimental results that show that with these improvements, Panda will provide high i/o performance for a wider range of applications, such as applications running with slow interconnects, applications performing i/o operations on large numbers of arrays, or applications that require drastic data rearrangements as data are moved between memory and disk (e.g., array transposition). We also argue that in the future, the improved approach to message-passing will allow Panda to support applications that are not closely synchronized or that run in heterogeneous environments.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129770964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

The design and implementation of SOLAR, a portable library for scalable out-of-core linear algebra computations 太阳能的设计和实现，一个可扩展的核心外线性代数计算的便携式库

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236029

Sivan Toledo, F. Gustavson

{"title":"The design and implementation of SOLAR, a portable library for scalable out-of-core linear algebra computations","authors":"Sivan Toledo, F. Gustavson","doi":"10.1145/236017.236029","DOIUrl":"https://doi.org/10.1145/236017.236029","url":null,"abstract":"SOLAR is a portable high-perfonnance library for out-of-core dense matrix computations. It combines portability with high perfonnance by using existing high-perfonnance in-core subroutine libraries and by using an optimized matrix input-output library. SOLAR works on parallel computers, workstations, and personal computers. It supports in-core computations on both shared-memory and distributed-memory machines, and its matrix input-output library supports both conventional 1/0 interfaces and parallel 110 interfaces. This paper discusses the overall design of SOLAR, its interfaces, and the design of several important subroutines. Experimental results show that SOLAR can factor on a single workstation an out-of-core positive-definite symmetric matrix at a rate exceeding 215 Mflops, and an out-of-core general matrix at a rate exceeding 195 Mflops. Less than 16% of the running time is spent on 110 in these computations. These results indicate that SOLAR's portability does not compromise its perfonnance. We expect that the combination of portability, modularity, and the use of a high-level 110 interface will make the library an important platfonn for research on out-of-core algorithms and on parallel 110.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133197322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 103

Bounds on the separation of two parallel disk models 两个平行磁盘模型的分离边界

Workshop on I/O in Parallel and Distributed Systems Pub Date : 1996-05-27 DOI: 10.1145/236017.236044

Chris Armen

{"title":"Bounds on the separation of two parallel disk models","authors":"Chris Armen","doi":"10.1145/236017.236044","DOIUrl":"https://doi.org/10.1145/236017.236044","url":null,"abstract":"The single-disk, D-head model of parallel I/0 was introduced by Agarwal and Vitter to analyze algorithms for problem instances that are too large to fit in primary memory. Subsequently Vitter and Shriver proposed a more realistic model in which the disk space is partitioned into D disks, with a single head per disk. To date, each problem for which there is a known optimal algorithm for both models has the same asymptotic bounds on both models. Therefore, it has been unknown whether the models are equivalent or whether the singledisk model is strictly more powerful. In this pape:r we provide evidence that the single-disk model is strictly more powerful. We prove a lower bound on any general simulation of the single-disk model on the multi-disk model and establish randomized and deterministic upper bounds. Let N be the problem size and let T be the number of parallel I/Os required by a program on the single-disk model. Then any simulation of this pro€:ram on the multi-disk model will require Q ( T 10~/:,C:Ct~b:~) parallel I/Os. This lower bound holds even if replication is allowed in the multi-disk model. *Department of Computer Science, University of Hartford, 200 Bloomfield Avenue, W. Hartford, CT 06117-1599. Email: armenGhartford.edu. This work was done while the author was a graduate student at Dartmouth College. Permission 10 make digital/hard copies of all or part of this material for personal or classroom use its granted without fee provided that the copies a.re not made or distributed for profit or commercial advantage, the copynght notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee. 10PADS'96, Philadelphia PA, USA o 1996 ACM 0-89791-813-4/96/05 .. $3.50 122 We also show an 0 Co~0f0~ D) randomized upper bound and an 0 (log D(log log D) ) deterministic upper bound. These results exploit an interesting analogy between the disk models and the PRAM and DCM models of parallel computation.","PeriodicalId":442608,"journal":{"name":"Workshop on I/O in Parallel and Distributed Systems","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131019475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8