{"title":"Scheduling with QoS in parallel I/O systems","authors":"Ajay Gulati, P. Varman","doi":"10.1145/1162628.1162629","DOIUrl":"https://doi.org/10.1145/1162628.1162629","url":null,"abstract":"Parallel I/O architectures are increasingly deployed for high performance computing and in shared data centers. In these environments it is desirable to provide QoS-based allocation of disk bandwidth to different applications sharing the I/O system. In this paper, we introduce a model of disk bandwidth allocation, and provide efficient scheduling algorithms to assign the bandwidth among the concurrent applications.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130845768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel out-of-core computing system using PVFS for Linux clusters","authors":"Jianqi Tang, Binxing Fang, Mingzeng Hu, Hongli Zhang","doi":"10.1145/1162628.1162633","DOIUrl":"https://doi.org/10.1145/1162628.1162633","url":null,"abstract":"Cluster systems have become a popular approach to parallel computing. More and more scientists and engineers use clusters to solve problems with large data sets because of their high processing power, low price and good scalability. Since traditional out-of-core programs are difficult to write and the virtual memory system does not perform well, we developed a parallel out-of-core computing system using PVFS, named POCCS. POCCS provides a convenient interface for writing out-of-core code and a global view of the out-of-core data. The software architecture, data storage model and system implementation are described in this paper. The experimental results show that POCCS extends the problem sizes that can be solved, and that its performance is better than that of the virtual memory system when the data set is large.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126549983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case for virtualized arrays of RAID","authors":"A. Brinkmann, Kay Salzwedel, Mario Vodisek","doi":"10.1145/1162628.1162630","DOIUrl":"https://doi.org/10.1145/1162628.1162630","url":null,"abstract":"Redundant arrays of independent disks, also called RAID arrays, have gained wide popularity in the last twenty years. Most of the disks used in the server market are currently based on RAID technology. The primary reason for introducing RAID technology in 1988 was that large disk systems had become much slower and more expensive than connecting a large number of inexpensive disks and using them as an array. The times seem to repeat themselves. Today, large-scale RAID arrays have become incredibly big and expensive, and it makes sense to replace them with a collection of smaller, inexpensive arrays of JBODs or mid-range RAID arrays. In this paper we show that combining these systems with state-of-the-art virtualization technology can lead to a system that is faster and less expensive than an enterprise storage system, while being as easy to manage and as reliable. To this end, we outline the most important features of storage management and compare their realization in enterprise-class storage systems and in current and future virtualization environments.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133624094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A performance-oriented energy efficient file system","authors":"Dong Li, Jun Wang","doi":"10.1145/1162628.1162636","DOIUrl":"https://doi.org/10.1145/1162628.1162636","url":null,"abstract":"Current general-purpose file systems emphasize the consistency of standard file system semantics and performance rather than energy efficiency. In this paper we present a novel energy-efficient file system called EEFS that both reduces energy consumption and improves performance by separately managing small files that exhibit good group access locality. To maintain compatibility, EEFS consists of two working modules, a normal Unix-like File System (UFS) and a group-structured file system (GFS), both transparent to user applications. EEFS contributes a new grouping policy that constructs file groups with group access locality and is used to migrate files between UFS and GFS. Comprehensive trace-driven simulation experiments show that EEFS achieves energy savings of up to 50% compared with the general-purpose UNIX file system, while simultaneously delivering file I/O performance improvements of up to 21%.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131733534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtualization with prefetching abilities based on iSCSI","authors":"Peter Bleckmann, Gunnar Schomaker, A. Slowik","doi":"10.1145/1162628.1162634","DOIUrl":"https://doi.org/10.1145/1162628.1162634","url":null,"abstract":"The Internet-SCSI protocol [iSCSI] allows a client to interact with a remote SCSI-capable target by means of block-oriented commands encapsulated within TCP/IP packets. Thereby, iSCSI greatly simplifies storage virtualization, since clients can access storage in a unified manner, no matter whether the I/O-path is short or long distance. Intermediate devices located on the path between a client and a target can easily intercept iSCSI sessions and rewrite packets for the sake of load balancing, prefetching, or redundancy, to mention just a few beneficial applications. Within this paper we describe the design and implementation of such an iSCSI capable intermediate device that deploys prefetching strategies in combination with redundant disks to reduce average I/O-latency. Depending on its location within the network, this virtualization and prefetching device can hide wide area access latency and reduce network contention targeting remote SCSI-devices to a large extent.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123229548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of iSCSI target software","authors":"Fujita Tomonori, Ogawara Masanori","doi":"10.1145/1162628.1162632","DOIUrl":"https://doi.org/10.1145/1162628.1162632","url":null,"abstract":"We analyzed the design and performance of iSCSI storage systems, built into general purpose operating systems. Our experiments revealed that a storage system that uses specialized functions, in conjunction with the modified operating system, outperforms a storage system that only uses the standard functions provided by the operating system. However, our results also show that careful design enables the latter approach to provide a comparable performance to that of the former, in common workloads.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125640532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An overview on MEMS-based storage, its research issues and open problems","authors":"Yifeng Zhu","doi":"10.1145/1162628.1162635","DOIUrl":"https://doi.org/10.1145/1162628.1162635","url":null,"abstract":"A disruptive new storage technology based on Microelectromechanical Systems (MEMS) is emerging as an exciting complement to the memory hierarchy. This study reviews and summarizes current research on integrating this new technology into computer systems at four levels: device, architecture, system and application. In addition, several potential research issues in MEMS storage are identified, including (1) exploiting idle read/write tips to perform prefetching, (2) reverse access to save seek time, (3) fault-tolerance design inside storage devices, (4) power consumption modeling, and (5) reevaluation of existing disk-oriented I/O optimization algorithms.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130302530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAMS: a RDMA-enabled I/O cache architecture for clustered network servers","authors":"Peng Gu, Jun Wang","doi":"10.1145/1162628.1162637","DOIUrl":"https://doi.org/10.1145/1162628.1162637","url":null,"abstract":"Previous studies show that intra-cluster communication easily becomes a major performance bottleneck for a wide range of small write-sharing workloads, especially read-only workloads, in modern clustered network servers. Remote Direct Memory Access (RDMA) has been recommended by many researchers to address this problem, but how to make good use of RDMA is still an open question. This paper proposes a novel solution that boosts intra-cluster communication performance through an RDMA-enabled collaborative I/O cache architecture called RAMS, which caches the most recently used RDMA-based intra-cluster data transfers for future reuse. RAMS makes two major contributions to facilitate RDMA deployment: 1) it designs a novel RDMA-based user-level buffer cache architecture that caches both intra-cluster transferred data and data references; 2) it develops three propagated-update protocols that address the RDMA read failure problem. Comprehensive experimental results show that the three proposed update protocols of RAMS can cut the RDMA read failure rate by 75% and indirectly boost system throughput by more than 50%, compared with a baseline system using Remote Procedure Call (RPC).","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131654497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing the capacity of RAID5 by online gradual assimilation","authors":"J. González, Toni Cortes","doi":"10.1145/1162628.1162631","DOIUrl":"https://doi.org/10.1145/1162628.1162631","url":null,"abstract":"Level-5 disk arrays (RAID5) are very commonly used in many environments. This kind of array offers parallel access, fault tolerance and little space wasted on redundancy. Nevertheless, this storage architecture has a problem when more disks have to be added to the array: currently, there is no simple, efficient and online mechanism to add any number of new disks (rather than replacing them), and this is an important drawback in systems that cannot be stopped when storage capacity needs to be increased. We propose an algorithm to add N disks to an array while it continues running. The proposed algorithm for gradual assimilation of disks has three major advantages: its overhead is easily controlled, it allows the user to benefit from the higher parallelism achieved by the part of the array that has already been converted, and it can be used in 24/7 systems.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131854257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demotion-based exclusive caching through demote buffering: design and evaluations over different networks","authors":"Jiesheng Wu, P. Wyckoff, D. Panda","doi":"10.1145/1162618.1162627","DOIUrl":"https://doi.org/10.1145/1162618.1162627","url":null,"abstract":"Multi-level buffer cache architectures have been widely deployed in today's multiple-tier computing environments. However, caches at different levels are inclusive. To make better use of these caches and to achieve performance commensurate with the aggregate cache size, exclusive caching has been proposed. Demotion-based exclusive caching [1] introduces a DEMOTE operation to transfer blocks discarded by an upper-level cache to a lower-level cache. In this paper, we propose a DEMOTE buffering mechanism over storage networks that reduces the visible costs of DEMOTE operations and provides more flexibility for optimizations. We evaluate the performance of DEMOTE buffering using simulations across both synthetic and real-life workloads on three different networks and protocol layers (TCP/IP on Fast Ethernet, IBNice on InfiniBand, and VAPI on InfiniBand). Our results show that DEMOTE buffering can effectively hide demotion costs: a maximum speedup of 1.4x over the original DEMOTE approach is achieved for some workloads, and speedups of 1.08--1.15x for two real-life workloads. The performance gains result from overlapping demotions with other activities, fewer communication operations, and high utilization of the network bandwidth.","PeriodicalId":447113,"journal":{"name":"International Workshop on Storage Network Architecture and Parallel I/Os","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123731471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}