{"title":"ParColl: Partitioned Collective I/O on the Cray XT","authors":"Weikuan Yu, J. Vetter","doi":"10.1109/ICPP.2008.76","DOIUrl":null,"url":null,"abstract":"Collective I/O orchestrates I/O from parallel processes by aggregating fine-grained requests into large ones. However, its performance is typically a fraction of the potential I/O bandwidth on large scale platforms such as Cray XT. Based on our analysis, the time spent in global process synchronization dominates the actual time in file reads/writes, which imposes a 'collective wall' on the performance of collective I/O. In this paper, we introduce a novel technique called partitioned collective I/O (ParColl). ParColl augments the original two-phase collective I/O protocol with new mechanisms for file area partitioning, I/O aggregator distribution and intermediate file views. Through these mechanisms, a group of processes and their targeted file are consistently divided into a collection of small subgroups, each performing I/O aggregation in a disjoint manner. File consistency is maintained through intermediate file views when necessary. Together, these mechanisms greatly reduce the cost of global synchronization. Our experimental results demonstrate that ParColl significantly improves the performance and the scalability of collective I/O. In one case, we show a 416% improvement on 1024 processes for a visualization I/O benchmark. We also show that the I/O patterns in scientific applications can benefit significantly from this technique, e.g. BT-I/O and Flash I/O.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 37th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2008.76","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35
Abstract
Collective I/O orchestrates I/O from parallel processes by aggregating fine-grained requests into large ones. However, its performance is typically a fraction of the potential I/O bandwidth on large scale platforms such as Cray XT. Based on our analysis, the time spent in global process synchronization dominates the actual time in file reads/writes, which imposes a 'collective wall' on the performance of collective I/O. In this paper, we introduce a novel technique called partitioned collective I/O (ParColl). ParColl augments the original two-phase collective I/O protocol with new mechanisms for file area partitioning, I/O aggregator distribution and intermediate file views. Through these mechanisms, a group of processes and their targeted file are consistently divided into a collection of small subgroups, each performing I/O aggregation in a disjoint manner. File consistency is maintained through intermediate file views when necessary. Together, these mechanisms greatly reduce the cost of global synchronization. Our experimental results demonstrate that ParColl significantly improves the performance and the scalability of collective I/O. In one case, we show a 416% improvement on 1024 processes for a visualization I/O benchmark. We also show that the I/O patterns in scientific applications can benefit significantly from this technique, e.g. BT-I/O and Flash I/O.