Yuan Tian, S. Klasky, Weikuan Yu, H. Abbasi, Bin Wang, N. Podhorszki, R. Grout, M. Wolf
SMART-IO: SysteM-AwaRe Two-Level Data Organization for Efficient Scientific Analytics
2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2012-08-07
DOI: 10.1109/MASCOTS.2012.30
Citations: 14
Abstract
Current I/O techniques have pushed write performance close to the system peak, but they usually overlook the read side of the problem. With the mounting needs of scientific discovery, it is important to provide good read performance for many common access patterns. Such demand requires an organization scheme that can effectively utilize the underlying storage system. However, the mismatch between the conventional on-disk data layout and common scientific access patterns leads to significant performance degradation when a subset of the data is accessed. To this end, we design a system-aware Optimized Chunking model, which aims to find a data organization that strikes a good balance between data transfer efficiency and processing overhead. To enable such a model for scientific applications, we propose SMART-IO, a two-level data organization framework that can organize the blocks of multidimensional data efficiently. This scheme can adapt data layouts to the data characteristics and the underlying storage system, enabling efficient scientific analytics. Our experimental results demonstrate that SMART-IO can significantly improve read performance for challenging access patterns and speed up data analytics. For S3D, a mission-critical combustion simulation code, SMART-IO achieves up to a 72-fold speedup for planar reads of a 3-D variable compared to the logically contiguous data layout.
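The abstract does not spell out the Optimized Chunking cost model, so the Python sketch below is only a toy illustration of the general trade-off it describes, not SMART-IO's actual model. It assumes two hypothetical layouts for a 3-D array: a logically contiguous C-order layout, and a chunked layout where equal-sized chunks are stored back to back and every chunk that intersects the requested plane is read whole. It counts I/O requests (maximal contiguous runs, so adjacent chunks merge into one request) and elements transferred. The function names and the cost metric are illustrative assumptions.

```python
from itertools import product


def count_runs(offsets):
    """Count maximal runs of consecutive integers; each run models one I/O request."""
    runs, prev = 0, None
    for off in sorted(offsets):
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


def plane_coords(dims, axis, index):
    """Return an iterator over coordinates on the plane where coordinate `axis` == index."""
    ranges = [range(n) for n in dims]
    ranges[axis] = range(index, index + 1)
    return product(*ranges)


def contiguous_cost(shape, axis, index):
    """(requests, elements transferred) for one plane of a C-order 3-D array."""
    _, ny, nz = shape
    offsets = [(x * ny + y) * nz + z for x, y, z in plane_coords(shape, axis, index)]
    return count_runs(offsets), len(offsets)


def chunked_cost(shape, chunk, axis, index):
    """(requests, elements transferred) when equal-sized chunks are stored back to
    back in C order and every chunk that intersects the plane is read whole.
    Hypothetical layout, assumed for illustration only."""
    if any(n % c for n, c in zip(shape, chunk)):
        raise ValueError("this sketch assumes chunk sizes divide the array shape")
    grid = [n // c for n, c in zip(shape, chunk)]
    _, gy, gz = grid
    # The plane touches one slab of chunks along `axis`.
    touched = plane_coords(grid, axis, index // chunk[axis])
    chunk_ids = [(i * gy + j) * gz + k for i, j, k in touched]
    volume = chunk[0] * chunk[1] * chunk[2]
    # Adjacent chunk ids are contiguous on disk, so they merge into one request.
    return count_runs(chunk_ids), len(chunk_ids) * volume


if __name__ == "__main__":
    shape, chunk = (64, 64, 64), (16, 16, 16)
    for axis in range(3):
        print("plane axis", axis,
              "contiguous:", contiguous_cost(shape, axis, 0),
              "chunked:", chunked_cost(shape, chunk, axis, 0))
```

For a 64×64×64 array with 16×16×16 chunks, the contiguous layout needs 1, 64, or 4,096 requests depending on which axis the plane is orthogonal to, while the chunked layout needs 1, 4, or 16 requests but transfers 16 times more elements than the plane contains. Weighing fewer, larger requests against the extra data moved and processed, relative to the storage system's characteristics, is the kind of balance the abstract attributes to the Optimized Chunking model.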