Efficient I/O for Neural Network Training with Compressed Data
Zhao Zhang, Lei Huang, J. G. Pauloski, Ian T. Foster
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2020, pp. 409–418. DOI: 10.1109/IPDPS47924.2020.00050
FanStore is a shared object store that enables efficient and scalable neural network training on supercomputers. By providing a global cache layer on node-local burst buffers using a compressed representation, it significantly enhances the processing capability of deep learning (DL) applications on existing hardware. In addition, FanStore allows POSIX-compliant file access to the compressed data in user space. We investigate the tradeoff between runtime overhead and data compression ratio using real-world datasets and applications, and propose a compressor selection algorithm that maximizes storage capacity under performance constraints. We consider both asynchronous (i.e., with prefetching) and synchronous I/O strategies, and propose compressor-selection mechanisms for both. With FanStore, the same storage hardware can host 2–13× more data for the example applications, without significant runtime overhead. Our experiments show that FanStore scales to 512 compute nodes with near-linear performance.
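The compressor selection described in the abstract trades storage capacity against runtime overhead: a stronger compressor packs more data onto the node-local burst buffers, but it only helps if decompression keeps pace with the rate at which training consumes samples. Below is a minimal Python sketch of that idea, assuming a simple benchmark-and-filter heuristic over standard-library compressors; the candidate set, the measurement procedure, and the selection rule are illustrative assumptions, not FanStore's published algorithm.

import time
import zlib
import bz2
import lzma

# Hypothetical candidate set: (compress, decompress) pairs from the
# Python standard library, standing in for whatever compressors a
# FanStore-like system would actually benchmark.
CANDIDATES = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2":  (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def select_compressor(sample: bytes, required_mb_per_s: float):
    """Pick the compressor with the highest compression ratio whose
    measured decompression throughput on `sample` still meets the
    training pipeline's consumption rate (the performance constraint)."""
    best_name, best_ratio = None, 0.0
    for name, (compress, decompress) in CANDIDATES.items():
        blob = compress(sample)
        start = time.perf_counter()
        decompress(blob)
        elapsed = time.perf_counter() - start
        throughput = len(sample) / elapsed / 1e6  # MB/s of decompressed data
        ratio = len(sample) / len(blob)           # higher ratio = more capacity
        if throughput >= required_mb_per_s and ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name  # None if no candidate satisfies the constraint

# Example: require 200 MB/s of decompressed data to keep training fed.
sample = bytes(range(256)) * 4096  # synthetic stand-in for a real training sample
print(select_compressor(sample, required_mb_per_s=200.0))

Under this heuristic, a synchronous I/O strategy would set the required throughput to the training loop's full consumption rate, while an asynchronous strategy with prefetching could tolerate a lower sustained rate, since decompression overlaps with computation; this matches the abstract's point that the two I/O strategies call for different selection mechanisms.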