提高海量数字档案的小文件I/O性能

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI:10.1109/eScience.2017.39

Hwajung Kim, H. Yeom

{"title":"提高海量数字档案的小文件I/O性能","authors":"Hwajung Kim, H. Yeom","doi":"10.1109/eScience.2017.39","DOIUrl":null,"url":null,"abstract":"With the growth of online services, a large amount of files have been generated by users or by the service itself. To make it easier to service users with different network environments and devices, online services usually keep different versions of the same file with various sizes. For users with high speed network and top of the line displays, a large size file with high precision can be supplied while users with mobile devices typically receive a smaller file with less precision. In some cases, a large file can be divided into small files to make it easier to transmit over the wide area networks. As a result, underlying filesystem should efficiently maintain a large number of small files. Providing such a huge number of files to applications is one of new challenges of existing filesystems. In this paper, we propose techniques to efficiently manage a large number of files in digital archives using data characteristics and access patterns of the application. Based on the knowledge we have of the upper layer applications, we have modified both in-memory and on-disk inode structure of the existing filesystem and were able to dramatically reduce the number of storage I/O operations to service the same files. Our experimental results show that the proposed methods significantly reduce the number of storage I/O operations both for reading and writing files, especially for small-sized ones. Moreover, we demonstrated that proposed techniques reduce the application-level latency as well as improve file operation throughput, using several synthetic- and microbenchmarks.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Improving Small File I/O Performance for Massive Digital Archives\",\"authors\":\"Hwajung Kim, H. Yeom\",\"doi\":\"10.1109/eScience.2017.39\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the growth of online services, a large amount of files have been generated by users or by the service itself. To make it easier to service users with different network environments and devices, online services usually keep different versions of the same file with various sizes. For users with high speed network and top of the line displays, a large size file with high precision can be supplied while users with mobile devices typically receive a smaller file with less precision. In some cases, a large file can be divided into small files to make it easier to transmit over the wide area networks. As a result, underlying filesystem should efficiently maintain a large number of small files. Providing such a huge number of files to applications is one of new challenges of existing filesystems. In this paper, we propose techniques to efficiently manage a large number of files in digital archives using data characteristics and access patterns of the application. Based on the knowledge we have of the upper layer applications, we have modified both in-memory and on-disk inode structure of the existing filesystem and were able to dramatically reduce the number of storage I/O operations to service the same files. Our experimental results show that the proposed methods significantly reduce the number of storage I/O operations both for reading and writing files, especially for small-sized ones. Moreover, we demonstrated that proposed techniques reduce the application-level latency as well as improve file operation throughput, using several synthetic- and microbenchmarks.\",\"PeriodicalId\":137652,\"journal\":{\"name\":\"2017 IEEE 13th International Conference on e-Science (e-Science)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE 13th International Conference on e-Science (e-Science)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/eScience.2017.39\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 13th International Conference on e-Science (e-Science)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/eScience.2017.39","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

随着在线服务的发展，用户或服务本身产生了大量的文件。为了方便服务不同网络环境和设备的用户，在线服务通常会保留不同大小的同一文件的不同版本。对于使用高速网络和顶级显示器的用户，可以提供高精度的大尺寸文件，而使用移动设备的用户通常接收精度较低的小文件。在某些情况下，可以将一个大文件分成几个小文件，以便于在广域网上传输。因此，底层文件系统应该有效地维护大量的小文件。向应用程序提供如此大量的文件是现有文件系统面临的新挑战之一。本文利用数字档案的数据特性和应用程序的访问模式，提出了有效管理数字档案中大量文件的技术。基于我们对上层应用程序的了解，我们修改了现有文件系统的内存和磁盘索引节点结构，并且能够显著减少为相同文件提供服务的存储I/O操作的数量。我们的实验结果表明，所提出的方法显着减少了读取和写入文件的存储I/O操作次数，特别是对于小型文件。此外，我们使用几个综合和微基准测试证明了所提出的技术减少了应用程序级延迟并提高了文件操作吞吐量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improving Small File I/O Performance for Massive Digital Archives

With the growth of online services, a large amount of files have been generated by users or by the service itself. To make it easier to service users with different network environments and devices, online services usually keep different versions of the same file with various sizes. For users with high speed network and top of the line displays, a large size file with high precision can be supplied while users with mobile devices typically receive a smaller file with less precision. In some cases, a large file can be divided into small files to make it easier to transmit over the wide area networks. As a result, underlying filesystem should efficiently maintain a large number of small files. Providing such a huge number of files to applications is one of new challenges of existing filesystems. In this paper, we propose techniques to efficiently manage a large number of files in digital archives using data characteristics and access patterns of the application. Based on the knowledge we have of the upper layer applications, we have modified both in-memory and on-disk inode structure of the existing filesystem and were able to dramatically reduce the number of storage I/O operations to service the same files. Our experimental results show that the proposed methods significantly reduce the number of storage I/O operations both for reading and writing files, especially for small-sized ones. Moreover, we demonstrated that proposed techniques reduce the application-level latency as well as improve file operation throughput, using several synthetic- and microbenchmarks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE 13th International Conference on e-Science (e-Science)

自引率

0.00%

发文量