EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization

2011 IEEE International Conference on Cluster Computing, Pub Date: 2011-09-26, DOI: 10.1109/CLUSTER.2011.18

Yuan Tian, S. Klasky, H. Abbasi, J. Lofstead, R. Grout, N. Podhorszki, Qing Liu, Yandong Wang, Weikuan Yu

Citations: 58

Abstract

Large-scale scientific applications are often bottlenecked by writing checkpoint-restart data, and much prior work has focused on improving their write performance. As scientific discovery increasingly depends on these datasets, it is also important to provide good read performance for many common access patterns, which requires effective data organization. To address this issue, we introduce Elastic Data Organization (EDO), which transparently enables different data organization strategies for scientific applications. Through its flexible data ordering algorithms, EDO harmonizes different access patterns with the underlying file system. EDO introduces two levels of data ordering. The first works at the level of data groups (a.k.a. process groups): it uses Hilbert space-filling curves (SFC) to balance the distribution of data groups across storage targets. The second governs the ordering of data elements within a data group: it divides a data group into subchunks and strikes a balance between subchunk size and the number of seek operations. Our experimental results demonstrate that EDO achieves balanced data distribution across all dimensions and improves the read performance of multidimensional datasets in scientific applications.
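The two ordering levels can be illustrated with a small Python sketch. It is not EDO's implementation: the round-robin mapping of Hilbert-curve positions onto storage targets, the 4 x 4 grid of data groups, the square b x b subchunks, the linear seek-plus-transfer cost model, and all function names (`hilbert_index`, `place_groups`, `worst_case_spread`, `best_subchunk`) are assumptions made for illustration. Level 1 compares how many storage targets a full row or column read of data groups touches under row-major versus Hilbert placement; level 2 picks a subchunk edge that trades seek count against the amount of data read.

```python
"""Illustrative sketch of two-level data ordering (hypothetical, not EDO's code)."""


def _rot(n, x, y, rx, ry):
    # Rotate/flip a quadrant so the sub-curve inside it is oriented correctly.
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    # Position of cell (x, y) along the Hilbert curve of an n x n grid (n a power of 2).
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def row_major_index(n, x, y):
    # Baseline: data groups laid out row by row.
    return y * n + x


# Level 1 (assumed scheme): assign data groups to storage targets
# round-robin along a linear order of the groups.
def place_groups(n, num_targets, order):
    return {(x, y): order(n, x, y) % num_targets
            for x in range(n) for y in range(n)}


def worst_case_spread(placement, n):
    # Fewest distinct targets touched when reading one full row or column of groups;
    # a higher value means no axis-aligned read is served by only a few targets.
    rows = [len({placement[(x, y)] for x in range(n)}) for y in range(n)]
    cols = [len({placement[(x, y)] for y in range(n)}) for x in range(n)]
    return min(rows + cols)


# Level 2 (assumed cost model): square b x b subchunks inside an N x N data group.
def column_read_cost(N, b, seek_cost, elem_cost):
    # A column read touches N / b subchunks; each costs one seek plus reading
    # all b * b of its elements. With square subchunks a row read costs the same.
    return (N // b) * (seek_cost + b * b * elem_cost)


def best_subchunk(N, seek_cost, elem_cost):
    # Try power-of-two edges that divide N and keep the cheapest one.
    candidates = [1 << k for k in range(N.bit_length()) if N % (1 << k) == 0]
    return min(candidates, key=lambda b: column_read_cost(N, b, seek_cost, elem_cost))


if __name__ == "__main__":
    n, targets = 4, 4
    rm = worst_case_spread(place_groups(n, targets, row_major_index), n)
    hc = worst_case_spread(place_groups(n, targets, hilbert_index), n)
    print("worst-case targets per row/column read:", "row-major", rm, "hilbert", hc)
    print("best subchunk edge for 64x64 group:", best_subchunk(64, seek_cost=100, elem_cost=1))
```

With 4 targets, row-major placement sends every column of groups to a single target (worst case 1), while Hilbert order spreads every row and column over at least 2 targets. For the level-2 model the cost is N*S/b + N*b*E, whose continuous minimum is at b = sqrt(S/E); with a seek as expensive as reading 100 elements the best power-of-two edge is 8: smaller subchunks pay for many seeks, larger ones read data the query does not need.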
