EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization

Yuan Tian, S. Klasky, H. Abbasi, J. Lofstead, R. Grout, N. Podhorszki, Qing Liu, Yandong Wang, Weikuan Yu
2011 IEEE International Conference on Cluster Computing, published 2011-09-26. DOI: 10.1109/CLUSTER.2011.18 (https://doi.org/10.1109/CLUSTER.2011.18)
Citations: 58

Abstract

Large-scale scientific applications are often bottlenecked by writing checkpoint-restart data, and much work has focused on improving their write performance. As the need to extract scientific discoveries from these datasets grows, it is also important to provide good read performance for many common access patterns, which requires effective data organization. To address this issue, we introduce Elastic Data Organization (EDO), which transparently enables different data organization strategies for scientific applications. Through its flexible data ordering algorithms, EDO harmonizes different access patterns with the underlying file system. EDO introduces two levels of data ordering. The first works at the level of data groups (a.k.a. process groups) and uses Hilbert space-filling curves (SFC) to balance the distribution of data groups across storage targets. The second governs the ordering of data elements within a data group: it divides a data group into sub-chunks and balances the size of the sub-chunks against the number of seek operations. Our experimental results demonstrate that EDO achieves balanced data distribution across all dimensions and improves the read performance of multidimensional datasets in scientific applications.
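The first ordering level can be illustrated with a small sketch. The abstract does not spell out how EDO maps curve positions to storage targets, so the code below is only an assumption-laden illustration: it computes the standard 2D Hilbert curve index for a data group at grid position (x, y), then places the group on a storage target by taking that index modulo the number of targets. The function names `hilbert_index` and `assign_targets`, the 2D grid, and the modulo placement rule are all hypothetical and not taken from the paper.

```python
from collections import Counter


def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve
    covering an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve is in standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def assign_targets(n, num_targets):
    """Hypothetical placement rule (an assumption, not EDO's documented scheme):
    data group (x, y) goes to storage target hilbert_index % num_targets."""
    return {
        (x, y): hilbert_index(n, x, y) % num_targets
        for x in range(n)
        for y in range(n)
    }


if __name__ == "__main__":
    placement = assign_targets(4, 4)
    # Print the target chosen for each data group, one grid row per line.
    for y in range(4):
        print(" ".join(str(placement[(x, y)]) for x in range(4)))
    # Count how many data groups landed on each storage target.
    print(Counter(placement.values()))
```

Because consecutive Hilbert indices always belong to neighboring grid cells, a round-robin assignment along the curve tends to send spatially adjacent data groups to different storage targets, which is the intuition behind using a space-filling curve for balanced placement. How well a given read pattern is balanced still depends on the grid size and the number of targets.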