An Efficient, Nonintrusive, Log-Based I/O Mechanism for Scientific Simulations on Clusters

Soumyadeb Mitra, R. Sinha, M. Winslett, X. Jiao
2005 IEEE International Conference on Cluster Computing, September 2005
DOI: https://doi.org/10.1109/CLUSTR.2005.347041
Citations: 5

Abstract

Scientific simulations are often very I/O intensive, requiring high I/O bandwidth to store the data they generate. Traditional supercomputers have specialized I/O systems with multiple I/O nodes and specialized interconnects to handle such high I/O loads. However, with the increased availability of inexpensive clusters of workstations, more and more simulations are now run on clusters. Unfortunately, cluster supercomputers are usually not very well equipped for I/O, making I/O a serious bottleneck for such applications. To address this problem, we propose log-based I/O (LBIO), an approach that can substantially increase the I/O performance of simulations on clusters by using free space on the cluster's local disks to stage data on its way to remote storage. LBIO uses local disks to create a log of all I/O calls, and uses a background thread to replay the log at the rate that best utilizes the server and network resources. LBIO is implemented as an easy-to-use, non-intrusive library: a user can turn on LBIO by adding a single initialization call to the simulation code. LBIO also works with existing scientific I/O libraries like HDF, as well as collective libraries like ROMIO. Our performance studies on microbenchmarks and a real-world scientific simulation code show that LBIO can provide up to a 35% improvement in I/O performance for raw I/O and over 50% for I/O through libraries like ROMIO or HDF.
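The staging idea described in the abstract (log writes on a local disk, replay them to the final destination from a background thread) can be illustrated with a small sketch. This is not the LBIO library or its API: the class name `LogStagedWriter`, its methods, and the record layout are hypothetical and chosen only for illustration. It also leaves out the replay-rate control the abstract mentions and simply replays records in order, as fast as the thread runs.

```python
import os
import queue
import threading


class LogStagedWriter:
    """Toy illustration (hypothetical, not LBIO): stage writes in a local
    log file and replay them to their destination in the background."""

    def __init__(self, log_path):
        self._log = open(log_path, "w+b")  # local-disk log of write payloads
        self._log_lock = threading.Lock()  # guards seek/read/write on the log
        # Each record: (target_path, target_offset, log_offset, length)
        self._pending = queue.Queue()
        self._replayer = threading.Thread(target=self._replay, daemon=True)
        self._replayer.start()

    def write(self, target_path, offset, data):
        """Append the payload to the local log and return immediately."""
        with self._log_lock:
            self._log.seek(0, os.SEEK_END)
            log_offset = self._log.tell()
            self._log.write(data)
            self._log.flush()
        self._pending.put((target_path, offset, log_offset, len(data)))

    def _replay(self):
        """Background thread: copy logged payloads to the target files
        in the same order in which they were logged."""
        while True:
            record = self._pending.get()
            if record is None:  # sentinel queued by close()
                break
            target_path, offset, log_offset, length = record
            with self._log_lock:
                self._log.seek(log_offset)
                data = self._log.read(length)
            # Stand-in for the slow remote write; here it is a local file.
            mode = "r+b" if os.path.exists(target_path) else "w+b"
            with open(target_path, mode) as target:
                target.seek(offset)
                target.write(data)

    def close(self):
        """Wait until every logged write has been replayed, then clean up."""
        self._pending.put(None)
        self._replayer.join()
        self._log.close()
```

In this sketch a caller would create one `LogStagedWriter` per process, call `write(path, offset, data)` wherever it would otherwise write directly to remote storage, and call `close()` at the end of the run to drain the log. The point of the design is that `write` returns after a local append, so the slow transfer overlaps with computation.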