The practical obstacles of data transfer: why researchers still love scp

H. Nam, Jason Hill, S. Parete-Koon
{"title":"数据传输的实际障碍:为什么研究人员仍然喜欢scp","authors":"H. Nam, Jason Hill, S. Parete-Koon","doi":"10.1145/2534695.2534703","DOIUrl":null,"url":null,"abstract":"The importance of computing facilities is heralded every six months with the announcement of the new Top500 list, showcasing the world's fastest supercomputers. Unfortunately, with great computing capability does not come great long-term data storage capacity, which often means users must move their data to their local site archive, to remote sites where they may be doing future computation or analysis, or back to their home institution, else face the dreaded data purge that most HPC centers employ to keep utilization of large parallel filesystems low to manage performance and capacity. At HPC centers, data transfer is crucial to the scientific workflow and will increase in importance as computing systems grow in size. The Energy Sciences Network (ESnet) recently launched its fifth generation network, a 100 Gbps high-performance, unclassified national network connecting more than 40 DOE research sites to support scientific research and collaboration. Despite the tenfold increase in bandwidth to DOE research sites amenable to multiple data transfer streams and high throughput, in practice, researchers often under-utilize the network and resort to painfully-slow single stream transfer methods such as scp to avoid the complexity of using multiple stream tools such as GridFTP and bbcp, and contend with frustration from the lack of consistency of available tools between sites. In this study we survey and assess the data transfer methods provided at several DOE supported computing facilities, including both leadership-computing facilities, connected through ESnet. We present observed transfer rates, suggested optimizations, and discuss the obstacles the tools must overcome to receive wide-spread adoption over scp.","PeriodicalId":108576,"journal":{"name":"Network-aware Data Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"The practical obstacles of data transfer: why researchers still love scp\",\"authors\":\"H. Nam, Jason Hill, S. Parete-Koon\",\"doi\":\"10.1145/2534695.2534703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The importance of computing facilities is heralded every six months with the announcement of the new Top500 list, showcasing the world's fastest supercomputers. Unfortunately, with great computing capability does not come great long-term data storage capacity, which often means users must move their data to their local site archive, to remote sites where they may be doing future computation or analysis, or back to their home institution, else face the dreaded data purge that most HPC centers employ to keep utilization of large parallel filesystems low to manage performance and capacity. At HPC centers, data transfer is crucial to the scientific workflow and will increase in importance as computing systems grow in size. The Energy Sciences Network (ESnet) recently launched its fifth generation network, a 100 Gbps high-performance, unclassified national network connecting more than 40 DOE research sites to support scientific research and collaboration. 
Despite the tenfold increase in bandwidth to DOE research sites amenable to multiple data transfer streams and high throughput, in practice, researchers often under-utilize the network and resort to painfully-slow single stream transfer methods such as scp to avoid the complexity of using multiple stream tools such as GridFTP and bbcp, and contend with frustration from the lack of consistency of available tools between sites. In this study we survey and assess the data transfer methods provided at several DOE supported computing facilities, including both leadership-computing facilities, connected through ESnet. We present observed transfer rates, suggested optimizations, and discuss the obstacles the tools must overcome to receive wide-spread adoption over scp.\",\"PeriodicalId\":108576,\"journal\":{\"name\":\"Network-aware Data Management\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Network-aware Data Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2534695.2534703\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Network-aware Data Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2534695.2534703","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

The importance of computing facilities is heralded every six months with the announcement of the new Top500 list, showcasing the world's fastest supercomputers. Unfortunately, with great computing capability does not come great long-term data storage capacity, which often means users must move their data to their local site archive, to remote sites where they may be doing future computation or analysis, or back to their home institution, or else face the dreaded data purge that most HPC centers employ to keep utilization of large parallel filesystems low to manage performance and capacity. At HPC centers, data transfer is crucial to the scientific workflow and will increase in importance as computing systems grow in size. The Energy Sciences Network (ESnet) recently launched its fifth-generation network, a 100 Gbps high-performance, unclassified national network connecting more than 40 DOE research sites to support scientific research and collaboration. Despite the tenfold increase in bandwidth to DOE research sites, amenable to multiple data transfer streams and high throughput, in practice researchers often under-utilize the network and resort to painfully slow single-stream transfer methods such as scp to avoid the complexity of multi-stream tools such as GridFTP and bbcp, and contend with frustration from the lack of consistency of available tools between sites. In this study we survey and assess the data transfer methods provided at several DOE-supported computing facilities, including both leadership computing facilities, connected through ESnet. We present observed transfer rates, suggest optimizations, and discuss the obstacles the tools must overcome to receive widespread adoption over scp.
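
The gap the abstract points to is between single-stream and multi-stream transfers: scp moves data over one TCP connection, so per-transfer throughput is bounded by that single stream's window and round-trip latency, while tools such as bbcp and GridFTP stripe a transfer across several parallel streams. As a minimal illustration of the multi-stream idea only (not a tool or method from the paper), the Python sketch below fans a batch of files out across concurrent scp processes; the remote host, path, file names, and stream count are hypothetical placeholders.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical destination: substitute a real data-transfer node and path.
    REMOTE = "user@dtn.example.gov:/archive/project/"
    STREAMS = 8  # concurrent scp processes; tune to the site's guidance

    def copy_one(path: str) -> int:
        """Copy one file with a single-stream scp; return its exit code."""
        return subprocess.run(["scp", "-q", path, REMOTE]).returncode

    def copy_all(paths):
        # Each scp is still one TCP stream; running several at once only
        # approximates the aggregate throughput that multi-stream tools
        # get by striping even a single file across parallel streams.
        with ThreadPoolExecutor(max_workers=STREAMS) as pool:
            for path, rc in zip(paths, pool.map(copy_one, paths)):
                if rc != 0:
                    print(f"transfer failed: {path} (exit {rc})")

    if __name__ == "__main__":
        copy_all(["run01.h5", "run02.h5", "run03.h5"])

Note that this workaround only helps when there are many files to move; a single large file still rides one stream, which is exactly the case where the per-transfer striping of bbcp or GridFTP makes the difference.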