2008 3rd Petascale Data Storage Workshop最新文献

筛选
英文 中文
Revisiting the metadata architecture of parallel file systems 回顾并行文件系统的元数据体系结构
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811892
N. Ali, A. Devulapalli, D. Dalessandro, P. Wyckoff, P. Sadayappan
{"title":"Revisiting the metadata architecture of parallel file systems","authors":"N. Ali, A. Devulapalli, D. Dalessandro, P. Wyckoff, P. Sadayappan","doi":"10.1109/PDSW.2008.4811892","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811892","url":null,"abstract":"As the types of problems we solve in high-performance computing and other areas become more complex, the amount of data generated and used is growing at a rapid rate. Today many terabytes of data are common; tomorrow petabytes of data will be the norm. Much work has been put into increasing capacity and I/O performance for large-scale storage systems. However, one often ignored area is metadata management. Metadata can have a significant impact on the performance of a system. Past approaches have moved metadata activities to a separate server in order to avoid potential interference with data operations. However, with the advent of object-based storage technology, there is a compelling argument to re-couple metadata and data. In this paper we present two metadata management schemes, both of which remove the need for a separate metadata server and replace it with object-based storage.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134546361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Logan: Automatic management for evolvable, large-scale, archival storage Logan:自动管理可进化的、大规模的档案存储
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811890
M. Storer, K. Greenan, I. Adams, E. L. Miller, D. Long, K. Voruganti
{"title":"Logan: Automatic management for evolvable, large-scale, archival storage","authors":"M. Storer, K. Greenan, I. Adams, E. L. Miller, D. Long, K. Voruganti","doi":"10.1109/PDSW.2008.4811890","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811890","url":null,"abstract":"Archival storage systems designed to preserve scientific data, business data, and consumer data must maintain and safeguard tens to hundreds of petabytes of data on tens of thousands of media for decades. Such systems are currently designed in the same way as higher-performance, shorter-term storage systems, which have a useful lifetime but must be replaced in their entirety via a ldquofork-liftrdquo upgrade. Thus, while existing solutions can provide good energy efficiency and relatively low cost, they do not adapt well to continuous improvements in technology, becoming less efficient relative to current technology as they age. In an archival storage environment, this paradigm implies an endless series of wholesale migrations and upgrades to remain efficient and up to date. Our approach, Logan, manages node addition, removal, and failure on a distributed network of intelligent storage appliances, allowing the system to gradually evolve as device technology advances. By automatically handling most of the common administration chores-integrating new devices into the system, managing groups of devices that work together to provide redundancy, and recovering from failed devices-Logan reduces management overhead and thus cost. Logan can also improve cost and space efficiency by identifying and decommissioning outdated devices, thus reducing space and power requirements for the archival storage system.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127859479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Comparing performance of solid state devices and mechanical disks 比较固态器件和机械磁盘的性能
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811886
Milo Polte, J. Simsa, Garth A. Gibson
{"title":"Comparing performance of solid state devices and mechanical disks","authors":"Milo Polte, J. Simsa, Garth A. Gibson","doi":"10.1109/PDSW.2008.4811886","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811886","url":null,"abstract":"In terms of performance, solid state devices promise to be superior technology to mechanical disks. This study investigates performance of several up-to-date high-end consumer and enterprise Flash solid state devices (SSDs) and relates their performance to that of mechanical disks. For the purpose of this evaluation, the IOZone benchmark is run in single-threaded mode with varying request size and access pattern on an ext3 filesystem mounted on these devices. The price of the measured devices is then used to allow for comparison of price per performance. Measurements presented in this study offer an evaluation of cost-effectiveness of a Flash based SSD storage solution over a range of workloads. In particular, for sequential access pattern the SSDs are up to 10 times faster for reads and up to 5 times faster than the disks. For random reads, the SSDs provide up to 200times performance advantage. For random writes the SSDs provide up to 135times performance advantage. After weighting these numbers against the prices of the tested devices, we can conclude that SSDs are approaching price per performance of magnetic disks for sequential access patterns workloads and are superior technology to magnetic disks for random access patterns.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130827696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
Input/output APIs and data organization for high performance scientific computing 高性能科学计算的输入/输出api和数据组织
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811881
J. Lofstead, F. Zheng, S. Klasky, K. Schwan
{"title":"Input/output APIs and data organization for high performance scientific computing","authors":"J. Lofstead, F. Zheng, S. Klasky, K. Schwan","doi":"10.1109/PDSW.2008.4811881","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811881","url":null,"abstract":"Scientific Data Management has become essential to the productivity of scientists using ever larger machines and running applications that produce ever more data. There are several specific issues when running on petascale (and beyond) machines. One is the need for massively parallel data output, which in part, depends on the data formats and semantics being used. Here, the inhibition of parallelism by file system notions of strict and immediate consistency can be addressed with ldrdelayed data consistencypsila methods. Such methods can also be used to remove the runtime coordination steps required for immediate consistency from machine resources like Bluegene's separate networks for barrier calls and its dedicated IO nodes, thereby freeing them to instead, perform alternate tasks that enhance data output performance and/or richness. Second, once data is generated, it is important to be able to efficiently access it, which implies the need for rapid data characterization and indexing. This can be achieved by adding small amounts of metadata to the output process, thereby permitting scientists to quickly make informed decisions about which files to process from large-scale science runs. Third, failure probabilities increase with an increasing number of nodes, which suggests the need for organizing output data to be resilient to failures in which the output from a single or from a small number of nodes is lost or corrupted. This paper demonstrates the utility of using delayed consistency methods for the process of data output from the compute nodes of petascale machines. It also demonstrates the advantages derived from resilient data organization coupled with lightweight methods for data indexing. An implementation of these techniques is realized in ADIOS, the Adaptable IO System, and its BP intermediate file format. The implementation is designed to be compatible with existing, well-known file formats like HDF-5 and NetCDF, thereby permitting end users to exploit the rich tool chains for these formats. Initial performance evaluations of the approach exhibit substantial performance advantages over using native parallel HDF-5 in the Chimera supernova code.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133143641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Scalable full-text search for petascale file systems 可伸缩的全文搜索千万亿级文件系统
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811884
A. Leung, E. L. Miller
{"title":"Scalable full-text search for petascale file systems","authors":"A. Leung, E. L. Miller","doi":"10.1109/PDSW.2008.4811884","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811884","url":null,"abstract":"As file system capacities reach the petascale, it is becoming increasingly difficult for users to organize, find, and manage their data. File system search has the potential to greatly improve how users manage and access files. Unfortunately, existing file system search is designed for smaller scale systems, making it difficult for existing solutions to scale to petascale files systems. In this paper, we motivate the importance of file system search in petascale file systems and present a new full text file system search design for petascale file systems. Unlike existing solutions, our design exploits file system properties. Using a novel index partitioning mechanism that utilizes file system namespace locality, we are able to improve search scalability and performance and we discuss how such a design can potentially improve search security and ranking.We describe how our design can be implemented within the Ceph petascale file system.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124937299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Introducing map-reduce to high end computing 将map-reduce引入高端计算
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811889
Grant Mackey, S. Sehrish, John Bent, J. López, S. Habib, J. Wang
{"title":"Introducing map-reduce to high end computing","authors":"Grant Mackey, S. Sehrish, John Bent, J. López, S. Habib, J. Wang","doi":"10.1109/PDSW.2008.4811889","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811889","url":null,"abstract":"In this work we present an scientific application that has been given a Hadoop MapReduce implementation. We also discuss other scientific fields of supercomputing that could benefit from a MapReduce implementation. We recognize in this work that Hadoop has potential benefit for more applications than simply data mining, but that it is not a panacea for all data intensive applications. We provide an example of how the halo finding application, when applied to large astrophysics datasets, benefits from the model of the Hadoop architecture. The halo finding application uses a friends of friends algorithm to quickly cluster together large sets of particles to output files which a visualization software can interpret. The current implementation requires that large datasets be moved from storage to computation resources for every simulation of astronomy data. Our Hadoop implementation allows for an in-place halo finding application on the datasets, which removes the time consuming process of transferring data between resources.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124679709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 75
Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs 任意维度里德-所罗门编码和解码扩展RAID在gpu上
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811887
M. Curry, A. Skjellum, H. Ward, R. Brightwell
{"title":"Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs","authors":"M. Curry, A. Skjellum, H. Ward, R. Brightwell","doi":"10.1109/PDSW.2008.4811887","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811887","url":null,"abstract":"Reed-Solomon coding is a method of generating arbitrary amounts of checksum information from original data via matrix-vector multiplication in finite fields. Previous work has shown that CPUs are not well-matched to this type of computation, but recent graphical processing units (GPUs) have been shown through a case study to perform this encoding quickly for the 3 + 3 (three data + three parity) case. In order to be utilized in a true RAID-like system, it is important to understand how well this computation can scale in the number of data disks supported. This paper details the performance of a general Reed-Solomon encoding and decoding library that is suitable for use in RAID-like systems. Both generation and recovery are performance-tested and discussed.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129423769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
Fast log-based concurrent writing of checkpoints 基于日志的检查点的快速并发写入
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811882
Milo Polte, Jiri Simsa, Wittawat Tantisiriroj, Garth A. Gibson, Shobhit Dayal, Mikhail Chainani, Dilip Kumar Uppugandla
{"title":"Fast log-based concurrent writing of checkpoints","authors":"Milo Polte, Jiri Simsa, Wittawat Tantisiriroj, Garth A. Gibson, Shobhit Dayal, Mikhail Chainani, Dilip Kumar Uppugandla","doi":"10.1109/PDSW.2008.4811882","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811882","url":null,"abstract":"This report describes how a file system level log-based technique can improve the write performance of many-to-one write checkpoint workload typical for high performance computations. It is shown that a simple log-based organization can provide for substantial improvements in the write performance while retaining the convenience of a single flat file abstraction. The improvement of the write performance comes at the cost of degraded read performance however. Techniques to alleviate the read performance penalty, such as file reconstruction on the first read, are discussed.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129366709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Zest Checkpoint storage system for large supercomputers 用于大型超级计算机的Zest Checkpoint存储系统
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811883
P. Nowoczynski, N. Stone, J. Yanovich, J. Sommerfield
{"title":"Zest Checkpoint storage system for large supercomputers","authors":"P. Nowoczynski, N. Stone, J. Yanovich, J. Sommerfield","doi":"10.1109/PDSW.2008.4811883","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811883","url":null,"abstract":"The PSC has developed a prototype distributed file system infrastructure that vastly accelerates aggregated write bandwidth on large compute platforms. Write bandwidth, more than read bandwidth, is the dominant bottleneck in HPC I/O scenarios due to writing checkpoint data, visualization data and post-processing (multi-stage) data. We have prototyped a scalable solution that will be directly applicable to future petascale compute platforms having of order 10^6 cores. Our design emphasizes high-efficiency scalability, low-cost commodity components, lightweight software layers, end-to-end parallelism, client-side caching and software parity, and a unique model of load-balancing outgoing I/O onto high-speed intermediate storage followed by asynchronous reconstruction to a 3rd-party parallel file system.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131564676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 34
Just-in-time staging of large input data for supercomputing jobs 为超级计算工作提供大输入数据的实时分期
2008 3rd Petascale Data Storage Workshop Pub Date : 2008-11-01 DOI: 10.1109/PDSW.2008.4811891
H. M. Monti, A. Butt, S. Vazhkudai
{"title":"Just-in-time staging of large input data for supercomputing jobs","authors":"H. M. Monti, A. Butt, S. Vazhkudai","doi":"10.1109/PDSW.2008.4811891","DOIUrl":"https://doi.org/10.1109/PDSW.2008.4811891","url":null,"abstract":"High performance computing is facing a data deluge from state-of-the-art colliders and observatories. Large data-sets from these facilities, and other end-user sites, are often inputs to intensive analyses on modern supercomputers. Timely staging in of input data at the supercomputer's local storage can not only optimize space usage, but also protect against delays due to storage system failures. To this end, we propose a just-in-time staging framework that uses a combination of batch-queue predictions, user-specified intermediate nodes, and decentralized data delivery to coincide input data staging with job startup. Our preliminary prototype has been integrated with widely used tools such as the PBS job submission system, BitTorrent data delivery, and Network Weather Service network monitoring facility.","PeriodicalId":227342,"journal":{"name":"2008 3rd Petascale Data Storage Workshop","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122380106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信