2008 3rd Petascale Data Storage Workshop: Latest Publications

Revisiting the metadata architecture of parallel file systems

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811892

N. Ali, A. Devulapalli, D. Dalessandro, P. Wyckoff, P. Sadayappan

Citations: 16

Logan: Automatic management for evolvable, large-scale, archival storage

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811890

M. Storer, K. Greenan, I. Adams, E. L. Miller, D. Long, K. Voruganti

Abstract: Archival storage systems designed to preserve scientific data, business data, and consumer data must maintain and safeguard tens to hundreds of petabytes of data on tens of thousands of media for decades. Such systems are currently designed in the same way as higher-performance, shorter-term storage systems, which have a limited useful lifetime and must be replaced in their entirety via a "fork-lift" upgrade. Thus, while existing solutions can provide good energy efficiency and relatively low cost, they do not adapt well to continuous improvements in technology, becoming less efficient relative to current technology as they age. In an archival storage environment, this paradigm implies an endless series of wholesale migrations and upgrades to remain efficient and up to date. Our approach, Logan, manages node addition, removal, and failure on a distributed network of intelligent storage appliances, allowing the system to gradually evolve as device technology advances. By automatically handling most of the common administration chores (integrating new devices into the system, managing groups of devices that work together to provide redundancy, and recovering from failed devices), Logan reduces management overhead and thus cost. Logan can also improve cost and space efficiency by identifying and decommissioning outdated devices, thus reducing space and power requirements for the archival storage system.

Citations: 6

Comparing performance of solid state devices and mechanical disks

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811886

Milo Polte, J. Simsa, Garth A. Gibson

Abstract: In terms of performance, solid state devices promise to be a superior technology to mechanical disks. This study investigates the performance of several up-to-date high-end consumer and enterprise Flash solid state devices (SSDs) and relates their performance to that of mechanical disks. For the purpose of this evaluation, the IOZone benchmark is run in single-threaded mode with varying request size and access pattern on an ext3 filesystem mounted on these devices. The price of the measured devices is then used to allow for comparison of price per performance. Measurements presented in this study offer an evaluation of the cost-effectiveness of a Flash-based SSD storage solution over a range of workloads. In particular, for sequential access patterns the SSDs are up to 10 times faster for reads and up to 5 times faster than the disks. For random reads, the SSDs provide up to a 200 times performance advantage. For random writes, the SSDs provide up to a 135 times performance advantage. After weighting these numbers against the prices of the tested devices, we can conclude that SSDs are approaching the price per performance of magnetic disks for sequential-access workloads and are a superior technology to magnetic disks for random access patterns.

Citations: 55
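The abstract normalizes measured throughput by device price to compare cost-effectiveness. A minimal sketch of that arithmetic follows, assuming throughput in MB/s for a single workload and price in US dollars; the `Device` type and the function names are hypothetical, and the numbers in the usage block are placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A storage device with one measured throughput and a purchase price."""
    name: str
    throughput_mb_s: float  # measured throughput for one workload, in MB/s (assumed unit)
    price_usd: float        # purchase price, in US dollars (assumed unit)


def perf_per_dollar(dev: Device) -> float:
    """Throughput delivered per dollar spent (higher is better)."""
    if dev.price_usd <= 0:
        raise ValueError("price must be positive")
    return dev.throughput_mb_s / dev.price_usd


def relative_value(ssd: Device, disk: Device) -> float:
    """How many times more throughput per dollar the SSD delivers than the disk.

    A value above 1.0 means the SSD is the more cost-effective choice for the
    workload under which both throughput numbers were measured.
    """
    return perf_per_dollar(ssd) / perf_per_dollar(disk)


if __name__ == "__main__":
    # Hypothetical placeholder numbers, NOT results from the paper.
    ssd = Device("example-ssd", throughput_mb_s=200.0, price_usd=400.0)
    disk = Device("example-disk", throughput_mb_s=100.0, price_usd=100.0)
    print(f"{relative_value(ssd, disk):.2f}")  # prints 0.50
```

Because the ratio is computed per workload, the same pair of devices can land on different sides of 1.0 for sequential and random access, which is the kind of split the abstract reports.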

Input/output APIs and data organization for high performance scientific computing

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811881

J. Lofstead, F. Zheng, S. Klasky, K. Schwan

Abstract: Scientific data management has become essential to the productivity of scientists using ever larger machines and running applications that produce ever more data. There are several specific issues when running on petascale (and beyond) machines. One is the need for massively parallel data output, which, in part, depends on the data formats and semantics being used. Here, the inhibition of parallelism by file system notions of strict and immediate consistency can be addressed with "delayed data consistency" methods. Such methods can also be used to remove the runtime coordination steps required for immediate consistency from machine resources like Blue Gene's separate networks for barrier calls and its dedicated I/O nodes, thereby freeing those resources to perform alternate tasks that enhance data output performance and/or richness. Second, once data is generated, it is important to be able to access it efficiently, which implies the need for rapid data characterization and indexing. This can be achieved by adding small amounts of metadata to the output process, thereby permitting scientists to quickly make informed decisions about which files to process from large-scale science runs. Third, failure probabilities increase with the number of nodes, which suggests the need to organize output data so that it is resilient to failures in which the output from a single node or a small number of nodes is lost or corrupted. This paper demonstrates the utility of using delayed consistency methods for the process of data output from the compute nodes of petascale machines. It also demonstrates the advantages derived from resilient data organization coupled with lightweight methods for data indexing. An implementation of these techniques is realized in ADIOS, the Adaptable IO System, and its BP intermediate file format. The implementation is designed to be compatible with existing, well-known file formats like HDF-5 and NetCDF, thereby permitting end users to exploit the rich tool chains for these formats. Initial performance evaluations of the approach exhibit substantial performance advantages over using native parallel HDF-5 in the Chimera supernova code.

Citations: 15
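The abstract's idea of adding a small amount of metadata at output time, so that scientists can decide which files to process without reading them, can be illustrated with a per-block min/max summary. This is a minimal sketch under stated assumptions: using min/max as the statistic, and the names `BlockStats`, `characterize`, `blocks_in_range`, `files_to_process`, and the `.bp` file names, are hypothetical and do not represent the ADIOS API or the BP format.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BlockStats:
    """Hypothetical lightweight metadata one writer records for its block of a variable."""
    file_name: str
    writer_rank: int
    min_value: float
    max_value: float


def characterize(file_name: str, writer_rank: int, values: Sequence[float]) -> BlockStats:
    """Summarize a block at output time with a single pass over its values."""
    if not values:
        raise ValueError("block must contain at least one value")
    return BlockStats(file_name, writer_rank, min(values), max(values))


def blocks_in_range(index: List[BlockStats], lo: float, hi: float) -> List[BlockStats]:
    """Return blocks whose [min, max] interval overlaps the query [lo, hi].

    Blocks that cannot hold any value in the range are skipped without
    touching their data.
    """
    return [b for b in index if b.max_value >= lo and b.min_value <= hi]


def files_to_process(index: List[BlockStats], lo: float, hi: float) -> List[str]:
    """Distinct files containing at least one candidate block, sorted by name."""
    return sorted({b.file_name for b in blocks_in_range(index, lo, hi)})


if __name__ == "__main__":
    # Hypothetical index built from three writers across two output files.
    index = [
        characterize("run1.bp", 0, [0.1, 0.5, 0.9]),
        characterize("run1.bp", 1, [2.0, 3.5]),
        characterize("run2.bp", 0, [10.0, 12.0]),
    ]
    print(files_to_process(index, 3.0, 11.0))  # → ['run1.bp', 'run2.bp']
```

Each writer computes its summary locally, so building the index needs no coordination between writers during output; only the small summaries have to be gathered afterward.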

Scalable full-text search for petascale file systems

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811884

A. Leung, E. L. Miller

Citations: 2

Introducing map-reduce to high end computing

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811889

Grant Mackey, S. Sehrish, John Bent, J. López, S. Habib, J. Wang

Citations: 75

Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811887

M. Curry, A. Skjellum, H. Ward, R. Brightwell

Citations: 21

Fast log-based concurrent writing of checkpoints

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811882

Milo Polte, Jiri Simsa, Wittawat Tantisiriroj, Garth A. Gibson, Shobhit Dayal, Mikhail Chainani, Dilip Kumar Uppugandla

Citations: 23

Zest Checkpoint storage system for large supercomputers

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811883

P. Nowoczynski, N. Stone, J. Yanovich, J. Sommerfield

Citations: 34

Just-in-time staging of large input data for supercomputing jobs

2008 3rd Petascale Data Storage Workshop Pub Date: 2008-11-01 DOI: 10.1109/PDSW.2008.4811891

H. M. Monti, A. Butt, S. Vazhkudai

Citations: 14