Lessons Learned from Managing a Petabyte

J. Becla, Daniel L. Wang
Conference on Innovative Data Systems Research, 2005-01-20
DOI: 10.2172/839755 (https://doi.org/10.2172/839755)
Citations: 50

Abstract

The amount of data collected and stored by the average business doubles each year. Many commercial databases are already approaching hundreds of terabytes and, at this rate, will soon be managing petabytes. More data enables new functionality and capability, but the larger scale reveals new problems and issues hidden in "smaller" terascale environments. This paper presents some of these new problems along with implemented solutions in the framework of a petabyte dataset for a large High Energy Physics experiment. Through experience with two persistence technologies, a commercial database and a file-based approach, we expose format-independent concepts and issues prevalent at this new scale of computing.