Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper

G. Agrawal, Yunde Su
{"title":"Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper","authors":"G. Agrawal, Yunde Su","doi":"10.1109/SC.Companion.2012.157","DOIUrl":null,"url":null,"abstract":"Many of the `big-data' challenges today are arising from increasing computing ability, as data collected from simulations has become extremely valuable for a variety of scientific endeavors. With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km, and already produces 1 petabyte of data for a 10 day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase the data generation 64 folds. Finer granularity of simulation data offers both an opportunity and a challenge. On one hand, it can allow understanding of underlying phenomenon and features in a way that would not be possible with coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, contributing to the difficulty in storing, managing, and analyzing these datasets. Simulation data is often disseminated widely, through portals like the Earth System Grid (ESG), and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset size growth, as wide area data transfer bandwidths are growing at a much slower pace. Finally, while visualizing datasets, human perception is inherently limited.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"144 1","pages":"1296-1300"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.Companion.2012.157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Many of today's 'big-data' challenges arise from growing computing power, as data produced by simulations has become extremely valuable for a variety of scientific endeavors. With the growing computational capabilities of parallel machines, scientific simulations are being performed at ever finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km and already produces 1 petabyte of data for a 10-day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase the data generated 64-fold. Finer-grained simulation data offers both an opportunity and a challenge. On one hand, it allows underlying phenomena and features to be understood in ways that would not be possible at coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, nor memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, compounding the difficulty of storing, managing, and analyzing these datasets. Simulation data is often disseminated widely through portals such as the Earth System Grid (ESG) and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset-size growth, since wide-area data transfer bandwidths are growing at a much slower pace. Finally, even when such datasets can be visualized, human perception is inherently limited.
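The 64-fold figure follows from simple scaling arithmetic; as a sketch (this reasoning is our reading, not spelled out in the abstract), assume the model's horizontal grid is refined in both dimensions and the timestep must shrink in proportion to the grid spacing (a CFL-type stability constraint):

    refinement factor per dimension:  4 km / 1 km = 4
    horizontal cells:                 4 x 4       = 16x
    timesteps (CFL assumption):       x 4         =  4x
    total data generated:             16 x 4      = 64x

At the current resolution, 1 petabyte over a 10-day simulation already implies a sustained output rate of roughly 10^15 bytes / (10 x 86,400 s), about 1.2 GB/s, which gives a concrete sense of the storage and wide-area transfer pressure the abstract describes.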