大规模科学数据集可视化和传播的轻量级数据管理解决方案-立场文件

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI:10.1109/SC.Companion.2012.157

G. Agrawal, Yunde Su

{"title":"大规模科学数据集可视化和传播的轻量级数据管理解决方案-立场文件","authors":"G. Agrawal, Yunde Su","doi":"10.1109/SC.Companion.2012.157","DOIUrl":null,"url":null,"abstract":"Many of the `big-data' challenges today are arising from increasing computing ability, as data collected from simulations has become extremely valuable for a variety of scientific endeavors. With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km, and already produces 1 petabyte of data for a 10 day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase the data generation 64 folds. Finer granularity of simulation data offers both an opportunity and a challenge. On one hand, it can allow understanding of underlying phenomenon and features in a way that would not be possible with coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, contributing to the difficulty in storing, managing, and analyzing these datasets. Simulation data is often disseminated widely, through portals like the Earth System Grid (ESG), and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset size growth, as wide area data transfer bandwidths are growing at a much slower pace. Finally, while visualizing datasets, human perception is inherently limited.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"144 1","pages":"1296-1300"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper\",\"authors\":\"G. Agrawal, Yunde Su\",\"doi\":\"10.1109/SC.Companion.2012.157\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many of the `big-data' challenges today are arising from increasing computing ability, as data collected from simulations has become extremely valuable for a variety of scientific endeavors. With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km, and already produces 1 petabyte of data for a 10 day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase the data generation 64 folds. Finer granularity of simulation data offers both an opportunity and a challenge. On one hand, it can allow understanding of underlying phenomenon and features in a way that would not be possible with coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, contributing to the difficulty in storing, managing, and analyzing these datasets. Simulation data is often disseminated widely, through portals like the Earth System Grid (ESG), and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset size growth, as wide area data transfer bandwidths are growing at a much slower pace. Finally, while visualizing datasets, human perception is inherently limited.\",\"PeriodicalId\":6346,\"journal\":{\"name\":\"2012 SC Companion: High Performance Computing, Networking Storage and Analysis\",\"volume\":\"144 1\",\"pages\":\"1296-1300\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-11-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 SC Companion: High Performance Computing, Networking Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SC.Companion.2012.157\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.Companion.2012.157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

今天的许多“大数据”挑战都来自于不断提高的计算能力，因为从模拟中收集的数据对各种科学努力都变得非常有价值。随着并行机器的计算能力不断增强，科学模拟正在更精细的空间和时间尺度上进行，导致数据爆炸。作为一个具体的例子，全球云分辨模型(GCRM)目前的网格单元大小为4公里，并且已经为10天的模拟产生了1拍字节的数据。未来的计划包括网格单元大小为1公里的模拟，这将使数据生成增加64倍。更细粒度的模拟数据提供了机遇和挑战。一方面，它允许以一种粗粒度无法实现的方式理解底层现象和特征。另一方面，大型数据集非常难以存储、管理、传播、分析和可视化。并行机器的内存容量、内存访问速度和磁盘带宽都没有以与计算能力相同的速度增长，这增加了存储、管理和分析这些数据集的难度。模拟数据通常通过地球系统网格(ESG)等门户网站广泛传播，并由世界各地的研究人员下载。这种传播努力受到数据集规模增长的阻碍，因为广域数据传输带宽的增长速度要慢得多。最后，在可视化数据集时，人类的感知是有限的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper

Many of the `big-data' challenges today are arising from increasing computing ability, as data collected from simulations has become extremely valuable for a variety of scientific endeavors. With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km, and already produces 1 petabyte of data for a 10 day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase the data generation 64 folds. Finer granularity of simulation data offers both an opportunity and a challenge. On one hand, it can allow understanding of underlying phenomenon and features in a way that would not be possible with coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, contributing to the difficulty in storing, managing, and analyzing these datasets. Simulation data is often disseminated widely, through portals like the Earth System Grid (ESG), and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset size growth, as wide area data transfer bandwidths are growing at a much slower pace. Finally, while visualizing datasets, human perception is inherently limited.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

自引率

0.00%

发文量