{"title":"使用Hadoop和Neteeza架构的危险图工作流中的大数据和计算","authors":"S. Rohit, A. Patra, V. Chaudhary","doi":"10.1145/2534645.2534648","DOIUrl":null,"url":null,"abstract":"Uncertainty Quantification(UQ) using simulation ensembles leads to twin challenges of managing large amount of data and performing cpu intensive computing. While algorithmic innovations using surrogates, localization and parallelization can make the problem feasible one still has very large data and compute tasks. Such integration of large data analytics and computationally expensive tasks is increasingly common. We present here an approach to solving this problem by using a mix of hardware and a workflow that maps tasks to appropriate hardware. We experiment with two computing environments -- the first is an integration of a Netezza data warehouse appliance and a high performance cluster and the second a hadoop based environment. Our approach is based on segregating the data intensive and compute intensive tasks and assigning the right architecture to each. We present here the computing models and the new schemes in the context of generating probabilistic hazard maps using ensemble runs of the volcanic debris avalanche simulator TITAN2D and UQ methodology.","PeriodicalId":166804,"journal":{"name":"International Symposium on Design and Implementation of Symbolic Computation Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large data and computation in a hazard map workflow using Hadoop and Neteeza architectures\",\"authors\":\"S. Rohit, A. Patra, V. Chaudhary\",\"doi\":\"10.1145/2534645.2534648\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Uncertainty Quantification(UQ) using simulation ensembles leads to twin challenges of managing large amount of data and performing cpu intensive computing. While algorithmic innovations using surrogates, localization and parallelization can make the problem feasible one still has very large data and compute tasks. Such integration of large data analytics and computationally expensive tasks is increasingly common. We present here an approach to solving this problem by using a mix of hardware and a workflow that maps tasks to appropriate hardware. We experiment with two computing environments -- the first is an integration of a Netezza data warehouse appliance and a high performance cluster and the second a hadoop based environment. Our approach is based on segregating the data intensive and compute intensive tasks and assigning the right architecture to each. 
We present here the computing models and the new schemes in the context of generating probabilistic hazard maps using ensemble runs of the volcanic debris avalanche simulator TITAN2D and UQ methodology.\",\"PeriodicalId\":166804,\"journal\":{\"name\":\"International Symposium on Design and Implementation of Symbolic Computation Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Symposium on Design and Implementation of Symbolic Computation Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2534645.2534648\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Design and Implementation of Symbolic Computation Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2534645.2534648","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Large data and computation in a hazard map workflow using Hadoop and Neteeza architectures
Uncertainty Quantification (UQ) using simulation ensembles leads to the twin challenges of managing large amounts of data and performing CPU-intensive computing. While algorithmic innovations using surrogates, localization, and parallelization can make the problem feasible, one is still left with very large data and compute tasks. Such integration of large-scale data analytics and computationally expensive tasks is increasingly common. We present here an approach to solving this problem by using a mix of hardware and a workflow that maps tasks to the appropriate hardware. We experiment with two computing environments: the first is an integration of a Netezza data warehouse appliance and a high-performance cluster, and the second is a Hadoop-based environment. Our approach is based on segregating the data-intensive and compute-intensive tasks and assigning the right architecture to each. We present the computing models and the new schemes in the context of generating probabilistic hazard maps using ensemble runs of the volcanic debris avalanche simulator TITAN2D and a UQ methodology.
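To make the final aggregation step concrete, the sketch below is a minimal, illustrative example (not the paper's implementation) of how a probabilistic hazard map is typically built from an ensemble: each run contributes a per-cell maximum flow depth, and the map records the (optionally weighted) fraction of runs exceeding a critical depth at each cell. The array layout, function name, and threshold value are assumptions made for illustration only.

```python
# Illustrative sketch, NOT the paper's workflow: assumes a hypothetical ensemble of
# TITAN2D-style outputs stored as per-run maximum flow depths on a common grid, and
# builds an exceedance-probability hazard map P(depth > threshold) per cell.

import numpy as np

def probabilistic_hazard_map(max_flow_depths, threshold, weights=None):
    """Estimate per-cell exceedance probabilities from an ensemble.

    max_flow_depths : array of shape (n_runs, ny, nx), maximum flow depth per
                      cell from each ensemble member (hypothetical layout).
    threshold       : critical flow depth (e.g., 0.2 m) defining the hazard.
    weights         : optional per-run weights (e.g., from the UQ sampling
                      scheme); uniform weights are used if omitted.
    """
    exceed = max_flow_depths > threshold           # boolean, shape (n_runs, ny, nx)
    if weights is None:
        return exceed.mean(axis=0)                 # plain Monte Carlo frequency
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # normalize the run weights
    return np.tensordot(w, exceed, axes=1)         # weighted exceedance frequency

if __name__ == "__main__":
    # Toy usage with synthetic depths standing in for simulator output.
    rng = np.random.default_rng(0)
    ensemble = rng.gamma(shape=2.0, scale=0.1, size=(64, 50, 50))  # fake depths (m)
    hazard = probabilistic_hazard_map(ensemble, threshold=0.2)
    print(hazard.shape, float(hazard.min()), float(hazard.max()))
```

In the workflow described by the abstract, the ensemble runs themselves are the compute-intensive part (suited to the HPC cluster), while this kind of per-cell aggregation over many large grids is the data-intensive part (suited to the Netezza appliance or a Hadoop environment).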