Multivariate Geographic Clustering in A Metacomputing Environment Using Globus

ACM/IEEE SC 1999 Conference (SC'99). DOI: 10.1145/331532.331537

G. Mahinthakumar, F. Hoffman, W. Hargrove, N. Karonis


Citations: 31

Abstract

The authors present a metacomputing application of multivariate, nonhierarchical statistical clustering to geographic environmental data from the 48 conterminous United States in order to produce maps of regions of ecological similarity, called ecoregions. These maps represent finer scale regionalizations than do those generated by the traditional technique: an expert with a marker pen. Several variables (e.g., temperature, organic matter, rainfall, etc.) thought to affect the growth of vegetation are clustered at resolutions as fine as one square kilometer (1 km²). These data can represent over 7.8 million map cells in an n-dimensional (n = 9 to 25) data space. A parallel version of the iterative statistical clustering algorithm is developed by the authors using the MPI (Message Passing Interface) message passing routines. The parallel algorithm uses a classical, self-scheduling, single-program, multiple-data (SPMD) organization; performs dynamic load balancing for reasonable performance in heterogeneous metacomputing environments; and provides fault tolerance by saving intermediate results for easy restarts in case of hardware failure. The parallel algorithm was tested on various geographically distributed heterogeneous metacomputing configurations involving an IBM SP3™, an IBM SP2™, and two SGI Origin 2000™s. The tests were performed with minimal code modification, and were made possible by Globus™ (a metacomputing software toolkit) and the Globus-enabled version of MPI (MPICH-G). Our performance tests indicate that while the algorithm works reasonably well under the metacomputing environment for a moderate number of processors, the communication overhead can become prohibitive for large processor configurations.
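The self-scheduling organization described in the abstract can be illustrated with a small sketch. The abstract does not name the exact clustering algorithm, so the code below assumes a k-means-style centroid update, and it uses Python threads pulling blocks from a shared queue as a stand-in for MPI worker processes. All names (`cluster`, `worker`, `nearest`, `block`, `checkpoint`) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import json
import queue
import threading


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def worker(tasks, results, cells, centroids):
    """Pull blocks of cells until none remain; faster workers take more blocks."""
    dim = len(centroids[0])
    while True:
        try:
            start, stop = tasks.get_nowait()
        except queue.Empty:
            return
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for point in cells[start:stop]:
            k = nearest(point, centroids)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += point[d]
        results.put((sums, counts))  # partial result for this block


def cluster(cells, centroids, block=2, n_workers=3, max_iter=50,
            tol=1e-9, checkpoint=None):
    """Assumed k-means-style iteration with self-scheduled blocks of cells."""
    dim = len(centroids[0])
    for _ in range(max_iter):
        tasks, results = queue.Queue(), queue.Queue()
        for start in range(0, len(cells), block):
            tasks.put((start, min(start + block, len(cells))))
        threads = [threading.Thread(target=worker,
                                    args=(tasks, results, cells, centroids))
                   for _ in range(n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Combine the per-block partial sums into new centroids.
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        while not results.empty():
            part_sums, part_counts = results.get()
            for k in range(len(centroids)):
                counts[k] += part_counts[k]
                for d in range(dim):
                    sums[k][d] += part_sums[k][d]
        new = [[sums[k][d] / counts[k] for d in range(dim)] if counts[k]
               else list(centroids[k]) for k in range(len(centroids))]
        if checkpoint:  # save intermediate results so a run can be restarted
            with open(checkpoint, "w") as fh:
                json.dump(new, fh)
        shift = max(abs(a - b) for old, cur in zip(centroids, new)
                    for a, b in zip(old, cur))
        centroids = new
        if shift < tol:
            break
    return centroids
```

Because each worker asks for a new block only when it finishes the previous one, a slow machine simply processes fewer blocks, which is the load-balancing effect the abstract attributes to self-scheduling. Restarting after a failure would amount to loading the saved centroids from the checkpoint file and passing them back in as the starting `centroids`.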



