Multivariate Geographic Clustering in a Metacomputing Environment Using Globus

G. Mahinthakumar, F. Hoffman, W. Hargrove, N. Karonis
ACM/IEEE SC 1999 Conference (SC'99). DOI: 10.1145/331532.331537 (https://doi.org/10.1145/331532.331537)
Citations: 31

Abstract

The authors present a metacomputing application of multivariate, nonhierarchical statistical clustering to geographic environmental data from the 48 conterminous United States in order to produce maps of regions of ecological similarity, called ecoregions. These maps represent finer-scale regionalizations than those generated by the traditional technique: an expert with a marker pen. Several variables thought to affect the growth of vegetation (e.g., temperature, organic matter, and rainfall) are clustered at resolutions as fine as one square kilometer (1 km²). These data can represent over 7.8 million map cells in an n-dimensional (n = 9 to 25) data space. The authors develop a parallel version of the iterative statistical clustering algorithm using MPI (Message Passing Interface) routines. The parallel algorithm uses a classical, self-scheduling, single-program, multiple-data (SPMD) organization; performs dynamic load balancing for reasonable performance in heterogeneous metacomputing environments; and provides fault tolerance by saving intermediate results for easy restarts in case of hardware failure. The parallel algorithm was tested on various geographically distributed heterogeneous metacomputing configurations involving an IBM SP3™, an IBM SP2™, and two SGI Origin 2000™ systems. The tests were performed with minimal code modification and were made possible by Globus™ (a metacomputing software toolkit) and the Globus-enabled version of MPI (MPICH-G). Our performance tests indicate that while the algorithm works reasonably well under the metacomputing environment for a moderate number of processors, the communication overhead can become prohibitive for large processor configurations.
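
The abstract names the ingredients of the parallel algorithm (iterative clustering, self-scheduling, dynamic load balancing, saved intermediate results) without giving the details. The Python sketch below is only an illustration of how those pieces can fit together, under stated assumptions: a k-means-style update stands in for the clustering step, a thread pool stands in for MPI ranks, and the names `nearest`, `process_chunk`, `cluster`, and the `checkpoint` path are hypothetical, not taken from the paper.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def nearest(cell, centroids):
    """Index of the centroid closest to `cell` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(cell, centroids[k])),
    )


def process_chunk(chunk, centroids):
    """Worker task: per-cluster coordinate sums and counts for one chunk of cells."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for cell in chunk:
        k = nearest(cell, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += cell[d]
    return sums, counts


def cluster(cells, centroids, chunk_size=2, workers=2, max_iter=20, checkpoint=None):
    """Assumed k-means-style loop; small chunks let faster workers take more work."""
    chunks = [cells[i:i + chunk_size] for i in range(0, len(cells), chunk_size)]
    for _ in range(max_iter):
        # Idle workers pull the next chunk from the pool's queue (self-scheduling).
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(lambda c: process_chunk(c, centroids), chunks))
        new = []
        for k, old in enumerate(centroids):
            n = sum(counts[k] for _, counts in partials)
            if n == 0:
                new.append(list(old))  # keep an empty cluster's centroid unchanged
            else:
                new.append(
                    [sum(sums[k][d] for sums, _ in partials) / n for d in range(len(old))]
                )
        if checkpoint:
            # Save intermediate centroids so a failed run can restart from here.
            with open(checkpoint, "w") as f:
                json.dump(new, f)
        if new == centroids:
            break  # centroids stopped moving
        centroids = new
    return centroids
```

Calling `cluster(cells, initial_centroids, checkpoint="centroids.json")` with lists of equal-length float lists returns the final centroids and leaves the last saved centroids in the JSON file. In a real MPI setting the chunk results would travel as messages between ranks, which is where the communication overhead mentioned in the abstract would arise.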