Hierarchical Clustering with Contiguity Constraint in R

IF 5.4 · Zone 2 (Computer Science) · JCR Q1 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
G. Guénard, P. Legendre
Journal of Statistical Software. Published: 2022-01-01. DOI: 10.18637/jss.v103.i07 (https://doi.org/10.18637/jss.v103.i07). Record source: Semantic Scholar.
Citations: 11

Abstract

Hierarchical Clustering with Contiguity Constraint in R
This article presents a new implementation of hierarchical clustering for the R language that allows one to apply spatial or temporal contiguity constraints during the clustering process. The need for contiguity constraint arises, for instance, when one wants to partition a map into different domains of similar physical conditions, identify discontinuities in time series, group regional administrative units with respect to their performance, and so on. To increase computation efficiency, we programmed the core functions in plain C. The result is a new R function, constr.hclust, which is distributed in package adespatial. The program implements the general agglomerative hierarchical clustering algorithm described by Lance and Williams (1966; 1967), with the particularity of allowing only clusters that are contiguous in geographic space or along time to fuse at any given step. Contiguity can be defined with respect to space or time. Information about spatial contiguity is provided by a connection network among sites, with edges describing the links between connected sites. Clustering with a temporal contiguity constraint is also known as chronological clustering. Information on temporal contiguity can be implicitly provided as the rank positions of observations in the time series. The implementation was mirrored on that found in the hierarchical clustering function hclust of the standard R package stats (R Core Team 2022). We transcribed that function from Fortran to C and added the functionality to apply constraints when running the function. The implementation is efficient. It is limited mainly by input/output access as massive amounts of memory are potentially needed to store copies of the dissimilarity matrix and update its elements when analyzing large problems. We provided R computer code for plotting results for numbers of clusters.
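To make the idea in the abstract concrete, the following is a minimal Python sketch of agglomerative clustering with a contiguity constraint. It is not the C code of constr.hclust and does not use the adespatial API. The function names (lance_williams, constrained_clustering, cut) and the toy series are hypothetical, and only the single, complete, and average linkages are included. At each step only clusters joined by a link may fuse, dissimilarities are updated with the Lance and Williams formula, and the fused cluster inherits the links of both parents.

```python
def lance_williams(method, n_i, n_j):
    """Return (alpha_i, alpha_j, beta, gamma) for a few standard linkages."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # UPGMA: weights proportional to cluster sizes
        total = n_i + n_j
        return n_i / total, n_j / total, 0.0, 0.0
    raise ValueError(f"unsupported method: {method}")


def constrained_clustering(dist, links, method="average"):
    """Agglomerate observations, fusing only clusters connected by a link.

    dist  : symmetric n x n dissimilarities (list of lists)
    links : iterable of (a, b) index pairs that are contiguous
    Returns a list of merges (kept_id, absorbed_id, height).
    """
    n = len(dist)
    d = {i: {j: dist[i][j] for j in range(n) if j != i} for i in range(n)}
    size = {i: 1 for i in range(n)}
    adj = {i: set() for i in range(n)}
    for a, b in links:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    merges = []
    while len(d) > 1:
        # Only contiguous pairs of clusters are candidates for fusion.
        candidates = [(d[i][j], i, j) for i in adj for j in adj[i] if i < j]
        if not candidates:
            break  # the link network is disconnected; nothing left to fuse
        height, i, j = min(candidates)
        a_i, a_j, beta, gamma = lance_williams(method, size[i], size[j])
        for k in d:
            if k in (i, j):
                continue
            # Lance-Williams update of the dissimilarity to the fused cluster.
            new = (a_i * d[i][k] + a_j * d[j][k] + beta * height
                   + gamma * abs(d[i][k] - d[j][k]))
            d[i][k] = d[k][i] = new
            del d[k][j]
        del d[j]
        del d[i][j]
        # The fused cluster (kept under id i) inherits the links of both parents.
        neighbours = (adj[i] | adj[j]) - {i, j}
        for k in adj.pop(j):
            adj[k].discard(j)
        adj[i] = neighbours
        for k in neighbours:
            adj[k].add(i)
        size[i] += size.pop(j)
        merges.append((i, j, height))
    return merges


def cut(merges, n, k):
    """Replay the first n - k merges and return one cluster label per observation."""
    labels = list(range(n))
    for i, j, _ in merges[: max(n - k, 0)]:
        labels = [i if lab == j else lab for lab in labels]
    return labels


# Toy time series (made-up values): the last value resembles the first three.
series = [1.0, 1.2, 0.9, 5.0, 5.3, 1.1]
n = len(series)
dist = [[abs(a - b) for b in series] for a in series]

# Chronological clustering: contiguity is implied by rank position in the series.
chrono_links = [(t, t + 1) for t in range(n - 1)]
# Linking every pair removes the constraint and gives ordinary clustering.
all_links = [(a, b) for a in range(n) for b in range(a + 1, n)]

print(cut(constrained_clustering(dist, chrono_links), n, 2))
print(cut(constrained_clustering(dist, all_links), n, 2))
```

In this toy case the chronological constraint keeps the last observation with its temporal neighbours, whereas the unconstrained run groups it with the first three observations, whose values it resembles. The sketch scans all linked pairs at every step for clarity and makes no attempt at the efficiency the abstract describes.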
Source journal
Journal of Statistical Software
Category: Engineering and Technology; Computer Science, Interdisciplinary Applications
CiteScore: 10.70
Self-citation rate: 1.70%
Articles published: 40
Review time: 6-12 weeks
About the journal: The Journal of Statistical Software (JSS) publishes open-source software and corresponding reproducible articles discussing all aspects of the design, implementation, documentation, application, evaluation, comparison, maintenance and distribution of software dedicated to improvement of state-of-the-art in statistical computing in all areas of empirical research. Open-source code and articles are jointly reviewed and published in this journal and should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics.