Orhun Aydin, Mark V. Janikas, R. Assunção, Ting-Hwan Lee
SKATER-CON: Unsupervised Regionalization via Stochastic Tree Partitioning within a Consensus Framework Using Random Spanning Trees: Research Paper
Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, 2018-11-06
DOI: 10.1145/3281548.3281554 (https://doi.org/10.1145/3281548.3281554)
Citations: 14
Abstract
Spatially constrained clustering, also known as regionalization, aims to group spatial objects into spatially contiguous clusters, also known as regions. Among different approaches, tree-based partitioning is reported to define homogeneous regions rigorously, without ad hoc adjustments, in a computationally efficient manner. One shortcoming of tree-based partitioning is the so-called chaining problem, which results in sub-optimal regions. We propose a consensus-based regionalization approach that addresses the chaining problem associated with a single tree, in particular the minimum spanning tree, by exploring a wide range of partitions via a set of random spanning trees (RSTs). Our algorithm, SKATER-CON, partitions spatial data via a consensus-based framework from an ensemble of regionalizations produced by its deterministic counterpart, the SKATER algorithm, applied along stochastic search paths defined by RSTs. SKATER-CON uses evidence accumulation to represent an ensemble of regionalizations as a similarity graph. The similarity graph represents spatial objects as vertices and the frequency at which objects are assigned to the same region in the ensemble as edge weights. The proposed algorithm determines consensus among the different regionalizations by partitioning the similarity graph using a multi-level graph partitioning algorithm (METIS). Spatial constraints are imposed on the similarity graph prior to partitioning to ensure that they are reflected in the consensus result. We rigorously test the quality of regions produced by SKATER-CON on a large, synthetically generated dataset. The synthetic dataset is the result of full-factorial experiments designed on the number, fuzziness, geometry and size of regions. The same dataset is also used to compare our approach against state-of-the-art regionalization algorithms (SKATER and ARISEL). Lastly, we show the value added by SKATER-CON compared to SKATER on a real-world dataset based on the Ecological Marine Units (EMU) dataset.
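The evidence-accumulation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `coassociation_graph`, the list-of-labels ensemble format, and the edge-list contiguity format are all assumptions made for illustration. The sketch also assumes that imposing spatial constraints means keeping similarity edges only between contiguous objects, which the abstract does not spell out. The final graph-partitioning step (METIS in the paper) is not shown.

```python
def coassociation_graph(ensemble, contiguous_pairs):
    """Build a spatially constrained similarity graph from an ensemble.

    ensemble         : list of labelings; labeling[i] is the region label
                       assigned to spatial object i (assumed format).
    contiguous_pairs : iterable of (i, j) pairs of spatially contiguous
                       objects (assumed format).

    Returns a dict mapping (i, j), with i < j, to the fraction of ensemble
    members that assign objects i and j to the same region. Only contiguous
    pairs receive an edge, so the spatial constraint is built into the graph.
    """
    n_members = len(ensemble)
    weights = {}
    for a, b in contiguous_pairs:
        i, j = min(a, b), max(a, b)
        # Count the regionalizations in which both objects share a region.
        together = sum(1 for labels in ensemble if labels[i] == labels[j])
        weights[(i, j)] = together / n_members
    return weights


# Toy example: four objects on a line (0-1-2-3) and three regionalizations.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
contiguous_pairs = [(0, 1), (1, 2), (2, 3)]
weights = coassociation_graph(ensemble, contiguous_pairs)
```

In this toy case objects 0 and 1 share a region in every member of the ensemble, so their edge gets the largest weight, while the edge between objects 1 and 2 gets the smallest. A graph partitioner applied to these weights would therefore tend to cut between objects 1 and 2.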