SOM-empowered graph segmentation for fast automatic clustering of large and complex data
Authors: E. Merényi, Joshua Taylor
Venue: 2017 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)
Published: 2017-06-28
DOI: https://doi.org/10.1109/WSOM.2017.8020004
Citations: 9
Abstract
Many clustering methods, including modern graph segmentation algorithms, run into limitations when encountering “Big Data”: data with high feature dimensionality, large volume, and complex structure. SOM-based clustering has been demonstrated to accurately capture many clusters of widely varying statistical properties in such data. While a number of automated SOM segmentations have been put forward, the best identifications of complex cluster structures to date are those performed interactively from informative visualizations of the learned SOM's knowledge. This interactive approach does not scale to Big Data, large archives, or near-real-time analyses for fast decision-making. We present a new automated approach to SOM segmentation which closely approximates the precision of the interactive method for complicated data, and at the same time is very fast and memory-efficient. We achieve this by infusing SOM knowledge into leading graph segmentation algorithms which, by themselves, produce extremely poor results when segmenting the SOM prototypes. We use the SOM prototypes as input vectors and the CONN similarity measure, derived from the SOM's knowledge of the data connectivity, as edge weights for the graph segmentation algorithms. We demonstrate the effectiveness of this approach on synthetic data and on real spectral imagery.
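To make the edge-weighting idea concrete, the following is a minimal sketch of how a CONN-style similarity matrix could be computed from data and trained prototypes. It assumes the definition commonly used in the SOM literature, which the abstract itself does not spell out: CONN(i, j) counts the data vectors for which prototypes i and j are the best and second-best matching units, in either order. The function name `conn_matrix`, the Euclidean distance, and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def conn_matrix(data, prototypes):
    """Return a symmetric matrix whose entry [i, j] counts the data vectors
    whose two closest prototypes are i and j, in either order.

    Assumed definition (not stated in the abstract): CONN = CADJ + CADJ^T,
    where CADJ[i, j] counts vectors with best match i and second-best match j.
    """
    # Squared Euclidean distances, shape (n_samples, n_prototypes).
    # This builds an (n_samples, n_prototypes, n_features) intermediate,
    # so large data sets would need to be processed in chunks.
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Indices of the best and second-best matching prototype per sample.
    two_best = np.argsort(d2, axis=1)[:, :2]
    bmu1, bmu2 = two_best[:, 0], two_best[:, 1]
    n = prototypes.shape[0]
    cadj = np.zeros((n, n), dtype=np.int64)
    # Unbuffered accumulation so repeated (bmu1, bmu2) pairs are all counted.
    np.add.at(cadj, (bmu1, bmu2), 1)
    return cadj + cadj.T


# Toy 1-D example: two well-separated pairs of prototypes (hypothetical values).
prototypes = np.array([[0.0], [1.0], [10.0], [11.0]])
data = np.array([[0.1], [0.4], [0.9], [10.2], [10.6]])
conn = conn_matrix(data, prototypes)
print(conn)
```

In this toy case the only nonzero weights connect prototypes 0 and 1 and prototypes 2 and 3, so a graph built on the prototypes with these weights separates into two groups. The resulting matrix can serve as a weighted adjacency matrix for whichever graph segmentation algorithm is chosen; the abstract does not name the specific algorithms used.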