ClusterSculptor: A Visual Analytics Tool for High-Dimensional Data

E. J. Nam, Yiping Han, K. Mueller, A. Zelenyuk, D. Imre
Published in: 2007 IEEE Symposium on Visual Analytics Science and Technology, 2007-10-30
DOI: 10.1109/VAST.2007.4388999 (https://doi.org/10.1109/VAST.2007.4388999)
Citations: 78

Abstract

Cluster analysis (CA) is a powerful strategy for exploring high-dimensional data in the absence of a priori hypotheses or data classification models, and the results of CA can then be used to form such models. Even though formal models and classification rules may not exist in these data exploration scenarios, domain scientists and experts generally have a vast amount of non-compiled knowledge and intuition that they can bring to bear in this effort. CA offers various popular mechanisms for generating clusters; however, the results of their unsupervised deployment rarely agree fully with this expert knowledge and intuition. To this end, our paper describes a comprehensive and intuitive framework to aid scientists in deriving classification hierarchies in CA. It uses k-means as the overall clustering engine but allows scientists to tune its parameters interactively, based on a non-distorted, compact visual presentation of the inherent characteristics of the data in high-dimensional space. These characteristics include cluster geometry, composition, spatial relations to neighbors, and others. In essence, we provide all the tools necessary for a high-dimensional activity we call cluster sculpting, and the evolving hierarchy can then be viewed in a space-efficient radial dendrogram. We demonstrate our system in the context of mining and classifying a large collection of millions of aerosol mass spectra, but our framework readily applies to any high-dimensional CA scenario.
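As an illustration of the general idea of deriving a classification hierarchy with k-means as the clustering engine, the following is a minimal sketch and not the ClusterSculptor implementation. The interactive tuning is stood in for by a caller choosing `k` for each node it decides to split. The function names (`kmeans`, `split`, `leaves`), the farthest-first initialization, the dictionary-based tree, and the toy 2-D data are all assumptions made for this example.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's k-means; returns (centroids, labels).

    Assumption: deterministic farthest-first initialization, chosen here
    only so the example is reproducible.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels

def split(node, data, k):
    """Split one hierarchy node into k children; k stands in for the user's choice."""
    pts = [data[i] for i in node["members"]]
    _, labels = kmeans(pts, k)
    node["children"] = [
        {"members": [i for i, l in zip(node["members"], labels) if l == c], "children": []}
        for c in range(k)
    ]
    return node["children"]

def leaves(node):
    """Collect the member index lists of all leaf clusters, left to right."""
    if not node["children"]:
        return [node["members"]]
    return [m for child in node["children"] for m in leaves(child)]

# Toy data (hypothetical): one isolated group and two groups close to each other.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (10.0, 10.0), (10.1, 10.2), (10.2, 10.1),
        (10.0, 13.0), (10.1, 13.2), (10.2, 13.1)]

root = {"members": list(range(len(data))), "children": []}
children = split(root, data, k=2)           # coarse split first
biggest = max(children, key=lambda n: len(n["members"]))
split(biggest, data, k=2)                   # then refine the larger cluster
print(leaves(root))  # → [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

In this sketch the caller decides which node to refine and with what `k`; in an interactive setting that decision would come from the analyst inspecting the cluster, and the nested `children` lists are the kind of structure a dendrogram view could draw.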