Toward Perception-Based Evaluation of Clustering Techniques for Visual Analytics

Michaël Aupetit, M. Sedlmair, M. Abbas, Abdelkader Baggag, H. Bensmail
DOI: 10.1109/VISUAL.2019.8933620
Venue: 2019 IEEE Visualization Conference (VIS)
Publication date: 2019-10-01
Citations: 14

Abstract

Automatic clustering techniques play a central role in Visual Analytics by helping analysts discover interesting patterns in high-dimensional data. Evaluating these clustering techniques, however, is difficult due to the lack of universal ground truth. Instead, clustering approaches are usually evaluated based on a subjective visual judgment of low-dimensional scatterplots of different datasets. As clustering is an inherently human-in-the-loop task, we propose a more systematic way of evaluating clustering algorithms based on quantification of human perception of clusters in 2D scatterplots. The core question we ask is to what extent existing clustering techniques align with clusters perceived by humans. To do so, we build on a dataset from a previous study [1], in which 34 human subjects labeled 1000 synthetic scatterplots according to whether they could see one cluster or more than one. Here, we use this dataset to benchmark state-of-the-art clustering techniques by how well they agree with these human judgments. More specifically, we assess 1437 variants of K-means, Gaussian Mixture Models, CLIQUE, DBSCAN, and Agglomerative Clustering techniques on this benchmark data. The results are unexpected. For instance, CLIQUE and DBSCAN are at best in slight agreement with human subjects on this basic cluster-counting task, while model-agnostic Agglomerative clustering can reach substantial agreement with human subjects, depending on the variant. We discuss how to extend this perception-based clustering benchmark approach, and how it could lead to the design of perception-based clustering techniques that better support more trustworthy and explainable models of cluster patterns.
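The benchmark idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the following:

* Each scatterplot has a single reference human label (hypothetically, a majority vote over subjects).
* Each clustering variant's output is reduced to a cluster count, which is then mapped to the binary "one vs. more than one cluster" judgment.
* Agreement is measured with Cohen's kappa.

The verbal grades "slight" and "substantial" used in the abstract match the Landis and Koch scale commonly used to read kappa values. Whether the paper uses exactly this statistic is an assumption here. The helper names `binarize`, `cohen_kappa`, and `landis_koch` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; not taken from the paper.


def binarize(num_clusters):
    """Reduce a cluster count to the binary judgment: 0 = one cluster, 1 = more than one."""
    return 1 if num_clusters > 1 else 0


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of categorical labels.

    Assumption: kappa is the agreement statistic; the abstract does not name one.
    """
    if not a or len(a) != len(b):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items on which both raters give the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa):
    """Verbal grade for a kappa value on the Landis and Koch (1977) scale."""
    if math.isnan(kappa):
        return "undefined"
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Toy data (made up): reference human labels for 8 scatterplots,
    # where 1 = "more than one cluster", and the cluster counts one variant returned.
    human = [0, 1, 1, 0, 1, 0, 1, 1]
    counts = [1, 3, 2, 1, 1, 2, 2, 4]
    algo = [binarize(c) for c in counts]
    k = cohen_kappa(human, algo)
    print(f"kappa = {k:.3f} ({landis_koch(k)})")  # prints "kappa = 0.467 (moderate)"
```

Computing one such kappa per clustering variant over the whole set of labeled scatterplots would give a single number by which to rank variants. The toy numbers above are invented and say nothing about the paper's actual results.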