CSD-CMAD: Coupling Similarity and Diversity for Clustering Multivariate Astrophysics Data

Xu Teng, Thomas Beckler, Bradley Gannon, Benjamin Huinker, Gabriel Huinker, Koushhik Kumar, Christina Marquez, Jacob Spooner, Goce Trajcevski, Prabin Giri, A. Dotter, J. Andrews, S. Coughlin, Y. Qin, J. G. Serra-Perez, N. Tran, Jaime Roman-Garja, K. Kovlakas, E. Zapartas, S. Bavera, D. Misra, T. Fragos
Published in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems
Publication date: 2021-11-02
DOI: 10.1145/3474717.3483989 (https://doi.org/10.1145/3474717.3483989)
Citations: 1

Abstract

Traditionally, clustering of multivariate data aims to group objects described by multiple heterogeneous attributes, based on a suitable similarity (conversely, distance) function. One of the main challenges is that it is not straightforward to apply mathematical operations (e.g., sum, average) directly to the feature values, because they stem from heterogeneous contexts. In this work we take the challenge a step further and tackle the problem of clustering multivariate datasets by jointly considering: (a) similarity among one subset of the attributes; and (b) distance-based diversity among another subset of the attributes. Specifically, we focus on astrophysics data, where the snapshots of stellar evolution for different stars contain over 40 distinct attributes, both physical and categorical (e.g., 'black hole'). We present CSD-CAMD, a prototype system for Coupling Similarity and Diversity for Clustering Astrophysics Multivariate Datasets. It lets users select their preferred subsets of attributes; assign weights (to reflect their relative importance in the clustering); and choose whether each attribute's impact should be in terms of proximity or distance. In addition, CSD-CAMD allows users to select a clustering algorithm and visualizes the outcome of the clustering.
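The abstract does not give the exact formulation, so the following is only a minimal sketch of how similarity on some attributes and diversity on others might be coupled into one pairwise score. Everything in it is an assumption made for illustration: the function names, the range normalization of numeric attributes, the 0/1 mismatch for categorical ones, the `1 - d` inversion for diversity attributes, the average-link agglomerative clustering, and the toy star records. It is not the CSD-CAMD implementation.

```python
from itertools import combinations


def attribute_distance(a, b, value_range=None):
    """Distance in [0, 1] for a single attribute (assumed scheme).

    Categorical (string) values use 0/1 mismatch; numeric values use the
    absolute gap divided by the observed range of that attribute.
    """
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    if not value_range:  # missing or zero range: all values coincide
        return 0.0
    return abs(a - b) / value_range


def coupled_dissimilarity(x, y, spec, ranges):
    """Weighted score that is small when x and y should share a cluster.

    spec maps attribute -> (weight, mode); mode is "similarity" or
    "diversity". Similarity attributes contribute d, diversity attributes
    contribute 1 - d, so far-apart values pull objects together.
    """
    total, weight_sum = 0.0, 0.0
    for attr, (weight, mode) in spec.items():
        d = attribute_distance(x[attr], y[attr], ranges.get(attr))
        total += weight * (d if mode == "similarity" else 1.0 - d)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def numeric_ranges(records, spec):
    """Observed max - min for every numeric attribute named in spec."""
    ranges = {}
    for attr in spec:
        values = [r[attr] for r in records if not isinstance(r[attr], str)]
        if values:
            ranges[attr] = max(values) - min(values)
    return ranges


def cluster(records, spec, k):
    """Average-link agglomerative clustering down to k clusters."""
    ranges = numeric_ranges(records, spec)
    dist = {
        (i, j): coupled_dissimilarity(records[i], records[j], spec, ranges)
        for i, j in combinations(range(len(records)), 2)
    }

    def linkage(a, b):
        pairs = [dist[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(len(records))]
    while len(clusters) > k:
        # Merge the two clusters with the smallest average coupled score.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical snapshots: group stars of similar mass and state,
# while preferring differing metallicity inside a group.
records = [
    {"mass": 1.0, "state": "main sequence", "metallicity": 0.02},
    {"mass": 1.1, "state": "main sequence", "metallicity": 0.001},
    {"mass": 20.0, "state": "black hole", "metallicity": 0.02},
    {"mass": 21.0, "state": "black hole", "metallicity": 0.001},
]
spec = {
    "mass": (2.0, "similarity"),
    "state": (1.0, "similarity"),
    "metallicity": (1.0, "diversity"),
}
print(cluster(records, spec, 2))  # prints [[0, 1], [2, 3]]
```

Because diversity attributes contribute `1 - d`, an object's score against itself is not zero, so the coupled score is not a metric. That is why this sketch uses a linkage-based method that only needs pairwise scores; algorithms that assume a true distance (for example, ones relying on centroids) would need a different formulation.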