Xu Teng, Thomas Beckler, Bradley Gannon, Benjamin Huinker, Gabriel Huinker, Koushhik Kumar, Christina Marquez, Jacob Spooner, Goce Trajcevski, Prabin Giri, A. Dotter, J. Andrews, S. Coughlin, Y. Qin, J. G. Serra-Perez, N. Tran, Jaime Roman-Garja, K. Kovlakas, E. Zapartas, S. Bavera, D. Misra, T. Fragos
Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021-11-02. DOI: 10.1145/3474717.3483989 (https://doi.org/10.1145/3474717.3483989)
CSD-CMAD: Coupling Similarity and Diversity for Clustering Multivariate Astrophysics Data
Traditionally, clustering of multivariate data aims to group objects described by multiple heterogeneous attributes according to a suitable similarity (or, conversely, distance) function. One of the main challenges is that mathematical operations (e.g., sum, average) cannot be applied directly to the feature values, since they stem from heterogeneous contexts. In this work we take the challenge a step further and tackle the problem of clustering multivariate datasets by jointly considering: (a) similarity among one subset of the attributes; and (b) distance-based diversity among another subset of the attributes. Specifically, we focus on astrophysics data, where the snapshots of stellar evolution for different stars contain over 40 distinct attributes, both physical and categorical (e.g., 'black hole'). We present CSD-CAMD, a prototype system for Coupling Similarity and Diversity for Clustering Astrophysics Multivariate Datasets. It gives users the flexibility to select their preferred subsets of attributes, assign weights reflecting each attribute's relative importance in the clustering, and choose whether an attribute's impact should be expressed in terms of proximity or distance. In addition, CSD-CAMD lets users select a clustering algorithm and visualize the clustering outcome.
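The abstract does not state how CSD-CAMD combines similarity and diversity, so the following is only a minimal sketch of one plausible coupling, not the authors' method. It assumes a Gower-style per-attribute distance in [0, 1] (range-normalized absolute difference for numeric attributes, 0/1 mismatch for categorical ones). Each attribute carries a user weight and a mode: "sim" attributes contribute their distance d, while "div" attributes contribute 1 - d, so pairs that differ on them end up *closer*. Average-linkage agglomerative clustering stands in for the user-selected algorithm. The attribute names (`mass`, `period`, `state`), the spec format, and the helper functions are all hypothetical.

```python
from itertools import combinations

# Hypothetical attribute spec: name -> (kind, weight, mode)
#   kind: "num" (numeric) or "cat" (categorical)
#   mode: "sim" (records should be close) or "div" (records should differ)


def attribute_distance(a, b, kind, value_range):
    """Gower-style per-attribute distance in [0, 1]."""
    if kind == "cat":
        return 0.0 if a == b else 1.0
    return abs(a - b) / value_range if value_range > 0 else 0.0


def coupled_dissimilarity(records, spec):
    """Weighted pairwise dissimilarity coupling similarity and diversity.

    "sim" attributes add weight * d; "div" attributes add weight * (1 - d),
    so differing on a diversity attribute lowers the dissimilarity.
    """
    ranges = {}
    for name, (kind, _, _) in spec.items():
        if kind == "num":
            values = [r[name] for r in records]
            ranges[name] = max(values) - min(values)
        else:
            ranges[name] = 1.0  # not used for categorical attributes
    total_weight = sum(weight for _, weight, _ in spec.values())
    n = len(records)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        score = 0.0
        for name, (kind, weight, mode) in spec.items():
            d = attribute_distance(records[i][name], records[j][name], kind, ranges[name])
            score += weight * (d if mode == "sim" else 1.0 - d)
        dist[i][j] = dist[j][i] = score / total_weight  # normalize to [0, 1]
    return dist


def average_linkage(dist, k):
    """Agglomerative clustering (average linkage) until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):  # a < b always
            pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
            avg = sum(dist[p][q] for p, q in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy records with hypothetical attributes (not data from the paper).
records = [
    {"mass": 10.0, "period": 2.0, "state": "black hole"},
    {"mass": 11.0, "period": 2.5, "state": "neutron star"},
    {"mass": 30.0, "period": 9.0, "state": "black hole"},
    {"mass": 31.0, "period": 8.0, "state": "neutron star"},
]
spec = {
    "mass": ("num", 2.0, "sim"),
    "period": ("num", 1.0, "sim"),
    "state": ("cat", 1.0, "div"),
}
D = coupled_dissimilarity(records, spec)
print(average_linkage(D, 2))  # -> [[0, 1], [2, 3]]
```

Because the combined score is a precomputed dissimilarity matrix, any algorithm that accepts one (k-medoids, hierarchical variants, DBSCAN) could be swapped in for `average_linkage`, which matches the abstract's description of a user-selectable clustering algorithm.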