Classification of Astronomical Objects in the Galaxy M81 using Machine Learning Techniques II. An Application of Clustering in Data Pre-processing

2021 18th International Joint Conference on Computer Science and Software Engineering (JCSSE) Pub Date : 2021-06-30 DOI:10.1109/JCSSE53117.2021.9493825

Tapanapong Chuntama, C. Suwannajak, P. Techa-angkoon, Benjamas Panyangam, N. Tanakul


Citations: 2

Abstract

Identifying objects of a particular class in current astronomical data is challenging. In this study, we explored methods to identify globular cluster candidates from a pool of astronomical objects in the galaxy M81. First, we developed a method to cross-match the data automatically; in the previous study, this step was done by manually overlaying the imaging data. The process also eliminated data points that appear in only one or two filters, which indicates that they are artifacts. Next, we used the Expectation Maximization (EM) clustering technique to label the training dataset with classes and to reduce the human effort required in pre-processing. Our results show that the data can be clustered into 12 clusters, which can be grouped into 6 groups of astronomical objects with similar morphological structures. When these 6 groups were used to build classification models, the prediction accuracies improved significantly: from 79.9% to 90.57% for Random Forest and from 67.1% to 91.59% for Multilayer Perceptron. Moreover, when the model built from those data was applied to an unseen dataset, it categorized the objects into classes with characteristics close to those recognized in astronomy. However, this model still cannot fully separate globular clusters from foreground stars and background galaxies because of the similarities in their photometric properties.
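The cross-matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cross_match`, the filter labels `F1` to `F3`, the pixel tolerance `tol`, and the `min_filters=3` threshold are all assumptions made for illustration. The sketch greedily links detections from different filters that fall within a small distance of each other and then drops sources seen in only one or two filters.

```python
from math import hypot


def cross_match(catalogs, tol=1.0, min_filters=3):
    """Greedy positional cross-match (illustrative sketch only).

    catalogs: dict mapping a filter name to a list of (x, y) positions.
    Returns a list of matched sources, each a dict {filter: (x, y)},
    keeping only sources detected in at least `min_filters` filters.
    """
    sources = []  # each entry: {"pos": reference (x, y), "hits": {filter: (x, y)}}
    for band, detections in catalogs.items():
        for x, y in detections:
            best, best_d = None, tol
            for src in sources:
                if band in src["hits"]:
                    continue  # at most one detection per filter per source
                d = hypot(x - src["pos"][0], y - src["pos"][1])
                if d <= best_d:
                    best, best_d = src, d
            if best is None:
                # No existing source nearby: start a new one at this position.
                sources.append({"pos": (x, y), "hits": {band: (x, y)}})
            else:
                best["hits"][band] = (x, y)
    # Sources seen in too few filters are treated as likely artifacts.
    return [s["hits"] for s in sources if len(s["hits"]) >= min_filters]


# Hypothetical detections in three filters (pixel coordinates).
catalogs = {
    "F1": [(10.0, 10.0), (50.0, 50.0), (80.0, 20.0)],
    "F2": [(10.2, 9.9), (50.1, 49.8)],
    "F3": [(9.9, 10.1), (30.0, 70.0)],
}
matched = cross_match(catalogs, tol=0.5, min_filters=3)
print(len(matched), sorted(matched[0]))
```

In this toy input only the source near (10, 10) appears in all three filters, so it is the only one retained. A real pipeline would more likely match on sky coordinates with angular separations and a spatial index rather than this quadratic loop.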
