Improving the open cluster census

E. L. Hunt, S. Reffert
Journal: arXiv: Astrophysics of Galaxies
Publication date: 2020-12-08
DOI: 10.1051/0004-6361/202039341 (https://doi.org/10.1051/0004-6361/202039341)
Citations: 27

Abstract

The census of open clusters in the Milky Way is in a never-before-seen state of flux. Recent works have reported hundreds of new open clusters thanks to the exceptional astrometric quality of data from the Gaia satellite, while other works have reported that many open clusters discovered in the pre-Gaia era may be associations. We aim to compare clustering algorithms used to detect open clusters, statistically quantifying their strengths and weaknesses by deriving the sensitivity, specificity, and precision of each, as well as their true positive rate against a larger sample. We selected DBSCAN, HDBSCAN, and Gaussian mixture models for further study, owing to their speed and appropriateness for use with Gaia data. We developed a preprocessing pipeline for Gaia data and adapted the algorithms for the specific application to open clusters. We derived detection rates for all 1385 open clusters in the fields in our study, as well as more detailed performance statistics for 100 of these open clusters. DBSCAN was sensitive to 50% to 62% of the true positive open clusters in our sample, with generally very good specificity and precision. HDBSCAN traded precision for a higher sensitivity of up to 82%, especially across different distances and scales of open clusters. Gaussian mixture models were slow and sensitive to only 33% of open clusters in our sample, which tended to be larger objects. Additionally, we report 41 new open cluster candidates detected by HDBSCAN, three of which are closer than 500 pc. We find that HDBSCAN, when used with additional post-processing to mitigate its false positives, is the most sensitive and effective algorithm for recovering open clusters in Gaia data. Our results suggest that many more open clusters, both new and already reported, have yet to be detected in Gaia data.
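For readers unfamiliar with the three statistics named above, the following is a minimal sketch of their standard confusion-matrix definitions. It is not the authors' code: the `Confusion` class, its field names, and the example counts are hypothetical, and the abstract does not state whether the counts are taken per cluster or per star, so no such choice is implied here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing detections against a reference list (hypothetical helper)."""

    tp: int  # true positives: real objects that were recovered
    fp: int  # false positives: reported objects that are not real
    tn: int  # true negatives: non-objects correctly left unreported
    fn: int  # false negatives: real objects that were missed

    def sensitivity(self) -> float:
        # Fraction of real objects that were recovered (true positive rate).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of non-objects that were correctly rejected.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Fraction of reported objects that are real.
        return self.tp / (self.tp + self.fp)


# Illustrative counts only; these are NOT values from the paper.
example = Confusion(tp=30, fp=20, tn=140, fn=10)

if __name__ == "__main__":
    print(f"sensitivity = {example.sensitivity():.3f}")
    print(f"specificity = {example.specificity():.3f}")
    print(f"precision   = {example.precision():.3f}")
```

With these definitions, an algorithm that "trades precision for sensitivity" recovers more of the real objects (fewer false negatives) at the cost of reporting more spurious ones (more false positives). The sketch assumes non-zero denominators and would raise `ZeroDivisionError` otherwise.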