Authors: A. Jana, L. Samushia
DOI: 10.1016/j.ascom.2024.100883
Journal: Astronomy and Computing, Volume 49, Article 100883 (published 2024-10-01)
Impact Factor: 1.9 | JCR: Q2 (Astronomy & Astrophysics) | CAS Region 4 (Physics & Astronomy)
Full text: https://www.sciencedirect.com/science/article/pii/S2213133724000982
Citations: 0
Constraining Galaxy-Halo connection using machine learning
We investigate the potential of machine learning (ML) methods to model small-scale galaxy clustering for constraining Halo Occupation Distribution (HOD) parameters. Our analysis reveals that while many ML algorithms report good statistical fits, they often yield likelihood contours that are significantly biased in both mean values and variances relative to the true model parameters. This highlights the importance of careful data processing and algorithm selection in ML applications for galaxy clustering, as even seemingly robust methods can lead to biased results if not applied correctly. If their robustness is established, ML tools offer a promising approach to exploring the HOD parameter space at significantly reduced computational cost compared to traditional brute-force methods. Using our ANN-based pipeline, we successfully recreate some standard results from the recent literature. Properly restricting the HOD parameter space, transforming the training data, and carefully selecting ML algorithms are essential for achieving unbiased and robust predictions. Among the methods tested, artificial neural networks (ANNs) outperform random forests (RF) and ridge regression in predicting clustering statistics when the HOD prior space is appropriately restricted.

We demonstrate these findings using the projected two-point correlation function w_p(r_p), the angular multipoles of the correlation function ξ_ℓ(r), and the void probability function (VPF) of Luminous Red Galaxies from Dark Energy Spectroscopic Instrument mocks. Our results show that while combining w_p(r_p) and the VPF improves parameter constraints, adding the multipoles ξ_0, ξ_2, and ξ_4 to w_p(r_p) does not significantly improve the constraints.
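For context, the HOD framework referenced above populates dark-matter haloes with galaxies statistically as a function of halo mass. The abstract does not specify the parameterization used in the paper; a widely adopted form is the five-parameter Zheng et al. (2005) model, sketched below with purely illustrative parameter values:

```python
# Minimal sketch of the standard five-parameter HOD mean occupation
# (Zheng et al. 2005 form). The parameterization and all default values
# here are illustrative assumptions, not necessarily those of the paper.
from math import erf, log10


def n_central(M, logMmin=13.0, sigma_logM=0.5):
    """Mean number of central galaxies in a halo of mass M (Msun/h).

    A smooth step: ~0 below Mmin, 0.5 at Mmin, ~1 for massive haloes.
    """
    return 0.5 * (1.0 + erf((log10(M) - logMmin) / sigma_logM))


def n_satellite(M, logM0=13.2, logM1=14.1, alpha=1.0,
                logMmin=13.0, sigma_logM=0.5):
    """Mean number of satellite galaxies: a power law in (M - M0)/M1,
    modulated by the central occupation so satellites require a central."""
    M0, M1 = 10.0 ** logM0, 10.0 ** logM1
    if M <= M0:
        return 0.0
    return n_central(M, logMmin, sigma_logM) * ((M - M0) / M1) ** alpha


# Occupation rises from ~0 below Mmin toward ~1 for massive haloes;
# at log10(M) = logMmin the central occupation is exactly 0.5.
print(n_central(10.0 ** 13.0))   # 0.5
print(round(n_central(1e15), 3))
print(round(n_satellite(1e15), 3))
```

An emulator pipeline like the one the abstract describes would then learn the mapping from such HOD parameters to clustering statistics (w_p(r_p), ξ_ℓ(r), VPF) measured in mock catalogues, replacing brute-force evaluation of each parameter-space point.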
Journal: Astronomy and Computing. Categories: Astronomy & Astrophysics; Computer Science, Interdisciplinary Applications.
CiteScore: 4.10
Self-citation rate: 8.00%
Articles published per year: 67
Journal introduction:
Astronomy and Computing is a peer-reviewed journal that focuses on the broad area between astronomy, computer science and information technology. The journal aims to publish the work of scientists and (software) engineers in all aspects of astronomical computing, including the collection, analysis, reduction, visualisation, preservation and dissemination of data, and the development of astronomical software and simulations. The journal covers applications of academic computer science techniques to astronomy, as well as novel applications of information technologies within astronomy.