Towards cosmological inference on unlabeled out-of-distribution HI observational data

IF 1.8, Zone 4 (Physics and Astrophysics), Q3 ASTRONOMY & ASTROPHYSICS

Astrophysics and Space Science, Pub Date: 2025-02-10, DOI: 10.1007/s10509-025-04405-y

Sambatra Andrianomena, Sultan Hassan

Volume 370, issue 2

Citations: 0

Abstract

We present an approach that accounts for the covariate shift between two datasets of the same observable with different distributions. This improves the generalizability of a neural network model trained on in-distribution samples (IDs) when inferring cosmology at the field level on out-of-distribution samples (OODs) with unknown labels. We use HI maps from the two simulation suites in CAMELS, IllustrisTNG and SIMBA. We consider two techniques, an adversarial approach and optimal transport, to adapt a target network whose initial weights are those of a source network pre-trained on a labeled dataset. After adaptation, the salient features extracted by the source and target encoders are well aligned in the embedding space, which indicates that the target encoder has learned the representations of the target domain via adversarial training and optimal transport. Furthermore, in all scenarios considered in our analyses, the target encoder, which has no access to any labels (\(\Omega _{\mathrm{m}}\)) during the adaptation phase, retrieves the underlying \(\Omega _{\mathrm{m}}\) from out-of-distribution maps with an \(R^{2}\) score ≥ 0.9, comparable to the performance of the source encoder trained in a supervised learning setup. We further test the viability of the techniques when only a few out-of-distribution instances are available for training and find that the target encoder still reasonably recovers the matter density. Our approach is critical for extracting information from upcoming large-scale surveys.
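The abstract names two quantities without showing how they are computed: the \(R^{2}\) score used to judge the recovered \(\Omega _{\mathrm{m}}\), and an optimal-transport distance between source and target embeddings. The following is a minimal sketch of both, not the authors' implementation. It assumes uniform sample weights, a squared Euclidean ground cost, and entropic regularization solved with Sinkhorn iterations; none of these choices is confirmed by the abstract, and the function names `r2_score` and `sinkhorn_cost` are hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def sinkhorn_cost(source, target, eps=1.0, n_iter=200):
    """Entropy-regularized OT cost between two sets of embedding vectors.

    Assumptions (illustrative only): uniform weights on both sets and a
    squared Euclidean ground cost. `eps` should be of the same order as
    the typical cost, otherwise exp(-cost / eps) can underflow to zero.
    """
    n, m = len(source), len(target)
    # Pairwise squared Euclidean distances between embeddings.
    cost = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in target]
            for x in source]
    # Gibbs kernel of the regularized problem.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    # Transport plan is diag(u) @ kernel @ diag(v); return its total cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Toy embeddings: the target set is either identical to the source or shifted.
source_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted_emb = [[x + 3.0, y + 3.0] for x, y in source_emb]

aligned_cost = sinkhorn_cost(source_emb, source_emb)
shifted_cost = sinkhorn_cost(source_emb, shifted_emb)
perfect_r2 = r2_score([0.1, 0.3, 0.5], [0.1, 0.3, 0.5])
```

In an adaptation loop of the kind the abstract describes, a cost like this would be computed on batches of source and target embeddings and minimized with respect to the target encoder's weights, so that the two embedding distributions move closer together without using target labels. The \(R^{2}\) score would then be evaluated only afterwards, on held-out maps whose \(\Omega _{\mathrm{m}}\) is known.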


Source journal

Astrophysics and Space Science (Earth Science and Astronomy: Astronomy and Astrophysics)

CiteScore

3.40

Self-citation rate

5.30%

Articles published

106

Review time

2-4 weeks

About the journal: Astrophysics and Space Science publishes original contributions and invited reviews covering the entire range of astronomy, astrophysics, astrophysical cosmology, planetary and space science and the astrophysical aspects of astrobiology. This includes both observational and theoretical research, the techniques of astronomical instrumentation and data analysis and astronomical space instrumentation. We particularly welcome papers in the general fields of high-energy astrophysics, astrophysical and astrochemical studies of the interstellar medium including star formation, planetary astrophysics, the formation and evolution of galaxies and the evolution of large scale structure in the Universe. Papers in mathematical physics or in general relativity which do not establish clear astrophysical applications will no longer be considered. The journal also publishes topically selected special issues in research fields of particular scientific interest. These consist of both invited reviews and original research papers. Conference proceedings will not be considered. All papers published in the journal are subject to thorough and strict peer-reviewing. Astrophysics and Space Science features short publication times after acceptance and colour printing free of charge.