Towards Automating Search and Classification of Protostellar Images

2021 Systems and Information Engineering Design Symposium (SIEDS), Pub Date: 2021-04-30, DOI: 10.1109/SIEDS52267.2021.9483748

Pavan Kumar Bondalapati, Pengwei Hu, Shannon E Paylor, John Zhang


Citations: 0

Abstract

Research on the origins of planets and life centers on protoplanetary disks and protostars, a field that the Atacama Large Millimeter/sub-millimeter Array (ALMA) has revolutionized through its ability to capture high-resolution images with exceptional sensitivity. Astronomers study these birthplaces of planets because their properties determine the properties of any eventual planets. The ALMA science archive contains over a petabyte of astronomical data collected by the telescope over the last decade. Although the archive data is publicly available, manually searching through many thousands of unlabelled images and ascertaining the type and physical properties of celestial objects is immensely labor-intensive. As a result, a manual search of the archive is unlikely to be comprehensive, and astronomers may miss objects that were not the primary target of the telescope's observational program. We develop a Python package that automates the noise filtration process, identifies astronomical objects within a single image, and fits a bivariate Gaussian to each detection. We apply an unsupervised learning algorithm to identify many apparently different protostellar disk images in a curated ALMA data set. Using this model and the residuals from the bivariate Gaussian fit, we can flag images of an unusual nature (e.g. spiral, ring, or other structure that does not adhere to a bivariate Gaussian shape) for manual review by astronomers, allowing them to examine a small subset of interesting images without sifting through the entire archive. Our open-source package is intended to help astronomers make new scientific discoveries by eliminating a labor-intensive bottleneck in their research.
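The residual-based flagging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' package: it assumes noise has already been filtered and that each cutout holds a single detection, and it replaces a least-squares fit with a simpler method-of-moments estimate of the bivariate Gaussian. The function names (`gaussian_2d`, `moment_fit`, `residual_score`) and the threshold `FLAG_THRESHOLD` are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_2d(shape, amp, x0, y0, cov):
    """Evaluate a bivariate Gaussian with covariance `cov` on a pixel grid."""
    y, x = np.indices(shape)
    d = np.stack([x - x0, y - y0], axis=-1)        # offsets, shape (H, W, 2)
    inv = np.linalg.inv(cov)
    r2 = np.einsum("...i,ij,...j->...", d, inv, d)  # squared Mahalanobis distance
    return amp * np.exp(-0.5 * r2)

def moment_fit(image):
    """Estimate Gaussian parameters from intensity-weighted image moments.

    Assumption: this stands in for a proper least-squares fit.
    """
    img = np.clip(image, 0, None)                   # ignore negative pixels
    total = img.sum()
    y, x = np.indices(img.shape)
    x0 = (x * img).sum() / total                    # intensity-weighted centroid
    y0 = (y * img).sum() / total
    dx, dy = x - x0, y - y0
    cxy = (dx * dy * img).sum()
    cov = np.array([[(dx * dx * img).sum(), cxy],
                    [cxy, (dy * dy * img).sum()]]) / total
    # Choose the amplitude so the model carries the same total flux.
    amp = total / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return amp, x0, y0, cov

def residual_score(image):
    """Fraction of the image flux left unexplained by the Gaussian model."""
    amp, x0, y0, cov = moment_fit(image)
    model = gaussian_2d(image.shape, amp, x0, y0, cov)
    return np.abs(image - model).sum() / np.abs(image).sum()

# Synthetic examples: a Gaussian-like disk and a ring of radius 15 pixels.
shape = (64, 64)
disk = gaussian_2d(shape, 1.0, 32.0, 30.0, np.array([[36.0, 8.0], [8.0, 16.0]]))
yy, xx = np.indices(shape)
radius = np.hypot(xx - 32, yy - 32)
ring = np.exp(-0.5 * ((radius - 15) / 2) ** 2)

FLAG_THRESHOLD = 0.3  # hypothetical cut; a real value would need tuning on data
disk_score = residual_score(disk)
ring_score = residual_score(ring)
print("disk flagged:", disk_score > FLAG_THRESHOLD)
print("ring flagged:", ring_score > FLAG_THRESHOLD)
```

In this sketch the ring leaves a large normalized residual because no single Gaussian can reproduce its central hole, while the smooth disk is matched almost exactly. On real ALMA cutouts the score would also depend on noise level and beam shape, so any threshold would have to be calibrated rather than taken from this example.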
