Sreevarsha Sreejith, Maria V. Pruzhinskaya, Alina A. Volnova, Vadim V. Krushinsky, Konstantin L. Malanchev, Emille E.O. Ishida, Anastasia D. Lavrukhina, Timofey A. Semenikhin, Emmanuel Gangler, Matwey V. Kornilov, Vladimir S. Korolev
New Astronomy, volume 122, Article 102466. Published 2025-08-29. DOI: 10.1016/j.newast.2025.102466. URL: https://www.sciencedirect.com/science/article/pii/S1384107625001162
Dataset of artefacts for machine learning applications in astronomy
Accurate photometry in astronomical surveys is challenged by image artefacts, which affect measurements and degrade data quality. Because of the large volume of available data, identifying such artefacts is increasingly handled by machine learning algorithms, which often require a labelled training set to learn data patterns. We present an expert-labelled dataset of 1127 artefacts with 1213 labels from 26 fields in ZTF DR3, along with a complementary set of nominal objects. The artefact dataset was compiled using the active anomaly detection algorithm PineForest, developed by the SNAD team. These datasets can serve as valuable resources for real-bogus classification, catalogue cleaning, anomaly detection, and educational purposes. Both artefact and nominal images are provided in FITS format in two sizes (28 × 28 and 63 × 63 pixels). The datasets are publicly available for further scientific applications.
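As an illustration of how FITS cutouts of these two sizes might be prepared for a classifier, the following is a minimal sketch, not the authors' pipeline. The helper names `load_cutout`, `center_crop`, and `minmax_scale` are hypothetical, and the sketch assumes that each file stores a single 2-D image readable with `astropy.io.fits.getdata`. Whether the published 28 × 28 images are central crops of the 63 × 63 ones is not stated in the abstract; the crop here is only one possible way to bring both sizes to a common shape.

```python
import numpy as np


def load_cutout(path):
    """Read one FITS cutout into a float32 array.

    Assumption: astropy is installed and the file holds a single 2-D image.
    """
    from astropy.io import fits  # lazy import: the helpers below need only NumPy
    return np.asarray(fits.getdata(path), dtype=np.float32)


def center_crop(image, size):
    """Return the central size x size region of a 2-D image."""
    height, width = image.shape
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


def minmax_scale(image):
    """Rescale finite pixel values to [0, 1]; non-finite pixels become 0."""
    image = np.asarray(image, dtype=np.float32)
    finite = np.isfinite(image)
    if not finite.any():
        return np.zeros_like(image)
    lo, hi = image[finite].min(), image[finite].max()
    if hi == lo:
        # A constant image carries no contrast, so map it to all zeros.
        return np.zeros_like(image)
    scaled = (image - lo) / (hi - lo)
    return np.where(finite, scaled, 0.0).astype(np.float32)


# Synthetic stand-in for a 63 x 63 cutout (no real data is used here).
cutout = np.arange(63 * 63, dtype=np.float32).reshape(63, 63)
small = minmax_scale(center_crop(cutout, 28))
print(small.shape)  # (28, 28)
```

Min-max scaling is chosen here only for simplicity; a percentile-based or sigma-clipped normalisation is often more robust for astronomical images with bright outlier pixels.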
About the journal:
New Astronomy publishes articles in all fields of astronomy and astrophysics, with a particular focus on computational astronomy: mathematical and astronomy techniques and methodology, simulations, modelling and numerical results and computational techniques in instrumentation.
New Astronomy includes full length research articles and review articles. The journal covers solar, stellar, galactic and extragalactic astronomy and astrophysics. It reports on original research in all wavelength bands, ranging from radio to gamma-ray.