Image Data Augmentation for the TAIGA-IACT Experiment with Conditional Generative Adversarial Networks

IF 0.4, Zone 4 (Physics and Astrophysics), Q4 (Physics, Multidisciplinary)
Yu. Yu. Dubenskaya, A. P. Kryukov, E. O. Gres, S. P. Polyakov, E. B. Postnikov, P. A. Volchugov, A. A. Vlaskina, D. P. Zhurov
Moscow University Physics Bulletin, vol. 79 (2 supplement), pp. S598-S607. Published 2025-03-22. DOI: 10.3103/S0027134924702059
https://link.springer.com/article/10.3103/S0027134924702059

Abstract

Modern Imaging Atmospheric Cherenkov Telescopes (IACTs) generate huge amounts of data that must be classified automatically, ideally in real time. Machine learning-based solutions are increasingly used for such classification problems, but these classifiers require suitable training data sets to work correctly. Training neural networks on real IACT data is problematic because the data must be prelabeled, and such labeling is difficult and yields only estimates. In addition, the distribution of incoming events is highly imbalanced. First, the event types are imbalanced, since the number of detected gamma quanta is significantly smaller than the number of protons. Second, the energy distribution of particles of the same type is also imbalanced, since high-energy particles are extremely rare. As a result, classifiers trained on such data do not handle rare events correctly. Solving this problem with conventional Monte Carlo event simulation alone is possible but extremely resource-intensive and time-consuming. To address this issue, we propose augmenting the data with artificially generated events of the desired type and energy using conditional generative adversarial networks (cGANs), with classes distinguished by energy value. In the paper, we describe a simple algorithm for generating balanced data sets using cGANs. The proposed neural network model can thus produce both imbalanced data sets for physical analysis and balanced data sets suitable for training other neural networks.
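
The balancing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out. It only assumes that events are already binned into energy classes and that some trained conditional generator can return a requested number of synthetic events for a given class. The names `augmentation_plan`, `balance`, and `fake_generator` are hypothetical, and `fake_generator` is a random stand-in for sampling from a cGAN.

```python
from collections import Counter
import random


def augmentation_plan(labels, target=None):
    """Number of synthetic events needed per class to reach `target` events.

    By default the target is the size of the largest class. Classes that
    already meet the target get 0; nothing is ever removed.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


def balance(events, labels, generate, target=None):
    """Return copies of `events` and `labels` extended with generated events.

    `generate(cls, n)` is a hypothetical stand-in for drawing n images from
    a trained cGAN conditioned on energy class `cls`.
    """
    plan = augmentation_plan(labels, target)
    out_events, out_labels = list(events), list(labels)
    for cls, n in plan.items():
        out_events.extend(generate(cls, n))
        out_labels.extend([cls] * n)
    return out_events, out_labels


def fake_generator(cls, n):
    """Placeholder generator: n random 4-pixel 'images' (assumption, not a cGAN)."""
    rng = random.Random(cls)
    return [[rng.random() for _ in range(4)] for _ in range(n)]


# Toy imbalanced set: energy class 0 is common, class 2 is rare.
labels = [0] * 6 + [1] * 3 + [2] * 1
events = [[0.0] * 4 for _ in labels]

balanced_events, balanced_labels = balance(events, labels, fake_generator)
print(Counter(balanced_labels))
```

In this toy run every energy class ends up with six events. Passing a different `target` would let the same routine top up classes to any chosen size, while leaving the real (imbalanced) events untouched for physical analysis.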

Source journal: Moscow University Physics Bulletin (Physics, Multidisciplinary)
CiteScore: 0.70
Self-citation rate: 0.00%
Articles published: 129
Review time: 6-12 weeks
Journal description: Moscow University Physics Bulletin publishes original papers (reviews, articles, and brief communications) in the following fields of experimental and theoretical physics: theoretical and mathematical physics; physics of nuclei and elementary particles; radiophysics, electronics, acoustics; optics and spectroscopy; laser physics; condensed matter physics; chemical physics, physical kinetics, and plasma physics; biophysics and medical physics; astronomy, astrophysics, and cosmology; physics of the Earth, atmosphere, and hydrosphere.