End-to-end simulation of particle physics events with flow matching and generator oversampling

IF 6.3 · Zone 2 (Physics and Astrophysics) · JCR Q1 (Computer Science, Artificial Intelligence)

Machine Learning: Science and Technology · Published: 2024-07-09 · DOI: 10.1088/2632-2153/ad563c

F Vaselli, F Cattafesta, P Asenov, A Rizzi


Citations: 0

Abstract

The simulation of high-energy physics collision events is a key element for data analysis at present and future particle accelerators. The comparison of simulation predictions to data allows looking for rare deviations that can be due to new phenomena not previously observed. We show that novel machine learning algorithms, specifically Normalizing Flows and Flow Matching, can be used to replicate accurate simulations from traditional approaches with several orders of magnitude of speed-up. The classical simulation chain starts from a physics process of interest, computes energy deposits of particles and electronics response, and finally employs the same reconstruction algorithms used for data. Eventually, the data are reduced to some high-level analysis format. Instead, we propose an end-to-end approach, simulating the final data format directly from physical generator inputs, skipping any intermediate steps. We use particle jets simulation as a benchmark for comparing both discrete and continuous Normalizing Flows models. The models are validated across a variety of metrics to identify the most accurate. We discuss the scaling of performance with the increase in training data, as well as the generalization power of these models on physical processes different from the training one. We investigate sampling multiple times from the same physical generator inputs, a procedure we name oversampling, and we show that it can effectively reduce the statistical uncertainties of a dataset. This class of ML algorithms is found to be capable of learning the expected detector response independently of the physical input process. The speed and accuracy of the models, coupled with the stability of the training procedure, make them a compelling tool for the needs of current and future experiments.
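The abstract combines two ideas. The first is a flow-matching generator conditioned on generator-level inputs. The second is oversampling, meaning several detector-level samples are drawn per generator input. The Python sketch below illustrates both on a one-dimensional toy and is not the authors' implementation.

It assumes a hypothetical detector response reco ~ N(gen, sigma²) and a linear interpolation path between N(0, 1) noise and that response. The four functions are:

- `velocity` is the exact conditional expectation E[x1 - x0 | x_t] that an ideally trained flow-matching network would learn.
- `cfm_loss` is a Monte Carlo estimate of the flow-matching objective such a network would minimize.
- `sample` integrates the resulting ODE from t = 0 to t = 1 with explicit Euler steps.
- `oversample` draws k responses per generator input and gives each copy an assumed weight of 1/k.

All names, the toy response model and the weighting scheme are illustrative assumptions.

```python
import numpy as np


def velocity(x, t, gen, sigma):
    """Exact marginal velocity E[x1 - x0 | x_t = x] for the linear path
    x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, 1) and the toy (hypothetical)
    detector response x1 ~ N(gen, sigma**2), sigma > 0. This is the minimizer
    of the flow-matching loss, i.e. what a perfectly trained network learns."""
    var_t = (1.0 - t) ** 2 + (t * sigma) ** 2  # Var(x_t) given gen
    return gen + (t * sigma**2 - (1.0 - t)) / var_t * (x - t * gen)


def cfm_loss(velocity_fn, gen, sigma, rng):
    """Monte Carlo estimate of the conditional flow-matching (regression) loss."""
    x0 = rng.standard_normal(gen.shape)                # base noise
    x1 = gen + sigma * rng.standard_normal(gen.shape)  # toy "full simulation" target
    t = rng.uniform(size=gen.shape)                    # random times in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                       # point on the linear path
    target = x1 - x0                                   # velocity of that path
    return np.mean((velocity_fn(xt, t, gen, sigma) - target) ** 2)


def sample(gen, sigma, rng, n_steps=1000):
    """Generate responses by integrating dx/dt = v(x, t | gen) from noise at
    t = 0 to t = 1 with explicit Euler steps."""
    x = rng.standard_normal(gen.shape)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt, gen, sigma)
    return x


def oversample(gen, sigma, k, rng, n_steps=1000):
    """Draw k responses per generator-level input, each from fresh base noise.
    Each copy carries weight 1/k so every input keeps unit total weight."""
    gen_rep = np.repeat(gen, k)  # each input repeated k times consecutively
    reco = sample(gen_rep, sigma, rng, n_steps)
    weights = np.full(reco.shape, 1.0 / k)
    return reco, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gen = rng.normal(5.0, 0.3, size=500)  # hypothetical generator-level values
    reco, w = oversample(gen, sigma=1.0, k=10, rng=rng)
    print(reco.shape)  # (5000,)
```

The k copies share one generator-level input, so they are correlated. In this toy, with N inputs, the variance of a mean estimate is Var(gen)/N + sigma²/(N k). Oversampling therefore shrinks only the detector-smearing term and leaves the generator-level term unchanged.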

Source journal

Machine Learning: Science and Technology · Computer Science - Artificial Intelligence

CiteScore

9.10

Self-citation rate

4.40%

Articles published

Review time

5 weeks

Journal description: Machine Learning: Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory motivated by physical insights. Specifically, articles must either advance the state of machine-learning-driven applications in the sciences, or make conceptual, methodological or theoretical advances in machine learning that are applied to, inspired by, or motivated by scientific problems.