Title: Anomaly-aware summary statistic from data batches
Author: G. Grosso
Journal: Journal of High Energy Physics, 2024 (12)
Published: 2024-12-12
DOI: 10.1007/JHEP12(2024)093
Article: https://link.springer.com/article/10.1007/JHEP12(2024)093
PDF: https://link.springer.com/content/pdf/10.1007/JHEP12(2024)093.pdf
Impact factor: 5.4 (JCR Q1, Physics and Astronomy)
Citations: 0
Abstract
Signal-agnostic data exploration based on machine learning could unveil very subtle statistical deviations of collider data from the expected Standard Model of particle physics. The beneficial impact of a large training sample on machine learning solutions motivates the exploration of increasingly large and inclusive samples of acquired data with resource-efficient computational methods. In this work we consider the New Physics Learning Machine (NPLM), a multivariate goodness-of-fit test built on the Neyman-Pearson maximum-likelihood-ratio construction, and we address the problem of testing large samples under computational and storage resource constraints. We propose to perform parallel NPLM routines over batches of the data, and to combine them by locally aggregating the data-to-reference density ratios learnt from each batch. The resulting data hypothesis defining the likelihood-ratio test is thus shared across the batches, and complies with the assumption that the expected rate of new physical processes is time invariant. We show that this method outperforms the simple sum of the independent tests run over the batches, and can recover, or even surpass, the sensitivity of the single test run over the full data. Besides the significant advantage for the offline application of NPLM to large samples, the proposed approach offers new prospects for using NPLM to construct anomaly-aware summary statistics in quasi-online data streaming scenarios.
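The batch combination described in the abstract can be pictured with a minimal sketch. This is not the paper's implementation. It assumes that "locally aggregating" means taking the arithmetic mean of the per-batch density ratios at each point (the abstract does not state the aggregation rule), and it uses the extended likelihood-ratio form usually quoted for NPLM, t = 2 [ sum over data of f(x) - (N_expected / N_reference) * sum over reference of (exp(f(x)) - 1) ], where f is the learnt log density ratio. All function names are hypothetical, and the per-batch models are stand-ins for trained learners.

```python
import math


def aggregate_log_ratio(batch_models, x):
    """Combine per-batch log density ratios f_i(x) into one shared value.

    Assumption: aggregation is the arithmetic mean of the ratios exp(f_i(x)).
    """
    ratios = [math.exp(f(x)) for f in batch_models]
    return math.log(sum(ratios) / len(ratios))


def likelihood_ratio_statistic(log_ratio, data, reference, expected_events):
    """Extended likelihood-ratio test statistic for a given log density ratio.

    `data` is the full data sample (all batches together), `reference` is the
    reference sample, and `expected_events` is the expected data yield under
    the reference hypothesis.
    """
    weight = expected_events / len(reference)
    data_term = sum(log_ratio(x) for x in data)
    norm_term = weight * sum(math.exp(log_ratio(x)) - 1.0 for x in reference)
    return 2.0 * (data_term - norm_term)


# Toy stand-ins for models trained on two batches (hypothetical).
batch_models = [lambda x: 0.1 * x, lambda x: 0.2 * x]


def shared_log_ratio(x):
    # One data hypothesis shared by all batches.
    return aggregate_log_ratio(batch_models, x)


data = [0.5, 1.0, 1.5, 2.0]          # union of the batches
reference = [0.2, 0.6, 1.0, 1.4, 1.8, 2.2]
t_shared = likelihood_ratio_statistic(shared_log_ratio, data, reference, 4.0)
```

The point of the design is that a single aggregated function is evaluated on the union of the batches, so one test statistic is computed rather than a sum of independent per-batch statistics.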
About the journal:
The aim of the Journal of High Energy Physics (JHEP) is to provide the scientific community with fast and efficient online publication tools, while keeping that community in charge of every aspect of the peer-review and publication process in order to ensure the highest quality standards in the journal.
Consequently, the Advisory and Editorial Boards, composed of distinguished, active scientists in the field, jointly establish with the Scientific Director the journal's scientific policy and ensure the scientific quality of accepted articles.
JHEP presently encompasses the following areas of theoretical and experimental physics:
Collider Physics
Underground and Large Array Physics
Quantum Field Theory
Gauge Field Theories
Symmetries
String and Brane Theory
General Relativity and Gravitation
Supersymmetry
Mathematical Methods of Physics
Mostly Solvable Models
Astroparticles
Statistical Field Theories
Mostly Weak Interactions
Mostly Strong Interactions
Quantum Field Theory (phenomenology)
Strings and Branes
Phenomenological Aspects of Supersymmetry
Mostly Strong Interactions (phenomenology).