Converting sWeights to probabilities with density ratios
D.I. Glazier, R. Tyson
Computer Physics Communications, vol. 318, Article 109890 (published 2025-10-06)
DOI: 10.1016/j.cpc.2025.109890
https://www.sciencedirect.com/science/article/pii/S0010465525003923
Citations: 0
Abstract
The use of machine learning approaches continues to have many benefits in experimental nuclear and particle physics. One common issue is generating training data which is sufficiently realistic to give reliable results. Here we advocate using real experimental data as the source of training data and demonstrate how one might subtract background contributions through the use of probabilistic weights which can be readily applied to training data. The sPlot formalism is a common tool used to isolate distributions from different sources. However, the negative sWeights produced by the sPlot technique can cause training problems and poor predictive power. This article demonstrates how density ratio estimation can be applied to convert sWeights to event probabilities, which we call drWeights. The drWeights can then be applied to produce the distributions of interest and are consistent with direct use of the sWeights. This article will also show how decision trees are particularly well suited to convert sWeights, with the benefit of fast prediction rates and adaptability to aspects of experimental data such as the data sample size and proportions of different event sources. We also show that a density ratio product approach in which the initial drWeights are reweighted by an additional converter gives substantially better results.
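The abstract describes the conversion only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses a hypothetical toy sample and first computes signal sWeights with the standard sPlot formula from a two-component fit in a discriminating variable m. It then turns them into per-event weights by estimating the ratio of the sWeighted density to the unweighted density in a control variable x. Scaled by N_s/N, that ratio estimates the signal probability at x, provided x is independent of m within each source. To keep the example self-contained, it swaps in a fixed-binning histogram ratio estimator in place of the decision-tree converters the article advocates. A decision tree also produces a piecewise-constant estimate over cells of x, but it chooses those cells from the data. The toy shapes, yields, bin count and all function names (`signal_sweights`, `histogram_dr_weights`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical toy model: discriminating variable m in [0, 1] with a Gaussian
# signal peak on a flat background, and a control variable x in [0, 1) that is
# independent of m within each component (the sPlot assumption).
MU, SIGMA = 0.5, 0.05


def signal_pdf(m):
    # Gaussian; its tails beyond [0, 1] are negligible for these parameters.
    return math.exp(-0.5 * ((m - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))


def background_pdf(m):
    return 1.0  # uniform on [0, 1]


def generate(n_sig, n_bkg, seed=1):
    rng = random.Random(seed)
    # Signal: peaked in m, x drawn from p(x) = 2x; background: flat in m and x.
    sig = [(rng.gauss(MU, SIGMA), math.sqrt(rng.random())) for _ in range(n_sig)]
    bkg = [(rng.random(), rng.random()) for _ in range(n_bkg)]
    return sig + bkg


def fit_yields(fs, fb, tol=1e-10, max_iter=1000):
    # Extended maximum-likelihood yields with fixed shapes, via the EM fixed point
    # N_j <- sum_e N_j f_j(m_e) / sum_k N_k f_k(m_e).
    n = len(fs)
    ns = nb = n / 2.0
    for _ in range(max_iter):
        denom = [ns * a + nb * b for a, b in zip(fs, fb)]
        new_ns = ns * sum(a / d for a, d in zip(fs, denom))
        new_nb = nb * sum(b / d for b, d in zip(fb, denom))
        done = abs(new_ns - ns) < tol * n
        ns, nb = new_ns, new_nb
        if done:
            break
    return ns, nb


def signal_sweights(fs, fb, ns, nb):
    # sPlot: invert the 2x2 matrix V^-1_ij = sum_e f_i f_j / D_e^2, then
    # w_s(e) = (V_ss f_s(e) + V_sb f_b(e)) / D_e with D_e = N_s f_s + N_b f_b.
    denom = [ns * a + nb * b for a, b in zip(fs, fb)]
    iss = sum(a * a / d ** 2 for a, d in zip(fs, denom))
    isb = sum(a * b / d ** 2 for a, b, d in zip(fs, fb, denom))
    ibb = sum(b * b / d ** 2 for b, d in zip(fb, denom))
    det = iss * ibb - isb ** 2
    v_ss, v_sb = ibb / det, -isb / det  # first row of the covariance matrix V
    return [(v_ss * a + v_sb * b) / d for a, b, d in zip(fs, fb, denom)]


def histogram_dr_weights(xs, sw, n_bins=20):
    # Density ratio of the sWeighted x distribution to the unweighted one, with
    # fixed bins; multiplied by N_s/N it reduces to the mean sWeight in each bin.
    sums, counts = [0.0] * n_bins, [0] * n_bins
    idx = [min(int(x * n_bins), n_bins - 1) for x in xs]
    for i, w in zip(idx, sw):
        sums[i] += w
        counts[i] += 1
    ratio = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [ratio[i] for i in idx]


events = generate(n_sig=2000, n_bkg=8000)
ms = [m for m, _ in events]
xs = [x for _, x in events]
fs = [signal_pdf(m) for m in ms]
fb = [background_pdf(m) for m in ms]

ns, nb = fit_yields(fs, fb)
sw = signal_sweights(fs, fb, ns, nb)
raw = histogram_dr_weights(xs, sw)
# Finite statistics can push a bin slightly outside [0, 1]; clip to get probabilities.
dr = [min(max(r, 0.0), 1.0) for r in raw]

print(f"fitted N_s = {ns:.1f}, sum of sWeights = {sum(sw):.1f}")
print(f"negative sWeights: {sum(w < 0 for w in sw)} of {len(sw)}")
print(f"sum of drWeights = {sum(dr):.1f}")
```

Background-like events in the m sidebands receive negative signal sWeights, but the converted weights are non-negative and can be used directly as training weights. Before clipping, the per-event values sum exactly to the sum of the sWeights. That is one simple sense in which such weights stay consistent with direct use of the sWeights.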
About the journal
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to the following:
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers may be considered on the impact of advanced computer architectures and special-purpose computers on computing in the physical sciences, and on software topics related to, and of importance in, the physical sciences.