Purvasha Chakravarti, Lucas Kania, Olaf Behnke, Mikael Kuusela, Larry Wasserman
Robust semi-parametric signal detection in particle physics with classifiers decorrelated via optimal transport
arXiv - STAT - Machine Learning, arXiv:2409.06399, published 2024-09-10
Abstract
Searches for new signals in particle physics are usually performed by training a
supervised classifier to separate a signal model from the known Standard Model
physics (also called the background model). However, even when the signal model
is correct, systematic errors in the background model can influence supervised
classifiers and might adversely affect the signal detection procedure. To
tackle this problem, one approach is to use the (possibly misspecified)
classifier only to perform a preliminary signal-enrichment step and then to
carry out a bump hunt on the signal-rich sample using only the real
experimental data. For this procedure to work, we need a classifier constrained
to be decorrelated from one or more protected variables used for the signal
detection step. We do this by considering an optimal transport map of the
classifier output that makes it independent of the protected variable(s) for
the background. After making cuts on the transformed classifier, we fit a
semi-parametric mixture model to the distribution of the protected variable to
detect the presence of a signal. We compare and contrast this decorrelation
method with previous approaches, show that the decorrelation procedure is
robust to moderate background misspecification, and analyse the power of the
signal detection test as a function of the cut on the classifier.
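
To illustrate the decorrelation idea in the abstract, here is a minimal sketch and not the authors' implementation. It relies on the fact that, in one dimension, the optimal transport map from a continuous distribution to Uniform(0, 1) is its CDF, so mapping the classifier score through its background CDF conditional on the protected variable gives a transformed score that is uniform, and hence independent of the protected variable, for the background. The conditioning is approximated here by quantile bins of the protected variable, which is an assumption of this sketch. The names `fit_decorrelation_map`, `apply_decorrelation_map`, `m`, `h`, and `t`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def fit_decorrelation_map(m_bkg, h_bkg, n_bins=20):
    """Store the sorted background scores h within quantile bins of m.

    Assumption: binning m approximates conditioning on m.
    """
    # Interior quantile edges, so every bin holds roughly the same count.
    edges = np.quantile(m_bkg, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bin_idx = np.searchsorted(edges, m_bkg, side="right")
    sorted_scores = [np.sort(h_bkg[bin_idx == b]) for b in range(n_bins)]
    return edges, sorted_scores

def apply_decorrelation_map(m, h, edges, sorted_scores):
    """Map each score to its empirical background CDF value within its m-bin."""
    bin_idx = np.searchsorted(edges, m, side="right")
    t = np.empty_like(h, dtype=float)
    for b, ref in enumerate(sorted_scores):
        mask = bin_idx == b
        # Fraction of background scores in this bin that are <= h.
        t[mask] = np.searchsorted(ref, h[mask], side="right") / len(ref)
    return t

# Synthetic background only: the score h is built to depend on m.
rng = np.random.default_rng(0)
n = 20_000
m = rng.uniform(0.0, 4.0, n)                               # protected variable
h = 1.0 / (1.0 + np.exp(-(m - 2.0 + rng.normal(size=n))))  # classifier score

edges, sorted_scores = fit_decorrelation_map(m, h)
t = apply_decorrelation_map(m, h, edges, sorted_scores)

corr_before = np.corrcoef(m, h)[0, 1]
corr_after = np.corrcoef(m, t)[0, 1]
print(f"correlation with m before: {corr_before:.3f}, after: {corr_after:.3f}")
```

Under these assumptions, a cut such as `t > 0.9` keeps roughly the same fraction of background events in every bin of `m`, so the cut itself should not sculpt a bump in the background distribution of the protected variable.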