Review

Machine-Learning Application for a Likelihood Ratio Estimation Problem at LHC

by Silvia Auricchio 1,2,*,†, Francesco Cirotto 1,2,*,† and Antonio Giannini 3,*,†
1 Dipartimento di Fisica “Ettore Pancini”, Università degli Studi di Napoli Federico II, Complesso Univ. Monte S. Angelo, Via Cinthia, 21, Edificio 6, 80126 Napoli, Italy
2 INFN-Sezione di Napoli, Complesso Univ. Monte S. Angelo, Via Cinthia, Edificio 6, 80126 Napoli, Italy
3 Center of Innovation and Cooperation for Particle and Interaction (CICPI), University of Science and Technology of China (USTC), No. 96, JinZhai Road, Baohe District, Hefei 230026, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2023, 13(1), 86; https://doi.org/10.3390/app13010086
Submission received: 14 November 2022 / Revised: 13 December 2022 / Accepted: 16 December 2022 / Published: 21 December 2022
(This article belongs to the Special Issue Machine Learning Applications in Atlas and CMS Experiments at LHC)

Abstract:
High-energy physics is now entering a new era. Current particle experiments, such as ATLAS at the Large Hadron Collider at CERN, offer the possibility of discovering new and interesting physics phenomena beyond the Standard Model, the theoretical framework that describes the fundamental interactions between particles. In this paper, a machine-learning algorithm to estimate the expected distribution of background processes, a very demanding task in particle physics analyses, is described. The technique exploited is inspired by the well-known likelihood ratio estimation problem and is known in statistics as direct importance estimation. First, its theoretical formulation is discussed; then, its performance in two ATLAS analyses is described.

1. Introduction

The Large Hadron Collider (LHC) [1] is a proton–proton particle accelerator located at CERN whose physics program, after the successful Higgs boson discovery in 2012 [2], aims to search for new physics phenomena. The long-term LHC schedule involves the collection of a considerable amount of data, both to improve the precision of measurements and to find extremely rare phenomena predicted by Beyond Standard Model theories. This goal can be achieved by increasing the number of proton collisions within the accelerator ring; however, to collect data at a higher rate, a complex upgrade program of the hardware, electronics and software had to be implemented. The capacity to process and analyse the resulting large amount of data is essential. In this context, machine learning (ML) finds fertile ground for application, as it offers solutions to the need for rapid analysis of the inner structures and hidden pieces of information in large datasets.
In this paper, we focus on one of the many applications of ML algorithms to particle physics analysis: the adoption of a deep neural network to address one of the main challenges, the estimation of background processes. The techniques traditionally applied mostly rely on Monte Carlo simulations but, when the complexity of the processes is too high, a data-driven approach is preferred.
The application examples considered are drawn from two publications of the ATLAS Collaboration [3], one of the four major experiments designed to analyze data from particle collisions, involving a broad physics program ranging from precision measurements to direct searches for new particles and new interactions.
In Section 2, the physical problem is introduced and the theoretical solution adopted is described. Two applications to specific analyses are then discussed in Section 3.

2. The Reweighting Problem

In particle physics analyses, we are typically interested in studying a particular process, called the signal. To be sensitive to it, a selection of the data is usually performed, defining regions that contain the events of interest, called signal regions (SRs). Non-signal events with the same experimental signature, referred to as background, can pass these selections; this is problematic and requires them to be distinguished from the signal. The estimation of background events is typically performed by defining additional regions, called control regions (CRs), complementary to the SR, where no signal events are expected. By studying the data distribution in these CRs, the expected distribution of background events in the SR can be inferred; however, since the two regions are not expected to have identical kinematic characteristics, a reweighting function has to be derived.
More technically, if $p_0$ and $p_1$ are the probability distribution functions (pdfs) of a real variable $X$ describing two different kinematic regions (labelled 0 and 1), the reweighting function between them corresponds to their likelihood ratio (or importance) $w(X)$:
$$p_1(X) = w(X) \cdot p_0(X) \;\Longrightarrow\; w(X) = \frac{p_1(X)}{p_0(X)}. \qquad (1)$$
In the real world, we do not know the pdfs, and the most we can do is approximate them from the data sampled in the two kinematic regions 0 and 1. Unfortunately, even this problem is often neither simple nor straightforward to solve; however, there is a way to address it, which consists of estimating the likelihood ratio directly from the data without determining the individual pdfs. In statistics, this is referred to as direct importance estimation, and it represents a good solution whenever only the pdf ratio is needed.
In this paper, background estimation amounts to obtaining the likelihood ratio (1) between the SR and the CRs. This has previously been performed using "classical" approaches, which attempt to approximate the pdf of each region with the distributions of the most discriminating variables; obviously, however, such techniques cannot take into account the complex multi-dimensionality of the data. A deep neural network, which is known to perform well in complex learning tasks and in identifying hidden correlations in multi-dimensional datasets, offers a natural solution to this problem. Its use is described in the following section.
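As a minimal numerical illustration of Equation (1) (a toy sketch, not part of the ATLAS analyses: the Gaussian pdfs, sample size and binning are arbitrary assumptions), the following Python snippet shows that weighting events sampled from region 0 by $w(X) = p_1(X)/p_0(X)$ reproduces the distribution of region 1:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Two toy "kinematic regions" with known (assumed) Gaussian pdfs
p0, p1 = norm(loc=0.0, scale=1.0), norm(loc=0.5, scale=1.2)
x0 = p0.rvs(size=100_000, random_state=rng)      # events sampled in region 0

# Analytical likelihood ratio (importance) of Equation (1)
w = p1.pdf(x0) / p0.pdf(x0)

# The weighted region-0 histogram approximates the region-1 pdf
hist, edges = np.histogram(x0, bins=50, range=(-4, 4), weights=w, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - p1.pdf(centres))))    # small residual, shrinking with statistics
```

In the analyses discussed below, the pdfs are not known and only their ratio is estimated directly from data, which is precisely the problem formalized in the next subsection.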

2.1. A Least-Squares Approach to Direct Importance Estimation

Let us first formulate the direct importance estimation problem as a least-squares function-fitting problem, as described in [4], which can be readily generalized.
From this point on, quantities estimated from data are indicated with a hat symbol ($\hat{\cdot}$).
Let us consider two pdfs, $p_0$ and $p_1$, and let us suppose that we have $N_0$ independent and identically distributed (i.i.d.) measurements extracted from $p_0$ and $N_1$ i.i.d. measurements extracted from $p_1$ of $B$ variables: $\{X^0_{i,j}\}_{i=1,j=1}^{N_0,B}$ and $\{X^1_{l,j}\}_{l=1,j=1}^{N_1,B}$.
We know neither the analytical form of $p_0(\underline{X})$ and $p_1(\underline{X})$ nor their empirical realization, but we are interested in the ratio between them and can estimate it from the two samples of extracted measurements.
Let us call $w(\underline{X}) = \frac{p_1(\underline{X})}{p_0(\underline{X})}$ the analytical ratio and $\hat{w}(\underline{X}) = \sum_f \alpha_f \cdot \phi_f(\underline{X})$ its estimate from data, written as a linear combination of arbitrarily chosen functions of the data, $\phi_f(\underline{X})$, that must satisfy the same properties as $w(\underline{X})$:
  • $w(\underline{X}) = \frac{p_1(\underline{X})}{p_0(\underline{X})} > 0$;
  • $w(\underline{X}) \to +\infty$ if $\underline{X} \sim p_1(\underline{X})$;
  • $w(\underline{X}) \to 0$ if $\underline{X} \sim p_0(\underline{X})$.
The coefficients $\underline{\alpha}$ are constants chosen in such a way as to obtain the best estimate of $w(\underline{X})$, by minimizing a loss function that depends on them.
Now that the notation is fixed, we can formulate our minimization problem by defining a loss function as the sum of the quadratic differences between the true value $w(\underline{X})$ and its estimate on data $\hat{w}(\underline{X})$.
Thus, the loss function $J_0$ can be written as follows:
$$
\begin{aligned}
J_0(\underline{\alpha}) &= E_0\left[\left(\hat{w}(\underline{X}) - w(\underline{X})\right)^2\right] = \int d\underline{X}\, p_0(\underline{X}) \left(\hat{w}(\underline{X}) - w(\underline{X})\right)^2 \\
&= \int d\underline{X}\, p_0(\underline{X})\, \hat{w}(\underline{X})^2 + \int d\underline{X}\, p_0(\underline{X})\, w(\underline{X})^2 - 2 \int d\underline{X}\, p_0(\underline{X})\, \hat{w}(\underline{X}) \cdot w(\underline{X}) \\
&= \int d\underline{X}\, p_0(\underline{X})\, \hat{w}(\underline{X})^2 - 2 \int d\underline{X}\, p_1(\underline{X})\, \hat{w}(\underline{X}) \qquad (2)
\end{aligned}
$$
Since, in the centre line of (2), the second term does not depend on the estimate $\hat{w}(\underline{X})$ (and, hence, on the parameters $\underline{\alpha}$), it can be ignored and the expression can be written as follows:
$$J_0(\underline{\alpha}) = E_0\left[\hat{w}(\underline{X})^2 - 2\, w(\underline{X})\, \hat{w}(\underline{X})\right] = E_0\left[\hat{w}(\underline{X})^2\right] - 2\, E_1\left[\hat{w}(\underline{X})\right] \qquad (3)$$
This loss function has to be minimized with respect to the constants $\underline{\alpha}$ on data to obtain the best estimate of $w(\underline{X})$.
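For intuition, when $\hat{w}$ is a linear combination of fixed basis functions, the empirical version of (3) is quadratic in $\underline{\alpha}$ and can be minimized in closed form, as in the least-squares importance fitting approach of [4]. The sketch below is a simplified illustration under assumptions not taken from the text (Gaussian kernel basis functions, a small ridge term for numerical stability, toy Gaussian samples):

```python
import numpy as np

def lsif_ratio(x0, x1, centres, sigma=1.0, ridge=1e-3):
    """Direct importance estimation: w_hat(x) = sum_f alpha_f * phi_f(x).

    Minimizes the empirical J0(alpha) = E0[w_hat^2] - 2 E1[w_hat]
    using Gaussian kernels centred at `centres` as basis functions phi_f.
    """
    def phi(x):  # design matrix, shape (n_events, n_basis)
        return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / sigma) ** 2)

    P0, P1 = phi(x0), phi(x1)
    H = P0.T @ P0 / len(x0)       # empirical E0[phi phi^T]
    h = P1.mean(axis=0)           # empirical E1[phi]
    alpha = np.linalg.solve(H + ridge * np.eye(len(centres)), h)
    return lambda x: np.clip(phi(x) @ alpha, 0.0, None)   # enforce w_hat >= 0

# Toy usage: region 0 ~ N(0, 1), region 1 ~ N(0.5, 1)
rng = np.random.default_rng(1)
x0, x1 = rng.normal(0.0, 1.0, 50_000), rng.normal(0.5, 1.0, 50_000)
w_hat = lsif_ratio(x0, x1, centres=np.linspace(-3, 3, 20))
print(w_hat(np.array([0.0, 1.0])))   # close to p1(x)/p0(x) = exp(0.5 x - 0.125)
```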
If $\hat{w}(\underline{X})$ is interpreted as the output of a neural network (NN), and the constants $\underline{\alpha}$ as its weights $\underline{\theta}$, the loss function $J_0$ can effectively be taken as the loss function to be minimized by the NN.
Since $\hat{w}(\underline{X})$ must be positive, a constraint term should be added to the loss function, or $\hat{w}(\underline{X})$ should be expressed through a monotone function and the loss adapted accordingly. For example, the choice of estimating the log-likelihood ratio offers a straightforward solution to the requirement of a non-negative likelihood ratio. With the latter choice, the variable to be estimated becomes $\ln(w(\underline{X}))$ and, to obtain the desired quantity, the inverse transformation has to be applied to the network output, as described in [5].
A more generic way to write (3) as a function of a monotone function $u(\underline{X}, \underline{\theta})$ of $\hat{w}(\underline{X})$ is the following:
$$J_0(\underline{\theta}) = E_0\left[\phi\left(u(\underline{X},\underline{\theta})\right) + w(\underline{X})\cdot \psi\left(u(\underline{X},\underline{\theta})\right)\right] = E_0\left[\phi\left(u(\underline{X},\underline{\theta})\right)\right] + E_1\left[\psi\left(u(\underline{X},\underline{\theta})\right)\right] \qquad (4)$$
Here, $\phi$ and $\psi$ are two real functions of real variables that must satisfy certain mathematical criteria if we want $u(\underline{X},\underline{\theta})$ to minimize the loss function.
It can be shown that there is no unique closed-form choice of $\phi$ and $\psi$. A possible choice for $u = \ln(w)$ is $\phi(u) = e^{u/2}$ and $\psi(u) = 1/e^{u/2}$.
With these substitutions, the loss function to minimize becomes:
$$J_0(\underline{\theta}) = E_0\left[e^{\,u(\underline{X},\underline{\theta})/2}\right] + E_1\left[\frac{1}{e^{\,u(\underline{X},\underline{\theta})/2}}\right] \qquad (5)$$
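In the analyses described in the next section, this loss is minimized by a neural network whose output is interpreted as the estimated log-ratio $u$. The following is a minimal sketch of how Equation (5) could be written as a custom Keras loss; the convention that a label $y=0$ ($y=1$) marks events drawn from region 0 (region 1) is an assumption made here for illustration, not necessarily the exact implementation used in the ATLAS analyses:

```python
import tensorflow as tf

def direct_importance_loss(y_true, y_pred):
    """Batch estimate of Eq. (5): E0[exp(u/2)] + E1[exp(-u/2)].

    y_pred is the network output u(x, theta), the estimated log-likelihood ratio;
    y_true is 0 for events drawn from region 0 and 1 for events drawn from region 1
    (an assumed labelling convention).
    """
    y_true = tf.cast(y_true, y_pred.dtype)
    term0 = (1.0 - y_true) * tf.exp(y_pred / 2.0)    # contributions of region-0 events
    term1 = y_true * tf.exp(-y_pred / 2.0)           # contributions of region-1 events
    n0 = tf.maximum(tf.reduce_sum(1.0 - y_true), 1.0)
    n1 = tf.maximum(tf.reduce_sum(y_true), 1.0)
    return tf.reduce_sum(term0) / n0 + tf.reduce_sum(term1) / n1

# After training, the estimated ratio is recovered as w_hat = exp(u).
```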

3. Particle Physics Analyses Application

This methodology for data-driven likelihood ratio estimation has been adopted by two ATLAS analyses: one is a search for heavy resonances Y decaying into a Standard Model Higgs boson H and a new boson X in the fully hadronic final state [6]; the other is a search for resonant Higgs boson pair production in the $b\bar{b}b\bar{b}$ final state [7]. The purpose in both analyses is the estimation of the background distribution for the final fit discriminant variable, performed in the SR.
Since the hadronic final state is common to both analyses, it is useful to introduce here the concept of jets, which are referred to frequently in the following sections. After high-energy collisions in a particle collider, free quarks or gluons are created. Due to a particular property of QCD (colour confinement), they cannot exist individually; rather, they interact with each other to form composite particles, called hadrons. In this hadronization process, all the particles created leave traces within the detector, resulting in cone-shaped agglomerates of tracks and energy deposits, called hadronic jets. Once data are analyzed offline, these hadronic jets are reconstructed with a dedicated algorithm [8] with different cone opening angles, labeled by the parameter R ($R = \sqrt{(\phi_1-\phi_2)^2 + (\eta_1-\eta_2)^2}$, a measure of the angular distance between two directions in cylindrical coordinates. Here, $\phi$ indicates the angle in the x–y plane of the ATLAS coordinate system, transverse to the beam axis z; $\eta$ is the pseudo-rapidity, defined as $\eta = -\ln\tan(\theta/2)$ and related to the angle $\theta$ between the particle trajectory and the z axis. In the definition of a jet, R refers to the angular opening of the cone enclosing all the particles associated with the jet.). Jets with R = 0.4 are called small-R jets, while those with R = 1.0 are called large-R jets.
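As a small illustration of the definitions above (the function and variable names, and the $\Delta\phi$ wrapping convention, are chosen here for the example rather than taken from the analyses), the pseudo-rapidity and the angular distance R can be computed as:

```python
import numpy as np

def pseudorapidity(theta):
    """eta = -ln tan(theta / 2), with theta the polar angle w.r.t. the beam axis z."""
    return -np.log(np.tan(theta / 2.0))

def delta_r(phi1, eta1, phi2, eta2):
    """Angular distance R = sqrt((delta phi)^2 + (delta eta)^2) in the (phi, eta) plane."""
    dphi = np.mod(phi1 - phi2 + np.pi, 2.0 * np.pi) - np.pi   # wrap delta phi into [-pi, pi)
    return np.sqrt(dphi**2 + (eta1 - eta2) ** 2)
```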
This section is divided into two parts, with the two applications described separately.

3.1. YXH Fully-Hadronic Analysis

The first analysis described is a search for a Beyond Standard Model (BSM) TeV-scale narrow-width boson Y decaying to a Standard Model Higgs boson (H) and a new O(100) GeV-scale boson X [6]. The H is required to decay to $b\bar{b}$, its decay channel with the largest branching ratio; no explicit requirement is placed on the X decay, except that its decay products are jets. The full Large Hadron Collider Run 2 $\sqrt{s} = 13$ TeV $pp$ dataset is used, collected by the ATLAS detector from 2015 to 2018 and corresponding to an integrated luminosity of 139 fb$^{-1}$. A Feynman diagram for this process is shown in Figure 1.
Because the Y mass is of the order of thousands of GeV, the H and X bosons are created with a significant momentum component in the plane transverse to the beam axis, labeled $p_T$, which increases with the mass of the resonance. This collimates each boson's decay products, which are then reconstructed together as a single jet with a large radius, from now on identified by J. The signature of such a signal is therefore a resonant structure in the dijet invariant mass spectrum, where events can be selected based on a final state with two large-radius jets. A supplementary resolved selection is used to reconstruct the X mass with small-R jets (in this paper labeled with j) in the case of insufficient boost from the Y.
The primary background comprises SM multi-jet processes, in which jets are produced through quantum chromodynamics (QCD) [9] interactions, producing a smoothly falling invariant mass distribution on top of which a signal bump can be isolated.
This search is motivated by several key extensions of the Standard Model that predict heavy diboson resonances; one of the simplest is a simplified model based on spin-1 heavy vector triplets (HVT) [10], which reproduces a large class of BSM models.
The final fit is performed on the reconstructed invariant mass of the Y in overlapping windows of the X candidate mass to further enrich the signal-to-background ratio. The results are presented as limits on the cross-section times branching ratio of the generic HVT process.
SRs are constructed by selections based on the different properties of the H and X jets. The ambiguity of determining which of the two J in the event is more likely to be the Higgs boson is resolved using a neural-network-based classifier, which separates bosons decaying to $b\bar{b}$ from top quark and QCD jets [11]. The outputs of the NN are three classification scores corresponding to the likelihood of the jet originating from a Higgs boson ($p_{Higgs}$), a top quark ($p_{Top}$), or a multijet process ($p_{multijet}$), which are subsequently combined into the jet-level discriminant $D_{Xbb}$, as shown in Equation (6).
$$D_{Xbb} = \ln\frac{p_{Higgs}}{0.25\cdot p_{Top} + (1-0.25)\cdot p_{multijet}} \qquad (6)$$
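For concreteness, Equation (6), with the top-quark score fraction fixed at 0.25, corresponds to the following small helper (an illustrative sketch with assumed names, not the ATLAS implementation):

```python
import numpy as np

def d_xbb(p_higgs, p_top, p_multijet, f_top=0.25):
    """Jet-level discriminant of Eq. (6), combining the three classifier scores."""
    return np.log(p_higgs / (f_top * p_top + (1.0 - f_top) * p_multijet))
```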
The jet with the largest value of $D_{Xbb}$ is labeled as the Higgs candidate ($J_H$), and the other J is, by default, the X candidate ($J_X$), thereby determining which jet is subject to further H and X tagging.
A novel anomaly detection signal region is implemented based on a jet-level score for signal-model-independent tagging of the boosted X [12], representing the first application of fully unsupervised machine learning to an ATLAS analysis. The primary SR, defined through a selection on the jet-level anomaly score (AS) of the X candidate, is referred to as the anomaly SR. The remaining two SRs target the benchmark $X \to q\bar{q}$ decay and are thus referred to as two-prong SRs; they differ in the reconstruction of the X as either a single large-R jet (merged SR) or two small-R jets (resolved SR).
For all signal regions, a cut is applied to the Higgs boson candidate $D_{Xbb}$, along with a mass window requirement of 75 GeV < $m_H$ < 145 GeV. Events in which the Higgs boson candidate passes the $D_{Xbb}$ selection and has a mass between 145 and 200 GeV define the high-side band HSB1; HSB0 covers the same mass window, with the cut on $D_{Xbb}$ inverted. Validation is performed in the low-side band LSB, where the reconstructed Higgs boson mass is required to be between 65 and 75 GeV. LSB0 and LSB1 are similarly defined as having a Higgs boson candidate that fails or passes the $D_{Xbb}$ tagging criterion, respectively. CR0 is defined as the set of events in which the $J_H$ is in the SR mass window but fails the $D_{Xbb}$ tagging. A scheme of the selection flow and of the analysis regions used for the background estimation is shown in Figure 2.

Background Estimation

The overwhelming background to the $Y \to XH$ signal comprises high-$p_T$ multi-jet events. Such processes are known to be mismodeled in Monte Carlo simulations, making simulation-based background estimation very challenging. Therefore, this analysis relies on a fully data-driven estimation of the background in the SR. The shape of the expected $m_Y$ distribution in the SR is obtained from data in CR0, and weights are derived that, applied to HSB0, reproduce the shape found in HSB1.
The baseline for this method is the verified assumption that the $D_{Xbb}$ cut efficiency is independent of $m_H$, so that it is possible to define the reweighting function in one mass window and then apply it in another. The reweighting function is defined as the ratio of the multi-dimensional probability distribution functions (pdfs) of the data in HSB1 to the data in HSB0. In this analysis, the statistical procedure of direct importance estimation explained in Section 2 and Section 2.1 is used, where the ratio is estimated directly from data. It is implemented via the training of a DNN, where the loss function in Equation (5) is minimized to produce weights that accurately reproduce the observed ratio in data.
The DNN is built using a fully connected sequential model from Keras with three inner layers, each with 20 neurons and a rectified linear unit (ReLU) activation function. To reduce overfitting during training, 10% of the connections between inner layers are randomly dropped ("dropout"). The last layer has a single output with a linear activation function. The model is trained using the Adam optimizer in Keras with TensorFlow as the backend. Training is performed using a batch size equal to the full dataset size for 1600 epochs, with early stopping if the value of the loss calculated on the validation dataset does not decrease for 100 subsequent epochs.
Events are considered for training if they pass the analysis preselection, satisfy 145 GeV < $m_H$ < 175 GeV and, additionally, have at least two track jets associated with the Higgs boson candidate. They are modeled as an unordered set of variables, namely: the transverse momentum ($p_T$), the pseudorapidity ($\eta$), the azimuthal angle ($\phi$) and the energy (E) of the Higgs boson candidate; the number of tracks associated with the Higgs boson candidate; and the transverse momentum, $\eta$, $\phi$ and mass of the first two track jets associated with the Higgs boson candidate, ordered in $p_T$.
Each variable x is standardized with the transformation $x' = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are its mean and standard deviation, respectively.
The training was performed in the HSB, using data both in the tagged (HSB1) and untagged regions (HSB0) before applying the SRs categorization. This inclusive training enables use of a single weights set for merged, resolved and anomaly SRs. The dataset was divided into training and test sets using 70% and 30% of the full training dataset, respectively. From the training set, 20% was used for validation, to validate the model and to monitor overfitting during the training phase.
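A minimal Keras sketch of the configuration described above is shown below. The layer sizes, dropout rate, optimizer and training settings follow the text; `direct_importance_loss` is the loss sketched after Equation (5), while `x_train`, `y_train`, `x_val` and `y_val` are placeholders for the standardized inputs and region labels (assumed names, not from the original analysis code):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_reweighting_dnn(n_inputs):
    """Fully connected model: three hidden layers of 20 ReLU neurons with 10% dropout,
    and a single linear output interpreted as the estimated log-likelihood ratio u."""
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        layers.Dense(20, activation="relu"), layers.Dropout(0.10),
        layers.Dense(20, activation="relu"), layers.Dropout(0.10),
        layers.Dense(20, activation="relu"), layers.Dropout(0.10),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(), loss=direct_importance_loss)
    return model

# Training settings as described in the text: full-dataset batches, 1600 epochs,
# early stopping if the validation loss does not decrease for 100 consecutive epochs.
model = build_reweighting_dnn(n_inputs=x_train.shape[1])
model.fit(
    x_train, y_train,
    batch_size=len(x_train),
    epochs=1600,
    validation_data=(x_val, y_val),
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=100)],
)
event_weights = np.exp(model.predict(x_train)).ravel()   # w_hat = exp(u)
```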
The DNN outputs event-level weights, assumed to be approximately independent of $m_H$, that can be applied to an untagged region to produce the $m_Y$ shape in the corresponding $Xbb$-tagged region. These weights are validated using data from the LSB.
Figure 3 shows the impact of the reweighting on the distributions of several key analysis variables, using the two-prong merged LSB as an example region.
Three curves are shown in total, comparing the LSB0 data, before and after the DNN reweighting is applied, to the target data distribution in LSB1. These variables are chosen to focus on the kinematic variables over which the background estimation is extrapolated to generate the SR prediction. Clear shape differences appear after the application of the weights, and good agreement of the reweighted shapes with the true tagged data is observed in all distributions, suggesting a robust background model. As the training is performed inclusively of the X-tagging, the same conclusion holds for the anomaly and two-prong resolved LSB regions.
This approach provides the shape of the background distribution as a normalized template; the background normalization factor in the SR is obtained by the fit procedure.
Several sources of uncertainty are related to the method, i.e., effects that are not included in the statistical model but can affect the result of the measurement. These are called systematic uncertainties; their impact is taken into account by quantifying the corresponding variation of the background shape. Three different kinds of such uncertainties were considered, as explained below.
The first is the potential variation of the obtained weights due to differences in phase space between the HSB (where the network is trained) and the Higgs mass window. The related uncertainty is calculated by obtaining an additional background model, training the DNN in an alternative region of 165 GeV < $m_H$ < 200 GeV. This region has approximately the same statistics and tagging efficiency as the nominal training region, helping to isolate the effect of the particular training region on the obtained output weights. Up and down variations are defined by symmetrizing the shape difference in $m_Y$ between the two models, creating an effect of O(1–10)% across the distribution.
Another DNN variation is built to account for the finite statistics of the training sample and the random initialization of the weights. It is estimated with a bootstrap procedure [13], in which a set of 100 bootstrap networks is trained, each time varying the training dataset by resampling it with replacement. The rigorous way to evaluate this systematic uncertainty would be to use the event-level covariance matrix between all bootstrap weights but, since this is computationally prohibitive, the interquartile range (IQR) of each event's weight distribution is taken as a good approximation of the uncertainty, along with the IQR of the normalization factor for each bootstrap training. Two additional templates are then defined with the median weight for each event plus or minus half of the IQR, defining the upper and lower symmetric error bands. This corresponds to an O(1)% effect across $m_Y$.
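The bootstrap procedure described above can be sketched as follows (a simplified illustration: `build_reweighting_dnn` and the training arrays are the placeholders from the previous sketch, and validation-based early stopping is omitted for brevity):

```python
import numpy as np

def bootstrap_weight_bands(x_train, y_train, x_eval, n_boot=100, seed=0):
    """Train an ensemble on datasets resampled with replacement and summarize the
    spread of the per-event weights with their median and interquartile range (IQR)."""
    rng = np.random.default_rng(seed)
    all_weights = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x_train), size=len(x_train))   # resample with replacement
        model = build_reweighting_dnn(n_inputs=x_train.shape[1])
        model.fit(x_train[idx], y_train[idx], batch_size=len(idx), epochs=1600, verbose=0)
        all_weights.append(np.exp(model.predict(x_eval, verbose=0)).ravel())
    all_weights = np.asarray(all_weights)                        # shape (n_boot, n_events)

    median = np.median(all_weights, axis=0)
    q75, q25 = np.percentile(all_weights, [75, 25], axis=0)
    iqr = q75 - q25
    # Up/down templates: median weight per event plus or minus half of the IQR
    return median, median + 0.5 * iqr, median - 0.5 * iqr
```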
Lastly, a non-closure uncertainty is included to cover modeling discrepancies that may arise from extrapolating the weights derived from the NN training in the HSB to the LSB, and subsequently to the SR. It is defined by the symmetrized shape difference between the data and the predicted background in the LSB. The non-closure is negligible at low $m_Y$ and rises to O(10)% in the $m_Y$ tails.
The results of background-only fits of the $m_Y$ distribution across all $m_X$ categories in the anomaly SR show good compatibility of the data with the expected background, after incorporating all statistical and systematic uncertainties. The largest deviation is in the $m_X$ window [75.5, 95.5] GeV, corresponding to a global significance of 1.47 $\sigma$. The results for the two-prong SRs show no significant deviations of the data with respect to the predicted background beyond the expected statistical fluctuations.

3.2. Resonant HH in the $b\bar{b}b\bar{b}$ Final State

The second ATLAS analysis described in this paper, which adopts the same background estimation technique, is a search for resonant Higgs boson pair production in the $b\bar{b}b\bar{b}$ final state [7] (in particle physics, the term resonant refers to processes with an invariant mass distribution peaked around the mass of a particle that has decayed into the observed final state). LHC $pp$-collision data collected by ATLAS in 2016–2018 are used, corresponding to 126 fb$^{-1}$.
To perform the search, events are selected according to the expected properties of the signal, and signal and control regions are defined. The hadronic final state considered is explored by selecting only events with at least four small-R jets. Kinematic cuts are then applied to obtain SRs and CRs for the final fit of the 4b invariant mass distribution, summarized as follows. Events are first divided into two categories: 2b, where exactly two jets are b-tagged (a b-tagged jet is a jet identified as produced by the hadronization of a b quark), and 4b, where at least four jets are b-tagged. Exactly four jets are selected to construct the two H candidates. For 4b events, the four b-tagged jets with the highest $p_T$ are selected; for 2b events, the two b-tagged jets and the two untagged jets with the highest transverse momentum are selected. The 2b events are needed to construct the background model for the 4b category. This selection of untagged jets can introduce a kinematic bias with respect to the 4b category; however, this is exactly what is accounted for by the reweighting function. After the four jets are chosen, they are paired to form the two Higgs candidates, $H_1$ and $H_2$, and the pairing is chosen by a boosted decision tree (BDT). The pairs are then ordered by their transverse momentum. The multi-jet background is reduced by requiring a certain angular separation between the two H candidates; processes coming from top-quark decays are suppressed by a dedicated veto. Finally, events are sorted into three kinematic regions based on the invariant masses of the H candidates: a signal region (SR), a validation region (VR) and a control region (CR).

Background Estimation

After the selections described above, the background is mainly composed of pure QCD multi-jet processes; therefore, the data-driven technique discussed above fits the background estimation problem well. Since the signal contamination in the 2b region is found to be negligible compared to the background uncertainties, these data are used to predict the background shape in the 4b SR. In the previously discussed analysis, the HSB of the Higgs mass window is used for the training of the neural network; in this case, the CR plays the same role. Here, a reweighting function mapping the 2b kinematic region onto the 4b region is estimated from data. This function is then applied to the corresponding 2b SR in order to obtain a background model in the 4b SR.
The neural network used to minimize the loss in Equation (5) is composed of three densely connected hidden layers, each with 50 nodes and a ReLU activation function, and an output layer with a single neuron and a linear activation function. The variables used as input to the NN for the training performed in the CR are chosen to be sensitive to the differences between the two kinematic regions. They include the following: $\log p_T$ of both the $p_T$-subleading jet and the fourth-highest-$p_T$ jet; $\log \Delta R$ between the first and the third $p_T$-ordered jets; the average $|\eta|$ of the four jets; $\log p_T$ of the di-Higgs four-momentum; $\Delta R$ between the Higgs candidates; $\Delta\phi$ between the two jets forming each Higgs candidate; a variable constructed from the differences between the reconstructed and the nominal values of the Higgs mass; and the number of jets in the event.
The effect of applying this reweighting to the CR, where it is derived, is shown in Figure 4. The output of this procedure is an estimate of the corrected $m_{HH}$ distribution in the 4b SR, which is then used as input to the statistical procedure.
The number of events in the 4b region is calculated similarly to the YXH analysis. In the VR, a good compatibility between target and reweighted data is found and residual differences are used for the systematic uncertainty estimation.
In the background estimation procedure of this analysis, two sources of systematic uncertainties have been considered:
  • An uncertainty originating from the limited sample size in CR, which affects the training of the network;
  • An uncertainty which takes into account the differences between the SR and CR kinematics, which can have an impact on the background estimation, as described in Section 3.1.
The training of the NN is subject to fluctuations due to the initial conditions and the limited size of the training sample. For this reason, the bootstrap resampling technique is used [13], resulting in an ensemble of background estimates. In this ensemble, distributions are obtained by varying each event's weight by its event-level IQR and then scaling to the same normalization as the nominal distribution; finally, they are multiplied by the ratio of the upper IQR value of the normalization factors to its nominal value. This new set of distributions creates an envelope centered around the nominal distribution, from which an estimate of the uncertainty in each $m_{HH}$ bin can be evaluated.
No significant excess has been observed in the data, which shows good agreement with the background prediction.

4. Conclusions

The computation of the likelihood ratio plays a fundamental role in particle physics analyses. This quantity can be calculated with a data-driven approach involving machine learning, by writing a custom loss function derived from a least-squares formulation of the problem. The method itself is not limited to these kinds of problems: it can be used to estimate likelihood ratios from data without requiring knowledge of the individual probability distribution functions, which is a non-trivial problem to solve, and it can be applied to hypothesis-testing problems in data analysis and to other applications in various fields, such as covariate shift adaptation [14] and outlier detection [15]. The method was applied to two ATLAS analyses for the background estimation problem, with fully satisfactory results, confirming the power of the approach.

Author Contributions

The authors contributed equally to the conceptualization, writing, review and editing of this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. LHC Machine. JINST 2008, 3, S08001.
  2. Aad, G.; Abajyan, T.; Abbott, B.; Abdallah, J.; Khalek, S.A.; Abdelalim, A.A.; Aben, R.; Abi, B.; Abolins, M.; AbouZeid, O.; et al. Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Phys. Lett. B 2012, 716, 1–29.
  3. Aad, G.; Abat, E.; Abdallah, J.; Abdelalim, A.; Abdesselam, A.; Abi, B.; Abolins, M.; Abramowicz, H.; Acerbi, E.; Acharya, B.; et al. The ATLAS experiment at the CERN Large Hadron Collider. J. Instrum. 2008, 3, S08003.
  4. Kanamori, T.; Hido, S.; Sugiyama, M. A least-squares approach to direct importance estimation. J. Mach. Learn. Res. 2009, 10, 1391–1445.
  5. Moustakides, G.V.; Basioti, K. Training neural networks for likelihood/density ratio estimation. arXiv 2019, arXiv:1911.00405.
  6. Anomaly Detection Search for New Resonances Decaying into a Higgs Boson and a Generic New Particle X in Hadronic Final States Using $\sqrt{s}$ = 13 TeV pp Collisions with the ATLAS Detector; Technical Report; CERN: Geneva, Switzerland, 2022. Available online: https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATLAS-CONF-2022-045 (accessed on 6 July 2022).
  7. Aad, G.; Abbott, B.; Abbott, D.; Abud, A.A.; Abeling, K.; Abhayasinghe, D.; Abidi, S.; Abramowicz, H.; Abreu, H.; Abulaiti, Y.; et al. Search for resonant pair production of Higgs bosons in the $b\bar{b}b\bar{b}$ final state using pp collisions at $\sqrt{s}$ = 13 TeV with the ATLAS detector. Phys. Rev. D 2022, 105, 092002.
  8. Cacciari, M.; Salam, G.P.; Soyez, G. The anti-$k_t$ jet clustering algorithm. J. High Energy Phys. 2008, 2008, 063.
  9. Marciano, W.; Pagels, H. Quantum chromodynamics. Phys. Rep. 1978, 36, 137–276.
  10. Pappadopulo, D.; Thamm, A.; Torre, R.; Wulzer, A. Heavy vector triplets: Bridging theory and data. J. High Energy Phys. 2014, 2014, 1–50.
  11. The ATLAS Collaboration. Identification of Boosted Higgs Bosons Decaying into bb with Neural Networks and Variable Radius Subjets in ATLAS. arXiv 2020, arXiv:1906.11005.
  12. Kahn, A.; Gonski, J.; Ochoa, I.; Williams, D.; Brooijmans, G. Anomalous jet identification via sequence modeling. J. Instrum. 2021, 16, P08012.
  13. Michelucci, U.; Venturini, F. Estimating neural network's performance with bootstrap: A tutorial. Mach. Learn. Knowl. Extr. 2021, 3, 357–373.
  14. Sugiyama, M.; Suzuki, T.; Nakajima, S.; Kashima, H.; von Bünau, P.; Kawanabe, M. Direct importance estimation for covariate shift adaptation. Ann. Inst. Stat. Math. 2008, 60, 699–746.
  15. Ben-Gal, I. Outlier detection. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2005; pp. 131–146.
Figure 1. Feynman diagram of the target signal process, where the Y is produced in the initial $pp$ collision and decays to a fully hadronic final state via an SM Higgs boson ($H \to b\bar{b}$) and a new particle X.
Figure 2. Selection flow and analysis regions of the $Y \to XH$ search. The CRs (CR0, HSB0, HSB1), VRs (LSB0, LSB1), and SR are defined in the same way for all three signal region selections, using the H candidate mass and $D_{Xbb}$ score [6].
Figure 3. Distributions of the H candidate $p_T$ (a) and $\eta$ (b), along with the mass of $J_X$ (c) and $m_Y$ (d), in the merged LSB validation region, overlaying the data from LSB1 with the data in LSB0 shown before (orange) and after (red) reweighting. The ratio of the LSB1 data to both the LSB0 data (orange) and the reweighted LSB0 data (red) is shown in the lower panel. Error bars indicate statistical uncertainties [6].
Figure 4. A comparison between the $m_{HH}$ distributions in the 2b (filled histogram) and in the 4b control regions (dots), where the 2b data are shown before (a) and after (b) the reweighting. The distributions are all normalized to the 4b event yields to compare only the shapes [7].
