Article

Mean Shift Cluster Recognition Method Implementation in the Nested Sampling Algorithm

by Martino Trassinelli * and Pierre Ciccodicola
Institut des NanoSciences de Paris, CNRS, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France
* Author to whom correspondence should be addressed.
Entropy 2020, 22(2), 185; https://doi.org/10.3390/e22020185
Submission received: 16 December 2019 / Revised: 24 January 2020 / Accepted: 3 February 2020 / Published: 6 February 2020

Abstract: Nested sampling is an efficient algorithm for the calculation of the Bayesian evidence and posterior parameter probability distributions. It is based on a step-by-step exploration of the parameter space by Monte Carlo sampling with a series of parameter-value sets, called live points, that evolve towards the region of interest, i.e., where the likelihood function is maximal. In the presence of several local likelihood maxima, the algorithm converges with difficulty, and unexplored regions of the parameter volume can introduce systematic errors. To avoid this, different methods have been proposed in the literature for an efficient search of new live points, even in the presence of local maxima. Here we present a new solution based on the mean shift cluster recognition method implemented in a random walk search algorithm. The cluster recognition is integrated into the Bayesian analysis program NestedFit and tested with the analysis of some difficult cases. Compared to the analysis results without cluster recognition, the computation time is considerably reduced. At the same time, the entire parameter space is efficiently explored, which translates into a smaller uncertainty of the extracted value of the Bayesian evidence.

1. Introduction

At present, Bayesian methods are routinely used in many fields: astrophysics and cosmology [1,2,3,4,5,6,7,8], particle physics [9], plasma physics [10,11], machine learning [12] and many others [13,14]. In the past few years, they have also been applied to nuclear [15,16] and atomic physics [17,18,19,20,21]. On one hand, one reason for this success is the possibility of assigning a probability value to models (hypotheses) from the analysis of the same set of data within a well-defined framework. By contrast, classical statistical tests and criteria (e.g., chi-square and likelihood ratio, the Akaike information criterion [22], etc.) are completely powerless when no clear preference emerges. On the other hand, the implementation of Bayesian methods has only recently become widely possible thanks to the relatively cheap cost of computation power. A large computing capability is in fact required for the fine exploration of the probability distribution of the model parameters. Unlike standard methods, which mostly reduce to minimization/maximization problems (of the likelihood function or chi-square), Bayesian approaches have to deal with non-trivial integrations in a multi-dimensional space. One of the key points of Bayesian model selection is in fact the calculation of the Bayesian evidence, also called marginal likelihood, defined by
E(M) ≡ P(Data | M, I) = ∫ P(Data | a, M, I) P(a | M, I) d^J a = ∫ L_M(a) P(a | M, I) d^J a.    (1)
It is the integral of the likelihood function L_M(a) = P(Data | a, M, I) over the J-dimensional parameter space (with J the number of parameters), weighted by the prior probability P(a | M, I) of the parameters a of a given model M, where I represents the available background information. From the evidence, the probability of the model P(M | Data, I) is simply evaluated by the formula
P(M | Data, I) ∝ E(M) P(M | I),    (2)
where P(M | I) is the prior probability of the model itself. The challenging part resides in the multi-dimensional integration of Equation (1). For this purpose, different approaches have been developed in the past, some of them based on Markov chain Monte Carlo (MCMC) techniques (see e.g., [14,23]) for the integration of L_M(a) P(a | M, I). As an alternative, the nested sampling method was proposed by Skilling in 2004 [24,25,26]. With this method, the multi-dimensional integral in Equation (1) is reduced to a one-dimensional integral and then calculated. Because of its high efficiency and relatively moderate computing power requirement compared to other approaches, the nested sampling method is currently implemented in several data analysis codes such as MultiNest [3,27], Diamonds [28], PolyChord [29], UltraNest, DNest4 [30] and Dynesty [31] for the computation of the Bayesian evidence and posterior probability distributions. Because of its efficient sampling, nested sampling is also routinely used to study thermodynamic partition functions [32,33,34,35] and to explore potential energy landscapes of atomistic systems [36,37,38].
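As a minimal numerical illustration of Equation (2) (not part of NestedFit; the model values below are invented for the example), posterior model probabilities follow from normalizing the evidence-weighted priors over the candidate models, conveniently done with log-evidences for numerical stability:

```python
import numpy as np

def model_probabilities(log_evidences, log_priors=None):
    """Posterior model probabilities from Eq. (2), normalized over all models."""
    log_e = np.asarray(log_evidences, dtype=float)
    if log_priors is None:              # assume equal prior probabilities P(M|I)
        log_priors = np.zeros_like(log_e)
    log_post = log_e + log_priors
    log_post -= log_post.max()          # log-sum-exp trick against underflow
    p = np.exp(log_post)
    return p / p.sum()

# Two hypothetical models whose ln E values differ by 1.8:
print(model_probabilities([-323.2, -325.0]))   # -> approximately [0.86, 0.14]
```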
When several maxima of the likelihood function are present, the nested sampling algorithm can, however, encounter problems converging correctly. The parameter space exploration can become inefficient or exclude entire regions, which introduces systematic errors in the estimation of the evidence. To avoid this problem, several solutions have been proposed in the literature. Here we present an original approach based on cluster recognition with the mean shift method, a classic clustering algorithm widely used and included in the major machine learning libraries. This method is implemented in the program NestedFit, a code developed by one of the authors and described in detail in [39,40].
An introduction to nested sampling and the NestedFit code is presented in Section 2. The description of the mean shift algorithm, its implementation in NestedFit and the results of some tests are presented in Section 3. The article ends with concluding remarks (Section 4).

2. Nested Sampling and NestedFit

2.1. The Nested Sampling Algorithm

Nested sampling is based on the reduction of the multi-dimensional integral in Equation (1) for the evidence computation into a one-dimensional integral
E(M) = ∫₀¹ L(X) dX.    (3)
X represents the normalized volume, weighted by the prior probability P(a | I), of the portion of the J-dimensional parameter space where L(a) is higher than a given threshold ℒ:
X(ℒ) = ∫_{L(a) > ℒ} P(a | I) d^J a.    (4)
Equation (3) is numerically calculated using the rectangle integration method, subdividing the [0, 1] interval into M + 1 segments with an ensemble {X_m} of M ordered points 0 < X_M < … < X_2 < X_1 < X_0 = 1:
E(M) ≈ Σ_m L_m ΔX_m,    (5)
where L_m = L(X_m) is given by the invertible relation in Equation (4), and ΔX_m is given by X_m − X_{m+1} or by the more accurate trapezoid rule ΔX_m = (X_{m−1} − X_{m+1})/2. Each ΔX_m represents a slice of parameter space of nested hypervolumes defined by Equation (4), giving the algorithm its name.
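As a sketch of how Equations (5) and (6) combine in practice (assuming the L_m values are already ordered by step, and using the expected compression X_m ≈ e^(−m/K) rather than actual sampled values):

```python
import numpy as np

def evidence_estimate(likelihoods, K):
    """Trapezoid estimate of Eq. (5) from ordered likelihoods L_1..L_M."""
    L = np.asarray(likelihoods, dtype=float)
    M = len(L)
    m = np.arange(M + 2)             # indices 0..M+1, including X_0 = 1
    X = np.exp(-m / K)               # expected compression X_m, Eq. (6)
    dX = 0.5 * (X[:M] - X[2:M + 2])  # trapezoid rule: (X_{m-1} - X_{m+1}) / 2
    return float(np.sum(L * dX))     # E ~ sum of L_m * dX_m
```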
The evaluation of L_m is obtained by a recursive step-by-step exploration of the likelihood function by Monte Carlo sampling. A collection of K parameter values {a_k}, called live points, corresponds to K random points {ξ_{1,k}} in the [0, 1] interval. When the live point ã_1 = a_{1,k̂} corresponding to the highest value ξ_{1,k̂} = max_k {ξ_{1,k}} (with, from Equation (4), L_1 = min_k {L(a_{1,k})} = L(ã_1)) is discarded, the mean value of the interval occupied by the remaining ξ points shrinks to
X_m = max_{k ≠ k̂} {ξ_{m,k}} ≈ (K/(K + 1))^m ≈ e^(−m/K),    (6)
with, at this first step, m = 1.
If a new live point a_new is found satisfying the condition L(a_new) > L_{m=1}, a new set of points ξ_{m=2,k} is constructed and the next iteration of the procedure starts. At each step, the discarded value ã_m = a_{m,k̂} is stored together with its corresponding likelihood value L_m = L(ã_m). The X_m are estimated by their average expectation value from Equation (6). Step by step, the nested volumes built with the condition L(a) > L_m converge around the parameter space regions corresponding to high values of the likelihood function. When the algorithm converges, the evidence is evaluated from the different values L_m, ΔX_m using Equation (5). From the set of collected values of the discarded live points ã_m and the associated weights w_m = L_m ΔX_m, the posterior probability P(a | Data, M, I) can be determined. More details on the nested sampling algorithm and its implementation can be found in Refs. [3,24,25,26,27,41,42].
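The whole procedure of this section can be condensed into the following schematic Python loop. It is a toy version only: new live points are drawn here by naive rejection from the prior until L > L_m, which is exactly the step that the search algorithms discussed in the rest of this article replace with something efficient.

```python
import numpy as np

rng = np.random.default_rng(0)

def nested_sampling(loglike, prior_sample, K=500, steps=5000):
    """Toy nested sampling: returns ln E and the weighted discarded points."""
    live = np.array([prior_sample(rng) for _ in range(K)])
    logL = np.array([loglike(a) for a in live])
    samples, logwts = [], []
    logX_prev = 0.0                               # ln X_0 = ln 1
    for m in range(1, steps + 1):
        worst = np.argmin(logL)                   # lowest-likelihood live point
        logX = -m / K                             # Eq. (6): X_m ~ e^{-m/K}
        logwts.append(logL[worst] + np.log(np.exp(logX_prev) - np.exp(logX)))
        samples.append(live[worst].copy())        # store the discarded point
        logX_prev = logX
        while True:                               # naive constrained replacement
            a_new = prior_sample(rng)
            if loglike(a_new) > logL[worst]:
                break
        live[worst], logL[worst] = a_new, loglike(a_new)
    logZ = np.logaddexp.reduce(logwts)            # ln E ~ ln sum_m L_m dX_m
    return logZ, np.array(samples), np.array(logwts)
```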

2.2. Bottleneck of Nested Sampling and Proposed Solutions

The difficulty of this elegant method lies in efficiently finding a new live point at each step within the hypervolume contour defined by L(a) > L_m. Codes that use the nested sampling method generally encounter difficulties in finding new live points a_new when several maxima of the likelihood function are present. In this case, the exploration of the parameter space generally becomes inefficient, or considers only one local maximum, thereby introducing systematic errors in the estimation of the evidence. To avoid these problems, different strategies have been proposed in the literature. They can be divided into two categories: those with a cluster recognition algorithm, and those without cluster recognition but with other improvements of the search algorithm for new live points.
A first attempt to improve the search of new live points for multimodal problems via MCMC was proposed by Veitch and collaborators in 2010 [42]. Here, 10% of the steps of the random walk are determined by a combination of three past points, and not only by the previous point of the Markov chain. In this way, a more efficient sampling is obtained without the need for cluster recognition.
Another improved random walk method for the nested sampling algorithm is diffusive nested sampling, developed by Brewer et al. in 2011 [43] and implemented in the DNest4 program [30]. Here, the passage between maxima is facilitated by blurring the condition L(a_i) > L_m for the parameter values explored by the MCMC, allowing the walker to pass momentarily through regions with lower values of the likelihood function.
As an alternative to random walks, single- or multi-particle trajectories have been used to improve the search of new points in complex landscapes of the function to be maximized or minimized. This is the principle of Galilean and Hamiltonian Monte Carlo exploration [34,44]. In the first case, linear trajectories with reflections from hard boundaries, given by the minimal likelihood threshold value, are considered. In the second case, more complex trajectories are computed from the motion determined by a Hamiltonian function, assimilated here to the likelihood function, as in molecular dynamics.
In the presence of several maxima, these methods significantly improve the search of new points, but they do not allow passage from one maximal region to another, which limits their efficiency. A completely different approach was proposed by Martiniani and collaborators in 2014 [45]. To take into account the presence of several maxima without resorting to cluster recognition, they use global optimization techniques to exploit the knowledge of identified local maxima and their statistical weights, and then perform parallel nested sampling in each significant region.
A first solution using a cluster recognition algorithm was implemented in the MultiNest code as early as 2008 [3,27]. Here, new live points are randomly selected within an ellipsoid defined by the covariance matrix of the present live points. Cluster analysis is used to partition the parameter space into a series of ellipsoids. This is obtained by implementing the k-means clustering algorithm, which is triggered when the estimated volume occupied by the live points is much smaller than the ellipsoid volume estimated from their covariance matrix. A partition into two clusters is initially performed (k = 2) and recursively repeated (always with k = 2) to obtain an efficient partition of the space with many ellipsoids.
In the more recent PolyChord program [29], where the search of new live points is based on slice sampling (an MCMC that uses the live point covariance matrix to provide a probability distribution for the choice of the random walk direction), cluster recognition is obtained by the k-nearest neighbors algorithm. Once the different clusters are identified, a parallel exploration and analysis via slice sampling MCMC is performed independently for each of them.
In the recent and very complete nested sampling code Dynesty [31], different sampling methods are proposed: from random uniform selections in ellipsoids, as in MultiNest, to a series of MCMC methods (random walks, slice sampling, …). Difficult cases with several likelihood maxima are treated by decomposing the parameter space into several ellipsoids via a cluster analysis (using the k-means algorithm, as in MultiNest), or into spheres or cubes (with the same radius/side, one per live point, implementing the RadFriends algorithm [46]) with no need of any cluster recognition technique.
In the following sections, we present a new alternative method based on an MCMC in which the mean shift algorithm is used for the identification of clusters. It is implemented in the existing nested sampling code NestedFit, which is briefly introduced in the next section.

2.3. The NestedFit Program

NestedFit is a general-purpose code for the evaluation of the Bayesian evidence and parameter probability distributions based on the nested sampling algorithm. It is written in Fortran90 with some subroutines in Fortran77, and parallelized via Open MPI. It is mainly developed and used for analyses in the fields of atomic, nuclear and solid-state physics [16,39,40,47,48,49,50]. It is accompanied by a Python function library for the visualization of the results and for the automation of series of analyses. In this publication we present version 3.2, whose substantial upgrade with respect to older versions (see Ref. [39] for v. 0.7 and Ref. [40] for v. 2.2) is the cluster analysis of the live points. In addition, this new version also implements some improvements in the search of live points. The source code is freely available in the repository https://github.com/martinit18/nested_fit.
The code requires two main input files: the main input file (nf_input.dat), where the analysis parameters are selected, and the data file, in the format (channel, counts) or (channel, y value, y uncertainty). Depending on the data format, a Poisson or Gaussian likelihood function is used. The function name in the input file indicates the model to be used for the calculation of the likelihood function. Several functions are already defined in the function library for different models of spectral lines. Additional functions can easily be defined by the user in a dedicated routine. Non-analytical or simulated profile models can be considered as well. In this case, one or more additional files have to be provided by the user for interpolation by B-splines using FITPACK routines [51].
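For illustration, the two likelihood choices correspond to the standard Poisson and Gaussian log-likelihoods sketched below (a generic rendering of the textbook definitions, not NestedFit's internal code):

```python
import numpy as np
from scipy.special import gammaln

def loglike_poisson(counts, model):
    """ln L for (channel, counts) data with a Poisson model expectation."""
    model = np.clip(model, 1e-300, None)          # guard against log(0)
    return np.sum(counts * np.log(model) - model - gammaln(counts + 1.0))

def loglike_gauss(y, sigma, model):
    """ln L for (channel, y, y uncertainty) data, i.e., -chi^2/2 plus norm."""
    return np.sum(-0.5 * ((y - model) / sigma) ** 2
                  - np.log(sigma * np.sqrt(2.0 * np.pi)))
```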
Several data sets can be analyzed at the same time. This is particularly important for the correct simultaneous study of physically correlated spectra, e.g., background and signal-plus-background spectra. It is implemented via a global user-defined function composed of different models, one to describe each spectrum, with common parameters shared between the models.
The main analysis results are summarized in one output file (nf_output_res.dat). Here, the details of the computation (number of trials, number of iterations) can be found, as well as the final evidence value and its uncertainty E ± δE, the parameter values corresponding to the maximum of the likelihood function, and also the mean, the median, the standard deviation and the one-, two- and three-sigma confidence intervals (68%, 95% and 99%) of the posterior probability distribution of each parameter. δE, or more precisely δ(ln E), is evaluated by running the nested sampling several times with different sets of starting live points; it is obtained as the standard deviation of the different values of ln E, the natural estimator for studying the uncertainty of E [52,53]. The information gain H and the Bayesian complexity are also provided in the output. Data for plots and for further analyses are provided in separate files. The step-by-step information of the nested sampling exploration can be found in the largest output file, which contains the live points used during the parameter space exploration ã_m, their associated likelihood values L_m and weights w_m = L_m ΔX_m. From this file, the different parameter probability distributions and joint probabilities can be built by marginalizing over the unretained parameters.
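Since the step-by-step output stores each discarded point ã_m with its weight w_m = L_m ΔX_m, posterior summaries for any parameter reduce to weighted statistics; a minimal sketch of such a post-processing step for one parameter column:

```python
import numpy as np

def posterior_summary(samples, weights):
    """Weighted mean, std and 68% interval for one parameter column."""
    w = weights / weights.sum()
    mean = np.sum(w * samples)
    std = np.sqrt(np.sum(w * (samples - mean) ** 2))
    order = np.argsort(samples)                   # weighted quantiles
    cdf = np.cumsum(w[order])
    lo, hi = np.interp([0.16, 0.84], cdf, samples[order])
    return mean, std, (lo, hi)
```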
Details of the NestedFit search algorithm are presented in the next section. Additional information can be found in Refs. [39,40].

2.4. NestedFit Search Algorithm

The search of new live points in NestedFit is based on a random walk called the lawn mower robot [39,40,54], which is represented in Figure 1a. It is composed of a sequence of N steps (or jumps, with N selected by the user) starting from a randomly chosen live point. Each step has an amplitude and direction given by the J-dimensional vector f r σ, where each component f r_j σ_j is determined by a factor f selected by the user, a random number r_j, and the standard deviation σ_j of the current live points with respect to the jth parameter. For an efficient covering of the entire parameter space, f and N should be chosen with the criterion
f × N ≈ 1    (7)
to explore regions within a distance of the order of one standard deviation around the starting point. Each new step, which corresponds to a new parameter set a_n, is accepted if L(a_n) > L_m. If L(a_n) < L_m, a new set of r_j is chosen. The total number of tries n_t is recorded. The choice of the values of f and N is critical and can vary from case to case. N has to be large enough to lose the memory of the starting live point position, but increasing it produces a linear increase in computation time. A similar situation arises for f: if it is too small, a strong correlation between live points is artificially created; if it is too large, many failures can occur. From our experience, a reasonable range of values is N = 20–40 and f = 0.1–0.2. In any case, we suggest a visual check of the explored live points to detect possible correlations.
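A schematic rendering of one lawn mower robot walk follows (the actual implementation is in Fortran; the uniform distribution of r_j and the failure cap are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng()

def lawn_mower_step(a_start, live, loglike, logL_min, f=0.2, N=20, max_tries=10000):
    """Random walk of N accepted steps; each jump is f * r_j * sigma_j per axis."""
    sigma = live.std(axis=0)          # std of current live points, per parameter
    a, n_t = a_start.copy(), 0
    for _ in range(N):
        while True:
            n_t += 1
            if n_t > max_tries:       # in NestedFit, failures trigger Fig. 1b,c
                return None, n_t
            r = rng.uniform(-1.0, 1.0, size=a.size)
            a_trial = a + f * r * sigma
            if loglike(a_trial) > logL_min:   # accept only inside L > L_m
                a = a_trial
                break
    return a, n_t
```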
If the number of failures becomes too high (n_t ≫ N), two different strategies are implemented for finding a new live point. In the first one, schematically represented in Figure 1b, a new parameter set is determined by randomly choosing a point between the last failing chain point a_n, with L(a_n) < L_m, and the barycenter of the current live points. Like the lawn mower robot method, this algorithm is due to Simons [54], but it was not implemented in past versions of NestedFit. The second method, represented in Figure 1c, consists of building a new synthetic live point a_new from the jth components of distinct live points: (a_new)_j = (a_{m,k})_j, where k is randomly chosen between 1 and K (the total number of live points) for each j. If L(a_new) > L_m, the new point is accepted; otherwise another random live point is chosen as the start of the random walk.
One of the two strategies is chosen randomly when n_t = N_t (with N_t chosen by the user in the configuration file), and n_t is then reset to zero. As suggested by the schemes in Figure 1, the first strategy favors a re-centering of the live points; by contrast, the second can more easily explore peripheral regions. This second strategy was in fact the only one present in previous versions of NestedFit (where N_t was also a fixed parameter of the code), and it was developed to improve the search algorithm for multimodal cases by facilitating jumps between maximal regions of the likelihood function [39,40].
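The two fallback strategies of Figure 1b,c can be sketched as follows (the uniform draw of the interpolation point in strategy (b) is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng()

def recenter_point(a_failed, live):
    """Strategy (b): random point on the segment between the failed chain
    point and the barycenter of the current live points."""
    barycenter = live.mean(axis=0)
    t = rng.uniform(0.0, 1.0)
    return a_failed + t * (barycenter - a_failed)

def synthetic_point(live):
    """Strategy (c): build a new point taking each j-th component from a
    randomly chosen live point."""
    K, J = live.shape
    picks = rng.integers(0, K, size=J)     # one random live point per component
    return live[picks, np.arange(J)]
```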
If the entire above procedure is subsequently repeated too many times (N_{N_t} times, selected by the user), the cluster analysis described in the following sections is triggered to improve the search of new live points.

3. Mean Shift Clustering Algorithm and Its Implementation

3.1. Preliminary Tests and Considerations on Other Cluster Recognition Algorithms

Before implementing one particular cluster recognition method in NestedFit, different algorithms from classical machine learning libraries (e.g., https://scikit-learn.org) were considered and some of them were tested with simple Python scripts. For this purpose, we used different ensembles of live points issued from NestedFit runs on real data for which convergence problems were encountered. We excluded a priori the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method: it is well adapted for detecting clusters with singular shapes (e.g., an arc of a circle), but this does not necessarily improve the implemented random walk algorithm, which is based on the standard deviation of the recognized cluster. We then tested the Gaussian mixture method with the number of clusters determined by the expectation-maximization algorithm. The results were not convincing and required external criteria for determining the number of clusters. For similar reasons, we excluded the k-means method, which requires a preliminary choice of the number of clusters, and the x-means method, which uses external criteria to determine the best choice of k. We did not consider the recursive use of k-means with k = 2, as in the MultiNest code, in order to keep the cluster recognition implementation simple.
From these preliminary tests and considerations, the mean shift clustering algorithm [55,56] emerged for its simplicity of implementation, its robustness and, most importantly, because it does not require choosing the number of clusters before the analysis.
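Preliminary tests of this kind are reproducible in a few lines with scikit-learn; a sketch, assuming the live points have been exported to a plain-text file (the file name is hypothetical):

```python
import numpy as np
from sklearn.cluster import MeanShift

live = np.loadtxt("live_points.txt")    # hypothetical export, one row per point
live = (live - live.min(axis=0)) / (live.max(axis=0) - live.min(axis=0))

ms = MeanShift(bandwidth=0.2)           # no number of clusters required
labels = ms.fit_predict(live)
print("clusters found:", len(np.unique(labels)))
```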

3.2. The Mean Shift Algorithm for Cluster Recognition

Mean shift is a recursive algorithm based on the iterative calculation of the mean of points within a given region. Considering an ensemble {x_i}, for each point the mean value m_i of the neighboring points NH(x_i) is calculated recursively via a kernel function K(x_i, x_j):
m_{s,i} = [ Σ_{x_{s,j} ∈ NH(x_{s,i})} K(x_{s,i}, x_{s,j}) x_{s,j} ] / [ Σ_{x_{s,j} ∈ NH(x_{s,i})} K(x_{s,i}, x_{s,j}) ],    (8)
with s = 1 and x_{s=1,i} = x_i for the first step. The procedure is then repeated considering, instead of the initial points x_i, the mean values of the previous step, x_{s,i} = m_{s−1,i}, until convergence or until the maximum number of allowed steps is reached. Points belonging to the same cluster are identified by the vicinity of their final m_{s,i} values.
In the present implementation, via a Fortran module in NestedFit, the neighborhood NH is determined by the Euclidean distance d(x_i, x_j) < D, with D selected by the user. Two choices of K are available: a flat kernel K(x_i, x_j) = 1, and a Gaussian kernel K(x_i, x_j) = exp(−d(x_i, x_j)/ℓ), with the bandwidth ℓ selected by the user. Before applying the mean shift algorithm, the live points are normalized to their minima and maxima, (x_k)_j = [(a_{m,k})_j − min_k{(a_{m,k})_j}] / [max_k{(a_{m,k})_j} − min_k{(a_{m,k})_j}], so that the parameters D and ℓ are dimensionless and the points lie in the fixed range [0, 1].
At the end of the analysis, each live point carries an additional flag indicating the cluster it belongs to, which is used in the main NestedFit search algorithm.
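A compact Python rendering of the recursion of Equation (8), with the two kernels and the vicinity-based label assignment described above, might look as follows (the actual NestedFit module is in Fortran; the vicinity threshold D/2 used to group the converged means is an assumption of this sketch):

```python
import numpy as np

def mean_shift_labels(x, D=0.5, ell=0.2, kernel="gauss", max_steps=100, tol=1e-5):
    """Cluster labels for points x (rows already normalized to [0, 1])."""
    m = x.copy()                                       # x_{s=1,i} = x_i
    for _ in range(max_steps):
        dist = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)
        inside = dist < D                              # Euclidean neighborhood NH
        if kernel == "gauss":
            w = np.exp(-dist / ell) * inside           # Gaussian kernel
        else:
            w = inside.astype(float)                   # flat kernel, K = 1
        m_new = (w @ m) / w.sum(axis=1, keepdims=True) # Eq. (8)
        if np.abs(m_new - m).max() < tol:              # convergence reached
            m = m_new
            break
        m = m_new
    labels, centers = np.full(len(x), -1), []          # group by final vicinity
    for i, mi in enumerate(m):
        for c, mc in enumerate(centers):
            if np.linalg.norm(mi - mc) < D / 2:
                labels[i] = c
                break
        else:
            centers.append(mi)
            labels[i] = len(centers) - 1
    return labels
```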

3.3. Mean Shift Implementation in NestedFit

As written above, the cluster analysis is triggered when there are too many tries in the main search algorithm (n_t = N_t × N_{N_t}). Once the cluster analysis has been performed, the algorithm restarts from a random live point but, instead of the standard deviation σ of the whole ensemble of live points, only the standard deviation σ_c of the cluster c to which the point belongs is used for the random walk. Even if the cluster analysis is not perfect (e.g., too many or too few clusters are recognized), the generally smaller values of σ_c compared to σ significantly improve the efficiency of the nested sampling. When the algorithm becomes inefficient (n_t reaches N_t), a new starting live point is chosen. When n_t becomes too high again (n_t = N_t × N_{N_t}), a new cluster analysis is performed, and the calculation continues in this way until the end of the evidence computation. Because of the random selection of the starting live point, small clusters have a small probability of being chosen, and they naturally disappear (or eventually grow) to the advantage (disadvantage) of clusters with higher (lower) likelihood values during the nested sampling.
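Schematically, the interplay between search failures, restarts and re-clustering described above can be summarized as follows (illustrative control flow only, with a stand-in try_walk for the lawn mower robot; none of these identifiers are NestedFit's):

```python
import numpy as np

rng = np.random.default_rng()

def find_new_live_point(live, labels, try_walk, N_t=200, N_Nt=2):
    """Schematic restart logic: a new starting point every N_t failed tries,
    a fresh mean shift run after N_Nt such restarts (i.e., n_t = N_t * N_Nt).
    `try_walk(start, sigma, N_t)` returns a new point or None after N_t failures."""
    while True:
        for _ in range(N_Nt):
            k = rng.integers(len(live))                      # random starting point
            sigma_c = live[labels == labels[k]].std(axis=0)  # per-cluster sigma_c
            point = try_walk(live[k], sigma_c, N_t)
            if point is not None:
                return point, labels
        labels = mean_shift_labels(live)   # too many tries: re-run mean shift
```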
To illustrate the cluster recognition at work in NestedFit, two practical examples are considered. In both cases, a Gaussian kernel was used with a relatively large value of D = 0.5–0.6, to avoid producing too many isolated clusters, and ℓ = 0.1–0.2, which ensures a good convergence of the algorithm. The cluster analysis is triggered after a few failures, N_{N_t} = 2–3, with a relatively low number of maximal tries n_t (N_t = 100–200), so as to change search strategy quite often when the search becomes critical. With these criteria, the cluster analysis is triggered only about 2–10 times during one entire nested sampling computation.
The first example is the analysis of a high-resolution X-ray spectrum corresponding to the helium-like 1s2p ³P₂ → 1s2s ³S₁ intrashell transition of uranium, obtained by Bragg diffraction from a curved crystal [57]. For the analysis of the spectrum, we assume the presence of four Gaussian peaks with the same width and a flat background. The second example is related to the measurement of the single decay of H-like ¹⁴²Pm⁶⁰⁺ ions to the stable bare ¹⁴²Nd nucleus via electron capture. Here, an exponential decay with a sinusoidal modulation is used as the model; the considered parameters are the relative amplitude, pulsation and phase of the modulation (see Ref. [16] for more details). Both data sets, presented in Figure 2, are characterized by low statistics and the presence of many local maxima of the likelihood function, which makes them difficult to analyze. In the first case, the possible permutations of the positions of the different peaks correspond to different maxima of the likelihood (4! = 24 maxima for four peaks). In the second case, the multimodal behavior is caused by the different possible combinations of phase and pulsation values and the corresponding beats.
To observe the evolution of the nested sampling algorithm with and without cluster analysis in the first case, we show in Figure 3 the evolution of one of the model parameters (ã_m)_j, relative to the position of one of the four Gaussian peaks, as a function of the step number m for ten different choices (tries) of starting live points. Different colors correspond to different values of the step weight w_m = L_m ΔX_m. Parameters with higher values of w_m have a higher influence on the final evidence and on the probability distributions P(a | Data, M, I). When the cluster analysis was not implemented (Figure 3, top), each try slowly converged to only one likelihood maximum, corresponding to one of the four possible positions. The convergence to different maxima produced, as a consequence, a spread of the values of the Bayesian evidence E.
In contrast, when the cluster analysis was turned on (Figure 3, bottom), all four possible peak positions were considered at the same time and were equally explored in every try. The convergence improvement was directly observable in the smaller uncertainty of the evidence E: we obtained ln E = −320.52 ± 1.71 with the cluster analysis off and ln E = −323.22 ± 0.17 with it on. These results were obtained with f = 0.1 for the analysis without clusters, f = 0.2 for the analysis with them, and N = 20 and K = 2000 in both cases. The uncertainty of these values, and of all following evaluations, was obtained from the standard deviation of 16 values of ln E obtained by running the analysis with 16 different sets of starting live points. The smaller value of f for the run without clusters was chosen to reduce the computation time, which was still about eight times longer than with the cluster analysis. It is interesting to note that, surprisingly, the two mean values with and without cluster analysis were compatible (note: a difference of 0.9 in ln E corresponds to about two sigmas [58]), without a systematic shift due to the exploration of a smaller parameter space. Only the associated uncertainty changed significantly.
To better visualize the cluster analysis process, a 3D representation of the evolution of three components of ã_m, relative to the positions of three peaks, is presented in Figure 4. Each image is obtained just after a cluster analysis, with different clusters represented by different colors. Notably, the analysis was triggered only a few times (four times for this selected example, with K = 2000, N_t = 200, N_{N_t} = 2 and a Gaussian kernel with D = 0.6 and ℓ = 0.2), showing the efficiency of the cluster recognition in the search of new live points (over about 60,000 steps per run). After the first run of the mean shift analysis, only one large cluster (and a few isolated live points) was identified. In the following cluster analyses, all 24 different maximal likelihood regions were correctly identified.
The correct and simultaneous identification of all maxima translated into a more regular histogram of the probability distributions evaluated from the nested sampling outputs. This is shown in Figure 5, where the 2D histogram of the joint probability of position and amplitude of one of the peaks is presented for the analyses with and without cluster recognition. When the cluster analysis was not implemented, the presence of very localized maxima of the probability distribution reflected the pathological convergence of the nested sampling to only one of the likelihood maxima. On the contrary, a much smoother joint probability distribution was obtained when the cluster analysis was on.
A more quantitative assessment of the cluster analysis was obtained by varying the number of live points K. As can be observed in Figure 6 (left, top), the final evidence did not change significantly with K. By contrast, the evaluated uncertainty (in blue) changed by several orders of magnitude and was systematically larger than its theoretical estimate (in black) δ(ln E) ≈ √(H/K) [25,52], where H is the information gain. When the evaluated evidence uncertainty is plotted on a logarithmic scale (Figure 6, left, bottom), it can be observed that, for high values of K (≳ 500), δ(ln E) was proportional to 1/√K as expected (δ(ln E) ∝ K^c with c = −0.52 ± 0.02), but was systematically higher, by a factor of about 1.6, than the estimated accuracy (not shown in the bottom figure). When K was too low (K < 500 in the present case), even with the cluster analysis, the nested sampling algorithm could not efficiently explore the 24 maxima, producing a systematic increase of δ(ln E).
As expected, the computation time (equivalent for a single CPU) per set of live points grew almost linearly with K: a simple fit gives a power-law dependency K^c with c = 1.13 ± 0.01. A significant deviation was observed for K = 10,000. In this case the cluster analysis, whose number of operations is proportional to K², contributed significantly to the total computation time.
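Exponents like c here and in Figure 6 are typically extracted from a linear fit in log–log space; for instance (with invented placeholder numbers, not the measured data):

```python
import numpy as np

K = np.array([100, 200, 500, 1000, 2000, 5000])
dlogE = np.array([0.9, 0.62, 0.40, 0.29, 0.20, 0.13])   # placeholder values

c, intercept = np.polyfit(np.log(K), np.log(dlogE), 1)  # slope = exponent c
print(f"delta(ln E) ~ K^{c:.2f}")                       # expect c close to -0.5
```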
The cluster analyses of the above results were all obtained with the same set of parameters: a Gaussian kernel with D = 0.6, ℓ = 0.2, N_t = 200 and N_{N_t} = 2. The dependency of the algorithm efficiency on these parameters was also investigated; the corresponding results are summarized in Figure 7, where the final evidence values and the required computation time are presented for different parameter sets. Several cases were considered, with flat and Gaussian kernels, indicated in the labels by ‘f’ and ‘g’, respectively, and different values of D and ℓ, also indicated in the labels (only D for the flat kernel). As can be noticed, for too small values of D and ℓ the final accuracy was poor. This is related to the identification of too many, too small clusters, which induced an inaccurate, but fast, exploration of the parameter space. Conversely, for too large values, one or very few clusters were identified. In these cases, the cluster algorithm was called very often without really improving the situation, significantly increasing the total computation time. The Gaussian kernel proved more robust and flexible than the flat kernel, probably because its two parameters counterbalance each other. The optimal parameter choice depends on the specific problem and on the values of N_t and N_{N_t}. It was generally observed that low values of N_t, which allow the starting live point to change often enough, improve the efficiency of the algorithm. N_{N_t} has to be adapted to trigger the cluster analysis often enough, but not too often.
The analysis of the second case was characterized by a completely different cluster evolution. In Figure 8 we show the relative amplitude, pulsation and phase of the modulation after each cluster analysis. After the first run, several clusters were identified even though no clear structures were visible. In the following analyses, a very complex landscape emerged, with many clusters, some with very narrow ranges in the pulsation ω. Even though the clusters had very different sizes along the different parameters (even after normalization), they were well identified by the mean shift algorithm.
The complex dependency on the modulation pulsation ω is also presented in Figure 9, where its evolution as a function of the nested sampling step is shown for two different choices of starting live points. It can be observed that the rich landscape of the likelihood as a function of ω was well reproduced in each try, demonstrating once again the efficiency of the cluster analysis implementation.
As in the previous example, similar values of the Bayesian evidence were found: ln E = −1921.54 ± 0.12 without cluster analysis and −1922.04 ± 0.21 with it. In contrast to the previous case, the uncertainty for the analysis without cluster analysis was very small. This was mainly caused by the choice of f = 0.014 (with N = 40 and K = 5000), a very small value compared to that of the analysis with cluster analysis (f = 0.1, N = 20, K = 5000). This small value of f in fact also contradicts the recommendation of Equation (7), risking the introduction of systematic errors in the computation. It was, however, required to ensure the convergence of the computation, which was otherwise impossible without cluster analysis. As in the previous example, the computation time without cluster analysis was, in the best case, about eight times longer than with it.
When the number of live points K was varied, keeping the other parameters fixed (Gaussian kernel with D = 0.6, ℓ = 0.2, N_t = 100 and N_{N_t} = 3), a tendency similar to the previous case could be observed in Figure 6 (right). The estimated evidence accuracy was found to be proportional, as expected, to 1/√K (δ(ln E) ∝ K^c with c = −0.48 ± 0.08). Here too, δ(ln E) was higher than the estimated accuracy, by a factor of 4.4–5.5. Because fewer local maxima are present than in the four-Gaussian-peaks problem, the evaluated accuracy followed the 1/√K proportionality down to K = 100. An almost linear dependency of the computation time on K was visible in this case too (CPU time ∝ K^c with c = 1.13 ± 0.01), with a significant deviation at K = 10,000 due to the high cost of the cluster analysis for large K.
These two examples show the general behavior of the cluster algorithm and its dependency on the parameter choice. Each case can, however, be different, and the user should vary the different parameters to reach the required accuracy. A general and simple suggestion is to use a large number of live points to efficiently explore the whole parameter space. This is crucial when multiple local maxima of the likelihood function are present, to avoid missing any of them, and it remains an important requirement even when a cluster analysis is available.

4. Conclusions

We present a new application of cluster recognition to the nested sampling algorithm for the evaluation of the Bayesian evidence and posterior parameter probability distributions. For this purpose, we selected the mean shift method, a robust and simple classical cluster recognition method widely used in the machine learning community. This clustering algorithm proved well adapted to critical data analyses where several likelihood maxima are present. It has been implemented in the program NestedFit and tested with two different benchmark cases, proving its efficiency in exploring the parameter space without excluding any region. As a consequence, the computation time is reduced by a factor of at least eight. At the same time, a smaller value of the estimated evidence uncertainty is obtained. From the study of the dependency on the different algorithm parameters, we conclude that a sufficiently high number of live points should always be used, even when the cluster analysis is available, to efficiently explore all local likelihood maxima. Moreover, for a good efficiency of the mean shift cluster recognition, its characteristic distance parameters (D and ℓ, the maximal neighbor distance and the bandwidth of the Gaussian kernel) should be neither too small nor too large. In the first case a very low accuracy, though a fast computation, is obtained; in the second, the computation time increases too much.
In this article we explored only the implementation of the mean shift algorithm for cluster recognition. In the future, we plan to explore other methods, such as the k-nearest neighbors and x-means methods successfully used in other nested sampling codes, and to compare the performance of NestedFit with these codes on benchmark cases.

Author Contributions

Conceptualization, methodology, investigation, formal analysis, data curation, software, validation, writing, visualization, supervision, M.T.; investigation, methodology, validation with test software, P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

M.T. thanks the Alexander von Humboldt Foundation, which provided the funding to attend the MaxEnt2019 conference. M.T. would like to express once again his deep gratitude to Leopold M. Simons, who introduced the author to Bayesian data analysis and without whom this work could not have been started. We thank D. Schury and E.B. for the careful reading of the manuscript. In addition, we would like to thank Lorenzo Paulatto for his help with the introduction of Git and GitHub for future NestedFit version development and sharing. The development of this program would not have been possible without the close interactions and discussions with many collaborators, whom the authors would like to thank as well: R. Grisenti, A. Lévy, D. Gotta, Y. Litvinov, J. Machado, N. Paul and all members of the Pionic Hydrogen, FOCAL and GSI Oscillation collaborations and the ASUR group at INSP.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lewis, A.; Bridle, S. Cosmological parameters from CMB and other data: A Monte Carlo approach. Phys. Rev. D 2002, 66, 103511.
  2. Trotta, R. Bayes in the sky: Bayesian inference and model selection in cosmology. Contemp. Phys. 2008, 49, 71–104.
  3. Feroz, F.; Hobson, M.P.; Bridges, M. MultiNest: An efficient and robust Bayesian inference tool for cosmology and particle physics. Mon. Not. R. Astron. Soc. 2009, 398, 1601–1614.
  4. Xu, J.; van Dyk, D.A.; Kashyap, V.L.; Siemiginowska, A.; Connors, A.; Drake, J.; Meng, X.L.; Ratzlaff, P.; Yu, Y. A Fully Bayesian Method for Jointly Fitting Instrumental Calibration and X-ray Spectral Models. Astrophys. J. 2014, 794, 97.
  5. Yu, X.; Zanna, G.D.; Stenning, D.C.; Cisewski-Kehe, J.; Kashyap, V.L.; Stein, N.; van Dyk, D.A.; Warren, H.P.; Weber, M.A. Incorporating Uncertainties in Atomic Data into the Analysis of Solar and Stellar Observations: A Case Study in Fe xiii. Astrophys. J. 2018, 866, 146.
  6. Günther, M.N.; Pozuelos, F.J.; Dittmann, J.A.; Dragomir, D.; Kane, S.R.; Daylan, T.; Feinstein, A.D.; Huang, C.X.; Morton, T.D.; Bonfanti, A.; et al. A super-Earth and two sub-Neptunes transiting the nearby and quiet M dwarf TOI-270. Nat. Astron. 2019, 3, 1099–1108.
  7. Abbott, B.; Abbott, R.; Abbott, T.; Acernese, F.; Ackley, K.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R.; Adya, V.; et al. Properties of the Binary Neutron Star Merger GW170817. Phys. Rev. X 2019, 9, 011001.
  8. Abbott, B.; Abbott, R.; Abbott, T.; Abraham, S.; Acernese, F.; Ackley, K.; Adams, C.; Adhikari, R.; Adya, V.; Affeldt, C. GW190425: Observation of a Compact Binary Coalescence with Total Mass ∼3.4 M⊙. arXiv 2020, arXiv:2001.01761.
  9. Particle Data Group. Review of Particle Physics. Phys. Rev. D 2018, 98, 030001.
  10. Langenberg, A.; Svensson, J.; Marchuk, O.; Fuchert, G.; Bozhenkov, S.; Damm, H.; Pasch, E.; Pavone, A.; Thomsen, H.; Pablant, N.A.; et al. Inference of temperature and density profiles via forward modeling of an X-ray imaging crystal spectrometer within the Minerva Bayesian analysis framework. Rev. Sci. Instrum. 2019, 90, 063505.
  11. Milhone, J.; Flanagan, K.; Nornberg, M.D.; Tabbutt, M.; Forest, C.B. A spectrometer for high-precision ion temperature and velocity measurements in low-temperature plasmas. Rev. Sci. Instrum. 2019, 90, 063502.
  12. Barber, D. Bayesian Reasoning and Machine Learning; Cambridge University Press: Cambridge, UK, 2012.
  13. von Toussaint, U. Bayesian inference in physics. Rev. Mod. Phys. 2011, 83, 943–999.
  14. von der Linden, W.; Dose, V.; von Toussaint, U. Bayesian Probability Theory: Applications in the Physical Sciences; Cambridge University Press: Cambridge, UK, 2014.
  15. King, G.; Lovell, A.; Neufcourt, L.; Nunes, F. Direct Comparison between Bayesian and Frequentist Uncertainty Quantification for Nuclear Reactions. Phys. Rev. Lett. 2019, 122, 232502.
  16. Ozturk, F.C.; Akkus, B.; Atanasov, D.; Beyer, H.; Bosch, F.; Boutin, D.; Brandau, C.; Bühler, P.; Cakirli, R.B.; Chen, R.J.; et al. New test of modulated electron capture decay of hydrogen-like 142Pm ions: Precision measurement of purely exponential decay. Phys. Lett. B 2019, 797, 134800.
  17. Stockton, J.K.; Wu, X.; Kasevich, M.A. Bayesian estimation of differential interferometer phase. Phys. Rev. A 2007, 76, 033613.
  18. Calonico, D.; Levi, F.; Lorini, L.; Mana, G. Bayesian inference of a negative quantity from positive measurement results. Metrologia 2009, 46, 267.
  19. Mooser, A.; Kracke, H.; Blaum, K.; Bräuninger, S.A.; Franke, K.; Leiteritz, C.; Quint, W.; Rodegheri, C.C.; Ulmer, S.; Walz, J. Resolution of Single Spin Flips of a Single Proton. Phys. Rev. Lett. 2013, 110, 140405.
  20. Covita, D.S.; Anagnostopoulos, D.F.; Fuhrmann, H.; Gorke, H.; Gotta, D.; Gruber, A.; Hirtl, A.; Ishiwatari, T.; Indelicato, P.; Jensen, T.S.; et al. Line shape analysis of the Kβ transition in muonic hydrogen. Eur. Phys. J. D 2018, 72, 72.
  21. Heim, P.; Rumetshofer, M.; Ranftl, S.; Thaler, B.; Ernst, W.E.; Koch, M.; von der Linden, W. Bayesian Analysis of Femtosecond Pump-Probe Photoelectron-Photoion Coincidence Spectra with Fluctuating Laser Intensities. Entropy 2019, 21, 93.
  22. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  23. Lawrence, A. Probability in Physics; Springer: Cham, Switzerland, 2019.
  24. Skilling, J. Nested Sampling. AIP Conf. Proc. 2004, 735, 395–405.
  25. Skilling, J. Nested sampling for general Bayesian computation. Bayesian Anal. 2006, 1, 833–859.
  26. Sivia, D.S.; Skilling, J. Data Analysis: A Bayesian Tutorial, 2nd ed.; Oxford University Press: Oxford, UK, 2006.
  27. Feroz, F.; Hobson, M.P. Multimodal nested sampling: An efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses. Mon. Not. R. Astron. Soc. 2008, 384, 449–463.
  28. Corsaro, E.; Ridder, J.D. DIAMONDS: A new Bayesian nested sampling tool. Astron. Astrophys. 2014, 571, A71.
  29. Handley, W.J.; Hobson, M.P.; Lasenby, A.N. Polychord: Next-generation nested sampling. Mon. Not. R. Astron. Soc. 2015, 453, 4384–4398.
  30. Brewer, B.J.; Foreman-Mackey, D. DNest4: Diffusive Nested Sampling in C++ and Python. J. Stat. Softw. 2018, 86, 33.
  31. Speagle, J.S. Dynesty: A Dynamic Nested Sampling Package for Estimating Bayesian Posteriors and Evidences. arXiv 2019, arXiv:1904.02180.
  32. Murray, I.; MacKay, D.J.C.; Ghahramani, Z.; Skilling, J. Nested Sampling for Potts Models. In Advances in Neural Information Processing Systems; MIT Press: Vancouver, BC, Canada, 2006; Volume 18, pp. 947–954.
  33. Nielsen, S.O. Nested sampling in the canonical ensemble: Direct calculation of the partition function from NVT trajectories. J. Chem. Phys. 2013, 139, 124104.
  34. Baldock, R.J.N.; Bernstein, N.; Salerno, K.M.; Pártay, L.B.; Csányi, G. Constant-pressure nested sampling with atomistic dynamics. Phys. Rev. E 2017, 96, 043311.
  35. Bolhuis, P.G.; Csányi, G. Nested Transition Path Sampling. Phys. Rev. Lett. 2018, 120, 250601.
  36. Pártay, L.B.; Bartók, A.P.; Csányi, G. Efficient Sampling of Atomic Configurational Spaces. J. Phys. Chem. B 2010, 114, 10502–10512.
  37. Burkoff, N.; Várnai, C.; Wells, S.; Wild, D. Exploring the Energy Landscapes of Protein Folding Simulations with Bayesian Computation. Biophys. J. 2012, 102, 878–886.
  38. Pártay, L.B.; Bartók, A.P.; Csányi, G. Nested sampling for materials: The case of hard spheres. Phys. Rev. E 2014, 89, 022302.
  39. Trassinelli, M. Bayesian data analysis tools for atomic physics. Nucl. Instrum. Methods B 2017, 408, 301–312.
  40. Trassinelli, M. The Nested_fit Data Analysis Program. Proceedings 2019, 33, 14.
  41. Mukherjee, P.; Parkinson, D.; Liddle, A.R. A Nested Sampling Algorithm for Cosmological Model Selection. Astrophys. J. Lett. 2006, 638, L51.
  42. Veitch, J.; Vecchio, A. Bayesian coherent analysis of in-spiral gravitational wave signals with a detector network. Phys. Rev. D 2010, 81, 062003.
  43. Brewer, B.J.; Pártay, L.B.; Csányi, G. Diffusive nested sampling. Stat. Comput. 2011, 21, 649–656.
  44. Skilling, J. Galilean and Hamiltonian Monte Carlo. Proceedings 2019, 33, 19.
  45. Martiniani, S.; Stevenson, J.D.; Wales, D.J.; Frenkel, D. Superposition Enhanced Nested Sampling. Phys. Rev. X 2014, 4, 031034.
  46. Buchner, J. A statistical test for nested sampling algorithms. Stat. Comput. 2016, 26, 383–392.
  47. Trassinelli, M.; Anagnostopoulos, D.F.; Borchert, G.; Dax, A.; Egger, J.P.; Gotta, D.; Hennebach, M.; Indelicato, P.; Liu, Y.W.; Manil, B.; et al. Measurement of the charged pion mass using X-ray spectroscopy of exotic atoms. Phys. Lett. B 2016, 759, 583–588.
  48. Trassinelli, M.; Anagnostopoulos, D.; Borchert, G.; Dax, A.; Egger, J.P.; Gotta, D.; Hennebach, M.; Indelicato, P.; Liu, Y.W.; Manil, B.; et al. Measurement of the charged pion mass using a low-density target of light atoms. EPJ Web Conf. 2016, 130, 01022.
  49. Papagiannouli, I.; Patanen, M.; Blanchet, V.; Bozek, J.D.; de Anda Villa, M.; Huttula, M.; Kokkonen, E.; Lamour, E.; Mevel, E.; Pelimanni, E.; et al. Depth Profiling of the Chemical Composition of Free-Standing Carbon Dots Using X-ray Photoelectron Spectroscopy. J. Phys. Chem. A 2018, 122, 14889–14897.
  50. Villa, M.D.A.; Gaudin, J.; Amans, D.; Boudjada, F.; Bozek, J.; Grisenti, R.E.; Lamour, E.; Laurens, G.; Macé, S.; Nicolas, C.; et al. Assessing the Surface Oxidation State of Free-Standing Gold Nanoparticles Produced by Laser Ablation. Langmuir 2019, 35, 11859–11871.
  51. Dierckx, P. Curve and Surface Fitting with Splines; Oxford University Press: Oxford, UK, 1995.
  52. Skilling, J. Nested Sampling’s Convergence. AIP Conf. Proc. 2009, 1193, 277–291.
  53. Chopin, N.; Robert, C.P. Properties of nested sampling. Biometrika 2010, 97, 741–755.
  54. Theisen, M. Analyse der Linienform von Röntgenübergängen Nach der Bayesmethode. Master’s Thesis, Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, Aachen, Germany, 2013.
  55. Fukunaga, K.; Hostetler, L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inf. Theory 1975, 21, 32–40.
  56. Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern. Anal. 1995, 17, 790–799.
  57. Trassinelli, M.; Kumar, A.; Beyer, H.F.; Indelicato, P.; Märtin, R.; Reuschl, R.; Kozhedub, Y.S.; Brandau, C.; Bräuning, H.; Geyer, S.; et al. Observation of the 2p3/2→2s1/2 intra-shell transition in He-like uranium. Eur. Phys. Lett. 2009, 87, 63001.
  58. Gordon, C.; Trotta, R. Bayesian calibrated significance levels applied to the spectral tilt and hemispherical asymmetry. Mon. Not. R. Astron. Soc. 2007, 382, 1859–1863.
Figure 1. Graphical representation of the different search algorithms discussed in the text. (a) Exploration of the parameter volume via the lawn mower robot to find a new live point. (b) Search of a new live point between the parameter set a_n outside the limit L(a) > L_m and the barycenter of the current live points. (c) Construction of the new live point from different coordinates of the current live points.
Figure 2. Data corresponding to the high-resolution X-ray spectrum of the helium-like uranium 1s2p ³P₂ → 1s2s ³S₁ intrashell transition obtained by Bragg diffraction from a curved crystal [57] (left) and to the single decay of H-like ¹⁴²Pm⁶⁰⁺ ions to the stable bare ¹⁴²Nd nucleus via electron capture [16] (right).
Figure 3. Evolution of one of the components of the discarded live points ã_m, relative to the position of one of the four considered Gaussian peaks (see text), as a function of the nested sampling step and for ten different choices of starting live points. Results of the analysis without (top) and with cluster analysis (bottom).
Figure 4. Results of the cluster recognition corresponding to the analysis of four Gaussian peaks. The positions of three of the peaks are represented. Different colors represent different identified clusters; the projections onto some planes are shown in black. The 24 likelihood maxima (corresponding to the 4! permutations of the positions of four peaks) are clearly visible.
Figure 5. Joint probability distribution of the position and amplitude of one of the four considered peaks obtained without (left) and with cluster analysis (right).
Figure 6. (Top) Evaluation of the logarithm of the evidence for different numbers of live points K for the four-Gaussian-peaks analysis (left) and the modulated exponential decay (right). In blue, the uncertainty values evaluated from the results of 16 different runs for each case; in black, the theoretical uncertainty √(H/K) estimated from the information gain H. (Bottom) Dependency of the evaluated uncertainty and CPU time on K. The dashed lines are power-law fits, whose results are also shown. Data with K < 500 and K > 5000 are excluded from the fits of the ln E uncertainty and of the CPU time, respectively.
Figure 7. Values of ln E and CPU time for different choices of parameter values of the cluster recognition algorithm. The evidence uncertainties for the labels ‘f 0.7’ and ‘g 0.8 0.3’ are not evaluated because of the large computation time corresponding to these cases.
Figure 8. Results of the cluster recognition corresponding to the analysis of the modulation of the exponential decay. The relative amplitude, pulsation and phase are represented. Different colors represent different identified clusters; the projections onto some planes are shown in black.
Figure 9. Evolution of the component of the discarded live points ã_m relative to the pulsation ω of the modulation of the single-ion exponential decay (see text) as a function of the nested sampling step, with cluster analysis and for two different selections of starting live points.
