# Improving Clustering Accuracy of K-Means and Random Swap by an Evolutionary Technique Based on Careful Seeding


## Abstract


## 1. Introduction

- A more complete description of the evolutionary approach, which is the basis of the proposed clustering algorithms, is provided.
- PB-KM now includes a mutation operation in its second (recombination) step.
- More details about the Java implementation are provided.
- All previous execution experiments were reworked, and new challenging case studies were added to the experimental framework, exploiting synthetic (benchmark) and real-world datasets.

## 2. Related Work

#### 2.1. Lloyd’s K-Means

Algorithm 1. Lloyd’s K-Means.

Input: the dataset $X$ and the number of clusters $K$.

Output: final centroids and corresponding partitions.

1. Initialization. Use some seeding method (e.g., uniform random) to choose $K$ data points in $X$ as initial centroids.
2. Partitioning. Assign the data points of $X$ to clusters according to the nearest centroid rule.
3. Update. Redefine the centroids as the mean points of the clusters resulting from step 2.
4. Check termination. If the termination condition does not hold, repeat from step 2.
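The four steps of Algorithm 1 can be sketched in plain Java as follows. This is a minimal sequential illustration, not the authors' implementation: points are assumed to be `double[]` rows, seeding is uniform random, and the termination condition is assumed to be a stable partition or a hypothetical `maxIter` bound. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal sequential sketch of Lloyd's K-Means (Algorithm 1) on double[] points. */
final class LloydKMeans {

    /** Squared Euclidean distance; sufficient for nearest-centroid comparisons. */
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** The nc(x) rule: index of the centroid nearest to x. */
    static int nearest(double[] x, double[][] mu) {
        int best = 0;
        double bestD = Double.MAX_VALUE;
        for (int k = 0; k < mu.length; k++) {
            double d = dist2(x, mu[k]);
            if (d < bestD) {
                bestD = d;
                best = k;
            }
        }
        return best;
    }

    /** Step 1 (uniform seeding): K distinct data points; assumes X.length >= K. */
    static double[][] uniformSeeds(double[][] X, int K, Random rnd) {
        int[] idx = rnd.ints(0, X.length).distinct().limit(K).toArray();
        double[][] mu = new double[K][];
        for (int k = 0; k < K; k++) mu[k] = X[idx[k]].clone();
        return mu;
    }

    /** Steps 2-4: refines mu in place and returns the last computed partition labels. */
    static int[] run(double[][] X, double[][] mu, int maxIter) {
        int N = X.length, K = mu.length, D = X[0].length;
        int[] label = new int[N];
        Arrays.fill(label, -1);
        for (int it = 0; it < maxIter; it++) {
            boolean changed = false;                 // step 2: partitioning
            for (int i = 0; i < N; i++) {
                int k = nearest(X[i], mu);
                if (k != label[i]) {
                    label[i] = k;
                    changed = true;
                }
            }
            if (!changed) break;                     // step 4: partition is stable
            double[][] sum = new double[K][D];       // step 3: centroid update
            int[] cnt = new int[K];
            for (int i = 0; i < N; i++) {
                cnt[label[i]]++;
                for (int d = 0; d < D; d++) sum[label[i]][d] += X[i][d];
            }
            for (int k = 0; k < K; k++) {
                if (cnt[k] == 0) continue;           // an empty cluster keeps its centroid
                for (int d = 0; d < D; d++) mu[k][d] = sum[k][d] / cnt[k];
            }
        }
        return label;
    }
}
```

Squared distances are used throughout because the nearest-centroid rule only needs comparisons, so the square root can be skipped.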

#### 2.2. The Random Swap Clustering Algorithm

#### 2.3. Centroids Initialization Methods

Algorithm 2. The K-Means++ seeding method.

1. Establish the first centroid through a uniform random selection:

   $${\mu}_{1}\leftarrow {x}_{j},\quad j\leftarrow unif\_rand(1..N),\quad L\leftarrow 1$$

2. For each point $x_i$, define the probability $\pi(x_i)$ of being chosen as the next centroid as:

   $$\pi(x_i)=\frac{D(x_i)^{2}}{\sum_{j=1}^{N} D(x_j)^{2}}$$

   Use a random switch based on the newly computed values $\{\pi(x_i)\}_{i=1}^{N}$ to choose a point $x^{*}\in X$, not previously selected, as the next centroid:

   $$L\leftarrow L+1,\quad \mu_{L}\leftarrow x^{*}$$

3. If $L<K$, repeat from step 2.
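A minimal Java sketch of Algorithm 2 follows, assuming `double[]` points and at least $K$ data points. The class name is hypothetical, and the incremental update of $D(x_i)^2$ (one `min` per new centroid instead of recomputing the distances to all $L$ centroids) is a choice of this sketch, not necessarily of the authors' code.

```java
import java.util.Random;

/** Sketch of K-Means++ seeding (Algorithm 2); assumes X.length >= K. */
final class KMeansPlusPlus {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    static double[][] seed(double[][] X, int K, Random rnd) {
        int N = X.length;
        double[][] mu = new double[K][];
        boolean[] chosen = new boolean[N];
        int j = rnd.nextInt(N);                        // step 1: uniform random choice
        mu[0] = X[j].clone();
        chosen[j] = true;
        double[] d2 = new double[N];                   // D(x_i)^2 w.r.t. existing centroids
        for (int i = 0; i < N; i++) d2[i] = dist2(X[i], mu[0]);
        for (int L = 1; L < K; L++) {                  // step 3: loop until L = K
            double den = 0;                            // step 2: sum of D(x_j)^2
            for (int i = 0; i < N; i++) if (!chosen[i]) den += d2[i];
            double r = rnd.nextDouble() * den, acc = 0;
            int pick = -1;
            for (int i = 0; i < N; i++) {              // random switch over pi(x_i)
                if (chosen[i]) continue;               // only points not previously selected
                pick = i;                              // last unselected point is the fallback
                acc += d2[i];
                if (acc > r) break;
            }
            mu[L] = X[pick].clone();
            chosen[pick] = true;
            for (int i = 0; i < N; i++) d2[i] = Math.min(d2[i], dist2(X[i], mu[L]));
        }
        return mu;
    }
}
```

For example, on the points $\{0, 0, 5\}$ with $K=2$, the second centroid always lands on the opposite side from the first one, because a point coinciding with an existing centroid has zero probability.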

Algorithm 3. The Greedy_K-Means++ (GKM++) seeding method.

1. Choose the first centroid uniformly at random: $\mu_1\leftarrow x_j$, $j\leftarrow unif\_rand(1..N)$, $L\leftarrow 1$.
2. Set $costBest\leftarrow\infty$ and leave $candBest$ undefined.
3. Repeat $S$ times:
   - select a point $x^{*}\in X$ as a candidate centroid, using the K-Means++ method;
   - partition $X$ according to $\{\mu_1,\mu_2,\dots,\mu_L,x^{*}\}$, that is, assign points to clusters according to the $nc(\cdot)$ function;
   - compute $cost\leftarrow SSE()$; if $cost<costBest$, set $candBest\leftarrow x^{*}$ and $costBest\leftarrow cost$.
4. Set $L\leftarrow L+1$ and $\mu_L\leftarrow candBest$.
5. If $L<K$, repeat from step 2.
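The greedy variant can be sketched in Java as below, assuming `double[]` points and $S \ge 1$; the names are hypothetical. Instead of explicitly relabelling every point, the SSE of the partition induced by $\{\mu_1,\dots,\mu_L,x^{*}\}$ is obtained as $\sum_i \min(D(x_i)^2, d(x_i,x^{*})^2)$, which is the same quantity the $nc(\cdot)$ partition yields.

```java
import java.util.Random;

/** Sketch of Greedy K-Means++ seeding (Algorithm 3); assumes S >= 1. */
final class GreedyKMeansPlusPlus {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** K-Means++ draw: returns i with probability d2[i] / sum(d2). */
    static int drawD2(double[] d2, Random rnd) {
        double den = 0;
        for (double v : d2) den += v;
        double r = rnd.nextDouble() * den, acc = 0;
        for (int i = 0; i < d2.length; i++) {
            acc += d2[i];
            if (acc > r) return i;
        }
        return rnd.nextInt(d2.length); // only when all weights are zero (or by round-off)
    }

    static double[][] seed(double[][] X, int K, int S, Random rnd) {
        int N = X.length;
        double[][] mu = new double[K][];
        mu[0] = X[rnd.nextInt(N)].clone();             // step 1
        double[] d2 = new double[N];                   // D(x_i)^2 w.r.t. mu[0..L-1]
        for (int i = 0; i < N; i++) d2[i] = dist2(X[i], mu[0]);
        for (int L = 1; L < K; L++) {
            double costBest = Double.POSITIVE_INFINITY;
            int candBest = -1;
            for (int s = 0; s < S; s++) {              // step 3: S candidate attempts
                int c = drawD2(d2, rnd);
                double cost = 0;                       // SSE with X[c] added as a centroid
                for (int i = 0; i < N; i++) cost += Math.min(d2[i], dist2(X[i], X[c]));
                if (cost < costBest) {
                    costBest = cost;
                    candBest = c;
                }
            }
            mu[L] = X[candBest].clone();               // step 4
            for (int i = 0; i < N; i++) d2[i] = Math.min(d2[i], dist2(X[i], mu[L]));
        }
        return mu;
    }
}
```

With $S=1$ the procedure reduces to plain K-Means++; larger $S$ trades extra SSE evaluations (each $O(N)$ here) for better-placed seeds.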

#### 2.4. Evolutionary Algorithm Concepts

#### 2.4.1. GA-K-Means

#### 2.4.2. Concepts of Recombinator K-Means

#### 2.5. External Measures of Clustering Accuracy

## 3. Population-Based Clustering Algorithms

#### 3.1. PB-KM

Algorithm 4. The PB-KM operation.

1. Setup population. Set $\wp \leftarrow \varnothing$ and repeat $J$ times:
   - costBest ← ∞, candBest ← undefined;
   - repeat $R_1$ times: cand ← run(K-Means, GKM++, $X$); cost ← SSE(cand, $X$); if cost < costBest, then costBest ← cost and candBest ← cand;
   - $\wp \leftarrow \wp \cup \{candBest\}$, i.e., add the $K$ centroids of candBest to the population.
2. Recombination. Set costBest ← ∞, candBest ← undefined, and repeat $R_2$ times:
   - cand ← run(K-Means, GKM++, $\wp$); cost ← SSE(cand, $X$);
   - if cost < costBest, then costBest ← cost, candBest ← cand, and the centroids selected by GKM++ in $\wp$ are replaced by the centroids of cand;
   - check the accuracy of candBest by clustering indexes.
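The control flow of the two PB-KM steps can be sketched independently of the clustering runs themselves. In this hypothetical skeleton, `kmWithGkm` stands for run(K-Means, GKM++, $X$), `seedFromPop` for the GKM++ selection of $K$ population indices, and `refine` for K-Means on $X$ started from those centroids; all three are assumptions passed in as functions, and the accuracy check by clustering indexes is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Skeleton of the two PB-KM steps (Algorithm 4); clustering runs are passed in. */
final class PbKmSkeleton {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** SSE(cand, X): each point adds its squared distance to the nearest centroid. */
    static double sse(double[][] X, double[][] mu) {
        double s = 0;
        for (double[] x : X) {
            double m = Double.MAX_VALUE;
            for (double[] c : mu) m = Math.min(m, dist2(x, c));
            s += m;
        }
        return s;
    }

    /** Step 1: J times, keep the best of R1 runs and pool its K centroids. */
    static List<double[]> setup(double[][] X, int J, int R1, Supplier<double[][]> kmWithGkm) {
        List<double[]> pop = new ArrayList<>();
        for (int j = 0; j < J; j++) {
            double[][] candBest = null;
            double costBest = Double.POSITIVE_INFINITY;
            for (int r = 0; r < R1; r++) {
                double[][] cand = kmWithGkm.get();
                double cost = sse(X, cand);
                if (cost < costBest) {
                    costBest = cost;
                    candBest = cand;
                }
            }
            for (double[] c : candBest) pop.add(c.clone());
        }
        return pop;
    }

    /** Step 2: R2 recombinations; an improving candidate overwrites its seeds in pop. */
    static double[][] recombine(double[][] X, List<double[]> pop, int K, int R2,
                                BiFunction<List<double[]>, Integer, int[]> seedFromPop,
                                UnaryOperator<double[][]> refine) {
        double[][] candBest = null;
        double costBest = Double.POSITIVE_INFINITY;
        for (int r = 0; r < R2; r++) {
            int[] idx = seedFromPop.apply(pop, K);          // GKM++ on the population
            double[][] init = new double[K][];
            for (int k = 0; k < K; k++) init[k] = pop.get(idx[k]).clone();
            double[][] cand = refine.apply(init);           // K-Means on X
            double cost = sse(X, cand);
            if (cost < costBest) {
                costBest = cost;
                candBest = cand;
                for (int k = 0; k < K; k++) pop.set(idx[k], cand[k].clone()); // mutation
            }
        }
        return candBest;
    }
}
```

Keeping the seeding and refinement as parameters makes it easy to plug in the GKM++ and K-Means sketches given earlier, or a parallel implementation, without touching the population logic.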

#### 3.2. PB-RS

Algorithm 5. The PB-RS recombination step.

1. cand ← GKM++($\wp$); partition the data points of $X$ according to cand; cost ← SSE($X$).
2. Repeat $T$ times:
   - save cand;
   - cand′ ← swap(cand), that is: $c_s\leftarrow p_j$, $p_j\in\wp$, $s\leftarrow unif\_rand(1..K)$, $j\leftarrow unif\_rand(1..J\ast K)$;
   - refine cand′ by a few K-Means iterations (e.g., 5);
   - new_cost ← SSE(cand′, $X$);
   - if new_cost < cost, accept cand′, i.e., cand ← cand′ and cost ← new_cost; otherwise, restore the saved cand and its previous partitioning.
3. Check the accuracy of the final cand by further clustering indexes.
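The swap loop of Algorithm 5 can be sketched in Java as follows. Assumptions: points and centroids are `double[]`, the GKM++ selection from $\wp$ is passed in precomputed as `init`, and the partition is recomputed inside `sse` rather than stored, so "restore the saved cand" simply means discarding the trial copy. Names are hypothetical.

```java
import java.util.Random;

/** Sketch of the PB-RS recombination step (Algorithm 5) on double[] points. */
final class PbRsRecombination {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    static int nearest(double[] x, double[][] mu) {
        int best = 0;
        for (int k = 1; k < mu.length; k++) if (dist2(x, mu[k]) < dist2(x, mu[best])) best = k;
        return best;
    }

    static double sse(double[][] X, double[][] mu) {
        double s = 0;
        for (double[] x : X) s += dist2(x, mu[nearest(x, mu)]);
        return s;
    }

    /** A few Lloyd iterations refining mu in place. */
    static void refine(double[][] X, double[][] mu, int iters) {
        int K = mu.length, D = X[0].length;
        for (int it = 0; it < iters; it++) {
            double[][] sum = new double[K][D];
            int[] cnt = new int[K];
            for (double[] x : X) {
                int k = nearest(x, mu);
                cnt[k]++;
                for (int d = 0; d < D; d++) sum[k][d] += x[d];
            }
            for (int k = 0; k < K; k++)
                if (cnt[k] > 0) for (int d = 0; d < D; d++) mu[k][d] = sum[k][d] / cnt[k];
        }
    }

    static double[][] copy(double[][] a) {
        double[][] c = new double[a.length][];
        for (int i = 0; i < a.length; i++) c[i] = a[i].clone();
        return c;
    }

    /** init stands for GKM++(pop); pop holds the J*K population centroids. */
    static double[][] run(double[][] X, double[][] pop, double[][] init, int T, Random rnd) {
        double[][] cand = copy(init);
        double cost = sse(X, cand);
        for (int t = 0; t < T; t++) {
            double[][] trial = copy(cand);       // cand itself stays saved
            int s = rnd.nextInt(trial.length);   // s <- unif_rand(1..K)
            int j = rnd.nextInt(pop.length);     // j <- unif_rand(1..J*K)
            trial[s] = pop[j].clone();           // swap: c_s <- p_j
            refine(X, trial, 5);                 // a few K-Means iterations
            double newCost = sse(X, trial);
            if (newCost < cost) {                // accept; otherwise trial is discarded
                cand = trial;
                cost = newCost;
            }
        }
        return cand;
    }
}
```

Because the replacement centroid is drawn from the population rather than from arbitrary data points, every swap starts from a location that already served as a centroid in a good solution.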

## 4. Java Implementation Notes

Algorithm 6. Code fragment of K-Means++/Greedy_K-Means++ operating on a source of data points.

```java
…
final int l = L; // turn L into a final variable l
Stream<DataPoint> pStream = (PARALLEL) ? Arrays.stream(source).parallel() : Arrays.stream(source);
DataPoint ssd = pStream // sum of squared distances
    .map(p -> {
        p.setDist(Double.MAX_VALUE);
        for (int k = 0; k < l; ++k) { // existing centroids
            double d = p.distance(centroids[k]);
            if (d < p.getDist()) p.setDist(d);
        }
        return p;
    })
    .reduce(new DataPoint(), DataPoint::add2Dist, DataPoint::add2DistCombiner);
double denP = ssd.getDist(); // common denominator of points probability
…
// random switch …
```
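The fragment above relies on a `DataPoint` accumulator and combiner whose code is not shown, and it elides the random switch. A self-contained alternative, sketched here on plain `double[]` points with hypothetical names, computes $D(x_i)^2$ with a primitive stream (sequential or parallel) and then performs the random switch as a sequential cumulative walk. It is not the authors' code.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

/** Parallel D(x_i)^2 pass plus the random switch, on plain double[] points. */
final class ParallelD2 {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** D(x_i)^2 with respect to the first l centroids, optionally computed in parallel. */
    static double[] minSquaredDistances(double[][] X, double[][] centroids, int l, boolean parallel) {
        IntStream idx = IntStream.range(0, X.length);
        if (parallel) idx = idx.parallel();
        return idx.mapToDouble(i -> {
            double m = Double.MAX_VALUE;
            for (int k = 0; k < l; k++) m = Math.min(m, dist2(X[i], centroids[k]));
            return m;
        }).toArray();                               // encounter order is preserved
    }

    /** Random switch: returns i with probability d2[i] / sum(d2). */
    static int randomSwitch(double[] d2, Random rnd) {
        double den = Arrays.stream(d2).sum();       // common denominator
        double r = rnd.nextDouble() * den, acc = 0;
        int lastPositive = -1;
        for (int i = 0; i < d2.length; i++) {
            if (d2[i] <= 0) continue;               // existing centroids have zero weight
            lastPositive = i;
            acc += d2[i];
            if (acc > r) return i;
        }
        // Reached only through round-off, or when every weight is zero.
        return lastPositive >= 0 ? lastPositive : rnd.nextInt(d2.length);
    }
}
```

Computing each point's value into a `double[]` avoids mutating shared point objects inside `map` and keeps the reduction a plain sum.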

Algorithm 7. Java function that calculates $SSE$, the cost of a given partitioning.

```java
Stream<DataPoint> pStream = (PARALLEL) ? Stream.of(dataset).parallel() : Stream.of(dataset);
DataPoint s = pStream
    .map(p -> {
        int k = p.getCID(); // retrieve partition label (centroid index) of p
        double d = p.distance(centroids[k]);
        p.setDist(d * d); // store locally to p the squared distance of p to its (nearest) centroid
        return p;
    })
    .reduce(new DataPoint(), (p1, p2) -> {
        DataPoint ps = new DataPoint();
        ps.setDist(p1.getDist() + p2.getDist());
        return ps;
    });
return s.getDist();
```
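The same SSE computation can be sketched with a primitive `DoubleStream`, which avoids allocating a new `DataPoint` for every partial sum. In this hypothetical version, points are `double[]` rows and `labels[i]` plays the role of `p.getCID()`.

```java
import java.util.stream.IntStream;

/** SSE of a given partitioning, computed with a primitive DoubleStream. */
final class SseSketch {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** labels[i] is the centroid index (partition label) of point i. */
    static double sse(double[][] X, int[] labels, double[][] centroids, boolean parallel) {
        IntStream idx = IntStream.range(0, X.length);
        if (parallel) idx = idx.parallel();
        return idx.mapToDouble(i -> dist2(X[i], centroids[labels[i]])).sum();
    }
}
```

`DoubleStream.sum()` is documented as possibly using compensated summation, so sequential and parallel results may differ only in the last bits on large datasets.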

## 5. Experimental Framework

The Birch1 and Birch2 datasets contain $10^{5}$ two-dimensional points distributed into 100 clusters. Birch1 places its clusters on a 10 × 10 grid, whereas Birch2 places them along a sine curve. Both have spherical clusters of the same size.

#### 5.1. Clustering the A3 Dataset

Repeated K-Means was applied to the A3 dataset using the uniform random ($\mathrm{RKM}^{\mathrm{Unif}}$), K-Means++ ($\mathrm{RKM}^{\mathrm{KM++}}$), and Greedy K-Means++ ($\mathrm{RKM}^{\mathrm{GKM++}}$) seeding procedures.

For each seeding procedure, $10^{4}$ repetitions of K-Means were executed, and the following quantities were monitored: (a) the minimal value of the $\mathrm{SSE}$ cost ($\mathrm{SSE}_{\min}$); (b) the corresponding Centroid Index ($\mathrm{CI}$) value (see Section 2.5), denoted $\mathrm{CI}_{\min(\mathrm{SSE})}$; (c) the minimal observed $\mathrm{CI}$ value ($\mathrm{CI}_{\min}$) and the corresponding $\mathrm{SSE}$ cost ($\mathrm{SSE}_{\min(\mathrm{CI})}$); (d) the average $\mathrm{CI}$ value ($\mathrm{avg\_CI}$); and (e) the $\mathrm{success\_rate}$, that is, the number of runs that ended with $\mathrm{CI}=0$ divided by $10^{4}$. In addition, the Parallel Execution Time ($\mathrm{PET}$), in seconds, needed by Repeated K-Means to complete its runs was observed. Table 6 collects all the achieved results.
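Section 2.5 is not reproduced here, so the following Java sketch assumes the usual definition of the Centroid Index: each centroid of one solution is mapped to its nearest centroid in the other, centroids receiving no mapping are counted as orphans, and $\mathrm{CI}$ is the larger orphan count over the two mapping directions. The success rate is then the fraction of runs with $\mathrm{CI}=0$ (multiply by 100 for the percentages in Table 6). Class and method names are hypothetical, and the generalized CI used for some datasets is not covered.

```java
import java.util.Arrays;

/** Centroid Index between two centroid sets, plus the success rate over many runs. */
final class CentroidIndex {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** Counts centroids of B that receive no mapping from A (orphans). */
    static int orphans(double[][] A, double[][] B) {
        boolean[] mapped = new boolean[B.length];
        for (double[] a : A) {
            int best = 0;
            double bestD = Double.MAX_VALUE;
            for (int k = 0; k < B.length; k++) {
                double d = dist2(a, B[k]);
                if (d < bestD) {
                    bestD = d;
                    best = k;
                }
            }
            mapped[best] = true;
        }
        int count = 0;
        for (boolean m : mapped) if (!m) count++;
        return count;
    }

    /** CI as the maximum orphan count over both mapping directions. */
    static int ci(double[][] c1, double[][] c2) {
        return Math.max(orphans(c1, c2), orphans(c2, c1));
    }

    /** success_rate: fraction of runs that ended with CI = 0. */
    static double successRate(int[] ciPerRun) {
        long ok = Arrays.stream(ciPerRun).filter(v -> v == 0).count();
        return (double) ok / ciPerRun.length;
    }
}
```

For instance, with ground-truth centroids $\{0, 10, 20\}$ and a solution $\{0, 1, 20\}$, two solution centroids compete for the same true cluster and the one at 10 is left unmatched, giving $\mathrm{CI}=1$.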

#### 5.2. First Group of Synthetic Datasets (Table 2)

#### 5.3. Second Group of Real-World Datasets (Table 3)

#### 5.4. Third Group of Synthetic Datasets (Table 4)

#### 5.5. Fourth Group of Real-World Datasets (Table 5)

#### 5.6. Time Efficiency of PB-KM

## 6. Conclusions

## References

**Figure 4.** SSE cost vs. time for the Birch3 dataset.

**Figure 5.** Centroid Index (CI) vs. time for the Birch3 dataset.

**Figure 6.** SSE cost vs. time for the Worms_2d dataset.

**Figure 7.** (Generalized) Centroid Index (CI) vs. time for the Worms_2d dataset.

**Figure 8.** SSE cost vs. time for the Worms_64d dataset.

**Figure 9.** (Generalized) Centroid Index (CI) vs. time for the Worms_64d dataset.

**Figure 10.** SSE vs. time for the Bridge dataset.

**Figure 11.** SSE vs. time for the MissAmerica dataset.

**Figure 12.** SSE vs. time for the House dataset.

**Figure 13.** SSE vs. time for the Olivetti dataset.

**Figure 14.** CI vs. time for the Olivetti dataset.

**Figure 15.** SSE vs. time for the UrbanGB dataset.

**Figure 16.** CI vs. time for the UrbanGB dataset.

Symbol | Description |
---|---|
$N$ | number of data points (vectors) $x_i$ in the dataset $X$ |
$D$ | number of dimensions (coordinates or features) of each data point |
$K$ | number of clusters/centroids |
$d(x_i, x_j)$ | Euclidean distance between data points $x_i$ and $x_j$ |
$C_1 \dots C_K$ | partition clusters |
$\mu_1 \dots \mu_K$ | representative centroids of the clusters |
$nc(x_i)$ | nearest centroid to data point $x_i$ |
$L$ | number of currently defined centroids in a seeding method |
$D(x_i)$ | minimal distance of $x_i$ to the currently existing centroids |
$SSE$ | Sum-of-Squared Errors objective function |
$nMSE$ | normalized mean of $SSE$, also referred to as distortion |
$Unif$ | uniform random seeding method |
$KM$++ | K-Means++ seeding method |
$GKM$++ | Greedy K-Means++ seeding method |
$S$ | number of attempts in GKM++ for identifying the next centroid |
$CI$ | Centroid Index, an external measure of clustering accuracy |
$\langle C^{j}, P^{j} \rangle$ | a solution of a clustering algorithm, i.e., a pair consisting of a centroid vector and the partition labels of the data points belonging to its clusters |
$PB\text{-}KM$ | proposed Population-Based K-Means clustering algorithm |
$PB\text{-}RS$ | proposed Population-Based Random Swap clustering algorithm |
$\wp$ | population of $J \ast K$ centroids in the PB-KM/PB-RS algorithms |
$J$ | number of “best” solutions initially put in $\wp$ |
$R_1$ | number of repetitions of K-Means in the 1st step of PB-KM |
$R_2$ | number of repetitions of K-Means in the 2nd step of PB-KM |
$T$ | number of Random Swap iterations in the 1st step of PB-RS, used to define each of the $J$ candidate population solutions; also the number of Random Swap iterations in the 2nd step of PB-RS, used to obtain the final solution |

**Table 2.** The first group of synthetic datasets [24].

Dataset | $N$ | $D$ | $K$ |
---|---|---|---|
A3 | 7500 | 2 | 50 |
S3 | 5000 | 2 | 15 |
Dim1024 | 1024 | 1024 | 16 |
Unbalance | 6500 | 2 | 8 |
Birch1/2 | 100,000 | 2 | 100 |

**Table 3.** The second group of real-world datasets.

Dataset | $N$ | $D$ | $K$ |
---|---|---|---|
Musk | 6598 | 166 | 2 |
MiniBooNE | 130,064 | 50 | 2 |

**Table 4.** The third group of synthetic datasets [24].

Dataset | $N$ | $D$ | $K$ |
---|---|---|---|
Birch3 | 100,000 | 2 | 100 |
Worms_2d | 105,600 | 2 | 35 |
Worms_64d | 105,000 | 64 | 25 |

**Table 5.** The fourth group of real-world datasets.

Dataset | $N$ | $D$ | $K$ |
---|---|---|---|
Bridge | 4096 | 16 | 256 |
House | 34,112 | 3 | 256 |
MissAmerica | 6480 | 16 | 256 |
Olivetti | 400 | 4096 | 40 |
UrbanGB | 360,177 | 2 | 469 |

**Table 6.** Repeated K-Means results on the A3 dataset.

Measure | $\mathrm{RKM}^{\mathrm{Unif}}$ | $\mathrm{RKM}^{\mathrm{KM++}}$ | $\mathrm{RKM}^{\mathrm{GKM++}}$ |
---|---|---|---|
$\mathrm{SSE}_{\min}$ | 7.44 | 6.74 | 6.74 |
$\mathrm{CI}_{\min(\mathrm{SSE})}$ | 1 | 0 | 0 |
$\mathrm{CI}_{\min}$ | 1 | 0 | 0 |
$\mathrm{SSE}_{\min(\mathrm{CI})}$ | 7.44 | 6.74 | 6.74 |
$\mathrm{avg\_CI}$ | 6.58 | 4.17 | 1.62 |
$\mathrm{success\_rate}$ | 0% | 0.01% | 5.8% |
$\mathrm{PET}$ (s) | 103 | 154 | 990 |

Measure | PB-KM ($J=25$, $R_1=3$, $R_2=40$) |
---|---|
$\mathrm{PET}_1$ (s) | 6.4 |
$\mathrm{SSE}_{\min}$ | 6.74 |
$\mathrm{CI}_{\min(\mathrm{SSE})}$ | 0 |
$\mathrm{CI}_{\min}$ | 0 |
$\mathrm{SSE}_{\min(\mathrm{CI})}$ | 6.74 |
$\mathrm{avg\_CI}$ | 0 |
$\mathrm{success\_rate}$ | 100% |
$\mathrm{PET}_2$ (s) | 2.4 |

**Table 8.** PB-KM results on the synthetic datasets of Table 2.

Dataset | $\min(\mathrm{SSE})$ | $\mathrm{CI}_{\min(\mathrm{SSE})}$ | $\mathrm{avg\_CI}$ | $\mathrm{Success\_Rate}$ | $\mathrm{PET}_1$ (s) | $\mathrm{PET}_2$ (s) |
---|---|---|---|---|---|---|
A3 | 6.74 | 0 | 0 | 100% | 6.4 | 2.4 |
S3 | 18.82 | 0 | 0 | 100% | 1.1 | 0.5 |
Dim1024 | 5.39 | 0 | 0 | 100% | 9.4 | 4.0 |
Unbalance | 0.65 | 0 | 0 | 100% | 0.6 | 0.3 |
Birch1 | 92.77 | 0 | 0 | 100% | 277.3 | 96.6 |
Birch2 | 0.46 | 0 | 0 | 100% | 242.2 | 99.0 |

**Table 9.** PB-KM results on the real-world datasets of Table 3.

Dataset | $\min(\mathrm{SSE})$ | $\mathrm{CI}_{\min(\mathrm{SSE})}$ | $\mathrm{avg\_CI}$ | $\mathrm{Success\_Rate}$ | $\mathrm{PET}_1$ (s) | $\mathrm{PET}_2$ (s) |
---|---|---|---|---|---|---|
Musk | 36,373 | 0 | 0 | 100% | 0.5 | 0.1 |
MiniBooNE | 2802 | 0 | 0 | 100% | 5.3 | 0.8 |

**Table 10.** Sequential and parallel execution of the PB-KM recombination step on Worms_64d (8 physical cores).

Worms_64d | PB-KM, 2nd step, $J=40$, $R_2=100$ |
---|---|
$\mathrm{tET}^{S}$ (ms) | 4,405,325 |
$\mathrm{tIT}^{S}$ | 15,049 |
$\mathrm{tET}^{P}$ (ms) | 650,622 |
$\mathrm{tIT}^{P}$ | 14,887 |
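The Table 10 values are enough to work out the parallel speedup of the recombination step directly. The per-iteration figures in the second line assume that $\mathrm{tIT}$ counts the K-Means iterations performed, an interpretation this section does not state explicitly.

$$\text{speedup}=\frac{\mathrm{tET}^{S}}{\mathrm{tET}^{P}}=\frac{4{,}405{,}325\ \text{ms}}{650{,}622\ \text{ms}}\approx 6.77,\qquad \text{efficiency}\approx\frac{6.77}{8}\approx 0.85$$

$$\frac{\mathrm{tET}^{S}}{\mathrm{tIT}^{S}}=\frac{4{,}405{,}325}{15{,}049}\approx 292.7\ \text{ms},\qquad \frac{\mathrm{tET}^{P}}{\mathrm{tIT}^{P}}=\frac{650{,}622}{14{,}887}\approx 43.7\ \text{ms},\qquad \frac{292.7}{43.7}\approx 6.70$$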

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

