Article

Clustering Algorithm with a Greedy Agglomerative Heuristic and Special Distance Measures

1
Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Av., 660037 Krasnoyarsk, Russia
2
Institute of Business Process Management, Siberian Federal University, 79 Svobodny Av., 660041 Krasnoyarsk, Russia
*
Author to whom correspondence should be addressed.
Algorithms 2022, 15(6), 191; https://doi.org/10.3390/a15060191
Submission received: 30 April 2022 / Revised: 26 May 2022 / Accepted: 30 May 2022 / Published: 1 June 2022
(This article belongs to the Collection Feature Papers in Algorithms)

Abstract

Automatic grouping (clustering) involves dividing a set of objects into subsets (groups) so that the objects from one subset are more similar to each other than to the objects from other subsets according to some criterion. Kohonen neural networks are a class of artificial neural networks whose main element is a layer of adaptive linear adders operating on the "winner takes all" principle. One of the advantages of Kohonen networks is their ability to perform online clustering. Greedy agglomerative procedures in clustering consistently improve the result in some neighborhood of a known solution, choosing as the next solution the option that provides the least increase in the objective function. Algorithms using greedy agglomerative heuristics demonstrate precise and stable results for the k-means model. In our study, we propose a greedy agglomerative heuristic algorithm based on a Kohonen neural network with distance measure variations to cluster industrial products. Computational experiments demonstrate the comparative efficiency and accuracy of using the greedy agglomerative heuristic in the problem of grouping industrial products into homogeneous production batches.

1. Introduction

The search for a clustering algorithm that has both high accuracy and stability of the result and, at the same time, a high speed of operation is one of the problems of cluster analysis. The clustering result depends on the initially selected number of subsets, as well as on the selected measure of similarity (dissimilarity) [1]. One of the most famous automatic grouping models is the k-means model [2], which was proposed by Steinhaus [3]. The goal of the k-means problem is to find k points (centers, centroids) X1, …, Xk in an M-dimensional space, such that the sum of the squared distances from the known points (data vectors) A1, …, AN to the nearest of the required points reaches its minimum:
$$F(X_1, \ldots, X_k) = \sum_{i=1}^{N} \min_{j \in \{1, \ldots, k\}} \left\| X_j - A_i \right\|^2 \rightarrow \min. \tag{1}$$
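As a minimal illustration of objective (1), the following Java sketch evaluates F for a candidate set of centers using squared Euclidean distances; the data points and centers are hypothetical, and this is not the implementation used in the study.

```java
// Sketch: evaluating the k-means objective (1) for a fixed set of centers.
// Hypothetical toy data; illustrative only.
public class KMeansObjective {

    // Sum, over all data vectors, of the squared distance to the nearest center.
    static double objective(double[][] data, double[][] centers) {
        double sum = 0.0;
        for (double[] a : data) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] x : centers) {
                double d2 = 0.0;
                for (int i = 0; i < a.length; i++) {
                    double diff = x[i] - a[i];
                    d2 += diff * diff;
                }
                best = Math.min(best, d2);
            }
            sum += best;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centers = {{0, 0.5}, {10, 10.5}};
        System.out.println("F = " + objective(data, centers)); // prints F = 1.0
    }
}
```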
Optimization model (1) can be considered as a location problem, where the points Xj have to be placed in an optimal way. Location theory has been developing for a long time separately from cluster analysis while solving very close or completely identical problems. Weiszfeld proposed an iterative procedure for solving the Weber problem, one of the simplest location problems, based on an iterative weighted least squares method [4]. This algorithm determines a set of weights that are inversely proportional to the distances from the current estimate to the sample points and creates a new estimate that is the weighted average of the sample according to these weights. In specific situations, Weiszfeld’s algorithm is very slow and does not always converge.
Local search methods for solving location problems are used in Ref. [5]. The standard local descent algorithm starts with an initial solution S = {X1, …, Xk}, chosen randomly or with the help of some auxiliary algorithm. At each step of the local descent, there is a transition from the current solution to a neighboring solution with a smaller value of the objective function until the local optimum is reached. The key problem here is to find the set of neighboring solutions n(S). At each step of the local search, the neighborhood function n(S) specifies a set of possible search directions. Neighborhood functions can be very diverse, and the neighborhood relation is not always symmetrical. Local search algorithms are widely used to solve NP-hard discrete optimization problems. However, simple local descent does not allow the finding of the global optimum of the problem.
Local search methods were further developed in so-called metaheuristics, in particular in variable neighborhood search (VNS). Its main idea is to systematically change the neighborhood function, and hence the search landscape, during the local search.
The k-means problem was algorithmically implemented by S. Lloyd [6]. For the observation vectors X, the k-means algorithm is designed to determine k centers and assign data points (objects) to each center to form clusters Cj, j = 1…k, while minimizing the difference between the objects within a cluster. In the k-means algorithm, the number of groups (clusters) must be specified in advance.
In addition, the result obtained depends on the initial choice of centers, which is one of the main disadvantages of the algorithm. The modern literature offers many approaches to setting the initial centroids for the k-means algorithm, most of which are various evolutionary and random search methods. One of the most popular methods is the k-means++ algorithm [7], where the first centroid is chosen randomly and each further centroid is selected with a probability proportional to its squared distance from the nearest already chosen centroid. Moreover, in Ref. [8], a similar method is developed based on finding areas with maximum density. In Ref. [9], the authors propose a brute-force approach to the initialization of the k-means algorithm. Kalczynski et al. in Ref. [10] introduce three algorithms (merging, construction and separation) to create starting solutions of the k-means problem. In Ref. [11], a heuristic-based algorithm that improves the initial seeding of k-means clustering is implemented, using a hybrid approach that combines a genetic algorithm with the differential evolution heuristic. Detailed overviews of the existing initialization approaches may be found in Refs. [12,13,14].
The Kohonen neural network [15] of the vector quantization type is an autoassociator closely related to k-means. This competitive network can be related to unsupervised learning. The Kohonen learning law is an algorithm that finds the centroid (called “codebook vector”) closest to each training case and moves the winning centroid closer to the training case. Another type of Kohonen network is the self-organizing map competitive network that provides a topological mapping from the input space to the clusters. Kohonen networks are used in many fields of interest, such as speech recognition [16,17], image compression [18,19], image segmentation [20], face recognition [21,22,23], classification of weather patterns [24], malware detection [25], seasonal sales planning [26], medical decision-making [27,28], e-learning recommendations [29], denial of service attack defense detection [30], groundwater quality assessment [31], exploration of the investment patterns [32] and others.
Greedy agglomerative heuristic procedures [33,34] are tools to improve a result in a certain neighborhood of the solution. An agglomerative procedure starts with some solution S containing an excessive number of centroids and sequentially removes them. The elements of the clusters related to the removed centroids are redistributed among the remaining clusters. Greedy strategies are used to decide which clusters are most similar and should be merged at each iteration of the agglomerative procedure. For subsequent iterations, the method chosen is the one that gave the smallest increase in the objective function in the previous iterations. The practice of solving NP-hard problems shows the efficiency of the transition from randomized recombination procedures to the search for the best recombination method [35]. The authors in Ref. [36] proposed methods of greedy agglomerative heuristics based on location theory models. Algorithms using these methods are often randomized, but the results are quite stable. The method of greedy agglomerative heuristics uses evolutionary algorithms as one of the ways to organize a global search.
To date, the vast majority of the algorithms developed for continuous location problems use the most common distance measures (Euclidean, Manhattan). However, taking into account the characteristics of the feature space of a specific practical problem, the choice of a distance measure can lead to an increase in clustering accuracy. For instance, the Itakura–Saito distance was used to build a learning vector quantization algorithm. In Ref. [19], a method is proposed for using the Mahalanobis distance as the basis for grouping. The Mahalanobis distance was also used to cluster incoming data into neural nodes in a self-organizing incremental neural network [22]. Kohonen networks with graph-based augmented metrics are presented in Ref. [37]. In Ref. [38], a distance measure for a self-organizing map is defined based on the data distribution and is calculated with the use of an energy function. Distance metric learning methods are also often applied. For instance, in Ref. [22], a Mahalanobis matrix is computed that assures small distances between nearest-neighbor points from the same class and the separation of points belonging to different classes by a large margin. Metric learning for the SOM based on adaptive subspaces is presented in Ref. [39]. Furukawa [40] offers a nonlinear metric learning method. The authors in Ref. [41] use an ensemble approach to metric learning with objective function generalization. Yoneda and Furukawa [42] propose a co-training approach, which collapses the objective function, thereby avoiding undesirable local optima.
In this work, our aim is to solve the problem of the automatic grouping of objects. As a basis, we use different types of Kohonen networks and modify them with a greedy agglomerative heuristic and different types of distance measures. We then test our approach on an applied problem.
The rest of this paper is organized as follows. In Section 2, we introduce the Kohonen neural network model and propose our new algorithm involving the greedy agglomerative heuristic procedure. In Section 3, we describe the computational experiments with a practically important dataset. In Section 4 and Section 5, we discuss the results and provide a short conclusion.

2. Kohonen Neural Networks for Clustering Problem

2.1. Distance Measures

In clustering problems, the key concept is the distance metric between objects. A metric is a function that determines the measure of the distance between objects in a metric space. A metric space is a set of points with a distance function d(x, y). The distance of order p between two points is determined by the Minkowski function (the lp-norm) [1,43,44]:
$$d(x, y) = \left( \sum_{i=1}^{M} \left| x_i - y_i \right|^{p} \right)^{1/p},$$
where x and y are vectors of parameter values, M is the vector dimension. The parameter p is determined by the researcher; it can be used to progressively increase or decrease the weight of the i-th variable. Special cases of the Minkowski function depend on the p value. For p = 2, the function calculates the Euclidean distance between two points (l2–norm):
$$d(x, y) = \sqrt{\sum_{i=1}^{M} \left( x_i - y_i \right)^{2}}.$$
The squared Euclidean distance is often used:
$$d(x, y) = \sum_{i=1}^{M} \left( x_i - y_i \right)^{2}.$$
For p = 1, the function calculates the Manhattan distance, also called rectangular (l1–norm):
$$d(x, y) = \sum_{i=1}^{M} \left| x_i - y_i \right|.$$
For p = ∞, the function calculates the Chebyshev distance, returning the largest absolute difference between the object parameters:
$$d(x, y) = \max_{1 \le i \le M} \left| x_i - y_i \right|.$$
In addition to the cases of the dependence of the distance function on the parameter p, there are other methods for calculating distances, for example, the Mahalanobis distance [45]. Mahalanobis distance can be defined as a measure of dissimilarity (difference) between vectors from the same probability distribution with the covariance matrix C:
$$d(x, y) = \sqrt{\left( x - y \right)^{T} C^{-1} \left( x - y \right)}.$$
If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. The covariance matrix is defined as:
$$C = \operatorname{cov}(x, y) = \mu \left[ \left( x - \mu(x) \right) \left( y - \mu(y) \right) \right],$$
where μ denotes the expected value. In most cases, the literature considers problems with the Euclidean or Manhattan metric.
In our study, we use five different types of distance measures to evaluate which one is preferable in our case.
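The following Java sketch summarizes the five distance measures listed above; it is illustrative only (the Mahalanobis variant assumes the inverse covariance matrix has already been computed) and is not the implementation used in the study.

```java
// Sketch of the five distance measures compared in this study.
// Illustrative only; the Mahalanobis variant assumes the inverse covariance
// matrix cInv has been computed beforehand.
public class Distances {

    // Minkowski distance of order p (lp-norm).
    static double minkowski(double[] x, double[] y, double p) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += Math.pow(Math.abs(x[i] - y[i]), p);
        return Math.pow(s, 1.0 / p);
    }

    static double euclidean(double[] x, double[] y) { return minkowski(x, y, 2.0); }

    static double manhattan(double[] x, double[] y) { return minkowski(x, y, 1.0); }

    static double squaredEuclidean(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - y[i]; s += d * d; }
        return s;
    }

    // Limit case p -> infinity: largest absolute coordinate difference.
    static double chebyshev(double[] x, double[] y) {
        double m = 0.0;
        for (int i = 0; i < x.length; i++) m = Math.max(m, Math.abs(x[i] - y[i]));
        return m;
    }

    // d(x, y) = sqrt((x - y)^T * C^{-1} * (x - y))
    static double mahalanobis(double[] x, double[] y, double[][] cInv) {
        int n = x.length;
        double[] diff = new double[n];
        for (int i = 0; i < n; i++) diff[i] = x[i] - y[i];
        double s = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                s += diff[i] * cInv[i][j] * diff[j];
        return Math.sqrt(s);
    }
}
```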

2.2. Vector Quantization Networks and Self-Organizing Kohonen Maps

A Kohonen network [15,39] is a self-organized neural network that enables us to allocate groups (clusters) of input vectors that share some common features. The Kohonen network (or Kohonen layer) is a single-layer network, each neuron of which is connected to all components of the input vector. The input vector is a description of one of the objects to be clustered. The number of neurons coincides with the number of clusters K that the network should allocate. Linear weighted adders are used as neurons in the Kohonen network. Each j-th neuron is described by a vector of weights Wj = (w1j, w2j, …, wMj), where M is the input vector dimension and j = 1…K. The input vector has the form Xi = (x1i, x2i, …, xMi), i = 1…N, where N is the number of objects.
According to the methods of adjusting the input weights of the adders and the problems being solved, many varieties of Kohonen networks are distinguished [46]. The most famous of them are:
  • Vector quantization networks (VQ), closely related to k-means method;
  • Self-organizing Kohonen maps (SOM), which provide a “topological” mapping from the input space to the clusters. Neurons in SOMs are organized into a grid (usually two-dimensional);
  • Learning vector quantization networks (LVQ), which involve supervised learning and are used for classification problems.
The first two types of Kohonen networks involve unsupervised learning and are considered in this article.
Vector quantization consists of replacing a continuous distribution by a finite set of quantizers while minimizing a predefined distortion criterion, and it may be used to determine groups (clusters) of data sharing common properties [46]. Since vector quantization is a natural application for k-means, the centroids are also referred to as "codes", and the table mapping codes to centroids is often referred to as a "codebook".
Competition mechanisms are used to train the network according to the "winner takes all" principle, for instance, the simple competitive learning (SCL) algorithm [47]. When a vector X is fed to the network input, the winner is the neuron whose weight vector differs least from the input vector: d(X, Wcl) = min_{1 ≤ j ≤ K} d(X, Wj). In terms of vector quantization, this problem of placing K key vectors Wj in the feature space of the observed data X is defined as the minimization of the encoding distortion, i.e., as the minimization problem:
$$D = \sum_{j=1}^{K} \sum_{x \in V(j)} \left\| x - W_j \right\|^{2} \rightarrow \min,$$
where V(j) consists of the points Xi ∈ X closest to Wj.
The SCL algorithm is, in fact, the online version of Lloyd's algorithm. If all data points are known in advance, this algorithm works offline as a batch algorithm (batch vector quantization, BVQ). The k-means method is an intermediate version, where only one data point is randomly chosen, and only the winning centroid is updated as the mean value of its cluster [46]. Thus, we can say that Kohonen networks have an advantage over the k-means model in that they allow online clustering. The basic version of the SCL algorithm is presented in Algorithm 1.
Algorithm 1. Basic SCL algorithm
Required: set of initial data vectors X1, …, XN, where N is the number of points; the number of neurons K; the initial learning rate η0 (η = η0); the step of changing the learning rate Δη.
1. Set the weight of each neuron Wj (j = 1, …, K)
2. While η > 0:
3.   For i = 1, …, N do
4.     For j = 1, …, K do
5.       Find the closest neuron Wcl to Xi: cl = arg min_{1 ≤ j ≤ K} d(Xi, Wj)
6.     Update the closest neuron: Wcl = Wcl + η(Xi − Wcl)
7.   η = η − Δη
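A minimal Java sketch of Algorithm 1 with the Euclidean distance is given below; the random initialization and the hyperparameter values are illustrative assumptions, not those used in the experiments.

```java
import java.util.Random;

// Sketch of the basic SCL algorithm (Algorithm 1) with Euclidean distance.
// The random initialization and the hyperparameters are illustrative assumptions.
public class SimpleCompetitiveLearning {

    static double[][] train(double[][] data, int k, double eta0, double deltaEta, long seed) {
        int m = data[0].length;
        Random rnd = new Random(seed);
        double[][] w = new double[k][m];
        for (int j = 0; j < k; j++)            // random weights in [0, 1)
            for (int i = 0; i < m; i++)
                w[j][i] = rnd.nextDouble();

        for (double eta = eta0; eta > 0; eta -= deltaEta) {
            for (double[] x : data) {
                // Winner-takes-all: find the neuron closest to the input vector.
                int cl = 0;
                double best = Double.POSITIVE_INFINITY;
                for (int j = 0; j < k; j++) {
                    double d2 = 0.0;
                    for (int i = 0; i < m; i++) {
                        double diff = x[i] - w[j][i];
                        d2 += diff * diff;
                    }
                    if (d2 < best) { best = d2; cl = j; }
                }
                // Move only the winning neuron towards the input vector.
                for (int i = 0; i < m; i++)
                    w[cl][i] += eta * (x[i] - w[cl][i]);
            }
        }
        return w;
    }
}
```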
Self-organizing Kohonen maps [15,39] produce a mapping from a multidimensional input space onto a lattice of neurons. The mapping is topology-preserving in that neighboring neurons respond to "similar" input patterns. SOMs are typically organized as one- or two-dimensional lattices for the purpose of visualization and dimensionality reduction. During training, the winner neuron and its topological neighbors are adapted to make their weight vectors more similar to the input pattern that caused the activation. The SOM algorithm with a time limit is presented in Algorithm 2.
Thus, the SCL algorithm is a particular case of the SOM algorithm, when the neighborhood is reduced to zero. The magnitude of the changes decreases with time and is smaller for neurons far away from the winning neuron. The learning rates and neighborhood functions can be applied in various ways. However, these functions should be decreasing [48].
Algorithm 2. Basic SOM algorithm
Required: set of initial data X1, …, XN, where N is the number of points; the number of neurons K; the initial learning rate η0 (η = η0).
1. Set the weight of each neuron Wj (j = 1, …, K)
2. t = 0
3. While the maximum time is not exceeded (t ≤ Tmax):
4.   Randomly choose Xi from the initial data X1, …, XN.
5.   Find the closest neuron Wcl to Xi: cl = arg min_{1 ≤ j ≤ K} d(Xi, Wj)
6.   Update the closest neuron Wcl and its neighbors V(Wcl).
     Here, V(Wcl) is the set of indexes in the neighborhood of Wcl, including Wcl:
     For each p in V(Wcl):
       Wp = Wp + η(t) · h(t, d(Wp, Wcl)) · (Xi − Wp),
     where η(t) is the learning rate and h(t, d(Wp, Wcl)) is the neighborhood function.
7.   t = t + 1
8. End while
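As noted above, the learning rate η(t) and the neighborhood function h(t, d) should decrease over time. A minimal sketch of common choices (exponential decay for η(t) and a Gaussian kernel for h), assumed here for illustration and not fixed by the paper, is shown below.

```java
// Sketch of decreasing learning-rate and neighborhood functions for the SOM
// update in Algorithm 2. Exponential decay and a Gaussian kernel are common
// choices assumed here for illustration; the paper does not fix them.
public class SomSchedules {

    // Learning rate eta(t): decays from eta0 towards zero with time constant tau.
    static double learningRate(double eta0, int t, double tau) {
        return eta0 * Math.exp(-t / tau);
    }

    // Neighborhood h(t, d): Gaussian in the grid distance d between a neuron and
    // the winner, with a radius sigma(t) that also shrinks over time.
    static double neighborhood(int t, double d, double sigma0, double tau) {
        double sigma = sigma0 * Math.exp(-t / tau);
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }
}
```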
Several methods can be used to initialize the neurons. The simplest one is when the initial values are chosen randomly. Another method is to initialize the weights by the average of the minimum and the maximum values of the elements of the vectors that have to be classified.
Moreover, there are different types of stopping rules that can be applied at step 2 of Algorithm 1 and step 3 of Algorithm 2. A fixed error threshold, a time limit or a fixed number of algorithm repetitions can be used.
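A minimal sketch of the second initialization method (each neuron starts at the midpoint between the per-feature minimum and maximum of the data) is given below; this is illustrative only, and the authors' implementation may differ in detail.

```java
import java.util.Arrays;

// Sketch of weight initialization by the average of the per-feature minimum and
// maximum values of the data (the second method mentioned above). Illustrative only.
public class AverageInitialization {

    static double[][] initByAverage(double[][] data, int k) {
        int m = data[0].length;
        double[] min = new double[m];
        double[] max = new double[m];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] x : data)
            for (int i = 0; i < m; i++) {
                min[i] = Math.min(min[i], x[i]);
                max[i] = Math.max(max[i], x[i]);
            }
        double[][] w = new double[k][m];
        for (int j = 0; j < k; j++)
            for (int i = 0; i < m; i++)
                w[j][i] = (min[i] + max[i]) / 2.0;   // midpoint of the feature range
        return w;
    }
}
```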

2.3. Proposed Algorithms

For our purposes, we take the Kohonen network as a basis. First, we use the vector quantization type (Algorithm 3). The idea is to initialize an excessive number of neurons and then gradually decrease their number. The algorithm processes data points one by one. After a certain step, the neuron whose removal yields the smallest increase in the value of the objective function is removed. The learning rate decreases after a certain number of steps SN.
Algorithm 3. SCL-based algorithm with a greedy agglomerative heuristic (SCL-GREEDY)
Required: set of initial data X1, …, XN, where N is the number of points; the number of neurons K1; the initial learning rate η0 (η = η0); the step of changing the learning rate Δη.
1.  Increase the number of neurons k times: K = k · K1
2.  Set the weight of each neuron Wj (j = 1, …, K)
3.  Determine the number of steps for recalculating η: SN = trunc(N/K), jj = 1
4.  While η > 0:
5.    For i = 1, …, N do
6.      For j = 1, …, K do
7.        Find the closest neuron Wcl to Xi: cl = arg min_{1 ≤ j ≤ K} d(Xi, Wj)
8.      Update the closest neuron Wcl: Wcl = Wcl + η · (Xi − Wcl)
9.      For each neuron Wj, calculate the sum of distances to the initial data X1, …, Xi:
          Sj = Σ_{q=1}^{i} ‖Wj − Xq‖²
10.     IF i % (SN + 1) == 0, THEN recalculate η: jj = jj + 1, η = η0/jj
11.   IF K <> K1, THEN remove the neuron Wj with the maximum sum of distances Sj
12.   η = η − Δη
13. End while
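A minimal Java sketch of the removal step (steps 9 to 11 of Algorithm 3), assuming the squared Euclidean distance, is shown below; the surrounding learning loop is omitted, and the code is illustrative rather than the authors' implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the removal step of the greedy agglomerative heuristic (steps 9-11
// of Algorithm 3): while the network still has more neurons than the target K1,
// the neuron with the largest sum of squared distances S_j to the data processed
// so far is removed. Illustrative only; the learning loop around it is omitted.
public class GreedyRemoval {

    static List<double[]> removeWorstNeuron(List<double[]> neurons, double[][] dataSeen, int k1) {
        if (neurons.size() <= k1) return neurons;      // already at the target size
        int worst = 0;
        double worstSum = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < neurons.size(); j++) {
            double sj = 0.0;                           // S_j: sum of squared distances
            for (double[] x : dataSeen) {
                double d2 = 0.0;
                for (int i = 0; i < x.length; i++) {
                    double diff = neurons.get(j)[i] - x[i];
                    d2 += diff * diff;
                }
                sj += d2;
            }
            if (sj > worstSum) { worstSum = sj; worst = j; }
        }
        List<double[]> kept = new ArrayList<>(neurons);
        kept.remove(worst);                            // drop the neuron with maximal S_j
        return kept;
    }
}
```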
The SOM-based algorithm with a greedy agglomerative heuristic is presented in Algorithm 4.
The batch versions of the described algorithms are similar to their online versions, except that the neuron weights are recalculated only after passing through all sample points. We denote the batch version of the SCL algorithm as BVQ (as in Ref. [46]) and the batch version of the SOM algorithm as BSOM.
Algorithm 4. SOM-based algorithm with a greedy agglomerative heuristic (SOM-GREEDY)
Required: set of initial data X1, …, XN, where N is the number of points; the number of neurons K1; the initial learning rate η0 (η = η0).
1. Increase the number of neurons k times: K = k · K1
2. Set the weight of each neuron Wj (j = 1, …, K)
3. t = 0
4. While the maximum time is not exceeded (t ≤ Tmax):
5.   Randomly choose Xi from the initial data X1, …, XN.
6.   Find the closest neuron Wcl to Xi: cl = arg min_{1 ≤ j ≤ K} d(Xi, Wj)
7.   Update the closest neuron Wcl and its neighbors V(Wcl).
     Here, V(Wcl) is the set of indexes in the neighborhood of Wcl, including Wcl:
     For each p in V(Wcl):
       Wp = Wp + η(t) · h(t, d(Wp, Wcl)) · (Xi − Wp),
     where η(t) is the learning rate and h(t, d(Wp, Wcl)) is the neighborhood function.
8.   For each neuron Wj, calculate the sum of distances to the initial data X1, …, Xi: Sj = Σ_{q=1}^{i} ‖Wj − Xq‖²
9.   IF K <> K1, THEN remove the neuron Wj with the maximum sum of distances Sj
10.  η = η − Δη
11.  t = t + 1
12. End while

3. Computational Experiment and Analysis

For the experiment, we considered a sample consisting of four different homogeneous batches of electronic radio components [49]. The total number of devices in all batches is 446. We considered various combinations of batches: mixed lots of four, three and two batches. The four-batch mixed lot contains 62 parameters (features); the three-batch and two-batch mixed lots contain 41 parameters. The difficulty of the sample is that the number of parameters is large relative to the number of sample elements.
Each experiment is performed in online mode and in batch mode. We used different types of distance measures: Chebyshev distance (ChD), Euclidean distance (EuD), squared Euclidean distance (SEuD), Mahalanobis distance (MahD), Manhattan distance (ManD). The choice of the method for initializing the weight coefficients was also different: average, random and with preliminary clustering by the k-means algorithm. Each experiment was run 30 times.
Algorithms were implemented in Java. For the computational experiments, we used the following test system: AMD Ryzen 5-1600 6C/12T 3200 MHz CPU, 16 GB RAM. Each experiment took an average of 1 min of computer time.

3.1. Experiments in Online Mode

In this section, we compare the results of the experiments performed with the SCL and SCL-GREEDY methods. The experiment with the initial number of neurons K1 coinciding with the given number of clusters is marked as SCL; the experiments with the initial number of neurons exceeding the specified number of clusters two times (K = 2 × K1) and three times (K = 3 × K1) are marked as SCL-GREEDY(2) and SCL-GREEDY(3), respectively.
Computational experiments showed that the use of the greedy agglomerative heuristic in the SCL algorithm, in most cases, improves the accuracy of batch separation. Moreover, the clustering accuracy decreases with an increasing number of homogeneous batches in a mixed lot (Figure 1).
With regard to the influence of the distance measure on the clustering accuracy, it can be noted that the Chebyshev distance has an advantage over the others, except for the two-batch mixed lot. However, in the case of the Chebyshev distance, we were dealing with a large coefficient of variation and span factor for the three-batch and four-batch mixed lots.
Moreover, for various combinations of batches, the minimum (Min), maximum (Max), mean (Mean), standard deviation (σ), coefficient of variation (V) and the span factor (R) of the objective function are calculated (Table 1, Table 2, Table 3, Table 4 and Table 5, Figure 2).
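For reference, a small Java sketch of these per-run statistics is given below. It assumes that the coefficient of variation V is the standard deviation expressed as a percentage of the mean and that the span factor R is the difference between the maximum and minimum values; these definitions are consistent with the values reported in Appendix A but are our reading rather than definitions stated explicitly in the text.

```java
import java.util.Arrays;

// Sketch of the per-run summary statistics reported in the tables.
// Assumed definitions: V = 100 * sigma / mean (percent), R = max - min.
public class RunStatistics {

    static void summarize(double[] objectiveValues) {
        double min = Arrays.stream(objectiveValues).min().orElse(Double.NaN);
        double max = Arrays.stream(objectiveValues).max().orElse(Double.NaN);
        double mean = Arrays.stream(objectiveValues).average().orElse(Double.NaN);
        double variance = Arrays.stream(objectiveValues)
                                .map(x -> (x - mean) * (x - mean))
                                .sum() / objectiveValues.length;
        double sigma = Math.sqrt(variance);          // standard deviation
        double cv = 100.0 * sigma / mean;            // coefficient of variation, %
        double span = max - min;                     // span factor
        System.out.printf("Min=%.3f Max=%.3f Mean=%.3f sigma=%.3f V=%.3f R=%.3f%n",
                min, max, mean, sigma, cv, span);
    }
}
```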
The statistical significance of the difference in the objective function values given by the SCL algorithm and the best of its greedy versions was tested with the Wilcoxon rank sum test. The best (minimal) mean values of the objective function are given in bold if the precedence is statistically significant at p ≤ 0.05.
For the Chebyshev distance, the best objective function value was achieved with the SCL-GREEDY(3) algorithm for two and three batches in a mixed lot. For four batches, the difference between the algorithms was insignificant. The coefficient of variation and the span factor have minimal values with the SCL algorithm for the two-batch and four-batch mixed lots. For the three-batch mixed lot, the coefficient of variation and the span factor show the best result with SCL-GREEDY(3).
For the Euclidean distance, the best objective function value was achieved with the SCL-GREEDY(3) algorithm for two and four batches, and with SCL-GREEDY(2) for three batches. The coefficient of variation and the span factor have minimal values with the SCL-GREEDY(3) algorithm for almost all mixed lots. For the four-batch mixed lot, the coefficient of variation shows the best result with the SCL algorithm.
For the squared Euclidean distance, SCL-GREEDY(3) gives the best value of the objective function for all mixed lots. Besides, the SCL-GREEDY(3) algorithm gives the minimal values of the coefficient of variation and the span factor for the two-batch mixed lot and the minimal value of the span factor for the three-batch mixed lot. The SCL-GREEDY(2) algorithm gives the minimal values of the coefficient of variation and the span factor for the four-batch mixed lot. The SCL algorithm gives the minimal value of the coefficient of variation for the three-batch mixed lot.
For the Mahalanobis distance, the difference between the objective function values was insignificant for the two-batch mixed lot. For the three-batch and four-batch mixed lots, the minimal objective function value was achieved with the SCL-GREEDY(3) algorithm. The coefficient of variation and the span factor have minimal values with SCL-GREEDY(3) for the two-batch mixed lot. For the three-batch and four-batch mixed lots, the coefficient of variation and the span factor show the best result with the SCL algorithm.
For the Manhattan distance, the precedence of the SCL-GREEDY(3) algorithm in the objective function value was observed for all mixed lots. The coefficient of variation and the span factor have minimal values with the SCL algorithm for almost all mixed lots. For the two-batch mixed lot, the span factor shows the best result with the SCL-GREEDY(3) algorithm.
From Figure 2, it can be seen that the coefficient of variation is minimal with the Chebyshev distance and the Mahalanobis distance for the mixed two-batch lot. For the mixed three-batch lot, the coefficient of variation is minimal with the squared Euclidean distance and the Manhattan distance. The worst (maximum) value is shown with the Chebyshev distance for the four-batch lot; the other distances are similar to each other in this case.
The span factor shows the best result with the Chebyshev distance and the Mahalanobis distance for the mixed two-batch lot. For the mixed three-batch lot and the four-batch lot, the span factor is best with the squared Euclidean distance and the Manhattan distance.

3.2. Experiments in Batch Mode

In this section, we compare the results of the experiments performed with modifications of the BVQ and SOM algorithms. The experiments with the initial number of neurons K1 coinciding with the given number of clusters are marked as BVQ and SOM; the experiments with the initial number of neurons exceeding the specified number of clusters two times (K = 2 × K1) and three times (K = 3 × K1) are marked as BVQ-GREEDY(2), BVQ-GREEDY(3), SOM-GREEDY(2) and SOM-GREEDY(3). We used different methods for initializing the weight coefficients: by average, by random and with preliminary clustering by the k-means algorithm.
Experiments showed that, as in online mode, the clustering accuracy decreases with an increasing number of homogeneous batches in a mixed lot (Table 6, Table 7 and Table 8). Moreover, it can be seen that the use of the greedy agglomerative heuristic in batch mode improves the accuracy of batch separation for the BVQ algorithm but degrades it for the SOM algorithm.
From Table 6, it can be seen that, for the two-batch mixed lot, all presented algorithms exceed k-means in accuracy. However, the BVQ, SOM-GREEDY(2) and SOM-GREEDY(3) algorithms showed worse results for random initialization with the Mahalanobis distance.
From Table 7, it can be seen that, for the three-batch mixed lot, all presented algorithms were approximately equal to the k-means algorithm in accuracy, except with the Chebyshev distance. For the BVQ algorithm, its greedy version was more accurate, but, for the SOM algorithm, its greedy version was better only for the Chebyshev distance.
From Table 8, it can be seen that, for the four-batch mixed lot, all presented algorithms were approximately equal to the k-means algorithm in accuracy, except with the Chebyshev distance. For the BVQ algorithm, its greedy version was more accurate. For the SOM algorithm, on the contrary, the result was generally better than that of its greedy version.
The characteristics of the objective function value for the batch-type algorithms with various combinations of distance measure and initialization method are given in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11, Table A12, Table A13, Table A14 and Table A15 of Appendix A. The results were tested with the Wilcoxon rank sum test. Statistically significant superiority in the objective function value between the BVQ algorithm and its greedy version, and between the SOM algorithm and its greedy version, at p ≤ 0.05 is highlighted in bold.
It turned out that, in the vast majority of cases, the minimal objective function value was demonstrated by the SOM algorithm, regardless of the initialization method or distance measure. The minimal values of the coefficient of variation and the span factor were achieved with the Euclidean, squared Euclidean and Manhattan distances and initialization by average.
From Figure 3, Figure 4 and Figure 5, it can be seen that initialization by k-means increases the coefficient of variation value for any type of mixed lot composition.

4. Discussion

It can be summarized that the use of the greedy agglomerative heuristic procedure with the simple competitive learning algorithm in online mode improves the objective function value in the majority of the cases. In the other cases, the difference between our new algorithms and the known algorithms is insignificant (see two exceptions in our computational experiments: the four batches with Chebyshev distance and two batches with Mahalanobis distance).
It can be noted that, in the vast majority of cases, the use of a triple number of neurons in the greedy agglomerative heuristic procedure provided a smaller value of the objective function than double.
Regarding the stability of the results (coefficient of variation and span factor values), none of the algorithms have demonstrated advantages over the others.
The minimal values of the objective function were achieved with the squared Euclidean distance measure for two batches, with the Euclidean distance measure for three batches, and with the Chebyshev distance measure for four batches.
In almost all the cases, the clustering accuracy of the SCL-GREEDY version was the same as or better than that of the SCL algorithm. The only exception was the three-batch mixed lot with the Chebyshev distance.
Concerning batch mode algorithms, the situation was different. The greedy version of the BVQ algorithm demonstrated a better objective function value than the BVQ in 42% of the cases and a worse objective function value than the BVQ in 37% of the cases. In 23% of the cases, the difference was statistically insignificant. The SOM algorithm was better than its greedy version in the vast majority of cases.
The accuracy of the clustering in batch mode is similar to the situation with the objective function. Applying the greedy heuristic to the BVQ algorithm improves the clustering accuracy. However, the greedy version of the SOM algorithm exhibits lower clustering accuracy.

5. Conclusions

In our work, we proposed algorithms for product clustering based on a Kohonen network and self-organizing Kohonen maps using the greedy agglomerative heuristic procedure in online and batch modes. We performed experiments with different distance measures (Euclidean, squared Euclidean, Manhattan, Chebyshev, Mahalanobis), different ways of initializing the neuron weights (average, random, k-means) and different numbers of extra neurons in the greedy heuristic procedure.
The studies have shown that, in most cases, the distance measure used does not significantly affect the clustering accuracy. The way the neuron weights are initialized plays a role in the stability of the objective function: the coefficient of variation for any type of mixed lot composition was higher (worse) with k-means initialization.
In batch mode, in the vast majority of cases, the minimal objective function value was demonstrated by the SOM algorithm without the influence of the initialization method or distance measure.
The computational experiments showed that the use of the greedy agglomerative heuristic in online mode, in most cases, improves the accuracy of homogeneous batch separation. In batch mode, the greedy heuristic improves the accuracy for the vector quantization algorithm and, on the contrary, reduces the accuracy for self-organized maps.
The study of the batch mode and online mode of algorithms for clustering products using the greedy agglomerative heuristic procedure on a large number of homogeneous batches is an interesting area for further research.

Author Contributions

Conceptualization, G.S., L.K. and E.T.; methodology, G.S. and L.K.; software, G.S. and L.V.; validation, G.S., N.R. and E.T.; formal analysis, L.K.; investigation, L.V.; resources, L.V. and L.K.; data curation, L.K.; writing—original draft preparation, G.S. and E.T.; writing—review and editing, L.K., G.S., N.R. and E.T.; visualization, G.S.; supervision, L.K.; project administration, L.K.; funding acquisition, L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Higher Education of the Russian Federation, project no. FEFE-2020-0013.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Objective function value summarized after 30 attempts (Chebyshev distance, initialization by average).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min207.903194.919204.323204.113205.657206.582
Max207.903217.269204.323216.201205.657249.768
Mean207.903203.274204.323205.577205.657222.260
σ0.00010.0110.0003.1440.00020.395
V0.0004.9250.0001.5290.0009.176
R0.00022.3510.00012.0880.00043.187
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min342.987252.824294.450303.473258.930254.905
Max342.987353.693342.987345.475258.930254.905
Mean342.987334.532317.101331.474258.930254.905
σ0.00032.01825.06420.4950.0000.000
V0.0009.5717.9046.1830.0000.000
R0.000100.86948.53742.0010.0000.000
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min553.715493.863493.936493.368553.548559.033
Max701.325578.828563.388561.249705.364701.241
Mean613.034523.175548.875528.842576.858616.482
σ70.63230.29523.65932.58938.34761.034
V11.5225.7914.3116.1626.6479.900
R147.61084.96569.45267.881151.816142.208
Table A2. Objective function value summarized after 30 attempts (Chebyshev distance, initialization by k-means).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min207.903195.429204.323204.113205.657216.201
Max207.903217.037335.401337.849335.401337.849
Mean207.903204.757230.805320.018270.697307.738
σ0.0009.86954.13647.05744.57343.024
V0.0004.82023.45514.70516.46613.981
R0.00021.608131.079133.737129.744121.648
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min258.930252.467294.450303.473258.930254.905
Max258.930347.462366.993365.509366.993366.993
Mean258.930328.184349.165351.103302.155335.717
σ0.00037.90019.11121.05654.79850.449
V0.00011.5485.4735.99718.13615.027
R0.00094.99572.54362.036108.063112.088
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min554.632495.630493.396566.491555.149692.980
Max703.198664.941719.547719.547732.274878.714
Mean648.439557.872589.796625.305690.896721.950
σ66.90755.82182.80473.94153.47847.524
V10.31810.00614.03911.8257.7406.583
R148.566169.311226.151153.056177.125185.735
Table A3. Objective function value summarized after 30 attempts (Chebyshev distance, initialization by random).
Para-
meter
BVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min207.903195.096204.323204.113205.657206.582
Max207.903218.493214.846216.201251.462249.768
Mean207.903206.037205.904209.754225.213230.583
σ0.00010.0633.6606.24219.53918.841
V0.0004.8841.7782.9768.6768.171
R0.00023.39610.52312.08845.80543.187
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min258.930252.251258.930254.905357.439254.905
Max366.993347.509366.993364.023366.993364.023
Mean357.453315.496348.951336.351365.296333.542
σ28.00244.78435.29644.9643.36249.308
V7.83414.19510.11513.3680.92014.783
R108.06395.259108.063109.1189.554109.118
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min552.568497.795493.396493.368556.048552.696
Max698.601620.549561.092561.078705.538702.573
Mean601.527533.155531.922513.349583.717614.007
σ60.19638.03232.67129.47149.76562.183
V10.0077.1336.1425.7418.52510.127
R146.033122.75467.69667.710149.490149.877
Table A4. Objective function value summarized after 30 attempts (Euclidean distance, initialization by average).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160185.969186.346186.372186.346186.372
Max186.160186.207186.346186.372186.346186.372
Mean186.160186.087186.346186.372186.346186.372
σ0.0000.0610.0000.0000.0000.000
V0.0000.0330.0000.0000.0000.000
R0.0000.2370.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.647241.731241.795314.924310.298
Max314.924315.304241.731241.795314.924310.298
Mean314.924258.499241.731241.795314.924310.298
σ0.00030.5710.0000.0000.0000.000
V0.00011.8260.0000.0000.0000.000
R0.00073.6570.0000.0000.0000.000
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min546.596492.597492.933492.903552.883548.802
Max546.596655.370549.751492.903552.883548.804
Mean546.596521.356541.720492.903552.883548.803
σ0.00045.03819.8090.0000.0000.001
V0.0008.6393.6570.0000.0000.000
R0.000162.77256.8180.0000.0000.001
Table A5. Objective function value summarized after 30 attempts (Euclidean distance, initialization by k-means).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160186.007186.346186.372186.346186.372
Max186.160186.235335.401337.849335.401337.849
Mean186.160186.105216.157297.455236.031317.652
σ0.0000.06161.71569.33772.73253.300
V0.0000.03328.55123.31030.81416.779
R0.0000.227149.055151.477149.055151.477
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.768241.731241.795314.924310.298
Max314.924312.783325.737325.737325.737323.263
Mean314.924258.406264.133300.787316.366318.737
σ0.00030.36438.45336.8483.8055.357
V0.00011.75114.55812.2501.2031.681
R0.00071.01584.00683.94310.81312.965
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min546.596492.696492.933492.903552.883561.003
Max559.778554.321711.945704.490711.945711.945
Mean548.117523.082565.934613.940599.916647.609
σ4.06928.34344.20278.43869.01973.181
V0.7425.4187.81112.77611.50511.300
R13.18261.624219.011211.587159.062150.942
Table A6. Objective function value summarized after 30 attempts (Euclidean distance, initialization by random).
Para-
meter
BVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160185.954186.346186.372186.346186.372
Max186.160186.211186.346186.372186.346186.372
Mean186.160186.100186.346186.372186.346186.372
σ0.0000.0620.0000.0000.0000.000
V0.0000.0330.0000.0000.0000.000
R0.0000.2580.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.634314.924310.298314.924310.298
Max314.924312.797325.737320.973325.737320.973
Mean314.924260.991315.645313.145315.645311.722
σ0.00031.6612.7924.8862.7923.756
V0.00012.1310.8851.5600.8851.205
R0.00071.16310.81310.67510.81310.675
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min703.485492.661711.930699.061711.945707.793
Max703.485550.607711.945707.793711.945707.793
Mean703.485528.546711.944707.096711.945707.793
σ0.00025.5460.0042.2420.0000.000
V0.0004.8330.0010.3170.0000.000
R0.00057.9460.0158.7320.0000.000
Table A7. Objective function value summarized after 30 attempts (squared Euclidean distance, initialization by average).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160185.962186.346186.372186.346186.372
Max186.160186.233186.346186.372186.346186.372
Mean186.160186.092186.346186.372186.346186.372
σ0.0000.0700.0000.0000.0000.000
V0.0000.0380.0000.0000.0000.000
R0.0000.2710.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.673241.731241.795314.924310.298
Max314.924312.911241.731241.795314.924310.298
Mean314.924265.496241.731241.795314.924310.298
σ0.00033.8760.0000.0000.0000.000
V0.00012.7590.0000.0000.0000.000
R0.00071.2380.0000.0000.0000.000
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min546.596492.541549.130492.903552.883548.802
Max546.596544.710549.751492.903552.883559.135
Mean546.596509.981549.586492.903552.883550.164
σ0.00024.2490.2840.0000.0002.840
V0.0004.7550.0520.0000.0000.516
R0.00052.1690.6210.0000.00010.333
Table A8. Objective function value summarized after 30 attempts (squared Euclidean distance, initialization by k-means).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160186.010186.346186.372186.346186.372
Max186.160186.200335.401337.849335.401337.849
Mean186.160186.113236.031297.455226.094317.652
σ0.0000.05072.73269.33768.22853.300
V0.0000.02730.81423.31030.17716.779
R0.0000.189149.055151.477149.055151.477
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.634241.731241.795314.924310.298
Max314.924315.223325.737323.263325.737325.737
Mean314.924275.027269.733295.038317.087317.326
σ0.00035.99540.99138.9814.4776.079
V0.00013.08815.19713.2121.4121.916
R0.00073.58984.00681.46810.81315.439
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min546.596492.561492.933560.904552.883560.904
Max559.778555.250711.945711.945711.945711.945
Mean549.233530.534585.963658.380570.195636.706
σ5.45824.01567.56471.29439.40373.474
V0.9944.52711.53010.8296.91111.540
R13.18262.688219.011151.041159.062151.041
Table A9. Objective function value summarized after 30 attempts (squared Euclidean distance, initialization by random).
Para-
meter
BVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160185.969186.346186.372186.346186.372
Max186.160186.183186.346186.372186.346186.372
Mean186.160186.088186.346186.372186.346186.372
σ0.0000.0480.0000.0000.0000.000
V0.0000.0260.0000.0000.0000.000
R0.0000.2140.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.651314.924310.298314.924310.298
Max325.737315.550325.737320.973325.737320.973
Mean315.645272.613316.366315.280315.645311.010
σ2.79235.7463.8055.5132.7922.756
V0.88513.1121.2031.7480.8850.886
R10.81373.89910.81310.67510.81310.675
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min703.485492.663562.352561.616711.909707.793
Max703.485550.840711.945707.793711.945707.793
Mean703.485523.502701.972697.952711.942707.793
σ0.00026.00838.62537.7180.0090.000
V0.0004.9685.5025.4040.0010.000
R0.00058.177149.592146.1770.0350.000
Table A10. Objective function value summarized after 30 attempts (Mahalanobis distance, initialization by average).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min201.387194.572205.657206.582205.657206.582
Max201.387197.145205.657206.582205.657206.582
Mean201.387195.929205.657206.582205.657206.582
σ0.0000.5680.0000.0000.0000.000
V0.0000.2900.0000.0000.0000.000
R0.0002.5730.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min326.463312.110287.148288.169326.463288.169
Max326.463361.353348.315350.407326.463350.407
Mean326.463342.690295.317332.825326.463325.676
σ0.00011.44818.17327.8950.00028.141
V0.0003.3416.1548.3810.0008.641
R0.00049.24261.16862.2380.00062.238
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min758.068572.730728.875696.210757.097756.752
Max758.068663.835730.771722.998757.734757.838
Mean758.068619.269730.508709.527757.375757.390
σ0.00029.5090.66311.9500.2690.344
V0.0004.7650.0911.6840.0360.045
R0.00091.1041.89626.7880.6381.086
Table A11. Objective function value summarized after 30 attempts (Mahalanobis distance, initialization by k-means).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min201.387194.573205.657206.582205.657206.582
Max201.387197.148335.401337.849335.401337.849
Mean201.387195.931222.956311.596231.606294.093
σ0.0000.68045.65354.35053.71964.052
V0.0000.34720.47617.44223.19421.779
R0.0002.575129.744131.268129.744131.268
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min348.315291.064287.148317.121326.463335.463
Max348.315361.323366.993366.993366.993365.509
Mean348.315340.856347.631353.627337.082358.806
σ0.00014.37026.14416.29818.24112.100
V0.0004.2167.5214.6095.4113.372
R0.00070.25879.84649.87240.53130.046
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min758.068554.372725.168701.832757.148763.684
Max764.357677.513852.818892.642845.107836.489
Mean758.487613.179765.066741.820773.655800.642
σ1.62434.37845.73343.56429.14734.313
V0.2145.6075.9785.8733.7684.286
R6.289123.141127.650190.81087.95972.805
Table A12. Objective function value summarized after 30 attempts (Mahalanobis distance, initialization by random).
Para-
meter
BVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min331.089194.303205.657206.582335.258337.849
Max331.107196.693335.401337.849335.401337.849
Mean331.106195.717326.716320.358335.387337.849
σ0.0050.50333.49046.1160.0380.000
V0.0010.25710.25114.3950.0110.000
R0.0182.390129.744131.2680.1430.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min443.219305.398437.281437.742440.640438.611
Max444.607361.336444.378439.538444.607439.598
Mean444.133338.766442.954438.906444.084439.413
σ0.47915.1772.1630.5401.0110.275
V0.1084.4800.4880.1230.2280.063
R1.38855.9377.0971.7953.9670.987
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min764.357564.890764.433771.053839.280838.055
Max856.541673.446836.222866.897851.622874.529
Mean818.674614.438787.136790.511845.713854.250
σ45.91427.72131.28437.2963.15413.329
V5.6084.5123.9744.7180.3731.560
R92.185108.55671.78995.84412.34236.475
Table A13. Objective function value summarized after 30 attempts (Manhattan distance, initialization by average).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160185.971186.346186.372186.346186.372
Max186.160186.184186.346186.372186.346186.372
Mean186.160186.068186.346186.372186.346186.372
σ0.0000.0530.0000.0000.0000.000
V0.0000.0280.0000.0000.0000.000
R0.0000.2140.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.918241.887241.991314.924310.298
Max314.924315.391241.887241.991314.924310.298
Mean314.924247.062241.887241.991314.924310.298
σ0.00018.5650.0000.0000.0000.000
V0.0007.5140.0000.0000.0000.000
R0.00073.4730.0000.0000.0000.000
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min690.643493.142551.927493.236703.810692.768
Max690.643546.207555.852493.236703.810700.558
Mean690.643497.063554.659493.236703.810697.752
σ0.00013.5971.7650.0000.0002.854
V0.0002.7350.3180.0000.0000.409
R0.00053.0653.9250.0000.0007.790
Table A14. Objective function value summarized after 30 attempts (Manhattan distance, initialization by k-means).
ParameterBVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160185.934186.346186.372186.346186.372
Max186.160186.214335.401337.849335.401337.849
Mean186.160186.071226.094297.455236.031266.997
σ0.0000.06468.22869.33772.73278.067
V0.0000.03530.17723.31030.81429.239
R0.0000.281149.055151.477149.055151.477
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.851241.887241.991314.924310.298
Max314.924315.759325.737325.737325.737323.263
Mean314.924254.400269.547305.952317.087317.568
σ0.00027.82140.50433.1324.4775.352
V0.00010.93615.02710.8291.4121.685
R0.00073.90883.85083.74610.81312.965
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min690.643493.044551.927493.236695.985696.331
Max700.891555.231706.183705.449711.926892.642
Mean692.693508.407645.549633.408705.483714.682
σ4.24325.69972.85081.0675.03949.382
V0.6135.05511.28512.7990.7146.910
R10.24762.187154.257212.21315.941196.311
Table A15. Objective function value summarized after 30 attempts (Manhattan distance, initialization by random).
Para-
meter
BVQSOMBVQ-
GREEDY(2)
BVQ-
GREEDY(3)
SOM-
GREEDY(2)
SOM-
GREEDY(3)
Two-batch mixed lot
K1 = 2K1 = 4K1 = 6K1 = 4K1 = 6
Min186.160186.004186.346186.372186.346186.372
Max186.160186.259186.346186.372186.346186.372
Mean186.160186.098186.346186.372186.346186.372
σ0.0000.0610.0000.0000.0000.000
V0.0000.0330.0000.0000.0000.000
R0.0000.2550.0000.0000.0000.000
Three-batch mixed lot
K1 = 3K1 = 6K1 = 9K1 = 6K1 = 9
Min314.924241.891241.887310.298314.924310.298
Max325.737315.597325.737320.973325.737310.298
Mean318.529251.905312.939315.280315.645310.298
σ5.27625.30420.2545.5132.7920.000
V1.65610.0456.4721.7480.8850.000
R10.81373.70683.85010.67510.8130.000
Four-batch mixed lot
K1 = 4K1 = 8K1 = 12K1 = 8K1 = 12
Min703.264493.211711.715704.380711.772867.281
Max885.950549.237891.441882.936891.058884.912
Mean715.546510.358779.266730.258850.182882.353
σ47.14124.75085.74760.06871.9824.286
V6.5884.85011.0048.2268.4670.486
R182.68656.026179.727178.556179.28617.632

References

  1. Shirkhorshidi, A.S.; Aghabozorgi, S.; Wah, T. A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 2015, 10, e0144059. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Youguo, L.; Haiyan, W. A clustering method based on k-means algorithm. Phys. Procedia 2012, 25, 1104–1109. [Google Scholar]
  3. Steinhaus, H. Sur la divisiondes corps materiels en parties. Bull. Acad. Polon. Sci. 1956, 4, 801–804. [Google Scholar]
  4. Weiszfeld, E.; Plastria, F. On the point for which the sum of the distances to n given points is minimum. Ann. Oper. Res. 2009, 167, 7–41. [Google Scholar] [CrossRef]
  5. Nicholson, T. A sequential method for discrete optimization problems and its application to the assignment, traveling salesman and tree scheduling problems. J. Inst. Math. Appl. 1965, 13, 362–375. [Google Scholar]
  6. Lloyd, S.P. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137. [Google Scholar] [CrossRef] [Green Version]
  7. Arthur, D.; Vassilvitskii, S. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
  8. Bradley, P.S.; Fayyad, U.M. Refining initial points for k-means clustering. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, WI, USA, 24–27 July 1998; Volume 98, pp. 91–99. [Google Scholar]
  9. Golasowski, M.; Martinovič, J.; Slaninová, K. Comparison of k-means clustering initialization approaches with brute-force initialization. In Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing; Springer: Singapore, 2017; Volume 567, pp. 103–114. [Google Scholar]
  10. Kalczynski, P.; Brimberg, J.; Drezner, Z. Less is more: Simple algorithms for the minimum sum of squares clustering problem. IMA J. Manag. Math. 2021, dpab031. [Google Scholar] [CrossRef]
  11. Mustafi, D.; Sahoo, G. A hybrid approach using genetic algorithm and the differential evolution heuristic for enhanced initialization of the k-means algorithm with applications in text clustering. Soft Comput. 2019, 23, 6361–6378. [Google Scholar] [CrossRef]
  12. Jain, A.K. Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 2010, 31, 651–666. [Google Scholar] [CrossRef]
  13. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295. [Google Scholar] [CrossRef]
  14. Celebi, M.E.; Kingravi, H.A.; Vela, P.A. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 2013, 40, 200–210. [Google Scholar] [CrossRef] [Green Version]
  15. Kohonen, T. Self-Organizing Maps; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
  16. Kohonen, T.; Somervuo, P. Self-organizing maps of symbol strings with application to speech recognition. In Proceedings of the Workshop on Self-Organizing Maps (WSOM’97), Espoo, Finland, 4–6 June 1997; pp. 2–7. [Google Scholar]
  17. Świetlicka, I.; Kuniszyk-Jóźkowiak, W.; Świetlicki, M. Artificial neural networks combined with the principal component analysis for non-fluent speech recognition. Sensors 2022, 22, 321. [Google Scholar] [CrossRef] [PubMed]
  18. Ettaouil, M.; Lazaar, M.; Ghanou, Y. Vector quantization by improved Kohonen algorithm. J. Comput. 2012, 4, 2151–9617. [Google Scholar]
  19. Younis, K.S.; Rogers, S.K.; DeSimio, M.P. Vector quantization based on dynamic adjustment of Mahalanobis distance. In Proceedings of the IEEE 1996 National Aerospace and Electronics Conference NAECON, Dayton, OH, USA, 20–23 May 1996; Volume 2, pp. 858–862. [Google Scholar] [CrossRef]
  20. Paul, S.; Gupta, M. Image segmentation by self-organizing map with Mahalanobis distance. Int. J. Emerg. Technol. Adv. Eng. 2013, 3, 2250–2459. [Google Scholar]
  21. Sun, Y.; Liu, H.; Sun, Q. Online learning on incremental distance metric for person re-identification. In Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, Bali, Indonesia, 5 December 2014. [Google Scholar]
22. Plonski, P.; Zaremba, K. Improving performance of self-organising maps with distance metric learning method. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 29 April–3 May 2012; Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7267. [Google Scholar] [CrossRef] [Green Version]
  23. Saleh, A.; Naoyuki, T.; Rin-Ichiro, T. Face recognition under varying illumination using Mahalanobis self-organizing map. Artif. Life Robot. 2008, 13, 298–301. [Google Scholar] [CrossRef]
  24. Natita, W.; Wiboonsak, W.; Dusadee, S. Appropriate learning rate and neighborhood function of self-organizing map (SOM) for specific humidity pattern classification over Southern Thailand. Int. J. Modeling Optim. 2016, 6, 61–65. [Google Scholar] [CrossRef] [Green Version]
  25. Mahindru, A.; Sangal, A.L. SOMDROID: Android malware detection by artificial neural network trained using unsupervised learning. Evol. Intel. 2022, 15, 407–437. [Google Scholar] [CrossRef]
  26. Grinyak, V.M.; Yudin, P.V. Kohonen self-organizing map in seasonal sales planning. In SMART Automatics and Energy. Smart Innovation, Systems and Technologies; Solovev, D.B., Kyriakopoulos, G.L., Venelin, T., Eds.; Springer: Singapore, 2022; Volume 272. [Google Scholar] [CrossRef]
  27. Wang, Y.; Wang, H.; Li, S.; Wang, L. Survival risk prediction of esophageal cancer based on the Kohonen network clustering algorithm and kernel extreme learning machine. Mathematics 2022, 10, 1367. [Google Scholar] [CrossRef]
  28. Kiseleva, E.I.; Astachova, I.F. Intelligent support for medical decision making. In Advances in Automation III. RusAutoCon 2021. Lecture Notes in Electrical Engineering; Radionov, A.A., Gasiyarov, V.R., Eds.; Springer: Cham, Switzerland, 2022; Volume 857. [Google Scholar] [CrossRef]
  29. Mawane, J.; Naji, A.; Ramdani, M. A cluster validity for optimal configuration of Kohonen maps in e-learning recommendation. Indones. J. Electr. Eng. Comput. Sci. 2022, 26, 482–492. [Google Scholar] [CrossRef]
  30. Huang, X. Application of computer data mining technology based on AKN algorithm in denial of service attack defense detection. Wirel. Commun. Mob. Comput. 2022, 2022, 4729526. [Google Scholar] [CrossRef]
  31. Amiri, V.; Nakagawa, K. Using a linear discriminant analysis (LDA)-based nomenclature system and self-organizing maps (SOM) for spatiotemporal assessment of groundwater quality in a coastal aquifer. J. Hydrol. 2021, 603, 127082. [Google Scholar] [CrossRef]
  32. Kovács, T.; Ko, A.; Asemi, A. Exploration of the investment patterns of potential retail banking customers using two-stage cluster analysis. J. Big Data 2021, 8, 141. [Google Scholar] [CrossRef]
  33. Kuehn, A.A.; Hamburger, M.J. A heuristic program for locating warehouses. Manag. Sci. 1963, 9, 643–666. [Google Scholar] [CrossRef]
  34. Alp, O.; Erkut, E.; Drezner, Z. An efficient genetic algorithm for the p-median problem. Ann. Oper. Res. 2003, 122, 21–42. [Google Scholar] [CrossRef]
  35. Agarwal, C.C.; Orlin, J.B.; Tai, R.P. Optimized crossover for the independent set problem. Oper. Res. 1997, 45, 226–234. [Google Scholar] [CrossRef] [Green Version]
36. Kazakovtsev, L.A.; Antamoshkin, A.N. Genetic algorithm with fast greedy heuristic for clustering and location problems. Informatica 2014, 38, 229–240. [Google Scholar]
  37. Andras, P.; Idowu, O. Kohonen networks with graph-based augmented metrics. In Proceedings of the Workshop on Self-Organizing Maps (WSOM 2005), Paris, France, 5–8 September 2005; pp. 179–186. [Google Scholar]
  38. Horio, K.; Koga, T.; Yamakawa, T. Self-organizing map with distance measure defined by data distribution. In Proceedings of the 2008 World Automation Congress, Waikoloa, HI, USA, 28 September–2 October 2008; pp. 1–6. [Google Scholar]
  39. Kohonen, T.; Kaski, S.; Lappalainen, H. Self-organized formation of various invariant-feature filters in the adaptive-subspace SOM. Neural Comput. 1997, 9, 1321–1344. [Google Scholar] [CrossRef]
  40. Furukawa, T. SOM of SOMs. Neural Netw. 2009, 22, 463–478. [Google Scholar] [CrossRef] [Green Version]
  41. Arnonkijpanich, B.; Hasenfuss, A.; Hammer, B. Local matrix adaptation in topographic neural maps. Neurocomputing 2011, 74, 522–539. [Google Scholar] [CrossRef] [Green Version]
  42. Yoneda, K.; Furukawa, T. Distance metric learning for the self-organizing map using a co-training approach. Int. J. Innov. Comput. Inf. Control 2018, 14, 2343–2351. [Google Scholar] [CrossRef]
  43. Alfeilat, H.; Hassanat, A.; Lasassmeh, O.; Tarawneh, A.; Alhasanat, M.; Salman, H.; Prasath, V. Effects of distance measure choice on K-Nearest Neighbor classifier performance: A review. Big Data 2019, 7, 221–248. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  44. Weller-Fahy, D.J.; Borghetti, B.J.; Sodemann, A.A. A Survey of Distance and similarity measures used within network intrusion anomaly detection. IEEE Commun. Surv. Tutor. 2015, 17, 70–91. [Google Scholar] [CrossRef]
  45. McLachlan, G. Mahalanobis distance. Resonance 1999, 4, 20–26. [Google Scholar] [CrossRef]
  46. De Bodt, E.; Cottrell, M.; Letremy, P.; Verleysen, M. On the use of self-organizing maps to accelerate vector quantization. Neurocomputing 2004, 56, 187–203. [Google Scholar] [CrossRef] [Green Version]
  47. Haykin, S. Neural Networks and Learning Machines; Pearson Education: New York, NY, USA, 2009. [Google Scholar]
  48. Fausett, L. Fundamental of Neural Networks: Architectures, Algorithms, and Applications; Prentice Hall International: Hoboken, NJ, USA, 1994; pp. 169–175. [Google Scholar]
  49. Shkaberina, G.S.; Orlov, V.I.; Tovbis, E.M.; Kazakovtsev, L.A. On the optimization models for automatic grouping of industrial products by homogeneous production batches. In Mathematical Optimization Theory and Operations Research 2020, Communications in Computer and Information Science; Kochetov, Y., Bykadorov, I., Gruzdeva, T., Eds.; Springer: Cham, Switzerland, 2020; Volume 1275, pp. 421–436. [Google Scholar]
Figure 1. Accuracy of device clustering for SCL, SCL-GREEDY(2) and SCL-GREEDY(3) algorithms.
Figure 2. (a) Coefficient of variation of the objective function value for two-batch mixed lot; (b) span factor of the objective function value for two-batch mixed lot; (c) coefficient of variation of the objective function value for three-batch mixed lot; (d) span factor of the objective function value for three-batch mixed lot; (e) coefficient of variation of the objective function value for four-batch mixed lot; (f) span factor of the objective function value for four-batch mixed lot.
Figure 3. Coefficient of variation of the objective function value for the two-batch mixed lot.
Figure 4. Coefficient of variation of the objective function value for the three-batch mixed lot.
Figure 5. Coefficient of variation of the objective function value for the four-batch mixed lot.
Table 1. Objective function value summarized after 30 attempts (Chebyshev distance).
Two-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 2)   SCL-GREEDY(2) (K1 = 4)   SCL-GREEDY(3) (K1 = 6)
Min         415.627        371.1297                 301.2941
Max         415.627        371.1322                 337.3007
Mean        415.627        371.1301                 329.2811 1
σ           6.38E-12       0.000772                 14.67288
V           1.53E-12       0.000208                 4.456033
R           2.80E-11       0.002558                 36.00667

Three-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 3)   SCL-GREEDY(2) (K1 = 6)   SCL-GREEDY(3) (K1 = 9)
Min         406.912        397.815                  397.787
Max         438.101        417.747                  407.933
Mean        431.002        415.140                  406.966
σ           8.395          4.783                    2.167
V           1.948          1.152                    0.533
R           31.189         19.933                   10.146

Four-batch mixed lot (p = 0.61708)
Parameter   SCL (K1 = 4)   SCL-GREEDY(2) (K1 = 8)   SCL-GREEDY(3) (K1 = 12)
Min         606.056        596.008                  594.863
Max         614.033        892.644                  903.672
Mean        609.644        689.886                  704.368
σ           2.167          125.611                  136.492
V           0.355          18.208                   19.378
R           7.977          296.636                  308.810
1 The best mean values of the objective function are given in bold if the difference between the SCL algorithm and its greedy version is statistically significant at p ≤ 0.05 (statistical significance was tested using the Wilcoxon rank sum test).
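For clarity, the summary statistics reported in Tables 1–5 can be reproduced as follows. The sketch below is illustrative rather than the authors' original code: it assumes σ is the sample standard deviation over the 30 attempts, V is the coefficient of variation in percent (100·σ/mean) and R is the span (max − min), which is consistent with the reported values, and it tests significance with the two-sided Wilcoxon rank sum test named in the footnote. The placeholder arrays merely stand in for the 30 objective function values of each algorithm.

```python
# Sketch (not the authors' code): summarizing 30 runs of the objective function
# and testing significance with the Wilcoxon rank sum test (scipy.stats.ranksums).
import numpy as np
from scipy.stats import ranksums

def summarize(values):
    """Return the Min/Max/Mean/sigma/V/R statistics reported in Tables 1-5."""
    v = np.asarray(values, dtype=float)
    mean, sigma = v.mean(), v.std(ddof=1)      # sample standard deviation
    return {
        "Min": v.min(),
        "Max": v.max(),
        "Mean": mean,
        "sigma": sigma,
        "V": 100.0 * sigma / mean,             # coefficient of variation, % (assumed definition)
        "R": v.max() - v.min(),                # span of the 30 attempts (assumed definition)
    }

# Placeholder data: f_scl and f_greedy would hold the 30 objective function
# values obtained by the SCL algorithm and its greedy version, respectively.
f_scl = np.random.default_rng(0).normal(415.6, 1e-3, 30)
f_greedy = np.random.default_rng(1).normal(371.1, 1e-3, 30)

print(summarize(f_scl))
print(summarize(f_greedy))

# Two-sided Wilcoxon rank sum test; the difference is considered significant if p <= 0.05.
stat, p = ranksums(f_scl, f_greedy)
print(f"p-value: {p:.5f}")
```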
Table 2. Objective function value summarized after 30 attempts (Euclidean distance).
Two-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 2)   SCL-GREEDY(2) (K1 = 4)   SCL-GREEDY(3) (K1 = 6)
Min         198.486        193.825                  191.610
Max         441.215        395.174                  393.348
Mean        296.329        251.285                  206.390 1
σ           104.214        84.992                   50.003
V           35.168         33.823                   24.228
R           198.486        193.825                  191.610

Three-batch mixed lot (p = 0.00214)
Parameter   SCL (K1 = 3)   SCL-GREEDY(2) (K1 = 6)   SCL-GREEDY(3) (K1 = 9)
Min         365.927        274.081                  365.719
Max         372.730        370.565                  369.705
Mean        371.892        366.467                  367.961
σ           2.059          17.518                   1.459
V           0.554          4.780                    0.396
R           6.803          96.484                   3.986

Four-batch mixed lot (p = 0.0012)
Parameter   SCL (K1 = 4)   SCL-GREEDY(2) (K1 = 8)   SCL-GREEDY(3) (K1 = 12)
Min         952.563        939.212                  914.629
Max         1142.812       1133.992                 1108.637
Mean        1113.352       1084.122                 1071.603
σ           55.095         74.012                   63.228
V           4.949          6.827                    5.900
R           952.563        939.212                  914.629
1 The best mean values of the objective function are given in bold if the difference between the SCL algorithm and its greedy version is statistically significant at p ≤ 0.05 (statistical significance was tested using the Wilcoxon rank sum test).
Table 3. Objective function value summarized after 30 attempts (squared Euclidean distance).
Two-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 2)   SCL-GREEDY(2) (K1 = 4)   SCL-GREEDY(3) (K1 = 6)
Min         198.486        193.825                  191.611
Max         450.382        413.390                  386.087
Mean        312.271        228.919                  205.969 1
σ           100.749        74.987                   47.677
V           32.263         32.757                   23.148
R           251.896        219.564                  194.475

Three-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 3)   SCL-GREEDY(2) (K1 = 6)   SCL-GREEDY(3) (K1 = 9)
Min         367.557        366.012                  366.085
Max         372.730        370.568                  369.725
Mean        372.527        369.305                  368.306
σ           0.942          1.715                    1.324
V           0.253          0.464                    0.360
R           5.173          4.556                    3.640

Four-batch mixed lot (p = 0.02088)
Parameter   SCL (K1 = 4)   SCL-GREEDY(2) (K1 = 8)   SCL-GREEDY(3) (K1 = 12)
Min         952.513        939.456                  914.668
Max         1142.098       1129.014                 1112.335
Mean        1097.790       1099.933                 1067.170
σ           73.902         54.806                   69.570
V           6.732          4.983                    6.519
R           189.584        189.558                  197.668
1 The best mean values of the objective function are given in bold if the difference between the SCL algorithm and its greedy version is statistically significant at p ≤ 0.05 (statistical significance was tested using the Wilcoxon rank sum test).
Table 4. Objective function value summarized after 30 attempts (Mahalanobis distance).
Two-batch mixed lot (p = 0.06288)
Parameter   SCL (K1 = 2)   SCL-GREEDY(2) (K1 = 4)   SCL-GREEDY(3) (K1 = 6)
Min         352.603        359.915                  356.673
Max         458.955        439.944                  421.191
Mean        399.825        396.924                  388.449
σ           24.151         21.471                   15.522
V           6.040          5.409                    3.996
R           106.353        80.029                   64.519

Three-batch mixed lot (p = 0.00338)
Parameter   SCL (K1 = 3)   SCL-GREEDY(2) (K1 = 6)   SCL-GREEDY(3) (K1 = 9)
Min         462.220        455.998                  451.949
Max         493.155        493.356                  490.924
Mean        484.589        480.276                  476.854 1
σ           7.020          8.892                    8.978
V           1.449          1.851                    1.883
R           30.935         37.358                   38.976

Four-batch mixed lot (p = 0.00034)
Parameter   SCL (K1 = 4)   SCL-GREEDY(2) (K1 = 8)   SCL-GREEDY(3) (K1 = 12)
Min         962.070        980.266                  959.326
Max         1084.960       1129.044                 1109.918
Mean        1050.083       1044.941                 996.063
σ           34.760         63.595                   51.328
V           3.310          6.086                    5.153
R           122.890        148.778                  150.592
1 The best mean values of the objective function are given in bold if the difference between the SCL algorithm and its greedy version is statistically significant at p ≤ 0.05 (statistical significance was tested using the Wilcoxon rank sum test).
Table 5. Objective function value summarized after 30 attempts (Manhattan distance).
Two-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 2)   SCL-GREEDY(2) (K1 = 4)   SCL-GREEDY(3) (K1 = 6)
Min         198.486        193.825                  191.609
Max         437.413        403.538                  396.210
Mean        360.874        248.205                  224.361 1
σ           87.569         88.921                   72.589
V           24.266         35.826                   32.354
R           238.927        209.712                  204.602

Three-batch mixed lot (p < 0.00001)
Parameter   SCL (K1 = 3)   SCL-GREEDY(2) (K1 = 6)   SCL-GREEDY(3) (K1 = 9)
Min         372.016        366.748                  365.858
Max         372.730        370.567                  369.709
Mean        372.636        369.982                  368.337
σ           0.179          1.256                    1.271
V           0.048          0.340                    0.345
R           0.714          3.819                    3.851

Four-batch mixed lot (p = 0.00096)
Parameter   SCL (K1 = 4)   SCL-GREEDY(2) (K1 = 8)   SCL-GREEDY(3) (K1 = 12)
Min         952.637        939.453                  912.548
Max         1140.960       1131.118                 1109.011
Mean        1114.074       1077.780                 1062.806
σ           55.075         77.853                   68.147
V           4.944          7.223                    6.412
R           188.324        191.665                  196.463
1 The best mean values of the objective function are given in bold if the difference between the SCL algorithm and its greedy version is statistically significant at p ≤ 0.05 (statistical significance was tested using the Wilcoxon rank sum test).
Table 6. Accuracy of device clustering for BVQ, SOM, BVQ-GREEDY(2), BVQ-GREEDY(3), SOM-GREEDY(2) and SOM-GREEDY(3) algorithms (two-batch mixed lot).
Algorithm        Initialization Method   Chebyshev   Euclidean   Sq. Euclidean   Mahalanobis   Manhattan
BVQ              average                 0.98        1           1               1             1
                 k-means                 0.98        1           1               1             1
                 random                  0.98        1           1               0.62          1
BVQ-GREEDY(2)    average                 0.99        1           1               1             1
                 k-means                 1           1           1               1             1
                 random                  0.99        1           1               1             1
BVQ-GREEDY(3)    average                 1           1           1               1             1
                 k-means                 1           1           1               1             1
                 random                  1           1           1               1             1
SOM              average                 1           1           1               1             1
                 k-means                 1           1           1               1             1
                 random                  1           1           1               1             1
SOM-GREEDY(2)    average                 1           1           1               1             1
                 k-means                 1           1           1               1             1
                 random                  1           1           1               0.63          1
SOM-GREEDY(3)    average                 1           1           1               1             1
                 k-means                 1           1           1               1             1
                 random                  1           1           1               0.62          1
k-means          random                  0.99        0.98        0.98            0.67          0.99
Table 7. Accuracy of device clustering for BVQ, SOM, BVQ-GREEDY(2), BVQ-GREEDY(3), SOM-GREEDY(2) and SOM-GREEDY(3) algorithms (three-batch mixed lot).
Algorithm        Initialization Method   Chebyshev   Euclidean   Sq. Euclidean   Mahalanobis   Manhattan
BVQ              average                 0.72        0.64        0.64            0.94          0.64
                 k-means                 0.97        0.64        0.64            0.92          0.64
                 random                  0.60        0.64        0.64            0.62          0.64
BVQ-GREEDY(2)    average                 0.95        1           1               0.95          1
                 k-means                 0.95        1           1               0.92          1
                 random                  0.95        0.64        0.64            0.51          0.64
BVQ-GREEDY(3)    average                 0.95        1           1               0.95          1
                 k-means                 0.95        0.98        1               0.94          0.98
                 random                  0.96        0.64        0.64            0.39          0.64
SOM              average                 0.69        0.57        0.98            0.93          0.98
                 k-means                 0.7         0.98        0.98            0.91          0.98
                 random                  0.7         0.98        0.56            0.88          0.98
SOM-GREEDY(2)    average                 0.96        0.64        0.64            0.94          0.64
                 k-means                 0.96        0.64        0.64            0.94          0.64
                 random                  0.70        0.64        0.64            0.44          0.64
SOM-GREEDY(3)    average                 0.96        0.64        0.64            0.95          0.64
                 k-means                 0.96        0.64        0.64            0.93          0.64
                 random                  0.96        0.64        0.64            0.40          0.64
k-means          random                  0.62        0.63        0.66            0.49          0.63
Table 8. Accuracy of device clustering for BVQ, SOM, BVQ-GREEDY(2), BVQ-GREEDY(3), SOM-GREEDY(2) and SOM-GREEDY(3) algorithms (four-batch mixed lot).
Algorithm        Initialization Method   Chebyshev   Euclidean   Sq. Euclidean   Mahalanobis   Manhattan
BVQ              average                 0.39        0.60        0.60            0.65          0.63
                 k-means                 0.76        0.60        0.60            0.49          0.48
                 random                  0.71        0.59        0.59            0.52          0.59
BVQ-GREEDY(2)    average                 0.99        0.99        0.64            0.50          0.64
                 k-means                 0.92        0.92        0.92            0.50          0.99
                 random                  0.99        0.59        0.59            0.58          0.59
BVQ-GREEDY(3)    average                 0.99        0.99        0.99            0.50          0.99
                 k-means                 0.74        0.74        0.74            0.57          0.99
                 random                  0.99        0.59        0.74            0.56          0.59
SOM              average                 0.99        0.99        0.99            0.97          0.98
                 k-means                 0.38        0.98        0.65            0.98          0.98
                 random                  0.99        0.65        0.68            0.59          0.49
SOM-GREEDY(2)    average                 0.99        0.75        0.75            0.47          0.47
                 k-means                 0.72        0.74        0.74            0.52          0.59
                 random                  0.64        0.58        0.58            0.56          0.59
SOM-GREEDY(3)    average                 0.64        0.62        0.74            0.47          0.48
                 k-means                 0.58        0.74        0.74            0.57          0.59
                 random                  0.66        0.59        0.59            0.56          0.37
k-means          random                  0.39        0.65        0.65            0.65          0.66
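Tables 6–8 report the accuracy of device clustering with respect to the known production batches. The exact definition of accuracy is not restated in this part of the paper; the sketch below shows one common convention, assumed here for illustration only: each cluster is matched one-to-one with a true batch so that the share of correctly assigned devices is maximized (Hungarian algorithm via scipy.optimize.linear_sum_assignment). The function and example data are hypothetical.

```python
# Sketch (an assumption, not necessarily the authors' definition of accuracy):
# clustering accuracy as the best one-to-one matching between cluster labels
# and true production-batch labels, found with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_batches, cluster_labels):
    true_batches = np.asarray(true_batches)
    cluster_labels = np.asarray(cluster_labels)
    batches = np.unique(true_batches)
    clusters = np.unique(cluster_labels)
    # Contingency table: how many devices of batch b fell into cluster c.
    cont = np.array([[np.sum((true_batches == b) & (cluster_labels == c))
                      for c in clusters] for b in batches])
    # Maximize the number of correctly assigned devices over all matchings.
    row, col = linear_sum_assignment(-cont)
    return cont[row, col].sum() / len(true_batches)

# Example: six devices from two batches, one device placed in the wrong cluster.
print(clustering_accuracy([0, 0, 0, 1, 1, 1], [2, 2, 1, 1, 1, 1]))  # -> 0.833...
```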