Article

A Novel Adaptive Kernel Picture Fuzzy C-Means Clustering Algorithm Based on Grey Wolf Optimizer Algorithm

1 Library, Guilin University of Technology, Guilin 541004, China
2 School of Business, Central South University, Changsha 410083, China
3 College of Computer and Artificial Intelligence, Xiangnan University, Chenzhou 423038, China
* Authors to whom correspondence should be addressed.
Symmetry 2022, 14(7), 1442; https://doi.org/10.3390/sym14071442
Submission received: 18 June 2022 / Revised: 10 July 2022 / Accepted: 11 July 2022 / Published: 13 July 2022
(This article belongs to the Section Computer)

Abstract: Over the years, research on fuzzy clustering algorithms has attracted the attention of many researchers, and they have been applied to various areas, such as image segmentation and data clustering. Various fuzzy clustering algorithms have been put forward based on the initial Fuzzy C-Means clustering (FCM) with Euclidean distance. However, the existing fuzzy clustering approaches ignore two problems. Firstly, clustering algorithms based on Euclidean distance have a high error rate, and are more sensitive to noise and outliers. Secondly, the parameters of fuzzy clustering algorithms are hard to determine. In practice, they are often determined by the user’s experience, which results in poor performance of the clustering algorithm. Therefore, considering the above deficiencies, this paper proposes a novel fuzzy clustering algorithm combining the Gaussian kernel function and the Grey Wolf Optimizer (GWO), called Kernel-based Picture Fuzzy C-Means clustering with Grey Wolf Optimizer (KPFCM-GWO). In KPFCM-GWO, the Gaussian kernel function is used as a symmetrical measure of distance between data points and cluster centers, and the GWO is utilized to determine the parameter values of PFCM. To verify the validity of KPFCM-GWO, a comparative study was conducted. The experimental results indicate that KPFCM-GWO outperforms other clustering methods, and the improvement of KPFCM-GWO is mainly attributed to the combination of the Gaussian kernel function and the parameter optimization capability of the GWO. Moreover, the paper applies KPFCM-GWO to analyze the value of an airline’s customers, and five levels of customer categories are defined.

1. Introduction

Clustering analysis is performed to classify a group of objects into different categories, such that objects in the same cluster are more similar to one another than to objects in other clusters [1]. It is often used to discover structure in data that would otherwise be invisible. According to the range of membership, clustering algorithms can be divided into two categories: hard clustering and soft clustering. In particular, Fuzzy C-Means clustering (FCM) is a typical center-based soft clustering algorithm [2]. It determines which class a data point belongs to by comparing the membership degrees $u_{ij}$ of the data point with respect to each cluster center. The value of the membership degree is determined by the distance or similarity of the data point to the cluster center.
The FCM approach has been used in various fields, such as face recognition [3], intrusion detection [4], customer segmentation [5], image segmentation [6], and so forth. Not satisfied with the clustering effect of the original FCM method, numerous scholars have proposed improved methods to enhance its effectiveness. The improvements have three main directions: optimization of the distance measure, selection of parameters or initial cluster centers, and introduction of other fuzzy sets. To address the noise sensitivity of the FCM method in dealing with large datasets, Pal et al. proposed a novel fuzzy clustering method, Possibilistic Fuzzy C-Means clustering, which combined the possibilistic c-means algorithm with the FCM method [7]. Krinidis and Chatzis presented a Fuzzy Local Information C-Means clustering algorithm (FLICM) that considers local spatial information and grey level information to improve the accuracy of image segmentation [8]. Another improvement that has received attention is the introduction of other fuzzy sets into the FCM algorithm to make the information representation in the data more complete, and thus improve the clustering effect [9]. Hwang and Rhee introduced interval type-2 fuzzy sets into the FCM algorithm, and constructed the Interval Type-2 Fuzzy C-Means clustering method (IT2FCM) [10]. This method utilizes two fuzzy parameters ($m_1$ and $m_2$) to control the fuzziness of the final clusters. Since the computation time of the IT2FCM method is excessive, some researchers have turned their attention to Intuitionistic Fuzzy Sets (IFS). Xu and Wu put forward the intuitionistic fuzzy c-means clustering approach [11]. Subsequently, a hesitant fuzzy c-means clustering method was proposed, with a novel fuzzification and defuzzification method for image information [12]. Hou et al. proposed a novel proximity measure to substitute the traditional distance/similarity measure in the IFCM algorithm, and then introduced the Genetic Algorithm (GA) to select the optimal parameters [13].
Considering the neutral degree of decision-makers, the Picture Fuzzy Set (PFS) was proposed based on the IFS [14]. A picture fuzzy set-based approach can be used in situations that involve more ideas or emotions; it captures more uncertain information and thus improves clustering accuracy in clustering algorithms. Therefore, many clustering methods based on the Picture Fuzzy C-Means clustering algorithm (PFCM) have been developed by scholars. Considering that the traditional distributed fuzzy clustering algorithm requires a huge computational cost with suboptimal effect, Son [15] proposed a new Distributed Picture Fuzzy C-Means clustering algorithm (DPFCM) via the concept of PFS. Then, Thong and Son formulated the conventional Picture Fuzzy C-Means clustering model (FC-PFS) on the basis of DPFCM; a subsequent comparative study showed that the clustering effect of the FC-PFS algorithm is significantly better than that of other related fuzzy clustering algorithms [9]. After that, many improved methods were put forward, such as AFC-PFC [16], PFCM-CD [17], and AEWPFCM [18].
Even though the picture fuzzy clustering algorithm performs well, there are still two problems to be solved. First of all, the traditional FCM method, including PFCM, utilizes Euclidean distances between data points and cluster centers. However, relevant experimental results indicate that Euclidean distance gives clustering algorithms a high error rate and makes them more sensitive to noise and outliers [9]. Furthermore, under the Euclidean metric all features contribute equally to the clustering results. The other issue is that numerous parameter settings are involved in the PFCM algorithm, and the selection of parameters directly affects the final clustering effect. With manual setting, it may be difficult to find the most suitable parameter values, and different parameter values may be required when processing different datasets.
For the first problem, considering that kernel functions have stronger distance representation ability, researchers have constructed clustering algorithms combining kernel functions with FCM. For example, Graves and Pedrycz demonstrated, through a comparative study, that the Kernel-based FCM method (KFCM) outperforms the traditional FCM algorithm and improves clustering accuracy [19]. However, there is a non-negligible problem: the performance of the KFCM algorithm is sensitive to the parameter values used in the kernel function. In addition, Zhou and Zhang put forward a novel kernel-based IFCM algorithm, and adopted the State Transition Algorithm (STA) to obtain the initial cluster centers [20]. Wu and Cao introduced an entropy-like kernel function into the FCM algorithm with local information constraints [21].
For the second problem, applying an intelligent optimization algorithm is a good choice. For example, the GA converts the process of problem solving into a process similar to the crossover and mutation of chromosomal genes in biological evolution. It can be used to select the optimal parameters of the FCM algorithm, as conducted by Lin [22] and Chou [23]. Moreover, the Particle Swarm Optimization algorithm (PSO) is also very popular for solving this issue: Thong and Son obtained suitable cluster numbers, cluster centers, and memberships by combining PSO with PFCM [16], and Zhang presented a novel FCM-ELPSO to address the poor execution times of PSO-based clustering methods [24]. Beyond that, other intelligent optimization algorithms are often used in parameter selection problems, such as Ant Colony Optimization (ACO) [25], Artificial Bee Colony (ABC) [26], and the Grey Wolf Optimizer (GWO) [27].
Considering the advantages of kernel functions and intelligent optimization algorithms, this paper proposes a Kernel-based Picture Fuzzy C-Means clustering with Grey Wolf Optimizer (KPFCM-GWO) to solve the two abovementioned issues simultaneously. KPFCM-GWO can not only better represent the distance between two sample points, but also identify suitable parameter values, according to the characteristics of the dataset, for enhancing the performance of the proposed method.
The remainder of this paper is structured as follows: the relevant background on the basic concepts and operators associated with fuzzy clustering algorithms and PFSs is introduced briefly in Section 2. The proposed method combining kernel functions with GWO is presented in Section 3. Section 4 reports two experiments using KPFCM-GWO in the context of machine learning datasets and a practical case dataset. Finally, Section 5 presents the conclusions of this paper, and offers some recommendations for further research regarding PFCM.

2. Preliminaries

2.1. Fuzzy C-Means Clustering Algorithm

Differing from traditional hard clustering algorithms, fuzzy clustering algorithms use a fuzzy matrix $u$ with $N$ rows and $C$ columns to describe the likelihood that each data point belongs to each cluster. The fuzzy matrix $u$ consists of elements $u_{kj}$, which represent the membership degree of the $k$th data point to the $j$th cluster. The membership degree $u_{kj}$ (where $u_{kj} \in [0, 1]$ and $\sum_{j=1}^{C} u_{kj} = 1$) is usually measured by distance or similarity. The mathematical model of the FCM algorithm is defined as follows [2]:
$$\min \; J = \sum_{k=1}^{N} \sum_{j=1}^{C} u_{kj}^{m} \, \| X_k - V_j \|^2 \qquad (1)$$
where:
  • $N$ indicates the number of data points; each data point is represented in $d$ dimensions;
  • $C$ indicates the number of clusters of the dataset, which needs to be set in advance ($2 \le C \le N$);
  • $m$ is the fuzzifier degree; a smaller $m$ results in larger (crisper) memberships $u_{kj}$. The value of $m$ generally depends on human knowledge and experience, and is commonly set to 2;
  • $u_{kj}$ is the membership degree of the data point $X_k$ belonging to the cluster $C_j$;
  • $X = \{X_1, X_2, \ldots, X_N\}$ is the dataset in the feature space;
  • $V = \{V_1, V_2, \ldots, V_C\}$ is the set of cluster centers, where each element $V_j$ is the center of cluster $C_j$ in $d$ dimensions;
  • $\|\cdot\|$ denotes the Euclidean distance measure.
Applying the Lagrangian method to solve the above model, the membership degrees $u_{kj}$ of the affiliation matrix and the cluster centers $V_j$ are obtained as follows:
$$V_j = \frac{\sum_{k=1}^{N} u_{kj}^{m} X_k}{\sum_{k=1}^{N} u_{kj}^{m}} \qquad (2)$$
$$u_{kj} = \frac{1}{\sum_{i=1}^{C} \left( \dfrac{\|X_k - V_j\|}{\|X_k - V_i\|} \right)^{\frac{2}{m-1}}} \qquad (3)$$
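To make the alternating updates of Equations (2) and (3) concrete, here is a minimal NumPy sketch (illustrative only, not the authors' implementation; the data, cluster count, and m = 2 are arbitrary choices):

```python
import numpy as np

def fcm_step(X, u, m=2.0, eps=1e-12):
    """One FCM iteration: centers via Eq. (2), memberships via Eq. (3)."""
    w = u ** m                                    # u_kj^m, shape (N, C)
    V = (w.T @ X) / w.sum(axis=0)[:, None]        # Eq. (2): weighted means
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps  # ||X_k - V_j||
    p = 2.0 / (m - 1.0)
    u_new = 1.0 / (d ** p * (d ** -p).sum(axis=1, keepdims=True))    # Eq. (3)
    return u_new, V

# Usage: random data, C = 3 clusters, iterate until the memberships stabilize.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
u = rng.dirichlet(np.ones(3), size=150)
for _ in range(100):
    u, V = fcm_step(X, u)
```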
Since FCM has better local search ability and faster convergence speed than traditional hard clustering algorithms, it is widely applied in many practical cases. However, fuzzy clustering also has some defects that cannot be ignored. The use of the Euclidean measure in clustering algorithms yields a high error rate. Additionally, the choice of parameters also affects the effectiveness of the algorithm, and improper parameters tend to trap the algorithm in a local minimum. To reduce these limitations and enhance the accuracy of the algorithm, it is necessary to consider how to choose appropriate algorithm parameters, the distance/similarity measure, the determination of the initial cluster centers, and so on.

2.2. Picture Fuzzy C-Means Clustering Algorithm

Definition 1
[14]. A picture fuzzy set in a nonempty set $X$ is
$$A = \{\langle x, \mu_A(x), \eta_A(x), \gamma_A(x) \rangle \mid x \in X\} \qquad (4)$$
where $\mu_A(x)$, $\eta_A(x)$, and $\gamma_A(x)$ indicate the positive degree, neutral degree, and negative degree of data point $x$, respectively. Moreover, these three values satisfy the following constraints:
$$\mu_A(x), \eta_A(x), \gamma_A(x) \in [0, 1], \quad 0 \le \mu_A(x) + \eta_A(x) + \gamma_A(x) \le 1 \qquad (5)$$
Thus, the refusal degree can be measured as $\xi_A(x) = 1 - (\mu_A(x) + \eta_A(x) + \gamma_A(x))$. Using the Yager intuitionistic fuzzy complement function [28], the refusal degree can also be written as $\xi_A(x) = 1 - (\mu_A(x) + \eta_A(x)) - \left(1 - (\mu_A(x) + \eta_A(x))^{\alpha}\right)^{1/\alpha}$. The parameter $\alpha \in (0, 1]$ is an exponent coefficient that is commonly set to 0.5.
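As a quick numerical illustration (a sketch, not part of the original paper), the Yager-based refusal degree can be computed as follows:

```python
def refusal_degree(mu, eta, alpha=0.5):
    """Refusal degree xi via Yager's generating function, alpha in (0, 1]."""
    s = mu + eta
    return 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)

# For mu = 0.6, eta = 0.2, alpha = 0.5:
# s = 0.8, so xi = 1 - 0.8 - (1 - 0.8**0.5)**2 ≈ 0.1889
print(refusal_degree(0.6, 0.2))
```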
Combining the original FCM algorithm with PFS, Son put forward a novel fuzzy clustering method, named the Picture Fuzzy C-Means clustering algorithm (PFCM) [15]. On the basis of the mathematical model of FCM, the mathematical model of PFCM is adjusted as follows:
$$\min \; J = \sum_{k=1}^{N} \sum_{j=1}^{C} \big(u_{kj}(2 - \xi_{kj})\big)^m \|X_k - V_j\|^2 + \sum_{k=1}^{N} \sum_{j=1}^{C} \eta_{kj} \big(\log \eta_{kj} + \xi_{kj}\big) \qquad (6)$$
The constraints are defined as follows:
$$\begin{cases} u_{kj} + \eta_{kj} + \xi_{kj} \le 1, \\ \sum_{j=1}^{C} u_{kj}(2 - \xi_{kj}) = 1, \\ \sum_{j=1}^{C} \left( \eta_{kj} + \dfrac{\xi_{kj}}{C} \right) = 1 \end{cases} \qquad (7)$$
Applying the Lagrangian method to determine the optimal solution of the model defined by (6) and (7) results in the subsequent equations:
$$\xi_{kj} = 1 - (u_{kj} + \eta_{kj}) - \big(1 - (u_{kj} + \eta_{kj})^{\alpha}\big)^{1/\alpha} \qquad (8)$$
$$u_{kj} = \frac{1}{\sum_{i=1}^{C} (2 - \xi_{kj}) \left( \dfrac{\|X_k - V_j\|}{\|X_k - V_i\|} \right)^{\frac{2}{m-1}}} \qquad (9)$$
$$\eta_{kj} = \frac{e^{-\xi_{kj}}}{\sum_{i=1}^{C} e^{-\xi_{ki}}} \left( 1 - \frac{1}{C} \sum_{i=1}^{C} \xi_{ki} \right) \qquad (10)$$
$$V_j = \frac{\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m X_k}{\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m} \qquad (11)$$
The computational process of the PFCM method is illustrated in Algorithm 1.
Algorithm 1 PFCM algorithm
Input: Dataset $X$ with $N$ data points in $d$ dimensions; number of clusters $C$; threshold $\varepsilon$; fuzzifier $m$; exponent $\alpha$; maximal number of iterations $maxSteps > 0$.
Output: Matrices $u$, $\eta$, $\xi$ and the cluster centers $V$.
1: $t = 0$, where $t$ is the number of the current iteration;
2: Initialize $u_{kj}^{(t)}$, $\eta_{kj}^{(t)}$, and $\xi_{kj}^{(t)}$ randomly, satisfying constraint (7);
3: $t = t + 1$;
4: Calculate the current cluster centers $V_j^{(t)}$ using Equation (11);
5: Update $u_{kj}^{(t)}$, $\eta_{kj}^{(t)}$, and $\xi_{kj}^{(t)}$ using Equations (8)–(10);
6: Repeat steps 3–5;
7: Stop when $\|u^{(t)} - u^{(t-1)}\| + \|\eta^{(t)} - \eta^{(t-1)}\| + \|\xi^{(t)} - \xi^{(t-1)}\| \le \varepsilon$ or $t > maxSteps$.
Even though PFCM improves upon FCM and IFCM and enhances clustering accuracy, there are still two limitations.
Firstly, the values of parameters such as the fuzzifier $m$ and the exponent $\alpha$ of Yager’s generating function are usually set by users, depending on their experience and knowledge; unsuitable parameter values can lower the accuracy and performance of the algorithm. Secondly, the traditional fuzzy clustering algorithm utilizes the Euclidean distance measure in the objective function to estimate the distance or similarity between data points and cluster centers, and it has been shown that using Euclidean distance in fuzzy clustering algorithms often leads to higher error rates and greater sensitivity to noise and outliers [29].
To cope with the two disadvantages mentioned above, we introduce a kernel function to replace the Euclidean distance measure, and employ the Grey Wolf Optimizer (GWO) to select the optimal parameters.

3. Proposed PFCM Algorithm Based on Kernel Function and GWO

The proposed KPFCM-GWO algorithm is an adaptive fuzzy clustering method that can be split into two parts: (1) Kernel-based Picture Fuzzy C-Means clustering (KPFCM) and (2) parameter selection of KPFCM with GWO.

3.1. Kernel-Based Picture Fuzzy C-Means Clustering (KPFCM)

Over the years, kernel-based approaches have received great attention, and have been used in many fields, such as face recognition [30], image classification [31], forecasting [32], damage detection [33], and so on. A kernel-based approach maps the original feature space to a higher-dimensional feature space through an arbitrary nonlinear transformation defined by the kernel function. This property enhances the expressiveness of linear machines, and its use in clustering algorithms can improve their clustering power and accuracy. The following kernel functions are commonly used in algorithms.
  • Gaussian kernel function:
$$K(x, y) = \exp\left( -\frac{\|x - y\|^2}{\sigma^2} \right) \qquad (12)$$
  • Linear kernel function:
$$K(x, y) = x^{T} y \qquad (13)$$
  • Radial basis kernel function:
$$K(x, y) = \exp\left( -\frac{\sum_i |x_i^{a} - y_i^{a}|^{b}}{\sigma^2} \right), \quad 0 < b \le 2 \qquad (14)$$
  • Hyper tangent kernel function:
$$K(x, y) = 1 - \tanh\left( -\frac{\|x - y\|^2}{\sigma^2} \right) \qquad (15)$$
where $x$ and $y$ indicate two different, mutually independent data points, and $\sigma$ represents the kernel width (standard deviation) that should be set in advance.
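For concreteness, here is a small sketch of the Gaussian kernel of Equation (12) and the feature-space distance it induces (used in Equation (17) below); this is illustrative code, not the authors' implementation:

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel, Eq. (12): K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / sigma ** 2)

def kernel_distance_sq(x, y, sigma):
    """Since K(x, x) = 1 for the Gaussian kernel, the squared feature-space
    distance reduces to 2 * (1 - K(x, y)), as in Eq. (17) below."""
    return 2.0 * (1.0 - gaussian_kernel(x, y, sigma))
```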
Based on the original PFCM method, the proposed KPFCM algorithm utilizes a kernel function to substitute the Euclidean distance measure used in the objective function to calculate the distance or similarity between data points and cluster centers. Correspondingly, the mathematical model of the KPFCM algorithm is adjusted as follows:
$$\min \; J = \sum_{k=1}^{N} \sum_{j=1}^{C} \big(u_{kj}(2 - \xi_{kj})\big)^m \|\phi(X_k) - \phi(V_j)\|^2 + \sum_{k=1}^{N} \sum_{j=1}^{C} \eta_{kj} \big(\log \eta_{kj} + \xi_{kj}\big) \qquad (16)$$
where $\|\phi(X_k) - \phi(V_j)\|^2$ indicates the Euclidean distance between the kernel-space images $\phi(X_k)$ and $\phi(V_j)$, computed with the Gaussian kernel via Equation (17). The target function of the above model can then be rewritten as Equation (18).
$$\|\phi(X_k) - \phi(V_j)\|^2 = K(X_k, X_k) + K(V_j, V_j) - 2K(X_k, V_j) = 2\big(1 - K(X_k, V_j)\big) \qquad (17)$$
$$\min \; J = 2 \sum_{k=1}^{N} \sum_{j=1}^{C} \big(u_{kj}(2 - \xi_{kj})\big)^m \big(1 - K(X_k, V_j)\big) + \sum_{k=1}^{N} \sum_{j=1}^{C} \eta_{kj} \big(\log \eta_{kj} + \xi_{kj}\big) \qquad (18)$$
The constraints are defined as follows:
$$\begin{cases} u_{kj} + \eta_{kj} + \xi_{kj} \le 1, \\ \sum_{j=1}^{C} u_{kj}(2 - \xi_{kj}) = 1, \\ \sum_{j=1}^{C} \left( \eta_{kj} + \dfrac{\xi_{kj}}{C} \right) = 1 \end{cases} \qquad (19)$$
Now, applying the Lagrangian method to calculate the optimal solution of the model defined by (18) and (19), we get:
$$u_{kj} = \frac{1}{\sum_{i=1}^{C} (2 - \xi_{kj}) \left( \dfrac{1 - K(X_k, V_j)}{1 - K(X_k, V_i)} \right)^{\frac{1}{m-1}}} \qquad (20)$$
$$\eta_{kj} = \frac{e^{-\xi_{kj}}}{\sum_{i=1}^{C} e^{-\xi_{ki}}} \left( 1 - \frac{1}{C} \sum_{i=1}^{C} \xi_{ki} \right) \qquad (21)$$
$$\xi_{kj} = 1 - (u_{kj} + \eta_{kj}) - \big(1 - (u_{kj} + \eta_{kj})^{\alpha}\big)^{1/\alpha} \qquad (22)$$
$$V_j = \frac{\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m K(X_k, V_j) X_k}{\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m K(X_k, V_j)} \qquad (23)$$
Proof. 
Taking the derivative of $J$ with respect to $V_j$ for the Gaussian kernel, we have:
$$\frac{\partial J}{\partial V_j} = -\frac{4}{\sigma^2} \sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m (X_k - V_j) \, K(X_k, V_j)$$
Let $\frac{\partial J}{\partial V_j} = 0$; then,
$$\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m (X_k - V_j) K(X_k, V_j) = 0 \;\Rightarrow\; V_j = \frac{\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m K(X_k, V_j) X_k}{\sum_{k=1}^{N} \big(u_{kj}(2 - \xi_{kj})\big)^m K(X_k, V_j)}$$
The Lagrangian function with respect to $u$ is
$$L(u) = 2 \sum_{k=1}^{N} \sum_{j=1}^{C} \big(u_{kj}(2 - \xi_{kj})\big)^m \big(1 - K(X_k, V_j)\big) + \sum_{k=1}^{N} \sum_{j=1}^{C} \eta_{kj} \big(\log \eta_{kj} + \xi_{kj}\big) - \lambda_k \left( \sum_{j=1}^{C} u_{kj}(2 - \xi_{kj}) - 1 \right)$$
Let $\frac{\partial L(u)}{\partial u_{kj}} = 0$; then,
$$\frac{\partial L(u)}{\partial u_{kj}} = 2m\, u_{kj}^{m-1} (2 - \xi_{kj})^m \big(1 - K(X_k, V_j)\big) - \lambda_k (2 - \xi_{kj}) = 0 \;\Rightarrow\; u_{kj} = \frac{1}{2 - \xi_{kj}} \left( \frac{\lambda_k}{2m\big(1 - K(X_k, V_j)\big)} \right)^{\frac{1}{m-1}}$$
In addition, Lagrange’s multiplier method satisfies the following:
$$\sum_{j=1}^{C} \left( \frac{\lambda_k}{2m\big(1 - K(X_k, V_j)\big)} \right)^{\frac{1}{m-1}} = 1$$
Then,
$$\lambda_k = \left( \frac{1}{\sum_{j=1}^{C} \Big(2m\big(1 - K(X_k, V_j)\big)\Big)^{\frac{1}{1-m}}} \right)^{m-1}$$
Plugging $\lambda_k$ into $u_{kj}$, we have:
$$u_{kj} = \frac{\Big(1 / \big(1 - K(X_k, V_j)\big)\Big)^{\frac{1}{m-1}}}{\sum_{i=1}^{C} (2 - \xi_{kj}) \Big(1 / \big(1 - K(X_k, V_i)\big)\Big)^{\frac{1}{m-1}}}$$
Similarly, the Lagrangian function with respect to $\eta$ is
$$L(\eta) = 2 \sum_{k=1}^{N} \sum_{j=1}^{C} \big(u_{kj}(2 - \xi_{kj})\big)^m \big(1 - K(X_k, V_j)\big) + \sum_{k=1}^{N} \sum_{j=1}^{C} \eta_{kj} \big(\log \eta_{kj} + \xi_{kj}\big) - \lambda_k \left( \sum_{j=1}^{C} \left( \eta_{kj} + \frac{\xi_{kj}}{C} \right) - 1 \right)$$
Let $\frac{\partial L(\eta)}{\partial \eta_{kj}} = 0$; then,
$$\log \eta_{kj} + 1 - \lambda_k + \xi_{kj} = 0 \;\Rightarrow\; \eta_{kj} = e^{\lambda_k - 1 - \xi_{kj}}$$
as well as
$$\sum_{j=1}^{C} \left( e^{\lambda_k - 1 - \xi_{kj}} + \frac{\xi_{kj}}{C} \right) = 1$$
and subsequently
$$e^{\lambda_k - 1} = \frac{1 - \frac{1}{C} \sum_{j=1}^{C} \xi_{kj}}{\sum_{j=1}^{C} e^{-\xi_{kj}}}$$
Therefore,
$$\eta_{kj} = \frac{e^{-\xi_{kj}}}{\sum_{i=1}^{C} e^{-\xi_{ki}}} \left( 1 - \frac{1}{C} \sum_{i=1}^{C} \xi_{ki} \right)$$
The proof is complete. □
The algorithm calculates the current cluster centers and the affiliation matrices during each iteration, stops when the threshold condition is met or the maximum number of iterations is reached, and outputs the current clustering result. The detailed iterative process of the KPFCM algorithm is described in Algorithm 2:
Algorithm 2 KPFCM algorithm
Input: Dataset $X$ with $N$ data points in $d$ dimensions; number of clusters $C$; threshold $\varepsilon$; fuzzifier $m$; exponent $\alpha$; kernel parameter $\sigma$; maximal number of iterations $maxSteps > 0$.
Output: Matrices $u$, $\eta$, $\xi$ and the cluster centers $V$.
1: $t = 0$;
2: Initialize $u_{kj}^{(t)}$, $\eta_{kj}^{(t)}$, and $\xi_{kj}^{(t)}$ randomly, satisfying constraint (19);
3: $t = t + 1$;
4: Calculate $V_j^{(t)}$ via Equation (23);
5: Update $u_{kj}^{(t)}$, $\eta_{kj}^{(t)}$, and $\xi_{kj}^{(t)}$ via Equations (20)–(22);
6: Repeat steps 3–5;
7: Stop when $\|u^{(t)} - u^{(t-1)}\| + \|\eta^{(t)} - \eta^{(t-1)}\| + \|\xi^{(t)} - \xi^{(t-1)}\| \le \varepsilon$ or $t > maxSteps$.
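The following is a minimal NumPy sketch of Algorithm 2, assuming a Gaussian kernel and random initialization, and treating the center update of Equation (23) as a fixed-point step that reuses the kernel values from the previous iteration; it sketches the technique and is not the authors' implementation:

```python
import numpy as np

def kpfcm(X, C, m=2.0, sigma=5.0, alpha=0.5, eps=1e-7, max_steps=100, seed=0):
    """Sketch of Algorithm 2 (KPFCM) with a Gaussian kernel."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    u = rng.dirichlet(np.ones(C), size=N)                 # random memberships
    eta = (1.0 - u) * rng.uniform(0.2, 0.4, size=(N, C))  # keeps u + eta < 1
    xi = 1 - (u + eta) - (1 - (u + eta) ** alpha) ** (1 / alpha)  # Eq. (22)
    V = X[rng.choice(N, size=C, replace=False)].copy()    # centers drawn from data
    for _ in range(max_steps):
        u_old, eta_old, xi_old = u, eta, xi
        # Gaussian kernel values K(X_k, V_j), shape (N, C), per Eq. (12)
        K = np.exp(-((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) / sigma ** 2)
        w = (u * (2 - xi)) ** m * K
        V = (w.T @ X) / w.sum(axis=0)[:, None]            # Eq. (23), fixed-point step
        d = 1.0 - K + 1e-12                               # 1 - K(X_k, V_j)
        p = 1.0 / (m - 1.0)
        u = 1.0 / ((2 - xi) * d ** p * (d ** -p).sum(1, keepdims=True))  # Eq. (20)
        eta = (np.exp(-xi) / np.exp(-xi).sum(1, keepdims=True)
               * (1 - xi.mean(1, keepdims=True)))         # Eq. (21)
        xi = 1 - (u + eta) - (1 - (u + eta) ** alpha) ** (1 / alpha)     # Eq. (22)
        diff = (np.abs(u - u_old).sum() + np.abs(eta - eta_old).sum()
                + np.abs(xi - xi_old).sum())
        if diff <= eps:                                   # step 7 stopping rule
            break
    return u, eta, xi, V
```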

3.2. Parameter Selection of KPFCM with GWO

The selection of parameters, such as the fuzzifier $m$, the iteration number $maxSteps$, and the threshold $\varepsilon$, also has a great impact on the performance of a fuzzy clustering algorithm, since suitable parameters can effectively enhance its accuracy. Therefore, we select the parameters that affect the performance of the proposed algorithm for study in this paper. It has been suggested that the fuzzifier parameter $m$ of the IFCM algorithm impacts its clustering effect [34]; similarly, the parameter $m$ makes a difference to the KPFCM-GWO algorithm. Beyond that, since the proposed method introduces the Gaussian kernel function, there is one more parameter, $\sigma$, to consider compared to other fuzzy clustering algorithms. The parameter $\sigma$ of the Gaussian kernel function controls its radial range of action, and it has been confirmed that the value of $\sigma$ affects the performance of the algorithm [19]. Therefore, in order to select the optimal parameter values $m$ and $\sigma$ for the KPFCM algorithm, we utilize an intelligent optimization algorithm.
The Grey Wolf Optimizer was proposed by Mirjalili et al. [27], inspired by the prey-catching behavior of wolf packs. The GWO algorithm has numerous benefits: it has few parameters, is easy to implement, and has a simple structure; it is equipped with a convergence factor and an information feedback mechanism; and it achieves a good balance between local and global search. It has received wide attention from scholars, and has been successfully applied in various fields in recent years, especially parameter optimization. Therefore, it is a good choice for determining the parameter values. The mathematical model and procedure of GWO are defined below.
A wolf pack follows a strict hierarchy, which the GWO algorithm describes through a mathematical model. The top three best wolves (optimal solutions) are defined as $\alpha$, $\beta$, and $\delta$, respectively, and they guide the other wolves to search toward the target. The remaining wolves (candidate solutions) are defined as $\omega$, and they update their positions around $\alpha$, $\beta$, or $\delta$.
  • Encircling prey
Grey wolves gradually approach their prey and encircle it. The mathematical model of a grey wolf encircling its prey (here, the parameter pair $m$ and $\sigma$) is as follows:
$$D = |C \cdot X_P(t) - X(t)| \qquad (24)$$
$$X(t+1) = X_P(t) - A \cdot D, \quad A = 2a \cdot r_1 - a, \quad C = 2 \cdot r_2 \qquad (25)$$
where $t$ represents the current iteration, $A$ and $C$ are coefficient vectors, and $X_P$ and $X$ are the position vector of the prey and of the grey wolf, respectively. Throughout the iterations, $a$ decreases linearly from 2 to 0, and $r_1$ and $r_2$ are random vectors in $(0, 1)$.
  • Hunting prey
The grey wolves, led by the first three wolves $\alpha$, $\beta$, and $\delta$, can identify the location of the prey (the optimal solution) and approach it step by step. In the algorithmic model, during each iteration, the first three wolves estimate the location of the prey, and the remaining grey wolves update their own positions based on this location information. This process can be simulated by the following mathematical model:
$$D_{\alpha} = |C_1 \cdot X_{\alpha} - X|, \quad D_{\beta} = |C_2 \cdot X_{\beta} - X|, \quad D_{\delta} = |C_3 \cdot X_{\delta} - X| \qquad (26)$$
$$X_1 = X_{\alpha} - A_1 \cdot D_{\alpha}, \quad X_2 = X_{\beta} - A_2 \cdot D_{\beta}, \quad X_3 = X_{\delta} - A_3 \cdot D_{\delta}, \quad X(t+1) = \frac{X_1 + X_2 + X_3}{3} \qquad (27)$$
  • Attacking prey
In order to simulate the approach to the prey, the value of $a$ is gradually reduced from 2 to 0, and therefore the fluctuation range of $A$ is also reduced. In other words, as $a$ decreases linearly from 2 to 0 during the iterative process, the corresponding value of $A$ varies within the interval $(-a, a)$. When $|A| < 1$, the next position of a grey wolf can be located anywhere between its current position and the prey’s position; in other words, the wolves attack the prey (and may fall into a local optimum).
  • Search for prey
Grey wolves search for prey according to the positions of $\alpha$, $\beta$, and $\delta$. Based on the mathematical model of this dispersal, random values of $A$ with $A > 1$ or $A < -1$ can be used to force a grey wolf to separate from its prey, which allows the GWO algorithm to search globally for optimal solutions. The GWO algorithm has another component, $C$, which assigns random weights to the influence of the prey’s location on the wolf, helping to discover new solutions. The stochasticity of $C$ plays a very important role in escaping local optima when the algorithm is trapped and cannot easily jump out, especially in the final iterations, where the global optimum solution needs to be obtained.
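A minimal GWO sketch for selecting the pair (m, σ) is given below; the fitness function is left abstract and is an assumption, since the paper does not spell one out here (for example, it could run KPFCM with the candidate parameters and return the objective J or the negative clustering accuracy):

```python
import numpy as np

def gwo(fitness, lb, ub, n_wolves=6, max_iter=50, seed=0):
    """Grey Wolf Optimizer sketch (minimization), following Eqs. (24)-(27)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(n_wolves, lb.size))      # wolf positions
    scores = np.array([fitness(x) for x in X])
    for t in range(max_iter):
        order = np.argsort(scores)
        leaders = X[order[:3]]                             # alpha, beta, delta
        a = 2.0 * (1.0 - t / max_iter)                     # a: 2 -> 0 linearly
        for i in range(n_wolves):
            new_pos = np.zeros(lb.size)
            for leader in leaders:
                r1, r2 = rng.random(lb.size), rng.random(lb.size)
                A, Cc = 2 * a * r1 - a, 2 * r2             # Eq. (25)
                D = np.abs(Cc * leader - X[i])             # Eqs. (24)/(26)
                new_pos += leader - A * D                  # X_1, X_2, X_3
            X[i] = np.clip(new_pos / 3.0, lb, ub)          # Eq. (27), kept in bounds
            scores[i] = fitness(X[i])
    best = np.argmin(scores)
    return X[best], scores[best]

# Hypothetical use: search m in [1, 5] and sigma in [1, 20] with 6 wolves,
# where my_fitness(x) runs KPFCM with m = x[0], sigma = x[1].
# best_params, best_score = gwo(my_fitness, lb=[1, 1], ub=[5, 20])
```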
In this study, the KPFCM-GWO clustering algorithm combines PFCM with the kernel function, and the GWO method is applied to determine the most suitable parameter values of the proposed KPFCM to enhance its performance. The procedure of the KPFCM-GWO clustering algorithm is shown in Figure 1.

4. Experiments

4.1. Datasets

This paper performs two different experiments on five datasets. The first four datasets (Iris, Wine, Glass, and Wisconsin Diagnostic Breast Cancer (WDBC)) are benchmark datasets from the UCI Machine Learning Repository [35]. The fifth is an airline customer dataset. The details of each dataset are shown in Table 1.
The Iris and Wine datasets are often used in experiments on machine learning algorithms, especially clustering and classification algorithms. The Glass dataset is classified by the content of different chemical elements; the range of values for each feature is relatively small and the classes are not strongly differentiated, which makes it a good test of a clustering algorithm. The WDBC dataset consists of medical data with many elements and a small number of classes. Because these datasets differ in character, the experimental outcomes obtained on them are convincing. The above four benchmark datasets were utilized to examine the performance and advantages of the proposed algorithm; the specific procedure is shown in Experiment 1. Moreover, a comparison of different fuzzy clustering algorithms on the benchmark datasets is also performed in Experiment 1.
In order to show the application of the proposed algorithm to a practical problem, Experiment 2 applies KPFCM-GWO to analyze the value of an airline’s customers. The Airline dataset has 62,988 elements and 25 attributes, including airline membership information, member flight information, and member points information, and it spans two years.
Some evaluation indexes are used to verify the validity of the KPFCM-GWO algorithm in this paper. In general, there are two kinds of criteria for assessing the performance of a clustering algorithm: internal quality evaluation metrics and external evaluation metrics. In this paper, four indexes are selected for evaluation.
  • Calinski–Harabasz (CH) takes into account the tightness within clusters as well as the separation between clusters. A larger CH thus represents tighter classes that are more dispersed from one another, i.e., a better clustering result. The CH index can be formulated as Equation (28):
$$CH(K) = \frac{tr(B) / (K-1)}{tr(W) / (N-K)}, \quad tr(B) = \sum_{j=1}^{K} n_j \|z_j - z\|^2, \quad tr(W) = \sum_{j=1}^{K} \sum_{x_i \in c_j} \|x_i - z_j\|^2 \qquad (28)$$
where $z$ is the overall data center, $z_j$ is the center of cluster $c_j$, and $n_j$ is the number of points in cluster $c_j$.
  • The Silhouette Coefficient (SC) assesses the effectiveness of clustering by combining the cohesiveness and separation of clusters. Its value lies between −1 and 1. The closer the samples of the same cluster are to each other, and the farther apart the samples of different clusters are, the larger the value of SC, indicating a better clustering effect.
  • The Adjusted Rand Index (ARI) takes values in the range of −1 to 1. The closer the value of ARI is to 1, the closer the clustering result is to the real situation, and the better the clustering effect is.
  • The accuracy index refers to the ratio of the number of correctly predicted samples to the total number of predicted samples, and it does not consider whether the predicted samples are positive or negative cases [20].
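These four indexes can be computed, for example, with scikit-learn, as in the sketch below; the Hungarian matching used to align cluster labels with class labels for the accuracy index is our assumption, since the paper does not specify the matching procedure:

```python
import numpy as np
from sklearn.metrics import (calinski_harabasz_score, silhouette_score,
                             adjusted_rand_score, accuracy_score)
from scipy.optimize import linear_sum_assignment

def evaluate(X, labels_pred, labels_true):
    """Compute the four indexes used above; labels are integer NumPy arrays."""
    ch = calinski_harabasz_score(X, labels_pred)
    sc = silhouette_score(X, labels_pred)
    ari = adjusted_rand_score(labels_true, labels_pred)
    # Accuracy requires matching cluster labels to class labels (our assumption:
    # Hungarian assignment on the negated contingency counts).
    k = max(labels_pred.max(), labels_true.max()) + 1
    cost = np.zeros((k, k))
    for p, t in zip(labels_pred, labels_true):
        cost[p, t] -= 1
    row, col = linear_sum_assignment(cost)
    mapping = dict(zip(row, col))
    acc = accuracy_score(labels_true, [mapping[p] for p in labels_pred])
    return acc, sc, ch, ari
```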

4.2. Experiment 1

Firstly, we demonstrate the process of applying the proposed method to cluster the Iris dataset. In this case, $N = 150$, $d = 4$, $C = 3$, $\alpha = 0.5$, $\varepsilon = 10^{-7}$, $maxSteps = 100$, the number of search agents is 6, and $dim = 2$. The values of the fuzzifier $m$ and the parameter $\sigma$ are determined by GWO, with $m$ searched in the range 1–5 and $\sigma$ in the range 1–20. The initial matrices of positive degree $U$, neutral degree $\eta$, and refusal degree $\xi$ are randomly generated between 0 and 1, and the initial cluster centers $V$ are calculated by the original FCM algorithm; they are presented as follows:
$$U^{(t=0)} = \begin{bmatrix} 0.1963 & 0.0139 & 0.7897 \\ 0.4479 & 0.3122 & 0.2399 \\ 0.5215 & 0.2569 & 0.2216 \\ 0.7047 & 0.2404 & 0.0548 \\ \vdots & \vdots & \vdots \end{bmatrix} \quad \eta^{(t=0)} = \begin{bmatrix} 0.3141 & 0.2419 & 0.1013 \\ 0.2431 & 0.2875 & 0.2822 \\ 0.2006 & 0.1891 & 0.2844 \\ 0.1080 & 0.2114 & 0.2564 \\ \vdots & \vdots & \vdots \end{bmatrix} \quad \xi^{(t=0)} = \begin{bmatrix} 0.4080 & 0.4999 & 0.1059 \\ 0.2805 & 0.3494 & 0.4009 \\ 0.2553 & 0.4437 & 0.4107 \\ 0.1776 & 0.4407 & 0.4933 \\ \vdots & \vdots & \vdots \end{bmatrix}$$
$$V^{(t=0)} = \begin{bmatrix} 5.8389 & 3.0193 & 3.8047 & 1.2251 \\ 5.8811 & 3.0636 & 3.8339 & 1.2271 \\ 5.7293 & 3.0886 & 3.5299 & 1.0986 \end{bmatrix}$$
$$U^{(t=100)} = \begin{bmatrix} 0.0562 & 0.0707 & 0.8731 \\ 0.0851 & 0.1078 & 0.8071 \\ 0.5998 & 0.3053 & 0.0949 \\ 0.5042 & 0.4174 & 0.0784 \\ \vdots & \vdots & \vdots \end{bmatrix} \quad \eta^{(t=100)} = \begin{bmatrix} 0.2316 & 0.3031 & 0.0175 \\ 0.1822 & 0.3013 & 0.0188 \\ 0.1949 & 0.2931 & 0.2076 \\ 0.3247 & 0.2592 & 0.3193 \\ \vdots & \vdots & \vdots \end{bmatrix} \quad \xi^{(t=100)} = \begin{bmatrix} 0.4973 & 0.4752 & 0.1062 \\ 0.4994 & 0.4610 & 0.1658 \\ 0.1935 & 0.3503 & 0.4950 \\ 0.1631 & 0.2919 & 0.4659 \\ \vdots & \vdots & \vdots \end{bmatrix}$$
$$V^{(t=100)} = \begin{bmatrix} 4.9948 & 3.3993 & 1.4734 & 0.2423 \\ 6.4328 & 2.9520 & 5.2118 & 1.8863 \\ 5.7281 & 2.7119 & 4.1356 & 1.2724 \end{bmatrix}$$
To eliminate interference as much as possible and simulate a real clustering situation, the order of the data points in the original Iris dataset is randomly shuffled; the initial distribution is shown in Figure 2. We then perform fuzzy clustering on the shuffled dataset to verify the effect of the KPFCM-GWO algorithm. After applying the proposed method, the final clustering result is shown in Figure 3, where we can observe that the Iris dataset is clustered into three different clusters accurately. Adjusting the value of parameter $C$ while keeping the other parameter values constant, the algorithm is applied to the remaining three public datasets. The values of the corresponding evaluation indexes are shown in Table 2.
As shown in Table 2, the same clustering algorithm performs differently on different datasets. In particular, compare the clustering outcomes on the Iris dataset and the Glass dataset: the Iris dataset has only 4 attributes, while the Glass dataset has up to 9, and the data points described by the 9 attributes in the Glass dataset are complex and not easy to classify correctly. That is to say, fuzzy clustering algorithms do not perform as well on complex and noisy datasets as they do on standard datasets. It is also worth noting that the CH index value of the WDBC experiment is the largest among the four datasets, which indicates that both the within-class tightness and the between-class separation of the WDBC dataset are high.
In the following, to verify the validity of KPFCM-GWO, Fuzzy C-Means clustering (FCM) [2], Intuitionistic Fuzzy C-Means clustering (IFCM) [36], Kernel-based Fuzzy C-Means clustering (KFCM) [19], Picture Fuzzy C-Means clustering (PFCM) [9] and Picture Fuzzy C-Means clustering with Particle Swarm Optimization algorithm (PFCM-PSO) [16] are selected for comparative experiment.
Figure 4 presents the clustering performance of the above comparison methods on the four UCI datasets. As seen in Figure 4, on the Iris dataset, the IFCM algorithm obtains the best accuracy, while FCM and the proposed method have better CH values than the other methods. Combining Table 3 with Figure 4 and Figure 5, each clustering algorithm yields different clustering results, and different evaluation factors behave differently across clustering algorithms. For instance, apart from the CH index, among the three remaining evaluation factors that take values less than 1, the accuracy on the Iris dataset achieves the maximum value regardless of the clustering algorithm, ranging from 0.89 to 0.95. However, the clustering effect of an algorithm also decreases when clustering noisy datasets, such as the Glass dataset, where accuracy drops to 0.52–0.79. The same situation exists for the other evaluation factors: the ARI values for a standard dataset such as Iris are higher, ranging from 0.73 to 0.77, while complex and noisy data such as the Glass dataset reduce the ARI range to 0.24–0.33. In the case of the CH index, a complex and noisy dataset such as Glass has a lower value, while a dataset with significant differences between data points, such as WDBC, has a higher CH value.
From the above experimental procedure and comparative analysis, it is clear that the proposed algorithm is generally better than other fuzzy clustering algorithms, although it cannot achieve the best results in all datasets. The improvement in algorithm performance is mainly attributed to the combination of the Gaussian kernel function and the parameter optimization capability of the GWO.
Some parameters in the Grey Wolf Optimizer also need to be set in advance. Apparently, as the number of wolves increases, the algorithm takes more time. Does this mean that more wolves yield better algorithm performance? How does one balance time complexity, algorithm efficiency, and the number of wolves? To clear up these uncertainties, we ran the KPFCM-GWO algorithm with different numbers of wolves on the four benchmark datasets, and measured the accuracy and silhouette coefficient in each case. The results are presented in Figure 6 and Figure 7.
The outcomes in Figure 6 clearly show that a larger number of wolves does not make the algorithm more accurate; the suitable range for the number of wolves is between 5 and 8, and the accuracy values of the KPFCM-GWO method on the four datasets are maximal when the number of wolves is 6. Considering another clustering index, the silhouette coefficient, it is evident that changes in the number of wolves do not have a significant effect on SC values; roughly, the best range for the number of wolves in this case is between 5 and 7, and when the pack size is 6, each dataset achieves a high SC value. Therefore, in this paper, the number of wolves is set to 6.

4.3. Experiment 2

The increasingly competitive market requires airlines to adapt their marketing strategies to attract more customers to fly with them. An existing airline was facing a serious business crisis: loss of customers to competitors, declining competitiveness, fewer and fewer new members, and waste due to underutilization of the company’s resources. In order to solve these problems and improve its competitiveness, the company intends to establish a customer value assessment model to accurately and reasonably classify its current customers and implement personalized marketing strategies for different types of customers.
Based on the customers’ basic information, flight information, and points information in the airline system, a time period with a width of two years was selected as the observation window for analysis based on the last flight date (LAST_FLIGHT_DATE), with 31 March 2014 as the end time. All customers with flight records from 1 April 2012 to 31 March 2014 were extracted to constitute the historical data, with a total of 62,988 records (for more detail, see Table 4). In conjunction with the large amount of member profile information the airline has accumulated and the flight records taken, the following objectives are requested:
  • Segmentation of customers with the help of airline customer data.
  • Characterize different customer categories and compare the value of different categories of customers.
In this experiment, the proposed method KPFCM-GWO is used to solve the customer value classification issue mentioned above.
A model often used in customer value analysis and identification identifies high-value customers by segmenting customer groups according to the recency of consumption (Recency), the consumption frequency (Frequency), and the consumption amount (Monetary), as shown in Figure 8. The model consists of these three elements, Recency, Frequency, and Monetary; in short, RFM.
In the airline industry, since airline fares are influenced by many factors, such as flight distance, demand, seasonal factors, and cabin class, it is unreasonable to measure the value of a customer to an airline by the amount spent alone, and the traditional RFM model is not appropriate in this application scenario. Therefore, a novel customer value identification model is obtained by replacing the customer’s spending amount in the original RFM model with the customer’s accumulated flight miles $M$ in the time period and the discount factor $C$ corresponding to the class of travel. Moreover, since the length of membership of airline members can also influence customer value to a certain extent, this paper incorporates the length of the customer relationship $L$ into the model, obtaining the LRFMC model [37]. In summary, this case uses the duration of membership $L$, the consumption interval $R$, the consumption frequency $F$, the flight mileage $M$, and the average discount factor $C$ as the indicators by which the airline identifies customer value; the specific meaning of each indicator is shown in Table 5 below.
The original data contained 62,988 records with 44 attributes, but there were missing values and redundant or irrelevant attributes that required data preprocessing before use. Therefore, we first deleted the records with zero fares (SUM_YR = 0), keeping the records with nonzero fares (SUM_YR ≠ 0) or with nonzero average discount rates (AVG_DISCOUNT ≠ 0) and total flight kilometers greater than 0 (SEG_KM_SUM > 0). After data cleaning, the data comprised 62,043 records with 44 attributes. Then, based on the LRFMC model, the six attributes related to the model, FFP_DATE, LOAD_TIME, FLIGHT_COUNT, AVG_DISCOUNT, SEG_KM_SUM, and LAST_TO_END, were selected, and the irrelevant, weakly relevant, or redundant attributes were removed. Finally, owing to the large variation in the ranges of the selected indicators, the dataset was standardized by Z-score [38] in order to eliminate order-of-magnitude effects and better meet the needs of the clustering algorithm.
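The cleaning and LRFMC construction described above can be sketched as follows; the file name, the month conversions for L and R, and the assumption that LAST_TO_END is recorded in days are all illustrative guesses rather than details taken from the paper:

```python
import pandas as pd

# Hypothetical file name; column names follow Table 4.
df = pd.read_csv('air_data.csv')

# Keep records with nonzero fares, or nonzero discount and positive mileage.
df = df[(df['SUM_YR'] != 0)
        | ((df['AVG_DISCOUNT'] != 0) & (df['SEG_KM_SUM'] > 0))]

# Build the LRFMC indicators (L and R converted to months; assumptions).
df['FFP_DATE'] = pd.to_datetime(df['FFP_DATE'])
df['LOAD_TIME'] = pd.to_datetime(df['LOAD_TIME'])
lrfmc = pd.DataFrame({
    'L': (df['LOAD_TIME'] - df['FFP_DATE']).dt.days / 30,  # membership length
    'R': df['LAST_TO_END'] / 30,                           # recency in months
    'F': df['FLIGHT_COUNT'],                               # flight frequency
    'M': df['SEG_KM_SUM'],                                 # accumulated mileage
    'C': df['AVG_DISCOUNT'],                               # average discount factor
})

# Z-score standardization to remove order-of-magnitude effects.
z = (lrfmc - lrfmc.mean()) / lrfmc.std()
```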
In this step, the proposed KPFCM-GWO method is applied to the customer data for classification, with the number of clusters set to 5. The clustering result is shown in Table 6 and Figure 9, and feature analysis is performed on the clustering results. Customer group 1 is largest on attribute C and smallest on attribute F. Customer group 3 is smallest on attributes L and M. Customer group 4 is smallest on attribute C. Customer group 5 is largest on attributes L, F, and M, and smallest on attribute R.
$$U^{(t=0)} = \begin{bmatrix} 0.2401 & 0.2622 & 0.1082 & 0.1400 & 0.2494 \\ 0.1276 & 0.1019 & 0.2493 & 0.3438 & 0.1773 \\ 0.0729 & 0.1467 & 0.2523 & 0.0724 & 0.4557 \\ 0.3194 & 0.0795 & 0.3465 & 0.1355 & 0.1191 \\ \vdots & \vdots & \vdots & \vdots & \vdots \end{bmatrix} \quad V^{(t=0)} = \begin{bmatrix} 0.2911 & 0.4253 & 0.4238 & 0.6219 & 0.4025 \\ 0.8403 & 0.8341 & 3.4391 & 3.9864 & 0.3975 \\ 0.6669 & 0.5792 & 0.9228 & 0.8808 & 0.4577 \\ 0.0426 & 0.5021 & 0.6334 & 0.7522 & 0.2676 \\ 0.6987 & 0.8314 & 5.4953 & 10.7046 & 1.0534 \end{bmatrix}$$
$$U^{(t=100)} = \begin{bmatrix} 0.1704 & 0.1276 & 0.1255 & 0.4619 & 0.1145 \\ 0.0771 & 0.0416 & 0.0399 & 0.8073 & 0.0342 \\ 0.2514 & 0.3221 & 0.3103 & 0.0112 & 0.1051 \\ 0.2823 & 0.3982 & 0.2032 & 0.0123 & 0.1037 \\ \vdots & \vdots & \vdots & \vdots & \vdots \end{bmatrix} \quad V^{(t=100)} = \begin{bmatrix} 0.0593 & -0.1977 & -0.2353 & 2.1068 & 1.7729 \\ -0.3328 & -0.7509 & 1.3731 & 0.7999 & -0.0403 \\ -0.7618 & -0.7053 & 0.4666 & 0.7435 & 0.0125 \\ 0.8262 & -0.6901 & 0.7813 & 0.8079 & -0.1205 \\ 1.2951 & -0.8648 & 2.8262 & 3.4351 & 0.4086 \end{bmatrix}$$
Based on the above description of characteristics, five levels of customer categories are defined: important keep customers, important development customers, important retention customers, average customers, and low-value customers.
  • Important keep customers (customer group 1) are high-value customers for airlines, the most desirable type of customer, contributing the most to the airline yet accounting for a smaller percentage. Airlines should prioritize resources for these customers, offering differentiated and one-to-one marketing to increase their loyalty and satisfaction and to sustain their high level of spending as much as possible.
  • Important development customers (customer group 3) are potential value customers for airlines. The airline has to make efforts to encourage such customers to increase their spending on flights with the company and with its partners, to increase their transfer costs to competitors, and to make them gradually become loyal customers of the company.
  • Important retention customers (customer group 5) show high uncertainty in how their customer value will change. Based on changes in these customers’ recent consumption time and number of consumptions, airlines should anticipate variations in customer consumption, focus on tracking and contacting them, and adopt appropriate marketing means to extend the life cycle of customer consumption.
  • Average customers and low-value customers (customer groups 2 and 4) may only fly with the company when airline tickets are on sale at a discount.

5. Conclusions

To solve two problems of the original fuzzy clustering algorithm (i.e., the poor performance of the Euclidean distance and the difficulty of selecting parameters), this paper proposed a novel adaptive Kernel-based Picture Fuzzy C-Means clustering algorithm with Grey Wolf Optimizer (KPFCM-GWO), in which the Gaussian kernel function is used to substitute the Euclidean distance measure, and GWO is introduced to determine the values of the two parameters ($m$ and $\sigma$) of the proposed algorithm. Furthermore, two experiments were conducted on five datasets to verify the validity of KPFCM-GWO. The experimental results indicate that the proposed KPFCM-GWO algorithm performs well, and is generally better than the other comparative methods.
However, the proposed algorithm has an important limitation. We applied GWO to select the parameters, which improves the clustering validity, but the computation time of the algorithm is also greatly increased; compared to the increase in time cost, the improvement in algorithmic efficiency is not very significant. Apart from that, the determination of the initial cluster centers also has a great influence on the clustering effect. In future research on improving fuzzy clustering algorithms, computation time should be taken into consideration to balance the improvement in algorithm performance against the time cost. It is also important to consider how to choose the initial cluster centers so as to achieve fast clustering and avoid falling into local minima.

Author Contributions

Formal analysis, C.-M.Y.; Investigation, Y.-T.W., W.-H.H., Y.-P.L. and Y.L.; Methodology, C.-M.Y., Y.L., W.-H.H., Y.-T.W., S.D. and Y.-P.L.; Resources, S.D. and J.-Q.W.; Supervision, J.-Q.W.; Writing—original draft, C.-M.Y., Y.-T.W. and S.D.; Writing—review and editing, Y.L., Y.-P.L., W.-H.H. and S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The partial datasets used in this study are from the UCI database, which is publicly available at http://archive.ics.uci.edu/ml (accessed on 12 January 2022).

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comput. Surv. 1999, 31, 264–323.
  2. Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203.
  3. Shadi, A.; Mahmoud, A.; Yaser, J.; Mohammed, A. Enhanced 3D segmentation techniques for reconstructed 3D medical volumes: Robust and accurate intelligent system. Procedia Comput. Sci. 2017, 113, 531–538.
  4. Chen, M.M.; Wang, N.; Zhou, H.B.; Chen, Y.Z. FCM technique for efficient intrusion detection system for wireless networks in cloud environment. Comput. Electr. Eng. 2018, 71, 978–987.
  5. Lee, Z.J.; Lee, C.Y.; Chang, L.Y.; Sano, N. Clustering and classification based on distributed automatic feature engineering for customer segmentation. Symmetry 2021, 13, 1557.
  6. Hanuman, V.; Deepa, V.; Pawan, K.T. A population based hybrid FCM-PSO algorithm for clustering analysis and segmentation of brain image. Expert Syst. Appl. 2021, 167, 114121.
  7. Pal, N.R.; Pal, K.; Keller, J.M.; Bezdek, J.C. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Syst. 2005, 13, 517–530.
  8. Krinidis, S.; Chatzis, V. A robust fuzzy local information c-means clustering algorithm. IEEE Trans. Image Process. 2010, 19, 1328–1337.
  9. Thong, P.H.; Son, L.H. Picture fuzzy clustering: A new computational intelligence method. Soft Comput. 2016, 20, 3549–3562.
  10. Hwang, C.; Rhee, F.C. Uncertain fuzzy clustering: Interval type-2 fuzzy approach to c-means. IEEE Trans. Fuzzy Syst. 2007, 15, 107–120.
  11. Xu, Z.; Wu, J. Intuitionistic fuzzy c-means clustering algorithms. J. Syst. Eng. Electron. 2010, 21, 580–590.
  12. Zeng, W.; Ma, R.; Yin, Q.; Zheng, X.; Xu, Z. Hesitant fuzzy c-means algorithm and its application in image segmentation. J. Intell. Fuzzy Syst. 2020, 39, 3681–3695.
  13. Hou, W.H.; Wang, Y.T.; Wang, J.Q.; Cheng, P.F.; Li, L. Intuitionistic fuzzy c-means clustering algorithm based on a novel weighted proximity measure and genetic algorithm. Int. J. Mach. Learn. Cybern. 2021, 12, 859–875.
  14. Cuong, B.C.; Kreinovich, V. Picture fuzzy sets. J. Comput. Sci. Cybern. 2014, 30, 409–420.
  15. Son, L.H. DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets. Expert Syst. Appl. 2015, 42, 51–66.
  16. Thong, P.H.; Son, L.H. A novel automatic picture fuzzy clustering method based on particle swarm optimization and picture composite cardinality. Knowl. Based Syst. 2016, 109, 48–60.
  17. Thong, P.H.; Son, L.H. Picture fuzzy clustering for complex data. Eng. Appl. Artif. Intell. 2016, 56, 121–130.
  18. Wu, C.; Chen, Y. Adaptive entropy weighted picture fuzzy clustering algorithm with spatial information for image segmentation. Appl. Soft Comput. 2020, 86, 105888.
  19. Graves, D.; Pedrycz, W. Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study. Fuzzy Sets Syst. 2010, 161, 522–543.
  20. Zhou, X.; Zhang, R.; Wang, X.; Huang, T.; Yang, C. Kernel intuitionistic fuzzy c-means and state transition algorithm for clustering problem. Soft Comput. 2020, 24, 15507–15518.
  21. Wu, C.M.; Cao, Z. Noise distance driven fuzzy clustering based on adaptive weighted local information and entropy-like divergence kernel for robust image segmentation. Digit. Signal Process. 2021, 111, 102963.
  22. Lin, K.P. A novel evolutionary kernel intuitionistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Syst. 2014, 22, 1074–1087.
  23. Chou, C.H.; Hsieh, S.C.; Qiu, C.J. Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction. Appl. Soft Comput. 2017, 56, 298–316.
  24. Zhang, J.; Ma, Z.H. Hybrid fuzzy clustering method based on FCM and enhanced logarithmical PSO (ELPSO). Comput. Intell. Neurosci. 2020, 2020, 1386839.
  25. Niknam, T.; Amiri, B. An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis. Appl. Soft Comput. 2010, 10, 183–197.
  26. Karaboga, D.; Basturk, B. On the performance of artificial bee colony (ABC) algorithm. Appl. Soft Comput. 2008, 8, 687–697.
  27. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
  28. Yager, R.R. On the measure of fuzziness and negation. II. Lattices. Inf. Control 1980, 44, 236–260.
  29. Keogh, E.; Ratanamahatana, C.A. Exact indexing of dynamic time warping. Knowl. Inf. Syst. 2005, 7, 358–386.
  30. Qingshan, L.; Rui, H.; Hanqing, L.; Songde, M. Face recognition using kernel-based Fisher discriminant analysis. In Proceedings of the Fifth IEEE International Conference on Automatic Face Gesture Recognition, Washington, DC, USA, 21 May 2002; pp. 197–201.
  31. Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362.
  32. Che, J.; Wang, J. Short-term load forecasting using a kernel-based support vector regression combination model. Appl. Energy 2014, 132, 602–609.
  33. Santos, A.; Figueiredo, E.; Silva, M.F.M.; Sales, C.S.; Costa, J.C.W.A. Machine learning algorithms for damage detection: Kernel-based approaches. J. Sound Vib. 2016, 363, 584–599.
  34. Chaira, T. A novel intuitionistic fuzzy c means clustering algorithm and its application to medical images. Appl. Soft Comput. 2011, 11, 1711–1717.
  35. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 12 January 2022).
  36. Verma, H.; Gupta, A.; Kumar, D. A modified intuitionistic fuzzy c-means algorithm incorporating hesitation degree. Pattern Recognit. Lett. 2019, 122, 45–52.
  37. Tao, Y. Analysis method for customer value of aviation big data based on LRFMC model. In Data Science, Proceedings of the 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Taiyuan, China, 18–21 September 2020; Zeng, J., Jing, W., Song, X., Lu, Z., Eds.; Springer: Singapore, 2020; pp. 89–100.
  38. Cheadle, C.; Vawter, M.P.; Freed, W.J.; Becker, K.G. Analysis of microarray data using Z score transformation. J. Mol. Diagn. 2003, 5, 73–81.
Figure 1. The procedures of the proposed KPFCM-GWO.
Figure 2. Initial dataset distribution.
Figure 3. Final clustering result.
Figure 4. Comparison of various fuzzy clustering algorithms on the Iris dataset. Note: SC—Silhouette Coefficient, CH—Calinski–Harabasz, ARI—Adjusted Rand Index, FCM—Fuzzy C-Means clustering, IFCM—Intuitionistic Fuzzy C-Means clustering, KFCM—Kernel-based Fuzzy C-Means clustering, PFCM—Picture Fuzzy C-Means clustering, PFCM-PSO—Picture Fuzzy C-Means clustering with Particle Swarm Optimization algorithm, KPFCM-GWO—Kernel-based Picture Fuzzy C-Means clustering with Grey Wolf Optimizer.
Figure 5. Comparison of various fuzzy clustering algorithms on the Glass dataset. Note: abbreviations as in Figure 4.
Figure 6. The accuracy values of KPFCM-GWO by varied number of wolves.
Figure 7. The SC values of KPFCM-GWO by varied number of wolves.
Figure 8. RFM model.
Figure 9. Customer value radar chart.
Table 1. The description of experimental datasets.

Experiment     Dataset   Number of Elements   Number of Features   Number of Classes
Experiment 1   Iris      150                  4                    3
Experiment 1   Wine      178                  13                   3
Experiment 1   Glass     214                  9                    6
Experiment 1   WDBC      569                  30                   2
Experiment 2   Airline   62,988               25                   5
Table 2. The value of evaluation index.

Dataset   Accuracy   SC      CH        ARI
Iris      0.93       0.579   558.85    0.745
Wine      0.88       0.563   557.46    0.412
Glass     0.85       0.391   88.39     0.311
WDBC      0.73       0.707   1317.16   0.518
Note: SC—Silhouette Coefficient, CH—Calinski–Harabasz, ARI—Adjusted Rand Index.
Table 3. Clustering performance with different clustering algorithms.

Algorithm   Dataset   Accuracy   SC      CH         ARI
FCM         Iris      0.89       0.549   558.99     0.729
FCM         Wine      0.86       0.566   559.40     0.354
FCM         Glass     0.71       0.258   88.68      0.227
FCM         WDBC      0.63       0.697   1300.21    0.491
IFCM        Iris      0.95       0.535   556.34     0.731
IFCM        Wine      0.83       0.531   317.56     0.373
IFCM        Glass     0.73       0.277   81.23      0.327
IFCM        WDBC      0.57       0.711   1267.69    0.535
KFCM        Iris      0.91       0.521   529.618    0.768
KFCM        Wine      0.69       0.559   554.568    0.366
KFCM        Glass     0.52       0.280   61.482     0.239
KFCM        WDBC      0.866      0.691   1273.503   0.529
PFCM        Iris      0.89       0.535   537.786    0.730
PFCM        Wine      0.702      0.520   528.919    0.371
PFCM        Glass     0.542      0.261   71.367     0.267
PFCM        WDBC      0.849      0.704   1266.015   0.476
PFCM-PSO    Iris      0.94       0.562   541.22     0.741
PFCM-PSO    Wine      0.85       0.519   553.87     0.452
PFCM-PSO    Glass     0.78       0.310   40.36      0.296
PFCM-PSO    WDBC      0.72       0.715   1310.26    0.505
Note: SC—Silhouette Coefficient, CH—Calinski–Harabasz, ARI—Adjusted Rand Index. FCM—Fuzzy C-Means clustering, IFCM—Intuitionistic Fuzzy C-Means clustering, KFCM—Kernel-based Fuzzy C-Means clustering, PFCM—Picture Fuzzy C-Means clustering, PFCM-PSO—Picture Fuzzy C-Means clustering with Particle Swarm Optimization algorithm.
Table 4. Airline membership related information.

Customers' basic information: MEMBER_NO, FFP_DATE, FIRST_FLIGHT_DATE, GENDER, FFP_TIER, WORD_CITY, WORK_PROVINCE, WORK_CONUTRY, AGE
Flight information: FLIGHT_COUNT, LOAD_TIME, LAST_TO_END, AVG_DISCOUNT, SUM_YR, SEG_KM_SUM, LAST_FLIGHT_DATE, AVG_INTERVAL, MAX_INTERVAL
Points information in the airline system: EXCHANGE_COUNT, EP_SUM, PROMOPTIVE_SUM, PARTNER_SUM, POINTS_SUM, POINT_NOTFLIGHT, BP_SUM
Table 5. LRFMC model.

Indicator   Meaning
L           Length of membership enrollment in the observed time period
R           Number of months since the customer's last flight ended in the observed time period
F           The number of times the customer flew with the company during the observed time period
M           Customer's accumulated flight miles within the observation window
C           Average of the discount factors corresponding to the customer's class of travel within the observation window
Table 6. Airline customer data clustering results.

Cluster      ZL        ZR        ZF        ZM       ZC        No. in Cluster
Customer 1   0.0593    −0.1977   −0.2353   2.1068   1.7729    15,739
Customer 2   −0.3328   −0.7509   1.3731    0.7999   −0.0403   4182
Customer 3   −0.7618   −0.7053   0.4666    0.7435   0.0125    24,661
Customer 4   0.8262    −0.6901   0.7813    0.8079   −0.1205   12,125
Customer 5   1.2951    −0.8648   2.8262    3.4351   0.4086    5336
Note: ZL, ZR, ZF, ZM, and ZC are the LRFMC indicators L, R, F, M, and C of Table 5 after Z-score normalization.