Article

An Intelligent Hybrid Scheme for Customer Churn Prediction Integrating Clustering and Classification Algorithms

Rencheng Liu, Saqib Ali, Syed Fakhar Bilal, Zareen Sakhawat, Azhar Imran, Abdullah Almuhaimeed, Abdulkareem Alzahrani and Guangmin Sun
1 School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
2 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
3 Computer Science Department, Federal Urdu University of Arts, Science and Technology, Islamabad 44000, Pakistan
4 Department of Creative Technologies, Faculty of Computing & Artificial Intelligence, Air University, Islamabad 44000, Pakistan
5 The National Centre for Genomics Technologies and Bioinformatics, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
6 Computer Engineering and Science Department, Faculty of Computer Science and Information Technology, Al Baha University, Al Baha 11442, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(18), 9355; https://doi.org/10.3390/app12189355
Submission received: 15 July 2022 / Revised: 28 August 2022 / Accepted: 15 September 2022 / Published: 18 September 2022
(This article belongs to the Special Issue Data Clustering: Algorithms and Applications)

Abstract

Customer churn is nowadays one of the main concerns in the telecom sector, as it directly affects revenue. Telecom companies are therefore looking to design novel methods to identify customers who are likely to churn, and suitable systems are required to overcome this growing challenge. Recently, integrating different clustering and classification models to develop hybrid learners (ensembles) has gained wide acceptance. Ensembles are gaining acceptance in the domain of big data since they reportedly achieve excellent predictions compared to single classifiers. Therefore, in this study, we propose an ensemble-based customer churn prediction (CCP) system that fully incorporates clustering and classification learning techniques. The proposed churn prediction model uses an ensemble of clustering and classification algorithms to improve CCP performance. Initially, clustering algorithms such as k-means, k-medoids, and random clustering are employed on the churn prediction datasets. Next, to enhance the results, a hybridization technique is applied using different ensemble algorithms to evaluate the performance of the proposed system. The above-mentioned clustering algorithms, integrated with different classifiers including Gradient Boosted Tree (GBT), Decision Tree (DT), Random Forest (RF), Deep Learning (DL), and Naive Bayes (NB), are evaluated on two standard telecom datasets acquired from Orange and Cell2Cell. The experimental results reveal that, compared to the bagging ensemble technique, the stacking-based hybrid model (k-medoids-GBT-DT-DL) achieves the top accuracies of 96% and 93.6% on the Orange and Cell2Cell datasets, respectively. The proposed method outperforms conventional state-of-the-art churn prediction algorithms.

1. Introduction

The size of the information and communication technology (ICT) market is growing as more communication companies enter the market as a result of globalization and liberalization. Customers are encouraged to switch from one service to another by the rise of efficient, modern services [1]. Customer satisfaction and service improvement are essential for customer retention, and customer relationship management (CRM) relies on this [2,3]. User retention can only be achieved by computing the numerous characteristics of a customer who is about to churn, but this computation is challenging because manual data collection is an ineffective and laborious task [4]. Continuous customer turnover is extremely damaging to a company's business. If a mobile telecommunications business could predict customer defection, it could take measures to retain the customers who are about to switch; the same holds in many sectors, for example e-commerce [5,6], finance [7], banking [8], and the telecommunications industry [9]. The industry strives to make minor adjustments in order to keep clients and revenue [10].
Losing an existing customer is painful in any business situation. Customer churn occurs in three modes. First, some individuals leave due to service provider issues such as network outages, costly rates, billing hassles, and so on. Second, certain clients have a habit of jumping from one company to another. Finally, some clients switch service providers for reasons the communication sector does not understand. This type of user, as well as the factors behind their switching, should be identified in advance [11]. Registering a fresh client is a long and costly process. Therefore, churn prediction and anticipation are the only ways for a telecommunications firm to survive [12]. Once a potential churn customer has been detected, the operator can offer incentives and gifts to retain those clients. Leading telecommunications firms use the strategy of attracting clients through promotions and e-mail marketing. Churn client patterns are examined, and appropriate deals are offered to various groups based on their categorization. Some may receive call discounts, while others may receive data discounts. Customers who are important to the firm are recognized during this classification, and appropriate high-value incentives are offered to them [11].
Many machine learning-based classification algorithms, including decision trees (DT), k-nearest neighbor (KNN), Naive Bayes (NB), neural networks (NN), and support vector machines (SVM), are used to identify customer churn [13,14,15,16]. Previously, researchers employed single classification methods for CCP. Ensemble-based classification algorithms have been the subject of recent development. These new strategies employ hybrid techniques that combine numerous single classifiers whose predictions are integrated, via a fusion method, into a single aggregated result [17].
Therefore, this study proposes an ensemble method that exploits the advantages of clustering and classification learning algorithms for telecom churn prediction, integrating supervised and unsupervised learning for CCP. A hybrid CCP approach fully incorporating clustering (k-means, k-medoids, and random) and classification (GBT, DT, RF, DL, and NB) models is suggested, and different ensemble models are compared to identify the top-performing one.
  • Clustering: the proposed model uses k-means, k-medoids, and random clustering in the clustering stage. This stage identifies the top clustering technique; the k-medoids technique performs best and overcomes several shortcomings of the k-means and random algorithms.
  • Classification: the proposed system evaluates different classifiers as single and hybrid models using two datasets. Through the classification stage, we select the most appropriate individual and hybrid classification models.
  • Ensemble-based churn prediction: in the churn prediction stage, the best clustering- and classification-based hybrid models are combined with an ensemble classifier to select the top CCP ensemble approach.
The rest of the paper is organized as follows. Section 2 reviews related work on customer churn prediction in the telecom industry. The materials and methods used in this research are elaborated in Section 3. In Section 4, the experimental results are analyzed. A performance comparison with other state-of-the-art techniques is presented in Section 5. The experimental results are discussed in Section 6, followed by the conclusion and a few directions for future work in Section 7.

2. Related Work

Enterprise architecture (EA) provides a whole vision of an organization, using sets of models or blueprints, along with its information technologies, business processes, and strategies [18,19,20,21]. Typically, classification methods anticipate future consumer behavior based on customer attributes found in personal demographics, account and billing information, and call details. Conventionally, data mining techniques for churn prediction were employed largely to effectively identify telecom churners.
For instance, robust churn prediction tools [22] have been developed using decision trees and neural networks. In [23], researchers designed a hybrid scheme that merges k-means and a rule-based method, improving prediction accuracy up to 89.70%. Similarly, ref. [24] presents a genetic algorithm-based neural network strategy to maximize the accuracy of telecom churner prediction; the authors compare their results with a z-score classification model. Before using a classification system, most churn prediction algorithms use feature extraction, sampling, or both. The authors of [25] published detailed research examining the effect of sampling approaches on CCP performance. In their research, they combined gradient boosting and weighted random forest with random undersampling and sophisticated undersampling algorithms to increase prediction performance. Furthermore, applying an advanced sampling strategy does not result in substantial improvements in prediction accuracy, as shown in the same study employing the CUBE sampling methodology, corroborating the point stated in [26]. Likewise, Verbeke W. et al. [27] revealed that simply replicating instances using oversampling does not improve classification results significantly. This also supports the idea that simply replicating minority classes via random oversampling or eliminating majority classes via random undersampling cannot increase CCP accuracy [25]. The problem with the above studies is that they cannot produce classification results that are acceptable for the telecom sector.
Additionally, numerous researchers have concentrated on developing classification techniques for telecom churn prediction employing only relevant information rather than a broader set of features. A multi-objective feature selection technique based on NSGA-II was presented by Huang B. et al. [28]. Another study [29] proposed a Bayesian Belief Network model to select and extract the most relevant characteristics for predicting customer churn. A hybrid two-phase mechanism using feature extraction has been presented in [30]. An ensemble of classification techniques is proposed for CCP using a rotation-based classification network [31]; the authors applied two classifiers, Rotation Forest and RotBoost with AdaBoost, and compared them with the Bagging, Random Forest, RSM, CART, and C4.5 algorithms. They also explored various feature extraction approaches, and the experimental results reveal that ICA-based Rotation Forest performs best among all the algorithms. Anouar Dalli [32] suggested a deep learning-based model for CCP in the telecommunication industry by changing the hyperparameters of an NN; the RMSProp optimizer outperformed other stochastic gradient descent (SGD)-based algorithms in terms of accuracy. Praveen Lalwani et al. [33] proposed a machine learning-based approach using logistic regression, NB, support vector machine, RF, DT, etc. to train the model. They also applied boosting and ensemble methods to check the accuracy of the proposed methods, finding that the top AUC score of 84% is obtained by both AdaBoost and XGBoost classifiers. Xin Hu et al. [34] designed a CCP model based on DT and neural networks (NN); the prediction results reveal that, compared with a single CCP model, the combined prediction approach produces greater accuracy and a better prediction effect. Hemlata Jain et al. [35] proposed a telecom churn prediction model using seven different machine learning experiments incorporating feature engineering and normalization techniques; this study proved more effective than previous models, with RF surpassing the other models at 95% accuracy.
A review of the literature reveals a few challenges that must be addressed to improve the performance of CCP. Researchers are still struggling to achieve high classification accuracy for CCP in the telecom industry, and previous work shows that ensemble-based classification models have not yet been explored much. The ensemble strategy substitutes single-algorithm classification, increases prediction accuracy, and is applicable to all approaches. Therefore, in this study, we propose an ensemble-based hybrid system that uses the benefits of supervised and unsupervised learning algorithms to increase the performance of CCP.

3. Materials and Methods

The suggested approach improves the effectiveness of churn prediction by employing a hybrid technique that combines several clustering algorithms, classifiers, and ensembles, including bagging and stacking. Figure 1 illustrates the complete architecture of the proposed system for CCP.

3.1. Datasets Collection

The datasets included in this study are easily accessible over the internet and are widely employed in current research on telecom churn prediction. Orange telecom offers its data on a web-based site [36]. The additional dataset provided by Cell2Cell is available on the website of Duke University's Center for Customer Relationship Management [27]. The Orange dataset is made available online with its original imbalanced class distribution, whereas the Cell2Cell dataset has been preprocessed and a balanced version is supplied for research purposes. The feature names in the Orange dataset are concealed to protect the confidentiality of client information. Some features in the Orange dataset have either no value or a single value; due to their lack of usefulness, these characteristics are eliminated from the dataset. Finally, both datasets contain a few nominal features that are translated into their corresponding numerical representations to maintain numerical format uniformity across the whole dataset. The properties of both datasets are displayed in Table 1.

3.2. Pre-Processing of the Dataset

Irregularities in the telecom datasets, such as missing values, duplicates, noise, and empty features, are handled through the WEKA tool. In addition, the nominal characteristics of the sample are converted to a numerical representation by classifying the examples into small, medium, and large classes according to the number of observations in each class [37]. The dataset is preprocessed to provide a consistent numerical structure for the training phase.

3.3. Clustering Algorithms

Clustering is an unsupervised machine learning task, also referred to as cluster analysis. A clustering algorithm is given a large amount of unlabeled input data and left to discover whatever patterns it can in the data; the resulting groups are called clusters. Data points that share similarities, as defined by their connections to neighboring data points, form a cluster. Cluster analysis has several applications, including feature engineering and pattern discovery, and it can help derive understanding from data one knows nothing about.
Clustering is performed after data has been cleaned and polished through preprocessing. The suggested approach uses clustering to boost prediction accuracy. The suggested model employs the following clustering techniques.

3.3.1. K-Means Clustering

By definition, the k-means clustering algorithm separates an input data set of N rows into k clusters, where k is always smaller than N. The k initial cluster centres (means) are chosen randomly; the algorithm then assigns each point to the nearest centre, recomputes the average of each cluster, and repeats until the required clusters have been achieved. The objective being minimized is [38]:
$$J = \sum_{i=1}^{m} \sum_{k=1}^{K} w_{ik} \left\lVert x_i - \mu_k \right\rVert^2$$
where $w_{ik} = 1$ if data point $x_i$ belongs to cluster $k$ and $w_{ik} = 0$ otherwise, and $\mu_k$ is the centroid of cluster $k$.
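As an illustrative sketch only (the paper's experiments use WEKA/operator-based tooling, not the code below), the objective $J$ can be minimized with an off-the-shelf k-means implementation; two clusters mirror the churn/non-churn grouping used later:

```python
# Illustrative sketch with scikit-learn (not the tooling used in the paper).
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 5)            # stand-in for preprocessed customer features
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_                   # cluster index for each customer
print(km.inertia_)                    # value of the objective J above
```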

3.3.2. K-Medoids Clustering

Kaufman and Rousseeuw presented the k-medoids algorithm in 1987; it is a partition-based clustering method similar to k-means, except that each cluster is represented by an actual data point (its medoid). Compared to k-means, k-medoids is less susceptible to being thrown off by noise and outliers [38]. The cost of each cluster can be determined using the following formula:
$$C = \sum_{i=1}^{n} \left| N_i - M_i \right|$$
where $N_i$ and $M_i$ are the entities between which the dissimilarity is computed.
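For concreteness, here is a minimal NumPy sketch of the medoid update, using Voronoi-style iteration rather than the full PAM procedure, and assuming Euclidean dissimilarity and non-empty clusters:

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_medoids(X, k=2, n_iter=100, seed=0):
    """Simplified k-medoids (Voronoi iteration); assumes clusters stay non-empty."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)   # initial medoid indices
    for _ in range(n_iter):
        labels = cdist(X, X[idx]).argmin(axis=1)      # assign points to nearest medoid
        new_idx = idx.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            cost = cdist(X[members], X[members]).sum(axis=1)
            new_idx[c] = members[cost.argmin()]       # member with minimal total dissimilarity
        if set(new_idx) == set(idx):                  # medoids unchanged: converged
            break
        idx = new_idx
    return labels, idx
```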

3.3.3. Random Clustering

Random clustering performs a random flat clustering of any dataset: samples are placed in clusters in no particular order, and some clusters may be empty.

3.4. Classification Algorithms

Following the clustering algorithms, the suggested model performs a classification task on both datasets. In this study, we compare several clustering techniques and then combine the best-performing one with various classification algorithms. The suggested model begins by evaluating the effectiveness of individual classifiers; next, we apply clustering and ensemble classifiers to reach the maximum possible accuracy. The most effective method for predicting customer churn is an amalgamation of clustering and ensemble classifiers, so this is the approach taken. The developed churn prediction algorithm makes use of the following classifiers.

3.4.1. K-Nearest Neighbor

Owing to its simplicity, the KNN algorithm is widely used in the field of data mining, and it is utilized in the construction of the churn prediction model. The algorithm can perform classification tasks without requiring any prior information about the distribution of the data. The KNN approach predicts the characteristics of an instance by comparing it to the most similar instances with known outcomes [39]. Mathematically, the distance can be calculated by applying the following formula:
$$D(n, m) = \sum_{j=1}^{c} \left| n_j - m_j \right|$$
where the sum runs over the $c$ attributes and $n$ and $m$ are the instances between which the distance is measured.
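A short scikit-learn sketch of this classifier (illustrative only; the synthetic data and the Manhattan metric are assumptions, since the paper does not report its KNN settings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(200, 5)                   # stand-in customer features
y = np.random.randint(0, 2, 200)             # 1 = churn, 0 = non-churn
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# metric="manhattan" matches the absolute-difference distance above.
knn = KNeighborsClassifier(n_neighbors=5, metric="manhattan").fit(X_train, y_train)
print(knn.score(X_test, y_test))             # accuracy on held-out data
```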

3.4.2. Decision Tree

In 1993, Quinlan used the name "decision tree" for a technique of "divide and conquer" [40]. The following formula determines the entropy of a dataset, from which the information gain of each feature is computed:
$$H(D) = -\sum_{i} p(i)\, \log_2 p(i)$$
where $D$ is the dataset, $i$ ranges over the classes in $D$, and $p(i)$ is the probability of each class.
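As a small worked example of how a tree scores a candidate split, entropy and information gain can be computed directly from class counts (a sketch, not the paper's code):

```python
import numpy as np

def entropy(labels):
    """H(D) = -sum_i p(i) * log2 p(i) over the class distribution of `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the subsets after the split."""
    gain = entropy(labels)
    for v in np.unique(feature_values):
        mask = feature_values == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain
```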

3.4.3. Gradient Boosted Tree

The machine learning technique of gradient boosting can be used for both regression and classification. The approach combines a large number of weak decision trees into a conclusive prediction model [41]. The goal of generating an ensemble of decision trees, as GBT does, is to boost prediction performance; since GBT sequentially combines an ensemble of weak estimators, it can work better than random forest. The prediction can be calculated by the following formula:
$$\hat{p}_j = \sum_{k} \theta_k x_{jk}$$
where $\hat{p}_j$ is the prediction generated from input $x_j$ and $\theta_k$ are the parameters that best fit the model.
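An illustrative scikit-learn sketch (the hyperparameters are common defaults, not the paper's settings; X_train/y_train reuse the synthetic split from the KNN sketch above):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Sequentially fits shallow trees, each correcting the previous ensemble's errors.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbt.fit(X_train, y_train)
print(gbt.score(X_test, y_test))
```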

3.4.4. Random Forest

Random forests [42], also known as random decision forests, are a type of ensemble learning technique for performing classification, regression, and other operations by establishing a large number of decision trees during training and then producing an output class that is the mode of the classes or the average prediction (regression) of the individual trees.
To calculate the Gini Index, start with one and deduct the sum of squared probabilities for each class from that number. The Gini Index can be formally represented as follows:
$$Gini = 1 - \sum_{x=1}^{c} (p_x)^2$$
where $p_x$ represents the probability of an item being assigned to a particular class.
Information gain can alternatively be calculated from entropy, which is the negative sum, over all classes, of the class probability multiplied by the base-2 logarithm of that probability:
$$Entropy = -\sum_{x=1}^{c} p_x \log_2(p_x)$$
Here $p_x$ again denotes the probability of class $x$.
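A brief scikit-learn sketch in which `criterion` can be set to either impurity measure above (again reusing the synthetic split from the KNN sketch):

```python
from sklearn.ensemble import RandomForestClassifier

# criterion="gini" uses the Gini Index; criterion="entropy" uses information gain.
rf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```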

3.4.5. Deep Learning

This method stacks a large number of neuron layers, making it a "multi-layer" methodology. It is an artificial system employed to address the most challenging data mining issues. Using back-propagation and stochastic gradient descent, deep learning trains artificial neural networks (ANN) with many interconnected layers. Neurons with tanh, rectifier, and maxout activation functions can be found in the network's many hidden layers. High predictive accuracy is made possible by state-of-the-art features such as an adaptive learning rate, rate annealing, momentum training, dropout, and L1 or L2 regularization. Using multi-threaded (asynchronous) training, each compute node learns a subset of the global model's parameters on its own data and periodically contributes to the global model by way of model averaging across the network.
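As a rough stand-in (an assumption, not the paper's configuration), scikit-learn's MLP covers a subset of the features described above, tanh activation, SGD with momentum, L2 regularization, and an adaptive learning rate, but not maxout, dropout, or distributed model averaging:

```python
from sklearn.neural_network import MLPClassifier

dl = MLPClassifier(hidden_layer_sizes=(64, 32),   # two hidden layers (illustrative)
                   activation="tanh",
                   solver="sgd", momentum=0.9,    # SGD with momentum training
                   alpha=1e-4,                    # L2 regularization strength
                   learning_rate="adaptive",
                   max_iter=500, random_state=0)
dl.fit(X_train, y_train)                          # split from the KNN sketch above
```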

3.4.6. Naive Bayes

Abstractly, Naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $x = (x_1, \ldots, x_n)$ of $n$ features (independent variables), it assigns to this instance probabilities $p(C_k \mid x_1, \ldots, x_n)$ for each of $K$ possible outcomes or classes $C_k$. The issue with this formulation is that it becomes impossible to build a model using probability tables if the number of features $n$ is large or if a feature can take on an enormous number of values, so the model must be restructured to make it manageable. Bayes' theorem allows us to decompose the conditional probability into constituent parts, which we can then use to make predictions [43]:
$$p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)}$$
where $C_k$ is the class whose probability is being computed, conditioned on the value of the feature vector $x$. In terms of Bayesian probability, the above equation can simply be written as
$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
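A minimal sketch; Gaussian NB is assumed here, since the paper does not specify which variant it used:

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB().fit(X_train, y_train)   # split from the KNN sketch above
posterior = nb.predict_proba(X_test)      # p(C_k | x) for each class, per sample
```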

3.5. Ensemble Classifiers

The following ensemble techniques are employed in this study to increase the performance of the proposed system.

3.5.1. Voting

The outcomes of separate classification algorithms are combined via majority vote using a voting technique. Class labels are assigned to the test data by each classifier individually, the results are aggregated via voting, and a final class prediction is made by taking the class with the highest number of votes [44]. The following equation performs majority voting on a dataset:
$$\sum_{c=1}^{C} D_{c,i}(x) = \max_{i=1,2,\ldots,n} \sum_{c=1}^{C} D_{c,i}$$
where $C$ is the number of classifiers, $D_{c,i}$ is the decision of classifier $c$ for class $i$, and $n$ is the number of classes.
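A hard-voting sketch with scikit-learn (the member set is illustrative, not the paper's exact combination; `gbt` is the earlier GBT sketch):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# voting="hard" implements the majority rule above.
vote = VotingClassifier(
    estimators=[("gbt", gbt), ("dt", DecisionTreeClassifier()), ("nb", GaussianNB())],
    voting="hard",
).fit(X_train, y_train)
```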

3.5.2. Bagging

Bootstrap aggregation is also known as "bagging". It is an ensemble technique in which each model is trained on a bootstrap "bag" of samples, which may contain alike or unlike entities. To improve the performance of prediction models, it helps reduce the variance of the classifiers employed in those models [45].
$$V_{t,i} = \begin{cases} 1 & \text{if } h_t \text{ picks class } w_i \\ 0 & \text{otherwise} \end{cases}$$
where $h_t$ is the classifier trained on bootstrap sample $t$ and $w_i$ is a class label.
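A minimal bagging sketch (the base learner and ensemble size are illustrative choices):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each tree sees a bootstrap "bag" of the training data; predictions are
# aggregated by majority vote, which is what reduces variance.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
bag.fit(X_train, y_train)                 # split from the KNN sketch above
```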

3.5.3. Stacking

Instead of choosing one learner over another, stacking combines them all [46], and it can outperform any of the individual trained models. Trainable classifiers are fed bootstrapped samples of the training data. In stacking, classifiers are divided into two tiers: Tier-1 and Tier-2. Classifiers in the first tier are trained to make predictions on the bootstrapped data, and those predictions are then used to train the second tier. This ensures that the training data is used effectively for learning.
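A two-tier stacking sketch; the Tier-1 members echo the paper's GBT-DT-DL combination, while the logistic-regression Tier-2 learner is an assumption (`gbt` and `dl` come from the earlier sketches):

```python
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Tier-1 learners produce out-of-fold predictions; the Tier-2 (final) estimator
# is trained on those predictions.
stack = StackingClassifier(
    estimators=[("gbt", gbt), ("dt", DecisionTreeClassifier()), ("dl", dl)],
    final_estimator=LogisticRegression(),
).fit(X_train, y_train)
```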

3.6. Proposed Framework

The fusion of multiple models can improve performance: clustering methods such as k-means, k-medoids, and random clustering are combined with the Gradient Boosted Tree, Decision Tree, Random Forest, Naive Bayes, and Deep Learning classifiers, and ensemble machine learning algorithms such as voting, bagging, boosting, and stacking are integrated on top. The steps involved in the proposed method are listed below; a code sketch of the resulting pipeline follows the list.
  • Initially, the clustering algorithms are employed and the top clustering algorithm is selected. We used 2 clusters in the clustering algorithms. In our experiments, the k-medoids technique outperforms k-means and random clustering.
  • Next, each single classifier among the GBT, DT, RF, NB, and DL classification algorithms is implemented.
  • Afterward, models combining k-medoids with each single classifier are implemented. This hybrid technique attains superior performance in comparison to single clustering or classification algorithms.
  • Then, models combining k-medoids clustering with hybrid classifiers are designed, and the results are evaluated in terms of accuracy, recall, precision, and F-measure.
  • Finally, ensemble models such as bagging and stacking are incorporated with the best previous hybrid model, which achieves the top results in contrast to all the above experiments.
  • It is clear from the evaluation that the proposed combination of clustering and ensemble models achieves the highest prediction accuracy compared to the other methods.
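The paper does not spell out exactly how the cluster assignments feed the classifiers; one plausible reading, appending the k-medoids cluster index as an extra feature before training the stacking ensemble, is sketched below (synthetic data; `k_medoids` is the sketch from Section 3.3.2):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

X = np.random.rand(500, 5)                 # stand-in for a preprocessed dataset
y = np.random.randint(0, 2, 500)           # 1 = churn, 0 = non-churn

# Step 1: cluster into 2 groups, then append the cluster index as a feature.
cluster_labels, _ = k_medoids(X, k=2)
X_aug = np.column_stack([X, cluster_labels])

# Steps 2-5: train the stacking analogue of k-medoids-GBT-DT-DL.
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
stack = StackingClassifier(
    estimators=[("gbt", GradientBoostingClassifier()),
                ("dt", DecisionTreeClassifier()),
                ("dl", MLPClassifier(max_iter=500))],
    final_estimator=LogisticRegression(),
).fit(X_tr, y_tr)
print(stack.score(X_te, y_te))             # accuracy on the held-out split
```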

Map Clustering on the Label

After clustering, a mapping is applied to produce TP, TN, FN, and FP counts from the Orange and Cell2Cell datasets. Cluster 0 is mapped to prediction class 0 and cluster 1 to prediction class 1. Suppose we have a table with three columns: cluster 0, cluster 1, and churn. Mapping produces a fourth column, the prediction (churn), which contains classes 0 and 1. Classes 0 and 1 are used as a mapping for the cluster 0 and cluster 1 data, allowing TP, TN, FN, and FP to be produced. Clusters 0 and 1 are the two groups that emerge from applying clustering to the dataset; the values of the prediction class are compared against the clusters to see whether they fall within them. Therefore, the clusters are examined by first mapping them to the prediction class. A TN is produced if a value of cluster 0 falls into prediction class 0, and a TP is produced if a value of cluster 1 falls into prediction class 1. Likewise, an FP is produced if a value of cluster 0 falls into prediction class 1, and an FN is produced if a value of cluster 1 falls into prediction class 0. Hence, mapping is used for cluster analysis.
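In code, the mapping reduces to comparing the cluster index with the prediction class (a sketch of the rule described above):

```python
import numpy as np

def map_clusters_to_confusion(cluster, churn):
    """cluster: 0/1 labels from clustering; churn: 0/1 prediction class."""
    tn = int(np.sum((cluster == 0) & (churn == 0)))  # cluster 0 in prediction class 0
    tp = int(np.sum((cluster == 1) & (churn == 1)))  # cluster 1 in prediction class 1
    fp = int(np.sum((cluster == 0) & (churn == 1)))  # cluster 0 in prediction class 1
    fn = int(np.sum((cluster == 1) & (churn == 0)))  # cluster 1 in prediction class 0
    return tp, tn, fp, fn
```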

4. Results

In this section, we describe the evaluation metrics used in this study and comprehensively explain the experimental results of the proposed scheme. Additionally, the discussion analyzes the overall performance of the designed model and briefly compares it with conventional approaches published for CCP.

4.1. Evaluation Measures

This segment of the paper concisely presents the performance assessment metrics of the proposed model. All exploited performance evaluation metrics are defined below.
Accuracy: Accuracy is the proportion of correctly predicted examples among all predictions made by the model. Mathematically, it can be defined as:
$$Accuracy = \frac{T_p + T_n}{T_p + F_p + T_n + F_n}$$
Precision: Precision estimates what fraction of the examples identified as positive by the algorithm truly belong to the positive class. It can be calculated as:
$$Precision = \frac{T_p}{T_p + F_p}$$
Recall: Recall estimates what fraction of the samples that actually belong to the positive class is correctly predicted as positive by the model. Mathematically, it can be computed as:
$$Recall = \frac{T_p}{T_p + F_n}$$
F-measure: The F-measure is the harmonic mean of precision and recall. It can be defined as:
$$F\text{-}measure = \frac{2 \cdot p \cdot r}{p + r}$$
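The four measures follow directly from the confusion counts; a small self-contained sketch with toy counts:

```python
def churn_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

print(churn_metrics(tp=90, tn=80, fp=10, fn=20))  # toy counts for illustration
```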

4.2. Performance Analysis Based on Clustering Algorithms

In experiment 1, various unsupervised approaches are compared by employing popular clustering models: k-means, k-medoids, and random clustering. The technique is built on clustering, in which customers with similar perspectives on the business are grouped to allow categorization into churn or non-churn. The performance of the clustering techniques is shown as bar graphs in Figure 2, where the x-axis indicates the clustering method and the y-axis reflects the performance values, including accuracy, recall, precision, and F-measure, achieved by each method. Figure 2a presents the results on the Orange dataset, while Figure 2b presents the results on the Cell2Cell dataset. The bar graphs show that the k-medoids technique reaches 75.44% and 75.56% accuracy on the Orange and Cell2Cell datasets, respectively; through this comparison, we can see that the k-medoids method outperforms the other methods.

4.3. Performance Analysis Based on Classification Algorithms

In the second experiment, five different classification algorithms (i.e., GBT, DT, RF, NB, and DL) are employed as single classifiers. Model competence is computed based on precision, recall, accuracy, and F-measure. Among the five single-classifier models, GBT demonstrates the highest accuracy at 92.98% and 93.19% on the Orange and Cell2Cell datasets, respectively, while the DT classifier shows the lowest results, achieving 85.40% and 87.04% on the Orange and Cell2Cell datasets. Figure 3a,b illustrate the results of the classifiers evaluated in this research.

4.4. Combining the k-Medoids Clustering Algorithm with Each Single Classifier

In the third experiment, we build a hybrid network by merging the k-medoids clustering method with every single classifier. DT, RF, GBT, DL, and NB are the key classifiers hybridized with k-medoids. The dataset is first sorted into its relative clusters using the k-medoids algorithm; afterwards, the clusters are separated into training and testing examples. Based on the training data, the classification algorithms construct the trained network of the CCP system, which is then used to evaluate the system. Precision, recall, F-measure, and accuracy are the key performance indicators used to assess the system, as shown in Figure 4a,b. The GBT classifier is superior to the other classifiers, as it improves classification performance: k-medoids with GBT achieves 94% and 92.25% accuracy on the Orange and Cell2Cell datasets. The other indicators are also shown in Figure 4a,b in the form of bar graphs.

4.5. Combining the k-Medoids Clustering Algorithm with Hybrid Classifiers

In the fourth experiment, we construct a hybrid network by combining multiple classifiers with the k-medoids clustering method to enhance the performance of the proposed CCP system. Classification results from the different algorithms are combined using the voting process: data is classified by each classification algorithm, and the outcomes are then integrated via voting to produce a final class prediction based on the maximum number of votes for a given class. The experimental results of this hybrid model reveal that merging different classifiers with k-medoids clustering performs better than executing each single classifier alone. Moreover, among all these experiments, the hybrid classifier (GBT-DT-DL) combined with k-medoids clustering shows the best accuracy of 95.05% and 93.40% on the Orange and Cell2Cell datasets, respectively. Figure 5a,b show the results in the form of bar graphs using various indicators, such as precision, recall, F-measure, and accuracy, for both datasets.

4.6. Ensemble Classifiers Combined with k-Medoids and Hybrid Classifiers

In the fifth experiment, the ensemble classifiers bagging and stacking are utilized to further improve the experimental results. In this method, we combine the ensemble classifiers with k-medoids and the hybrid classifiers and achieve higher results than in the above experiments. In bagging, a voting operator is employed to create a diverse combination of classifiers; in the testing phase, apply-model and classification-performance operators are applied to calculate the performance of the hybrid approach. Figure 6a shows that the ensemble method (k-medoids-GBT-DT-DL) with bagging obtains the top performance of 95.12% accuracy on the Orange dataset, and Figure 6b shows that the same method obtains the top performance of 93.43% accuracy on the Cell2Cell dataset. Figure 6c shows a maximum accuracy of 96% on the Orange dataset using the ensemble method (k-medoids-GBT-DT-DL) with stacking, and Figure 6d indicates the highest accuracy of 93.6% on the Cell2Cell dataset with the same stacking ensemble. Furthermore, all other measures, including precision, recall, and F-measure, also perform better using the ensemble technique in contrast to the other methods.

5. Performance Comparison with Other Existing Approaches

The proposed ensemble model integrates clustering- and classification-based algorithms to handle the problems posed by massive telecom datasets. This research aims to address the fundamental problems of the telecom CCP challenge in a more systematic and intuitive manner. Some existing studies on CCP using the Orange and Cell2Cell datasets are summarized below.
Ahmed and Maheswari [47] developed a metaheuristic-based churn prediction model using the Orange dataset. A hybrid firefly algorithm was applied as the classifier for CCP, with the compute-intensive comparison module of the firefly algorithm replaced by simulated annealing. This model delivers efficient results and achieves 86.38% accuracy on the Orange dataset.
Idris and Khan [46] proposed a novel intelligent model using filter-wrapper-based ensemble churn prediction (FW-ECP) for the telecom sector. The originality of FW-ECP lies in its capability to integrate filter- and wrapper-based feature selection with the learning potential of an ensemble classifier constructed from multiple base classifiers. They evaluated and compared the results of the FW-ECP model on two publicly accessible datasets, attaining 79.4% accuracy on the Orange dataset and 84.9% on the Cell2Cell dataset.
Vijaya and Sivasankar [48] designed an efficient model for CCP by combining particle swarm optimization (PSO) and feature selection with simulated annealing (FSSA). The authors designed three PSO variants: PSO combined with feature selection as a pre-processing scheme, PSO embedded with simulated annealing, and PSO hybridized with both feature selection and simulated annealing. Various classification algorithms, including DT, NB, KNN, SVM, and RF, and three hybrid algorithms were applied to analyze model performance. The proposed PSO-FSSA model achieves 94.08% accuracy, and the hybrid ANN-MLR model achieves 85.44% accuracy.
Irina V. Pustokhina et al. [49] proposed an improved system for CCP using the Orange telecom dataset. The authors employed a hybrid method called ISMOTE-OWELM: the improved synthetic minority over-sampling technique (ISMOTE) deals with the imbalanced dataset, and the rain optimization algorithm (ROA) estimates the ideal sampling rate. In the last step, the optimally weighted extreme learning machine (OWELM) method establishes the class labels of the sampled data. Three different datasets were used to evaluate model efficiency, with the Orange dataset attaining a 92% accuracy rate.
Muhammad Usman et al. [50] designed and implemented a system for CCP based on a comparative analysis of learning architectures. Two standard datasets, Cell2Cell and KDD Cup, were used. They determined that long short-term memory (LSTM) networks obtained 72.7% classification accuracy on the Cell2Cell dataset and 89.3% on the KDD Cup dataset.
Samah Wael Fujo et al. [51] implemented a deep-learning-based artificial neural network (Deep-BP-ANN) for CCP in the telecommunication industry. The authors employed two feature selection techniques, variance thresholding and lasso regression; moreover, early stopping was applied to overcome overfitting and to end training at the appropriate time. The IBM Telco and Cell2Cell datasets were used to evaluate the performance of the implemented model. Experimental results reveal 88.12% accuracy on IBM Telco and 79.38% accuracy on Cell2Cell.
Praseeda and Shivakumar [52] proposed a model for CCP using both classification and clustering algorithms. The fuzzy particle swarm optimization (FPSO) technique was applied for feature selection, and the divergence kernel-based support vector machine (DKSVM) technique was used to categorize churn customers. After the classification step, the hybrid kernel distance-based possibilistic fuzzy local information C-means (HKD-PFLICM) model was employed for cluster-based retention. The results reveal that the proposed approach achieved a 76.51% accuracy rate on the Cell2Cell dataset and outperformed other existing algorithms.

6. Discussion

This study established an ensemble-based hybrid model using supervised and unsupervised learning approaches, producing the following results. Experiment 1 (Figure 2) reveals that the k-medoids clustering technique achieved the top performance of 75.44% and 75.56% on the Orange and Cell2Cell datasets. In experiment 2 (Figure 3), single classification algorithms were employed, obtaining accuracies of 92.98%/93.19% for GBT, 85.4%/87.04% for DT, 86.62%/88.24% for RF, 91.38%/92.41% for DL, and 86.02%/87.79% for NB on the Orange and Cell2Cell datasets, respectively; among the single classifiers, GBT outperforms the others in terms of accuracy. In experiment 3 (Figure 4), combinations of k-medoids and single classification algorithms were evaluated: k-medoids-GBT achieved the highest accuracies of 94% and 92.25% on the two datasets, while the k-medoids-DL model performed well and took second place with 93.5% and 92.53% accuracy. Next, in experiment 4 (Figure 5), a voting-based hybrid classifier with the k-medoids clustering technique was applied to check the effect on accuracy and the other measures: k-medoids-GBT-DT-DL achieved 95.06% accuracy on the Orange dataset and 93.40% on the Cell2Cell dataset, and the k-medoids-GBT-DT-NB model also obtained better results than the other hybrid models in this experiment. In experiment 5 (Figure 6a,b), bagging-based hybrid classifiers with k-medoids ensemble models were evaluated. The k-medoids-GBT-DT-DL model gives a superior performance of 95.12% and 93.43% accuracy on the two datasets compared to all other models; as in experiment 4, the k-medoids-GBT-DT-NB model performs next best after k-medoids-GBT-DT-DL, with accuracies of 93.64% and 92.98%. Lastly, in experiment 6, stacking-based hybrid classifiers with k-medoids ensemble models were developed to further enhance performance. As the bar graphs show (Figure 6c,d), and as in experiment 5, the k-medoids-GBT-DT-DL combination obtained the highest results among all combinations applied to the Orange and Cell2Cell datasets: 95.34% accuracy, 77.09% recall, 83.51% precision, and 80.17% F-measure on the Orange dataset, and 93.43% accuracy, 67.45% recall, 79.10% precision, and 72.81% F-measure on the Cell2Cell dataset. Furthermore, this study compares existing methods with the proposed ensemble method in Table 2, which illustrates that the proposed approach for CCP outperforms traditional approaches. Moreover, Figure 7 depicts the accuracy comparison among all methods used in our study: the bar graph shows the lowest performance from the single k-medoids clustering algorithm, while the Hybrid 4 (k-medoids-Stacking-GBT-DT-DL) algorithm obtains extremely significant results on both datasets. The proposed stacking-based ensemble algorithm thus achieves the highest accuracy rate among all experiments. The experimental results indicate that the proposed ensemble model evaluated on the Orange dataset achieved 96%, 91.61%, and 90.23% accuracy, recall, and F-measure, respectively, while on the Cell2Cell dataset it acquired 93.6%, 85.45%, and 83.72%. To summarize, the results of all experiments demonstrate that hybrid models consistently outperform single clustering/classifier-based approaches.
On both datasets, the proposed model exhibits superior classification performance in terms of accuracy, recall (sensitivity), and F-measure. Multiple existing techniques have been applied to the two datasets used in this research; however, our stacking-based ensemble k-medoids-GBT-DT-DL model produced higher outcomes.

7. Conclusions

In summary, the current work focuses on the development of highly effective ensemble-based customer churn prediction models. In this study, the Orange and Cell2Cell telecom churn prediction datasets are employed to develop CCP models. The initial data contain a wide range of values, so the samples are normalized. After preprocessing, the data are clustered using unsupervised methods, including k-means, k-medoids, and random clustering, and classification is then performed with the set of classifiers examined in this study. The stacking-based ensemble model (k-medoids-Stacking-GBT-DT-DL) is the most accurate of all the ensemble models. Performance is evaluated based on precision, recall, and accuracy, and the ensemble system outperforms the single classification methods in terms of accuracy. The results could be improved further using larger datasets and state-of-the-art deep learning-based methods; however, these methods require more computational resources because of their deep architectures. In the future, this study can be extended using deep learning-based approaches to enhance the performance of customer churn prediction in the telecom sector. Future efforts will also be devoted to enhancing the speed and effectiveness of this system.

Author Contributions

Each author took part in the present work conception and/or design. Tasks of data collection, material preparation, data analysis, and writing of the original draft were executed by R.L. and S.A.; R.L., S.A. and S.F.B. equally contributed in whole research. Z.S., A.I., A.A. (Abdullah Almuhaimeed), G.S. and A.A. (Abdulkareem Alzahrani) helped in writing, reviewing, and editing of the manuscript. A.A. (Abdullah Almuhaimeed) and A.A. (Abdulkareem Alzahrani) supervised this study. All authors have read and agreed to the published version of the manuscript.

Funding

This study was not supported by any external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mattison, R. Telecom Churn Management: The Golden Opportunity; APDG Pub: Fuquay-Varina, NC, USA, 2001. [Google Scholar]
  2. Payne, A.; Frow, P. A strategic framework for customer relationship management. J. Mark. 2005, 69, 167–176. [Google Scholar] [CrossRef]
  3. Reinartz, W.; Krafft, M.; Hoyer, W.D. The customer relationship management process: Its measurement and impact on performance. J. Mark. Res. 2004, 41, 293–305. [Google Scholar] [CrossRef]
  4. Neslin, S.A.; Gupta, S.; Kamakura, W.; Lu, J.; Mason, C.H. Defection detection: Measuring and understanding the predictive accuracy of customer churn models. J. Mark. Res. 2006, 43, 204–211. [Google Scholar] [CrossRef]
  5. Liu, C.J.; Huang, T.S.; Ho, P.T.; Huang, J.C.; Hsieh, C.T. Machine learning-based e-commerce platform repurchase customer prediction model. PLoS ONE 2020, 15, e0243105. [Google Scholar] [CrossRef]
  6. Gulc, A. Multi-stakeholder perspective of courier service quality in B2C e-commerce. PLoS ONE 2021, 16, e0251728. [Google Scholar] [CrossRef]
  7. Abbasimehr, H.; Bahrini, A. An analytical framework based on the recency, frequency, and monetary model and time series clustering techniques for dynamic segmentation. Expert Syst. Appl. 2022, 192, 116373. [Google Scholar] [CrossRef]
  8. Carbo-Valverde, S.; Cuadros-Solas, P.; Rodríguez-Fernández, F. A machine learning approach to the digitalization of bank customers: Evidence from random and causal forests. PLoS ONE 2020, 15, e0240362. [Google Scholar] [CrossRef]
  9. Zhou, J.; Zhai, L.; Pantelous, A.A. Market segmentation using high-dimensional sparse consumers data. Expert Syst. Appl. 2020, 145, 113136. [Google Scholar] [CrossRef]
  10. Van den Poel, D.; Lariviere, B. Customer attrition analysis for financial services using proportional hazard models. Eur. J. Oper. Res. 2004, 157, 196–217. [Google Scholar] [CrossRef]
  11. Reinartz, W.J.; Kumar, V. The impact of customer relationship characteristics on profitable lifetime duration. J. Mark. 2003, 67, 77–99. [Google Scholar] [CrossRef] [Green Version]
  12. Lin, S.C.; Tung, C.H.; Jan, N.Y.; Chiang, D.A. Evaluating churn model in CRM: A case study in Telecom. J. Converg. Inf. Technol. 2011, 6. [Google Scholar] [CrossRef]
  13. Hwang, H.; Jung, T.; Suh, E. An LTV model and customer segmentation based on customer value: A case study on the wireless telecommunication industry. Expert Syst. Appl. 2004, 26, 181–188. [Google Scholar] [CrossRef]
  14. Larivière, B.; Van den Poel, D. Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Syst. Appl. 2005, 29, 472–484. [Google Scholar] [CrossRef]
  15. Wei, C.P.; Chiu, I.T. Turning telecommunications call details to churn prediction: A data mining approach. Expert Syst. Appl. 2002, 23, 103–112. [Google Scholar] [CrossRef]
  16. Xia, G.E.; Jin, W.D. Model of customer churn prediction on support vector machine. Syst.-Eng.-Theory Pract. 2008, 28, 71–77. [Google Scholar] [CrossRef]
  17. Dietterich, T.G. Ensemble methods in machine learning. In Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15. [Google Scholar]
  18. Van den Berg, M.; Slot, R.; van Steenbergen, M.; Faasse, P.; van Vliet, H. How enterprise architecture improves the quality of IT investment decisions. J. Syst. Softw. 2019, 152, 134–150. [Google Scholar] [CrossRef]
  19. Kornyshova, E.; Barrios, J. Industry 4.0 impact propagation on enterprise architecture models. Procedia Comput. Sci. 2020, 176, 2497–2506. [Google Scholar] [CrossRef]
  20. Kotusev, S.; Kurnia, S.; Dilnutt, R. The practical roles of enterprise architecture artifacts: A classification and relationship. Inf. Softw. Technol. 2022, 147, 106897. [Google Scholar] [CrossRef]
  21. Górski, T. Towards Enterprise Architecture for Capital Group in Energy Sector. In Proceedings of the 2018 IEEE 22nd International Conference on Intelligent Engineering Systems (INES), Las Palmas de Gran Canaria, Spain, 21–23 June 2018; pp. 000239–000244. [Google Scholar]
  22. Hung, S.Y.; Yen, D.C.; Wang, H.Y. Applying data mining to telecom churn management. Expert Syst. Appl. 2006, 31, 515–524. [Google Scholar] [CrossRef]
  23. Huang, Y.; Kechadi, T. An effective hybrid learning system for telecommunication churn prediction. Expert Syst. Appl. 2013, 40, 5635–5647. [Google Scholar] [CrossRef]
  24. Pendharkar, P.C. Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services. Expert Syst. Appl. 2009, 36, 6714–6720. [Google Scholar] [CrossRef]
  25. Burez, J.; Van den Poel, D. Handling class imbalance in customer churn prediction. Expert Syst. Appl. 2009, 36, 4626–4636. [Google Scholar] [CrossRef]
  26. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part (Appl. Rev.) 2011, 42, 463–484. [Google Scholar] [CrossRef]
  27. Verbeke, W.; Dejaeger, K.; Martens, D.; Hur, J.; Baesens, B. New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. Eur. J. Oper. Res. 2012, 218, 211–229. [Google Scholar] [CrossRef]
  28. Huang, B.; Buckley, B.; Kechadi, T.M. Multi-objective feature selection by using NSGA-II for customer churn prediction in telecommunications. Expert Syst. Appl. 2010, 37, 3638–3646. [Google Scholar] [CrossRef]
  29. Kisioglu, P.; Topcu, Y.I. Applying Bayesian Belief Network approach to customer churn analysis: A case study on the telecom industry of Turkey. Expert Syst. Appl. 2011, 38, 7151–7157. [Google Scholar] [CrossRef]
  30. Xu, H.; Zhang, Z.; Zhang, Y. Churn prediction in telecom using a hybrid two-phase feature selection method. In Proceedings of the 2009 Third International Symposium on Intelligent Information Technology Application, Nanchang, China, 21–22 November 2009; Volume 3, pp. 576–579. [Google Scholar]
  31. De Bock, K.W.; Van den Poel, D. An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction. Expert Syst. Appl. 2011, 38, 12293–12301. [Google Scholar] [CrossRef]
  32. Dalli, A. Impact of Hyperparameters on Deep Learning Model for Customer Churn Prediction in Telecommunication Sector. Math. Probl. Eng. 2022, 2022, 4720539. [Google Scholar] [CrossRef]
  33. Lalwani, P.; Mishra, M.K.; Chadha, J.S.; Sethi, P. Customer churn prediction system: A machine learning approach. Computing 2022, 104, 271–294. [Google Scholar] [CrossRef]
  34. Hu, X.; Yang, Y.; Chen, L.; Zhu, S. Research on a customer churn combination prediction model based on decision tree and neural network. In Proceedings of the 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 10–13 April 2020; pp. 129–132. [Google Scholar]
  35. Jain, H.; Khunteta, A.; Shrivastav, S.P. Telecom Churn Prediction Using Seven Machine Learning Experiments integrating Features engineering and Normalization. 2021, 1–25. Available online: https://www.researchsquare.com/article/rs-239201/v1 (accessed on 14 July 2022). [CrossRef]
  36. Miller, H.; Clarke, S.; Lane, S.; Lonie, A.; Lazaridis, D.; Petrovski, S.; Jones, O. Predicting customer behaviour: The University of Melbourne’s KDD Cup report. In Proceedings of the KDD-Cup 2009 Competition, PMLR, Paris, France, 28 June–1 July 2009; pp. 45–55. [Google Scholar]
  37. Sorokina, D. Application of additive groves ensemble with multiple counts feature evaluation to KDD cup’09 small data set. In Proceedings of the KDD-Cup 2009 Competition, PMLR, Paris, France, 28 June–1 July 2009; pp. 101–109. [Google Scholar]
  38. Gajowniczek, K.; Orłowski, A.; Ząbkowski, T. Insolvency modeling with generalized entropy cost function in neural networks. Phys. Stat. Mech. Its Appl. 2019, 526, 120730. [Google Scholar] [CrossRef]
  39. Sjarif, N.; Rusydi, M.; Yusof, M.; Hooi, D.; Wong, T.; Yaakob, S.; Ibrahim, R.; Osman, M. A customer Churn prediction using Pearson correlation function and K nearest neighbor algorithm for telecommunication industry. Int. J. Adv. Soft Compu. Appl. 2019, 11, 46–59. [Google Scholar]
  40. Salzberg, S.L. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach. Learn. 1994, 16, 235–240. [Google Scholar] [CrossRef]
  41. Stearns, B.; Rangel, F.M.; Rangel, F.; de Faria, F.F.; Oliveira, J.; Ramos, A.A.d.S. Scholar Performance Prediction using Boosted Regression Trees Techniques. In Proceedings of the The European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 26–28 April 2017. [Google Scholar]
  42. Idris, A.; Khan, A. Customer churn prediction for telecommunication: Employing various features selection techniques and tree based ensemble classifiers. In Proceedings of the 2012 15th International Multitopic Conference (INMIC), Islamabad, Pakistan, 13–15 December 2012; pp. 23–27. [Google Scholar]
  43. Yulianti, Y.; Saifudin, A. Sequential feature selection in customer churn prediction based on Naive Bayes. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Ulaanbaatar, Mongolia, 10–13 September 2020; IOP Publishing: Bristol, UK, 2020; Volume 879, p. 012090. [Google Scholar]
  44. Gupta, M.K.; Chandra, P. A comprehensive survey of data mining. Int. J. Inf. Technol. 2020, 12, 1243–1257. [Google Scholar] [CrossRef]
  45. Dudoit, S.; Fridlyand, J. Bagging to improve the accuracy of a clustering procedure. Bioinformatics 2003, 19, 1090–1099. [Google Scholar] [CrossRef]
  46. Idris, A.; Khan, A. Churn prediction system for telecom using filter–wrapper and ensemble classification. Comput. J. 2017, 60, 410–430. [Google Scholar] [CrossRef]
  47. Ahmed, A.A.; Maheswari, D. Churn prediction on huge telecom data using hybrid firefly based classification. Egypt. Inform. J. 2017, 18, 215–220. [Google Scholar] [CrossRef]
  48. Vijaya, J.; Sivasankar, E. An efficient system for customer churn prediction through particle swarm optimization based feature selection model with simulated annealing. Clust. Comput. 2019, 22, 10757–10768. [Google Scholar] [CrossRef]
  49. Pustokhina, I.V.; Pustokhin, D.A.; Nguyen, P.T.; Elhoseny, M.; Shankar, K. Multi-objective rain optimization algorithm with WELM model for customer churn prediction in telecommunication sector. Complex Intell. Syst. 2021, 1–13. [Google Scholar] [CrossRef]
  50. Usman, M.; Ahmad, W.; Fong, A. Design and Implementation of a System for Comparative Analysis of Learning Architectures for Churn Prediction. IEEE Commun. Mag. 2021, 59, 86–90. [Google Scholar] [CrossRef]
  51. Wael Fujo, S.; Subramanian, S.; Ahmad Khder, M. Customer Churn Prediction in Telecommunication Industry Using Deep Learning. Inf. Sci. Lett. 2022, 11, 24. [Google Scholar]
  52. Praseeda, C.; Shivakumar, B. Fuzzy particle swarm optimization (FPSO) based feature selection and hybrid kernel distance based possibilistic fuzzy local information C-means (HKD-PFLICM) clustering for churn prediction in telecom industry. SN Appl. Sci. 2021, 3, 1–18. [Google Scholar] [CrossRef]
Figure 1. Proposed CCP framework.
Figure 2. (a) Results analysis using single clustering methods on the Orange dataset. (b) Results analysis using single clustering methods on the Cell2Cell dataset.
Figure 3. (a) Results analysis using single classification methods on the Orange dataset. (b) Results analysis using single classification methods on the Cell2Cell dataset.
Figure 4. Clustering and classification-based hybrid model performance: (a) hybrid model results on the Orange dataset; (b) hybrid model results on the Cell2Cell dataset.
Figure 5. Voting-based ensemble classifier results: (a) voting-based method results on the Orange dataset; (b) voting-based method results on the Cell2Cell dataset.
Figure 6. Performance of different algorithms based on ensemble techniques: (a) bagging-based method results on the Orange dataset; (b) bagging-based method results on the Cell2Cell dataset; (c) stacking-based method results on the Orange dataset; (d) stacking-based method results on the Cell2Cell dataset.
Figure 7. Illustration of accuracy comparison among all experiments using Orange and Cell2Cell datasets.
Table 1. Features of applied databases for this study.

Attributes | Cell2Cell | Orange
Complete examples | 40,000 | 50,000
Complete features | 76 | 260
Numerical features | 68 | 190
Nominal features | 8 | 70
Data sharing | Balanced | Imbalanced
Missing values | No | Yes
Table 2. Comparison with state-of-the-art approaches.

Ref | Method | Dataset | Accuracy (%) | Recall (%) | F-Measure (%) | Year
[47] | Hybrid firefly algorithm | Orange | 86.38 | 80 | 85 | 2017
[46] | FW-ECP | Orange | 79.4 | 74.1 | 72.7 | 2017
[48] | PSO-FSSA | Orange | 94.08 | 84.01 | 80.28 | 2019
[46] | FW-ECP | Cell2Cell | 84.9 | 80.2 | 81.02 | 2017
[50] | LSTM | Cell2Cell | 72.7 | 78 | 80.65 | 2021
[51] | Deep-BP-ANN | Cell2Cell | 92 | 81.74 | 77.47 | 2022
[52] | HKD-PFLICM | Cell2Cell | 76.51 | 79 | 78 | 2021
Proposed | Stacking-based ensemble model | Orange | 96 | 91.61 | 90.23 | 2022
Proposed | Stacking-based ensemble model | Cell2Cell | 93.6 | 85.45 | 83.72 | 2022