Article

A New Three-Way Incremental Naive Bayes Classifier

1 College of Science, North China University of Science and Technology, Tangshan 063210, China
2 College of Computer Science and Mathematics, Anyang University, Anyang 455000, China
3 College of Electrical Engineering, North China University of Science and Technology, Tangshan 063210, China
4 College of Economics, North China University of Science and Technology, Tangshan 063210, China
5 Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210, China
6 The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan 063210, China
7 Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan 063210, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(7), 1730; https://doi.org/10.3390/electronics12071730
Submission received: 8 March 2023 / Revised: 31 March 2023 / Accepted: 3 April 2023 / Published: 5 April 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

Real-world data grow dynamically, and the naive Bayes (NB) classifier can only accept or reject a sample, which leads to a high error rate on uncertain data. To address these problems, this paper combines three-way decision and incremental learning and proposes a new three-way incremental naive Bayes classifier (3WD-INB). First, the NB classifier is established, and for continuous data a distribution is fitted according to the minimum residual sum of squares (RSS), so that 3WD-INB can process both discrete and continuous data. An incremental learning operation is then carried out: samples with higher data quality are selected from the incremental training set according to their confidence, which handles the dynamic growth of data and filters out poor samples. Next, the 3WD-INB classifier is constructed and the classification rules for its positive, negative and boundary domains are determined, so that samples can be classified in a three-way manner and better decisions can be made on uncertain data. Finally, five discrete and five continuous datasets are selected for comparative experiments with traditional classification methods. The results show that 3WD-INB achieves high accuracy and recall on different types of datasets, and its classification performance is also relatively stable.

1. Introduction

The classification problem is fundamental to the field of data mining and is also a very important tool. Common traditional classifiers include naive Bayes (NB), random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), and the multilayer perceptron (MLP). In recent years, many scholars have made great progress in the research of new classifiers and have created many new classifiers [1,2,3,4].
The naive Bayes classifier (NB) was first proposed by Duda and Hart in 1973. Its core idea is to calculate the probability that the sample belongs to each category given the characteristic value of the sample and assign it to the category with the highest probability. This algorithm does not require a large amount of training data and has good interpretability, so it has attracted the attention and use of more and more researchers. In summary, the naive Bayes classifier has the following advantages:
  • It performs well on small-scale data and can not only handle binary classification tasks but also multi-classification tasks.
  • The algorithm is simple to establish and less sensitive to missing datasets.
  • It has high speed for large-scale training and query and is suitable for large-scale datasets.
Therefore, naive Bayes is widely used and has achieved good results in text classification, spam email filtering, medical diagnosis, and other fields. To eliminate the zero probability and over-fitting problems in naive Bayes classification, Xu et al. [5] designed two smoothing strategies, M-estimation and Laplace estimation, which effectively improved the classification performance. Li et al. [6] used Pearson and Kendall coefficients to screen out new attribute sets based on principal component analysis to make them meet the conditional independence assumption as much as possible and constructed NB-IPCA classifiers to improve the classification accuracy. Farid et al. [7] proposed a hybrid decision tree and a hybrid naive Bayes classification algorithm and solved the multi-classification problem. For text classification problems, Zhang et al. [8] created a two-layer Bayes model: random forest naive Bayes (RFNB); the first layer is a random forest model, and the second layer is a Bernoulli naive Bayes model. Gama et al. [9] proposed an adaptive Bayes model, which is an incremental learning algorithm that can work online, and has improved performance compared with nonadaptive algorithms. Li et al. [10] used the weighted K-nearest neighbor algorithm to calculate the membership degree of unlabeled samples and improved the structure of the naive Bayes classifier through the membership degree to optimize its classification effect. Qiu et al. [11] combined the particle swarm optimization algorithm with naive Bayes, which effectively reduced redundant attributes and improved the classification ability. Ramoni et al. [12] constructed a robust Bayes classifier (RBC) for datasets with missing values, which can handle incomplete databases without assuming missing data patterns. Zhang et al. [13] proposed an attribute enhancement and weighted naive Bayes algorithm, which can find potential attributes beyond the original attribute space and is used to solve the attribute conditional independence assumption, and experiments have proved that the algorithm has achieved good results. Kaur et al. [14] used the weighted information gain method to reassign the features of the misclassified classifications and combined it with the polynomial naive Bayes classification algorithm to provide a better classification. For naive Bayes to be applied to continuous data, Fisher [15] assumes that the probability distribution for each classification is Gaussian (also known as normal distribution), treats multiple measurements as random variables and estimates the probability using a Gaussian function. Fisher [16] also proposed the method of discretizing continuous data for the first time. Since then, this method has been widely used in various fields, including machine learning, data mining, statistics, and so on, including the naive Bayes classifier. Fayyad et al. [17] improved the naive Bayes under interval discretization and used the basic principles of information theory to guide the operation of the multi-interval discretization process, and the results showed that the new method has significantly improved the classification accuracy. However, the traditional Bayes classifier still belongs to the two-way decision model; that is, there are two kinds of processing for the classification results of samples: accepting or rejecting. When dealing with uncertainties, the inability to make accurate decisions on samples will lead to poor classification performance. 
The three-way decision has the characteristics that conform to human thinking and cognition and can better handle the uncertainties in the actual decision-making process. Therefore, some scholars have improved the naive Bayes algorithm with the three-way decision. Zhang et al. [18] constructed a new three-way extended TAN Bayes classifier combining the three-way decision thinking and considering the attribute condition independence, which effectively improved the classification performance. Zhou et al. [19] combine three-way decision with the naive Bayes classifier and use it to classify junk email. In addition to classifying normal email and junk email, users are allowed to further check for uncertain email, which has been experimentally proven to reduce the rate of misclassification. Later, Zhang et al. [20] integrated naive Bayes, three-way decision and collaborative filtering algorithm, and proposed a three-way decision naive Bayes collaborative filtering recommendation (3NBCFR) model, which was used for a movie recommendation, effectively reducing the cost of recommendation and improving the quality of the recommendation. However, the above improvements to the Bayes classifier also have the following practical problems:
(1)
Datasets in the real world are generally generated dynamically, and the amount of data is constantly changing. It is difficult to obtain credible posterior probabilities based on limited training sets, and it is time consuming to reuse new datasets for training.
(2)
Most Bayes classifiers are generally applied to discrete data, so the scope of application of the model is small. The traditional remedies are to discretize continuous data or to use a Gaussian function, but for the former it is difficult to choose the discretization intervals, while the latter places strong requirements on the distribution of the dataset; neither solves the classification of continuous data well.
To this end, this paper combines incremental learning, three-way decision and naive Bayes classifier and proposes a new three-way decision incremental naive Bayes (3WD-INB) classifier. The contributions of this paper are as follows:
(1)
Three-way decision ideas are combined with the traditional naive Bayes classifier, which makes the decision-making mode of the classifier more consistent with the human thinking process and improves its ability to classify uncertain data.
(2)
The incremental learning method solves the problem of data dynamics, and at the same time, it can filter the poor-quality data samples and optimize the training data in the incremental learning stage.
(3)
For continuous data, the distribution of the data is fitted according to the minimum residual sum of squares (RSS), and the posterior probability is estimated by the fitted distribution function, so that 3WD-INB can be applied not only to discrete data but also to continuous data, which enhances the applicability of the classification model.
(4)
Compared with the traditional Bayes model and other traditional classification models, 3WD-INB effectively improves the classification performance. Relative to the NB classifier on discrete data, F1 is increased from 0.6364 to 0.9167 and precision from 0.7778 to 1.0000. Relative to the G-NB classifier on continuous data, F1 increased from 0.8036 to 0.9967 and precision from 0.5285 to 0.8850. The average F1 of 3WD-INB on discrete and continuous data is 0.9501 and 0.9081, respectively, and the average precision is 0.9648 and 0.9289, respectively.
The structure of this paper is as follows. In the second section, the current naive Bayes classifier and the relevant content of the three-way decision are introduced, and the basic theory is explained to provide a theoretical basis for subsequent models and algorithms. In the third section, the distribution fitting process, incremental learning process, classification rule derivation process and overall algorithm steps of the 3WD-INB classifier are explained in detail. In the fourth section, the parameter change analysis experiment is carried out on 3WD-INB, and different types of datasets are selected for comparative experiments with traditional algorithms. In the last section, the full text is summarized, and future work is proposed.

2. Related Work

2.1. Naive Bayes Classifier

Naive Bayes theory is based on the Bayes theorem and has a solid foundation in probability theory. Classification proceeds by first constructing the Bayes classifier structure and then calculating the posterior probability of each object.
Given a training set with a sample size of $N$, $U = \{x_1, x_2, \ldots, x_N\}$, where the training set contains $n$ attributes $A = \{a_1, a_2, \ldots, a_n\}$ and the data labels take $k$ categories $C_1, C_2, \ldots, C_k$, we express the training sample $x_h$ as an $n$-dimensional feature vector $x_h = \{v_1^{(h)}, v_2^{(h)}, \ldots, v_n^{(h)}\}$, where $v_i^{(h)}$ represents the value of sample $x_h$ on attribute $a_i$. Then, according to the Bayes theorem, the posterior probability $P(C_c \mid x_h)$ can be obtained, as shown in Equation (1).

$$P(C_c \mid x_h) = \frac{P(C_c)\, P(x_h \mid C_c)}{P(x_h)} \quad (1)$$

where $P(C_c)$ is the prior probability, $P(x_h \mid C_c)$ is the conditional probability, and $P(x_h)$ is a constant.
For the NB classifier, its probability estimation expression is shown in Equation (2).

$$P(C_c \mid x_h) \propto P(C_c) \prod_{i=1}^{n} P(v_i^{(h)} \mid C_c) \quad (2)$$

where $n$ is the number of attributes and $v_i^{(h)}$ is the value of $x_h$ on the $i$-th attribute $a_i$.
In the classification process of the naive Bayes classifier, $P(C_c)$ and $P(v_i^{(h)} \mid C_c)$ are calculated as shown in Equations (3) and (4).

$$P(C_c) = \frac{|C_c|}{|U|}, \quad c = 1, 2, \ldots, k \quad (3)$$

$$P(v_i^{(h)} \mid C_c) = \frac{|m(a_i, v_i^{(h)}) \cap C_c|}{|C_c|}, \quad c = 1, 2, \ldots, k \quad (4)$$

where $|U|$ is the total number of training samples, $|C_c|$ is the number of training samples of category $C_c$, and $m(a_i, v_i^{(h)}) \cap C_c$ is the set of objects in $C_c$ that take the value $v_i^{(h)}$ on the $i$-th attribute.
At the same time, to avoid a value that appears in a test sample but not in the training samples producing $|m(a_i, v_i^{(h)}) \cap C_c| = 0$, the Bayes classifier uses Laplace smoothing; the smoothed quantities are shown in Equations (5) and (6).

$$P(C_c) = \frac{|C_c| + 1}{|U| + k}, \quad c = 1, 2, \ldots, k \quad (5)$$

$$P(v_i^{(h)} \mid C_c) = \frac{|m(a_i, v_i^{(h)}) \cap C_c| + 1}{|C_c| + |a_i|}, \quad c = 1, 2, \ldots, k \quad (6)$$

Finally, the smoothed $P(C_c)$ and $P(v_i^{(h)} \mid C_c)$ are used to classify the samples and obtain the classification label $H(x_h)$, as in Equation (7).

$$H(x_h) = \arg\max_{C_c} P(C_c) \prod_{i=1}^{n} P(v_i^{(h)} \mid C_c) \quad (7)$$
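To make Equations (5)-(7) concrete, the following minimal Python sketch (ours, for illustration only, not the authors' implementation) estimates the Laplace-smoothed priors and conditionals from a small discrete toy dataset and classifies a new sample; the feature values and labels are made up.

```python
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Estimate Laplace-smoothed priors P(C_c) and conditionals P(v_i | C_c), Eqs. (5)-(6)."""
    n_attrs = len(samples[0])
    classes = sorted(set(labels))
    k = len(classes)
    class_count = Counter(labels)
    domains = [set(s[i] for s in samples) for i in range(n_attrs)]  # value domains |a_i|
    prior = {c: (class_count[c] + 1) / (len(samples) + k) for c in classes}
    cond = defaultdict(lambda: defaultdict(dict))  # cond[c][i][v] = P(v | c)
    for c in classes:
        rows = [s for s, y in zip(samples, labels) if y == c]
        for i in range(n_attrs):
            counts = Counter(r[i] for r in rows)
            for v in domains[i]:
                cond[c][i][v] = (counts[v] + 1) / (len(rows) + len(domains[i]))
    return prior, cond, classes

def classify_nb(x, prior, cond, classes):
    """Pick arg max_c P(C_c) * prod_i P(v_i | C_c), Eq. (7)."""
    def score(c):
        p = prior[c]
        for i, v in enumerate(x):
            p *= cond[c][i].get(v, 1e-9)  # fallback for a value never seen in training
        return p
    return max(classes, key=score)

# toy usage with made-up weather-style attributes
X = [("sunny", "hot"), ("rainy", "mild"), ("sunny", "mild"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]
prior, cond, classes = train_nb(X, y)
print(classify_nb(("sunny", "mild"), prior, cond, classes))  # -> "yes"
```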
When the data are continuous, the above method is no longer applicable to the calculation of $P(v_i^{(h)} \mid C_c)$; the usual solutions are as follows:

2.1.1. Interval Continuous Data (D-NB)

Commonly used continuous data interval methods include the following:
(1)
Equal width interval method: divide the data value range into several intervals equally, and the width of each interval is equal.
(2)
Equal frequency interval method: divide the data range into several intervals, and each interval contains an equal amount of data.
(3)
Clustering-based method: use a clustering algorithm to cluster continuous data into several groups, and the data in each group are regarded as the same discrete value.
(4)
Method based on information entropy: use information entropy to measure the information gain of each division point and select the division point with the largest information gain as the dividing point of discretization.
These methods have their own advantages and disadvantages, and the specific choice should be made according to the actual situation. Here we mainly introduce the equal width interval method, which is also the simplest and most commonly used method. The general idea is to divide the continuous data feature $v_i^{(h)}$ into $K$ intervals $D = \{d_1, d_2, \ldots, d_K\}$. For example, suppose the original data are $v_{\text{original}} = \{1, 2.5, 3, 4.75, 5, 6, 7, 8, 9, 9.9\}$ and the specified intervals are $d_1 = [0, 2]$, $d_2 = (2, 4]$, $d_3 = (4, 6]$, $d_4 = (6, 8]$, $d_5 = (8, 10]$; then the discretized data are $v_{\text{new}} = \{1, 2, 2, 3, 3, 3, 4, 4, 5, 5\}$. After such processing, the continuous data are successfully transformed into discrete data, and the calculation problem of $P(v_i^{(h)} \mid C_c)$ is also solved. However, this method is sensitive to the number of intervals and the way the intervals are divided, and it is difficult to determine the optimal choice of both in practice.
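As an illustration of the equal width interval method described above, the following short sketch (assuming the data range $[0, 10]$ is split into the five bins from the example) reproduces the discretization with numpy:

```python
import numpy as np

v_original = np.array([1, 2.5, 3, 4.75, 5, 6, 7, 8, 9, 9.9])
edges = np.linspace(0, 10, 6)  # [0, 2, 4, 6, 8, 10] -> five equal-width bins
# interval index 1..5 such that edges[i-1] < v <= edges[i]
v_new = np.digitize(v_original, edges[1:-1], right=True) + 1
print(v_new.tolist())  # [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
```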

2.1.2. Gaussian Naive Bayes (G-NB)

Assuming that the data of each dimension follow a normal distribution, that is, $P(v_i^{(h)} \mid C_c) \sim N(\mu_i, \sigma_i^2)$, where $\mu_i$ and $\sigma_i^2$ are the mean and variance of $v_i^{(h)}$, respectively, then:

$$P(v_i^{(h)} \mid C_c) = \frac{1}{\sqrt{2\pi}\,\sigma_i} e^{-\frac{(v_i^{(h)} - \mu_i)^2}{2\sigma_i^2}} \quad (8)$$

Therefore, following the naive Bayes classifier (NB), the final G-NB classification rule is shown in Equation (9).

$$H(x_h) = \arg\max_{C_c} P(C_c) \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} e^{-\frac{(v_i^{(h)} - \mu_i)^2}{2\sigma_i^2}} \quad (9)$$
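For reference, here is a minimal sketch of the Gaussian likelihood in Equation (8) and the G-NB rule in Equation (9); the per-class means and standard deviations are assumed values rather than quantities estimated from a real dataset.

```python
import math

def gaussian_pdf(v, mu, sigma):
    """Equation (8): Gaussian class-conditional density for one attribute."""
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gnb_classify(x, prior, stats):
    """Equation (9): stats[c] is a list of (mu_i, sigma_i) pairs, one per attribute."""
    best, best_score = None, -1.0
    for c, params in stats.items():
        score = prior[c]
        for v, (mu, sigma) in zip(x, params):
            score *= gaussian_pdf(v, mu, sigma)
        if score > best_score:
            best, best_score = c, score
    return best

# toy usage with assumed per-class statistics
prior = {"A": 0.5, "B": 0.5}
stats = {"A": [(1.0, 0.5), (2.0, 0.3)], "B": [(3.0, 0.5), (1.0, 0.3)]}
print(gnb_classify((1.1, 1.9), prior, stats))  # -> "A"
```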
G-NB solves the problem of processing continuous data, but the assumption that every dimension follows a normal distribution places strong requirements on the dataset, and real-world data often follow many different distributions, so G-NB tends to perform poorly and unstably in practice. Therefore, this paper proposes a new method for dealing with continuous data to improve the robustness of the classifier.

2.2. Three-Way Decision

Three-way decision is a decision-making method summarized by Yao [21] in the research process of rough set theory. The general idea is to divide the universe into three (positive domain, negative domain and boundary domain) and adopt different decision-making methods for different domains, as shown in Figure 1, which is more in line with human thought and cognition. In recent years, domestic and foreign scholars have proposed a series of three-way decision models, which have been widely used in medical diagnosis [22], garbage mailbox prediction [23] and other disciplines and fields.
Liu et al. [24] systematically introduced the theory, method and application of the integration of three-way decision and rough set theory from the perspective of three-way decision. Liu et al. [25] also proposed broad and narrow theoretical models of three-way decision from the macro and micro perspectives. The broad three-way decision focuses on the interpretation of the connotation and extension of the concept of three-way decision, and the narrow three-way decision focuses on the semantic interpretation of three-way decision in practical decision-making problems. Yao et al. [26] explained and analyzed the basic concepts and theories of formal concept analysis, rough sets and granular computing in detail and pointed out the relationship between them. Liang et al. [27,28,29,30] substituted fuzzy concepts such as interval values, triangular fuzzy numbers and intuitionistic fuzzy sets for the precise conditional probability function in the three-way decision, making the three-way decision model more widely applicable. Yang et al. [31] proposed a new fuzzy rough set model based on three-way decision with optimal similarity, which makes the model more robust to noise and beneficial to the application of fuzzy information systems. In Long et al. [32], to further introduce fuzzy set theory into three-way concept analysis, the attribute-derived fuzzy three-way decision and the object-derived fuzzy three-way decision are studied in the fuzzy setting. The existing classical three-way decision is extended to the fuzzy three-way decision, which is important for improving three-way decision theory. Xue et al. [33] proposed a three-way decision model based on probability graphs, constructing a Bayes network to calculate the conditional probability distribution function. Jia et al. [34] proposed a feature fusion method based on the three-way decision model, which combines a single feature extraction method and multiple feature extraction methods to maximize the use of different feature information and improve the performance of Chinese irony detection. Dai et al. [35] proposed a new three-way decision model. Unlike the traditional three-way decision model, this model uses intuitionistic fuzzy sets to describe the attributes of decision objects and the preferences of decision-makers and uses concept lattice theory to represent the relationship between attributes, to better deal with uncertainty and fuzziness. Many scholars have also applied the three-way decision to classification models. Li et al. [36] transformed the problem of software defect prediction into three kinds of decision-making using three-way decision methods: identifying defective software, identifying non-defective software and identifying uncertain software, which achieves better accuracy and reliability. Chen et al. [37] constructed a sentiment analysis model based on three-way decision, which solves the problem of "unknown" emotions by introducing an intermediate category and also uses the flexibility of three-way decision to classify different emotional intensities, achieving higher accuracy in emotional classification. Chu et al. [38] proposed a three-way clustering method based on neighborhood rough sets to study the classification of gout patients. This method can deal with uncertain data, which is very useful for applications such as medical diagnosis. Wang et al. [39] proposed an adaptive weighted three-way decision oversampling method to solve the problem of unbalanced data classification.
This method uses the idea of three-way decision, combines the oversampling technology with clustering technology, and can more effectively identify a few classes of samples and improve the classification accuracy when dealing with unbalanced data. Remesh et al. [40] proposed a three-way decision technology based on variance criteria to detect COVID-19 patients. Finally, patients can be divided into three categories: confirmed, suspected and non-COVID-19. Confirmed samples can be treated, and suspected samples can be further detected, which has potential application value in early diagnosis and screening of COVID-19.
At present, three-way decision has become a focus of many scholars. It has been widely used in information system analysis, machine learning models and artificial intelligence decision-making and has achieved good theoretical results. Nevertheless, three-way decision is largely limited by data quality and quantity. When processing data, a three-way decision method needs to fully consider these limitations: if the data quality is poor or the amount of data is small, the accuracy and reliability of the three-way decision may be affected.

2.3. The Basic Theory of Three-Way Decision

Given an information system $S = (U, A \cup D, \{V_a \mid a \in A\}, \{I_a \mid a \in A\})$ and $C \subseteq U$, the set of states $D = \{C, \bar{C}\}$ represents two states (indicating whether or not an object belongs to the set $C$, where $C$ and $\bar{C}$ are complementary), and the set $A_C = \{a_P, a_N, a_B\}$ represents three decision actions (acceptance, rejection and deferment, respectively). The cost function of each decision action is shown in Table 1.
The expected costs $R(a_P \mid x_h)$, $R(a_N \mid x_h)$ and $R(a_B \mid x_h)$ of the three decisions $a_P$, $a_N$ and $a_B$ are shown in Equation (10).

$$\begin{aligned} R(a_P \mid x_h) &= \lambda_{PP} P(C \mid x_h) + \lambda_{PN} P(\bar{C} \mid x_h) \\ R(a_N \mid x_h) &= \lambda_{NP} P(C \mid x_h) + \lambda_{NN} P(\bar{C} \mid x_h) \\ R(a_B \mid x_h) &= \lambda_{BP} P(C \mid x_h) + \lambda_{BN} P(\bar{C} \mid x_h) \end{aligned} \quad (10)$$

where $P(C \mid x_h)$ is the conditional probability that $x_h$ belongs to the set $C$, and $P(\bar{C} \mid x_h)$ is the conditional probability that $x_h$ belongs to the set $\bar{C}$.
According to the basic theory of three-way decision, the expected costs $R(a_P \mid x_h)$, $R(a_N \mid x_h)$ and $R(a_B \mid x_h)$ are used to choose an action for $x_h$, and the minimum-cost decision rules [41] are as follows:
(P) If $R(a_P \mid x_h) \le R(a_N \mid x_h)$ and $R(a_P \mid x_h) \le R(a_B \mid x_h)$, then $x_h \in POS_{(\alpha, \beta)}(C_c)$: accept the decision;
(N) If $R(a_N \mid x_h) \le R(a_P \mid x_h)$ and $R(a_N \mid x_h) \le R(a_B \mid x_h)$, then $x_h \in NEG_{(\alpha, \beta)}(C_c)$: reject the decision;
(B) If $R(a_B \mid x_h) \le R(a_P \mid x_h)$ and $R(a_B \mid x_h) \le R(a_N \mid x_h)$, then $x_h \in BND_{(\alpha, \beta)}(C_c)$: delay the decision.
In addition, because $P(C \mid x_h) + P(\bar{C} \mid x_h) = 1$, $\lambda_{PP} \le \lambda_{BP} < \lambda_{NP}$ and $\lambda_{NN} \le \lambda_{BN} < \lambda_{PN}$, the rules can be further simplified:
(P) If $P(C \mid x_h) \ge \alpha$ and $P(C \mid x_h) \ge \gamma$, then $x_h \in POS_{(\alpha, \beta)}(C_c)$: accept the decision;
(N) If $P(C \mid x_h) \le \beta$ and $P(C \mid x_h) \le \gamma$, then $x_h \in NEG_{(\alpha, \beta)}(C_c)$: reject the decision;
(B) If $P(C \mid x_h) \le \alpha$ and $P(C \mid x_h) \ge \beta$, then $x_h \in BND_{(\alpha, \beta)}(C_c)$: delay the decision.
In the rules:

$$\alpha = \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}, \quad \beta = \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})}, \quad \gamma = \frac{\lambda_{PN} - \lambda_{NN}}{(\lambda_{PN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{PP})} \quad (11)$$

If it is further assumed that the cost function satisfies Equation (12), it can be proved that $\alpha > \gamma > \beta$.

$$\frac{\lambda_{NP} - \lambda_{BP}}{\lambda_{BN} - \lambda_{NN}} > \frac{\lambda_{BP} - \lambda_{PP}}{\lambda_{PN} - \lambda_{BN}} \quad (12)$$
Continuing to simplify, the final three-way minimum-cost decision rules are:
(P) If $P(C \mid x_h) \ge \alpha$, then $x_h \in POS_{(\alpha, \beta)}(C_c)$: accept the decision;
(N) If $P(C \mid x_h) \le \beta$, then $x_h \in NEG_{(\alpha, \beta)}(C_c)$: reject the decision;
(B) If $\beta < P(C \mid x_h) < \alpha$, then $x_h \in BND_{(\alpha, \beta)}(C_c)$: delay the decision.
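To make the cost-based derivation concrete, the following sketch computes $\alpha$, $\beta$ and $\gamma$ from an assumed cost matrix via Equation (11) and applies the simplified rules (P)-(B); the cost values are illustrative (they satisfy Equation (12)) and are not taken from the paper.

```python
def thresholds(l_pp, l_bp, l_np, l_nn, l_bn, l_pn):
    """Equation (11): alpha, beta, gamma from the six decision costs."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma

def three_way_decide(p_c_given_x, alpha, beta):
    """Simplified minimum-cost rules (P), (N), (B)."""
    if p_c_given_x >= alpha:
        return "POS"   # accept
    if p_c_given_x <= beta:
        return "NEG"   # reject
    return "BND"       # delay

# illustrative costs: lambda_PP=0, lambda_BP=1, lambda_NP=4, lambda_NN=0, lambda_BN=1, lambda_PN=4
alpha, beta, gamma = thresholds(0, 1, 4, 0, 1, 4)
print(round(alpha, 3), round(beta, 3), round(gamma, 3))  # 0.75 0.25 0.5, so alpha > gamma > beta
for p in (0.9, 0.5, 0.1):
    print(p, three_way_decide(p, alpha, beta))
```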

3. Three-Way Decision Incremental Naive Bayes Classifier

In the real world, data are often acquired dynamically, and retraining a classification model on new data from scratch consumes a lot of time. Considering that three-way decision-making is more consistent with the human thinking process, incremental learning and three-way decision are combined with the NB classifier to build a three-way decision incremental naive Bayes classifier (3WD-INB). In addition, considering that most naive Bayes classifiers are used for discrete data while the classification of continuous data is also very common in the real world, the distribution of continuous data is fitted using the minimum residual sum of squares (RSS), so that 3WD-INB can also deal with the classification of continuous data.

3.1. Improvements for Continuous Data

For discrete data, $P(v_i^{(h)} \mid C_c)$ can be calculated directly according to Equation (6), but for continuous data it cannot be calculated in this way. For this purpose, a distribution function is fitted to the distribution of $v_i^{(h)}$ according to the residual sum of squares, and the fitted distribution function is then used to obtain $P(v_i^{(h)} \mid C_c)$ for the continuous data.
Referring to the official Distfit documentation [42], we selected 10 common distributions for our model and used $RSS$ to fit the distribution. The 10 distributions are shown in Table 2.
$RSS$ describes the deviation of the predicted values from the actual empirical values of the data. It is a measure of the discrepancy between the data and the estimated model: a small $RSS$ indicates a good fit of the model to the data. $RSS$ is calculated as shown in Equation (13).

$$RSS = \sum_{i=1}^{n} (y_i - f(x_i))^2 \quad (13)$$

where $y_i$ is the $i$-th value of the variable to be predicted, $x_i$ is the $i$-th value of the explanatory variable, and $f(x_i)$ is the predicted value of $y_i$.
During the fitting process, the value of $RSS$ is calculated under the 10 distributions, $\{rss_1, rss_2, \ldots, rss_{10}\}$, and the distribution with the smallest $RSS$ is taken as the optimal fit; that is, the optimal distribution is $\arg\min \{rss_1, rss_2, \ldots, rss_{10}\}$.
Given a training set with $N$ samples, $U = \{x_1, x_2, \ldots, x_N\}$, the training set contains $n$ attributes $A = \{a_1, a_2, \ldots, a_n\}$, and the data labels take $k$ categories $C_1, C_2, \ldots, C_k$. The training sample $x_h$ is represented as an $n$-dimensional feature vector $x_h = \{v_1^{(h)}, v_2^{(h)}, \ldots, v_n^{(h)}\}$, where $v_i^{(h)}$ is the value of sample $x_h$ on attribute $a_i$.
The specific operation of the fitting is as follows (Algorithm 1):
Algorithm 1: My Distfit
Input: Training set: $U = \{x_1, x_2, \ldots, x_N\}$
Output: Fitted distribution functions: $p_i(v_i^{(h)} \mid C_c)$, $p_i(v_i^{(h)} \mid \bar{C}_c)$
1. For $v_i^{(h)} \in x_h$ do
2. For $i = 1, 2, \ldots, N$ do
3. Calculate the set of $RSS_{C_c}$ values for the 10 distributions: $\{rss_{C_c}^{1}, rss_{C_c}^{2}, \ldots, rss_{C_c}^{10}\}$
4. $p_i(v_i^{(h)} \mid C_c) = \arg\min \{rss_{C_c}^{1}, rss_{C_c}^{2}, \ldots, rss_{C_c}^{10}\}$
5. Calculate the set of $RSS_{\bar{C}_c}$ values for the 10 distributions: $\{rss_{\bar{C}_c}^{1}, rss_{\bar{C}_c}^{2}, \ldots, rss_{\bar{C}_c}^{10}\}$
6. $p_i(v_i^{(h)} \mid \bar{C}_c) = \arg\min \{rss_{\bar{C}_c}^{1}, rss_{\bar{C}_c}^{2}, \ldots, rss_{\bar{C}_c}^{10}\}$
7. End for
8. End for
9. Return $p_i(v_i^{(h)} \mid C_c)$, $p_i(v_i^{(h)} \mid \bar{C}_c)$
The optimal distributions obtained by fitting with Algorithm 1, $p_i(v_i^{(h)} \mid C_c)$ and $p_i(v_i^{(h)} \mid \bar{C}_c)$, approximately estimate $P(v_i^{(h)} \mid C_c)$ and $P(v_i^{(h)} \mid \bar{C}_c)$ for continuous data.
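The following is a minimal sketch of the RSS-based fitting idea behind Algorithm 1, written with scipy.stats as an assumption about the tooling (the paper relies on Distfit [42]); the candidate list below is only a subset of the ten distributions in Table 2.

```python
import numpy as np
from scipy import stats

def fit_best_distribution(values, candidates=("norm", "expon", "lognorm", "gamma", "uniform")):
    """Fit each candidate distribution and keep the one with the smallest RSS (Eq. 13),
    comparing the empirical histogram density with the fitted pdf."""
    hist, edges = np.histogram(values, bins=20, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    best_name, best_rss, best_frozen = None, np.inf, None
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(values)
        rss = np.sum((hist - dist.pdf(centers, *params)) ** 2)
        if rss < best_rss:
            best_name, best_rss, best_frozen = name, rss, dist(*params)
    return best_name, best_frozen

# toy usage: data drawn from a gamma distribution
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=500)
name, frozen = fit_best_distribution(data)
print(name, frozen.pdf(2.0))  # the fitted density can stand in for P(v_i | C_c)
```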

3.2. Incremental Features

This part is the process of building the INB classifier. Because of the dynamic nature of the dataset, this paper uses the incremental feature of naive Bayes and adopts incremental learning. In addition, incremental learning can filter samples with high data quality, which can improve the performance of the model to a certain extent.
Suppose the training set is $U = \{x_1, x_2, \ldots, x_N\}$, the incremental training set is $E = \{e_1, e_2, \ldots, e_M\}$, and the test set is $T = \{t_1, t_2, \ldots, t_P\}$. The essence of the 3WD-INB incremental feature is to add the samples of the incremental training set with higher confidence $\theta$ to the training set and to update $P(C_c)$, $P(\bar{C}_c)$, $P(v_i^{(h)} \mid C_c)$ and $P(v_i^{(h)} \mid \bar{C}_c)$. The process of incremental learning is as follows.
Introduce the confidence level $\theta$. When the confidence $\theta_j$ of the sample $e_j$ satisfies Equation (15), the sample is added to the training set $U$.

$$\theta_j = \max_{C_c} P(C_c) \prod_{i=1}^{n} P(v_i^{(h)} \mid C_c), \quad 1 \le j \le M \quad (14)$$
$$\theta_j \ge \gamma \sum_{i=1}^{l} \theta_i, \quad 1 \le l \le M \quad (15)$$

where $\gamma$ is the confidence coefficient; under normal circumstances, $\gamma \in (0.5, 1]$.
When the incremental training sample $e_j$ is added to the training set $U$ (its category is denoted $C_b$, and its value on attribute $a_i$ is denoted $v_{ci}$), the updated formulas of $P(C_c)$ and $P(\bar{C}_c)$ are:

$$P(C_c) = \begin{cases} \dfrac{N+K}{1+N+K} P(C_c), & C_b \neq C_c \\[4pt] \dfrac{N+K}{1+N+K} P(C_c) + \dfrac{1}{1+N+K}, & C_b = C_c \end{cases} \qquad P(\bar{C}_c) = \begin{cases} \dfrac{N+K}{1+N+K} P(\bar{C}_c), & C_b = C_c \\[4pt] \dfrac{N+K}{1+N+K} P(\bar{C}_c) + \dfrac{1}{1+N+K}, & C_b \neq C_c \end{cases} \quad (16)$$

The updated formulas of $P(v_i^{(h)} \mid C_c)$ and $P(v_i^{(h)} \mid \bar{C}_c)$ are:

$$P(v_i^{(h)} \mid C_c) = \begin{cases} \dfrac{\lambda}{1+\lambda} P(v_i^{(h)} \mid C_c), & C_b = C_c \wedge v_{ci} \neq v_i^{(h)} \\[4pt] \dfrac{\lambda}{1+\lambda} P(v_i^{(h)} \mid C_c) + \dfrac{1}{1+\lambda}, & C_b = C_c \wedge v_{ci} = v_i^{(h)} \\[4pt] P(v_i^{(h)} \mid C_c), & C_b \neq C_c \end{cases} \qquad P(v_i^{(h)} \mid \bar{C}_c) = \begin{cases} \dfrac{\lambda}{1+\lambda} P(v_i^{(h)} \mid \bar{C}_c), & C_b \neq C_c \wedge v_{ci} \neq v_i^{(h)} \\[4pt] \dfrac{\lambda}{1+\lambda} P(v_i^{(h)} \mid \bar{C}_c) + \dfrac{1}{1+\lambda}, & C_b \neq C_c \wedge v_{ci} = v_i^{(h)} \\[4pt] P(v_i^{(h)} \mid \bar{C}_c), & C_b = C_c \end{cases} \quad (17)$$

The updated formulas for the number of samples and the per-category counts are:

$$N = N + 1, \qquad count(C_c) = \begin{cases} count(C_c), & C_b \neq C_c \\ count(C_c) + 1, & C_b = C_c \end{cases} \quad (18)$$

where $N$ is the number of samples, $K$ is the number of categories, $count(C_c)$ is the number of samples belonging to category $C_c$, $\lambda = |a_i| + count(C_c)$, and $|a_i|$ is the number of values of feature $a_i$.
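As a simplified illustration of the update scheme (not the authors' code), the following sketch applies the prior update of Equation (16) and the count update of Equation (18) when one accepted sample with label $C_b$ is added; the numbers in the usage example are made up.

```python
def update_prior(prior_c, n_samples, n_classes, accepted_label, class_c):
    """Equation (16): shrink the old prior and add mass if the accepted sample belongs to C_c."""
    w = (n_samples + n_classes) / (1 + n_samples + n_classes)
    new_prior = w * prior_c
    if accepted_label == class_c:
        new_prior += 1 / (1 + n_samples + n_classes)
    return new_prior

def update_counts(n_samples, count_c, accepted_label, class_c):
    """Equation (18): one more sample overall; one more in C_c only if the label matches."""
    return n_samples + 1, count_c + (1 if accepted_label == class_c else 0)

# toy usage: N = 100 samples, K = 2 classes, current P(C_1) = 0.4, new sample labelled C_1
p = update_prior(0.4, 100, 2, "C1", "C1")
n, cnt = update_counts(100, 40, "C1", "C1")
print(round(p, 4), n, cnt)  # prior nudged upward, counts incremented
```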

3.3. Classification Rules

Assume that the parameters after incremental learning are $P(C_c)$, $P(\bar{C}_c)$, $P(v_i^{(h)} \mid C_c)$ and $P(v_i^{(h)} \mid \bar{C}_c)$. Substituting the naive Bayes classification rule of Equation (7) into the three-way expected-cost Equation (10), and taking logarithms of both sides (since repeated addition is computationally much cheaper than repeated multiplication), reduces the amount of computation and yields the following minimum-cost decision rules:
(P) If

$$R(a_P^{C_c} \mid x_h) \le R(a_N^{C_c} \mid x_h) \;\Leftrightarrow\; \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)} \ge \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\gamma_{C_c}}{1 - \gamma_{C_c}}, \qquad R(a_P^{C_c} \mid x_h) \le R(a_B^{C_c} \mid x_h) \;\Leftrightarrow\; \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)} \ge \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\alpha_{C_c}}{1 - \alpha_{C_c}}$$

then $x_h \in POS_{(\alpha_{C_c}, \beta_{C_c})}(C_c)$.
(N) If

$$R(a_N^{C_c} \mid x_h) \le R(a_P^{C_c} \mid x_h) \;\Leftrightarrow\; \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)} \le \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\gamma_{C_c}}{1 - \gamma_{C_c}}, \qquad R(a_N^{C_c} \mid x_h) \le R(a_B^{C_c} \mid x_h) \;\Leftrightarrow\; \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)} \le \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\beta_{C_c}}{1 - \beta_{C_c}}$$

then $x_h \in NEG_{(\alpha_{C_c}, \beta_{C_c})}(C_c)$.
(B) If

$$R(a_B^{C_c} \mid x_h) \le R(a_P^{C_c} \mid x_h) \;\Leftrightarrow\; \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)} \le \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\alpha_{C_c}}{1 - \alpha_{C_c}}, \qquad R(a_B^{C_c} \mid x_h) \le R(a_N^{C_c} \mid x_h) \;\Leftrightarrow\; \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)} \ge \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\beta_{C_c}}{1 - \beta_{C_c}}$$

then $x_h \in BND_{(\alpha_{C_c}, \beta_{C_c})}(C_c)$.
Letting $P' = \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)}$, new $POS$, $NEG$ and $BND$ domains are obtained, which constitute the final 3WD-INB classification rule, as shown in Equation (19).

$$POS_{(\alpha_{C_c}, \beta_{C_c})}(C_c) = \{x_h \mid P' \ge \alpha'_{C_c}\}, \quad NEG_{(\alpha_{C_c}, \beta_{C_c})}(C_c) = \{x_h \mid P' \le \beta'_{C_c}\}, \quad BND_{(\alpha_{C_c}, \beta_{C_c})}(C_c) = \{x_h \mid \beta'_{C_c} < P' < \alpha'_{C_c}\} \quad (19)$$

where

$$\alpha'_{C_c} = \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\alpha_{C_c}}{1 - \alpha_{C_c}}, \qquad \beta'_{C_c} = \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\beta_{C_c}}{1 - \beta_{C_c}} \quad (20)$$
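Below is a short sketch of the final rule in Equations (19) and (20): given assumed per-attribute conditionals of a sample, it computes the log-likelihood-ratio score $P'$ and the adjusted thresholds $\alpha'_{C_c}$ and $\beta'_{C_c}$, and assigns the sample to a domain; all probabilities are assumed strictly positive.

```python
import math

def three_way_score(cond_pos, cond_neg):
    """P' = sum_i log P(v_i | C_c) / P(v_i | not C_c)."""
    return sum(math.log(p / q) for p, q in zip(cond_pos, cond_neg))

def adjusted_thresholds(prior_pos, prior_neg, alpha, beta):
    """Equation (20): shift the (alpha, beta) cut-offs by the prior log-odds."""
    base = math.log(prior_neg / prior_pos)
    return base + math.log(alpha / (1 - alpha)), base + math.log(beta / (1 - beta))

def classify(score, alpha_adj, beta_adj):
    """Equation (19): POS / NEG / BND regions."""
    if score >= alpha_adj:
        return "POS"
    if score <= beta_adj:
        return "NEG"
    return "BND"

# toy usage with assumed per-attribute conditionals and priors
score = three_way_score([0.7, 0.6, 0.8], [0.2, 0.5, 0.3])
a_adj, b_adj = adjusted_thresholds(prior_pos=0.4, prior_neg=0.6, alpha=0.75, beta=0.25)
print(round(score, 3), classify(score, a_adj, b_adj))  # -> POS for this example
```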

3.4. Model Idea

Suppose the training set is $U = \{x_1, x_2, \ldots, x_N\}$, the incremental training set is $E = \{e_1, e_2, \ldots, e_M\}$, and the test set is $T = \{t_1, t_2, \ldots, t_P\}$. First, the training set $U$ is used to build a naive Bayes classifier NB; then, incremental learning is carried out on the incremental training set $E$ to build the INB classifier; finally, the three-way decision ideas are combined to build the 3WD-INB classifier, which classifies the test set $T$. In the algorithm, the maximum number of increments $I_M$ is added as a parameter that controls the incremental learning process. The general flow of the algorithm is shown in Figure 2.
The specific operation of the algorithm is as follows (Algorithm 2):
Algorithm 2: 3WD-INB Classification
Input: training set  U = { x 1 , x 2 , , x N } ; incremental training set  E = { e 1 , e 2 , , e M } ; test set  T = { t 1 , t 2 , , t P } ; thresholds for each category  ( α C c , β C c ) ; maximum number of increments  I M ; confidence factor  γ .
Output: the three-way classification results of the test set  T { P O S ,   N E G ,   B N D } .
1. Build NB classifier
2. For  a i A  do
3. For  a j A  do
4. If  v i ( h )  is Discrete Data then
5. Use (5) (6) to calculate  P ( C c ) ,  P ( C ¯ c ) ,  P ( v i ( h ) | C c )  and  P ( v i ( h ) | C ¯ c )  
6. Else
7. Use (5) to calculate  P ( C c ) ,  P ( C ¯ c )  
8. The optimal distribution obtained by fitting with Algorithm 1:   p i ( v i ( h ) | C c ) ,  p i ( v i ( h ) | C ¯ c )  
9.  P ( v i ( h ) | C c ) = p i ( v i ( h ) | C c ) ,  P ( v i ( h ) | C ¯ c ) = p i ( v i ( h ) | C ¯ c )  
10. End if
11. End for
12. End for
13. Conduct an incremental learning process
14.  i m = 0  
15. For  e j E  do
16. Use (14) to calculate the sample confidence $\theta_j$
17. If $\theta_j \ge \gamma \sum_{i=1}^{l} \theta_i, \ 1 \le l \le M$ then
18. $E = E \setminus \{e_j\}$
19. $U = U \cup \{e_j\}$
20. If  v i ( h )  is discrete data then
21. Use (16-18) to update the parameters:  P ( C c ) ,  P ( C ¯ c ) ,  P ( v i ( h ) | C c ) ,  P ( v i ( h ) | C ¯ c )   c o u n t ( C c )  and  U  
22. Else
23. Perform steps 1-8 to reconstruct the NB classifier
24. End if
25. End if
26. End for
27.  i m = i m + 1  
28. If $E = \varnothing$ or $i_m = I_M$ then
29 Execute 33
30. Else
31. Execute 15
32. End if
33. Compute the thresholds $(\alpha'_{C_c}, \beta'_{C_c})$ for category $C_c$ according to (20)
34. Carry out three-way classification for category $C_c$ to obtain $POS$, $NEG$, $BND$
35. For $t = 1, 2, \ldots, P$ do
36. Calculate $P' = \sum_{i=1}^{n} \log \frac{P(v_i^{(h)} \mid C_c)}{P(v_i^{(h)} \mid \bar{C}_c)}$
37. If $P' \ge \alpha'_{C_c}$ then
38. $t_t \in POS_{(\alpha_{C_c}, \beta_{C_c})}(C_c)$
39. Else if $P' \le \beta'_{C_c}$ then
40. $t_t \in NEG_{(\alpha_{C_c}, \beta_{C_c})}(C_c)$
41. Else if $\beta'_{C_c} < P' < \alpha'_{C_c}$ then
42. $t_t \in BND_{(\alpha_{C_c}, \beta_{C_c})}(C_c)$
43. End if
44. End for
45. Return classification results:  { P O S ,   N E G ,   B N D }  

4. Experimental Results and Analysis

4.1. Dataset and Experimental Environment

To verify the classification performance of the algorithm, seven discrete datasets and eight continuous datasets are selected for experiments. The datasets are all from the official website of UCI or Kaggle. The dataset information is shown in Table 3. To ensure that the running environment of the comparison experiment is the same, all the simulation results in this paper are obtained by programming in Python language under the environment of Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz 2.59 GHz, RAM 16GB.

4.2. Evaluation Indicators

For the traditional binary classification model of binary decision-making, accuracy ( A C C ), recall ( R e c a l l ), precision ( P r e c i s i o n ), and F1-score ( F 1 ) are usually used to evaluate the classification performance. These indicators are based on the classification confusion matrix of binary decision-making, as shown in Table 4.
Accuracy ($ACC$) describes the overall classification accuracy, as shown in Equation (21).

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \quad (21)$$

$Recall$ measures the ability of the classifier to find all positive samples; its value is 1 at best and 0 at worst. It is calculated as in Equation (22).

$$Recall = \frac{TP}{TP + FN} \quad (22)$$

$Precision$ measures the ability of the classifier not to label negative examples as positive; its best value is 1 and its worst is 0. It is calculated as in Equation (23).

$$Precision = \frac{TP}{TP + FP} \quad (23)$$
The F1-score ($F1$) can be regarded as the harmonic mean of the model's precision and recall, with a maximum value of 1 and a minimum value of 0. It is calculated as shown in Equation (24).

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (24)$$

Since the F1-score comprehensively reflects both precision and recall, this paper selects the two indicators F1-score ($F1$) and $Precision$ to evaluate the classification performance.
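For completeness, here is a small sketch of the two-way indicators in Equations (21)-(24); the confusion-matrix counts are made up for illustration.

```python
def binary_metrics(tp, tn, fp, fn):
    """Equations (21)-(24) for the two-way confusion matrix of Table 4,
    using the standard harmonic-mean form of F1."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, recall, precision, f1

# toy usage with made-up counts
print(binary_metrics(tp=80, tn=90, fp=10, fn=20))
```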
For the evaluation of the three-way decision model, this paper follows the treatment of Jia et al. [43]. The classification confusion matrix of the three-way decisions is shown in Table 5, in which $n_{xy}$ represents the number of samples whose actual class is $x$ and which are judged as class $y$.
For the accuracy of the three-way decision, the calculation formula is shown in Equation (25).

$$ACC = \frac{n_{PP} + n_{NN}}{n_{PP} + n_{PN} + n_{NP} + n_{NN}} \quad (25)$$
For $Recall$, the positive samples assigned to the $NEG$ and $BND$ domains must be taken into account, and the calculation formula is shown in Equation (26).

$$Recall = \frac{n_{PP}}{n_{PP} + n_{PN} + n_{PB}} \quad (26)$$
For $Precision$, the positive samples assigned to the $NEG$ domain must likewise be taken into account, and the calculation formula is shown in Equation (27).

$$Precision = \frac{n_{PP}}{n_{PP} + n_{PN} + n_{NP}} \quad (27)$$
The calculation formula of the F1-score is consistent with Equation (24).

4.3. Parameter Changes and Analysis of Experimental Results

3WD-INB is tested on the datasets using fivefold cross-validation (four folds for training and one for testing), where the ratio of the training set to the incremental training set is $U : E = 1 : 3$ and the maximum number of increments is $I_M = 20$. $F1$ and $Precision$ are used to evaluate the results.

4.3.1. Threshold α and β Change Analysis

Thresholds α and β are important parameters of the 3WD-INB model, usually lying between 0 and 1. In the experiment exploring the change of thresholds α and β, the confidence coefficient is kept at γ = 0.7, and the thresholds α and β are varied with a fixed step of 0.1.
For discrete data, take the mushroom and breast datasets as examples, and for continuous data, take the WDBC dataset as an example. Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 show the change of  F 1  and  P r e c i s i o n  as the threshold  ( α , β )  changes for the three datasets (only the results of one category of each dataset are shown).
It can be seen from the figures that as the threshold $(\alpha, \beta)$ changes, the evaluation indicators $F1$ and $Precision$ also change, achieving the expected behaviour. Because of the requirement that $\alpha > \beta$, the values of $F1$ and $Precision$ are both 0 when $\alpha \le \beta$; when $\alpha > \beta$, the overall $F1$ and $Precision$ of the 3WD-INB classification results remain at a relatively high level. Since $F1$ and $Precision$ are considered jointly, the average of the two is used as a reference: the breast dataset is optimal at $\alpha = 0.7$, $\beta = 0.6$, the mushroom dataset is optimal at $\alpha = 0.6$, $\beta = 0.3$, and WDBC is optimal at $\alpha = 0.8$, $\beta = 0.2$.
Simulation experiments are also carried out on other datasets, and the optimal threshold combination  ( α , β )  under each category of all selected datasets is obtained, as shown in Table 6.

4.3.2. Change Analysis of Confidence Coefficient γ

The confidence coefficient  γ  is an important parameter in the incremental learning process, generally between 0.5 and 1. In the exploration of the change analysis experiment of confidence coefficient  γ , the optimal threshold  ( α , β )  in Table 6 is used for each dataset. The maximum number of increments  I M = 20 , and the confidence coefficient  γ  is tested with a step size of 0.05.
After experiments, the  F 1  changes of all datasets are shown in Table 7 (only one of the changes is shown).
The confidence factor $\gamma$ plays a useful role in the incremental learning stage. From the test results, in most cases, as the confidence factor $\gamma$ increases, the value of $F1$ usually rises first and then decreases, with the highest point tending to appear in the middle. For the breast dataset, the peak occurs between 0.65 and 0.8, indicating that the confidence factor $\gamma$ is best set within this range. The WDBC and iris datasets are insensitive to the confidence factor $\gamma$ and have a wide optimal confidence range, which may be due to the high quality of these datasets. The breast and chess datasets show that when the confidence coefficient $\gamma$ increases from 0.55 to 0.95, the corresponding classification accuracy first remains at its maximum value and then decreases gradually; the accuracy of the WDBC dataset remains unchanged at 0.9735. The vote and other datasets first rise and then decrease. Therefore, the larger the confidence factor $\gamma$, the higher the data quality requirements for the incremental training samples; however, a larger $\gamma$ is not always better, because it may discard a large number of features contained in the incremental training set samples. The smaller the confidence factor $\gamma$, the lower the data quality requirement for incremental training samples, so that more features can be preserved, but poorer samples may also be added to the training set. Therefore, in practical applications, $\gamma$ can be adjusted according to the data quality to achieve good results.

4.4. Comparative Experimental Analysis

4.4.1. Comparative Analysis in Discrete Data

Using the discrete datasets, 3WD-INB is tested against RF, SVM, KNN, NB, INB, and NB-IPCA (from the literature [6]) with fivefold cross-validation (four folds for training and one for testing), where the ratio of the training set to the incremental training set is $U : E = 1 : 3$; $F1$ and $Recall$ are used for the evaluation of the results. After the experiment, the results shown in Table 8 are obtained (the evaluation indicators of all categories are shown).
As can be seen from Table 8, when the threshold $(\alpha, \beta)$ and confidence factor $\gamma$ are given suitable values, the classification performance of 3WD-INB is better than the comparison models in most cases. For the mushroom dataset, which has a large number of samples, the results are the best except that the $C_2$ category is lower than the SVM model. For the chess dataset, which has more features, the results of 3WD-INB are higher than those of the comparison models. For the Hayes-Roth dataset, with a small number of features and samples, the results of 3WD-INB are also clearly superior, with the $C_3$ class performing better than it does under the other classifiers. For the breast and vote datasets, with a moderate number of samples and features, 3WD-INB is significantly better than the other models on breast and differs only slightly from the RF and SVM models on the $C_1$ category of vote. For the car and lymphography datasets, which have a large number of categories, the overall classification performance of 3WD-INB is much better than that of the other Bayesian and traditional models. From the above analysis, 3WD-INB performs well on datasets with different numbers of samples and features, as well as on multi-class learning tasks; however, for datasets with a larger number of features, although 3WD-INB compares favourably with the other models, there is objectively still room for improvement.

4.4.2. Comparative Analysis in Continuous Data

Using the continuous datasets, 3WD-INB is tested against RF, SVM, MLP, D-NB, and G-NB with fivefold cross-validation (four folds for training and one for testing), where the ratio of the training set to the incremental training set is $U : E = 1 : 3$; $F1$ and $Recall$ are used for the evaluation of the results. After the experiment, the results shown in Table 9 are obtained (the evaluation indicators of all categories are shown).
From Table 9, it can be concluded that 3WD-INB performs better than the comparison models in most cases on continuous data as well, when the threshold $(\alpha, \beta)$ and the confidence factor $\gamma$ are given suitable values. For the large magic04 dataset, 3WD-INB gives full play to the advantages that a Bayesian classifier should have; its classification performance is clearly lower than the G-NB classifier only under one category. For the waveform dataset, which also has a large number of samples, 3WD-INB achieves better results under all three categories, and the RF model also performs well. For the WDBC dataset, with a large number of features, 3WD-INB is optimal in all categories. For the iris dataset, with a small number of features and samples, 3WD-INB meets the criteria of a perfect classifier, which shows that 3WD-INB performs excellently when the number of features is low. For the Pima Indians diabetes and banknote authentication datasets, with a moderate number of samples and features, 3WD-INB performs better and more stably than the traditional classifiers, has greater overall advantages, and improves all indicators to a large extent. For the multi-class glass and segmentation datasets, 3WD-INB also shows satisfactory results; MLP and SVM are better than 3WD-INB only in individual categories, and the 3WD-INB classifier is more stable. Overall, 3WD-INB is also stable on continuous datasets, regardless of whether the task is binary or multi-class.

4.4.3. Algorithm Time-Consumption Analysis

The algorithm’s time consumption is an important index to evaluate an algorithm. We expect that the algorithm will still perform well with lower time consumption. Due to the different configurations of different running environments, the direct running time does not accurately reflect the time consumption of the algorithm, so we select the fastest-running algorithm as the benchmark algorithm, make its running time 1, and then test the relative running time of other algorithms and the base algorithm. For discrete data, select the fastest NB classifier as the base algorithm. For continuous data, G-NB is chosen as the base algorithm.
After testing, the running time of each algorithm under discrete data is shown in Table 10, and the running time of each algorithm under continuous data is shown in Table 11.
From the analysis, it can be seen that on discrete data, since 3WD-INB does not need distribution fitting, its time consumption is close to that of the NB classifier and is shorter than that of the traditional RF, SVM and other algorithms under the same conditions. On continuous data, since 3WD-INB needs to fit the data distribution, its time consumption is worse than on discrete data, but the overall time consumption is still lower than that of the RF and MLP models. To sum up, 3WD-INB is acceptable in terms of time consumption: in most cases its time consumption is relatively low, although its efficiency may decrease when the number of attributes is large.

5. Conclusions and Future Work

In the classification process, forcing uncertain objects into definite categories does not conform to people's actual decision-making, and real-world data are often acquired dynamically. Combining incremental learning, three-way decision ideas, and the naive Bayes classifier, a three-way incremental naive Bayes classifier (3WD-INB) is therefore proposed. Samples with high data quality are screened through incremental learning, three-way classification is performed through three-way decision thinking, and for continuous data the posterior probability is estimated by distribution fitting according to the minimum residual sum of squares (RSS), so that 3WD-INB can be used for both discrete and continuous data. Simulation experiments on 10 datasets show that 3WD-INB greatly improves accuracy and recall compared with the traditional models, which verifies that 3WD-INB has better classification performance. In future work, we will consider the assumption of conditional independence of attributes and the use of semi-naive Bayes methods or Bayes network methods, so that the conditional independence assumption between attributes is better satisfied and the classification performance of the model is further enhanced.
The advantage of this paper is that the new classifier utilizes three-way decision and incremental learning, which makes it perform well on different types of datasets and provides a new method for research in the classification field. Objectively, the limitation of this paper is that the conditional independence assumption of the naive Bayes classifier attributes has not been improved, resulting in a slight degradation in classification performance when processing datasets with a large number of attributes.
In the future, we will consider the assumption of conditional independence of attributes and the use of semi-naive Bayesian methods or Bayesian network methods, such as building a three-way decision semi-naive incremental Bayes classifier or a three-way decision Bayes network classifier, to better model the interactions between attributes, so that the conditional independence assumption is better satisfied, the existing limitations of 3WD-INB are addressed, and the classification performance of the model is further enhanced.

Author Contributions

Conceptualization, Z.Y., J.R., C.Z. and L.W.; data curation, Z.Z. and Y.S.; funding acquisition, L.W.; investigation, Y.S.; methodology, Z.Y. and L.W.; project administration, L.W.; software, Z.Y., J.R. and M.W.; validation, Z.Y.; visualization, Z.Z., Y.S. and M.W.; writing—original draft, Z.Y., J.R. and C.Z.; writing—review and editing, C.Z. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic Scientific Research Business Expenses of Hebei Provincial Universities (JST2022001), Tangshan Science and Technology Project (22130225G), and Innovation and Entrepreneurship Training Project for College Students in Hebei Province (S202210081055).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Erkan, U. A precise and stable machine learning algorithm: Eigenvalue classification (EigenClass). Neural Comput. Appl. 2021, 33, 5381–5392. [Google Scholar] [CrossRef]
  2. Zhou, X.; Wu, D.; You, Z.; Wu, D.; Ye, N.; Zhang, L. Adaptive Two-Index Fusion Attribute-Weighted Naive Bayes. Electronics 2022, 11, 3126. [Google Scholar] [CrossRef]
  3. Memiş, S.; Enginoğlu, S.; Erkan, U. Fuzzy parameterized fuzzy soft k-nearest neighbor classifier. Neurocomputing 2022, 500, 351–378. [Google Scholar] [CrossRef]
  4. Kaminska, O.; Cornelis, C.; Hoste, V. Fuzzy Rough Nearest Neighbour Methods for Aspect-Based Sentiment Analysis. Electronics 2023, 12, 1088. [Google Scholar] [CrossRef]
  5. Xu, G.M.; Liu, H.Z.; Zhang, J.Z.; Wang, J.H. Improving multi-relational Naive Bayes classifier using smoothing methods. Comput. Eng. Appl. 2017, 53, 69–72. [Google Scholar]
  6. Li, S.Q.; Lu, W.Y.; Deng, T.; Chen, W. Naive Bayes Classification Algorithm Based on Improved PCA. Stat. Decis. Mak. 2022, 38, 34–37. [Google Scholar]
  7. Farid, D.M.; Zhang, L.; Rahman, C.M.; Hossain, M.A.; Strachan, R. Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Syst. Appl. 2014, 41, 1937–1946. [Google Scholar] [CrossRef]
  8. Zhang, W.J.; Jiang, L.X.; Zhang, H.; Chen, L. A Two-Layer Bayes Model: Random Forest Naive Bayes. Comput. Res. Dev. 2021, 58, 2040–2051. [Google Scholar]
  9. Gama, J.; Castillo, G. Adaptive bayes. In Advances in Artificial Intelligence—IBERAMIA 2002: Proceedings of the 8th Ibero-American Conference on, AI Seville, Spain, 12–15 November 2002; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
  10. Li, T.T.; Lu, J. Improved Naive Bayes Self-Training Algorithm Based on Weighted K-Nearest Neighbor. J. Wuhan Univ. (Nat. Sci. Ed.) 2019, 65, 465–471. [Google Scholar] [CrossRef]
  11. Qiu, N.J.; Li, N.; Hu, X.J.; Wang, P.; Sun, S.Z. Improved Native Bayes Algorithm Based on Particle Swarm Optimization. Comput. Eng. 2018, 44, 27–32+39. [Google Scholar] [CrossRef]
  12. Ramoni, M.; Sebastiani, P. Robust bayes classifiers. Artif. Intell. 2001, 125, 209–226. [Google Scholar] [CrossRef] [Green Version]
  13. Zhang, H.; Jiang, L.; Li, C. Attribute augmented and weighted naive Bayes. Sci. China Inf. Sci. 2022, 65, 222101. [Google Scholar] [CrossRef]
  14. Kaur, W.; Balakrishnan, V.; Wong, K.S. Improving multi-label text classification using weighted information gain and co-trained Multinomial Naive Bayes classifier. Malays. J. Comput. Sci. 2022, 35, 21–36. [Google Scholar] [CrossRef]
  15. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  16. Fisher, R.A. The logic of inductive inference. J. R. Stat. Soc. 1935, 98, 39–82. [Google Scholar] [CrossRef] [Green Version]
  17. Fayyad, U.; Irani, K. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, 28 August–3 September 1993; pp. 1022–1029. [Google Scholar]
  18. Zhang, C.Y.; Feng, X.Z.; Liu, Y.; Ma, Y.T.; Liu, F.C.; Gao, R.Y.; Ren, J. New Three-way Extended Tree Augmented Naive Bayes Classifier. Small Micro Comput. Syst. 2021, 42, 485–490. [Google Scholar]
  19. Zhou, B.; Yao, Y.Y.; Luo, J.G. A Three-Way Decision Approach to Email Spam Filtering. In Advances in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2010; pp. 28–39. [Google Scholar]
  20. Zhang, C.; Duan, X.; Liu, F.; Li, X.Q.; Liu, S.Y. Three-way Naive Bayes collaborative filtering recommendation model for smart city. Sustain. Cities Soc. 2022, 76, 103373. [Google Scholar] [CrossRef]
  21. Yao, Y. An Outline of a Theory of Three-Way Decisions. In Rough Sets and Current Trends in Computing: Proceedings of the 8th International Conference, RSCTC 2012, Chengdu, China, 17–20 August 2012; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7413, pp. 1–17. [Google Scholar]
  22. Yao, J.T.; Azam, N. Web-based medical decision support systems for three-way medical decision making with game-theoretic rough sets. IEEE Trans. Fuzzy Syst. 2014, 23, 3–15. [Google Scholar] [CrossRef]
  23. Zhou, B.; Yao, Y.; Luo, J. Cost-sensitive three-way email spam filtering. J. Intell. Inf. Syst. 2014, 42, 19–45. [Google Scholar] [CrossRef]
  24. Liu, D.; Li, T.R.; Li, H.X. Rough Set Theory: Based on the Three-way Decision-Making Perspective. J. Nanjing Univ. (Nat. Sci. Ed.) 2013, 49, 574–581. [Google Scholar] [CrossRef]
  25. Liu, D.; Liang, D. Three-way decision-making in a broad sense and three-way decision-making in a narrow sense. Comput. Sci. Explor. 2017, 11, 502–510. [Google Scholar]
  26. Yao, Y.Y.; Qi, J.J.; Wei, L. Formal concept analysis, rough sets and granular computing based on three-way decision-making. J. Northwest Univ. (Nat. Sci. Ed.) 2018, 48, 477–487. [Google Scholar] [CrossRef]
  27. Liang, D.; Liu, D.; Pedrycz, W.; Pei, H. Triangular fuzzy decision-theoretic rough sets. Int. J. Approx. Reason. 2013, 54, 1087–1106. [Google Scholar] [CrossRef]
  28. Liang, D.; Liu, D. Systematic studies on three-way decisions with interval-valued decision-theoretic rough sets. Inf. Sci. 2014, 276, 186–203. [Google Scholar] [CrossRef]
  29. Liang, D.; Liu, D. Deriving three-way decisions from intuitionistic fuzzy decision-theoretic rough sets. Inf. Sci. 2015, 300, 28–48. [Google Scholar] [CrossRef]
  30. Liang, D.; Xu, Z.; Liu, D. Three-way decisions with intuitionistic fuzzy decision-theoretic rough sets based on point operators. Inf. Sci. 2017, 375, 183–201. [Google Scholar] [CrossRef]
  31. Yang, J.L.; Zhang, X.Y.; Tang, X.; Feng, L. Fuzzy Rough Set Model Based on Three-way Decisions of Optimal Similar Degrees. Comput. Sci. 2018, 45, 27–32. [Google Scholar]
  32. Long, B.H.; Xu, W.H. Fuzzy three-way concept analysis and fuzzy three-way concept lattice. J. Nanjing Univ. (Nat. Sci.) 2019, 55, 537–545. [Google Scholar] [CrossRef]
  33. Xue, Z.N.; Wang, P.H.; Liu, J.; Zhu, T.L.; Xue, T.Y. Three-way Decision Model Based on Probabilistic Graph. Comput. Sci. 2016, 43, 30–34. [Google Scholar]
  34. Jia, X.; Deng, Z.; Min, F.; Liu, D. Three-way decisions based feature fusion for Chinese irony detection. Int. J. Approx. Reason. 2019, 113, 324–335. [Google Scholar] [CrossRef]
  35. Dai, J.; Chen, T.; Zhang, K. The intuitionistic fuzzy concept-oriented three-way decision model. Inf. Sci. 2023, 619, 52–83. [Google Scholar] [CrossRef]
  36. Li, W.; Huang, Z.; Li, Q. Three-way decisions based software defect prediction. Knowl.-Based Syst. 2016, 91, 263–274. [Google Scholar] [CrossRef]
  37. Chen, J.; Chen, Y.; He, Y.; Xu, Y.; Zhao, S.; Zhang, Y. A classified feature representation three-way decision model for sentiment analysis. Appl. Intell. 2022, 52, 7995–8007. [Google Scholar] [CrossRef]
  38. Chu, X.; Sun, B.; Li, X.; Ha, K.; Wu, J.; Zhang, Y.; Huang, Q. Neighborhood rough set-based three-way clustering considering attribute correlations: An approach to classification of potential gout groups. Inf. Sci. 2020, 535, 28–41. [Google Scholar] [CrossRef]
  39. Wang, X.; Gong, J.; Song, Y.; Hu, J. Adaptively weighted three-way decision oversampling: A cluster imbalanced-ratio based approach. Appl. Intell. 2023, 53, 312–335. [Google Scholar] [CrossRef]
  40. Remesh, K.M.; Nair, L.R. A Novel Technique for the Detection of Covid-19 Patients with the Applications of Three-Way Decisions using Variance-Based Criterion. Microprocess. Microsyst. 2023, 97, 104758. [Google Scholar] [CrossRef] [PubMed]
  41. Zhang, C.Y.; Qiao, P.; Wang, L.Y.; Liu, L.; Zhang, J.S. Dynamic three-way decisions and its application based on bidirectional transfer probabilistic PS-rough sets. J. Nanjing Univ. Nat. Sci. Ed. 2017, 53, 937–946. [Google Scholar]
  42. Distfit is a Python Library for Probability Density Fitting. (Version 1.4.0). Available online: https://erdogant.github.io/distfit (accessed on 15 June 2022).
  43. Jia, X.; Shang, L. How to evaluate three-way decisions based binary classification. In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing: Proceedings of the 15th International Conference, RSFDGRC 2015, Tianjin, China, 20–23 November 2015; Springer: Cham, Switzerland, 2015. [Google Scholar]
Figure 1. The idea of three-way decision.
Figure 2. 3WD-INB Algorithm Flow.
Figure 3. Breast's F1 change chart.
Figure 4. Breast's Precision change chart.
Figure 5. Mushroom's F1 change chart.
Figure 6. Mushroom's Precision change chart.
Figure 7. WDBC's F1 change chart.
Figure 8. WDBC's Precision change chart.
Table 1. Cost Function.

| Decision Making | C (Positive Example) | C̄ (Negative Example) |
|---|---|---|
| a_P | λ_PP | λ_PN |
| a_N | λ_NP | λ_NN |
| a_B | λ_BP | λ_BN |
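In the standard decision-theoretic formulation of three-way decisions [21], the two thresholds follow from the losses in Table 1 (a standard derivation, stated here under the usual loss ordering λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN; the paper's own derivation may differ in notation):

α = (λ_PN − λ_BN) / [(λ_PN − λ_BN) + (λ_BP − λ_PP)],  β = (λ_BN − λ_NN) / [(λ_BN − λ_NN) + (λ_NP − λ_BP)].

An object x is then assigned to the positive region when Pr(C|x) ≥ α, to the negative region when Pr(C|x) ≤ β, and to the boundary region when β < Pr(C|x) < α.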
Table 2. Selected distribution type.

| Serial Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Distribution | norm | expon | pareto | dweibull | t | genextreme | gamma | lognorm | beta | uniform |
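The ten candidate families in Table 2 coincide with the parametric distributions offered by the distfit library [42]. As an illustrative sketch only (assuming the distfit 1.x interface of `distfit(...)`, `fit_transform`, and the fitted `model` dictionary; the variable `x` stands for a hypothetical continuous feature column), RSS-based selection of a best-fitting family could be performed as follows:

```python
import numpy as np
from distfit import distfit  # probability density fitting library [42]

# Hypothetical continuous feature column from a training set.
x = np.random.gamma(shape=2.0, scale=1.5, size=500)

# The ten candidate families of Table 2, scored by residual sum of squares (RSS).
candidates = ['norm', 'expon', 'pareto', 'dweibull', 't',
              'genextreme', 'gamma', 'lognorm', 'beta', 'uniform']
dfit = distfit(distr=candidates, stats='RSS')
dfit.fit_transform(x)

# dfit.model holds the family with the minimum RSS and its fitted parameters;
# dfit.summary lists the RSS of every candidate.
print(dfit.model['name'], dfit.model['score'])
```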
Table 3. Dataset Information.

| Name | Type | Number of Samples | Number of Features | Number of Categories |
|---|---|---|---|---|
| Breast | discrete | 699 | 10 | 2 |
| Vote | discrete | 435 | 16 | 2 |
| Mushroom | discrete | 8124 | 22 | 2 |
| Chess | discrete | 3196 | 36 | 2 |
| Hayes-Roth | discrete | 160 | 5 | 3 |
| Car Evaluation | discrete | 1728 | 6 | 4 |
| Lymphography | discrete | 148 | 18 | 4 |
| WDBC | continuous | 569 | 30 | 2 |
| Pima Indians Diabetes | continuous | 766 | 9 | 2 |
| Banknote Authentication | continuous | 1372 | 5 | 2 |
| Magic04 | continuous | 19,020 | 11 | 2 |
| Iris | continuous | 150 | 4 | 3 |
| Waveform | continuous | 5000 | 22 | 3 |
| Glass | continuous | 214 | 9 | 6 |
| Segmentation | continuous | 2310 | 19 | 7 |
Table 4. Two-way decision confusion matrix.

| Reference | Prediction: Positive | Prediction: Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
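The measures reported in the experiments follow from this matrix in the standard way:

Precision = TP / (TP + FP),  Recall = TP / (TP + FN),  F1 = 2 · Precision · Recall / (Precision + Recall).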
Table 5. Three-way decision confusion matrix.

|  | Actual Positive Domain | Actual Negative Domain |
|---|---|---|
| Predicted as POS domain | n_PP | n_PN |
| Predicted as BND domain | n_BP | n_BN |
| Predicted as NEG domain | n_NP | n_NN |
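For completeness, a small sketch of how region-based measures can be computed from the counts in Table 5 is given below. The convention shown (precision measured over the positive region, with boundary objects counting against recall) is only one common choice; the evaluation protocols actually used in the experiments follow [43].

```python
def three_way_metrics(n_pp, n_pn, n_bp, n_bn, n_np, n_nn):
    """Region-based precision/recall/F1 from the Table 5 counts (one common convention)."""
    # Precision: fraction of objects accepted into the positive region that are truly positive.
    precision = n_pp / (n_pp + n_pn) if (n_pp + n_pn) else 0.0
    # Recall: truly positive objects that were accepted, with deferred (boundary) and
    # rejected positives both counting as misses.
    recall = n_pp / (n_pp + n_bp + n_np) if (n_pp + n_bp + n_np) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```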
Table 6. Optimal Threshold.

| Dataset Name | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|---|---|---|---|---|---|---|
| Breast | (0.7, 0.6) | (0.5, 0.2) | - | - | - | - | - |
| Vote | (0.4, 0.1) | (0.4, 0.1) | - | - | - | - | - |
| Mushroom | (0.6, 0.3) | (0.2, 0.1) | - | - | - | - | - |
| Chess | (0.4, 0.3) | (0.5, 0.2) | - | - | - | - | - |
| Hayes-Roth | (0.4, 0.3) | (0.4, 0.3) | (0.5, 0.4) | - | - | - | - |
| Car Evaluation | (0.5, 0.3) | (0.4, 0.3) | (0.7, 0.6) | (0.4, 0.3) | - | - | - |
| Lymphography | (0.3, 0.2) | (0.5, 0.3) | (0.5, 0.4) | (0.8, 0.6) | - | - | - |
| WDBC | (0.8, 0.2) | (0.5, 0.4) | - | - | - | - | - |
| Pima Indians Diabetes | (0.3, 0.1) | (0.2, 0.1) | - | - | - | - | - |
| Banknote Authentication | (0.6, 0.1) | (0.2, 0.1) | - | - | - | - | - |
| Magic04 | (0.3, 0.2) | (0.6, 0.5) | - | - | - | - | - |
| Iris | (0.5, 0.4) | (0.4, 0.3) | (0.4, 0.3) | - | - | - | - |
| Waveform | (0.2, 0.1) | (0.5, 0.1) | (0.2, 0.1) | - | - | - | - |
| Glass | (0.5, 0.3) | (0.4, 0.3) | (0.2, 0.1) | (0.4, 0.3) | (0.2, 0.1) | (0.4, 0.3) | - |
| Segmentation | (0.6, 0.5) | (0.8, 0.5) | (0.2, 0.1) | (0.4, 0.3) | (0.3, 0.2) | (0.5, 0.4) | (0.5, 0.4) |
Table 7. Accuracy Statistics Table.

| Dataset Name | γ = 0.55 | γ = 0.60 | γ = 0.65 | γ = 0.70 | γ = 0.75 | γ = 0.80 | γ = 0.85 | γ = 0.90 | γ = 0.95 |
|---|---|---|---|---|---|---|---|---|---|
| Breast | 0.9015 | 0.9559 | 0.9856 | 0.9856 | 0.9856 | 0.9856 | 0.9715 | 0.9665 | 0.9665 |
| Vote | 0.9654 | 0.9786 | 0.9786 | 0.9786 | 0.9786 | 0.9551 | 0.9551 | 0.9441 | 0.9441 |
| Mushroom | 0.9775 | 0.9775 | 0.9898 | 0.9898 | 0.9898 | 0.9625 | 0.9625 | 0.9625 | 0.9459 |
| Chess | 0.8975 | 0.9001 | 0.9010 | 0.9115 | 0.9115 | 0.9115 | 0.9059 | 0.8949 | 0.8516 |
| Hayes-Roth | 0.9010 | 0.9103 | 0.8915 | 0.8915 | 0.8915 | 0.8814 | 0.8810 | 0.8810 | 0.8810 |
| Car Evaluation | 0.9809 | 0.9809 | 0.9809 | 0.9809 | 0.9755 | 0.9713 | 0.9215 | 0.9359 | 0.9001 |
| Lymphography | 0.8715 | 0.8990 | 0.8990 | 0.8850 | 0.8850 | 0.8850 | 0.8211 | 0.8013 | 0.8001 |
| WDBC | 0.9851 | 0.9851 | 0.9851 | 0.9851 | 0.9851 | 0.9851 | 0.9801 | 0.9799 | 0.9799 |
| Pima Indians Diabetes | 0.8859 | 0.8859 | 0.8919 | 0.8990 | 0.9015 | 0.9015 | 0.8915 | 0.8915 | 0.8126 |
| Banknote Authentication | 0.8915 | 0.8891 | 0.8891 | 0.9167 | 0.9167 | 0.9167 | 0.9011 | 0.9011 | 0.8756 |
| Magic04 | 0.9557 | 0.9557 | 0.9557 | 0.9557 | 0.9355 | 0.9355 | 0.9215 | 0.9200 | 0.9167 |
| Iris | 0.9875 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9810 |
| Waveform | 0.8210 | 0.8390 | 0.8390 | 0.8220 | 0.8150 | 0.8001 | 0.7995 | 0.7995 | 0.7551 |
| Glass | 0.6655 | 0.6655 | 0.6744 | 0.6744 | 0.6744 | 0.6679 | 0.6679 | 0.6679 | 0.6679 |
| Segmentation | 0.9755 | 0.9815 | 0.9815 | 0.9815 | 0.9815 | 0.9713 | 0.9703 | 0.9667 | 0.9667 |
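Table 7 reports how accuracy varies with the confidence coefficient γ used to filter incremental samples. A minimal sketch of such a confidence filter for a categorical naive Bayes model is shown below (the variable names and count structure are hypothetical; the paper's exact incremental update formulas are defined in the method section):

```python
import numpy as np

def incremental_update(class_counts, feature_counts, X_new, y_pred_proba, gamma=0.7):
    """Add only high-confidence incremental samples to the naive Bayes counts.

    class_counts[c]          -> number of training samples seen for class c
    feature_counts[c][j]     -> dict mapping value v of feature j to its count within class c
    X_new                    -> incremental samples (rows of discrete feature values)
    y_pred_proba             -> posterior P(c | x) for each sample from the current model
    gamma                    -> confidence coefficient, as varied in Table 7
    """
    for x, proba in zip(X_new, y_pred_proba):
        c = int(np.argmax(proba))
        if proba[c] < gamma:           # low-confidence (poor-quality) sample: skip it
            continue
        class_counts[c] += 1           # incremental learning: only the counts are updated
        for j, v in enumerate(x):
            feature_counts[c][j][v] = feature_counts[c][j].get(v, 0) + 1
    return class_counts, feature_counts
```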
Table 8. 3WD-INB Experimental Result 1 (bold is the best result).

| Dataset | Algorithm | C1 F1 | C1 Precision | C2 F1 | C2 Precision | C3 F1 | C3 Precision | C4 F1 | C4 Precision |
|---|---|---|---|---|---|---|---|---|---|
| Breast | RF | 0.9282 | 0.9333 | 0.8687 | 0.8600 | - | - | - | - |
|  | SVM | 0.9432 | 0.9765 | 0.9038 | 0.8545 | - | - | - | - |
|  | KNN | 0.9333 | 0.9438 | 0.8800 | 0.8627 | - | - | - | - |
|  | NB | 0.9125 | 0.9044 | 0.9001 | 0.8774 | - | - | - | - |
|  | INB | 0.9545 | 0.9655 | 0.9231 | 0.8728 | - | - | - | - |
|  | NB-IPCA | 0.9545 | 0.9655 | 0.9231 | 0.8728 | - | - | - | - |
|  | 3WD-INB | 0.9856 | 0.9882 | 0.9531 | 0.9327 | - | - | - | - |
| Vote | RF | 0.9789 | 0.9819 | 0.9661 | 0.9554 | - | - | - | - |
|  | SVM | 0.9789 | 0.9819 | 0.9514 | 0.9667 | - | - | - | - |
|  | KNN | 0.9663 | 0.9556 | 0.9647 | 0.9762 | - | - | - | - |
|  | NB | 0.9351 | 0.9115 | 0.9356 | 0.9222 | - | - | - | - |
|  | INB | 0.9535 | 0.9762 | 0.9545 | 0.9333 | - | - | - | - |
|  | NB-IPCA | 0.9655 | 0.9767 | 0.9655 | 0.9545 | - | - | - | - |
|  | 3WD-INB | 0.9786 | 0.9815 | 0.9751 | 0.9799 | - | - | - | - |
| Mushroom | RF | 0.9811 | 0.9885 | 0.9781 | 0.9711 | - | - | - | - |
|  | SVM | 0.9615 | 0.9789 | 0.9893 | 1.0000 | - | - | - | - |
|  | KNN | 0.9551 | 0.9245 | 0.9333 | 0.9125 | - | - | - | - |
|  | NB | 0.9505 | 0.9145 | 0.9402 | 0.9872 | - | - | - | - |
|  | INB | 0.9516 | 0.9156 | 0.9416 | 0.9886 | - | - | - | - |
|  | NB-IPCA | 0.9654 | 0.9199 | 0.9215 | 0.9335 | - | - | - | - |
|  | 3WD-INB | 0.9898 | 0.9891 | 0.9994 | 0.9981 | - | - | - | - |
| Chess | RF | 0.9005 | 0.9292 | 0.8849 | 0.9149 | - | - | - | - |
|  | SVM | 0.8999 | 0.8999 | 0.8900 | 0.8855 | - | - | - | - |
|  | KNN | 0.8848 | 0.9356 | 0.8849 | 0.8749 | - | - | - | - |
|  | NB | 0.8950 | 0.9226 | 0.8749 | 0.9001 | - | - | - | - |
|  | INB | 0.9005 | 0.9292 | 0.8887 | 0.9015 | - | - | - | - |
|  | NB-IPCA | 0.9005 | 0.9335 | 0.8955 | 0.9119 | - | - | - | - |
|  | 3WD-INB | 0.9115 | 0.9433 | 0.8987 | 0.9211 | - | - | - | - |
| Hayes-Roth | RF | 0.8387 | 0.9285 | 0.7368 | 0.6364 | 1.0000 | 1.0000 | - | - |
|  | SVM | 0.7407 | 1.0000 | 0.6957 | 0.5333 | 1.0000 | 1.0000 | - | - |
|  | KNN | 0.6842 | 0.6190 | 0.3529 | 0.3333 | 0.4444 | 1.0000 | - | - |
|  | NB | 0.8485 | 0.8750 | 0.7778 | 0.7000 | 0.9231 | 1.0000 | - | - |
|  | INB | 0.8387 | 0.9286 | 0.7368 | 0.6364 | 1.0000 | 1.0000 | - | - |
|  | NB-IPCA | 0.8559 | 0.9359 | 0.8005 | 0.8115 | 0.9545 | 0.9887 | - | - |
|  | 3WD-INB | 0.9103 | 1.0000 | 0.8551 | 0.8445 | 1.0000 | 1.0000 | - | - |
| Car Evaluation | RF | 0.9682 | 0.9500 | 0.8182 | 1.0000 | 0.9979 | 0.9559 | 1.0000 | 1.0000 |
|  | SVM | 0.8322 | 0.8611 | 0.6667 | 0.8750 | 0.9697 | 0.9449 | 0.8889 | 1.0000 |
|  | KNN | 0.9487 | 0.9367 | 0.8182 | 1.0000 | 0.9938 | 0.9917 | 0.9677 | 0.9375 |
|  | NB | 0.7917 | 0.8507 | 0.6364 | 0.7778 | 0.9539 | 0.9225 | 0.8889 | 1.0000 |
|  | INB | 0.9557 | 0.8915 | 0.8855 | 0.9225 | 0.9656 | 0.9115 | 0.9005 | 0.9375 |
|  | NB-IPCA | 0.7273 | 0.7273 | 0.4000 | 0.5714 | 0.9535 | 0.9291 | 0.6957 | 1.0000 |
|  | 3WD-INB | 0.9809 | 0.9625 | 0.9167 | 1.0000 | 1.0000 | 1.0000 | 0.9655 | 1.0000 |
| Lymphography | RF | 0.8990 | 1.0000 | 0.8485 | 0.7778 | 0.8000 | 0.8333 | 1.0000 | 1.0000 |
|  | SVM | 0.8990 | 1.0000 | 0.8485 | 0.7778 | 0.8000 | 0.8333 | 1.0000 | 1.0000 |
|  | KNN | 0.8990 | 1.0000 | 0.8965 | 0.9286 | 0.8571 | 0.8000 | 1.0000 | 1.0000 |
|  | NB | 0.8080 | 1.0000 | 0.8550 | 0.7946 | 0.8551 | 0.8657 | 1.0000 | 1.0000 |
|  | INB | 0.8152 | 1.0000 | 0.9286 | 1.0000 | 0.8966 | 0.8125 | 1.0000 | 1.0000 |
|  | NB-IPCA | 0.8556 | 1.0000 | 0.9116 | 0.9445 | 0.8988 | 0.8449 | 1.0000 | 1.0000 |
|  | 3WD-INB | 0.8990 | 1.0000 | 0.9333 | 0.9333 | 0.8989 | 0.8571 | 1.0000 | 1.0000 |
Table 9. 3WD-INB Experimental Result 2 (bold is the best result).

| Dataset | Algorithm | C1 F1 | C1 Precision | C2 F1 | C2 Precision | C3 F1 | C3 Precision | C4 F1 | C4 Precision |
|---|---|---|---|---|---|---|---|---|---|
| WDBC | RF | 0.9559 | 0.9285 | 0.9348 | 0.9773 | - | - | - | - |
|  | SVM | 0.9103 | 0.8354 | 0.8433 | 1.0000 | - | - | - | - |
|  | MLP | 0.9231 | 0.8571 | 0.8705 | 1.0000 | - | - | - | - |
|  | D-NB | 0.9736 | 0.9659 | 0.9636 | 0.9302 | - | - | - | - |
|  | G-NB | 0.9386 | 0.9545 | 0.9386 | 0.9167 | - | - | - | - |
|  | 3WD-INB | 0.9851 | 0.9667 | 0.9736 | 1.0000 | - | - | - | - |
| Pima Indians Diabetes | RF | 0.8186 | 0.7333 | 0.6806 | 0.7941 | - | - | - | - |
|  | SVM | 0.8089 | 0.7000 | 0.5819 | 0.8333 | - | - | - | - |
|  | MLP | 0.8059 | 0.7652 | 0.7355 | 0.7083 | - | - | - | - |
|  | D-NB | 0.7662 | 0.7979 | 0.7662 | 0.7091 | - | - | - | - |
|  | G-NB | 0.8403 | 0.8632 | 0.7403 | 0.8424 | - | - | - | - |
|  | 3WD-INB | 0.9015 | 0.8891 | 0.8551 | 0.8312 | - | - | - | - |
| Banknote Authentication | RF | 0.9933 | 1.0000 | 0.9545 | 0.9774 | - | - | - | - |
|  | SVM | 0.9866 | 0.9795 | 0.9959 | 0.9779 | - | - | - | - |
|  | MLP | 0.9933 | 1.0000 | 0.8551 | 0.8059 | - | - | - | - |
|  | D-NB | 0.8073 | 0.8986 | 0.8073 | 0.7008 | - | - | - | - |
|  | G-NB | 0.8036 | 0.8553 | 0.8036 | 0.7398 | - | - | - | - |
|  | 3WD-INB | 0.9167 | 0.9551 | 0.9967 | 0.9840 | - | - | - | - |
| Magic04 | RF | 0.9079 | 0.8780 | 0.8134 | 0.8743 | - | - | - | - |
|  | SVM | 0.8724 | 0.8062 | 0.6957 | 0.8650 | - | - | - | - |
|  | MLP | 0.8684 | 0.8436 | 0.7373 | 0.7833 | - | - | - | - |
|  | D-NB | 0.7559 | 0.8059 | 0.9335 | 0.9011 | - | - | - | - |
|  | G-NB | 0.9204 | 0.9115 | 0.8995 | 0.9559 | - | - | - | - |
|  | 3WD-INB | 0.9557 | 0.9226 | 0.9458 | 0.9175 | - | - | - | - |
| Iris | RF | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | - | - |
|  | SVM | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9454 | 0.9285 | - | - |
|  | MLP | 0.9559 | 0.9885 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | - | - |
|  | D-NB | 1.0000 | 1.0000 | 0.9333 | 1.0000 | 0.9333 | 0.7778 | - | - |
|  | G-NB | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | - | - |
|  | 3WD-INB | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | - | - |
| Waveform | RF | 0.8253 | 0.8435 | 0.8797 | 0.8746 | 0.8787 | 0.8676 | - | - |
|  | SVM | 0.8152 | 0.8673 | 0.8673 | 0.8588 | 0.8571 | 0.8559 | - | - |
|  | MLP | 0.7926 | 0.8144 | 0.8526 | 0.8551 | 0.8760 | 0.8543 | - | - |
|  | D-NB | 0.8090 | 0.4984 | 0.8810 | 0.8452 | 0.8840 | 0.9277 | - | - |
|  | G-NB | 0.8390 | 0.5285 | 0.8780 | 0.9020 | 0.8870 | 0.8995 | - | - |
|  | 3WD-INB | 0.8390 | 0.8850 | 0.8991 | 0.9105 | 0.9115 | 0.9456 | - | - |
| Glass | RF | 0.7115 | 0.7335 | 0.6667 | 0.5998 | 0.6159 | 0.6667 | 0.8559 | 0.9556 |
|  | SVM | 0.7200 | 0.7500 | 0.7500 | 0.8000 | 0.5000 | 0.6510 | 1.0000 | 1.0000 |
|  | MLP | 0.7428 | 0.7222 | 0.7407 | 0.8333 | 0.5000 | 0.4286 | 0.6667 | 1.0000 |
|  | D-NB | 0.6744 | 0.6364 | 0.6512 | 0.2143 | 0.6233 | 0.6667 | 0.9069 | 0.2500 |
|  | G-NB | 0.7097 | 0.7557 | 0.6667 | 0.5882 | 0.3333 | 0.5000 | 0.6667 | 1.0000 |
|  | 3WD-INB | 0.7244 | 0.7647 | 0.7857 | 0.8462 | 0.6520 | 0.6995 | 0.6667 | 1.0000 |

Glass (continued):

| Algorithm | C5 F1 | C5 Precision | C6 F1 | C6 Precision |
|---|---|---|---|---|
| RF | 0.7500 | 0.5656 | 0.6519 | 0.7500 |
| SVM | 0.8000 | 1.0000 | 0.6667 | 0.6667 |
| MLP | 0.6667 | 0.5000 | 0.6667 | 0.6667 |
| D-NB | 0.6667 | 0.5000 | 0.6667 | 0.6667 |
| G-NB | 0.7571 | 0.6556 | 0.6667 | 0.6667 |

Segmentation:

| Algorithm | C1 F1 | C1 Precision | C2 F1 | C2 Precision | C3 F1 | C3 Precision | C4 F1 | C4 Precision |
|---|---|---|---|---|---|---|---|---|
| RF | 0.9800 | 0.9608 | 0.8595 | 0.9455 | 0.9185 | 0.9118 | 1.0000 | 1.0000 |
| SVM | 0.8085 | 0.9268 | 0.8955 | 0.8955 | 0.6713 | 0.6575 | 1.0000 | 1.0000 |
| MLP | 0.9906 | 0.9814 | 0.9403 | 0.9403 | 0.9067 | 0.8500 | 1.0000 | 1.0000 |
| D-NB | 0.9500 | 0.9245 | 0.9452 | 0.8656 | 0.8310 | 0.1429 | 1.0000 | 1.0000 |
| G-NB | 0.9548 | 0.9592 | 0.9338 | 0.8788 | 0.9554 | 0.6779 | 1.0000 | 1.0000 |
| 3WD-INB | 0.9815 | 0.9849 | 0.9466 | 0.9688 | 0.9744 | 0.9500 | 1.0000 | 1.0000 |

Segmentation (continued):

| Algorithm | C5 F1 | C5 Precision | C6 F1 | C6 Precision | C7 F1 | C7 Precision |
|---|---|---|---|---|---|---|
| RF | 0.9752 | 0.9516 | 1.0000 | 1.0000 | 0.8264 | 0.7937 |
| SVM | 0.9925 | 0.9850 | 1.0000 | 1.0000 | 0.5873 | 0.5522 |
| MLP | 0.9778 | 0.9865 | 1.0000 | 1.0000 | 0.8462 | 0.9778 |
| D-NB | 0.9833 | 0.9242 | 1.0000 | 1.0000 | 0.8000 | 0.5763 |
| G-NB | 0.9976 | 0.9795 | 1.0000 | 1.0000 | 0.8238 | 0.7414 |
| 3WD-INB | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9120 | 0.9048 |
Table 10. Algorithm time consumption under discrete data. Values are running times relative to the basis algorithm (entries of 1 mark the baseline).

| Dataset Name | RF | SVM | KNN | NB | INB | NB-IPCA | 3WD-INB |
|---|---|---|---|---|---|---|---|
| Breast | 19.4 | 2.5 | 2.7 | 1 | 1.2 | 3.9 | 1.9 |
| Vote | 18.9 | 1.1 | 1.7 | 1 | 1.6 | 2.6 | 1.4 |
| Mushroom | 21.2 | 63.8 | 36.0 | 1 | 11.6 | 15.9 | 12.5 |
| Chess | 18.2 | 19.8 | 4.5 | 1 | 2.7 | 9.5 | 2.8 |
| Hayes-Roth | 15.1 | 1.1 | 1.5 | 1 | 1.5 | 1.6 | 1.9 |
| Car Evaluation | 18.8 | 14.0 | 2.7 | 1 | 3.3 | 4.6 | 3.3 |
| Lymphography | 13.1 | 1.1 | 1.1 | 1 | 1.5 | 2.5 | 1.3 |
Table 11. Algorithm time consumption under continuous data. Values are running times relative to the basis algorithm (entries of 1 mark the baseline).

| Dataset Name | RF | SVM | MLP | D-NB | G-NB | 3WD-INB |
|---|---|---|---|---|---|---|
| WDBC | 37.1 | 2.5 | 25.8 | 1.3 | 1 | 11.0 |
| Pima Indians Diabetes | 27.0 | 5.0 | 227.3 | 1.8 | 1 | 6.1 |
| Banknote Authentication | 52.4 | 3.6 | 302.8 | 2.9 | 1 | 8.0 |
| Magic04 | 214.1 | 360.4 | 190.3 | 35.5 | 1 | 52.5 |
| Iris | 26.6 | 1.3 | 23.4 | 1.2 | 1 | 5.4 |
| Waveform | 182.8 | 134.0 | 581.7 | 6.5 | 1 | 75.3 |
| Glass | 17.5 | 1.7 | 4.8 | 1.5 | 1 | 6.4 |
| Segmentation | 44.6 | 7.8 | 488.0 | 1.9 | 1 | 8.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
