Article

TFD-IIS-CRMCB: Telecom Fraud Detection for Incomplete Information Systems Based on Correlated Relation and Maximal Consistent Block

1 Institute of Information Technology, PLA Strategic Support Force Information Engineering University, Zhengzhou 450002, China
2 National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China
* Authors to whom correspondence should be addressed.
Entropy 2023, 25(1), 112; https://doi.org/10.3390/e25010112
Submission received: 1 November 2022 / Revised: 24 December 2022 / Accepted: 3 January 2023 / Published: 5 January 2023
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)

Abstract
Telecom fraud detection is of great significance in online social networks. Yet massive, redundant, incomplete, and uncertain network information makes it a challenging task. Hence, this paper mainly uses the correlation of attributes, measured by an entropy function, to optimize data quality and then solves the problem of telecommunication fraud detection with incomplete information. First, to filter out redundancy and noise, we propose an attribute reduction algorithm based on the max-correlation and max-independence rate (MCIR) to improve data quality. Then, we design a rough-gain anomaly detection algorithm (MCIR-RGAD) using the idea of maximal consistent blocks to deal with missing incomplete data. Finally, experimental results on authentic telecommunication fraud data and UCI data show that the MCIR-RGAD algorithm provides an effective solution for reducing computation time, improving data quality, and processing incomplete data.

Graphical Abstract

1. Introduction

The digital age has dramatically facilitated many aspects of our lives, whereas cybersecurity issues threaten the positive effects of technology. Since unsafe information and illegitimate users blend so well with regular information and users that they can hardly be distinguished, cybersecurity threats [1], especially online fraud, telecommunications fraud [2], online social network fraud [3], credit card fraud [4], bank fraud [5], and fraudulent credit applications [6], have become a knotty governance problem.
Fraud detection [7] is a kind of anomaly detection and is usually tackled as a classification problem by screening out abnormal items with traditional machine learning methods [8,9] or deep learning ones [10,11,12]. Compared with traditional machine learning models, deep learning models suffer from poor interpretability and offer little guidance for parameter tuning, and their computation time grows exponentially with model complexity. Traditional machine learning is still widely studied and applied because of its strong interpretability and fast computation. Traditional outlier detection methods are mainly distribution-based [13], distance-based [14], density-based [15], and clustering-based [16]. However, traditional approaches to anomaly detection rely heavily on the relevance of features to the classification task. When the feature space is large, invalid, irrelevant, redundant, or noisy attributes in the data inevitably affect model performance. As the saying goes, “Data and features determine the upper limit of machine learning, and models and algorithms only approach this upper limit”. Therefore, in practical training, the performance of traditional machine learning models is largely constrained by the data, mainly in the following four aspects. First, the complexity of the data, which usually contain multi-dimensional, multi-level, and multi-granularity information, makes their application and processing complex and diverse. Second, heterogeneous data [17], which often mix numerical and categorical information, are challenging to process effectively. Third, the uncertainty [18], redundancy [19], and inconsistency [20] of the data complicate the classification task. Fourth, the information contained in missing data [21] is tough to use effectively.
In order to solve the above problems in telecom fraud, achieve fraud mining, and avoid unnecessary economic losses, a large body of telecom fraud research has emerged. Traditional telecom fraud detection methods typically rely on compiling blacklists of fraudulent numbers to enable fraudulent user discovery and detection. However, fraud strategies have evolved, making traditional methods no longer applicable. Therefore, to mine valuable information for fraud detection from multiple network domains of telecommunication data (SMS data, user data, call communication data, and app Internet data), behavioral interaction-based [22], topology-based [23], and content-based [24] approaches have arisen. Meanwhile, considering that labeled data are rare and expensive, unsupervised methods [25,26] are utilized to achieve fraud mining. However, the above studies do not consider fraud from the perspective of the uncertainty of the data itself, although the incompleteness of data and the relevance of attributes play a critical role in the effective detection of fraud. Information theory and rough set theory, as valid means of measuring uncertainty, provide new ideas for solving the telecommunication fraud problem.
In recent years, with the intensive study of rough set theory [27], outlier detection methods based on rough sets and information theory have received extensive attention; they provide theoretical support for discovering important information and classifying complex objects, have strong interpretability, and can deal with unlabeled, heterogeneous, redundant, incomplete, or uncertain data. Attribute reduction [19,20,21,28,29,30,31,32], or feature selection, simplifies data, reduces data dimension, and improves model classification ability by filtering out irrelevant or redundant features, which can effectively avoid overfitting. However, vanilla attribute reduction algorithms [33] of classical rough set theory can only learn information through a strict indistinguishable-relation division of the data. This equivalence relation is too rigid to handle incomplete, ordered, mixed, and dynamic data, and these algorithms have poor fault tolerance. To overcome this limitation, variants of rough set theory, for example, those based on attribute importance [19,20], the positive region, tolerance relations [28], maximal consistent blocks [21], the discernibility matrix [29], and incremental computation [30], have proved effective in incomplete information systems [34], ordered information systems [35], mixed-valued information systems [14], and dynamic information systems [36]. Generally speaking, discernibility-matrix-based methods are time-consuming and infeasible for large-scale datasets, while attribute-importance-based methods have low time complexity. Moreover, the tolerance relation is a weakened form of the indistinguishable relation, which can validly handle incomplete information. A maximal consistent block describes the maximal object set under the tolerance relation, meaning that there is neither redundant, irrelevant information nor information loss.
In contrast, the maximal consistent block accurately expresses the objects’ information under coverage and has higher accuracy.
After weighing the applicability of these variants, this paper introduces the maximal consistent block to deal with the uncertainty, incompleteness, and redundancy of data in the telecom fraud detection problem for the first time. Guided and inspired by previous research, an anomaly detection method (MCIR-RGAD) based on correlation and the maximal consistent block is proposed in this paper. The main contributions of this paper are summarized as follows:
  • From the perspective of improving data quality based on the entropy function under rough set theory, we analyze the effect of attribute correlation and independence on the importance of attributes. A max-correlation and max-independence rate attribute reduction algorithm (MCIR) is designed to eliminate the redundancy and noise contained in the data.
  • From the perspective of data incompleteness processing, a rough gain anomaly detection algorithm (RGAD) is constructed based on the maximal consistent blocks and information gain, which can effectively supply missing data and provide an effective solution for incomplete data processing and feature information measurement.
  • The effectiveness of the MCIR-RGAD algorithm is verified on UCI datasets and an authentic telecom fraud dataset. The results show that, compared with eight other kernel functions, the MCIR-RGAD algorithm reduces time complexity and effectively uses the information contained in missing data to improve model performance.
The remainder of this paper is organized as follows. Section 2 gives the basic preliminaries of rough set theory. The design of the MCIR-RGAD algorithm is proposed in Section 3. Furthermore, Section 4 conducts the experimental analysis, and Section 5 summarizes the conclusions.

2. Preliminaries

2.1. Rough Set Theory

Rough set theory is an effective way to tackle and utilize incomplete datasets. The information contained in datasets can be represented as an information system.
An information system $(U, A, V, f)$ is a decision information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty finite set of objects known as a universe. The set $A = C \cup D = \{a_1, a_2, \ldots, a_m, D\}$ is composed of the condition attribute set $C = \{a_1, a_2, \ldots, a_m\}$ and the decision attribute set $D$, where $C \cap D = \emptyset$. The information function $f: U \times A \to V$ maps an attribute of an object to an information value, i.e., $f(U, A) = V$. Normally, a decision information system $(U, A, V, f)$ is abbreviated as $(U, A)$.
Definition 1
(Indistinguishable Relation [37]). Given an information system $(U, A)$, $A = C \cup D$, and an attribute subset $B \subseteq C$, an equivalence relation on the set $U$ is called the indistinguishable relation $IND(B)$ if it satisfies:
$$IND(B) = \{(x, y) \in U \times U \mid \forall a \in B,\ f(x, a) = f(y, a)\},$$
where $[x]_{IND(B)} = \{y \mid (x, y) \in IND(B)\}$ is the equivalence class of $x$. The set family $U/IND(B) = \{[x]_{IND(B)} \mid x \in U\} = \{X_1, X_2, \ldots, X_m\}$ is a partition of $U$ with respect to the attribute set $B$, with $U = \bigcup_{i=1}^{m} X_i$ and $X_i \cap X_j = \emptyset\ (i \neq j)$. Normally, $[x]_{IND(B)}$ and $U/IND(B)$ are abbreviated as $[x]_B$ and $U/B$, respectively.
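As an illustrative sketch (not the authors' code), the partition $U/B$ of a complete table can be computed by grouping objects on their values over $B$; the three-object table below is hypothetical:

```python
from collections import defaultdict

def partition(objects, table, B):
    """Group objects by their values on attribute subset B, i.e., U/IND(B)."""
    classes = defaultdict(list)
    for x in objects:
        key = tuple(table[x][a] for a in B)
        classes[key].append(x)
    return [set(c) for c in classes.values()]

# Hypothetical complete table: object -> {attribute: value}.
table = {
    "x1": {"a1": 1, "a2": 0},
    "x2": {"a1": 1, "a2": 1},
    "x3": {"a1": 1, "a2": 0},
}
U = list(table)
print(partition(U, table, ["a1"]))        # one block containing all three objects
print(partition(U, table, ["a1", "a2"]))  # {x1, x3} and {x2}
```

Refining $B$ (adding $a_2$) splits the single block, matching the intuition that more attributes give a finer partition.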
In an incomplete information system, the indistinguishable relation is unable to effectively divide the incomplete information. Then, the tolerance relation is given as follows.
Definition 2
(Tolerance Relation [37]). Given an incomplete information system $(U, A)$, $A = C \cup D$, and an attribute subset $B \subseteq C$, the binary relation of incomplete information on $U$ is defined as
$$SIM(B) = \{(x, y) \in U \times U \mid \forall a \in B,\ f(x, a) = f(y, a),\ \text{or}\ f(x, a) = *,\ \text{or}\ f(y, a) = *\},$$
where $*$ denotes missing information. Denote by $U/SIM(B)$, or simply $U/B$, the family of all tolerance classes of $SIM(B)$.
Definition 3
(Maximal Consistent Block [31]). Given an incomplete information system $(U, A)$, $A = C \cup D$, and an attribute subset $B \subseteq C$, a set $Y$ is said to be a maximal consistent block of the attribute set $B$ if $Y$ satisfies:
(i) $\forall x, y \in Y \subseteq U$, $(x, y) \in SIM(B)$; such a $Y$ is called a consistent block;
(ii) there exists no consistent block $X$ such that $Y \subsetneq X$,
where $MCB(B)$ denotes the set of all maximal consistent blocks of $B \subseteq A$. For $x \in U$, the set of all maximal consistent blocks containing $x$ is denoted by $MCB_x(B) = \{Y \mid Y \in MCB(B),\ x \in Y\}$.
Example 1.
Consider the descriptions of several users of the telecom network in Table 1. It is an incomplete decision information system $(U, A)$, $A = C \cup D$, where $U = \{x_1, x_2, \ldots, x_5\}$ and $C = \{a_1, a_2, a_3\}$ with $a_1$-Duration, $a_2$-Place, $a_3$-Platform, and $*$ denotes missing information.
According to the tolerance relation in Definition 2, $U/A = \{[x_1]_A, \ldots, [x_5]_A\}$, where $[x_1]_A = [x_5]_A = \{x_1, x_5\}$, $[x_2]_A = \{x_2, x_3\}$, $[x_3]_A = \{x_2, x_3, x_4\}$, and $[x_4]_A = \{x_3, x_4\}$.
By the concept of the maximal consistent block in Definition 3, the set of maximal consistent blocks of the attribute set $A$ is $MCB(A) = \{[x_1]_A, [x_2]_A, [x_4]_A\} = \{\{x_1, x_5\}, \{x_2, x_3\}, \{x_3, x_4\}\}$.
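Since a maximal consistent block is exactly a maximal set of pairwise-tolerant objects, $MCB(A)$ for Example 1 can be recovered by brute force from the tolerance pairs; this is an illustrative sketch, not the authors' code:

```python
from itertools import combinations

# Non-reflexive tolerance pairs implied by Example 1:
# SIM(A) relates x1~x5, x2~x3, and x3~x4.
tolerant = {frozenset(p) for p in [("x1", "x5"), ("x2", "x3"), ("x3", "x4")]}
U = ["x1", "x2", "x3", "x4", "x5"]

def is_consistent(block):
    """A consistent block: every pair of its objects is tolerant."""
    return all(frozenset(p) in tolerant for p in combinations(block, 2))

# Brute force over all subsets, then keep only the maximal consistent ones.
blocks = [set(c) for r in range(1, len(U) + 1)
          for c in combinations(U, r) if is_consistent(c)]
mcb = [b for b in blocks if not any(b < other for other in blocks)]
print(mcb)  # the three maximal consistent blocks from Example 1
```

Note that $x_3$ appears in two blocks, so the blocks form a covering rather than a partition of $U$.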
Definition 4
(Information Granularity [37]). Given an incomplete information system $(U, A)$, $A = C \cup D$, and an attribute subset $B \subseteq C$, the information granularity of the attribute set $B$ is defined as
$$G(B) = \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{|[x_i]_B|}{|U|},$$
where $|[x_i]_B|$ and $|U|$ denote the cardinalities of the tolerance class $[x_i]_B$ and of the set $U$, respectively.
Remark 1.
Given an incomplete information system $(U, A)$, $A = C \cup D$, $B \subseteq C$, the conditional granularity, mutual information granularity, and joint granularity of the attribute sets $B$ and $D$ are defined as [28,38]
$$CG(D \mid B) = \sum_{i=1}^{|U|} \frac{|[x_i]_B| - |[x_i]_B \cap [x_i]_D|}{|U|^2}, \quad MG(B; D) = \sum_{i=1}^{|U|} \frac{|[x_i]_B| + |[x_i]_D| - |[x_i]_B \cap [x_i]_D|}{|U|^2}, \quad JG(D \cup B) = \sum_{i=1}^{|U|} \frac{|[x_i]_B \cap [x_i]_D|}{|U|^2},$$
where $[x_i]_B \cap [x_i]_D = [x_i]_{B \cup D}$ denotes the division of knowledge under the attributes $B$ and $D$ jointly.

2.2. Information Theory

Information entropy is a measure of system uncertainty from the perspective of an information view. The magnitude of entropy reflects the degree of chaos or uncertainty of the system through the distribution of data information.
Definition 5
(Information Entropy [37]). Given an incomplete information system $(U, A)$, $A = C \cup D$, and an attribute subset $B \subseteq C$, the information entropy $H(B)$ is defined as
$$H(B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{|[x_i]_B|}{|U|},$$
where $|U|$ denotes the number of elements of the object set $U$.
Remark 2.
By [37], this information entropy is also called the granulation measure. For a complete information system $(U, A)$, the equivalent definition of Equation (4) is
$$H(B) = -\sum_{i=1}^{|U/B|} \frac{|X_i|}{|U|} \log_2 \frac{|X_i|}{|U|},$$
where $U/B = \{[x]_B \mid x \in U\} = \{X_1, X_2, \ldots, X_m\}$, $|U/B| = m$, and $X_i \cap X_j = \emptyset$ for $i \neq j$. The form of Equation (5) is consistent with the basic definition of information entropy $H(X) = -\sum_i p_i \log_2 p_i$, where $p_i = \frac{|X_i|}{|U|}$ and $\sum_{i=1}^{m} p_i = 1$. Therefore, we can understand the change of information entropy from the relationship between sets via a Venn diagram. In the information theory of rough sets, the finer the partition, the larger the entropy.
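The equivalence of the per-object form (Equation (4)) and the per-block form (Equation (5)) can be checked numerically; the sketch below uses a hypothetical partition of five objects:

```python
import math

def H_partition(blocks, n):
    """H(B) = -sum (|X_i|/|U|) log2(|X_i|/|U|) over the partition U/B."""
    return -sum(len(X) / n * math.log2(len(X) / n) for X in blocks)

def H_objects(blocks, n):
    """H(B) = -(1/|U|) sum_i log2(|[x_i]_B|/|U|), one term per object.
    Each object in block X contributes the same term log2(|X|/|U|)."""
    return -sum(len(X) * math.log2(len(X) / n) for X in blocks) / n

blocks = [{"x1", "x3"}, {"x2"}, {"x4", "x5"}]  # a hypothetical partition of 5 objects
n = 5
assert abs(H_partition(blocks, n) - H_objects(blocks, n)) < 1e-12
print(H_partition(blocks, n))
```

The two forms agree because summing $\log_2(|X_i|/|U|)$ once per object is the same as weighting each block's term by $|X_i|$.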
Remark 3.
In a complete information system $(U, A)$, the conditional entropy $H(D \mid B)$, mutual information $H(D; B)$, and joint entropy $H(D \cup B)$ of the attribute sets $B$ and $D$, with $U/B = \{X_1, \ldots, X_m\}$ and $U/D = \{Y_1, \ldots, Y_n\}$, are defined as [39]
$$H(D \mid B) = -\sum_{i=1}^{m} \sum_{j=1}^{n} \frac{|X_i \cap Y_j|}{|U|} \log_2 \frac{|X_i \cap Y_j|}{|X_i|}, \quad H(D; B) = -\sum_{i=1}^{m} \sum_{j=1}^{n} \frac{|X_i \cap Y_j|}{|U|} \log_2 \frac{|X_i| \cdot |Y_j|}{|X_i \cap Y_j| \cdot |U|}, \quad H(D \cup B) = -\sum_{i=1}^{m} \sum_{j=1}^{n} \frac{|X_i \cap Y_j|}{|U|} \log_2 \frac{|X_i \cap Y_j|}{|U|}.$$
Theorem 1
(Entropy Measure). Given an incomplete information system $(U, A)$, $A = C \cup D$, $B \subseteq C$, the conditional entropy, mutual information, and joint entropy of the attribute sets $B$ and $D$ are defined as
$$\text{(i)}\ CH(D \mid B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{|[x_i]_B \cap [x_i]_D|}{|[x_i]_B|}, \qquad \text{(ii)}\ MH(D; B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{|[x_i]_B| \cdot |[x_i]_D|}{|[x_i]_B \cap [x_i]_D| \cdot |U|}, \qquad \text{(iii)}\ JH(D \cup B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{|[x_i]_B \cap [x_i]_D|}{|U|},$$
where $MH(D; B) = H(D) - CH(D \mid B)$.
Proof of Theorem 1 
The specific proof of Theorem 1 can be found in Appendix A.    □
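The identity $MH(D; B) = H(D) - CH(D \mid B)$ can be verified numerically; the sketch below uses hypothetical class families and the per-object forms of Theorem 1 (illustrative only, not the authors' code):

```python
import math

def tol_class(x, classes):
    """[x]_B: the class containing x in the given family of classes."""
    return next(c for c in classes if x in c)

def entropy_measures(U, cB, cD):
    """Per-object forms of CH(D|B), MH(D;B), JH(D∪B) from Theorem 1."""
    n = len(U)
    CH = MH = JH = 0.0
    for x in U:
        b, d = tol_class(x, cB), tol_class(x, cD)
        inter = len(b & d)
        CH -= math.log2(inter / len(b)) / n
        MH -= math.log2(len(b) * len(d) / (inter * n)) / n
        JH -= math.log2(inter / n) / n
    return CH, MH, JH

U = ["x1", "x2", "x3", "x4"]
cB = [{"x1", "x2"}, {"x3", "x4"}]       # hypothetical U/B
cD = [{"x1"}, {"x2", "x3"}, {"x4"}]     # hypothetical U/D
CH, MH, JH = entropy_measures(U, cB, cD)
HD = -sum(math.log2(len(tol_class(x, cD)) / len(U)) for x in U) / len(U)
assert abs(MH - (HD - CH)) < 1e-12      # MH(D;B) = H(D) - CH(D|B)
```

Expanding the logarithm in $MH$ term by term gives exactly the difference of the $H(D)$ and $CH(D \mid B)$ terms, which is why the assertion holds for any input with nonempty intersections.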

3. Max-Correlation and Max-Independence Rate Rough Gain Anomaly Detection Algorithm (MCIR-RGAD)

In the information age of Industry 4.0, the amount of data, containing large numbers of attributes, has proliferated. However, not all attributes are relevant to the classification task. In cyberspace, data may be relevant, repetitive, or similar, bringing no new or valuable information to the anomaly detection task and leading to unnecessary time costs. Moreover, attributes that are not relevant to the anomaly detection task may be noisy; they not only fail to help model learning but may even hurt detection performance. In addition, data may inevitably be lost during collection, processing, and storage. The lost data may contain hidden anomaly information, and simple subjective assignment or deletion may render that information unusable. As a result, the attributes in the data are usually not fully functional.
The entropy function of information theory, as a quantitative paradigm for measuring uncertainty, can effectively measure the correlation between attributes in data. In addition, the missing incomplete data has a certain degree of uncertainty, and this uncertainty may also contain valuable information. Therefore, this paper uses the mutual information function in information theory to measure important attributes that are highly relevant and less redundant to the classification task. Further, the maximal consistent blocks of rough set theory are used to process the missing data, and the information useful for the anomaly detection task is mined from the perspective of uncertainty of incomplete data to realize the improvement of the anomaly detection performance.
The main idea of this section is divided into three parts. Section 3.1 theoretically discusses the relationship between attribute correlation and redundancy in the incomplete information system. Then, Section 3.2 presents an optimized attribute reduction algorithm (MCIR) using correlation and independence information. Further, we design a rough gain anomaly detection algorithm (RGAD) based on the maximal consistent block to handle the incompleteness of authentic telecom fraud data in Section 3.3. Figure 1 shows the framework of the proposed methodology.

3.1. Relationship of Correlation and Redundancy

To date, many criteria have been proposed to consider the correlation or redundancy of new classification information, such as JMI, CMIM, CIFE, ICAP, ICI, and MRI, which are summarized in [19]. The criteria are shown as follows.
$J_{JMI}(a_j) = \sum_{a_i \in B} MH(a_i, a_j; D) \triangleq \sum_i (①_i + ②_i + ③_i)$,
$J_{CIFE}(a_j) = MH(D; a_j) - \sum_{a_i \in B} MH(D; a_j; a_i) \triangleq ②_i + ③_i - \sum_i ②_i$,
$J_{ICAP}(a_j) = MH(D; a_j) - \sum_{a_i \in B} \max\{0, MH(D; a_j; a_i)\} \triangleq ②_i + ③_i - \sum_{a_i \in B} \max\{0, ②_i\}$,
$J_{ICI}(a_j) = MH(D; a_i \mid a_j) + MH(D; a_j \mid a_i) \triangleq ①_i + ③_i$,
$J_{MRI}(a_j) = MH(D; a_j) + \sum_{a_i \in B} J_{ICI}(D; a_i, a_j) \triangleq ②_i + ③_i + \sum_i (①_i + ③_i)$,
where $MH(D; a_i \mid a_j) = MH(D; a_i, a_j) - MH(D; a_j)$, $①_i = MH(D; a_i \mid a_j)$, $②_i = MH(D; a_i; a_j)$, and $③_i = MH(D; a_j \mid a_i)$.
In Figure 2a, $①_i$ denotes the relevant information of the selected attribute $a_i \in B$, $②_i$ denotes the redundant information among the attributes $a_i$, $a_j$, and $D$, and $③_i$ denotes the relevant information of the candidate attribute $a_j$.
In the literature, the correlation and redundancy of the criterion function are frequently compared between each candidate attribute $a_j$ and each attribute $a_i$ of the selected attribute set $B$. However, this pairwise comparison involves much redundant computation. Therefore, this paper regards the selected attribute reduction set $B$ as a whole and studies the correlation and redundancy between the candidate attribute $a_j$ and the attribute reduction set $B$, as shown in Figure 2b.
For convenience of formulation, we denote ① $= MH(D; B \mid a_j)$, ② $= MH(D; B; a_j)$, and ③ $= MH(D; a_j \mid B)$. In Figure 2b, ① denotes the relevant information of the selected attribute set $B$, ② denotes the redundant information among $B$, $a_j$, and $D$, and ③ denotes the relevant and independent information of the candidate attribute $a_j$.
Theorem 2
(③ ≜ ① + ② + ③). Given an incomplete information system $(U, A)$, $A = C \cup D$, suppose the attribute set $B \subseteq C$ has been selected and $a_j$ is a candidate attribute. Then the correlation and redundancy relationship between $a_j$, $B$, and $D$ satisfies
$$MH(D; a_j \mid B) \triangleq MH(D; B \cup a_j).$$
Proof of Theorem 2 
According to the definitions of ①, ②, and ③, we deduce
$$MH(D; B \cup a_j) - MH(D; a_j \mid B) = (① + ② + ③) - ③ = MH(D; B).$$
In the decision information system, the attribute set $B$ has been selected and $D$ is fixed, so the division of knowledge is definite and $MH(D; B)$ is a constant. Hence there is a nonnegative constant $\Delta_3 = MH(D; B)$ such that
$$MH(D; B \cup a_j) - MH(D; a_j \mid B) = \Delta_3.$$
Hence, Equation (7) holds, i.e., ③ ≜ ① + ② + ③.    □
Theorem 2 manifests that the correlation of the newly selected attribute $a_j$ is consistent with that of the attribute reduction set $B \cup a_j$, so the effect is the same in classification detection.
Theorem 3
(① + ③ ≜ ③ − ②). Given an incomplete information system $(U, A)$, $A = C \cup D$, suppose the attribute set $B \subseteq C$ has been selected and $a_j$ is a candidate attribute. Then the correlation and redundancy relationship between $a_j$, $B$, and $D$ satisfies
$$MH(D; B \mid a_j) + MH(D; a_j \mid B) \triangleq MH(D; a_j \mid B) - MH(D; B; a_j).$$
Proof of Theorem 3 
According to the definitions of ①, ②, and ③, we have
$$\left| \big(MH(D; B \mid a_j) + MH(D; a_j \mid B)\big) - \big(MH(D; a_j \mid B) - MH(D; B; a_j)\big) \right| = |(① + ③) - (③ - ②)|.$$
Based on Equation (8) in Theorem 2, ③ $+ \Delta_3 =$ ① + ② + ③ holds; hence
$$|(① + ③) - (③ - ②)| = |(① + ③) - [(① + ② + ③ - \Delta_3) - ②]| = |(① + ③) - (① + ③ - \Delta_3)| = \Delta_3.$$
Hence, Theorem 3 is proved, i.e., ① + ③ ≜ ③ − ②.    □
Theorem 3 shows that considering only the correlation between the new attribute $a_j$ and the selected attribute set $B$ is equivalent to considering both the correlation and the redundancy of the new attribute $a_j$.
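Both theorems can be sanity-checked numerically on a complete table, where the circled quantities reduce to differences of mutual information computed from partitions; the table and attribute names below are hypothetical, not the paper's data:

```python
import math
from collections import defaultdict

# Hypothetical complete decision table, just to check the identities.
table = {
    "x1": {"a1": 0, "a2": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "d": 1},
    "x3": {"a1": 1, "a2": 0, "d": 1},
    "x4": {"a1": 1, "a2": 1, "d": 0},
    "x5": {"a1": 1, "a2": 1, "d": 1},
}
U = list(table)

def parts(attrs):
    g = defaultdict(list)
    for x in U:
        g[tuple(table[x][a] for a in attrs)].append(x)
    return [set(v) for v in g.values()]

def H(attrs):
    n = len(U)
    return -sum(len(X) / n * math.log2(len(X) / n) for X in parts(attrs))

def MI(attrs):  # MH(D; attrs) = H(attrs) + H(D) - H(attrs ∪ D)
    return H(attrs) + H(["d"]) - H(attrs + ["d"])

B, aj = ["a1"], "a2"
c1 = MI(B + [aj]) - MI([aj])          # ① = MH(D; B | a_j)
c2 = MI(B) + MI([aj]) - MI(B + [aj])  # ② = MH(D; B; a_j)
c3 = MI(B + [aj]) - MI(B)             # ③ = MH(D; a_j | B)
delta = MI(B)                         # Δ3 = MH(D; B)
assert abs((c1 + c2 + c3) - c3 - delta) < 1e-12    # Theorem 2
assert abs((c1 + c3) - (c3 - c2) - delta) < 1e-12  # Theorem 3
```

Here $① + ② + ③$ telescopes to $MH(D; B \cup a_j)$, so both differences collapse to the constant $\Delta_3 = MH(D; B)$, exactly as the proofs state.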

3.2. Max-Correlation and Max-Independence Rate Algorithm (MCIR)

In light of the above analysis and inspired by the literature [19], the max-correlation and max-independence rate algorithm (MCIR) is introduced as follows.
Definition 6
(MCIR). Given an incomplete information system $(U, A)$, $A = C \cup D$, $B \subseteq C$, suppose $a_i \in B$ and $a_j \in C - B$. Then the max-correlation and max-independence rate function is presented as
$$a_j^{*} = \arg\max_{a_j \in C - B} J_{MCIR}(a_j),$$
where $J_{MCIR}(a_j) = \frac{MH(D; a_j \mid B)}{H(a_j)} - \min_{a_j \in C - B} MH(D; B; a_j)$, i.e., $J_{MCIR}(a_j) =$ ③$/H(a_j) - \min$ ②.
The principle of the MCIR algorithm is to maximize the correlation and independence of new classification information while minimizing the redundancy with old attributes. The definition of information entropy in rough set theory is based on the division of object attribute information: the finer the division, the greater the entropy value. Therefore, when the system increases the correlation, it tends to select attributes carrying more new information.
The attribute reduction algorithm based on max-correlation and max-independence rate is shown in Algorithm 1.
Algorithm 1: Max-Correlation and Max-Independence Rate (MCIR)
Input: Information system $(U, C \cup D)$.
Output: An attribute reduction set $B$.
1: compute $MH(C; D)$
2: $Feat \leftarrow C$
3: $B \leftarrow \arg\max_{a_j \in Feat} [MH(D; B \cup a_j) - MH(D; B)]$
4: $Feat \leftarrow C - B$
5: while $J_{MH}(a_j) \geq \theta$ do
6:   for $a_j \in Feat$ do
7:     if $|Feat| = 0$ then
8:       $B \leftarrow B$
9:     else [$|Feat| \neq 0$]
10:      $a_j \leftarrow \arg\max_{a_j \in Feat} J_{MCIR}(a_j)$
11:      $B \leftarrow B + \{a_j\}$
12:      $Feat \leftarrow Feat - \{a_j\}$
13:      $J_{MH}(a_j) \leftarrow CH(D \mid C) - CH(D \mid B)$
14:    end if
15:  end for
16: end while
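A minimal Python sketch of the greedy selection in Algorithm 1, restricted to complete tables; all names are our own, and as a simplification it subtracts each candidate's own redundancy term rather than the round minimum, so it is not the authors' implementation:

```python
import math
from collections import defaultdict

def mcir_reduce(U, table, cond, dec, theta=1e-3):
    """Greedy MCIR-style reduction sketch: seed with the most relevant
    attribute, then repeatedly add the candidate maximizing a
    relevance/H(a) - redundancy score until the conditional-entropy
    gap to the full attribute set is within theta."""
    n = len(U)

    def parts(attrs):
        g = defaultdict(list)
        for x in U:
            g[tuple(table[x][a] for a in attrs)].append(x)
        return list(g.values())

    def H(attrs):
        return -sum(len(X) / n * math.log2(len(X) / n) for X in parts(attrs))

    def MI(attrs):  # MH(D; attrs)
        return H(attrs) + H([dec]) - H(attrs + [dec])

    def CH(attrs):  # CH(D | attrs)
        return H(attrs + [dec]) - H(attrs)

    feat = list(cond)
    B = [max(feat, key=lambda a: MI([a]))]  # most relevant single attribute
    feat.remove(B[0])
    while feat and CH(B) - CH(cond) > theta:
        def J(a):
            rel = MI(B + [a]) - MI(B)             # ③: new relevant information
            red = MI(B) + MI([a]) - MI(B + [a])   # ②: redundancy with B
            h = H([a]) or 1.0                     # guard constant attributes
            return rel / h - red
        best = max(feat, key=J)
        B.append(best)
        feat.remove(best)
    return B

U = ["x1", "x2", "x3", "x4"]
table = {
    "x1": {"a1": 0, "a2": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "d": 0},
    "x3": {"a1": 1, "a2": 0, "d": 1},
    "x4": {"a1": 1, "a2": 1, "d": 1},
}
print(mcir_reduce(U, table, ["a1", "a2"], "d"))  # ['a1']: a1 alone determines d
```

On this toy table, $a_1$ determines $d$ exactly, so the reduct stops after the seed attribute: the conditional-entropy gap to the full set is already zero.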
With data obtained in different scenarios, the relative importance of correlation and redundancy between attributes varies. In other words, in an incomplete information system, when the effect of correlation is far bigger than that of redundancy, it is more effective to add new information related to the decision attribute. When similar, redundant, and repetitive information introduces noise that affects detection and classification, it is necessary to increase the correlation and reduce the redundancy.
From the relationship of relevance and independence in the MCIR algorithm of Definition 6, Figure 2c satisfies $③_j < ③_k = ③_l$ and $②_l = ②_j$. It shows that the order of attribute importance is $a_k \succ a_l \succ a_j$, i.e., attribute $a_k$ is better than attribute $a_l$, and attribute $a_l$ is better than attribute $a_j$, which the MCIR algorithm sorts correctly.

3.3. Rough Gain Anomaly Detection Algorithm with Max-Correlation and Max-Independence Rate (MCIR-RGAD)

An anomaly detection algorithm (MCIR-RGAD) is designed based on the maximal consistent block, which horizontally complements the reduced data; anomaly detection is then carried out on the complemented data. Inspired by the design of information gain in decision trees, the main idea of the MCIR-RGAD algorithm is to construct a correlation function to measure the classification ability of attributes.
The decision tree, one of the basic classification methods of machine learning, achieves classification tasks through the characteristics of data information. It has fast classification speed, strong interpretability, and readability. Generally, decision tree learning consists of feature selection, tree generation, and tree pruning. To improve learning efficiency, kernel functions such as the information gain, information gain rate, or Gini coefficient are used to select important features, and the tree is then constructed recursively based on the kernel function. To avoid classification overfitting, the tree is pruned to balance model complexity while preserving the fitting accuracy on the training data.
Both attribute reduction and decision trees work by finding significant features that can classify the decision features of an information system. Attribute reduction algorithms can effectively find relevant classification features and achieve effective feature selection. In addition, since the equivalence classes of the object set divided by maximal consistent blocks in an incomplete information system may intersect, completeness is not satisfied, i.e., $\sum_i p_i \geq 1$, and the information gain can take negative values during decision learning. Therefore, this paper designs an improved algorithm (MCIR-RGAD) to solve the anomaly detection problem in incomplete systems. Moreover, since similar, redundant, repeated, or invalid features are already filtered out by the reduction, this paper does not consider decision tree pruning.
Frequently, missing data are handled simply by deleting the affected row or filling in zero, one, or the previous data value. However, explicit deletion or subjective filling destroys the original data information, so that the missing information cannot be effectively utilized. In an incomplete information system, knowledge can instead be divided according to the compatibility between available and missing information. This division method not only preserves the existing data information but is also more objective. The definition of the kernel function, the rough gain $RG$, is given below.
Definition 7
([40] Rough Entropy). Given an incomplete information system $(U, A)$, $A = C \cup D$, $B \subseteq C$, the rough entropy $E_r(B)$ is defined as
$$E_r(B) = -\sum_{i=1}^{|U|} \frac{1}{|U|} \log_2 \frac{1}{|[x_i]_B|},$$
where the rough entropy satisfies $E_r(B) + H(B) = \log_2 |U|$.
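The identity $E_r(B) + H(B) = \log_2 |U|$ is easy to confirm numerically; the class family below is hypothetical and the code is an illustrative sketch:

```python
import math

def rough_entropy(classes_by_object, n):
    """E_r(B) = -sum_i (1/|U|) log2(1/|[x_i]_B|)  (Definition 7)."""
    return -sum(math.log2(1 / len(c)) for c in classes_by_object) / n

def info_entropy(classes_by_object, n):
    """H(B) = -(1/|U|) sum_i log2(|[x_i]_B|/|U|)  (Definition 5)."""
    return -sum(math.log2(len(c) / n) for c in classes_by_object) / n

# [x_i]_B for each of 4 objects under some attribute subset B (hypothetical).
classes = [{"x1", "x2"}, {"x1", "x2"}, {"x3"}, {"x4"}]
n = 4
Er, H = rough_entropy(classes, n), info_entropy(classes, n)
assert abs(Er + H - math.log2(n)) < 1e-12  # E_r(B) + H(B) = log2|U|
```

The identity follows because each per-object term satisfies $\log_2 |[x_i]_B| - \log_2 \frac{|[x_i]_B|}{|U|} = \log_2 |U|$.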
Inspired by the literature [40], this paper presents a generalized form of the definition of rough entropy for decision making in information division as shown in Definition 8.
Definition 8
(Decision Rough Entropy). Given an incomplete information system $(U, A)$, $A = C \cup D$, $B \subseteq C$, let the maximal consistent blocks of the attribute sets $B$ and $D$ be $MCB(B) = \{B_1, \ldots, B_k\}$ and $MCB(D) = \{D_1, \ldots, D_m\}$. Then the decision rough entropy $E_r(D \mid B)$ is defined as
$$E_r(D \mid B) = -\sum_{j=1}^{m} \sum_{i=1}^{k} \frac{|B_i|}{|U|} \log_2 \frac{|B_i \cap D_j|}{|B_i|}.$$
Definition 9
(Rough Gain). Given an incomplete information system $(U, A)$, $A = C \cup D$, $B \subseteq C$, let the maximal consistent blocks of the attribute sets $B$ and $D$ be $MCB(B) = \{B_1, \ldots, B_k\}$ and $MCB(D) = \{D_1, \ldots, D_m\}$. Then the rough gain is defined as
$$RG(D, B) = g \cdot R_r(D, B) + (1 - g) \cdot \frac{1}{G_r(D, B)},$$
where $g \in [0, 1]$ is a constant, $R_r(D, B) = \frac{E_r(D \mid B)}{E_r(B)}$ is the rough entropy rate, $E_r(D \mid B)$ is the decision rough entropy, $E_r(B)$ is the rough entropy of the attribute set $B$, $G_r(D, B) = \frac{G(D, B)}{H(B)}$ is the information gain rate, $G(D, B) = H(D) - H(D \mid B)$ is the information gain, $H(D) = -\sum_{i=1}^{m} \frac{|D_i|}{|U|} \log_2 \frac{|D_i|}{|U|}$, and $H(D \mid B) = -\sum_{i=1}^{k} \sum_{j=1}^{m} \frac{|B_i \cap D_j|}{|U|} \log_2 \frac{|B_i \cap D_j|}{|B_i|}$.
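A hedged sketch of computing $RG(D, B)$ from given block families; the example blocks (which happen to form partitions, so all logarithms stay finite) and the sign convention read from Definition 8 are our assumptions, not the paper's code:

```python
import math

def rough_gain(U, mcb_B, mcb_D, g=0.5):
    """RG(D,B) per Definition 9, on block families for B and D."""
    n = len(U)
    block_of = lambda x, blocks: next(b for b in blocks if x in b)
    # E_r(B): rough entropy of B (Definition 7), one term per object.
    Er_B = -sum(math.log2(1 / len(block_of(x, mcb_B))) for x in U) / n
    # E_r(D|B): decision rough entropy (Definition 8).
    Er_DB = -sum(len(Bi) / n * math.log2(len(Bi & Dj) / len(Bi))
                 for Bi in mcb_B for Dj in mcb_D if Bi & Dj)
    # Information gain rate G_r(D,B) = (H(D) - H(D|B)) / H(B).
    H_D = -sum(len(Dj) / n * math.log2(len(Dj) / n) for Dj in mcb_D)
    H_DB = -sum(len(Bi & Dj) / n * math.log2(len(Bi & Dj) / len(Bi))
                for Bi in mcb_B for Dj in mcb_D if Bi & Dj)
    H_B = -sum(len(Bi) / n * math.log2(len(Bi) / n) for Bi in mcb_B)
    Gr = (H_D - H_DB) / H_B
    Rr = Er_DB / Er_B
    return g * Rr + (1 - g) / Gr

U = ["x1", "x2", "x3", "x4"]
mcb_B = [{"x1", "x2"}, {"x3", "x4"}]    # hypothetical blocks under B
mcb_D = [{"x1"}, {"x2", "x3"}, {"x4"}]  # hypothetical blocks under D
print(rough_gain(U, mcb_B, mcb_D, g=0.5))  # 2.0 for this example
```

As in Definition 9, a smaller $RG$ indicates that $B$ separates the decision blocks better, which is why Algorithm 2 below selects the attribute with the minimum rough gain.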
Therefore, this paper selects features based on the MCIR algorithm, then combines the advantage of the information gain with rough entropy to deal with missing data information. We design an anomaly detection algorithm, MCIR-RGAD algorithm, to achieve the task of anomaly detection. The specific algorithm is shown in Algorithm 2.
Essentially, the MCIR-RGAD algorithm replaces the information gain function of the decision tree with the rough gain function in Definition 9. Contrary to the information gain, a smaller rough gain indicates a better attribute; the other parts are consistent with the decision tree. Therefore, consistent with the decision tree model, the time complexity of this model is $O(n^2)$.
Algorithm 2: MCIR-RGAD algorithm
Input: Information system $(U, C \cup D)$, an attribute reduction set $B$, threshold $\epsilon > 0$.
Output: A decision tree $T$.
1: compute $B$  ▹ Using Algorithm 1
2: compute $MCB(B) = \{B_1, \ldots, B_k\}$, $MCB(D) = \{D_1, \ldots, D_m\}$  ▹ Using Definition 3
3: if $(U, C \cup D)$ is incomplete then
4:   $g \leftarrow 1$, $RG(D, B) \leftarrow R_r(D, B)$ (Equation (14))
5: else
6:   $0 < g < 1$, $RG(D, B) \leftarrow g \cdot R_r(D, B) + (1 - g) \cdot \frac{1}{G_r(D, B)}$ (Equation (14))
7: end if  ▹ Recursion point
8: for $a_j \in B$ do
9:   $a_j \leftarrow \arg\min_{a_j \in B} RG(D, a_j)$
10:  $B \leftarrow B - a_j$
11:  if $RG(D, a_j) > \epsilon$ then
12:    Label $T \leftarrow \arg\max_{D_i} f(U, D_i)$
13:  else
14:    $MCB(a_j) \leftarrow \{X_1, \ldots, X_s\}$
15:    for $X_i \in MCB(a_j)$ do
16:      $(U, C \cup D) \leftarrow (X_i, B \cup D)$
17:      if $|MCB(D)| = 1$ then
18:        Label $T \leftarrow f(U, D)$  ▹ $B = \emptyset$
19:        Label $T \leftarrow \arg\max_{D_i} f(U, D_i)$
20:      else
21:        return to Step 8
22:      end if
23:    end for
24:  end if
25: end for
26: return $T$

4. Experimental Analysis

The UCI Machine Learning Repository datasets (https://archive.ics.uci.edu/ml/index.php, accessed on 12 April 2022) and the Sichuan telecom fraud phone datasets (https://aistudio.baidu.com/aistudio/datasetdetail/40690, accessed on 12 April 2022) are used to verify the effectiveness of the proposed method in this section. The MCIR-RGAD algorithms were coded in Python using Visual Studio Code and run on a remote server with an NVIDIA GeForce RTX 3090 GPU and 48 GB of RAM.
The Sichuan telecom fraud phone dataset consists of four datasets, namely call data (VOC), short message service data (SMS), user information data (USER), and Internet behavior data (APP). The Union dataset is an integrated dataset combined on user phone numbers and contains the attributes of all four: the VOC, APP, SMS, and USER datasets. The details of the datasets are described in Table 2.
The goal of this paper is to efficiently detect fraudulent users among regular users according to the important attributes selected by correlation and independence, from the perspective of data uncertainty and incompleteness. Next, we discuss the effectiveness of the proposed method from three aspects: incompleteness of data (Definition 3), the MCIR attribute reduction algorithm (Section 3.2), and the MCIR-RGAD anomaly detection classifier (Section 3.3).

4.1. Incompleteness of Data

Loss of data during recording, storage, or transmission is a very common problem. Typically, incomplete information is handled by deleting it directly or by filling it with zero, one, or the mean value; however, such simple treatments cause a loss of information. As can be seen from Figure 3, in the Sichuan telecom fraud dataset, most of the users with null values (red parts) are abnormal, and if these records are directly deleted or simply imputed, the abnormal information cannot be effectively used. Therefore, from the perspective of improving data quality, this paper uses the idea of maximal consistent blocks in rough set theory to handle incomplete data and achieve effective information mining.
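To make the maximal-consistent-block idea concrete, the sketch below computes the blocks of the tolerance relation for the small incomplete system of Table 1 (condition attributes only, with '*' marking a missing value). The brute-force enumeration is illustrative only and assumes the usual tolerance-relation definition; the paper's own construction may differ.

```python
from itertools import combinations

# Condition attributes of Table 1 ('*' marks missing values); the decision
# attribute 'Fraud' is excluded.
objects = {
    "x1": ("[60,+)", "1", "Foreign", "WeChat"),
    "x2": ("[0,60]", "*", "Foreign", "Telecom"),
    "x3": ("*",      "3", "*",        "Telecom"),
    "x4": ("*",      "3", "Domestic", "Telecom"),
    "x5": ("[60,+)", "1", "Foreign", "*"),
}

def tolerant(u, v):
    # Tolerance relation: values agree wherever both are known.
    return all(a == b or a == "*" or b == "*"
               for a, b in zip(objects[u], objects[v]))

def maximal_consistent_blocks():
    names = list(objects)
    blocks = []
    # Enumerate candidate blocks from largest to smallest; a block must be
    # pairwise tolerant, and only maximal ones (not contained in a larger
    # already-found block) are kept.
    for size in range(len(names), 0, -1):
        for cand in combinations(names, size):
            if all(tolerant(u, v) for u, v in combinations(cand, 2)):
                if not any(set(cand) <= b for b in blocks):
                    blocks.append(set(cand))
    return blocks

print(sorted(map(sorted, maximal_consistent_blocks())))
# [['x1', 'x5'], ['x2', 'x3'], ['x3', 'x4']]
```

Note that x3, whose Duration and Place are missing, contributes to two blocks rather than being discarded, which is exactly how the MCB treatment avoids the information loss of deletion or simple imputation.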
Then, Table 3 and Figure 4 further illustrate the effectiveness of the maximal consistent block in handling incomplete data. They compare the performance of tackling null values on the authentic incomplete telecom fraud data and on artificially constructed incomplete data obtained by random deletion (missing ratios of 5%, 10%, …, 50%, i.e., 10 settings). In terms of accuracy (Figure 4a), recall (Figure 4b), F1 (Figure 4c), and the number of correct predictions, the maximal consistent block (MCB) effectively utilizes incomplete information and avoids unnecessary information loss.

4.2. Attribute Reduction under MCIR Algorithm

This paper proposes the MCIR attribute reduction algorithm, which uses the entropy function to measure the correlation and independence of attributes from the perspective of rough set theory. It reduces the computation time while preserving accuracy on the telecom fraud detection problem. The main idea is to retain only the attributes that are most relevant to fraudulent users and most independent of each other (least redundant).
Experiments on UCI and telecom fraud data show that the computation time of the data can be significantly reduced by filtering out important attributes. Figure 5 and Figure 6 further illustrate that the MCIR algorithm not only effectively reduces the computation time, but also eliminates the adverse effects of noise on information, improves data quality, and maintains or even improves the accuracy of model detection.
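The exact correlation and independence measures of MCIR are defined in Section 3.2 and are not reproduced here. As a rough illustration of the selection principle (maximal relevance to the decision, maximal independence from already-selected attributes), the following sketch uses plain mutual information as a stand-in for the paper's entropy-based measures; the data, names, and scoring details are illustrative, not the authors' implementation.

```python
from collections import Counter
from math import log2

def entropy(values):
    # Shannon entropy of a discrete sample.
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mutual_info(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def greedy_select(attrs, decision, k):
    """Greedy relevance-minus-redundancy selection (illustrative stand-in for MCIR)."""
    selected = []
    while len(selected) < k:
        def score(name):
            relevance = mutual_info(attrs[name], decision)      # max-correlation term
            redundancy = (sum(mutual_info(attrs[name], attrs[s]) for s in selected)
                          / len(selected)) if selected else 0.0  # max-independence term
            return relevance - redundancy
        best = max((a for a in attrs if a not in selected), key=score)
        selected.append(best)
    return selected

# Toy data: "b" duplicates "a" (redundant); "c" is independent of both.
attrs = {
    "a": [0, 0, 0, 1, 1, 1, 1, 0],
    "b": [0, 0, 0, 1, 1, 1, 1, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
decision = [0, 0, 0, 0, 1, 1, 1, 1]
print(greedy_select(attrs, decision, 2))  # ['a', 'c'] -- the redundant copy "b" is skipped
```

The redundancy penalty is what distinguishes this family of criteria from pure relevance ranking: without it, the exact copy "b" would be selected second despite adding no new information.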
Generally, datasets can be roughly divided into four types: non-redundant and noise-free (Figure 7a Car, approximated as a strictly monotonically increasing function), non-redundant and noisy (Figure 7b Adult, approximately a concave function), redundant and noisy (Figure 7c Bank, approximately a non-increasing function), and redundant and noise-free (Figure 7d Mushroom, approximately a non-decreasing function). Redundancy refers to the approximation, repetition, and mutual correlation of attributes in the data; noise refers to the interference and misleading effects of certain attributes on the classification task. Specifically, for a non-redundant and noise-free dataset, there is no need to perform attribute reduction, as every feature dimension carries important information. For the other types of data, it is necessary to remove redundant and noisy attributes. In addition, it can be seen from Figure 7 that, compared with the other attribute reduction algorithms ( J C G [41], J C H [42], J M G , J M H [43], J J G , J J H , ③, ① + ② + ③ [44], ① + ③ [19], ③–② [45]), the MCIR algorithm (red dotted line) designed in this paper achieves better accuracy with fewer attributes. Since the MCIR model removes as many redundant or noisy attributes as possible and optimizes the data through dimensionality reduction, the reduced data are better suited to anomaly detection tasks, and the model can maintain or even improve detection accuracy while reducing the time complexity.
Therefore, in the process of data processing, the MCIR algorithm can use a subset of important attributes to shorten the computation time and effectively improve the detection accuracy of the model (Figure 7, the black dotted line).
Next, the feature selection of the telecom fraud dataset under the MCIR algorithm is discussed. Figure 8 shows the correlations among attributes via a heatmap; Figure 8a,b show the correlations before and after attribute reduction, respectively. As shown in Figure 8, when attribute reduction is not performed, the data contain a large amount of redundant information (dark patches). This paper constructs the MCIR attribute reduction algorithm from the perspective of attribute uncertainty and correlation, which reduces the information redundancy of the data while reducing the data volume.
Further, the boxplots and probability distribution plots in Figure 9 show the difference in statistical distribution between normal and abnormal users. The important attributes selected by the MCIR-RGAD algorithm effectively highlight the difference between abnormal and normal users, so fraudulent users can be filtered out by the selected important attributes. Compared with the original Union dataset of 84 attributes with 85.81% detection performance (Table 4), the detection performance of the 10 attributes retained after MCIR reduction is improved to 89.96%, indicating that the MCIR model effectively selects the important attributes. To further visualize how the selected attributes distinguish normal from fraudulent users, Figure 9 depicts the boxplots and statistical distributions of the 10 important attributes of the telecom fraud dataset filtered by the MCIR method. As can be seen, the distributions of normal and fraudulent users under the 10 attributes differ considerably: (a, f, e) differ greatly in both mean and variance, (b, d, j, g, h) differ greatly in variance with similar means, and (c, i) differ greatly in mean with similar variances. The larger the difference between the mean and variance distributions of normal and fraudulent users for a selected attribute, the more effective the method is in distinguishing fraudulent users.

4.3. Anomaly Detection under MCIR-RGAD Algorithm

The MCIR algorithm removes redundant and noisy attributes from the original data to improve data quality. Then, to perform effective anomaly detection on incomplete data containing missing values, this paper designs the MCIR-RGAD algorithm based on maximal consistent blocks. It provides an effective solution for the processing and utilization of incomplete data.
In the anomaly detection of the decision tree, six types of kernel functions, namely Information Gain G(D, B), Information Gain Rate Gr(D, B), Gini Coefficient, Rough Entropy R(D, B), Rough Entropy Rate Rr(D, B), and Rough Gain RG(D, B), are compared in this paper. As shown in Figure 10, the rough-gain anomaly detection algorithm (RGAD), which integrates rough entropy and information gain as the kernel function, has better performance.
The performance and computation time of nine types of attribute reduction algorithms are shown in Table 5 and Table 6. Compared with other algorithms, the MCIR-RGAD algorithm proposed in this paper can effectively achieve classification detection in a shorter time.
To effectively measure the trade-off between detection performance and computation time cost of an algorithm, this paper designs a robustness metric in Definition 10. In the robustness metric, since computation time and performance level have different importance in different application scenarios, a linear parameter k is designed to trade off the importance of time and performance. The telecom fraud problem in this paper pays more attention to the accuracy of the model; hence, the hyperparameter weight in the robustness metric is set as k = 0.4 .
Definition 10 (Performance Robustness).
Robust = k · T + (1 − k) · P,
where P = P_a / P_b is the degree of performance retention, T = (T_b − T_a) / T_b is the degree of time optimization, P_b, P_a, T_b, and T_a are the performance and time before and after attribute reduction, respectively, and k ∈ [0, 1] is a weight parameter expressing the importance of the time cost.
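Definition 10 is straightforward to evaluate. For instance, plugging the Union row of Table 4 (accuracy 85.81% → 89.96%, time 830.2283 s → 0.3520 s) into the formula with k = 0.4 reproduces the reported robustness of 102.88%:

```python
def robustness(p_before, p_after, t_before, t_after, k=0.4):
    """Definition 10: Robust = k*T + (1-k)*P with P = P_a/P_b and T = (T_b - T_a)/T_b."""
    P = p_after / p_before                 # degree of performance retention
    T = (t_before - t_after) / t_before    # degree of time optimization
    return k * T + (1 - k) * P

# Union row of Table 4: accuracy 85.81% -> 89.96%, time 830.2283 s -> 0.3520 s.
print(round(100 * robustness(85.81, 89.96, 830.2283, 0.3520), 2))  # 102.88
```

A value above 100% is possible because attribute reduction here both shortens the computation time (T close to 1) and improves the accuracy (P above 1).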
Then, Table 4 shows the number of attributes after attribute reduction for each dataset, together with the changes in performance and computation time of the MCIR-RGAD algorithm before and after attribute reduction. Note that this paper compares the performance and computation time of the different algorithms under reduced attribute sets B of the same size.
A classifier that attains high performance in less computation time yields a higher performance robustness value. The accuracies and computation times in Table 4 and Table 5 and Figure 11 show the robustness under the different attribute reduction algorithms. Compared with the other algorithms, the MCIR-RGAD algorithm is strongly robust: when the attribute set is reduced to the same number of attributes, MCIR-RGAD effectively preserves classification accuracy while shortening the computation time.

4.4. Statistical Test Analysis

Two nonparametric statistical tests, the Friedman test and the Nemenyi post hoc test, are introduced to further verify the validity of the proposed method against the comparison methods. We compare the performance differences at a significance level of α = 0.05.

4.4.1. Friedman Test

The Friedman test can effectively determine whether there is a significant difference in algorithm performance. Suppose we compare K algorithms on N datasets. In the Friedman test, the null hypothesis assumes that there is no significant difference between the models. First, the models are ranked on each dataset using the accuracies in Table 5. Then, the average overall rank of each model is obtained as R_ave^j = (1/N) Σ_{i=1}^{N} r_i^j, where r_i^j is the rank of model j on dataset i. The performance ranking of the nine algorithms on the nine datasets is given in Table 7. When several algorithms perform equally, their ordinal values are averaged. For example, since the 7 algorithms (J_CG, J_MG, J_CH, J_MH, ③, ①+③, J_MCIR-RGAD) perform equally on the Car dataset in Table 5, their rank values are r_i = (1 + 2 + 3 + 4 + 5 + 6 + 7)/7 = 4.
When K and N are sufficiently large, the Friedman statistic τ_χ² = [12N / (K(K + 1))] · [Σ_{j=1}^{K} (R_ave^j)² − K(K + 1)²/4] follows a χ²-distribution with K − 1 degrees of freedom. Owing to the overly conservative nature of the original Friedman test, the variable τ_F = (N − 1) τ_χ² / (N(K − 1) − τ_χ²) is commonly used instead; it follows an F-distribution with K − 1 and (K − 1)(N − 1) degrees of freedom, i.e., τ_F ∼ F(K − 1, (K − 1)(N − 1)).
This paper compares nine algorithms on nine datasets. In the Friedman test, if the p-value is less than the significance level, or the τ_F value is greater than the critical value F(8, 64) determined from the F-distribution table, the null hypothesis can be rejected, and at least two algorithms are considered to differ significantly. By consulting the table and calculating, we have τ_F = 5.0208 > F(8, 64) = 2.0868 and p = 2.688 × 10⁻⁵ < 0.05. Therefore, the null hypothesis can be rejected at the 95% confidence level, indicating that there are significant differences among the algorithms. A pairwise comparison of the benchmark algorithms was then performed using the Nemenyi post hoc test.
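The reported statistic can be checked directly from the rank values in Table 7; the following sketch recomputes τ_F in plain Python (no statistics library) and matches the reported 5.0208 up to rounding.

```python
# Rank values r_i from Table 7; one row per dataset, one column per algorithm
# (J_CG, J_MG, J_JG, J_CH, J_MH, J_JH, ③, ①+③, J_MCIR-RGAD).
ranks = [
    [4.0, 4.0, 8.5, 4.0, 4.0, 8.5, 4.0, 4.0, 4.0],  # Car
    [7.5, 7.5, 5.5, 2.5, 2.5, 5.5, 2.5, 9.0, 2.5],  # Adult
    [6.5, 6.5, 8.5, 3.0, 3.0, 8.5, 3.0, 5.0, 1.0],  # Bank
    [4.0, 4.0, 8.5, 4.0, 4.0, 8.5, 4.0, 7.0, 1.0],  # Mushroom
    [4.5, 4.5, 4.5, 4.5, 4.5, 9.0, 4.5, 4.5, 4.5],  # User
    [6.5, 6.5, 8.5, 2.5, 2.5, 8.5, 2.5, 5.0, 2.5],  # SMS
    [2.5, 2.5, 2.5, 6.5, 6.5, 2.5, 6.5, 9.0, 6.5],  # APP
    [5.5, 5.5, 7.5, 3.0, 3.0, 7.5, 3.0, 9.0, 1.0],  # VoC
    [5.5, 5.5, 9.0, 3.0, 3.0, 8.0, 3.0, 7.0, 1.0],  # Union
]
N, K = len(ranks), len(ranks[0])

# Average rank of each algorithm over the N datasets.
R = [sum(row[j] for row in ranks) / N for j in range(K)]

# Friedman statistic and its F-distributed variant.
tau_chi2 = 12 * N / (K * (K + 1)) * (sum(r * r for r in R) - K * (K + 1) ** 2 / 4)
tau_F = (N - 1) * tau_chi2 / (N * (K - 1) - tau_chi2)
print(round(tau_F, 4))  # 5.0208, well above the critical value F(8, 64) = 2.0868
```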

4.4.2. Nemenyi Post Hoc Test

In the Nemenyi post hoc test, the performance of two models is considered significantly different if the difference between their average rank values R_ave^j is greater than or equal to the critical distance CD = q_α · √(K(K + 1)/(6N)), where the critical value q_α obeys the Tukey (Studentized range) distribution. By consulting the table and calculating, q_0.05 = 3.102 at the significance level α = 0.05, so CD = 4.0047. It can be seen from Figure 12 that the MCIR-RGAD model is optimal and significantly different from J_JH and J_JG. In addition, J_CH, J_MH, and ③ are equivalent, and J_MG is equivalent to J_CG. Namely, the model performance can be ordered as J_MCIR-RGAD > J_CH = J_MH = ③ > J_MG = J_CG > ①+③ > J_JG > J_JH.
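The critical distance itself is a one-line computation from the quantities above:

```python
from math import sqrt

K, N = 9, 9        # nine algorithms compared on nine datasets
q_alpha = 3.102    # Studentized-range (Tukey) critical value for K = 9, alpha = 0.05

# Nemenyi critical distance: CD = q_alpha * sqrt(K(K+1) / (6N)).
cd = q_alpha * sqrt(K * (K + 1) / (6 * N))
print(round(cd, 4))  # 4.0047
```

With CD = 4.0047, only pairs of algorithms whose average ranks in Table 7 differ by at least this amount are declared significantly different, which is why MCIR-RGAD (2.667) separates from J_JG (7.000) and J_JH (7.389) but not from the mid-ranked methods.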

5. Conclusions

It is crucial and time-consuming to obtain anomaly classification information in big data with uncertainty, redundancy, and incompleteness. In this paper, a new attribute reduction algorithm (MCIR) is proposed based on the correlation and independence of the data. Furthermore, considering the consistency of attribute reduction and decision tree in selecting features, this paper combines their advantages and constructs an anomaly detection algorithm called RGAD to tackle incomplete data based on the maximal consistent blocks. The proposed algorithm (MCIR-RGAD) can significantly reduce the computation time and effectively maintain or improve the accuracy. Therefore, facing the problem of anomaly detection, this paper provides an effective solution for the optimization of data quality and the processing of incomplete data.
In the future, we plan to extend this work in the context of unsupervised learning from the perspective of structural information among objects, using the concept of neighborhood information systems in rough set theory. The extended work will optimize the data quality and reduce the time complexity through attribute reduction methods, improve the detection performance of classification tasks through structural information, and maximize valuable information through incomplete mixed data (both categorical and numerical data). This will provide an effective solution to the research of information theory and rough set theory on anomaly detection problems.

Author Contributions

Conceptualization, software, R.L. and B.W.; methodology, H.C.; validation, R.L., B.W. and X.H.; formal analysis, K.W.; writing—original draft preparation, R.L.; writing—review and editing, S.L. and H.C.; visualization, X.H.; supervision, project administration, funding acquisition, H.C. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Scientific and Technological Special Project of Henan Province under Grant 221100210700.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in research are publicly available at https://archive.ics.uci.edu/ml/index.php accessed on 12 April 2022 and https://aistudio.baidu.com/aistudio/datasetdetail/40690 accessed on 12 April 2022. The codes of this research will be made available on request.

Acknowledgments

Thanks to the Sichuan Provincial Data Center and Sichuan Mobile for providing the data, without which this work would not have been possible. This work was supported by the Major Scientific and Technological Special Project of Henan Province under Grant 221100210700. The authors also gratefully thank the associate editor and the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1 
For a complete information system, we have X_i ∩ X_j = ∅ and Y_i ∩ Y_j = ∅ for i ≠ j. Suppose that U/A = {[x_1]_A, …, [x_|U|]_A}, X_i = {x_i^1, x_i^2, …, x_i^{k_i}}, Y_j = {x_j^1, x_j^2, …, x_j^{l_j}}, and X_i ∩ Y_j = {x_ij^1, x_ij^2, …, x_ij^{z_ij}}, i = 1, 2, …, m, j = 1, 2, …, n, where k_i, l_j, and z_ij denote the numbers of elements in X_i, Y_j, and X_i ∩ Y_j, respectively. Since X_i ∩ Y_j ⊆ X_i, ∪_{j=1}^{n}(X_i ∩ Y_j) = X_i, and ∪_{i=1}^{m}(X_i ∩ Y_j) = Y_j, it follows that Σ_{j=1}^{n} z_ij = k_i, Σ_{i=1}^{m} z_ij = l_j, and Σ_{i=1}^{m}|X_i| = Σ_{i=1}^{m} k_i = Σ_{j=1}^{n}|Y_j| = Σ_{j=1}^{n} l_j = |U|.
(i) Proof of CH(D|B) = −(1/|U|) Σ_{i=1}^{|U|} log₂ (|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|).
In an incomplete information system, |X_i| = |[x_i^1]_B| = |[x_i^2]_B| = … = |[x_i^{k_i}]_B| and |X_i ∩ Y_j| = |[x_ij^1]_{B∪D}| = |[x_ij^2]_{B∪D}| = … = |[x_ij^{z_ij}]_{B∪D}|. Hence,
(|X_i ∩ Y_j| / |U|) log₂ (|X_i ∩ Y_j| / |X_i|) = (1/|U|) (log₂ (|[x_ij^1]_{B∪D}| / |[x_i^1]_B|) + … + log₂ (|[x_ij^{z_ij}]_{B∪D}| / |[x_i^{k_i}]_B|)).
Then, we have that
Σ_{j=1}^{n} (|X_i ∩ Y_j| / |U|) log₂ (|X_i ∩ Y_j| / |X_i|) = (1/|U|) (log₂ (|[x_i1^1]_{B∪D}| / |[x_i^1]_B|) + … + log₂ (|[x_i1^{z_i1}]_{B∪D}| / |[x_i^1]_B|)) + … + (1/|U|) (log₂ (|[x_in^1]_{B∪D}| / |[x_i^{k_i}]_B|) + … + log₂ (|[x_in^{z_in}]_{B∪D}| / |[x_i^{k_i}]_B|)).
According to z_i1 + z_i2 + … + z_in = k_i and Σ_{i=1}^{m} Σ_{j=1}^{n} z_ij = Σ_{i=1}^{m} k_i = |U|, with X_i ∩ Y_1 = {x_i1^1, x_i1^2, …, x_i1^{z_i1}}, X_i ∩ Y_2 = {x_i2^1, x_i2^2, …, x_i2^{z_i2}}, …, X_i ∩ Y_n = {x_in^1, x_in^2, …, x_in^{z_in}}, every object of U appears in exactly one term of the double sum. Hence,
Σ_{i=1}^{m} Σ_{j=1}^{n} (|X_i ∩ Y_j| / |U|) log₂ (|X_i ∩ Y_j| / |X_i|) = (1/|U|) Σ_{i=1}^{|U|} log₂ (|[x_i]_{B∪D}| / |[x_i]_B|).
Therefore,
CH(D|B) = −Σ_{i=1}^{m} Σ_{j=1}^{n} (|X_i ∩ Y_j| / |U|) log₂ (|X_i ∩ Y_j| / |X_i|) = −(1/|U|) Σ_{i=1}^{|U|} log₂ (|[x_i]_{B∪D}| / |[x_i]_B|) = −(1/|U|) Σ_{i=1}^{|U|} log₂ (|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|),
since [x_i]_{B∪D} = [x_i]_B ∩ [x_i]_D. This completes the proof of (i).
Parts (ii) and (iii) follow directly from the relations H(D; B) = H(D) − H(D|B) and CH(D; B) = CH(D) − CH(D|B); hence, their proofs are omitted here. □

References

  1. Ahmed, I.M.; Kashmoola, M.Y. CCF Based System Framework In Federated Learning Against Data Poisoning Attacks. J. Appl. Sci. Eng. 2022, 26, 973–981. [Google Scholar]
  2. Lin, H.; Liu, G.N.; Wu, J.J.; Zuo, Y.; Wan, X.; Li, H. Fraud detection in dynamic interaction network. IEEE Trans. Knowl. Data Eng. 2019, 32, 1936–1950. [Google Scholar] [CrossRef]
  3. Shehnepoor, S.; Salehi, M.; Farahbakhsh, R. NetSpam: A network-based spam detection framework for reviews in online social media. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1585–1595. [Google Scholar] [CrossRef]
  4. Dal-Pozzolo, A.; Caelen, O.; Le-Borgne, Y.A. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 2014, 41, 4915–4928. [Google Scholar] [CrossRef]
  5. Repousis, S.; Lois, P.; Veli, V. An investigation of the fraud risk and fraud scheme methods in Greek commercial banks. J. Money Laund. Control. 2019, 22, 53–61. [Google Scholar] [CrossRef]
  6. Tsang, S.; Koh, Y.S.; Dobbie, G.; Alam, S. SPAN: Finding collaborative frauds in online auctions. Knowl. Based Syst. 2014, 71, 389–408. [Google Scholar] [CrossRef]
  7. Pourhabibi, T.; Ong, K.; Kam, B.H. Fraud detection: A systematic literature review of graph-based anomaly detection approaches. Decis. Support Syst. 2020, 133, 113303. [Google Scholar] [CrossRef]
  8. Zhao, Q.; Chen, K.; Li, T. Detecting telecommunication fraud by understanding the contents of a call. Cybersecurity 2018, 1, 1–12. [Google Scholar] [CrossRef]
  9. Yang, J.; Yang, T.; Shi, C. Research on fault identification method based on multi-resolution permutation entropy and ABC-SVM. J. Appl. Sci. Eng. 2021, 25, 733–742. [Google Scholar]
  10. Jurgovsky, J.; Granitzer, M.; Ziegler, K. Sequence classification for credit-card fraud detection. Expert Syst. Appl. 2018, 100, 234–245. [Google Scholar] [CrossRef]
  11. Wang, X.W.; Yin, S.L.; Li, H. A Network Intrusion Detection Method Based on Deep Multi-scale Convolutional Neural Network. Int. J. Wireless Inf. Netw. 2020, 27, 503–517. [Google Scholar] [CrossRef]
  12. Fiore, U.; De-Santis, A.; Perla, F. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Inf. Sci. 2019, 479, 448–455. [Google Scholar] [CrossRef]
13. Barnett, V.; Lewis, T. Outliers in Statistical Data; John Wiley and Sons: Hoboken, NJ, USA, 1994. [Google Scholar]
  14. Wang, Y.; Li, Y. Outlier detection based on weighted neighbourhood information network for mixed-valued datasets. Inf. Sci. 2021, 564, 396–415. [Google Scholar] [CrossRef]
  15. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying Density-Based Local Outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 93–104. [Google Scholar]
  16. Ali, D.; Omer, K. Efficient density and cluster based incremental outlier detection in data streams. Inf. Sci. 2022, 607, 901–920. [Google Scholar]
  17. Li, Z.; Qu, L.; Zhang, G. Attribute selection for heterogeneous data based on information entropy. Int. J. Gen. Syst. 2021, 50, 548–566. [Google Scholar] [CrossRef]
  18. Salehi, F.; Keyvanpour, M.R.; Sharifi, A. SMKFC-ER: Semi-supervised multiple kernel fuzzy clustering based on entropy and relative entropy. Inf. Sci. 2021, 547, 667–688. [Google Scholar] [CrossRef]
  19. Wang, J.; Wei, J.; Yang, Z. Feature selection by maximizing independent classification information. IEEE Trans. Knowl. Data Eng. 2017, 29, 828–841. [Google Scholar]
  20. Thuy, N.N.; Wongthanavasu, S. On reduction of attributes in inconsistent decision tables based on information entropies and stripped quotient sets. Expert Syst. Appl. 2019, 137, 308–323. [Google Scholar] [CrossRef]
  21. Patrick, G.C.; Cheng, G.; Jerzy, W.G.; Teresa, M. Characteristic sets and generalized maximal consistent blocks in mining incomplete data. Inf. Sci. 2018, 453, 66–79. [Google Scholar]
  22. Liu, G.N.; Guo, J.; Zuo, Y.; Wu, J.J.; Guo, R.Y. Fraud detection via behavioral sequence embedding. Knowl. Inf. Syst. 2020, 62, 2685–2708. [Google Scholar] [CrossRef]
  23. Hu, X.X.; Chen, H.C.; Liu, S.X.; Jiang, H.C.; Chu, G.H.; Li, R. BTG: A Bridge to Graph machine learning in telecommunications fraud detection. Fut Gen. Comp. Syst. 2022, 137, 274–287. [Google Scholar] [CrossRef]
  24. Emmanuel, O.; Rose, O.A.; Mohammed, U.A.; Ezekiel, R.A.; Bashir, T.; Salihu, S. Detecting Telecoms Fraud in a Cloud-Base Environment by Analyzing the Content of a Phone Conversation. Asian J. Res. Comp. Sci. 2022, 4, 115–131. [Google Scholar]
  25. Viktoras, C.; Andrej, B.; Rima, K.; Olegas, V. Outlier Analysis for Telecom Fraud Detection. Dig. Bus. Int. Syst. 2022, 1598, 219–231. [Google Scholar]
  26. Mollaoğlu, A.; Baltaoğlu, G.; Çakır, E.; Aktas, M.S. Fraud Detection on Streaming Customer Behavior Data with Unsupervised Learning Methods. In Proceedings of the 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Kuala Lumpur, Malaysia, 12–13 June 2021. [Google Scholar]
  27. Zhong, Y.; Zhang, X.Y.; Shan, F. Hybrid data-driven outlier detection based on neighborhood information entropy and its developmental measures. Expert Syst. Appl. 2018, 112, 243–257. [Google Scholar]
  28. Qian, Y.; Liang, J.; Wei-zhi, Z.W. Information granularity in fuzzy binary GrC model. IEEE Trans. Fuzzy Syst. 2010, 19, 253–264. [Google Scholar]
  29. Feng, Q.R.; Zhou, Y. Soft discernibility matrix and its applications in decision making. Appl. Soft Comp. 2014, 24, 749–756. [Google Scholar] [CrossRef]
  30. Shu, W.; Qian, W. An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory. Data Knowl. Eng. 2015, 100, 116–132. [Google Scholar] [CrossRef]
  31. Sun, Y.; Mi, J.; Chen, J. A new fuzzy multi-attribute group decision-making method with generalized maximal consistent block and its application in emergency management. Knowl. Based Syst. 2021, 215, 106594. [Google Scholar] [CrossRef]
  32. Zhao, X.; Zhang, J.; Qin, X. LOMA: A local outlier mining algorithm based on attribute relevance analysis. Expert Syst. Appl. 2017, 84, 272–280. [Google Scholar] [CrossRef]
  33. Liang, B.H.; Liu, Y.; Shen, C.Y. Attribute Reduction Algorithm Based on Indistinguishable Degree. In Proceedings of the 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 20–22 April 2018. [Google Scholar]
  34. Luo, C.; Li, T.; Huang, Y. Updating three-way decisions in incomplete multi-scale information systems. Inf. Sci. 2019, 476, 274–289. [Google Scholar]
  35. Du, W.S.; Hu, B.Q. Attribute reduction in ordered decision tables via evidence theory. Inf. Sci. 2016, 364, 91–110. [Google Scholar] [CrossRef]
  36. Lang, G.; Cai, M.; Fujita, H. Related families-based attribute reduction of dynamic covering decision information systems. Knowl. Based Syst. 2018, 162, 161–173. [Google Scholar] [CrossRef]
  37. Liang, J.; Shi, Z.; Li, D. Information entropy, rough entropy and knowledge granulation in incomplete information systems. Int. J. Gen. Syst. 2006, 35, 641–654. [Google Scholar] [CrossRef]
  38. Liu, X. Research on Uncertainty Measurement and Attribute Reduction in Generalized Fuzzy Information Systems. Ph.D. Thesis, Hunan Normal University, Changsha, China, 2022. [Google Scholar]
  39. Dai, J.; Liu, Q. Semi-supervised attribute reduction for interval data based on misclassification cost. Int. J. Mach. Learn. Cybern. 2022, 13, 1739–1750. [Google Scholar] [CrossRef]
  40. Sun, L.; Xu, J.C.; Yun, T. Feature selection using rough entropy-based uncertainty measures in incomplete decision systems. Knowl. Based Syst. 2012, 36, 206–216. [Google Scholar] [CrossRef]
  41. Gao, C.; Zhou, J.; Miao, D.Q.; Yue, X.D.; Wan, J. Granular-conditional-entropy-based attribute reduction for partially labeled data with proxy labels. Inf. Sci. 2021, 580, 111–128. [Google Scholar] [CrossRef]
  42. Wang, Y.B.; Chen, X.J.; Dong, K. Attribute reduction via local conditional entropy. Int. J. Mach. Learn. Cyb. 2019, 10, 3619–3634. [Google Scholar] [CrossRef]
  43. Qu, L.D.; He, J.L.; Zhang, G.Q.; Xie, N.X. Entropy measure for a fuzzy relation and its application in attribute reduction for heterogeneous data. Appl. Soft Comp. 2022, 118, 108455. [Google Scholar] [CrossRef]
  44. Yang, H.; Moody, J. Data visualization and feature selection: New algorithms for nongaussian data. Adv. Neural Inf. Process. Syst. 1999, 12, 687–693. [Google Scholar]
  45. Jakulin, A. Machine Learning Based on Attribute Interactions. Ph.D. Thesis, University of Ljubljana, Ljubljana, Slovenia, 12 May 2006. [Google Scholar]
Figure 1. The framework of the proposed methodology.
Figure 2. The relationship of relevance and independence in the complete information system: (a) a i is a variable, a j is fixed; (b) a j is a variable, B is fixed; (c) candidate attribute selection.
Figure 3. Incomplete data among telecom fraud users.
Figure 4. Performance comparison of randomly deleting missing data of the Union dataset under the RGAD algorithm.
Figure 5. Comparison of the computation time before and after the MCIR-RGAD algorithm.
Figure 6. Comparison of the classification accuracy before and after the MCIR-RGAD algorithm.
Figure 7. Performance comparison of different attribute reduction algorithms under the MCIR-RGAD algorithm.
Figure 8. Attribute correlations in the Union dataset.
Figure 9. Classification of selected attributes under the MCIR-RGAD algorithm.
Figure 10. Performance comparison of six classification detection algorithms.
Figure 11. Comparison of the robustness in different datasets.
Figure 12. Average ranks diagram comparing the benchmark methods in terms of accuracy.
Table 1. An incomplete information system ( U , A ) about the telecom communication heterogeneous data.
ID | Duration | Amount | Place | Platform | Fraud
x1 | [60, +∞) | 1 | Foreign | WeChat | Yes
x2 | [0, 60] | * | Foreign | Telecom | No
x3 | * | 3 | * | Telecom | No
x4 | * | 3 | Domestic | Telecom | No
x5 | [60, +∞) | 1 | Foreign | * | Yes
Note: * means the incomplete information.
Table 2. Description of datasets.
Datasets | Sample | Attribute | Source
Car | 1728 | 6 + 1 | UCI
Adult | 48,842 | 14 + 1 | UCI
Bank | 4521 | 16 + 1 | UCI
Mushroom | 8124 | 22 + 1 | UCI
USER | 6106 | 11 + 1 | Telecom
SMS | 6,848,509 | 11 + 1 | Telecom
APP | 3,283,602 | 20 + 1 | Telecom
VOC | 5,015,430 | 42 + 1 | Telecom
Union | 6106 | 84 + 1 | Telecom
Table 3. Incomplete information processing of authentic telecom fraud data.
Method | Accuracy | Recall | F1 | Current Object | Correct Prediction
Drop | 86.82% | 10.38% | 16.42% | 4248 | 3688
Fill 0 | 78.72% | 53.65% | 62.10% | 6106 | 4806
Fill 1 | 78.72% | 53.65% | 62.10% | 6106 | 4806
Fill Mean | 75.86% | 48.61% | 56.68% | 6106 | 4630
MCB | 84.12% | 62.47% | 71.88% | 6106 | 5136
Table 4. Comparison of accuracy, computational time, and robustness of attribute reduction in the MCIR-RGAD algorithm.
Datasets | Attributes (Before) | Attributes (After) | Accuracy (%) (Before) | Accuracy (%) (After) | Time (s) (Before) | Time (s) (After) | Robustness (%) (MCIR-RGAD)
Car | 6 | 5 | 97.11 | 92.49 | 0.0373 | 0.0254 | 69.91
Adult | 14 | 5 | 76.07 | 81.29 | 0.2765 | 0.0517 | 96.64
Bank | 16 | 5 | 84.20 | 89.06 | 0.4618 | 0.1074 | 94.16
Mushroom | 22 | 7 | 100.00 | 100.00 | 0.4523 | 0.1816 | 83.94
User | 11 | 7 | 78.81 | 78.81 | 0.4090 | 0.2223 | 78.26
SMS | 11 | 7 | 85.75 | 86.24 | 0.5548 | 0.2196 | 84.51
APP | 20 | 7 | 77.66 | 79.54 | 0.9432 | 0.3015 | 88.67
Voc | 42 | 12 | 84.98 | 88.05 | 4.8257 | 0.4677 | 97.99
Union | 84 | 10 | 85.81 | 89.96 | 830.2283 | 0.3520 | 102.88
Table 5. Comparison of classification accuracies of attribute reduction from set view and information view.
(Set theory view: J_CG ①, J_MG ②, J_JG; information theory view: J_CH ①, J_MH ②, J_JH; independence: ③, ①+③, J_MCIR-RGAD.)
Datasets | J_CG ① | J_MG ② | J_JG | J_CH ① | J_MH ② | J_JH | ③ | ①+③ | J_MCIR-RGAD
Car | 92.49% | 92.49% | 70.23% | 92.49% | 92.49% | 70.23% | 92.49% | 92.49% | 92.49%
Adult | 79.75% | 79.75% | 80.98% | 81.29% | 81.29% | 80.98% | 81.29% | 78.68% | 81.29%
Bank | 87.29% | 87.29% | 87.18% | 88.84% | 88.84% | 87.18% | 88.84% | 88.73% | 89.06%
Mushroom | 99.63% | 99.63% | 95.75% | 99.63% | 99.63% | 95.75% | 99.63% | 99.26% | 100.00%
User | 78.81% | 78.81% | 78.81% | 78.81% | 78.81% | 78.40% | 78.81% | 78.81% | 78.81%
SMS | 85.50% | 85.50% | 80.92% | 86.24% | 86.24% | 80.92% | 86.24% | 85.59% | 86.24%
APP | 79.87% | 79.87% | 79.87% | 79.54% | 79.54% | 79.87% | 79.54% | 79.38% | 79.54%
VoC | 85.15% | 85.15% | 83.32% | 87.97% | 87.97% | 83.32% | 87.97% | 82.41% | 88.05%
Union | 88.22% | 88.22% | 75.44% | 88.71% | 88.71% | 75.68% | 88.71% | 85.98% | 89.96%
Table 6. Comparison of classification computation time of attribute reduction from set view and information view.
(Set theory view: J_CG ①, J_MG ②, J_JG; information theory view: J_CH ①, J_MH ②, J_JH; independence: ③, ①+③, J_MCIR-RGAD.)
Datasets | J_CG ① | J_MG ② | J_JG | J_CH ① | J_MH ② | J_JH | ③ | ①+③ | J_MCIR-RGAD
Car | 0.0280 | 0.0280 | 0.0329 | 0.0270 | 0.0270 | 0.0320 | 0.0270 | 0.0264 | 0.0254
Adult | 0.0689 | 0.0689 | 0.0689 | 0.0569 | 0.0569 | 0.0616 | 0.0569 | 0.0471 | 0.0517
Bank | 0.1737 | 0.1737 | 0.1476 | 0.1715 | 0.1715 | 0.1776 | 0.1715 | 0.1785 | 0.1074
Mushroom | 0.1970 | 0.1970 | 0.2817 | 0.1930 | 0.1930 | 0.2213 | 0.1930 | 0.1941 | 0.1816
User | 0.2559 | 0.2559 | 0.2051 | 0.2300 | 0.2300 | 0.2112 | 0.2300 | 0.2472 | 0.2223
SMS | 0.2042 | 0.2042 | 0.2113 | 0.2405 | 0.2405 | 0.2002 | 0.2405 | 0.2378 | 0.2196
APP | 0.2909 | 0.2909 | 0.2943 | 0.2531 | 0.2531 | 0.2712 | 0.2531 | 0.3134 | 0.3015
VoC | 0.4692 | 0.4692 | 0.4321 | 0.5054 | 0.5054 | 0.4997 | 0.5054 | 0.4570 | 0.4677
Union | 0.2937 | 0.2937 | 0.2975 | 0.3414 | 0.3414 | 0.3016 | 0.3414 | 0.2844 | 0.3520
Table 7. Ranking on 9 datasets for 9 algorithms. Each cell shows the rank value r_i with the accuracy (%) in parentheses; tied accuracies receive the average of the tied ranks, and lower ranks are better.

| Datasets | J_CG | J_MG | J_JG | J_CH | J_MH | J_JH | ① + ③ | J | MCIR-RGAD |
|---|---|---|---|---|---|---|---|---|---|
| Car | 4 (92.49) | 4 (92.49) | 8.5 (70.23) | 4 (92.49) | 4 (92.49) | 8.5 (70.23) | 4 (92.49) | 4 (92.49) | 4 (92.49) |
| Adult | 7.5 (79.75) | 7.5 (79.75) | 5.5 (80.98) | 2.5 (81.29) | 2.5 (81.29) | 5.5 (80.98) | 2.5 (81.29) | 9 (78.68) | 2.5 (81.29) |
| Bank | 6.5 (87.29) | 6.5 (87.29) | 8.5 (87.18) | 3 (88.84) | 3 (88.84) | 8.5 (87.18) | 3 (88.84) | 5 (88.73) | 1 (89.06) |
| Mushroom | 4 (99.63) | 4 (99.63) | 8.5 (95.75) | 4 (99.63) | 4 (99.63) | 8.5 (95.75) | 4 (99.63) | 7 (99.26) | 1 (100.00) |
| User | 4.5 (78.81) | 4.5 (78.81) | 4.5 (78.81) | 4.5 (78.81) | 4.5 (78.81) | 9 (78.40) | 4.5 (78.81) | 4.5 (78.81) | 4.5 (78.81) |
| SMS | 6.5 (85.50) | 6.5 (85.50) | 8.5 (80.92) | 2.5 (86.24) | 2.5 (86.24) | 8.5 (80.92) | 2.5 (86.24) | 5 (85.59) | 2.5 (86.24) |
| APP | 2.5 (79.87) | 2.5 (79.87) | 2.5 (79.87) | 6.5 (79.54) | 6.5 (79.54) | 2.5 (79.87) | 6.5 (79.54) | 9 (79.38) | 6.5 (79.54) |
| VoC | 5.5 (85.15) | 5.5 (85.15) | 7.5 (83.32) | 3 (87.97) | 3 (87.97) | 7.5 (83.32) | 3 (87.97) | 9 (82.41) | 1 (88.05) |
| Union | 5.5 (88.22) | 5.5 (88.22) | 9 (75.44) | 3 (88.71) | 3 (88.71) | 8 (75.68) | 3 (88.71) | 7 (85.98) | 1 (89.96) |
| R_ave | 5.167 | 5.167 | 7.056 | 3.667 | 3.667 | 7.389 | 3.667 | 6.611 | 2.667 |
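The per-dataset ranks in Table 7 follow the standard mid-rank convention: on each dataset the 9 algorithms are ordered by accuracy (best = rank 1), and tied accuracies share the average of the ranks they occupy; R_ave is the column mean of these ranks. A minimal sketch of this computation, reproduced here on the Bank row of Table 5 (variable names are illustrative, not from the paper):

```python
from scipy.stats import rankdata

# Bank-row accuracies from Table 5, in column order:
# J_CG, J_MG, J_JG, J_CH, J_MH, J_JH, (1)+(3), J, MCIR-RGAD
bank_acc = [87.29, 87.29, 87.18, 88.84, 88.84, 87.18, 88.84, 88.73, 89.06]

# rankdata ranks ascending, so negate accuracies to make rank 1 the best;
# the default 'average' method assigns tied values their mid-rank.
ranks = list(rankdata([-a for a in bank_acc]))
print(ranks)  # [6.5, 6.5, 8.5, 3.0, 3.0, 8.5, 3.0, 5.0, 1.0]
```

The printed ranks match the Bank row of Table 7 exactly, including the 6.5/6.5 and 8.5/8.5 mid-ranks for the two tied pairs; averaging such rank vectors over all 9 datasets yields the R_ave row.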

Li, R.; Chen, H.; Liu, S.; Wang, K.; Wang, B.; Hu, X. TFD-IIS-CRMCB: Telecom Fraud Detection for Incomplete Information Systems Based on Correlated Relation and Maximal Consistent Block. Entropy 2023, 25, 112. https://doi.org/10.3390/e25010112