Credit Card-Not-Present Fraud Detection and Prevention Using Big Data Analytics Algorithms

Razaque, Abdul; Frej, Mohamed Ben Haj; Bektemyssova, Gulnara; Amsaad, Fathi; Almiani, Muder; Alotaibi, Aziz; Jhanjhi, N. Z.; Amanzholova, Saule; Alshammari, Majid

doi:10.3390/app13010057


by Abdul Razaque ^1,*, Mohamed Ben Haj Frej ^2,*, Gulnara Bektemyssova ^3,*, Fathi Amsaad ^4,*, Muder Almiani ^5, Aziz Alotaibi ^6, N. Z. Jhanjhi ^7, Saule Amanzholova ^1 and Majid Alshammari ^6

^1 Department of Cyber Security, International Information Technology University, Almaty 050000, Kazakhstan

^2 Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA

^3 Department of Computer Engineering, International Information Technology University, Almaty 050000, Kazakhstan

^4 Department of Computer Science, Joshi Research Center, University of Wright, Dayton, OH 45435, USA

^5 Department of Management Information System, Gulf University for Science and Technology, Kuwait City 32093, Kuwait

^6 Computers and Information Technology College, Taif University, Taif 21974, Saudi Arabia

^7 School of Computer Science, Taylor’s University, Subang Jaya 47500, Malaysia

^* Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(1), 57; https://doi.org/10.3390/app13010057

Submission received: 1 October 2022 / Revised: 28 November 2022 / Accepted: 8 December 2022 / Published: 21 December 2022


Abstract

Currently, fraud detection is employed in numerous domains, including banking, finance, insurance, government organizations, and law enforcement. The number of fraud attempts has recently grown significantly, making fraud detection critical for protecting personal information and sensitive data. Fraud takes several forms, such as stolen credit cards, forged checks, deceptive accounting practices, and card-not-present (CNP) fraud. This article introduces the credit card-not-present fraud detection and prevention (CCFDP) method for dealing with CNP fraud using big data analytics. To deal with suspicious behavior, the proposed CCFDP includes two steps: the fraud detection process (FDP) and the fraud prevention process (FPP). The FDP examines the system to detect harmful behavior, after which the FPP helps prevent malicious activity. Five state-of-the-art methods are used in the FDP step: random undersampling (RU), t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), singular value decomposition (SVD), and logistic regression learning (LRL). Before conducting experiments, the FDP must balance the dataset, which is done with random undersampling. Furthermore, to improve data representation, the FDP must reduce the dimensionality of the features. This step employs the t-SNE, PCA, and SVD algorithms, resulting in faster training and improved accuracy. The FPP uses the LRL model to evaluate the success and failure probability of CNP fraud. The proposed CCFDP mechanism is implemented in Python, and the testing results validate its efficacy.

Keywords:

fraud detection; fraud prevention; big data analysis; t-SNE; PCA; SVD; LRL; RU; CNP

1. Introduction

Nowadays, e-commerce is a regular and necessary part of everyday life. It enables instant payment for goods and services. Nonetheless, for the vast majority of people, the process of transferring money electronically is a “black box.” This invites scammers who seek to profit unlawfully [1,2]. Since fraudulent strategies evolve quickly, a detection system must be adaptive to retain its effectiveness [3,4].

Ref. [5] proposed a paradigm that integrates supervised and unsupervised algorithms for detecting credit card fraud. The model enables the discovery of new fraudulent activities and provides a comprehensive picture of the relationships between variables. Its goal was to increase detection accuracy. This paradigm, however, falls short in terms of local and global approaches: the integration of supervised and unsupervised data does not provide the fine-grained resolution required to reap the benefits of unsupervised data. Ref. [6] compares the performance of four data mining-based fraud detection algorithms (support vector machine, k-nearest neighbors, decision trees, and naive Bayes) on a real-world anonymized dataset of transactions. The performance evaluation was based on four criteria: the true positive rate (TPR), the false positive rate (FPR), the balanced classification rate (BCR), and the Matthews correlation coefficient (MCC). The authors found that no data mining approach is universally superior to the others and that progress can only be achieved by combining several techniques.

Ref. [7] presents APATE, a novel approach for identifying fraudulent credit card transactions in online stores. It combines intrinsic features, derived from spending history and incoming transactions using RFM (recency, frequency, monetary) fundamentals, with network-based features derived from the network of merchants and credit card holders, from which a time-dependent suspiciousness score is derived for each network object. According to the findings, intrinsic and network-based features are two sides of the same coin: combining the two types of features yields models with AUC scores above 0.98. Ref. [8] presents a CNN-based fraud detection algorithm that learns the intrinsic patterns of fraud activity from labeled data. Common rule-based expert systems for identifying fraud features neglect different scenarios and suffer from an imbalance between positive and negative samples. Evaluated on real-world transactions from a large commercial bank, the approach outperformed other recent techniques [9]. Facial expression recognition has also been proposed to track an applicant’s expressions throughout an interview [10]; such virtual interviews could significantly reduce the effort required of human resources employees.

Ref. [11] analyzes the detection accuracy and detection time of a neural network trained on examples of fraud due to lost cards, stolen cards, application fraud, counterfeit fraud, mail-order fraud, and NRI (non-received issue) fraud. When data normalization is used with artificial neural networks, grouping attributes by cluster analysis has been shown to reduce the number of neural inputs [12]. Ref. [13] describes a strategy for reducing false positives in bank anti-fraud systems that uses rule induction in distributed tree-based models to explicitly isolate anomalies rather than profiling normal regions. It makes extensive use of sub-sampling, resulting in an algorithm with linear time complexity, a low constant, and low memory requirements. Ref. [14] shows how artificial immune systems (AIS) can be used to detect credit card fraud. AIS was compared against neural networks (NN), Bayesian networks (BN), naive Bayes (NB), and decision trees (DT), and was found to perform best with GA-optimized parameters. Although many algorithms have been introduced to fight credit card fraud, the challenge remains. Figure 1 depicts the credit card fraud transaction process.

This work addresses the following question: is a new incoming transaction consistent with normal customer behavior, i.e., does it correspond to that customer’s regular spending patterns in terms of (a) frequency, the average number of transactions over a given period; (b) recency, the average time between the current and previous transactions; and (c) monetary value, the amount spent on the transaction?

1.1. Research Contribution

The main contributions are summarized as follows:

▪: The state-of-the-art techniques (RU, t-SNE, PCA, and SVD) are combined to address the persistent problem of card-not-present fraud. These techniques speed up data training and increase accuracy, which helps detect fraud successfully.
▪: Exploratory data analysis and predictive modeling are carried out to reduce dimensionality by projecting each data point onto only the first few principal components, obtaining lower-dimensional data while retaining as much of the variation in the data as possible.
▪: t-SNE reduces dimensionality while keeping similar instances close together and dissimilar instances apart, further increasing accuracy.
▪: LRL is used to evaluate the success and failure probability of CNP fraud. The interaction of predictor factors is modeled to predict the relationship between legitimate and illegitimate transactions.

1.2. Paper Organization

Section 2 presents the salient features of existing approaches. Section 3 presents the proposed plan. Section 4 shows the implementation and experimental results. Section 5 discusses the significance of the result and limitations including suggestions for improvement. Section 6 concludes the entire paper.

1.3. Problem Identification

The fundamental issue in creating a fraud detection and prevention algorithm is that the number of fraudulent transactions is tiny compared with the number of valid transactions. According to various statistics, credit card fraud accounts for around 0.1% of all card transactions [15]. As a result, a machine learning detector trained on such data is biased toward treating transactions as valid: a model that labels every transaction as legitimate is already 99.9% accurate while catching no fraud at all. This significantly affects any system against credit card fraud, particularly supervised algorithms, since fraudulent outcomes become difficult to predict. In addition, if the system’s accuracy is low, identifying fraud becomes difficult, and if the system cannot identify fraudulent behavior, consumers and institutions may suffer significant financial losses. Ideally, an efficient fraud detection and prevention algorithm based on unsupervised machine learning would help prevent fraudulent activities.
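The accuracy paradox described above can be checked with a few lines of Python. This is a minimal sketch assuming a hypothetical stream of 10,000 transactions with a 0.1% fraud rate and a degenerate detector that never raises an alert; the function name is illustrative only.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on the fraud class) for 0/1 labels (1 = fraud)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    fraud_total = sum(y_true)
    fraud_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall = fraud_caught / fraud_total if fraud_total else 0.0
    return correct / len(y_true), recall

# Hypothetical data: 10 fraudulent transactions out of 10,000 (0.1%).
y_true = [1] * 10 + [0] * 9_990
# A "detector" that labels every transaction as legitimate.
y_pred = [0] * len(y_true)

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.3f}, fraud recall={rec:.3f}")
```

High accuracy with zero recall is why balancing the data, and reporting class-aware metrics, matters for this problem.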

2. Related Work

To address credit card fraud, ref. [16] introduced the unsupervised credit card detection (UCCD) method, which combines two well-known algorithms: principal component analysis and SIMPLEKMEANS. The geographic locations of the transaction and of the client are added to an existing dataset to improve model accuracy. By predicting outcomes and classifying probable frauds, the model achieves good results on the constructed test database.

The method scores transactions quickly and accurately, and it can detect new fraudulent activities. Principal component analysis gives a more thorough picture of the relationships among distinct features while also being more adaptable. However, k-means risks converging to a local optimum rather than the global one. This risk can be reduced by repeating the k-means algorithm several times with different initial clusters, at the cost of increased execution time.
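The restart strategy mentioned above can be sketched with plain Lloyd's k-means in NumPy: each run starts from different random centres, and the run with the lowest within-cluster sum of squares (inertia) is kept. This is an illustration under assumed synthetic data, not the SIMPLEKMEANS implementation used in [16]; all names are hypothetical.

```python
import numpy as np

def _assign(X, centres):
    """Nearest-centre labels and total inertia (sum of squared distances)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels].sum()

def kmeans_once(X, k, rng, n_iter=100):
    """One run of Lloyd's k-means starting from k randomly chosen data points."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels, _ = _assign(X, centres)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return _assign(X, centres)

def kmeans_restarts(X, k, n_restarts=10, seed=0):
    """Run k-means from several random starts and keep the lowest-inertia solution."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])

# Two well-separated hypothetical clusters of transaction features.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.5, (50, 2)), data_rng.normal(5.0, 0.5, (50, 2))])
labels, inertia = kmeans_restarts(X, k=2)
```

Execution time grows linearly with `n_restarts`, which is the trade-off noted in the text.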

Ref. [17] analyzed many approaches for detecting credit card fraud: BLAST-SSAHA hybridization, hidden Markov models, fuzzy Darwinian detection, neural networks, SVM, k-nearest neighbors, and naive Bayes. These algorithms were then applied to datasets and compared on essential criteria. The comparison on credit card transactions shows that these strategies are more effective in combating financial fraud than other techniques in the same industry. Naive Bayes classification, which uses Bayes’ rule to calculate the likelihood of the correct class, showed excellent performance.

Ref. [18] proposed a fraud detection framework based on convolutional neural networks (CNN). It learns from labeled data and acquires intrinsic features of fraud behavior. Furthermore, trading entropy is proposed to improve transaction classification accuracy. In addition, ref. [19] combined trading entropy with feature matrices and applied them to a convolutional neural network. This CNN-based framework for mining latent fraud patterns in credit card transactions converts the transaction data of each record into a feature matrix, allowing the CNN to discover intrinsic relationships and interactions in the time series.

Highly imbalanced sample sets are handled by incorporating a cost-based sampling method in the feature space, yielding superior fraud detection performance, and a novel trading feature called trading entropy is proposed to identify increasingly complex fraud patterns. A feature matrix represents numerous transaction records, on which a convolutional neural network is trained to identify a set of latent patterns for each sample. Experimental results on real trading data from a commercial bank show that the approach outperforms other state-of-the-art methods.

The issue of a false sense of security is addressed in [20]. When people see HTTPS in the address bar, they instinctively trust the website. Although this was true in the past, it is no longer valid in a world where attackers can use the same technologies to their advantage. The authors used a deep learning system in the hope of improving accuracy: a long short-term memory (LSTM) network trained on web certificates. This algorithm can identify new patterns on its own, without the assistance of developers, implying that it can probe beyond what humans can deduce from certificate text. It detects rogue certificates with high accuracy, eliminating the need to rely on slow browser detection mechanisms, and it reduces the time required to detect new attacks from malicious websites.

The authors of [21] reported the first analysis of a case study on credit card fraud detection in which cluster analysis was used to normalize data. The results of applying artificial neural networks and cluster analysis to fraud detection showed that clustering attributes can reduce the number of neural inputs. The case study used multilayer perceptron artificial neural networks together with cluster analysis to detect credit card fraud, and cluster analysis successfully standardized the qualitative data.

Ref. [22] proposed a fundamentally different model-based strategy, isolation forest (iForest), that isolates anomalies rather than profiling normal points. Because it relies on isolation, iForest can exploit sub-sampling to an extent that is not possible in previous approaches, resulting in an algorithm with linear time complexity, a low constant, and low memory use. It is also suitable for high-dimensional problems with many irrelevant attributes and for cases where the training set contains no anomalies. Anomaly detection with iForest has two phases. The training phase builds isolation trees from subsamples of the training set. The testing phase passes the test instances through the isolation trees to obtain an anomaly score for each instance.

In [23], a method of authenticating a credit card is presented that involves issuing a credit card bearing an account number to a holder and assigning a personal identification number (PIN) to the holder. Before a credit card transaction can be authorized, a validation transaction must be performed, in which the customer inserts the credit card into an electronic terminal and enters the PIN. The PIN and account number are compared with the information recorded in a database, and credit card use is approved only if the PIN matches the account number.
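The PIN check described above can be sketched in Python as a lookup followed by a comparison. This is a minimal sketch under stated assumptions: the salted PBKDF2 hashing and constant-time comparison are hardening choices added here, not details given in [23], and the account number, PIN, and in-memory `CARD_DB` are hypothetical.

```python
import hashlib
import hmac
import os

def hash_pin(pin: str, salt: bytes) -> bytes:
    # Store only a salted, slow hash of the PIN, never the PIN itself (added assumption).
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)

# Hypothetical card database: account number -> (salt, PIN hash).
_salt = os.urandom(16)
CARD_DB = {"4000123412341234": (_salt, hash_pin("2468", _salt))}

def validate_card(account_number: str, entered_pin: str) -> bool:
    """Approve card use only if the entered PIN matches the record for this account."""
    record = CARD_DB.get(account_number)
    if record is None:
        return False
    salt, stored_hash = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored_hash, hash_pin(entered_pin, salt))
```

For example, `validate_card("4000123412341234", "2468")` approves the card, while a wrong PIN or an unknown account number is rejected.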

Ref. [24] provided a comprehensive examination of financial fraud detection procedures that use data mining methodologies, with a specific focus on computational intelligence (CI)-based solutions. The procedures are classified by key elements such as the detection method employed, the type of fraud explored, and the success rate. Problems associated with current techniques, as well as prospective future research directions, are also identified. The current intelligent techniques for financial fraud detection were studied together with their underlying mathematics and processes [10]. Their effectiveness varied, but each approach was found to be reasonably capable of detecting various types of financial fraud. The ability of CI techniques such as neural networks and support vector machines to learn and adapt is helpful in fighting fraudsters’ evolving methods. Several facets of intelligent fraud detection have not yet been investigated: various types of fraud, as well as some data mining techniques, are only briefly examined and require further investigation to be fully understood. There is also scope to examine the performance of current strategies through customization or standardization, and to carry out a cost-benefit analysis of fraud detection methods.

A new matrix profile (NMP) for anomaly detection is used to tackle the all-pairs similarity search problem for time series data [25]. The proposed paradigm is based on two state-of-the-art algorithms: the scalable time series ordered-search matrix profile (STOMP) and the scalable time series anytime matrix profile (STAMP). The NMP can be applied to large multivariate datasets and delivers high-quality approximate solutions in a reasonable amount of time. The findings show that the NMP outperforms the other algorithms in terms of accuracy.

Comparative research on data mining strategies for credit card fraud detection is undertaken in [26], which investigated random forest, SVM, and logistic regression. A novel network-based model called CATCHM, based on representation learning, is introduced for credit card fraud detection [27]; its network design employs an efficient inductive pooling operator and a careful configuration of the downstream classifier. A long short-term memory recurrent neural network (LSTM-RNN) is introduced for detecting credit card fraud [28], reducing the occurrence of fraud. A framework that combines the potential of cost-sensitive learning and meta-learning ensemble (CSLMLE) techniques is proposed for fraud detection [29].

Ref. [30] proposes integrating multiple algorithms to detect card fraud; however, the resulting algorithm increases complexity and yields lower accuracy. In this credit card fraud detection with integration of multiple algorithms (CCFDM) approach, supervised machine learning and deep learning techniques are used to distinguish fraudulent from legitimate transactions. To overcome the problem of card-not-present fraud detection and prevention, the CCFDP is proposed, which combines modern techniques (RU, t-SNE, PCA, and SVD). These techniques speed up data training and increase accuracy, which helps detect fraud successfully. To obtain lower-dimensional data while retaining as much of the variation in the data as possible, exploratory data analysis and predictive modeling are performed to reduce dimensionality by projecting each data point onto only the first few principal components. To further improve accuracy, t-SNE reduces dimensionality while keeping similar instances together and dissimilar instances apart. LRL is used to assess the success and failure probability of CNP fraud, and the interaction of predictor factors is modeled to predict the relationship between legitimate and illegitimate transactions. The current contributions to credit card fraud detection are summarized in Table 1.

3. Credit Card-Not-Present Fraud Detection and Prevention Method

In this section, we describe our CCFDP mechanism. Our algorithm focuses on CNP fraud committed through online credit card transactions. The CCFDP automatically detects anomalies in the set of incoming transactions, as depicted in Figure 2. The mechanism involves two processes:

▪: Fraud Detection Process
▪: Fraud Prevention Process

3.1. Fraud Detection Process

To detect fraudulent credit card activity, we apply different types of rules and use the logistic regression algorithm. First, we apply random undersampling (RU) to balance the dataset. Next, we train the model on the dataset and the user’s log files. Once the model is sufficiently trained, we apply it to new transactions: it compares the features of a new transaction with the user’s transaction history and, if it finds anomalies, invokes the fraud prevention process.
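The training-then-scoring step above can be illustrated with a from-scratch logistic regression. This is a minimal NumPy sketch, not the authors’ Python code: the gradient-descent settings, the 0.5 threshold, and the synthetic two-feature data are assumptions chosen for illustration (scikit-learn’s `LogisticRegression` is a common off-the-shelf alternative).

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # current fraud probabilities
        err = p - y                             # gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return w, b

def fraud_probability(x, w, b):
    """Probability that the transaction feature vector x is fraudulent."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Hypothetical balanced training data (e.g. after random undersampling):
# legitimate transactions around (-1, -1), fraudulent ones around (2, 2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])
w, b = train_logistic_regression(X, y)

THRESHOLD = 0.5  # hypothetical cut-off for invoking the prevention process
new_tx = np.array([3.0, 2.5])
if fraud_probability(new_tx, w, b) > THRESHOLD:
    print("anomaly: invoke fraud prevention process")
```

The predicted probability plays the role of the success/failure probability of fraud; where to set the threshold is a business trade-off between missed fraud and false alarms.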

3.1.1. Dimensionality Feature Reduction

Dimensionality reduction refers to methods for reducing the number of input variables in training data. When working with high-dimensional data, it can be helpful to project the data onto a lower-dimensional subspace that captures the essence of the data. High-dimensional data have hundreds, thousands, or even millions of input variables. Fewer input dimensions often mean fewer parameters, or a simpler structure, in the machine learning model; these parameters are known as degrees of freedom. A model with many degrees of freedom is prone to overfitting the training dataset and therefore performing poorly on new data. Simple models that generalize well are preferable, and so are input data with few input variables. This is especially true for linear models, in which the number of inputs and the degrees of freedom are often closely related.

Dimensionality reduction is difficult because different components have different occurrence probabilities, and a common issue is determining how to describe and reduce these variables. Thus, dimensionality reduction should be modeled. Assume a training dataset $F(x)$ over the input variables $x$, with a unique output for the reduced variables, $F(x) = f_n$, $n \in \{0, 1, \dots, N\}$. Let $P(f_n)$ be the probability of each result $f_n$. The self-information of the corresponding reduced data is then:

I(F(x) = f_{n}) = -\log_{2} P(f_{n})

(1)

The result $I(F(x))$ is rounded up to give the number of variables needed to represent the information, and the information entropy of the corresponding reduced data is the expectation of the self-information:

I(e) = -\sum_{n=0}^{N} P(f_{n}) \log_{2} P(f_{n})

(2)
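Equations (1) and (2) can be evaluated directly on an empirical distribution. This is a minimal sketch assuming the probabilities $P(f_n)$ are estimated by relative frequencies of observed outputs; the helper names are hypothetical.

```python
import math
from collections import Counter

def self_information(p: float) -> float:
    """Eq. (1): I = -log2 P(f_n), in bits."""
    return -math.log2(p)

def entropy(values) -> float:
    """Eq. (2): expected self-information over the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * self_information(c / total) for c in counts.values())

# An outcome with probability 0.25 carries 2 bits; rounding up 1.74 bits gives 2.
bits_needed = math.ceil(self_information(0.3))
```

A balanced binary output, such as `entropy([0, 0, 1, 1])`, carries 1 bit, while a constant output carries none.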

The complexity of the features is determined by the number of variables in the training data. Let the training data of a single variable be $F_s(\cdot)$, $s \in S$. The self-information and information entropy of an individual variable can be defined as:

I_{s}(f_{n,s}) = -\log_{2} P(f_{n,s})

(3)

H_{s}(F_{s}) = -\sum_{n=0}^{N} P(f_{n,s}) \log_{2} P(f_{n,s})

(4)

If the variables were independent of each other, their joint information entropy would simply be the sum of the individual entropies, but this is not the case: the training data of multiple variables are frequently correlated. As a result, conditional self-information is needed to reduce multiple variables at the same time:

I_{s+t}(f_{n,s}, f_{n,t}) = -\log_{2} P(f_{n,s} \mid f_{n,t})

(5)
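Equation (5) can be estimated by counting co-occurrences of two variables. This is a minimal sketch assuming discrete values and frequency-based probabilities; the variables (an amount level and a time of day) and their values are hypothetical.

```python
import math
from collections import Counter

def conditional_self_information(pairs, s_value, t_value) -> float:
    """Eq. (5): -log2 P(f_s = s_value | f_t = t_value), estimated from (s, t) pairs."""
    joint = Counter(pairs)
    t_count = sum(c for (_, t), c in joint.items() if t == t_value)
    if t_count == 0:
        raise ValueError("conditioning value never observed")
    p = joint[(s_value, t_value)] / t_count
    return math.inf if p == 0 else -math.log2(p)

# Hypothetical observations of (amount level, time of day).
pairs = [("high", "night"), ("high", "night"), ("low", "night"), ("low", "day")]
```

Unconditionally, "high" has probability 0.5 (1 bit); knowing the transaction happened at night raises it to 2/3, so `conditional_self_information(pairs, "high", "night")` is only about 0.585 bits. This reduction is exactly what correlation between variables buys.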

Random Undersampling

Random undersampling randomly selects examples from the majority class and removes them from the training dataset. This process is repeated until the desired class distribution is obtained, for example, an equal number of examples in each class. RU is appropriate for this dataset because, although the class distribution is highly skewed, the minority class still contains an adequate number of examples.
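Random undersampling to a 1:1 class ratio can be sketched in a few lines of NumPy. This assumes binary labels with the majority class larger than the minority class; the function name and toy data are hypothetical (the imbalanced-learn package offers `RandomUnderSampler` with `fit_resample` for the same purpose).

```python
import numpy as np

def random_undersample(X, y, majority_label=0, seed=None):
    """Randomly drop majority-class rows until both classes have the same count."""
    rng = np.random.default_rng(seed)
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    # Sample without replacement so no legitimate transaction is duplicated.
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    keep = np.concatenate([kept_majority, minority_idx])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Hypothetical data: 1000 transactions, 10 of them fraudulent (label 1).
X = np.arange(2000).reshape(1000, 2)
y = np.zeros(1000, dtype=int)
y[:10] = 1
Xb, yb = random_undersample(X, y, seed=42)
```

All minority (fraud) rows are kept; only legitimate rows are discarded, which is the information loss the later discussion of undersampling refers to.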

Consider a hypothesis $\beta_h: A \times B \to [0, 1]$ that assigns a value to every example $a_i$ and candidate label. The correct label $l_j$ and an incorrect label $l$ are then distinguished as follows. If $\beta_h(a_i, l_j) = 1$ and $\beta_h(a_i, l) = 0$ for all $l \neq l_j$, then $\beta_h$ has correctly predicted that the label of $a_i$ is $l_j$, not $l$. Similarly, if $\beta_h(a_i, l_j) = 0$ and $\beta_h(a_i, l) = 1$ for some $l \neq l_j$, then $\beta_h$ has incorrectly predicted that the label of $a_i$ is $l$. The fraud detection process is explicitly defined in Algorithm 1.

Algorithm 1: Credit card fraud detection process for transactions

1. Initialization: {$D_s$: dataset, $U$: user account, $A$: alert, $t_{ab}$: abnormal transaction, $L$: data in log, $D_b$: database, $D_{ba}$: balanced dataset, $R_{us}$: random undersampling, $T_r$: transaction, $M_d$: fraud detection model, $t_{sna}$: t-distributed stochastic neighbor embedding, $P_{ca}$: principal component analysis, $S_{vd}$: singular value decomposition, $Υ$: feature duplication}
2. Input: {$D_s$}
3. Output: {$A$, $t_{ab}$}
4. Initiate $R_{us}$
5. Do
6. Process $t_{sna}$ on $D_s$
7. Remove $Υ$ from $D_s$
8. Apply $P_{ca}$ on $D_s$
9. Operate $S_{vd}$ on $D_s$
10. While ($D_s \neq D_{ba}$)
11. Endwhile
12. Decompose $M_d$
13. Train $M_d$ on $D_{ba}$
14. Look up $U$ in $D_b$
15. If $U \notin D_b$ then return $A$
16. Apply $M_d$ to $T_r$
17. Endif
18. If $t_{ab} \in M_d$ then return $A$
19. Store $T_r$ in $D_s$
20. Endif
21. Write $L$
22. Make $T_r$

In Algorithm 1, Step-1 initializes the variables used. Steps-2-3 define the input and output, respectively. Steps-4-11 perform random undersampling to balance the dataset; within this loop, the high-dimensional dataset is also reduced to a low-dimensional map that retains most of the original information, and duplicated features are removed. Step-12 starts both exploratory data analysis and predictive modeling. Step-13 trains the model on the balanced dataset using logistic regression. Step-14 looks up the card owner in the database. Step-15 shows that if the lookup fails, an alarm is raised and passed to Algorithm 2; otherwise, the transaction is requested from the database. Steps-16-17 apply the proposed model to the given transaction to determine whether it is malicious or non-malicious. Step-18 checks whether the model flags the transaction as anomalous; if so, it sends an alarm to Algorithm 2. Otherwise, Steps-19-20 record the transaction in the dataset and end the process. Step-21 writes the log, and Step-22 carries out the transaction.
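The run-time part of Algorithm 1 (Steps-14-22) can be sketched as a small Python class. This assumes the model $M_d$ has already been trained (Steps-4-13) and is passed in as any callable returning a fraud probability; the class, field names, and 0.5 threshold are hypothetical, and an alert corresponds to handing over to Algorithm 2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DetectionResult:
    alert: bool   # True means: hand over to the prevention process (Algorithm 2)
    reason: str

@dataclass
class FraudDetector:
    """Steps 14-22 of Algorithm 1, assuming M_d was already trained (Steps 4-13)."""
    model: Callable[[List[float]], float]     # M_d: returns a fraud probability
    user_db: Dict[str, dict]                  # D_b: known card owners
    threshold: float = 0.5                    # hypothetical decision threshold
    dataset: List[List[float]] = field(default_factory=list)  # D_s: stored transactions
    log: List[str] = field(default_factory=list)               # L: audit log

    def process(self, user_id: str, transaction: List[float]) -> DetectionResult:
        if user_id not in self.user_db:                   # Steps 14-15: unknown user
            result = DetectionResult(True, "unknown user")
        elif self.model(transaction) >= self.threshold:   # Steps 16-18: model flags it
            result = DetectionResult(True, "anomalous transaction")
        else:                                             # Step 19: keep legitimate transaction
            self.dataset.append(transaction)
            result = DetectionResult(False, "ok")
        self.log.append(f"{user_id}: {result.reason}")    # Step 21: write log
        return result

# Toy model: amounts above 1000 look fraudulent.
detector = FraudDetector(model=lambda tx: 0.9 if tx[0] > 1000 else 0.1,
                         user_db={"alice": {}})
r1 = detector.process("alice", [25.0])
r2 = detector.process("alice", [5000.0])
r3 = detector.process("mallory", [10.0])
```

Only transactions that pass both checks proceed (Step-22), and every decision is logged, which gives the prevention step an audit trail.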

Undersampling removes information that is assumed to be irrelevant. Because the exact probability distribution of the data cannot be determined, it is best to first collect statistics on the datasets used in the experiment before dropping information through undersampling. The difference between two datasets must be quantified and its degree described. For this purpose, the Levenshtein distance can be used to quantify the degree of separation between two character strings (such as English words); it is the minimum number of single-character insertions, deletions, and substitutions required to turn one string into the other. The Levenshtein distance is computed as follows, where $\operatorname{tail}(x)$ is $x$ without its first character and $x[0]$ is its first character:

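\operatorname{lev}(x, y) = \begin{cases} |x| & \text{if } |y| = 0, \\ |y| & \text{if } |x| = 0, \\ \operatorname{lev}(\operatorname{tail}(x), \operatorname{tail}(y)) & \text{if } x[0] = y[0], \\ 1 + \min \left\{ \begin{array}{l} \operatorname{lev}(\operatorname{tail}(x), y) \\ \operatorname{lev}(x, \operatorname{tail}(y)) \\ \operatorname{lev}(\operatorname{tail}(x), \operatorname{tail}(y)) \end{array} \right. & \text{otherwise.} \end{cases}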

(6)
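Evaluated literally, the recursion in Eq. (6) takes exponential time. The standard dynamic-programming evaluation below gives the same distances in $O(|x| \cdot |y|)$ time; it is a generic sketch, not code from the paper, and works on prefixes instead of tails, which yields the same result.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance of Eq. (6), computed bottom-up instead of by plain recursion."""
    prev = list(range(len(y) + 1))        # distances from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        curr = [i]                         # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1])  # matching characters cost nothing
            else:
                curr.append(1 + min(prev[j],       # delete cx
                                    curr[j - 1],   # insert cy
                                    prev[j - 1]))  # substitute cx -> cy
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion). Only two rows are kept in memory, so space is $O(|y|)$.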

Another time-based characterization of difference is disagreement decay, defined as the probability that an entity changes the value of an attribute within time $\Delta t$. For an attribute $A$, it is denoted $d^{\neq}(A, \Delta t)$ and can be estimated statistically from the data as:

d^{\neq} (A, Δ t) = \frac{| f_{s \neq s, n, Δ t} |}{| f_{s, n, Δ t} |}

(7)

Here, $|\cdot|$ denotes the number of samples in a set. Conversely, agreement decay is the probability that an entity retains the same value of an attribute over time $\Delta t$.
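A simple empirical version of disagreement decay can be sketched as follows. This assumes one particular estimator, which is an illustrative choice rather than the paper’s definition: among all pairs of observations of the same entity taken at most $\Delta t$ apart, count the fraction whose attribute values differ. The cards and the billing-city attribute are hypothetical.

```python
from itertools import combinations

def disagreement_decay(observations, delta_t):
    """Fraction of same-entity observation pairs at most delta_t apart whose values differ."""
    changed = total = 0
    for records in observations.values():
        for (t1, v1), (t2, v2) in combinations(sorted(records), 2):
            if abs(t2 - t1) <= delta_t:
                total += 1
                changed += v1 != v2
    return changed / total if total else 0.0

# Hypothetical billing-city observations: entity -> [(time in days, value), ...]
observations = {
    "card1": [(0, "Almaty"), (5, "Almaty"), (40, "Dayton")],
    "card2": [(0, "Kuwait City"), (10, "Taif")],
}
```

With this data the estimate is 0.5 for $\Delta t = 10$ and 0.75 for $\Delta t = 40$: the longer the window, the more likely the value has changed. Under this estimator, agreement decay is simply one minus the result.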

3.1.2. t-Distributed Stochastic Neighbor Embedding

t-SNE reduces a high-dimensional dataset to a low-dimensional map that retains most of the structure of the original data. It does this by placing each data point on a two- or three-dimensional map. The method reveals clusters in the data, ensuring that the embedding preserves the meaningful neighborhood structure of the data. In this case, t-SNE reduces dimensionality while seeking to keep similar instances close together and dissimilar instances far apart.

Here, t-SNE computes probabilities $P_{j|i}$ for the high-dimensional data points $D_1, \dots, D_N$ that are proportional to the similarity of $D_i$ and $D_j$. For $i \neq j$, define:

P_{j|i} = \frac{\exp\left(-\lVert D_{i} - D_{j} \rVert^{2} / 2\omega_{i}^{2}\right)}{\sum_{k \neq i} \exp\left(-\lVert D_{i} - D_{k} \rVert^{2} / 2\omega_{i}^{2}\right)}

(8)

In addition, $P_{i|i} = 0$ and $\sum_{j} P_{j|i} = 1$ for all $i$.

The conditional probability $P_{j|i}$ describes the similarity of data point $D_j$ to data point $D_i$: it is the probability that $D_i$ would select $D_j$ as its neighbor if neighbors were selected in proportion to their probability density under a Gaussian centered at $D_i$. The joint probabilities are then defined as:

P_{i j} = \frac{P_{j | i} + P_{i | j}}{2 N}

(9)

Therefore, $P_{ij} = P_{ji}$, $P_{ii} = 0$, and $\sum_{i,j} P_{ij} = 1$, where $N$ is the number of points in the high-dimensional dataset and $\omega_i$ is the bandwidth of the Gaussian centered at $D_i$.
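Equations (8) and (9) translate directly into NumPy. This sketch computes only the high-dimensional affinities, not the full t-SNE embedding, and it assumes fixed bandwidths $\omega_i$ for illustration; real t-SNE tunes each $\omega_i$ by binary search to match a chosen perplexity (a complete implementation is available, for example, as scikit-learn’s `sklearn.manifold.TSNE`).

```python
import numpy as np

def conditional_affinities(D, omega):
    """Eq. (8): P[i, j] = P_{j|i} for points D (N x d) and bandwidths omega (N,)."""
    sq_dists = ((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dists / (2.0 * omega[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)              # enforces P_{i|i} = 0
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability before exp
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)        # each row sums to 1

def joint_affinities(P_cond):
    """Eq. (9): P_ij = (P_{j|i} + P_{i|j}) / 2N, symmetric with total mass 1."""
    N = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * N)

# Hypothetical data: 6 points in 3 dimensions, unit bandwidths.
D = np.random.default_rng(0).normal(size=(6, 3))
P_cond = conditional_affinities(D, np.ones(6))
P = joint_affinities(P_cond)
```

Symmetrizing the conditional probabilities guarantees $P_{ij} = P_{ji}$ and total mass 1, matching the properties stated above.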

3.1.3. Principal Component Analysis

PCA computes the principal components and uses them to perform a change of basis on the data, frequently using only the first few principal components and discarding the rest. PCA is used for both exploratory data analysis and predictive modeling. It is widely used to reduce dimensionality by projecting each data point onto only the first few principal components, obtaining lower-dimensional data while retaining as much of the variation in the data as possible. The first principal component is the direction that maximizes the variance of the projected data, and each subsequent component maximizes the variance of the projection while remaining orthogonal to the preceding components. Standardization is especially important before PCA because PCA is very sensitive to the variances of the initial variables: if the ranges of the initial variables differ significantly, variables with larger ranges dominate those with smaller ranges (for example, a variable ranging from 0 to 100 dominates a variable ranging from 0 to 1), leading to biased results. Transforming the data to comparable scales avoids this problem.

Standardization is performed by subtracting the mean from each value of a variable and dividing by the standard deviation.

Let $A_s$ be the matrix that represents the dataset, where each column is one of $n$ features and each row is one of $m$ examples, so that $A_s$ is an $m \times n$ matrix. The standardized data $\Delta\gamma$ for the dataset can then be defined as:

\Delta\gamma = \frac{A_{s} - \rho}{S_{d}}

(10)

where $\rho$ is the mean and $S_d$ is the standard deviation of each feature (column).
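The standardization step (subtract each feature’s mean, divide by its standard deviation) is a one-liner in NumPy. This sketch assumes the population standard deviation (NumPy’s default `ddof=0`) and leaves constant features at zero instead of dividing by zero; the two toy columns mirror the 0 to 100 versus 0 to 1 example in the text.

```python
import numpy as np

def standardize(A):
    """Z-score each column (feature) of an m x n matrix: (A - mean) / std."""
    rho = A.mean(axis=0)      # per-feature mean
    s_d = A.std(axis=0)       # per-feature standard deviation
    s_d[s_d == 0] = 1.0       # constant features stay at 0 rather than dividing by zero
    return (A - rho) / s_d

# One feature in [0, 100] and one in [0, 1]: after scaling they are identical.
A = np.array([[0.0, 0.0], [50.0, 0.5], [100.0, 1.0]])
Z = standardize(A)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the principal components.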

In matrix form, PCA can be expressed as:

Q = \begin{bmatrix} Q_{1} \\ \vdots \\ Q_{4} \end{bmatrix} = \begin{bmatrix} S_{1,1} & \cdots & S_{1,4} \\ \vdots & \ddots & \vdots \\ S_{4,1} & \cdots & S_{4,4} \end{bmatrix} \begin{bmatrix} P_{1} \\ \vdots \\ P_{4} \end{bmatrix}

(11)

where S is the transformation matrix, P is the vector of the original data, and Q is the vector of the principal components. The coefficients of the transformation matrix S are the eigenvectors that diagonalize the covariance matrix of the original variables.
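As a minimal sketch of how a transformation matrix S of this kind can be obtained, the NumPy code below takes the eigenvectors of the feature covariance matrix, sorts them by eigenvalue, and projects each standardized example P onto the top k of them. The helper `pca_transform` and the synthetic four-feature data (matching the 4 × 4 case of Eq. (11)) are assumptions for illustration; the paper does not state which PCA implementation was used.

```python
import numpy as np

def pca_transform(x: np.ndarray, k: int):
    """Project standardized data onto its top-k principal components.

    Rows of x are examples; columns are (already standardized) features.
    Returns (projected data Q, transformation matrix S with one eigenvector per row,
    fraction of total variance explained by each kept component).
    """
    cov = np.cov(x, rowvar=False)               # n x n covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # sort by explained variance, descending
    s = eigvecs[:, order[:k]].T                 # rows of S = top-k eigenvectors
    q = x @ s.T                                 # Q = S P, applied to every example (row)
    explained = eigvals[order[:k]] / eigvals.sum()
    return q, s, explained

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
# Four correlated features driven mostly by one latent factor plus small noise.
x = np.hstack([latent * w + 0.1 * rng.normal(size=(200, 1)) for w in (1.0, 2.0, -1.5, 0.5)])
x = (x - x.mean(axis=0)) / x.std(axis=0)
q, s, explained = pca_transform(x, k=2)
print(q.shape, explained.round(3))
```

Because the rows of S are orthonormal eigenvectors of the covariance matrix, the projected components are uncorrelated, and the first one captures almost all of the variance in this strongly correlated toy data.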

The accuracy A_{pca} of the principal components depends on the number of coefficients c_{v} available in each data point d_{p}:

A_{pca} = \frac{1}{c_{v}} \sum_{i=1}^{T_{c_{v}}} \left(d_{p} \times γ(g)\right)^{2}

(12)

where γ(g) denotes the properties of the data point and T_{c_{v}} is the total number of coefficients.

Figure 3a,b show the accuracy of PCA with 2 and 3 components. The accuracy is 97.76% with 2 components and 99.49% with 3 components.

3.1.4. Singular Value Decomposition

SVD is a matrix factorization method that extends the eigendecomposition of a square (n × n) matrix to any (n × m) matrix. The SVD is obtained as:

Z = S Σ T^{*}

(13)

where Z is the original matrix to be decomposed, S is the left singular matrix (its columns are the left singular vectors), Σ is the diagonal matrix of singular values, and T^{*} is the conjugate transpose of the right singular matrix T. SVD lowers the number of features in a dataset by reducing the dimension of the space from N to K, keeping only the K largest singular values. The input is organized as a matrix, with each row representing a user and each column representing an object. SVD decomposes this matrix into factor matrices that are easier to manipulate and analyze, which makes it a foundation for untangling data into distinct components.
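A truncated SVD along these lines can be sketched with `numpy.linalg.svd`, which returns the left singular vectors, the singular values in descending order, and the already conjugate-transposed right factor (T^{*} here). Keeping the first K singular triplets gives a K-dimensional representation of every row. The helper `truncated_svd` and the random matrix are hypothetical; this is an illustration, not the CCFDP implementation.

```python
import numpy as np

def truncated_svd(z: np.ndarray, k: int):
    """Reduce the feature dimension of z from N to k using Z = S Σ T*."""
    s, sigma, t_star = np.linalg.svd(z, full_matrices=False)  # sigma sorted descending
    reduced = s[:, :k] * sigma[:k]        # k-dimensional representation of each row (S_k Σ_k)
    approx = reduced @ t_star[:k, :]      # best rank-k reconstruction of z
    return reduced, approx

rng = np.random.default_rng(1)
z = rng.normal(size=(6, 4))
s, sigma, t_star = np.linalg.svd(z, full_matrices=False)
assert np.allclose(s @ np.diag(sigma) @ t_star, z)   # the full factorization recovers Z
reduced, approx = truncated_svd(z, k=2)
print(reduced.shape, np.linalg.norm(z - approx).round(4))
```

The reconstruction error of the rank-K approximation equals the root of the sum of the squared discarded singular values, which is why dropping the smallest singular values loses the least information.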

3.2. Fraud Prevention Process

Implementing a strategy to prevent fraudulent transactions boosts customer confidence. In the prevention process, the system sends a secret code to the user's telephone; if the code entered by the user does not match the code that was sent, the transaction is blocked and a secret question is sent. If the answer to the question is also wrong, the system blocks the user. Algorithm 2 describes the fraud prevention process.

Algorithm 2: Credit card fraud prevention process for transactions

1: Initialization: {D_s: Dataset, U: User account, A: Alert, B: Block, M: Message, L: Data in log, D_b: Database, T_r: Transaction, C: Secret code, T: Telephone, V: Verification, Q: Secret question}
2: Input: {A}
3: Output: {D_s}
4: Send C to T
5: If V = C then
6:     Continue
7: Else
8:     Block T_r
9:     Send Q to T
10:    If V = Q then
11:        Continue
12:    Else
13:        Block U
14:    Endif
15: Endif
16: Store T_r in D_s
17: Write L

In Algorithm 2, Step 1 initializes the variables, and Steps 2-3 define the input and output. Step 4 sends the secret code to the telephone. Steps 5-8 describe the code verification process, which ensures that the code has reached the legitimate user: a legitimate user can respond with the correct code to proceed, whereas a wrong code causes the transaction to be blocked. Step 9 initiates the second identity check by sending the secret question. Steps 10-12 ensure that, if the secret question is answered correctly, the user continues the transaction process. Step 13 specifies that, if the secret question is not answered correctly, the user account is blocked. Steps 16-17 store all transaction information securely in the database to improve the fraud detection model and write the entire procedure to the log.
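The control flow of Algorithm 2 can be sketched in Python as follows. This is a hypothetical illustration under stated assumptions: `fraud_prevention`, `send_to_phone`, and `read_reply` are invented names, the messaging and reply channels are stand-in callbacks rather than a real SMS gateway, and secrets are compared as plain strings only for brevity (a production system would store hashed secrets and compare them in constant time, for example with `hmac.compare_digest`). Comments map each line to the steps of Algorithm 2.

```python
from typing import Callable, List, Tuple

def fraud_prevention(transaction: str,
                     secret_code: str,
                     secret_answer: str,
                     send_to_phone: Callable[[str], None],
                     read_reply: Callable[[], str],
                     dataset: List[Tuple[str, str]],
                     log: List[str]) -> str:
    """Sketch of Algorithm 2; returns "approved" or "blocked_user"."""
    send_to_phone(f"Your verification code is {secret_code}")    # Step 4: send C to T
    if read_reply() == secret_code:                                # Step 5: V = C?
        status = "approved"                                        # Step 6: continue
    else:
        log.append(f"{transaction}: blocked, wrong code")          # Step 8: block T_r
        send_to_phone("Please answer your secret question")        # Step 9: send Q to T
        if read_reply() == secret_answer:                          # Step 10: V = Q?
            status = "approved"                                    # Step 11: continue
        else:
            status = "blocked_user"                                # Step 13: block U
    dataset.append((transaction, status))                          # Step 16: store T_r in D_s
    log.append(f"{transaction}: {status}")                         # Step 17: write L
    return status

# Usage: the user mistypes the code, then answers the secret question correctly.
messages, dataset, log = [], [], []
replies = iter(["0000", "blue"])
status = fraud_prevention("T1", "4821", "blue", messages.append,
                          lambda: next(replies), dataset, log)
print(status, len(messages))
```

Passing the phone channel and the reply source as callbacks keeps the decision logic testable without any telephony infrastructure.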

The FPP involves logistic regression learning (LRL), which quantifies the relationship between a categorical dependent variable and one or more independent variables by estimating probability scores for the dependent variable.

Logistic Regression Learning Modeling for FPP

The FPP applies the LRL model to prevent fraudulent transactions. LRL models the relationship between predictor variables and a categorical response variable; for example, logistic regression can be used to distinguish legitimate from illegitimate transactions. Given a set of variables, logistic regression estimates the probability of falling into a specific level of the categorical response. The first step is to calculate the log-odds:

ℓ = β_{0} + β_{1} x_{1} + β_{2} x_{2}

(14)

where x_{1} and x_{2} are the predictors and the coefficients β_{i} are the parameters of the model. The corresponding odds are calculated as:

o = b^{β_{0} + β_{1} x_{1} + β_{2} x_{2}}

(15)

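where b is the base of the logarithm and of the exponent (b = e for the natural logarithm). Converting the odds into a probability, P_{p} = o/(o + 1), gives the probability used for fraud prevention:

P_{p} = \frac{b^{β_{0} + β_{1} x_{1} + β_{2} x_{2}}}{b^{β_{0} + β_{1} x_{1} + β_{2} x_{2}} + 1}

(16)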
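To make the chain from log-odds to odds to probability concrete, the sketch below evaluates ℓ, o = b^ℓ, and o/(o + 1) for one transaction. The coefficients and predictor values are hypothetical, and `prevention_probability` is an illustrative helper rather than part of the CCFDP code; with b = e the result coincides with the logistic function of the linear score.

```python
import math

def prevention_probability(beta, x, base=math.e):
    """Log-odds ℓ = β0 + β1 x1 + β2 x2, odds o = b**ℓ, probability o / (o + 1)."""
    log_odds = beta[0] + sum(b_i * x_i for b_i, x_i in zip(beta[1:], x))
    odds = base ** log_odds
    return log_odds, odds, odds / (odds + 1.0)

# Hypothetical coefficients (β0, β1, β2) and predictor values (x1, x2).
l, o, p = prevention_probability(beta=(-3.0, 0.8, 1.5), x=(2.0, 1.0))
print(round(l, 3), round(o, 3), round(p, 3))
```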

When there is a single explanatory variable x, the logistic function can be written as:

p(x) = \frac{1}{1 + e^{-(β_{0} + β_{1} x)}}

(17)

Based on this probability function, the inverse of the logistic function, g, is the logit:

g(p(x)) = \operatorname{logit} p(x) = \ln\left(\frac{p(x)}{1 - p(x)}\right) = β_{0} + β_{1} x,

(18)

which, after exponentiation, takes the following form:

\frac{p(x)}{1 - p(x)} = e^{β_{0} + β_{1} x}

(19)

The odds of the dependent variable equaling a case, which serve as the link function between the probability and the linear regression expression, are defined by:

odds = e^{β_{0} + β_{1} x}

(20)

When the independent variable is continuous, the odds ratio for a one-unit increase in x is:

OR = \frac{e^{β_{0} + β_{1} (x + 1)}}{e^{β_{0} + β_{1} x}} = e^{β_{1}}

(21)
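A quick numerical check of the odds-ratio identity: for any x, increasing x by one multiplies the odds by e^{β_1}, independently of β_0 and of the starting value of x. The coefficient values are hypothetical.

```python
import math

def odds(beta0: float, beta1: float, x: float) -> float:
    """Odds e^(β0 + β1 x) for a single continuous predictor x."""
    return math.exp(beta0 + beta1 * x)

beta0, beta1 = -2.0, 0.7          # hypothetical coefficients
for x in (0.0, 1.0, 5.0):
    ratio = odds(beta0, beta1, x + 1) / odds(beta0, beta1, x)
    print(x, round(ratio, 6), round(math.exp(beta1), 6))   # the last two columns agree
```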

For multiple explanatory variables, the prediction can be defined as:

\log_{b} \frac{p}{1 - p} = β_{0} + β_{1} x_{1} + β_{2} x_{2} + \dots + β_{m} x_{m}

(22)

where m is the number of explanatory variables in the multiple regression, from which we obtain:

p = \frac{1}{1 + b^{-(β_{0} + β_{1} x_{1} + β_{2} x_{2} + \dots + β_{m} x_{m})}}

(23)

Having defined the logistic regression function itself, we now describe how the machine learning model is constructed. The generalized linear model function with parameter θ is:

h_{θ}(X) = \frac{1}{1 + e^{-θ^{T} X}} = Pr(Y = 1 | X; θ)

(24)

where X is the vector of independent variables and Y is a binary random variable that takes the value 1 or 0. From this, the conditional probability of the output given the input and the parameter θ is:

Pr(y | X; θ) = h_{θ}(X)^{y} (1 - h_{θ}(X))^{(1 - y)}

(25)

Assuming that all observations in the sample are independently Bernoulli distributed, the likelihood function is:

L(θ | x) = Pr(Y | X; θ) = \prod_{i} Pr(y_{i} | x_{i}; θ) = \prod_{i} h_{θ}(x_{i})^{y_{i}} (1 - h_{θ}(x_{i}))^{(1 - y_{i})}

(26)

The log-likelihood is typically maximized with a normalizing factor N^{-1}:

N^{-1} \log L(θ | x) = N^{-1} \sum_{i=1}^{N} \log Pr(y_{i} | x_{i}; θ)

(27)

This quantity is maximized with a gradient-based optimization method, for example gradient ascent, or equivalently gradient descent on the negative log-likelihood.
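The optimization can be sketched as plain NumPy gradient ascent on the mean log-likelihood (equivalently, gradient descent on its negative), whose gradient is N^{-1} Σ_i (y_i − h_θ(x_i)) x_i. This is an illustrative sketch on synthetic data: the learning rate, the number of epochs, and the helper names `fit_logistic` and `mean_log_likelihood` are assumptions, and the paper does not say which solver its LRL model used (scikit-learn's `LogisticRegression`, for instance, uses L-BFGS by default).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximize N^-1 * log L(θ|x) by gradient ascent (= gradient descent on its negative)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend an intercept column for θ0
    theta = np.zeros(Xb.shape[1])
    n = Xb.shape[0]
    for _ in range(epochs):
        h = sigmoid(Xb @ theta)                     # h_θ(x_i) = Pr(y_i = 1 | x_i; θ)
        grad = Xb.T @ (y - h) / n                   # gradient of the mean log-likelihood
        theta += lr * grad                          # ascent step
    return theta

def mean_log_likelihood(X, y, theta):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    h = np.clip(sigmoid(Xb @ theta), 1e-12, 1 - 1e-12)   # keep log() finite
    return np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
true_theta = np.array([-1.0, 2.0, -1.5])          # synthetic ground truth
y = (rng.random(500) < sigmoid(true_theta[0] + X @ true_theta[1:])).astype(float)
theta = fit_logistic(X, y)
print(theta.round(2), round(mean_log_likelihood(X, y, theta), 4))
```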

Assuming the (x, y) pairs are drawn independently from the underlying distribution, then in the limit of large N,

N^{-1} \sum_{i=1}^{N} \log Pr(y_{i} | x_{i}; θ) = -D_{KL}(Y \| Y_{θ}) - H(Y | X)

(28)

where H(Y | X) denotes the conditional entropy and D_{KL} denotes the Kullback-Leibler divergence. Therefore, maximizing a model's log-likelihood minimizes its KL divergence from the maximal entropy distribution; intuitively, this searches for the model that makes the fewest assumptions in its parameters.

The transaction request is initiated when the user purchases goods or pays for services, and the logistic regression approach is used to build our prediction model. To train the model, we use the creditcard.csv dataset and user information from the database. When new transactions are fed into the algorithm, the model predicts whether the requested transaction is valid or exhibits anomalous behavior. If the model predicts a genuine transaction, the transaction is completed and the money is retained. If the transaction is irregular, the system sends a verification code to the user's cell phone. If the user enters the correct code, the transaction is completed and saved in the dataset; if the user enters an incorrect code, the system halts and aborts the transaction and records it in the dataset.

Theorem 1.

If the fraudulent transaction initiating rate is σ and the fraud attempt capability cσ is not more than c − k, the probability P_{rd} of fraud detection using the proposed CCFDP method can be specified as:

P_{rd} = \frac{(c(1 - σ))!\,(c - k)!}{c!\,(c(1 - σ) - k)!}

(29)

Proof.

Suppose the fraud attempting capability cσ is smaller than c − k; then c(1 − σ), the number of legitimate transactions in the proposed CCFDP, is larger than k. A suitable detection process is obtained by choosing k of the c(1 − σ) legitimate transactions, so the number of such detection processes is D_{c(1 - σ)}^{k}, where D_{n}^{k} = n!/(k!(n − k)!) denotes the number of k-combinations of n items. The total number of possible detection processes is D_{c}^{k}. Therefore, the likelihood of accurate fraud detection can be measured as:

P_{rd} = \frac{D_{c(1 - σ)}^{k}}{D_{c}^{k}} = \frac{(c(1 - σ))!\,(c - k)!}{c!\,(c(1 - σ) - k)!}

(30)

□
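The closed form of Theorem 1 can be checked numerically: reading D_n^k as the number of k-combinations of n items, Eq. (29) equals C(c(1 − σ), k)/C(c, k). The sketch below assumes that c(1 − σ) is an integer and uses the hypothetical values c = 100, σ = 0.05, and k = 10; the helper names are illustrative.

```python
from math import comb, factorial

def p_detect_factorial(c: int, sigma: float, k: int) -> float:
    """Eq. (29): (c(1-σ))! (c-k)! / (c! (c(1-σ)-k)!)."""
    legit = round(c * (1 - sigma))              # c(1-σ), assumed to be an integer count
    if legit < k:
        raise ValueError("Theorem 1 requires c*sigma <= c - k, i.e. c(1-σ) >= k")
    return factorial(legit) * factorial(c - k) / (factorial(c) * factorial(legit - k))

def p_detect_comb(c: int, sigma: float, k: int) -> float:
    """Same quantity as a ratio of k-combinations: C(c(1-σ), k) / C(c, k)."""
    legit = round(c * (1 - sigma))
    return comb(legit, k) / comb(c, k)

print(p_detect_factorial(100, 0.05, 10), p_detect_comb(100, 0.05, 10))
```

Python integers have arbitrary precision, so the factorial form is exact until the final division; the combination form is simply a more compact way to write the same ratio.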

4. Implementation and Experimental Results

This section covers the implementation and results.

4.1. Implementation

The materials used during implementation are listed in Table 2. The platform was developed on Ubuntu 18.04 using the Visual Studio IDE and Python 3.4.6. We rented clusters with 0.1-s uptime, 16 GB RAM, 256 GB ROM, and Intel Core i7 processors running at 2.4 GHz. MySQL 8.0.31 is the database management system.

Figure 4 illustrates a typical case of credit card fraud. A thief who steals the credit card tries to use the card credentials to make electronic purchases and pay for them over the Internet. Each purchase constitutes one transaction. The payments may be quite small, which is one of the characteristics of fraudulent behavior, because the thief does not know how much money can be used. The proposed CCFDP model then analyzes every transaction made with this card and compares it with the owner's previous behavior. When the model detects fraudulent activity, the system sends a verification code to the owner's phone. If the thief fails to verify the code, the transaction is aborted and the system tries to verify the cardholder through the secret question; if the answer is also wrong, the credit card is blocked until the card owner unlocks it. In addition, the system sends an alert message to the administrator.

4.2. Dataset

To train our model, we used the Kaggle dataset “creditcard.csv”. This file consists of 31 features and contains credit card transactions from September 2013. The dataset is highly unbalanced: frauds, the positive class, account for 0.172% of all transactions. It contains only numeric input variables that have undergone a principal component analysis (PCA) transformation. Due to confidentiality concerns, the original features and additional background information about the data are not available. The principal components obtained with PCA are the features F = {F_{1}, F_{2}, …, F_{28}} (V1–V28 in the file). Time and Amount are the only features that have not been transformed by PCA. The feature “Time” contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature “Amount” is the transaction amount, which can be used for example-dependent, cost-sensitive learning. The response variable, feature “Class,” takes the value 1 in cases of fraud and 0 otherwise. Given the class imbalance ratio, we suggest measuring accuracy using the Area Under the Precision-Recall Curve (AUPRC), because confusion-matrix accuracy is not meaningful for unbalanced classification.
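The point about accuracy under class imbalance can be illustrated on a synthetic label vector with roughly the creditcard.csv fraud rate: a trivial model that labels every transaction as legitimate reaches 99.83% accuracy while detecting nothing, whereas average precision (a standard estimate of the area under the precision-recall curve) separates uninformative from informative scores. The data and scorers below are synthetic and hypothetical; scikit-learn's `average_precision_score` is a library equivalent of the hand-written `average_precision` helper.

```python
import numpy as np

def average_precision(y_true, scores):
    """Area under the precision-recall curve, computed as average precision (ties not handled)."""
    order = np.argsort(-scores)                  # rank transactions by fraud score, highest first
    y = y_true[order]
    tp = np.cumsum(y)                            # frauds found within the top-k ranked transactions
    precision = tp / np.arange(1, len(y) + 1)    # precision at every cut-off k
    return precision[y == 1].sum() / y.sum()     # mean precision at the rank of each fraud

rng = np.random.default_rng(0)
n, n_fraud = 10_000, 17                          # about the 0.172% fraud rate of creditcard.csv
y_true = np.zeros(n)
y_true[rng.choice(n, n_fraud, replace=False)] = 1

always_legit = np.zeros(n)                       # trivial "model": every transaction is legitimate
accuracy = np.mean(always_legit == y_true)       # 0.9983 despite catching no fraud
random_scores = rng.random(n)                    # uninformative scorer
informative_scores = 0.9 * y_true + rng.random(n)  # hypothetical scorer that ranks most frauds first
print(accuracy,
      average_precision(y_true, random_scores),
      average_precision(y_true, informative_scores))
```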

4.3. Experimental Results

Based on this dataset, the following results were obtained using the proposed CCFDP. The Python Plotly library is used to draw the graphical results.

▪ Transaction amount distribution
▪ Normal vs. fraudulent transactions and transaction time
▪ Equally distributed classes (legitimate vs. fraudulent transactions)
▪ Detection of balanced and imbalanced correlation matrices
▪ Dimensionality feature reduction
▪ Validation rate
▪ Accuracy

4.3.1. Transaction Amount Distribution

Figure 5 shows the imbalanced distribution of the data in the dataset, which was the most important problem to solve in order to detect fraudulent activity. Because fraudulent cases are very few, an algorithm can wrongly conclude that any transaction requested from the database is normal, which in reality is not the case.

4.3.2. Normal vs. Fraudulent Transaction and Transaction Time

Figure 6a,b show the distribution of transaction amount and distribution of transaction time, respectively.

4.3.3. Equally Distributed Class (Legitimate vs. Fraudulent Transactions)

Figure 7 shows the balanced dataset that is used to train the model. To obtain this dataset, we applied the random undersampling method, which selects examples from the majority class at random and removes them from the training dataset until a more balanced distribution is reached. The main idea of this experiment is to randomly select the same number of legitimate transactions as there are fraudulent transactions. After selecting them, we create a new data frame from this information.
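Random undersampling as described here can be sketched with NumPy: keep every fraudulent row, draw an equal number of legitimate rows without replacement, and shuffle the result. The function `random_undersample` and the toy data are hypothetical; with a pandas data frame, `DataFrame.sample` would serve the same purpose.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep every fraud (y == 1) and an equal-sized random subset of legitimate rows (y == 0)."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    legit_idx = np.flatnonzero(y == 0)
    kept_legit = rng.choice(legit_idx, size=len(fraud_idx), replace=False)  # no duplicates
    idx = rng.permutation(np.concatenate([fraud_idx, kept_legit]))          # shuffle the new frame
    return X[idx], y[idx]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:20] = 1                                        # 2% fraud proportion in this toy example
X_bal, y_bal = random_undersample(X, y)
print(X_bal.shape, np.bincount(y_bal))
```

The trade-off is that most legitimate rows are discarded, which is why the balanced frame is small compared with the original dataset.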

4.3.4. Detection of Balanced and Imbalanced Correlation Matrix

Figure 8a,b show the imbalanced and balanced correlation matrices, respectively. This experiment aims to examine the relationships among all features of the dataset. As Figure 8a shows, the correlation coefficients in the imbalanced data are barely noticeable. In contrast, Figure 8b shows that the correlations in the balanced subsample are much more noticeable, which helps to identify outliers and remove redundant data.

4.3.5. Dimensionality Feature Reduction

Figure 9a–c depict the distribution of transactions in the reduced coordinate space, with the transactions clustered according to their classification results. This experiment employs the t-SNE, PCA, and SVD methods for dimensionality feature reduction. Its main goal is to remove unnecessary features, which reduces complexity. Reducing dimensionality also requires less storage space and therefore less computation time, and removing misleading data can improve model accuracy.

4.3.6. Validation Rate

Figure 10a–d show the training and cross-validation curves for four widely used prediction classifiers. Logistic Regression produces the best results because the difference between its cross-validation and training scores is the smallest, so we chose this algorithm to build our prediction model. Logistic Regression achieves a cross-validation rate of 93.35% and a training score of 94.31%, a difference of 0.96 percentage points. The other algorithms show larger differences: the k-nearest neighbors algorithm achieves 92.84% cross-validation and a 94.26% training score (1.42 points), the support vector learning algorithm achieves 93.76% cross-validation and a 96.92% training score (3.16 points), and the decision tree classifier achieves 91.98% cross-validation and a 94.82% training score (2.84 points). The support vector learning algorithm therefore has the largest difference between cross-validation and training scores.
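The selection rule (the smallest gap between training and cross-validation scores) can be reproduced from the reported numbers with a few lines of arithmetic. Only the scores come from the text; the dictionary layout and names are illustrative.

```python
# Training and cross-validation scores (%) as reported in Figure 10a–d.
scores = {
    "Logistic Regression": (94.31, 93.35),
    "k-nearest neighbors": (94.26, 92.84),
    "Support vector classifier": (96.92, 93.76),
    "Decision tree": (94.82, 91.98),
}
gaps = {name: round(train - cv, 2) for name, (train, cv) in scores.items()}
best = min(gaps, key=gaps.get)      # smallest train/CV gap = least overfitting
print(gaps)
print("selected:", best)
```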

4.3.7. Accuracy

The previous experiments reduced the dimensionality of the dataset and examined its balanced and imbalanced correlations; building on them, we determine the accuracy of the proposed CCFDP method and how well its integrated algorithms/models distinguish normal from fraudulent transactions. The proposed CCFDP method is assessed and compared with current state-of-the-art approaches (CATCHM [27], LSTM-RNN [28], CSLMLE [29], and CCFDM [30]). The proposed CCFDP method inherits the strengths of different models (RU, t-SNE, PCA, SVD, and LRL), and their integration yields higher accuracy than the competing approaches. Figure 11a depicts the results for a 1% fraud proportion: the proposed CCFDP obtained 99.92% fraud detection accuracy, whereas the contending methods obtained fraud detection accuracy from 89.95% to 99.82%. In Figure 11b, the fraud proportion is increased to 2%. The results demonstrate that increasing the fraud proportion affects both the proposed and the contending methods; however, the proposed CCFDP is only slightly affected, whereas the contending methods are affected more. The CCFDP achieves 99.89% fraud detection accuracy, and the competing methods achieve fraud detection accuracy ranging from 98.81% to 99.51%. Figure 11c depicts the outcome for a significantly higher fraud proportion of 5%. This increase has only a minor impact on the proposed CCFDP, which achieves 99.72% fraud detection accuracy, still acceptable in practice. In contrast, the fraud detection accuracy of the competing methods drops significantly, to between 97% and 98%. During the experiments, we observed that an increase in fraud proportion has a similar effect on all the competing methods.
Figure 11d depicts the proposed CCFDP's fraud prevention accuracy at fraud proportions ranging from 1% to 5%. The fraud prevention accuracy is significantly improved and acceptable for real-life use, and increasing the fraud proportion does not have a significant negative effect on it.

4.4. Statistical Performance Metrics

In statistical inference, model selection is typically performed using statistical performance metrics: among a number of models trained with various sets of parameters and hyper-parameters, the one that performs best on the chosen performance criterion is selected.

4.4.1. Root Mean Square Error

RMSE is the square root of the average squared deviation of the predictions from the actual values:

R_{sr} = \sqrt{\frac{\sum_{i} (A_{v,i} - P_{v,i})^{2}}{d_{p}}}

(31)

where R_{sr} is the root mean square error, A_{v,i} are the actual values, P_{v,i} are the predicted values, i indexes the data points, and d_{p} is the number of data points.
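A minimal NumPy version of Eq. (31), with hypothetical actual and predicted values:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error over d_p data points."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

print(rmse([1, 0, 1, 1], [0.9, 0.2, 0.6, 1.0]))
```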

4.4.2. Relative Root Mean Squared Error

Relative root mean squared error (RRMSE) is the RMSE normalized by the mean of the actual values and is frequently expressed as a percentage; smaller RRMSE values are preferred. The RRMSE R_{γ} can be calculated as:

R_{γ} = \frac{R_{sr}}{\bar{A}_{v}}, \quad \bar{A}_{v} = \frac{1}{d_{p}} \sum_{i} A_{v,i}

(32)
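A sketch of RRMSE that follows the verbal definition above, dividing the RMSE by the mean of the actual values and multiplying by 100 to express it as a percentage (the `as_percent` flag is an illustrative choice). The measure is undefined when the actual values average to zero. The values are hypothetical.

```python
import numpy as np

def rrmse(actual, predicted, as_percent=True):
    """RMSE normalized by the mean of the actual values (undefined when that mean is 0)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    value = rmse / np.mean(actual)
    return 100.0 * value if as_percent else value

print(rrmse([100, 120, 80, 100], [110, 115, 85, 90]))
```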

4.4.3. Mean Bias Error

The Mean Bias Error (MBE) is typically not used as a measure of model error, because large individual prediction errors of opposite sign can cancel out and still produce a low MBE. Instead, the MBE is used principally to estimate the average bias in the model's predictions, in order to decide whether any action is needed to correct that bias. A positive bias means that the model overestimates the data, and a negative bias means that it underestimates them. The MBE M_{be} can be measured as:

M_{be} = \frac{1}{d_{p}} \sum_{i} (P_{v,i} - A_{v,i})

(33)
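The following sketch computes the MBE and illustrates why it is not used as a measure of overall error: with hypothetical values where every prediction misses by 5 in alternating directions, the biases cancel and the MBE is zero.

```python
import numpy as np

def mean_bias_error(actual, predicted):
    """Average of (P_v - A_v): positive means over-estimation, negative means under-estimation."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(predicted - actual)

# Every prediction is off by 5, yet the opposite-sign errors cancel.
print(mean_bias_error([10, 10, 10, 10], [15, 5, 15, 5]))
```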

4.4.4. Mean Directional Accuracy

Mean directional accuracy (MDA) measures the probability that the prediction model identifies the correct direction of a time series. This metric is frequently employed in economics and macroeconomics studies. The MDA A_{md} can be determined as:

A_{md} = \frac{1}{d_{p}} \sum_{t} μ\left(\operatorname{sign}(A_{v(t)} - A_{v(t-1)}) = \operatorname{sign}(P_{v(t)} - A_{v(t-1)})\right)

(34)

where P_{v(t)} and A_{v(t)} are the predicted and actual values at time t, μ is the indicator function, and sign is the sign function.
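A NumPy sketch of MDA on a short hypothetical series. At every step that has a predecessor, it compares the direction of the actual change with the direction implied by the prediction and averages the indicator over those d_p − 1 steps; Eq. (34) divides by d_p instead, which differs only slightly for long series.

```python
import numpy as np

def mean_directional_accuracy(actual, predicted):
    """Share of steps t where sign(A_t - A_{t-1}) equals sign(P_t - A_{t-1})."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    true_dir = np.sign(actual[1:] - actual[:-1])
    pred_dir = np.sign(predicted[1:] - actual[:-1])
    return np.mean(true_dir == pred_dir)          # indicator averaged over the d_p - 1 steps

actual = [10, 12, 11, 13, 13]
predicted = [10, 11, 12, 14, 12]
print(mean_directional_accuracy(actual, predicted))
```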

The statistical performance metrics (MDA, MBE, RRMSE, and RMSE) are used to evaluate the performance of the proposed CCFDP, which is compared with the contending approaches CATCHM [27], LSTM-RNN [28], CSLMLE [29], CCFDM [30], the enhanced secure deep learning (ESDL) algorithm [31], intelligent two-level credit card fraud detection (ITCCFD) [32], and Bridge to Graph (BTG) [33]. The experimental results, shown in Figure 12a–d and Table 3, indicate that the proposed CCFDP performs better than the contending approaches.

5. Discussion of Results

LRL is integrated into the proposed CCFDP, which considerably enhances detection accuracy. To solve the unbalanced dataset problem, the Random Undersampling approach is employed to build a new balanced data frame. We observed that Random Undersampling uses only authentic data and helps to resolve the imbalance. During implementation, fraud proportions of 1%, 2%, and 5% were used, and the same number of legitimate transactions was selected at random. As shown in Table 4, the CCFDP method performs better than other current models in terms of accuracy. All of the approaches considered are trained using the same training data and fraud rates. The results demonstrate that the proposed CCFDP detects fraud cases more accurately regardless of the fraud rate in the dataset, and it outperforms all other methods on diverse sample sets. Moreover, the proposed CCFDP can prevent fraud, whereas the contending methods cannot, because they are designed only for fraud detection. The results shown in Table 5 validate the fraud prevention capability of the proposed method. The performance of the proposed CCFDP is also better than that of the competing methods (CATCHM, LSTM-RNN, CSLMLE, CCFDM, ESDL, ITCCFD, and BTG) in terms of the statistical performance metrics (MDA, MBE, RRMSE, and RMSE).

The main reason for the better accuracy is the integration of different modern techniques (RU, t-SNE, PCA, LRL, and SVD). These techniques speed up data training and increase accuracy, and together they help to detect fraud successfully. PCA is employed to obtain lower-dimensional data while retaining as much of the variation in the data as possible; it supports exploratory data analysis and predictive modeling and reduces dimensionality by projecting each data point onto only the first few principal components. To further improve accuracy, t-SNE is used to reduce dimensionality while keeping similar instances close together and dissimilar instances apart. LRL is used to assess the success and failure probability of CNP fraud by modeling how the predictor factors relate to legitimate and illegitimate transactions. However, integrating several techniques can increase complexity. This issue is addressed through the dimensionality reduction process, which reduces the number of variables in the training data. When working with high-dimensional data, it can be useful to reduce dimensionality by projecting the data onto a lower-dimensional subspace that captures the core of the data. The main challenge encountered was integrating the different techniques.

6. Conclusions

Fraudulent activity has increased massively due to e-banking transactions, placing a burden on fraud control systems. In this research, we present the CCFDP method for credit card fraud detection and prevention. The proposed method involves the FDP and the FPP. The FDP module uses four methods: RU, t-SNE, PCA, and SVD. The FPP uses logistic regression learning. Random undersampling is employed to increase detection accuracy by balancing the number of fraudulent samples with authentic ones. Tests were conducted with fraud proportions of 1%, 2%, and 5% to demonstrate the efficiency of the proposed CCFDP, and the results confirm that the proposed method achieves higher fraud detection accuracy. The accuracy of the proposed CCFDP was compared with that of state-of-the-art methods (CATCHM, LSTM-RNN, CSLMLE, CCFDM, ESDL, ITCCFD, and BTG), and the proposed CCFDP outperforms them. Moreover, unlike its counterparts, the proposed CCFDP can also prevent fraud, and its fraud prevention accuracy confirms the suitability of the proposed method.

In the future, we will assess the proposed CCFDP’s time and space complexity, as well as other quality-of-service factors.

Author Contributions

A.R., conceptualization, writing, idea proposal, software development, methodology, review, manuscript preparation, visualization, results and submission; M.B.H.F., data curation, software development, and preparation; M.A. (Muder Almiani), F.A. and N.Z.J., review, manuscript preparation, and visualization; G.B., A.A., M.A. (Majid Alshammari) and S.A., review, data curation and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Taif University Researchers Supporting Project number (TURSP-2020/302), Taif University, Taif, Saudi Arabia.

Data Availability Statement

The data that support the findings of this research are publicly available as indicated in the reference. Additionally, the links are given as below: https://datahub.io/machine-learning/creditcard#data; https://github.com/nsethi31/Kaggle-Data-Credit-Card-Fraud-Detection; https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud.

Acknowledgments

The authors gratefully acknowledge the support of Mohsin Ali for providing insights.

Conflicts of Interest

The authors declare no conflict of interest.

References

Oad, A.; Razaque, A.; Tolemyssov, A.; Alotaibi, M.; Alotaibi, B.; Zhao, C. Blockchain-Enabled Transaction Scanning Method for Money Laundering Detection. Electronics 2021, 10, 1766. [Google Scholar] [CrossRef]
Razaque, A.; Al Ajlan, A.; Melaoune, N.; Alotaibi, M.; Alotaibi, B.; Dias, I.; Oad, A.; Hariri, S.; Zhao, C. Avoidance of Cybersecurity Threats with the Deployment of a Web-Based Blockchain-Enabled Cybersecurity Awareness System. Appl. Sci. 2021, 11, 7880. [Google Scholar] [CrossRef]
Höppner, S.; Baesens, B.; Verbeke, W.; Verdonck, T. Instance-dependent cost-sensitive learning for detecting transfer fraud. Eur. J. Oper. Res. 2022, 297, 291–300. [Google Scholar] [CrossRef]
Lucas, Y.; Portier, P.-E.; Laporte, L.; He-Guelton, L.; Caelen, O.; Granitzer, M.; Calabretto, S. Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs. Futur. Gener. Comput. Syst. 2019, 102, 393–402. [Google Scholar] [CrossRef]
Trivedi, N.K.; Simaiya, S.; Lilhore, U.K.; Sharma, S.K. An efficient credit card fraud detection model based on machine learning methods. Int. J. Adv. Sci. Technol. 2020, 29, 3414–3424. [Google Scholar]
Hazım, L.R. Four Classification Methods Naïve Bayesian, Support Vector Machine, K-Nearest Neighbors and Random Forest Are Tested for Credit Card Fraud Detection. Master’s Thesis, Altınbaş Üniversitesi, Istanbul, Turkey, 2018. [Google Scholar]
Fu, K.; Cheng, D.; Tu, Y.; Zhang, L. Credit card fraud detection using convolutional neural networks. In International Conference on Neural Information Processing; Springer: Cham, Switzerland, 2016; pp. 483–490. [Google Scholar]
Zheng, F.; Yan, Q.; Leung, V.C.; Yu, F.R.; Ming, Z. HDP-CNN: Highway deep pyramid convolution neural network combining word-level and character-level representations for phishing website detection. Comput. Secur. 2021, 114, 102584. [Google Scholar] [CrossRef]
Fang, J.; Xie, Z.; Cheng, H.; Fan, B.; Xu, H.; Li, P. Anomaly detection of diabetes data based on hierarchical clustering and CNN. Procedia Comput. Sci. 2022, 199, 71–78. [Google Scholar] [CrossRef]
AbdElminaam, D.S.; ElMasry, N.; Talaat, Y.; Adel, M.; Hisham, A.; Atef, K.; Mohamed, A.; Akram, M. HR-chat bot: Designing and building effective interview chat-bots for fake CV detection. In Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 26–27 May 2021; pp. 403–408. [Google Scholar]
Almiani, M.; AbuGhazleh, A.; Al-Rahayfeh, A.; Atiewi, S.; Razaque, A. Deep recurrent neural network for IoT intrusion detection system. Simul. Model. Pract. Theory 2019, 101, 102031. [Google Scholar] [CrossRef]
Razaque, A.; Alotaibi, B.; Alotaibi, M.; Hussain, S.; Alotaibi, A.; Jotsov, V. Clickbait Detection Using Deep Recurrent Neural Network. Appl. Sci. 2022, 12, 504. [Google Scholar] [CrossRef]
Vorobyev, I.; Krivitskaya, A. Reducing False Positives in Bank Anti-fraud Systems Based on Rule Induction in Distributed Tree-based Models. Comput. Secur. 2022, 120, 102786. [Google Scholar] [CrossRef]
Popat; Rimpal, R.; Chaudhary, J. A survey on credit card fraud detection using machine learning. In Proceedings of the 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 11–12 May 2018; pp. 1120–1125. [Google Scholar]
Porwal, U.; Mukund, S. Credit card fraud detection in e-commerce. In Proceedings of the 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), Rotorua, New Zealand, 5–8 August 2019; pp. 280–287. [Google Scholar]
Carcillo, F.; Le Borgne, Y.A.; Caelen, O.; Kessaci, Y.; Oblé, F.; Bontempi, G. Combining unsupervised and supervised learning in credit card fraud detection. Inf. Sci. 2021, 557, 317–331. [Google Scholar] [CrossRef]
Itoo, F.; Singh, S. Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection. Int. J. Inf. Technol. 2021, 13, 1503–1511. [Google Scholar] [CrossRef]
Staar, B.; Lütjen, M.; Freitag, M. Anomaly detection with convolutional neural networks for industrial surface inspection. Procedia CIRP 2019, 79, 484–489. [Google Scholar] [CrossRef]
Park, P.; Di Marco, P.; Shin, H.; Bang, J. Fault Detection and Diagnosis Using Combined Autoencoder and Long Short-Term Memory Network. Sensors 2019, 19, 4612. [Google Scholar] [CrossRef] [Green Version]
Balagolla, E.M.S.W.; Fernando, W.P.C.; Rathnayake, R.M.N.S.; Wijesekera, M.J.M.R.P.; Senarathne, A.N.; Abeywardhana, K.Y. Credit card fraud prevention using blockchain. In Proceedings of the 2021 6th International Conference for Convergence in Technology (I2CT), Maharashtra, India, 2–4 April 2021. pp. 1–8.
Fiore, U.; De Santis, A.; Perla, F.; Zanetti, P.; Palmieri, F. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Inf. Sci. 2019, 479, 448–455.
Du, J.; Han, G.; Lin, C.; Martinez-Garcia, M. ITrust: An anomaly-resilient trust model based on isolation forest for underwater acoustic sensor networks. IEEE Trans. Mob. Comput. 2020, 21, 1684–1696.
Zhang, X.; Han, Y.; Xu, W.; Wang, Q. HOBA: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture. Inf. Sci. 2019, 557, 302–316.
West, J.; Bhattacharya, M. Intelligent financial fraud detection: A comprehensive review. Comput. Secur. 2016, 57, 47–66.
Razaque, A.; Abenova, M.; Alotaibi, M.; Alotaibi, B.; Alshammari, H.; Hariri, S.; Alotaibi, A. Anomaly detection paradigm for multivariate time series data mining for healthcare. Appl. Sci. 2022, 12, 8902.
Ghosh, S.; Das, A. Wetland conversion risk assessment of East Kolkata Wetland: A Ramsar site using random forest and support vector machine model. J. Clean. Prod. 2020, 275, 123475.
Van Belle, R.; Baesens, B.; De Weerdt, J. CATCHM: A novel network-based credit card fraud detection method using node representation learning. Decis. Support Syst. 2022, 164, 113866.
Roseline, J.F.; Naidu, G.B.S.R.; Samuthira Pandi, V.; Alamelu, S.; Mageswari, N. Autonomous credit card fraud detection using machine learning approach. Comput. Electr. Eng. 2022, 102, 108132.
Olowookere, T.; Adewale, O.S. A framework for detecting credit card fraud with cost-sensitive meta-learning ensemble approach. Sci. Afr. 2020, 8, e00464.
Asha, R.B.; Kumar, S.K.R. Credit card fraud detection using artificial neural network. Glob. Transit. Proc. 2021, 2, 35–41.
Sanober, S.; Alam, I.; Pande, S.; Arslan, F.; Pitambar Rane, K.; Singh, B.K.; Khamparia, A.; Shabaz, M. An enhanced secure deep learning algorithm for fraud detection in wireless communication. Wirel. Commun. Mob. Comput. 2021, 2021, 6079582.
Darwish, S.M. An intelligent credit card fraud detection approach based on semantic fusion of two classifiers. Soft Comput. 2020, 24, 1243–1253.
Hu, X.; Chen, H.; Liu, S.; Jiang, H.; Chu, G.; Li, R. BTG: A Bridge to Graph machine learning in telecommunications fraud detection. Future Gener. Comput. Syst. 2022, 137, 274–287.

Figure 1. Credit card-not-present fraud transaction process.

Figure 2. Credit card fraud detection and prevention mechanism.

Figure 3. (a) Accuracy of 97.76% with two components; (b) accuracy of 99.49% with three components.
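Figure 3 relates accuracy to the number of retained components. A minimal sketch of how such a comparison might be run with scikit-learn, assuming a pipeline of standardization, PCA, and logistic regression evaluated with 5-fold cross-validation; the synthetic data from `make_classification` are a stand-in for the transaction features, and `accuracy_by_components` is a hypothetical helper, so the scores will not reproduce the 97.76% and 99.49% shown in the figure.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def accuracy_by_components(X, y, component_counts=(2, 3), folds=5):
    """Mean cross-validated accuracy of PCA + logistic regression for each k."""
    scores = {}
    for k in component_counts:
        # Scale features, project onto the first k principal components, then classify.
        model = make_pipeline(
            StandardScaler(),
            PCA(n_components=k),
            LogisticRegression(max_iter=1000),
        )
        scores[k] = cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
    return scores


# Synthetic, balanced stand-in for the (undersampled) transaction data (30 features).
X, y = make_classification(n_samples=1000, n_features=30, n_informative=5,
                           random_state=0)
scores = accuracy_by_components(X, y)
for k, acc in scores.items():
    print(f"{k} components: mean accuracy {acc:.4f}")
```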

Figure 4. The fraudulent transaction process.

Figure 5. An imbalanced dataset.

Figure 6. (a) Legitimate vs. fraudulent transactions; (b) transactions and transaction time. Panel (a) depicts both legitimate and fraudulent transactions: a maximum of 25,000 transactions were processed, with a maximum observed fraudulent transaction rate of 0.0178%. Panel (b) depicts 18,000 transactions and the time taken by each; a transaction took at most 0.000010 s to complete. The increase in transaction time is attributed to an attack on the transactions during that period.

Figure 7. The balanced dataset using the random undersampling method.
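Random undersampling, used to obtain the balanced dataset in Figure 7, keeps every fraudulent record and draws an equally sized random subset of the legitimate records. A minimal pure-Python sketch, assuming records are (features, label) pairs with label 1 for fraud; `random_undersample` is a hypothetical helper rather than the authors' implementation (libraries such as imbalanced-learn provide an equivalent `RandomUnderSampler`).

```python
import random


def random_undersample(records, minority_label=1, seed=42):
    """Return a class-balanced, shuffled copy of `records`.

    records: list of (features, label) tuples.
    All minority-class records are kept; the other records are sampled
    without replacement down to the same count.
    """
    rng = random.Random(seed)
    minority = [r for r in records if r[1] == minority_label]
    majority = [r for r in records if r[1] != minority_label]
    if len(majority) < len(minority):
        raise ValueError("majority class is smaller than the minority class")
    kept_majority = rng.sample(majority, len(minority))
    balanced = minority + kept_majority
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced data: 1,000 legitimate (0) and 10 fraudulent (1) transactions.
data = [((i, i * 0.5), 0) for i in range(1000)] + [((i, -i), 1) for i in range(10)]
balanced = random_undersample(data)
print(len(balanced), sum(label for _, label in balanced))  # 20 10
```

Undersampling discards most legitimate records, which speeds up training at the cost of information from the majority class.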

Figure 8. (a) Convolutional correlation matrix of the imbalanced dataset; (b) convolutional correlation matrix of the balanced subsample.
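Figure 8 compares correlation matrices computed before and after undersampling. As a sketch of the underlying computation, the following builds a Pearson correlation matrix in plain Python; the plain Pearson coefficient is an assumption about the measure plotted, and the column names `V1`, `V2`, and `Class` with their values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations (symmetric by construction)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns: two features and the class label of a small subsample.
cols = {
    "V1": [1.0, 2.0, 3.0, 4.0],
    "V2": [4.0, 3.0, 2.0, 1.0],
    "Class": [0, 0, 1, 1],
}
m = correlation_matrix(cols)
print(round(m["V1"]["V2"], 3), round(m["V1"]["Class"], 3))  # -1.0 0.894
```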

Figure 9. Dimensionality feature reduction using (a) t-SNE, (b) PCA, and (c) SVD.
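The three reductions in Figure 9 can be sketched with scikit-learn, assuming standardized features, two output dimensions, and a synthetic stand-in for the balanced subsample; `reduce_to_2d` is a hypothetical helper. t-SNE is a non-linear embedding used mainly for visualization, whereas PCA and truncated SVD are linear projections; because `StandardScaler` centres the features, truncated SVD here recovers essentially the same subspace as PCA.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def reduce_to_2d(X, seed=42):
    """Project X onto two dimensions with t-SNE, PCA and truncated SVD."""
    X_scaled = StandardScaler().fit_transform(X)
    return {
        # Non-linear embedding; perplexity must be smaller than the sample count.
        "t-SNE": TSNE(n_components=2, perplexity=30, random_state=seed).fit_transform(X_scaled),
        # Linear projection onto the two directions of largest variance.
        "PCA": PCA(n_components=2, random_state=seed).fit_transform(X_scaled),
        # Truncated SVD of the centred, scaled matrix.
        "SVD": TruncatedSVD(n_components=2, algorithm="randomized",
                            random_state=seed).fit_transform(X_scaled),
    }


rng = np.random.default_rng(0)
# Hypothetical balanced subsample: 100 legitimate and 100 fraudulent rows, 30 features.
X = np.vstack([rng.normal(0, 1, (100, 30)), rng.normal(1.5, 1, (100, 30))])
embeddings = reduce_to_2d(X)
for name, Z in embeddings.items():
    print(name, Z.shape)
```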

Figure 10. Learning and validation curves for (a) logistic regression, (b) the k-nearest neighbors learning algorithm, (c) the support vector learning algorithm, and (d) the decision tree classifier algorithm.
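Learning and validation curves such as those in Figure 10 plot training and cross-validated accuracy against the number of training samples. A sketch with scikit-learn's `learning_curve`, assuming 5-fold cross-validation, default hyperparameters except where noted, and synthetic data in place of the transaction dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support vector classifier": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

# Synthetic stand-in for the balanced transaction data.
X, y = make_classification(n_samples=600, n_features=30, random_state=0)

curves = {}
for name, model in models.items():
    # Train on growing fractions of each training fold; score on the held-out fold.
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy"
    )
    curves[name] = (sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))
    print(f"{name}: final train {curves[name][1][-1]:.3f}, "
          f"validation {curves[name][2][-1]:.3f}")
```

A large, persistent gap between the two curves indicates overfitting; curves that converge at a low score indicate underfitting.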

Figure 11. (a) Fraud detection accuracy with 1% fraud proportion, (b) fraud detection accuracy with 2% fraud proportion, (c) fraud detection accuracy with 5% fraud proportion, and (d) fraud prevention accuracy with 1–5% fraud proportions.

Figure 12. (a) Root mean square error (RMSE), (b) relative root mean square error (RRMSE), (c) mean bias error (MBE), and (d) mean directional accuracy (MDA) of the proposed CCFDP and contending approaches, with a maximum of 45,000 transactions.
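Figure 12 and Table 3 rely on four statistical metrics. The sketch below implements common textbook definitions, which may differ in detail (for example, the sign convention of MBE or the normalization of RRMSE) from the formulas given in Section 4.4: RMSE is the square root of the mean squared error; RRMSE divides RMSE by the mean observed value and reports it in percent; MBE is the mean of predicted minus observed values; and MDA is the share of steps whose predicted direction of change, relative to the previous observed value, matches the observed direction.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rrmse(actual, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value."""
    return 100.0 * rmse(actual, predicted) / (sum(actual) / len(actual))


def mbe(actual, predicted):
    """Mean bias error; positive values mean predictions overshoot on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def sign(x):
    return (x > 0) - (x < 0)


def mda(actual, predicted):
    """Mean directional accuracy over steps t = 1 .. n-1."""
    hits = sum(
        sign(actual[t] - actual[t - 1]) == sign(predicted[t] - actual[t - 1])
        for t in range(1, len(actual))
    )
    return hits / (len(actual) - 1)


actual = [2.0, 4.0, 3.0, 5.0]
predicted = [2.0, 3.0, 4.0, 6.0]
print(rmse(actual, predicted), rrmse(actual, predicted),
      mbe(actual, predicted), mda(actual, predicted))
```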

Table 1. Existing contributions addressing credit card fraud detection.

Works	Algorithms for Credit Card Fraud Detection and Prevention	Features/Strengths	Deficiencies/Vulnerabilities
Carcillo et al. [16]	Unsupervised credit card fraud detection	Improves model accuracy for credit card fraud detection.	Increases execution time, and a possible risk remains.
Itoo and Singh [17]	Evaluation of multiple algorithms for credit card fraud detection	Determines the best fraud detection algorithm for protecting financial assets.	Does not evaluate the correctness and effectiveness of the analytical algorithms; the work lacks novelty.
Staar et al. [18]	Proposed a CNN-based paradigm for fraud detection	Provides inherent fraud-behavior features; trade entropy is also employed to improve transaction classification accuracy.	Limited to fraud-feature detection; does not identify complete fraudulent transactions.
Park et al. [19]	Employed a CNN after combining trade entropy and feature matrices	Proposes using entropy to identify more sophisticated extortion schemes. In addition, transactions are represented by a characteristic matrix on which a CNN is trained to detect latent patterns for each sample.	Limited to samples; not intended to identify actual fraudulent transactions.
Balagolla et al. [20]	Addressed the issue of a false sense of security using a deep-learning system	The algorithm is highly accurate at spotting new patterns and illegitimate certificates. It also reduces the time needed to identify new attacks from risky websites.	Not specifically intended to identify fraudulent transactions.
Fiore et al. [21]	A case study of credit card fraud detection	Cluster analysis is employed to standardize data. Results of combining cluster analysis with artificial neural networks for fraud detection show that clustering properties can reduce the number of neural inputs.	Limited to cluster analysis; does not detect credit card fraud.
Du et al. [22]	Proposed a fundamentally new model-based strategy that isolates anomalies	Because iForest uses sub-sampling, which was previously impossible, the algorithm has linear time complexity with a small constant and low memory usage.	Not specifically designed for credit card fraud detection; limited to determining sub-sampling.
Zhang et al. [23]	Method for authenticating a credit card	Provides a transaction validation procedure that is applied before a credit card transaction is approved.	Does not provide better accuracy for credit card fraud detection.
West and Bhattacharya [24]	Introduced computational intelligence-based solutions for financial fraud detection	The classification techniques include critical components such as detection rules for various forms of fraud, and a higher success rate is obtained.	Numerous aspects of intelligent fraud detection have not yet been thoroughly studied.
Razaque et al. [25]	A paradigm for detecting anomalies	Introduces a new matrix profile for anomaly detection. The paradigm is applied to large multivariate datasets and delivers high-quality approximate solutions in a reasonable time with excellent accuracy.	Restricted to the medical domain.
Ghosh and Das [26]	Comparative research on data mining strategies for credit card fraud detection	Investigates the performance of random forest and support vector machines in conjunction with logistic regression to address an unbalanced dataset, and shows that random forest outperforms SVM in accuracy.	Limited to a comparison of two algorithms; lacks novelty and originality.
Van Belle et al. [27]	CATCHM, a novel network-based model for credit card fraud detection	An inventive network design is employed for efficient inductive pooling and careful configuration of the downstream classifier.	The proposed method has lower accuracy.
Roseline et al. [28]	LSTM-RNN method for analyzing credit card fraud	An LSTM-RNN is introduced for detecting credit card fraud, reducing the likelihood of fraud.	Sensitive to random weight initialization and prone to overfitting.
Olowookere and Adewale [29]	Framework combining cost-sensitive learning and meta-learning ensemble techniques for fraud detection	Base classifiers are fitted conventionally, while cost-sensitive learning is incorporated into the ensemble learning process.	Restricted to fraud rates; produces lower accuracy.
Asha and Kumar [30]	Integration of multiple algorithms for fraud detection	Integrates multiple algorithms in an attempt to overcome card fraud.	Increases complexity and produces lower accuracy.
Our Proposed Method	Credit card-not-present fraud detection and prevention (CCFDP) method for dealing with CNP fraud using big data analytics	CCFDP consists of two processes for handling credit card fraud: FDP and FPP. Cutting-edge algorithms (RU, t-SNE, PCA, SVD) are incorporated, and the efficacy of the detection rate is optimized based on the experimental findings.	Integrating the algorithms increases time complexity but considerably increases accuracy.

Table 2. Materials used for conducting the experiments.

Material	Description
Cluster uptime	0.1 s
Cluster time zone	Almaty, Kazakhstan
Cluster connection URL	http://127.0.0.1:54231 (accessed on 17 August 2022)
Connection proxy	none
Internal security	False
Platform	Python 3.4.6
Operating system	Ubuntu 18.04
RAM	16 GB
ROM	256 GB
Database	MySQL
Processor	2.4 GHz Intel Core i7
Environment	Visual Studio

Table 3. The comparison of the proposed CCFDP with contending approaches using statistical metrics.

Methods	RMSE	RRMSE	MBE	MDA
CATCHM [27]	0.068%	9.48%	0.294%	99.05%
LSTM-RNN [28]	0.078%	9.77%	0.361%	99.24%
CSLMLE [29]	0.076%	8.61%	0.122%	99.11%
CCFDM [30]	0.095%	10.76%	0.554%	99.87%
ESDL [31]	0.072%	9.62%	0.321%	99.45%
ITCCFD [32]	0.059%	8.96%	0.192%	99.18%
BTG [33]	0.079%	10.38%	0.472%	99.54%
Proposed CCFDP	0.056%	8.39%	0.078%	99.92%

Table 4. The fraud detection accuracy of the proposed CCFDP and contending methods.

Methods	Accuracy with 1% Fraud Proportion	Accuracy with 2% Fraud Proportion	Accuracy with 5% Fraud Proportion
CATCHM [27]	99.72%	99.19%	97.97%
LSTM-RNN [28]	98.95%	98.81%	97%
CSLMLE [29]	99.84%	99.51%	98%
CCFDM [30]	99.32%	98.91%	97.88%
Proposed CCFDP	99.94%	99.89%	99.72%
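Because accuracy counts correct decisions on both classes, the figures in Table 4 can be read against the fraud proportion p: a classifier that labels every transaction legitimate already reaches an accuracy of 1 − p (99% at a 1% fraud proportion). A minimal sketch that reports accuracy together with this all-legitimate baseline and the fraud recall, assuming labels 1 = fraud and 0 = legitimate; `accuracy_report` and the test-set composition are hypothetical.

```python
def accuracy_report(y_true, y_pred):
    """Accuracy, fraud recall and the all-legitimate baseline (labels: 1 = fraud)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    frauds = sum(1 for t in y_true if t == 1)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return {
        "accuracy": correct / n,
        # Accuracy of a model that predicts "legitimate" for every transaction.
        "baseline_accuracy": (n - frauds) / n,
        # Share of fraudulent transactions that were flagged.
        "fraud_recall": caught / frauds if frauds else float("nan"),
    }


# Hypothetical test set with a 1% fraud proportion: 990 legitimate, 10 fraudulent.
y_true = [0] * 990 + [1] * 10
# A model that raises 2 false alarms and flags 8 of the 10 frauds.
y_pred = [0] * 988 + [1] * 2 + [1] * 8 + [0] * 2
report = accuracy_report(y_true, y_pred)
print(report)
```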

Table 5. The fraud prevention accuracy of the proposed CCFDP.

Methods	Fraud Prevention Accuracy
CCFDP with 1% Fraud Proportion	99.99%
CCFDP with 2% Fraud Proportion	99.98%
CCFDP with 3% Fraud Proportion	99.95%
CCFDP with 4% Fraud Proportion	99.94%
CCFDP with 5% Fraud Proportion	99.90%
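The FPP uses a logistic regression learning (LRL) model to estimate the probability that a CNP transaction is fraudulent. A minimal sketch of how such a model could gate incoming transactions, assuming scikit-learn's `LogisticRegression`, synthetic features, and a hypothetical blocking threshold of 0.5; the threshold, the data, and the helper `screen_transactions` are illustrative and not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic balanced data standing in for the undersampled transactions.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=6,
                           random_state=1)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)


def screen_transactions(model, X, threshold=0.5):
    """Return (fraud probability, block?) for each incoming transaction."""
    # Column 1 of predict_proba is P(class 1), i.e. P(fraud) for 0/1 labels.
    p_fraud = model.predict_proba(X)[:, 1]
    return [(float(p), bool(p >= threshold)) for p in p_fraud]


decisions = screen_transactions(model, X_new)
blocked = sum(block for _, block in decisions)
print(f"blocked {blocked} of {len(decisions)} incoming transactions")
```

Lowering the threshold blocks more fraud at the cost of more declined legitimate transactions; the appropriate value depends on the relative cost of each error.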

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Razaque, A.; Frej, M.B.H.; Bektemyssova, G.; Amsaad, F.; Almiani, M.; Alotaibi, A.; Jhanjhi, N.Z.; Amanzholova, S.; Alshammari, M. Credit Card-Not-Present Fraud Detection and Prevention Using Big Data Analytics Algorithms. Appl. Sci. 2023, 13, 57. https://doi.org/10.3390/app13010057

AMA Style

Razaque A, Frej MBH, Bektemyssova G, Amsaad F, Almiani M, Alotaibi A, Jhanjhi NZ, Amanzholova S, Alshammari M. Credit Card-Not-Present Fraud Detection and Prevention Using Big Data Analytics Algorithms. Applied Sciences. 2023; 13(1):57. https://doi.org/10.3390/app13010057

Chicago/Turabian Style

Razaque, Abdul, Mohamed Ben Haj Frej, Gulnara Bektemyssova, Fathi Amsaad, Muder Almiani, Aziz Alotaibi, N. Z. Jhanjhi, Saule Amanzholova, and Majid Alshammari. 2023. "Credit Card-Not-Present Fraud Detection and Prevention Using Big Data Analytics Algorithms" Applied Sciences 13, no. 1: 57. https://doi.org/10.3390/app13010057

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
