Hybrid Intrusion Detection System Based on Combination of Random Forest and Autoencoder

Wang, Chao; Sun, Yunxiao; Wang, Wenting; Liu, Hongri; Wang, Bailing

doi:10.3390/sym15030568


by Chao Wang ^1,2, Yunxiao Sun ^1,2, Wenting Wang ^3, Hongri Liu ^1,4 and Bailing Wang ^1,2,*

^1 School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China
^2 School of Cyber Science and Technology, Harbin Institute of Technology, Harbin 150001, China
^3 State Grid Shandong Electric Power Company, Electric Power Research Institute, Jinan 250003, China
^4 Weihai Cyberguard Technologies Co., Ltd., Weihai 264209, China
^* Author to whom correspondence should be addressed.

Symmetry 2023, 15(3), 568; https://doi.org/10.3390/sym15030568

Submission received: 27 January 2023 / Revised: 8 February 2023 / Accepted: 14 February 2023 / Published: 21 February 2023

(This article belongs to the Special Issue The Study of Network Security and Symmetry)


Abstract: To cope with the rising threats posed by network attacks, machine learning-based intrusion detection systems (IDSs) have been intensively researched. However, several issues remain to be addressed. It is difficult to deal with unknown attacks that do not appear in the training set, and as a result, detection rates for these unknown attacks are poor. Furthermore, IDSs suffer from a high false positive rate. As different models learn data characteristics from different perspectives, in this work we propose a hybrid IDS which leverages both random forest (RF) and autoencoder (AE). The hybrid model operates in two steps. In the first step, we utilize the probability output of the RF classifier to determine whether a sample is an attack. Unknown attacks can be identified with the assistance of the probability output. In the second step, an additional AE is coupled to reduce the false positive rate. To simulate an unknown attack in experiments, we explicitly remove all samples belonging to one attack class from the training set. Compared with various baselines, our suggested technique demonstrates a high detection rate. Furthermore, the additional AE detection module decreases the false positive rate.

Keywords: intrusion detection; random forest; autoencoder; hybrid model; unknown attack

1. Introduction

Network security has become a hot topic with the development of internet and communication technologies such as cloud computing, the Internet of Things, etc. Among the security tools to deal with the rising threats of network attacks and guarantee cyberspace security, intrusion detection systems (IDSs) play a significant role [1].

The concept of the IDS was proposed in 1980 [2]. From then on, more and more IDSs have been developed. In general, there are two criteria for the detection performance of a powerful IDS. First, it should identify more attack samples and obtain a higher detection rate. Secondly, when there is a large number of false alarms, security operators may overlook real network attacks; thus, it is vital to reduce the false alarm rate as much as feasible. In order to build powerful and effective IDSs, researchers have sought to utilize artificial intelligence methods [3] such as machine learning and deep learning [1,4,5,6,7]. One effective detection approach is based on supervised learning.

The supervised detection method utilizes a labeled dataset to train a classifier. For example, in the decision tree (DT)-based IDS [8], the tree learns multiple rules for the labeled dataset. Ensembles of DTs, known as random forests (RF), have also been used [9]. To detect DDoS attacks, the authors of [10] extracted new features of the flow packet size and time interval using the graphic symmetry concept. Since the development of deep learning, some IDSs have been constructed based on deep neural networks (DNN) [11]. DNNs employ a number of hidden layers to learn the complex relations between the input features and the classification target. In addition, there is some work involving convolutional neural networks [12]. In datasets with both normal and attack samples, a classifier can find a decision boundary between the normal and attack samples.

As network attack methods are becoming more complicated, however, it is challenging to obtain samples of all attack types. When encountering unknown attacks during the detection phase, the supervised classifier may not generalize well [13], which may result in misclassification of these samples and decrease the detection rate. To tackle this challenge, researchers have attempted to create IDS models that train on normal data alone, such as an autoencoder (AE) [14]. AEs employ the reconstruction error as the anomaly score, where a sample with a higher score can be detected as an attack. However, because an AE is not supervised by both normal and attack samples, it may not reach the same performance as supervised algorithms, which can learn complex decision boundaries between the normal and attack samples.

In this paper, we focus on the case where samples belonging to attack classes that are used for training are limited because it is difficult to collect samples of all attack types. For example, there are some types of attacks that have been used for training supervised models. As attack variants or new types of attacks continue to emerge, the trained detection model may not detect a novel attack. As a result, developing a robust IDS model with a higher detection rate and a lower false positive rate becomes critical. With the consideration that different techniques can learn the characteristics of data from different perspectives, in this work, we propose a hybrid IDS that combines an RF and an AE. In particular, the hybrid IDS comprises two steps in the detection phase. The first step is the application of RF with probabilistic methods to detect attacks. Then, considering how to reduce the false alarm rate further, we employ the AE module in the second step. The contributions of this study can be listed as follows:

In the first step, we employ RF to identify attacks. Unlike commonly used strategies, we employ the predicted probability to distinguish the samples. With a predefined threshold, samples whose attack probability is higher than the threshold are identified as attacks. In this manner, the RF can identify some unknown attacks.
In the second step, we combine another detector utilizing a different detection principle. In detail, we apply an AE to recheck samples that have been predicted as attacks by the RF classifier. Samples with a lower reconstruction error can be reclassified as normal. This additional step decreases the false positive rate even further.
To demonstrate the effectiveness of the proposed methods, we conduct experiments on two intrusion detection datasets. In the experiments, we explicitly set some attacks as unknown. The combined approach provides a greater detection rate and a lower false positive rate compared with other baseline methods.

The remainder of this paper is organized as follows: we describe the relevant work concerning the IDS in Section 2. The whole detection framework and corresponding methodology are presented in Section 3. Section 4 demonstrates the performance of the suggested approach via comprehensive tests. Finally, we draw relevant conclusions and highlight avenues for further work in Section 5.

2. Related Work

The purpose of IDSs is the discovery of anomalous operations within the monitoring environment. There are two ways to classify IDSs: based on the data source utilized or the detection methods. According to the data source utilized in the detection engine, IDSs can be divided into two categories: host-based IDSs (HIDS) and network-based IDSs (NIDS) [4]. The former utilizes data generated on one host, while the latter checks network traffic packets transmitted within the network. In this study, we focus on machine learning-based NIDS.

The overall process of machine learning can be summarized as two parts: training and testing [1]. During the training phase, models are trained on the collected dataset and learn the characteristics of the input features. After training, the model is deployed in the testing phase to examine the anomalous samples. Many classic machine learning methods have been applied to construct IDSs [15,16]. As an ensemble learning approach, the RF classifier yields considerable detection performance [17]. It constructs numerous DTs to obtain a higher detection rate than a single DT.

As there may exist some issues such as high-dimensional data [18,19] and data imbalances [20], researchers have proposed more and more enhanced classifiers. To reduce the impact of irrelevant features and enhance the detection rate, the authors of [21] selected useful features first based on the correlation between the features and classified samples using a combination of several distinct classifiers.

RF can be employed directly as a feature selection method. The authors of [22] applied an RF to discover the optimal features for classification based on feature importance first. After that, the selected features are utilized to train a support vector machine. There are also some other hybrid models involving two parts. For example, the authors of [23] used both AE and DNN to classify attacks. However, this method has difficulty with some unknown attacks that do not appear in the training set.

In some situations, it is difficult to collect or simulate the attack samples [24]. It is then reasonable to employ one-class classifiers to learn the characteristics of network traffic. One-class learning aims to build a profile of normal traffic. For example, the one-class support vector machine (OCSVM) [25] attempts to distinguish between normal and anomalous data by learning the hyperplane that has the maximum distance between the normal samples and the origin [26]. In addition, the isolation forest (IF) algorithm can be used to detect anomalies [27]. Furthermore, there are various works employing AEs [14]. AEs are generally used for feature extraction [28,29]; however, they can also be utilized for anomaly detection [14].

In this work, we seek to develop an IDS with the objective of a high detection rate and a low false alarm rate. In a real deployment, there are some unknown attacks. Additionally, this work finds that some supervised classifiers may incorrectly classify unknown attacks as normal. To solve this problem, we employ a probabilistic RF and an AE. There is one work [30] utilized in fraud detection similar to ours that also utilizes RF and AE. However, they employed the AE as a dimension reduction approach to retrieve the representative characteristics. Furthermore, they applied the RF in a probabilistic approach to overcome the problem of data imbalance.

3. Proposed Methods

In this section, we describe the suggested model in detail. We introduce the employed techniques first, i.e., RF and AE. Then, we merge these two methods to introduce the full detection framework.

3.1. Random Forest

To obtain better prediction performance, ensemble methods employ multiple base classifiers to make decisions. This approach generally shows better classification performance than a single base classifier. The RF [31] is a powerful ensemble classifier; it combines bagging and feature randomness to train multiple DTs. The DT is a frequently used classification approach. It tries to learn a set of if–then rules to classify the data. An illustration of an RF is shown in Figure 1.

The bagging approach is one often-used ensemble strategy for constructing multiple classifiers. Using the bootstrap with replacement to sample the data, a large number of classifiers are trained independently.

As illustrated in Figure 1, given the training set, there are M distinct DT classifiers. To obtain final prediction results, majority voting is employed to aggregate the predictions from each DT. In the following, we present the detailed training procedure for RF. Consider the labeled dataset with samples $\{x_{1}, x_{2}, \dots, x_{N}\}$ and labels $\{y_{1}, y_{2}, \dots, y_{N}\}$, where N is the number of samples and every sample includes j features. To train the RF, we aim to train M distinct DTs. The general steps can be summarized as follows:

(1): Sampling from the training set with N samples using the bootstrap with replacement.
(2): Construction of a DT classifier using the selected samples.

To construct one DT, we first select k features from j features. The value of k is set as $\sqrt{j}$. After that, we pick the best split feature from the chosen k features and divide the node into two child nodes. The Gini impurity is utilized as the split criterion. These procedures are repeated, and the tree is grown as deep as possible.
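The building blocks of this procedure can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, and rounding $\sqrt{j}$ down to an integer is an assumption, since the text does not state how a non-integer value is handled.

```python
import math
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def bootstrap_indices(n_samples, rng):
    """Step (1): draw n_samples indices uniformly with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, rng):
    """Pick k distinct feature indices out of j, with k = floor(sqrt(j)).

    Rounding down is an assumption made for this sketch.
    """
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)
idx = bootstrap_indices(10, rng)     # sample indices for one tree
feats = candidate_features(43, rng)  # 43 features -> k = 6 candidates
balanced = gini_impurity([1, 1, -1, -1])
pure = gini_impurity([1, 1, 1])
```

A node that holds both classes in equal proportion has the maximum binary impurity of 0.5, while a pure node has impurity 0, which is why a split that lowers the impurity of the child nodes is preferred.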

In this study, we focus on binary classification. To predict an input sample, the RF utilizes the votes of the trees in the forest weighted by their probability estimates [32]. Let p denote the probability of being predicted as the attack. Then, the probability of being predicted as normal is $q = 1 - p$. The predicted class probabilities of an input sample are calculated as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf node [32].

We utilize the “predict_proba” function in scikit-learn [32] to output the probability that a sample belongs to a certain class. Usually, a sample is assigned to the class with the highest probability. However, with this approach, some samples belonging to unknown attacks may be wrongly classified as normal. Instead, we define a threshold T to decide the class of a sample: if the attack probability p of a sample is greater than T, the sample is classified as an attack. Formally, for a sample $x_{i}$ with corresponding probability $p_{i}$, we define a decision function $f(\cdot)$ whose output is $\pm 1$, where $+1$ denotes an anomalous sample:

$$f(x_{i}) = \begin{cases} -1, & \text{if } p_{i} \leq T \\ +1, & \text{if } p_{i} > T \end{cases} \tag{1}$$
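A minimal sketch of this decision rule with scikit-learn is given below. The synthetic two-cluster data, the helper name `rf_decision`, and the threshold value 0.2 are assumptions chosen for illustration only; the forest parameters are written out explicitly to mirror the description in Section 3.1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_decision(model, X, threshold):
    """Equation (1): +1 (attack) if p_i > T, otherwise -1 (normal)."""
    # Locate the column of predict_proba that belongs to the attack class (+1).
    attack_col = list(model.classes_).index(1)
    p = model.predict_proba(X)[:, attack_col]
    return np.where(p > threshold, 1, -1)


# Synthetic, well-separated data (an assumption for this sketch only).
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 0.3, size=(200, 4))
X_attack = rng.normal(3.0, 0.3, size=(200, 4))
X_train = np.vstack([X_normal, X_attack])
y_train = np.array([-1] * 200 + [1] * 200)

rf = RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_features="sqrt",
    bootstrap=True,
    random_state=0,
)
rf.fit(X_train, y_train)

X_new = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]])
labels = rf_decision(rf, X_new, threshold=0.2)
```

Compared with calling `predict`, which takes the class with the highest averaged probability, a threshold below 0.5 lets samples with an intermediate attack probability be flagged as attacks.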

3.2. Autoencoder

Deep learning has been shown to be quite effective in a variety of research fields [33]. It learns data representations with multiple neural network layers. The other part of our suggested model is a special unsupervised neural network [34], an AE. As it can rebuild the input, the reconstruction error can serve as the anomaly score for identifying abnormalities. The framework for anomaly detection using AE is depicted in Figure 2. We introduce the general process below.

AEs usually have a symmetric structure. As demonstrated in Figure 2, an AE can be divided into two parts: encoder and decoder. The AE attempts to reconstruct the original input as closely as possible. Both the encoder and the decoder are made up of several hidden layers. In detail, the encoder generates a latent representation for the input sample. Usually, the latent representation has a lower dimension than the original input. The decoder tries to recover the original input from the compressed form. In this work, we employ mean square error (MSE) to quantify the reconstruction loss. With the original input $x_{i}$ and reconstructed output $\hat{x}_{i}$, the MSE $e_{i}$ is calculated as below:

$$e_{i} = \lVert x_{i} - \hat{x}_{i} \rVert^{2} \tag{2}$$

The training process aims to minimize the reconstruction loss. After training, with a well-trained AE we can identify the anomalous samples using the MSE. Similar to the handling process for the probability of the RF with a predefined threshold T, we use a function $f(\cdot)$ to make the decision, as illustrated below:

$$f(x_{i}) = \begin{cases} -1, & \text{if } e_{i} \leq T \\ +1, & \text{if } e_{i} > T \end{cases} \tag{3}$$

where $+1$ denotes the anomalous sample.
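Equations (2) and (3) can be illustrated with a short NumPy sketch. The reconstructed outputs below are made-up values standing in for the output of a trained AE, and the function names and the threshold 0.5 are hypothetical. The error follows Equation (2) as written, i.e., the squared norm; dividing by the number of features would only rescale the threshold.

```python
import numpy as np


def reconstruction_error(x, x_hat):
    """Equation (2): squared L2 distance between each row of x and x_hat."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    return np.sum(diff ** 2, axis=1)


def ae_decision(errors, threshold):
    """Equation (3): +1 (anomalous) if e_i > T, otherwise -1 (normal)."""
    return np.where(np.asarray(errors) > threshold, 1, -1)


x = np.array([[0.2, 0.4], [0.9, 0.1]])
x_hat = np.array([[0.2, 0.5], [0.1, 0.9]])  # hypothetical AE reconstructions
errors = reconstruction_error(x, x_hat)
labels = ae_decision(errors, threshold=0.5)
```

The first sample is reconstructed almost exactly and stays below the threshold, while the second is reconstructed poorly and is flagged as anomalous.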

3.3. Whole Detection Framework

In this study, we target the scenario where we have collected samples belonging to certain attack types, but there may exist some unknown attacks as attack variations or new types of attacks continue to emerge.

With the foundational technique laid out, we present the whole detection framework of our suggested method. When there are any unknown attacks, the traditional RF classifier would misclassify them as normal, so we employ a probabilistic approach to make the decision. In addition, to reduce the number of false alarms, we apply an AE to recheck the attacks.

In total, we integrate these two techniques, leveraging the probabilistic RF and AE. The approach has two parts: training and testing. We present the two parts in Figure 3. In the training phase, each model is trained with a different subset of the dataset. Considering the labeled dataset at hand, we can train a binary classifier using an RF classifier. Then, an additional AE is constructed utilizing normal samples from the training set only. As the RF is trained on the labeled dataset, we use its output probability to make decisions for detecting attack samples.

Because the AE trains on normal data only, the normal samples would have a lower MSE than the anomalous ones. From this point of view, we can define a low threshold, and samples whose MSE is below this threshold belong to the normal class with a higher degree of confidence. Under this assumption, we integrate these two decision processes. After obtaining the trained models, during the testing phase, we apply a two-step detection strategy.

We list the detection procedure in Algorithm 1. There are two hyperparameters considered for the decision. The first one is $T_{1}$, utilized for the RF probability, and the second one is $T_{2}$, used for the MSE. In the beginning, a sample $x_{i}$ is classified using the RF classifier. When the probability of a sample is larger than $T_{1}$, it is classified as an attack. After that, we utilize the AE to examine the attack sample predicted by RF again. In this part, when the reconstruction error is smaller than $T_{2}$, the sample is reclassified as normal.

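With this two-step approach, more samples in the testing phase can be classified correctly: the first step flags unknown attacks that the highest-probability rule would label as normal, and the second step corrects samples that the first step wrongly flags as attacks.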

Algorithm 1: Testing process of our proposed method
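The two-step testing procedure described above can be sketched as follows. The sketch assumes that the RF attack probabilities and the AE reconstruction errors have already been computed for each test sample; the function name `hybrid_detect` and all numeric values are hypothetical.

```python
import numpy as np


def hybrid_detect(p_attack, mse, t1, t2):
    """Two-step detection: RF probability check, then AE recheck.

    p_attack: RF attack probability per sample.
    mse:      AE reconstruction error per sample.
    Returns +1 for attack and -1 for normal.
    """
    p_attack = np.asarray(p_attack, dtype=float)
    mse = np.asarray(mse, dtype=float)
    # Step 1: flag a sample as an attack when its probability exceeds T1.
    labels = np.where(p_attack > t1, 1, -1)
    # Step 2: recheck only the flagged samples; a reconstruction error
    # smaller than T2 reclassifies the sample as normal.
    recheck = (labels == 1) & (mse < t2)
    labels[recheck] = -1
    return labels


p = [0.05, 0.35, 0.35, 0.98]
e = [0.001, 0.002, 0.300, 0.500]
out = hybrid_detect(p, e, t1=0.2, t2=0.01)
```

In this toy input, the second sample is flagged by the first step but has a very small reconstruction error, so the second step relabels it as normal; the third sample has the same probability but a large error and remains an attack.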

4. Experimental Results

In this section, we present the experimental results for the proposed approach. First, we describe the dataset and preprocessing procedures applied in the experiments. The evaluation metric and comparison methods are then introduced. After that, the specific experiment settings are listed. The results of the experiments are thoroughly analyzed in the final part.

4.1. Dataset

To conduct the experiments, we use two intrusion detection datasets [35], namely, NF-CSE-CIC-IDS2018-v2 and NF-BoT-IoT-v2. Both datasets are created using the NetFlow v9 features from the original datasets CSE-CIC-IDS2018 [36] and BoT-IoT [37]. In this study, we refer to them as IDS2018 and BoT-IoT, respectively. Considering that there is a large number of samples in the dataset, we randomly sample different categories of data, and the detailed distribution of the different categories is shown in Table 1.

In the IDS2018 dataset, there are six attacks (not including normal data): DDoS, DoS, Bot, Bruteforce, Infiltration, and Web attacks. As for the BoT-IoT dataset, there are four attacks. There are 43 features for every record. To preprocess the dataset, we remove some irrelevant columns, for example, source IP. After that, all numeric features are transformed with the log function to decrease the effect of large values. Furthermore, some categorical features are encoded by the one-hot encoding method. We use min–max normalization to scale each feature into the range [0, 1]. After preprocessing, the dimension of the IDS2018 dataset is about 300, and the dimension of the BoT-IoT dataset is about 200.
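The numeric part of this preprocessing can be sketched with NumPy as follows. Several details are assumptions, because the text does not specify them: `log1p` (that is, log(1 + x)) is used so that zero-valued features stay finite, the features are assumed to be non-negative, the min–max statistics are fitted on the training data, and values outside the fitted range are clipped. One-hot encoding is omitted.

```python
import numpy as np


def log_scale(x):
    """Compress large values; log1p is an assumed variant of the log step."""
    return np.log1p(np.asarray(x, dtype=float))


def fit_min_max(x_train):
    """Per-feature minimum and maximum, computed on the training data."""
    x_train = np.asarray(x_train, dtype=float)
    return x_train.min(axis=0), x_train.max(axis=0)


def apply_min_max(x, lo, hi):
    """Scale each feature into [0, 1] using the fitted statistics."""
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    scaled = (np.asarray(x, dtype=float) - lo) / span
    return np.clip(scaled, 0.0, 1.0)  # clipping is an assumption


raw = np.array([[0.0, 10.0], [99.0, 1000.0], [9.0, 100000.0]])
logged = log_scale(raw)
lo, hi = fit_min_max(logged)
scaled = apply_min_max(logged, lo, hi)
```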

4.2. Comparative Methods

To compare the detection performance of our method with some baseline methods, four supervised methods are used to demonstrate improvements while coping with the unknown attacks. Furthermore, there are two anomaly detection methods trained on normal data only.

DT. DT is a commonly used classification method; it learns a set of if–else decision rules.
Logistic regression (LG). LG uses a logistic function to learn the relations between features and target.
RF. The detailed theory has been laid out above. We use the “predict” method in scikit-learn to obtain classification results.
DNN. DNN uses multiple hidden layers to model the complex relationship between an input feature and its classification target.
IF. Multiple isolation trees are constructed to decide together whether one sample is normal or anomalous.
OCSVM [25]. OCSVM finds a hyperplane that separates the normal data from the origin. The hyperplane has the largest distance from the origin.

As our combination method consists of two techniques, each of them is also compared with our method. The first one is AE, and the second one is the probabilistic RF, which we denote as “RF(Pro)”. In total, there are eight baselines.

4.3. Evaluation Metric

Considering the attack type as the positive one, there are four classification results between the true label and prediction label: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). One TP record, for example, indicates that an attack sample is correctly classified as anomalous. We list the four conditions in Figure 4.

Accounting for these four numbers of the classification results, we calculate some performance metrics. We list four metrics commonly used in the field of classification including accuracy, precision, recall, and F1. Their calculations are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$

The F1 is the harmonic mean of precision and recall. Furthermore, there is an additional metric named false positive rate (FPR) as below:

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{8}$$
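As a worked example, Equations (4) to (8) can be computed directly from the four confusion counts. The function name and the counts below are made up for illustration and are not results from the experiments; returning 0 when a denominator is zero is a convention assumed for this sketch.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, F1, and FPR from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
    }


# Hypothetical counts: 100 attack samples and 100 normal samples.
m = classification_metrics(tp=90, fn=10, fp=5, tn=95)
```

With these counts, the accuracy is 185/200 = 0.925, the recall is 0.9, and the FPR is 5/100 = 0.05.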

4.4. Experimental Settings

Before listing the hyperparameters for the deployed methods, we introduce how the dataset is split. To simulate an unknown attack, in each experiment we select one type of attack as the unknown one. In detail, all of the samples that belong to this selected class are removed from the dataset first. After that, the remaining dataset is split by a ratio of 4:1 into a training set and a testing set. The selected attack data are then merged with the testing set to create a new testing set.

Furthermore, we randomly select 10% of the samples from the training set as a validation set. Using these samples, we can select some hyperparameters based on their performance on the validation set. It should be noted that all the unknown attack samples are in the testing set. All the samples in the training set and validation set belong to normal or known attacks. In the case of the binary classification problem, the label contains only −1 or 1 after splitting the dataset.
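The splitting protocol can be sketched in plain Python as below. The representation of a sample as a (features, attack type) pair, the type name "normal" for benign traffic, and the toy class sizes are assumptions for illustration; the sketch also assumes that the validation samples are taken out of the training set rather than duplicated.

```python
import random


def split_with_unknown(samples, unknown_class, seed=0):
    """Hold out one attack class entirely, then split the rest 4:1.

    samples: list of (features, attack_type) pairs.
    Returns (train, val, test); every unknown-class sample ends up in test.
    """
    rng = random.Random(seed)
    unknown = [s for s in samples if s[1] == unknown_class]
    known = [s for s in samples if s[1] != unknown_class]
    rng.shuffle(known)
    n_train = len(known) * 4 // 5          # 4:1 training/testing split
    train_val = known[:n_train]
    test = known[n_train:] + unknown       # merge the unknown attack into test
    n_val = len(train_val) // 10           # 10% of training as validation
    return train_val[n_val:], train_val[:n_val], test


def binary_label(attack_type):
    """Map an attack type to the binary label: -1 normal, +1 attack."""
    return -1 if attack_type == "normal" else 1


data = (
    [([i], "normal") for i in range(60)]
    + [([i], "DDoS") for i in range(30)]
    + [([i], "DoS") for i in range(10)]
)
train, val, test = split_with_unknown(data, unknown_class="DoS")
```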

To reduce the effect of randomness caused by the model itself or by data splitting, we repeat the experiment five times for each attack set as unknown. In total, for each method, there are thirty experiments for the IDS2018 dataset and twenty experiments for the BoT-IoT dataset.

We implement our method using the Python programming language. Furthermore, the PyTorch framework [38] is utilized to implement the AE neural network. The machine learning methods are implemented with the scikit-learn library [32].

For AE, we use five hidden layers for both datasets. As for the IDS2018 dataset, the number of neurons is “210-140-70-140-210”, and the setting of BoT-IoT is “150-100-50-100-150”. The activation function in the hidden layers is the PReLU function [39]. The batch size for training both AEs is 256, the number of epochs is 200, and the learning rate is 0.001. As for the RF, we use the default settings from the scikit-learn library. The number of DTs is 100.

There are two parameters in our method that affect the performance: $T_{1}$ and $T_{2}$. $T_{1}$ is the lower bound of predicted probability output by the RF classifier for samples classified as attack. To avoid overfitting, we use the probabilities of normal data in the validation dataset to determine $T_{1}$. The value of $T_{1}$ is set as the 90th percentile of the probabilities. $T_{2}$ is the value of MSE to recheck the attack samples predicted by RF. We set it as the 75th percentile of the MSE from normal data in the validation dataset. We will conduct a detailed analysis of these two parameters.
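The percentile-based choice of the two thresholds can be sketched with NumPy as follows. The validation arrays are synthetic placeholders rather than values from the datasets, and the function name `select_thresholds` is hypothetical.

```python
import numpy as np


def select_thresholds(p_val_normal, mse_val_normal, q1=90, q2=75):
    """Pick T1 and T2 as percentiles over normal validation samples.

    p_val_normal:   RF attack probabilities of normal validation samples.
    mse_val_normal: AE reconstruction errors of normal validation samples.
    """
    t1 = float(np.percentile(p_val_normal, q1))
    t2 = float(np.percentile(mse_val_normal, q2))
    return t1, t2


# Synthetic stand-ins for the validation-set outputs of the two models.
p_val = np.linspace(0.0, 0.1, 101)
mse_val = np.linspace(0.0, 0.02, 101)
t1, t2 = select_thresholds(p_val, mse_val)
```

Because both thresholds are derived from normal validation data only, no unknown attack samples are needed to set them.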

We set the hidden layers of the DNN as “256-256”. The other three supervised methods, including DT, LG, and RF, use the default settings from scikit-learn. As for OCSVM, we use the radial basis function as the kernel function. Furthermore, the parameter nu is selected from $\{0.01, 0.1\}$, and gamma is selected from $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$. The parameter contamination of IF is selected from $\{0.01, 0.1\}$, and the number of estimators is selected from $\{100, 200, 300\}$. We use the grid search method to find the optimal parameters that have the highest F1 on the validation dataset for these two methods.

The comparative method AE uses the same network settings as ours. In our method, the $T_{2}$ for AE is used to reduce the FPR. Its value is set at a lower value to ensure that the samples can be classified as normal with higher confidence. When using AE only to detect attacks, we use another threshold. We use the validation dataset to select the best threshold, which results in the best F1 score.

4.5. Results Analysis

In this subsection, we evaluate the experimental results in sequence. We first conduct an experiment on the IDS2018 dataset as a case study for our method. Then we assess the overall experimental results for both datasets, including the general detection performance and the performance when various attacks are set as unknown. Finally, we examine the impact of two hyperparameters: $T_{1}$ and $T_{2}$.

In this part, we use DoS as an unknown attack and present the probability and MSE distributions. First, we analyze the predicted probability distribution of RF using the bar plot, as displayed in Figure 5.

In the figure, the distribution of the samples’ probabilities is plotted, where the x-axis is the probability and the y-axis is the number of samples. From the figure, we can see that the probabilities of normal and known attack samples are concentrated at the two ends, generally near 0 or 1. However, the unknown attack samples are located in the middle. Consequently, if we use the default “predict” method, the normal and known attack samples can be classified correctly, but the unknown attacks would be misclassified as normal.

In our method, we can use the probability to classify the samples. In this instance, we can set the threshold to 0.2 and obtain the correct prediction. Next, we display the MSE distribution, which is plotted in Figure 6. As there is no attack sample in the training set, it is hard for the AE to rebuild the attack samples well. Whether known or unknown, most attack samples have a higher MSE than normal samples.

To further compare the combination methods, we plot the confusion matrix of the four classifiers in Figure 7.

The plain RF shows the lowest detection rate, as 64.8% of attacks are wrongly categorized as normal. Because it assigns each sample to the class with the highest probability, it wrongly classifies some unknown attack samples as normal (as we can see from Figure 5). From Figure 7b, it can be seen that only 1.8% of attack samples are misclassified as normal by the RF(Pro) classifier, which is a large improvement over the RF classifier. Furthermore, the AE detector has a higher TP rate. Finally, our proposed combination method achieves the lowest FP fraction when compared with the other two. As stated before, we apply the AE to the RF results and relabel as normal those predicted attack samples that have a lower MSE. In this manner, the combination method has a lower FP.

To further demonstrate the performance of our method, we average all of the results when different attacks are set as unknown. The detailed results are shown in Table 2 and Table 3 for both datasets. We report the mean value and standard deviation in the table.

First, in terms of accuracy and F1, our method outperforms the others on both datasets. In detail, our method has the highest F1 of 99.72% for the BoT-IoT dataset and 95.90% for the IDS2018 dataset. Although the recall of OCSVM on the IDS2018 dataset is higher than ours, its other metrics are worse than ours.

As the experimental results present similar conclusions for both datasets, we analyze them both. The first four supervised methods present higher precision and a lower FPR than other methods. Because some samples belong to the unknown attacks in the experiments, the supervised methods classified them directly into the normal class.

The detection performance of IF is not satisfactory. It has an F1 of only 65% on the BoT-IoT dataset. The other two detection methods, OCSVM and AE, present higher performance than the supervised methods. As these three methods only require normal data during training, they are capable of dealing with known or unknown attacks. The F1 of OCSVM and AE reaches about 93%, which is higher than that of the four supervised methods.

The single detector “RF(Pro)”, which is part of our method, has a similar performance to our method. The RF classifier is significantly enhanced after using probability to make the decision. In terms of recall, “RF(Pro)” is higher than RF by about 8% on the BoT-IoT dataset and about 20% on the IDS2018 dataset. As we stated before, we aim to reduce the false positive rate by combining the AE and RF, and on both datasets the FPR of the hybrid method in the tables is lower than that of either single detector. For example, on the IDS2018 dataset, our hybrid method has the lowest FPR of 1.81% compared to “RF(Pro)” or AE. In addition, the F1 of our method is higher than that of these two basic methods.

After analyzing the average detection performance on different attacks, it is reasonable to investigate the performance of various classifiers when dealing with different unknown attacks. For simplification, we report the methods related to RF and AE for the IDS2018 dataset. We plot the F1 and FPR only in Figure 8 and Figure 9, with the consideration that F1 is the harmonic mean of recall and precision. Additionally, the FPR can prove the improvement of our method.

In Figure 8, there are six different attacks. The probability-based RF performs better than the classical RF method in most cases. Furthermore, our method presents the highest F1. As shown in Figure 9, our combination method reduces the FPR significantly.

In our model, there are two hyperparameters that have significant effects: $T_{1}$ and $T_{2}$. Because there are no unknown attacks in the training or validation set, we need to set them manually. To validate their effectiveness, we vary these two hyperparameters over some representative percentile values. In detail, $T_{1}$ is selected from $\{80, 85, 90, 95, 99\}$ and $T_{2}$ is selected from $\{65, 70, 75, 80, 85\}$. We plot the comparisons in Figure 10. As before, we report the F1 and FPR for the IDS2018 dataset only.

Here, we restate the effects of the two parameters. $T_{1}$ is the cut position for the probability output by RF to decide whether a sample is an attack. Furthermore, $T_{2}$ is a value used for the MSE to determine whether a sample may have been misclassified as an attack. It is hard to decide the optimal value of these two hyperparameters, because there are no unknown attack samples in the training set or validation set.

First, we find that the FPR in Figure 10b decreases as $T_{1}$ and $T_{2}$ increase. However, F1 follows a different pattern as the thresholds change. When $T_{1}$ takes its highest value, i.e., the 99th percentile, F1 is at its lowest. This is because a higher $T_{1}$ causes more attack samples to be missed, although the FPR is lower. Varying $T_{2}$ does not change F1 much; however, a higher $T_{2}$ lowers the FPR.
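The trade-off described above can be reproduced on toy data. The sketch below sweeps a small grid of thresholds and reports recall and FPR for each pair. All scores are invented, the thresholds are given as raw values rather than percentiles to keep the example short, and the rule "attack if RF probability > T1 and AE MSE > T2" is an assumed reading of the two-step procedure.

```python
# Toy (RF attack probability, AE reconstruction MSE) pairs; invented values.
normal = [(0.02, 0.010), (0.05, 0.030), (0.10, 0.012),
          (0.40, 0.030), (0.55, 0.050), (0.08, 0.060)]
attack = [(0.90, 0.200), (0.60, 0.080), (0.35, 0.150),
          (0.95, 0.090), (0.20, 0.300)]


def flagged(samples, t1, t2):
    """Count samples labelled attack: RF probability > t1 and AE MSE > t2."""
    return sum(1 for p, mse in samples if p > t1 and mse > t2)


def sweep(t1_values, t2_values):
    """Return {(t1, t2): (recall, fpr)} for every threshold pair."""
    results = {}
    for t1 in t1_values:
        for t2 in t2_values:
            recall = flagged(attack, t1, t2) / len(attack)
            fpr = flagged(normal, t1, t2) / len(normal)
            results[(t1, t2)] = (recall, fpr)
    return results


results = sweep(t1_values=[0.1, 0.3, 0.6], t2_values=[0.02, 0.045])
for (t1, t2), (recall, fpr) in sorted(results.items()):
    print(f"T1={t1:.2f} T2={t2:.3f} recall={recall:.2f} FPR={fpr:.2f}")
```

Raising either threshold can only shrink the set of samples labelled as attacks, so both recall and FPR are non-increasing in $T_{1}$ and $T_{2}$; whether F1 rises or falls depends on which of the two shrinks faster.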

In our experiment, we set $T_{1}$ to the 90th percentile of the probabilities of normal data in the validation set, and $T_{2}$ to the 75th percentile of the MSEs of normal data in the validation set. The red circle in Figure 10 highlights the results for our selected values. The highest F1 is achieved at $T_{1}$ = 85th and $T_{2}$ = 75th percentile, which is only 0.2% higher than ours. These results indicate that the selected values perform satisfactorily.
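Deriving the thresholds from percentiles of normal validation data can be sketched as follows. The helper `percentile` uses linear interpolation between order statistics, which is one common convention and an assumption here; the score lists are hypothetical, and "probabilities" is taken to mean the attack probability output by the RF.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical validation scores for normal traffic only.
val_normal_probs = [0.00, 0.01, 0.02, 0.02, 0.03, 0.05,
                    0.08, 0.10, 0.20, 0.40, 0.60]
val_normal_mses = [0.010, 0.012, 0.015, 0.020, 0.022,
                   0.025, 0.030, 0.040, 0.050]

t1 = percentile(val_normal_probs, 90)  # 90th percentile of RF attack probabilities
t2 = percentile(val_normal_mses, 75)   # 75th percentile of AE reconstruction MSEs

# The 5 x 5 grid of percentile pairs examined in the sensitivity study.
grid = {(q1, q2): (percentile(val_normal_probs, q1),
                   percentile(val_normal_mses, q2))
        for q1 in (80, 85, 90, 95, 99)
        for q2 in (65, 70, 75, 80, 85)}
print(t1, t2, len(grid))
```

Because both thresholds are computed from normal samples alone, this selection needs no examples of the unknown attack, which is the constraint that rules out tuning them directly on the validation set.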

5. Conclusions

With the increasing risks of network attacks, network environments need more powerful IDSs to protect them. As more and more attacks appear every day, it is important to handle the issues presented by unknown attacks. In this study, we develop a hybrid IDS to boost the detection rate when dealing with unknown attacks.

In detail, the proposed method combines RF and AE. Because the unknown attacks may be misclassified, we use the probability output of the RF classifier to check the samples first. Then, the AE is utilized to recheck the attack predicted by RF and reduce the FPR. We conducted experiments on two intrusion detection datasets while setting some attack samples explicitly as unknown. The experimental results prove that the combination method boosts the detection rate and reduces the FPR in comparison to the single detection methods.

Some directions are worth further investigation. Only one type of attack was set as the unknown during the experiments; it is important to set more than one type of attack as the unknown to test the model. In this study, we focus on binary classification. We plan to expand the method into a multi-class approach to provide more diagnostic information for security operators in the future.

Author Contributions

Conceptualization, C.W. and Y.S.; data curation, C.W.; formal analysis, W.W. and H.L.; funding acquisition, B.W.; investigation, C.W. and H.L.; methodology, C.W.; project administration, B.W.; software, C.W. and Y.S.; supervision, H.L. and B.W.; validation, Y.S. and W.W.; visualization, C.W.; writing—original draft, C.W.; writing—review and editing, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the National Key Research and Development Program of China (No.2021YFB2012400).

Data Availability Statement

In this study, we use the intrusion detection dataset reported in [35]. Readers can refer to the corresponding paper for detailed information.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ahmad, Z.; Shahid Khan, A.; Wai Shiang, C.; Abdullah, J.; Ahmad, F. Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 2021, 32, 1–29.
2. Anderson, J.P. Computer Security Threat Monitoring and Surveillance; Technical Report; James P. Anderson Company: Philadelphia, PA, USA, 1980.
3. Vanin, P.; Newe, T.; Dhirani, L.L.; O’Connell, E.; O’Shea, D.; Lee, B.; Rao, M. A Study of Network Intrusion Detection Systems Using Artificial Intelligence/Machine Learning. Appl. Sci. 2022, 12, 11752.
4. Liu, H.; Lang, B. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 2019, 9, 4396.
5. Adnan, A.; Muhammed, A.; Abd Ghani, A.A.; Abdullah, A.; Hakim, F. An Intrusion Detection System for the Internet of Things Based on Machine Learning: Review and Challenges. Symmetry 2021, 13, 1011.
6. Aldallal, A.; Alisa, F. Effective Intrusion Detection System to Secure Data in Cloud Using Machine Learning. Symmetry 2021, 13, 2306.
7. Aldallal, A. Toward Efficient Intrusion Detection System Using Hybrid Deep Learning Approach. Symmetry 2022, 14, 1916.
8. Ingre, B.; Yadav, A.; Soni, A.K. Decision Tree Based Intrusion Detection System for NSL-KDD Dataset. In Proceedings of the Information and Communication Technology for Intelligent Systems (ICTIS 2017); Satapathy, S.C., Joshi, A., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 2, pp. 207–218.
9. Balyan, A.K.; Ahuja, S.; Lilhore, U.K.; Sharma, S.K.; Manoharan, P.; Algarni, A.D.; Elmannai, H.; Raahemifar, K. A Hybrid Intrusion Detection Model Using EGA-PSO and Improved Random Forest Method. Sensors 2022, 22, 5986.
10. Yang, Z.; Wang, B. A Feature Extraction Method for P2P Botnet Detection Using Graphic Symmetry Concept. Symmetry 2019, 11, 326.
11. Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep Learning Approach for Intelligent Intrusion Detection System. IEEE Access 2019, 7, 41525–41550.
12. Li, Z.; Qin, Z.; Huang, K.; Yang, X.; Ye, S. Intrusion Detection Using Convolutional Neural Networks for Representation Learning. In Proceedings of the Neural Information Processing; Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S.M., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 858–866.
13. Rudd, E.M.; Rozsa, A.; Günther, M.; Boult, T.E. A Survey of Stealth Malware Attacks, Mitigation Measures, and Steps Toward Autonomous Open World Solutions. IEEE Commun. Surv. Tutor. 2017, 19, 1145–1172.
14. Song, Y.; Hyun, S.; Cheong, Y.G. Analysis of autoencoders for network intrusion detection†. Sensors 2021, 21, 4294.
15. Magán-Carrión, R.; Urda, D.; Díaz-Cano, I.; Dorronsoro, B. Towards a reliable comparison and evaluation of network intrusion detection systems based on machine learning approaches. Appl. Sci. 2020, 10, 1775.
16. Maseer, Z.K.; Yusof, R.; Bahaman, N.; Mostafa, S.A.; Foozy, C.F.M. Benchmarking of Machine Learning for Anomaly Based Intrusion Detection Systems in the CICIDS2017 Dataset. IEEE Access 2021, 9, 22351–22370.
17. Resende, P.A.A.; Drummond, A.C. A survey of random forest based methods for intrusion detection systems. ACM Comput. Surv. 2018, 51, 1–36.
18. Di Mauro, M.; Galatro, G.; Fortino, G.; Liotta, A. Supervised feature selection techniques in network intrusion detection: A critical review. Eng. Appl. Artif. Intell. 2021, 101, 104216.
19. Abdulhammed, R.; Musafer, H.; Alessa, A.; Faezipour, M.; Abuzneid, A. Features dimensionality reduction approaches for machine learning based network intrusion detection. Electronics 2019, 8, 322.
20. Seo, J.H.; Kim, Y.H. Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection. Comput. Intell. Neurosci. 2018, 2018, 9704672.
21. Zhou, Y.; Cheng, G.; Jiang, S.; Dai, M. Building an efficient intrusion detection system based on feature selection and ensemble classifier. Comput. Netw. 2020, 174, 107247.
22. Chang, Y.; Li, W.; Yang, Z. Network intrusion detection based on random forest and support vector machine. In Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 21–24 July 2017; Volume 1, pp. 635–638.
23. Narayana Rao, K.; Venkata Rao, K.; P.V.G.D., P.R. A hybrid Intrusion Detection System based on Sparse autoencoder and Deep Neural Network. Comput. Commun. 2021, 180, 77–88.
24. Cao, V.L.; Nicolau, M.; McDermott, J. Learning Neural Representations for Network Anomaly Detection. IEEE Trans. Cybern. 2019, 49, 3074–3087.
25. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
26. Mahfouz, A.M.; Abuhussein, A.; Venugopal, D.; Shiva, S.G. Network Intrusion Detection Model Using One-Class Support Vector Machine. In Proceedings of the Advances in Machine Learning and Computational Intelligence; Patnaik, S., Yang, X.S., Sethi, I.K., Eds.; Springer: Singapore, 2021; pp. 79–86.
27. Javed, M.A.; Khan, M.Z.; Zafar, U.; Siddiqui, M.F.; Badar, R.; Lee, B.M.; Ahmad, F. ODPV: An Efficient Protocol to Mitigate Data Integrity Attacks in Intelligent Transport Systems. IEEE Access 2020, 8, 114733–114740.
28. Al-Qatf, M.; Lasheng, Y.; Al-Habib, M.; Al-Sabahi, K. Deep Learning Approach Combining Sparse Autoencoder with SVM for Network Intrusion Detection. IEEE Access 2018, 6, 52843–52856.
29. Kunang, Y.N.; Nurmaini, S.; Stiawan, D.; Zarkasi, A.; Jasmir, F. Automatic Features Extraction Using Autoencoder in Intrusion Detection System. In Proceedings of the 2018 International Conference on Electrical Engineering and Computer Science (ICECOS), Pangkal, Indonesia, 2–4 October 2018; Volume 17, pp. 219–224.
30. Lin, T.H.; Jiang, J.R. Credit card fraud detection with autoencoder and probabilistic random forest. Mathematics 2021, 9, 2683.
31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
33. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
34. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022.
35. Sarhan, M.; Layeghy, S.; Portmann, M. Towards a Standard Feature Set for Network Intrusion Detection System Datasets. Mob. Networks Appl. 2021, 27, 357–370.
36. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, Funchal, Portugal, 22–24 January 2018; pp. 108–116.
37. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
38. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1026–1034.

Figure 1. Illustration of random forest.

Figure 2. The framework of using AE to detect anomalies.

Figure 3. The overview of our proposed method. (a) Training process; (b) testing process.

Figure 4. Confusion matrix of detection results.

Figure 5. The probability of being predicted as an attack for testing samples of the IDS2018 dataset when DoS is the unknown attack.

Figure 6. The MSE of testing samples for IDS2018 dataset.

Figure 7. The detection confusion matrix (in %) of different classifiers on IDS2018 dataset when the unknown attack is DoS. (a) Classification confusion matrix of RF. (b) Classification confusion matrix of RF(Pro). (c) Classification confusion matrix of AE. (d) Classification confusion matrix of combination methods.

Figure 8. F1 (in %) of different detection methods against various unknown attacks.

Figure 9. FPR (in %) of different detection methods against various unknown attacks.

Figure 10. Performance (in %) comparison of various threshold values on the IDS2018 dataset. Combining the five values of $T_{1}$ with the five values of $T_{2}$ yields 25 results. We use the red circle to highlight our results. (a) F1, higher is better. (b) FPR, lower is better.

Table 1. The sample distribution of different attack types for both datasets.

IDS2018			BoT-IoT
No.	Class	Number of Samples	No.	Class	Number of Samples
1	Normal	120,000	1	Normal	65,150
2	DDoS	68,000	2	DoS	42,332
3	DoS	48,000	3	DDoS	21,133
4	Bot	14,000	4	Reconnaissance	13,000
5	Bruteforce	12,000	5	Theft	2133
6	Infiltration	11,000	-	-	-
7	Web	3502	-	-	-

Table 2. The detection performance (in %) of different classifiers for the BoT-IoT dataset. The table reports the five metrics mentioned above. Both the mean value and standard deviation are reported.

Method	Accuracy	Precision	Recall	F1	FPR
DT	$90.97 \pm 13.77$	$99.99 \pm 0.02$	$80.04 \pm 28.14$	$85.86 \pm 20.96$	$0.01 \pm 0.01$
LG	$90.54 \pm 13.86$	$100.00 \pm 0.01$	$78.98 \pm 28.32$	$85.14 \pm 21.15$	$0.00 \pm 0.00$
RF	$94.14 \pm 8.52$	$99.99 \pm 0.01$	$91.11 \pm 12.71$	$94.85 \pm 7.54$	$0.02 \pm 0.01$
DNN	$91.46 \pm 13.62$	$100.00 \pm 0.00$	$81.13 \pm 27.85$	$86.58 \pm 20.84$	$0.00 \pm 0.00$
IF	$65.81 \pm 14.55$	$53.50 \pm 22.80$	$89.33 \pm 8.13$	$64.89 \pm 20.96$	$50.47 \pm 11.42$
OCSVM	$90.76 \pm 5.66$	$95.50 \pm 8.43$	$91.38 \pm 4.99$	$93.10 \pm 4.80$	$7.74 \pm 12.44$
AE	$91.33 \pm 7.10$	$97.47 \pm 2.05$	$89.19 \pm 10.86$	$92.79 \pm 6.38$	$4.94 \pm 3.63$
RF(Pro)	$98.79 \pm 0.66$	$98.23 \pm 1.14$	$99.95 \pm 0.07$	$99.08 \pm 0.58$	$3.77 \pm 1.86$
Ours	$99.63 \pm 0.11$	$99.50 \pm 0.20$	$99.95 \pm 0.07$	$99.72 \pm 0.11$	$1.06 \pm 0.16$

Bold font indicates the best results. The same applies to the following table.

Table 3. Detection performance (in %) of different classifiers for the IDS2018 dataset.

Method	Accuracy	Precision	Recall	F1	FPR
DT	$78.81 \pm 12.79$	$100.00 \pm 0.01$	$36.56 \pm 22.38$	$49.97 \pm 22.85$	$0.00 \pm 0.00$
LG	$65.76 \pm 15.72$	$99.99 \pm 0.03$	$21.86 \pm 17.51$	$33.12 \pm 20.18$	$0.00 \pm 0.00$
RF	$79.41 \pm 14.31$	$98.20 \pm 1.01$	$72.23 \pm 18.85$	$81.72 \pm 14.48$	$2.82 \pm 1.27$
DNN	$61.80 \pm 17.40$	$99.99 \pm 0.02$	$18.98 \pm 15.99$	$29.48 \pm 18.95$	$0.00 \pm 0.00$
IF	$72.19 \pm 6.97$	$63.44 \pm 11.11$	$92.85 \pm 2.58$	$74.97 \pm 8.59$	$46.36 \pm 7.74$
OCSVM	$88.85 \pm 7.61$	$88.77 \pm 11.09$	$93.99 \pm 2.09$	$91.01 \pm 6.92$	$18.16 \pm 12.93$
AE	$91.15 \pm 9.07$	$95.97 \pm 2.95$	$91.29 \pm 11.65$	$93.05 \pm 8.13$	$7.92 \pm 4.67$
RF(Pro)	$93.43 \pm 4.22$	$96.28 \pm 1.85$	$93.89 \pm 7.65$	$94.86 \pm 3.87$	$8.31 \pm 3.16$
Ours	$94.92 \pm 4.79$	$99.09 \pm 0.42$	$93.19 \pm 7.47$	$95.90 \pm 4.27$	$1.81 \pm 0.50$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

