Open Set Recognition for Malware Traffic via Predictive Uncertainty

Li, Xue; Fei, Jinlong; Xie, Jiangtao; Li, Ding; Jiang, Heng; Wang, Ruonan; Qi, Zan

doi:10.3390/electronics12020323


by Xue Li ¹, Jinlong Fei ¹,*, Jiangtao Xie ¹, Ding Li ¹, Heng Jiang ², Ruonan Wang ¹ and Zan Qi ¹

¹ State Key Laboratory of Mathematical Engineering and Advanced Computing, PLA Information Engineering University, Zhengzhou 450001, China

² National Digital Switching System Engineering Technological Research Center, PLA Information Engineering University, Zhengzhou 450001, China

* Author to whom correspondence should be addressed.

Electronics 2023, 12(2), 323; https://doi.org/10.3390/electronics12020323

Submission received: 20 December 2022 / Revised: 2 January 2023 / Accepted: 3 January 2023 / Published: 8 January 2023

(This article belongs to the Special Issue Recent Advances in AI-Enabled Internet of Things Security and Privacy)


Abstract:

Existing machine learning-based malware traffic recognition techniques can effectively detect abnormal behaviors in the network. However, almost all of them focus on a closed-set scenario in which the data used for training and testing come from the same label space. Since sophisticated malware and advanced persistent threats are evolving, it is impossible to exhaust all attacks to train a complete recognition model under the existing technical conditions. Therefore, recognition in the real network is an open-set problem, i.e., the recognition system should identify unknown and unseen attacks at test time. In this paper, we propose an uncertainty-aware method to identify known malicious traffic accurately and handle unknown traffic effectively. This method employs predictive uncertainty in deep learning as an indicator for unknown class detection. The predictive uncertainty represents the confidence in neural network predictions. In particular, the Deep Evidence Malware Traffic Recognition (DEMTR) model is presented to provide the multi-classification probability and predictive uncertainty in open-set scenarios using evidential deep learning. We demonstrate the performance of DEMTR on the MCFP dataset. Experimental results indicate that the proposed model outperforms the baseline methods in accuracy and F₁-score.

Keywords:

machine learning; malware traffic recognition; open-set; predictive uncertainty

1. Introduction

With the rapid development and widespread application of big data, the Internet of Things (IoT), and cloud computing, networks have become pervasive in people’s daily lives. Correspondingly, network attacks have become increasingly frequent, and networks face numerous attacks and threats [1]. The malicious traffic generated by network attacks is among the main network security threats and a key target of network security monitoring. Applying artificial intelligence (AI) to malicious traffic recognition makes full use of the large volumes of traffic and logs in cyberspace and exploits AI’s strength in mining the characteristics and associations of massive data. At present, AI has been applied in many industrial and commercial products, such as Cisco Firepower NGIPS and Nsfocus NGIPS.

Malware traffic recognition aims to classify network traffic containing malicious behaviors into some predefined traffic classes (closed sets) [2]. The existing malware traffic recognition methods use supervised or unsupervised learning to identify network attacks. Supervised learning trains data-driven classifiers on traffic samples of known classes. As a result, it achieves satisfactory results on closed sets, but it does not take into account samples outside the training set [3]. Once a strange sample is submitted to the classifier, it may be misclassified as a predefined class, resulting in a high false alarm rate. Unsupervised learning methods, such as clustering, achieve the goal of traffic classification by gathering unlabeled samples from the same class in the feature space [4]. Therefore, unsupervised learning methods can naturally deal with unknown classes. However, their accuracy is not high enough when dealing with high-dimensional traffic data and their use in practical projects is limited. In addition to more flexible attack means, the number of malware categories is growing rapidly as new polymorphic malware and zero-day threats continuously emerge. Consequently, many malicious attacks remain undiscovered, and the unknown traffic generated by them brings potential threats to network management [5], which becomes the main obstacle to improving the performance of malicious traffic identification systems.

In practice, a deployed classifier will inevitably receive data from categories it has never seen before. Malware traffic recognition in the real world is essentially an open-set recognition problem in which the classifier should accurately identify known malicious traffic and distinguish unknown traffic when it appears. The core of open-set recognition is the ability to distinguish open-set data outside of the K closed-set classes, which is more challenging than closed-set identification and more significant for security-related applications [6]. The difficulty lies in modeling unknown classes without any unknown class instances. Existing efforts on open-set malware traffic recognition are quite limited, with few exceptions [7,8,9]. They mainly use a threshold-based unknown class detection scheme, whose performance depends on how the classifier is trained, since unknown class discovery is performed on top of it. Some studies have proposed using the maximum softmax probability value as an indicator for unknown class detection [9]. Here, the threshold is determined as the lower bound of the maximum softmax value of known class samples. However, the softmax outputs are often falsely high because the softmax function normalizes its outputs to sum to 1, which eventually leads to large numbers of unknown class samples being incorrectly classified as known classes. The Open-CNN method [8] uses the distance between test instances and known class instances in the latent feature space and takes the upper bound of the distance as the threshold. However, the distance function learned by the classifier from the training dataset may not measure the test dataset correctly, which limits its ability to identify unknown classes. These defects degrade open-set recognition performance.

This paper proposes an uncertainty-aware open-set recognition method for malware traffic to accomplish the malicious traffic open-set identification task, which uses the predictive uncertainty in deep learning to discover unknown classes. Furthermore, the Deep Evidence Malware Traffic Recognition (DEMTR) model is presented by combining convolution neural networks with evidence theory. The major drawbacks of existing methods include a falsely high maximum softmax probability value and a weak generalization ability for distance metrics. To deal with these issues so that the model can better discover the unknown class data in the open-set malicious traffic identification task, we transform the original problem into an uncertainty estimation problem. DEMTR simultaneously accomplishes the multi-class classification and uncertainty estimation tasks using a deep neural network (DNN) to predict a Dirichlet distribution of class probabilities. The prediction procedure of DEMTR is described as evidence collection, which affords a foundation for quantifying the predictive uncertainty of diverse malware. In inference, known attacks will incur low uncertainty, while unknown attacks will incur high uncertainty so that the model can identify the unknown class. The main contributions of this paper are as follows:

The uncertainty quantification in deep neural networks is applied to the open-set malicious traffic recognition, which is a solution to the traditional closed-set methods’ inability to identify unknown attacks effectively.
A new model utilizing evidential deep learning is proposed that can quantify multi-class classification probability and predictive uncertainty. Also, the predictive uncertainty is calibrated with a new loss function. The proposed method can identify unknown classes to a certain extent.
Experiments are carried out on real traffic data to verify the validity of the DEMTR model. The proposed model significantly improves accuracy, F₁-score, and other indicators compared with the existing methods.

2. Related Work

The open set recognition (OSR) problem was first identified in the field of face recognition and was then found to be ubiquitous in many fields. To reject unknown classes, the “1-vs-set SVM” method [10] was proposed, which adds an additional hyperplane for each class to restrict the decision space of the class. On this basis, the Weibull calibrated SVM (W-SVM) [11] and PI-SVM [12] were successively proposed to calibrate the class confidence scores using statistical extreme value theory. To overcome the shortcomings of the softmax function in deep neural networks (DNNs), Bendale et al. [13] proposed OpenMax to constrain the open space risk of DNN models; it is the first solution to apply deep learning to open-set recognition problems. To further enhance the unknown-class detection ability of OpenMax, G-OpenMax [14] introduces a generative method to synthesize unknown class instances from known class data and trains DNNs with the synthetic data. Similarly, Generative Adversarial Networks (GANs) are used to simulate representative unknown class instances [15]. However, generative methods are limited by the authenticity and credibility of the generated data, and the classification results obtained by training with generated data are poor.

For highly structured traffic samples, detecting unknown classes is more complex, leading to more challenges for open-set recognition of malicious traffic. Zhang et al. [4] integrated supervised and unsupervised machine learning (ML) techniques to obtain additional confidence scores for determining whether test samples belonged to the unknown class. Bekerman et al. [16] proposed an end-to-end monitoring system named RTC to identify new threats by manually extracting features from different protocols and network layer traffic data. In another effort, Cruz et al. applied W-SVM to open-set intrusion identification and Weibull distribution to fit samples of the decision boundary to limit the open space risk [7]. However, ML relies on manually extracted features, which expend a great deal of labor power and material resources. As DL has a strong ability for feature extraction and automatic learning, the research on unknown attack detection is shifting from ML to DL. Javaid et al. [17] used Sparse Autoencoders to implement unsupervised feature learning, which was validated on the NSL-KDD dataset to detect unpredictable attacks. However, Sparse Autoencoders highly rely on training data, resulting in limited generalization ability. Inspired by RTC, the SEEN [18] approach employs siamese networks to obtain high-dimensional embedding representations of the samples. This approach sets a critical value between the known and unknown classes based on the distance between the samples. Employing the extreme value theory, Yong et al. [8] proposed an Open-CNN model to detect unknown network attacks. An Open-CNN calculates distances between activation vectors of each known class sample and average activation vectors of the class, fits larger distances to obtain an extreme value distribution model, and reassigns the activation vector to explain the unknown class. 
Nevertheless, the cross-entropy loss used by the model cannot directly motivate the class instances to be projected near the average activation vector. As a result, it leads to overlapping areas in the decision space and further reduces the accuracy of closed sets significantly. Besides, the distance function derived from training datasets is probably not the right metric for test sets, resulting in little effect on unknown class recognition.

Uncertainty estimation in deep learning aims to estimate the uncertainty of a prediction (the predictive uncertainty), which is important for safe decision-making in high-risk fields [19]. The most common way is based on separately modeling the uncertainty induced by models and the data. Inadequate knowledge leads to model uncertainty, which is an attribute of models and can be mitigated, while data uncertainty is irreducible because it is an inherent property of data distribution. Recently, deep learning uncertainty estimation has been used for out-of-distribution detection. To this end, Bayesian neural networks (BNNs) have been used to quantify predictive uncertainty [20] as a technique to improve the detection rate of out-of-distribution samples. The results show that the uncertainty has the potential to be used for out-of-distribution sample discovery. Similar to out-of-distribution detection, open set recognition also needs to find samples with semantic deviation. Inspired by out-of-distribution detection, this paper attempted to use deep learning uncertainty to identify malicious traffic in the open set setting. However, BNNs are limited by difficult exact posterior inference and complex sampling operations during uncertainty quantification. To solve this dilemma, the present study adopts evidential deep learning [21] instead of BNNs to help build uncertainty-aware deep learning models, and the uncertainty representation is learned directly without sampling.

3. Method

In this section, we design DEMTR, a malware traffic detection model for open environments. DEMTR learns distinguishable features for malicious traffic recognition through training and models the uncertainty of each prediction to reject unknown samples. Traffic with high uncertainty is considered unknown, while traffic with low uncertainty is classified based on the learned classification probability. The overall framework of the proposed method is shown in Figure 1.

3.1. Problem Definition

Given the training dataset $D_{tr} = \{(x_{i}, y_{i})\}_{i = 1}^{N}$, where $x_{i} \in \mathbb{R}$, $y_{i} \in Y_{tr} = \{1, 2, \dots, k\}$, $x_{i}$ is a traffic session, $y_{i}$ is the label of $x_{i}$, and $N$ represents the total sessions. The testing dataset $D_{te} = \{(x_{i}, y_{i})\}_{i = 1}^{\infty}$, $y_{i} \in Y_{te} = \{1, 2, \dots, k, \dots, K\}$, $K > k$, $D_{te}$ is open, which includes attack categories that are not present in the training set. Our method aims to get a model $M : x \to y$, $x \in D_{te}$, $y \in Y_{os} = \{1, 2, \dots, k, \mathrm{unknown}\}$, where an instance x labeled as unknown is categorized as a new class that did not emerge during the training phase.
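As a rough illustration of the mapping $M$, the sketch below combines the class probabilities with an uncertainty score, following the rule stated at the start of this section (high uncertainty means unknown, otherwise the most probable known class is returned). The function name `open_set_predict`, the `UNKNOWN` label, and the threshold argument are assumptions made for illustration; this section does not specify how the threshold is chosen.

```python
from typing import List, Union

UNKNOWN = "unknown"  # hypothetical label for the rejected class


def open_set_predict(class_probs: List[float],
                     uncertainty: float,
                     threshold: float) -> Union[int, str]:
    """Map one sample to a label in {1, ..., k, unknown}.

    `threshold` is an assumed, user-chosen cut-off on the uncertainty.
    """
    if uncertainty > threshold:
        return UNKNOWN
    # Index of the most probable known class, shifted to labels 1..k.
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    return best + 1
```

With an assumed threshold of 0.5, a confident sample such as `open_set_predict([0.94, 0.03, 0.03], 0.09, 0.5)` is assigned class 1, while a sample with uncertainty 0.99 is rejected as unknown.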

3.2. Data Preprocessing

In this section, the composition of the traffic data used for model training is described in detail. This paper used the raw network traffic packets for network attack detection. Unlike the commonly used manual traffic packet feature extraction methods, this method does not need to design or filter the traffic features to be extracted and can retain all packet information. The Original Flow Data Extraction is shown in Algorithm 1. The data preprocessing includes three key steps, with detailed descriptions presented as follows:

Session splitting: The original traffic files are divided into sessions based on five-tuple information.

Packet processing: This step removes useless and interfering information from the data packets. First, the Ethernet layer is removed: since its three fields contribute little to traffic classification [22], Ethernet-layer data are not used in this paper. Then, IP addresses are anonymized. To keep the model from treating specific IP addresses as a critical factor for attack identification, both the source and destination IP addresses in the network layer header are set to 0.0.0.0 during feature extraction. Finally, the UDP packet header is padded. Because the TCP and UDP headers have unequal lengths (20 bytes and 8 bytes, respectively), the UDP header is padded with 12 bytes of 0x00 to a length of 20 bytes so that the feature structure is uniform.

Algorithm 1. Original Flow Data Extraction
Input: $D = \{x_{i}\}_{i = 1}^{n}$: network traffic pcap files; $n_{1}$: packet_number; $n_{2}$: byte_length
Output: Original flow feature set $Y$
$S \leftarrow \emptyset$; $Y \leftarrow \emptyset$ // Initialize traffic session set $S$ and flow feature set $Y$
for $x_{i} \in D$ do
    Extract packets $P_{i} = \{p_{j}\}_{j = 1}^{m}$ and the five-tuple information $A_{i} = \{q_{j}\}_{j = 1}^{m}$ from the traffic file $x_{i}$
    do
        Gather packets with the same five-tuple information in $P_{i}$ into $s = \{p_{k}\}_{k = 1}^{a}$
        Add $s$ to $S$
    until all packets in $P_{i}$ are selected
end for
for $s \in S$ do
    $i \leftarrow 0$; $flow\_feature \leftarrow \emptyset$
    for $p_{k} \in s$ do
        $p_{k} \leftarrow remove\_Ethernet\_layer(p_{k})$ // Discard the Ethernet layer of the packet
        $p_{k} \leftarrow udp\_padding(p_{k})$ // Pad the UDP header
        if $Length(p_{k}) > n_{2}$ then
            $pkt\_feature \leftarrow p_{k}[0 : n_{2}]$ // Keep the first $n_{2}$ bytes of the packet
        else
            $pkt\_feature \leftarrow zero\_padding(p_{k}, n_{2})$ // Pad the packet to $n_{2}$ bytes
        end if
        Add $pkt\_feature$ to $flow\_feature$; $i \leftarrow i + 1$
        if $i \geq n_{1}$ then break
    end for
    if $i < n_{1}$ then
        $flow\_feature \leftarrow zero\_padding(flow\_feature, n_{1} * n_{2})$ // Pad the flow feature to $n_{1} * n_{2}$ bytes
    end if
    Add $flow\_feature$ to $Y$ // Add the flow feature vector to the feature set $Y$
end for
return $Y$

Feature vectorization: Since each session contains a different number of packets and each packet contains different lengths of bytes, we extract the first $n_{1}$ packets of each session and the first $n_{2}$ bytes of each packet to ensure that the data input to the model has the same dimension. Therefore, the final dimension of the session feature is $n_{1} \times n_{2}$ . If a session contains fewer than $n_{1}$ packets, it is padded with 0; otherwise, only the first $n_{1}$ packets are retained. A similar operation is taken for the bytes of a packet. To achieve optimal malware detection performance, the appropriate hyper-parameters are settled by comparing specific values of packet number and byte length, as detailed in Section 4.3.1.
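The truncation and padding steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the function names are hypothetical, packets are assumed to arrive as `bytes` objects with the Ethernet layer already removed, and placing the 12 zero bytes directly after the 8-byte UDP header is an assumption about where the padding goes.

```python
from typing import List

UDP_HEADER_LEN = 8   # bytes, fixed size of a UDP header
TCP_HEADER_LEN = 20  # bytes, TCP header without options


def pad_udp_header(udp_segment: bytes) -> bytes:
    """Insert 12 zero bytes after the UDP header so that it spans 20 bytes.

    Assumption: `udp_segment` starts at the UDP header.
    """
    pad = bytes(TCP_HEADER_LEN - UDP_HEADER_LEN)
    return udp_segment[:UDP_HEADER_LEN] + pad + udp_segment[UDP_HEADER_LEN:]


def packet_feature(packet: bytes, n2: int) -> bytes:
    """Keep the first n2 bytes of a packet, zero-padding shorter packets."""
    return packet[:n2].ljust(n2, b"\x00")


def session_feature(packets: List[bytes], n1: int, n2: int) -> bytes:
    """Build a fixed-length feature of n1 * n2 bytes from one session."""
    rows = [packet_feature(p, n2) for p in packets[:n1]]
    rows += [bytes(n2)] * (n1 - len(rows))  # pad missing packets with zeros
    return b"".join(rows)
```

Every session then yields exactly $n_{1} \times n_{2}$ bytes regardless of how many packets it contains, which is what allows the sessions to be fed to a network with a fixed input size.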

3.3. DEMTR Model

The softmax function is widely used in existing deep learning models, and the maximum softmax output usually serves as the confidence of the prediction. However, softmax outputs tend to be overconfident, even for wrong predictions [23]. To overcome the limitations of DNNs based on the softmax function, this paper uses evidence-based uncertainty estimation techniques to formalize multi-class classification and uncertainty modeling jointly. By placing a Dirichlet distribution on the class probabilities, we treat the predictions of a neural net as subjective opinions and learn, from data, the function that collects the evidence leading to these opinions using a deterministic neural net. The structure of the DEMTR model is shown in Figure 2. As can be seen, DEMTR is split into two logical parts, namely, evidence generation and result derivation.

Evidence refers to the support collected from data that favors classifying a sample into a certain class. In this study, a one-dimensional convolutional neural network (1D-CNN) is used to generate an evidence vector because neural networks can capture evidence from input data to induce classification opinions. Besides, a CNN is more suitable for processing data with higher feature dimensions (e.g., images, texts, and encrypted traffic) compared with other deep learning models because it has the characteristics of parameter sharing and sparse connection and is good at extracting the data’s local features [24]. Unlike a 2D-CNN, a 1D-CNN does not need its inputs reshaped into two dimensions and can retain the maximum information of the original data, which is conducive to the classification of encrypted traffic. As shown in Figure 2, the evidence generation part includes two convolutional layers, a pooling layer, and two fully connected layers. The convolutional layers extract distinguishable characteristics from the input and divide the global feature information into multiple local feature matrices, while the pooling layer performs dimension reduction and feature compression. In this study, we adopted maximum pooling (i.e., the maximum value of a local region is selected as the representative of that region). Then, the two fully connected layers map the latent space calculated by the previous layers to the label space and alleviate the impact of feature location on the classification results by integrating multidimensional feature vectors into several values. Vectorized session data are passed through the convolutional, pooling, and fully connected layers in sequence and then transformed into evidence. In particular, given a sample $x^{(i)}$ for K-class classification, the corresponding evidence $e^{(i)}$ is denoted as:

$$e^{(i)} = g(f(x^{(i)}; \theta)) \tag{1}$$

where $f(\cdot)$ with parameters $\theta$ is learned by the neural network and $g(\cdot)$ is an evidence function that keeps the evidence $e^{(i)}$ non-negative. The evidence function can be implemented by an activation function (e.g., ReLU or Sigmoid) to ensure that the network outputs a non-negative evidence vector $e^{(i)}$.

In the result derivation part, evidential deep learning is applied to quantify the classification uncertainty. This type of learning can model both the classification probability and the overall uncertainty. Subjective logic [25] treats the multi-class classification problem as a belief mass assignment problem, assuming that the overall belief mass is constant. For a K-class classification problem, the belief mass is divided into K + 1 shares: one belief mass for each of the K classes and one share for the overall uncertainty of the current prediction. Each share is non-negative, and the K + 1 values sum to 1:

$$u + \sum_{k = 1}^{K} b_{k} = 1 \tag{2}$$

where $u \geq 0$ denotes the overall uncertainty and $b_{k} \geq 0$ is the belief mass of class $k$.

Subjective logic converts the learned evidence $e^{(i)}$ into the concentration parameters of a Dirichlet distribution through $\alpha^{(i)} = e^{(i)} + 1$. The Dirichlet distribution is the conjugate prior of the categorical distribution, so the DNN can present uncertainty while outputting the prediction results. The resultant predictor for a multi-class classification problem is another Dirichlet distribution whose parameters are set by the continuous output of the DNN. In the result derivation step, the concentration parameters of the Dirichlet distribution need to be determined. These parameters have a direct bearing on the uncertainty of the prediction results. For a sample $x^{(i)}$ with an evidence vector $e^{(i)} = [e_{1}^{(i)}, \dots, e_{K}^{(i)}]$, the Dirichlet distribution $\mathrm{Dir}(p^{(i)} \mid \alpha^{(i)})$ is derived with parameters $\alpha^{(i)} \in \mathbb{R}^{K}$, $\alpha^{(i)} = [\alpha_{1}^{(i)}, \dots, \alpha_{K}^{(i)}]$. Then, the belief mass $b_{k}^{(i)}$ and uncertainty $u^{(i)}$ are calculated as follows:

$$b_{k}^{(i)} = \frac{e_{k}^{(i)}}{S^{(i)}} = \frac{\alpha_{k}^{(i)} - 1}{S^{(i)}}, \quad u^{(i)} = \frac{K}{S^{(i)}} \tag{3}$$

where $S^{(i)} = \sum_{k = 1}^{K} \alpha_{k}^{(i)}$ is the total strength of the Dirichlet distribution. From Equation (3), it can be inferred that the larger the amount of evidence obtained for a certain class, the higher its belief mass. In contrast, the uncertainty is inversely proportional to the Dirichlet strength $S^{(i)}$, which equals the total observed evidence plus K, so the smaller the total amount of evidence, the greater the uncertainty.

A standard neural network classifier delivers a definite probability assignment of the possible classes to which a given sample belongs. However, the Dirichlet distribution parameterized on evidence denotes the density of each such probability assignment. Thus, it models second-order probability and uncertainty. The expected probability that $x^{(i)}$ is classified as the kth class equals the mean of the corresponding Dirichlet distribution, which is calculated as:

$$p_{k} = \frac{\alpha_{k}}{S} \tag{4}$$

For clarity, the above formulas are illustrated with a three-class classification task. Given evidence $e = \langle 30, 0, 0 \rangle$, the Dirichlet concentration parameter is $\alpha = \langle 31, 1, 1 \rangle$, which yields the class probability $p = \langle 0.94, 0.03, 0.03 \rangle$ and the uncertainty $u = 0.09$: sufficient evidence has been observed to make a confident prediction. In contrast, given $e = \langle 0.01, 0.01, 0.01 \rangle$, the Dirichlet concentration parameter is $\alpha = \langle 1.01, 1.01, 1.01 \rangle$, so the uncertainty $u$ is about 1. The evidence is highly insufficient, leading to a doubtful classification result. When $e = \langle 1, 1, 1 \rangle$, the uncertainty is still high ($u = 3/6 = 0.5$), although it is reduced compared with the second case.
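The three cases above can be checked numerically. The sketch below applies Equations (3) and (4) to an evidence vector; the function names are illustrative, and ReLU is used as one possible choice of the evidence function $g(\cdot)$.

```python
from typing import List, Tuple


def relu_evidence(logits: List[float]) -> List[float]:
    """One possible evidence function g: clamp raw outputs at zero."""
    return [max(0.0, z) for z in logits]


def dirichlet_opinion(evidence: List[float]) -> Tuple[List[float], List[float], float]:
    """Return (belief masses b, expected probabilities p, uncertainty u)."""
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]        # alpha = e + 1
    strength = sum(alpha)                      # S = sum_k alpha_k
    belief = [e / strength for e in evidence]  # b_k = e_k / S, Eq. (3)
    prob = [a / strength for a in alpha]       # p_k = alpha_k / S, Eq. (4)
    uncertainty = num_classes / strength       # u = K / S, Eq. (3)
    return belief, prob, uncertainty


for e in ([30.0, 0.0, 0.0], [0.01, 0.01, 0.01], [1.0, 1.0, 1.0]):
    b, p, u = dirichlet_opinion(e)
    print(e, [round(x, 2) for x in p], round(u, 2))
```

For the three evidence vectors the computed uncertainties round to 0.09, 0.99, and 0.5, and in each case the belief masses and the uncertainty sum to 1, as required by Equation (2).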

3.4. Training and Optimization

This section focuses on how to train a neural network to obtain classification evidence for each sample. The evidence is used to calculate the corresponding classification probability and the overall uncertainty. When a feature of the sample is associated with one of the K classes, the corresponding evidence is added, and the Dirichlet distribution is updated based on this finding. In this respect, specific patterns in network traffic samples may help classify them into a particular class, further illustrated by the example of remote-control malware njRAT. The traffic generated by the remote-control malware njRAT typically has characteristics such as more upstream traffic than downstream traffic and an increased proportion of packets with a PSH flag and SYN flag. If the network traffic has these characteristics, it is necessary to increase the Dirichlet concentration parameter corresponding to the njRAT class.

Traditional neural network classifiers typically use the cross-entropy loss to guide training. The cross-entropy loss is expressed as:

$$L_{ce}^{(i)}(y^{(i)}, p^{(i)}; \theta) = -\sum_{k = 1}^{K} y_{k}^{(i)} \log(p_{k}^{(i)}) \tag{5}$$

where $p_{k}^{(i)}$ is the predictive probability that $x^{(i)}$ belongs to the kth class. In the proposed DEMTR model, given the evidence $e^{(i)}$ output by the neural network for $x^{(i)}$, the parameter $\alpha^{(i)}$ of the Dirichlet distribution $\mathrm{Dir}(p^{(i)} \mid \alpha^{(i)})$ can be obtained. The cross-entropy loss is adjusted by taking its expectation over this Dirichlet distribution, so that the model produces more evidence for the correct class of each sample. The modified loss function reduces to the following form:

$$L_{mce}^{(i)}(y^{(i)}, \alpha^{(i)}; \theta) = \int \left[ -\sum_{k = 1}^{K} y_{k}^{(i)} \log(p_{k}^{(i)}) \right] \frac{1}{B(\alpha^{(i)})} \prod_{k = 1}^{K} \left(p_{k}^{(i)}\right)^{\alpha_{k}^{(i)} - 1} \, dp^{(i)} = \sum_{k = 1}^{K} y_{k}^{(i)} \left( \psi(S^{(i)}) - \psi(\alpha_{k}^{(i)}) \right) \tag{6}$$

where $\psi(\cdot)$ is the digamma function and $B(\cdot)$ is the multivariate beta function that normalizes the Dirichlet density.
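The closed form on the right-hand side of Equation (6) is straightforward to evaluate. The following sketch is illustrative only: because the Python standard library has no digamma function, it includes a small approximation (recurrence plus asymptotic series) that is an implementation choice of this example, and the function names and the one-hot label encoding are assumptions.

```python
import math
from typing import List


def digamma(x: float) -> float:
    """Approximate psi(x) for x > 0 (recurrence + asymptotic series)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1 / x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def mce_loss(y_onehot: List[float], evidence: List[float]) -> float:
    """Per-sample L_mce: sum_k y_k * (psi(S) - psi(alpha_k))."""
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    return sum(y * (digamma(strength) - digamma(a))
               for y, a in zip(y_onehot, alpha))
```

For a sample of class 1 with evidence ⟨30, 0, 0⟩ the loss is ψ(33) − ψ(31) = 1/31 + 1/32 ≈ 0.064, whereas the same evidence placed on a wrong class gives a loss above 4, so minimizing the loss pushes evidence toward the correct class.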

The DEMTR model trained with $L_{mce}$ can give the classification probability and predictive uncertainty. However, since its uncertainty has not been calibrated, using it directly for unknown-class recognition may be unreliable. A well-calibrated model should be certain when its prediction is accurate and give high uncertainty when its prediction may be inaccurate. Moreover, it has been shown that the miscalibration of neural networks is related to the over-fitting of the negative log-likelihood [26]. Since the DEMTR objective in Equation (6) is equivalent to minimizing the negative log-likelihood, the trained model is likely to be over-fitted with poor generalization for open-set malware traffic recognition tasks. To calibrate the DEMTR model, we follow the principles of [27] and maximize the Accuracy versus Uncertainty (AvU) utility function:

$$\mathrm{AvU} = \frac{n_{AC} + n_{IU}}{n_{AC} + n_{AU} + n_{IC} + n_{IU}} \tag{7}$$

where $n_{AC}$, $n_{AU}$, $n_{IC}$, and $n_{IU}$ denote the number of samples in the following four cases: accurate and certain (AC), accurate and uncertain (AU), inaccurate and certain (IC), and inaccurate and uncertain (IU), respectively. Figure 3 shows a toy example of the four possible model outputs. To calibrate the predictive uncertainty, the model is encouraged to learn a skewed and sharp Dirichlet distribution for accurate predictions (see Figure 3a) and an unbiased and flat Dirichlet distribution for incorrect predictions (see Figure 3d). To this end, we propose regularizing the model training process by maximizing the expectations of AC and IU cases. We establish a logarithm constraint between the maximum class probability and uncertainty to maximize the AvU function, defining $L_{AvU}$ as:

$$L_{AvU}^{(i)} = -\log\left( p_{m}^{(i)} (1 - u^{(i)}) + (1 - p_{m}^{(i)}) u^{(i)} \right) \tag{8}$$

where $p_{m}^{(i)}$ is the maximum class probability of $x^{(i)}$ and $u^{(i)}$ is the corresponding evidential uncertainty. The class probability $p_{k}^{(i)}$ should converge to 1 when the model predictions are accurate and to 0 otherwise. Similarly, when the model predictions are certain, the uncertainty $u^{(i)}$ should converge to 0, whereas $u^{(i)} \approx 1$ when they are uncertain. $L_{AvU}$ is 0 only if all accurate predictions are certain and all inaccurate predictions are uncertain. The AvU loss function is designed as an additional penalty term that improves uncertainty calibration in conjunction with existing loss functions. In summary, the objective of the DEMTR model is:

$$L_{DEMTR} = \sum_{i = 1}^{N} \left( L_{mce}^{(i)} + L_{AvU}^{(i)} \right) \tag{9}$$

where N denotes the total number of samples in the training set.
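Equations (7) and (8) can be written out as follows. This is a hedged sketch: the function names are hypothetical, the small `eps` guard against log(0) is an addition of this example, and how a prediction is declared "certain" (for instance, by thresholding the uncertainty) is left to the caller because the text does not fix it.

```python
import math
from typing import List, Tuple


def avu_loss(p_max: float, uncertainty: float) -> float:
    """Per-sample L_AvU of Eq. (8); eps guards against log(0)."""
    eps = 1e-12
    agreement = p_max * (1.0 - uncertainty) + (1.0 - p_max) * uncertainty
    return -math.log(max(agreement, eps))


def avu_score(samples: List[Tuple[bool, bool]]) -> float:
    """AvU of Eq. (7) from (is_accurate, is_certain) flags per sample."""
    n_ac = sum(1 for accurate, certain in samples if accurate and certain)
    n_iu = sum(1 for accurate, certain in samples if not accurate and not certain)
    return (n_ac + n_iu) / len(samples)
```

The loss vanishes for a fully confident, fully certain prediction (`avu_loss(1.0, 0.0)`) and grows as the maximum class probability and the uncertainty disagree, for example `avu_loss(0.9, 0.9)` is about 1.7.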

4. Experiments and Analysis

4.1. Data Set and Experimental Environment

This study evaluates the proposed approach using the MCFP dataset [28]. MCFP consists of raw traffic data collected from real network environments and stores the data in the form of PCAP files. Besides, MCFP covers multiple types of malicious software with a great amount of data. In this study, 20 types of attack traffic are randomly selected from MCFP, 10 making a known class dataset and the rest forming an unknown class dataset. Furthermore, to maximize known data utilization while accurately reflecting the effectiveness of the model, the known class dataset is separated into the training set, validation set, and known class test set in the proportion 8:1:1, as suggested by [29]. The unknown class dataset is devoted entirely to tests. The details of known and unknown attacks used in experiments are described in Table 1.
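The 8:1:1 partition of the known class data can be sketched as below. The helper name, the fixed seed, and the use of a plain random shuffle are assumptions of this example; the text does not state whether the split is stratified by class.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_811(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split samples into train/validation/test at 8:1:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test
```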

4.2. Evaluation Metrics

Since the proposed DEMTR is essentially a (K + 1)-class classification model, accuracy and F₁-score are used as quantitative indicators of the model’s performance in the open-set setting. These indicators are defined below.

Accuracy: The proportion of tested samples that the model classifies correctly.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

F₁-Score: Precision and recall are integrated into F₁-Score and share the same weight. Notably, F₁-score is a more effective evaluation index that seeks to strike a balance between precision and recall and can reflect the comprehensive performance of the model.

F_{1} - Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(11)

Recall = \frac{TP}{TP + FN}

(12)

Precision = \frac{TP}{TP + FP}

(13)

Assuming that $C$ represents a class in the test set, $TP$ is the number of samples correctly marked as $C$, $TN$ denotes the number of samples correctly classified as non-$C$, $FP$ is the number of samples incorrectly classified as $C$, and $FN$ counts the number of samples incorrectly classified as non-$C$.
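Equations (10) to (13) translate directly into code. The sketch below is a minimal per-class implementation; the function names and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (10): fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Equation (12): fraction of true class-C samples that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Equation (13): fraction of predicted class-C samples that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def f1_score(tp, fp, fn):
    """Equation (11): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With TP = 80, TN = 90, FP = 10, and FN = 20, the accuracy is 0.85 and the recall is 0.8, while the F₁-score equals 2TP / (2TP + FP + FN) = 160/190.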

4.3. Experimental Results and Analysis

The experiments in this study consist of three stages: The initial stage aims to determine the specific values of the hyper-parameters involved in data preprocessing in Section 3.2. In the second and third stages, based on the work in stage 1, the MCFP dataset is used to compare the proposed DEMTR with works in uncertainty estimation and open-set malware traffic recognition, respectively.

4.3.1. Comparison of Hyper-Parameters in Feature Extraction

The data preprocessing step in this paper involves a pair of hyper-parameters $(n_{1}, n_{2})$ that determine the spatial features extracted from the attack traffic; their product is the length of the spatial feature vector extracted from each session. To meet the input requirements of a 2D-CNN model, some researchers chose this length to be a perfect square, such as 784 (28 × 28) or 1521 (39 × 39), and constructed a square two-dimensional feature matrix. In this research, since a 1D-CNN is used to process spatial features, there is no need to consider this constraint.
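To make the role of the two hyper-parameters concrete, the sketch below turns one session into a byte vector of length n1 × n2 by keeping the first n1 packets and the first n2 bytes of each. The function name and the zero-padding of short packets and short sessions are assumptions; the exact procedure is the one defined in Section 3.2.

```python
def session_to_vector(packets, n1, n2):
    """Flatten a session (list of bytes objects) to exactly n1 * n2 bytes.

    Keeps the first n1 packets and the first n2 bytes of each packet.
    Zero-padding is an illustrative assumption.
    """
    rows = []
    for pkt in packets[:n1]:
        trimmed = pkt[:n2]
        rows.append(trimmed + bytes(n2 - len(trimmed)))  # pad short packets
    while len(rows) < n1:
        rows.append(bytes(n2))                           # pad short sessions
    return b"".join(rows)
```

Because the output is a flat vector, any pair of values is usable with a 1D-CNN; no square layout is required.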

The hyper-parameter $n_{1}$ represents the number of packets selected from each session. The statistical analysis of the traffic sessions in the dataset reveals that more than 95% of the sessions have more than 5 packets, and more than 95% have fewer than 30 packets. Therefore, the range of $n_{1}$ is [5, 30] with an interval of 5, and there are six different options {5, 10, 15, 20, 25, 30}. The hyper-parameter $n_{2}$ indicates the length of intercepted bytes of each packet. The maximum transmission unit specified by the Ethernet protocol constrains packet length, which cannot exceed 1500 bytes. The protocol headers in the network layer and transport layer of encrypted traffic packets are retained because they are useful for attack identification. To cover the protocol header data and the transport layer payload of some specific data transmission steps as much as possible, $n_{2}$ ranges from 100 to 1500 with an interval of 200 in this experiment, and there are eight possible cases, i.e., {100, 300, 500, 700, 900, 1100, 1300, 1500}. There are 48 combinations of $n_{1}$ and $n_{2}$, and traversing the combinations can capture the dependencies between $n_{1}$ and $n_{2}$ to obtain better recognition performance.
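The search space described above can be enumerated in a few lines. The variable names are illustrative, and the training and evaluation step that would run for each pair is omitted.

```python
from itertools import product

N1_OPTIONS = list(range(5, 31, 5))        # packets per session: 5, 10, ..., 30
N2_OPTIONS = list(range(100, 1501, 200))  # bytes per packet: 100, 300, ..., 1500

# Every (n1, n2) pair that the grid search visits.
GRID = list(product(N1_OPTIONS, N2_OPTIONS))

# Length of the feature vector fed to the model for each pair.
FEATURE_LENGTHS = {(n1, n2): n1 * n2 for n1, n2 in GRID}
```

The grid contains 6 × 8 = 48 pairs, with feature lengths ranging from 500 bytes for (5, 100) to 45,000 bytes for (30, 1500).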

For the malware traffic dataset, Table 2 presents the F₁-score of the DEMTR model for different $(n_{1}, n_{2})$. As an overall trend, the F₁-score rises with the number of packets selected from a traffic session and the number of bytes intercepted from each packet: feeding more packets and bytes into the model tends to give better results at the cost of longer training time. However, the experimental results show that the F₁-score does not increase monotonically with $(n_{1}, n_{2})$. Moreover, blindly increasing the dimensions of the extracted features may cause the effective features to interfere with one another, thereby weakening the model's ability to identify malicious traffic. Among all the cases listed in Table 2, the F₁-score peaks at 72.43 when $n_{1}$ = 30 and $n_{2}$ = 1500, and the second-best F₁-score, 66.54, is achieved at $n_{1}$ = 20 and $n_{2}$ = 500. Considering the effort in feature extraction and the time and resource requirements in training, $(n_{1}, n_{2})$ is selected as (20, 500).
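The ranking behind this choice can be checked against the values in Table 2. The sketch below copies the table into a dictionary and sorts the 48 configurations by F₁-score; the variable names are illustrative, and the cost column (feature length n1 × n2) is an added proxy for extraction and training effort rather than a quantity reported in the paper.

```python
N2_VALUES = [100, 300, 500, 700, 900, 1100, 1300, 1500]

# F1-scores from Table 2, keyed by n1; columns follow N2_VALUES.
TABLE2 = {
    5:  [46.50, 51.19, 46.85, 46.59, 48.82, 53.35, 47.19, 54.04],
    10: [57.78, 57.77, 61.39, 55.41, 55.85, 59.45, 59.63, 51.08],
    15: [49.97, 60.57, 64.85, 60.73, 64.99, 57.61, 57.40, 59.13],
    20: [59.81, 58.80, 66.54, 66.24, 55.14, 65.16, 65.67, 57.04],
    25: [48.41, 62.88, 64.76, 65.21, 52.49, 60.81, 55.55, 59.22],
    30: [56.17, 60.20, 64.31, 65.35, 61.71, 64.61, 64.05, 72.43],
}

# (f1, n1, n2, feature_length) for every configuration, best F1 first.
RANKED = sorted(
    ((f1, n1, n2, n1 * n2)
     for n1, row in TABLE2.items()
     for n2, f1 in zip(N2_VALUES, row)),
    reverse=True,
)

best, second = RANKED[0], RANKED[1]
```

Here `best` is the (30, 1500) configuration with a 45,000-byte input, and `second` is (20, 500) with a 10,000-byte input, which is the trade-off the selection rests on.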

In the hyper-parameter comparison experiments, only 800 samples were randomly selected for each class to shorten running time and save running resources. Although the training samples were insufficient for the model to completely fit data distribution, it still captured the impact of hyper-parameters on model identification performance and gave the optimal option for hyper-parameters.

4.3.2. Comparison in Uncertainty Estimation

To prove that the predictive uncertainty derived from the DEMTR model can distinguish between known classes and unknown classes, this paper compares it with two representative uncertainty estimation methods, i.e., BNN SVI [30] and MC Dropout [31]. BNN SVI provides predictive uncertainty by approximating the posterior distribution of the neural network parameters. On the other hand, MC Dropout uses Dropout as a regularization term and repeats the inference several times to calculate the predictive uncertainty. The unknown detection performance can be evaluated by the histogram statistics in Figure 4. According to the figure, the uncertainty intervals of known class samples and unknown class samples are highly overlapping in BNN SVI and MC Dropout. In comparison, the proposed method assigns smaller uncertainty to known class instances and larger uncertainty to unknown class instances, which can better distinguish the known class from the unknown class using uncertainty.
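For reference, the predictive uncertainty of an evidential model can be computed in closed form from a single forward pass, unlike MC Dropout, which needs repeated inference. The sketch below assumes the subjective-logic formulation of evidential deep learning [21], in which non-negative evidence e_k gives Dirichlet parameters α_k = e_k + 1, strength S = Σα_k, expected probability α_k / S, and uncertainty u = K / S; the function name is hypothetical.

```python
def evidential_output(evidence):
    """Map per-class evidence to (expected probabilities, uncertainty).

    Assumes alpha_k = e_k + 1 and u = K / sum(alpha), as in evidential
    deep learning; `evidence` is a list of non-negative floats.
    """
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = k / strength
    return probs, uncertainty
```

With no evidence for any class the uncertainty is 1 and the probabilities are uniform, whereas strong evidence for one class drives the uncertainty towards 0. This is the behaviour one would hope to see for unknown and known samples, respectively.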

4.3.3. Comparison in Malware Traffic Open Set Recognition

We compare the proposed DEMTR with state-of-the-art malware traffic recognition models to verify its effectiveness. The baseline models are: (i) CNN, a one-dimensional convolutional neural network with a softmax output that uses the softmax probability directly as a confidence score and filters out low-confidence samples to adapt to open-set scenarios; (ii) CNN_LSTM [32], a state-of-the-art deep learning-based intrusion detection model; and (iii) Open-CNN [8], a model that applies statistical extreme value theory and a convolutional neural network to unknown attack detection.

The proposed model is trained using L_DEMTR instead of the traditional cross-entropy loss for 20 iterations, and the batch size is set to 256. The proposed model uses the Adam optimizer with an initial learning rate of 0.0001, and the learning rate decays every 7 iterations. The threshold of CNN and DEMTR is determined by ensuring that 95% of training data is recognized as known. The Open-CNN and CNN_LSTM implementations are based on the corresponding literature. All models were trained on the training set and tested on single-type unknown attacks and multiple types of unknown attacks.
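The thresholding rule can be sketched as follows for an uncertainty-based model: the threshold is the smallest uncertainty value such that at least 95% of the training samples fall at or below it, and any test sample above it is labelled unknown. The function names, the label -1 for the unknown class, and the use of an empirical quantile are assumptions made for illustration.

```python
import math

def uncertainty_threshold(train_uncertainties, keep_fraction=0.95):
    """Smallest value t such that at least keep_fraction of the
    training uncertainties satisfy u <= t."""
    ordered = sorted(train_uncertainties)
    idx = math.ceil(keep_fraction * len(ordered)) - 1
    return ordered[max(idx, 0)]

def open_set_predict(class_probs, uncertainty, threshold, unknown_label=-1):
    """Return the arg-max known class, or unknown_label if too uncertain."""
    if uncertainty > threshold:
        return unknown_label
    return max(range(len(class_probs)), key=class_probs.__getitem__)
```

For a softmax baseline, the same rule would presumably be applied to confidence rather than uncertainty, rejecting samples whose maximum probability falls below the threshold.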

Tests on Single-Type Unknown Attacks

To evaluate the performance of the proposed DEMTR model in detecting a single type of unknown attack, a type of unknown attack is added to the known class test set in each round of experiments. Moreover, the accuracy and F₁-score are calculated to reflect the unknown class recognition performance. Table 3 displays the experimental results of the proposed DEMTR model and baseline models.

As can be seen from Table 3, the proposed DEMTR model has the best accuracy and F₁-score for each class of unknown attacks. Compared with the state-of-the-art malware traffic identification method CNN_LSTM, the unknown class detection performance of the proposed DEMTR model is greatly improved, with a relative increase in accuracy of up to about 70% (CoinMiner, from 49.18% to 83.73%). Nevertheless, accuracy is easily affected by class imbalance, and the F₁-score better reflects the overall recognition performance. The F₁-score of the DEMTR model is up to about 21% higher in relative terms than that of CNN_LSTM (CoinMiner, from 71.02% to 85.65%). These results demonstrate its effectiveness in the unknown attack recognition task. In addition, it is worth mentioning that CNN_LSTM is a closed-set classification model that misclassifies all unknown class instances appearing in the testing phase as known classes, so it performs far worse than the proposed DEMTR model.
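The improvement figures quoted above are consistent with relative gains computed from Table 3. The sketch below reproduces that calculation for DEMTR against CNN_LSTM; reading the figures as relative (rather than absolute percentage-point) gains is an interpretation, and the variable names are illustrative.

```python
# Per attack: (CNN_LSTM accuracy, DEMTR accuracy, CNN_LSTM F1, DEMTR F1), in %.
TABLE3 = {
    "njRAT":        (50.19, 72.50, 70.44, 81.44),
    "HTBot":        (49.83, 71.14, 67.86, 78.28),
    "Hancitor":     (89.18, 89.85, 85.68, 90.18),
    "CoinMiner":    (49.18, 83.73, 71.02, 85.65),
    "WebCompanion": (68.33, 84.39, 78.43, 87.97),
    "WannaCry":     (53.79, 90.44, 79.18, 90.57),
    "Simda":        (59.99, 73.46, 75.17, 85.25),
    "Miuref":       (74.38, 79.17, 80.43, 84.69),
    "neeris":       (58.90, 64.86, 75.56, 80.16),
    "Vawtrak":      (52.35, 69.33, 72.01, 78.28),
}

def relative_gain(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0

acc_gain = {k: relative_gain(v[0], v[1]) for k, v in TABLE3.items()}
f1_gain = {k: relative_gain(v[2], v[3]) for k, v in TABLE3.items()}

best_acc_attack = max(acc_gain, key=acc_gain.get)
best_f1_attack = max(f1_gain, key=f1_gain.get)
```

Both maxima occur for CoinMiner: roughly 70% for accuracy and roughly 21% for the F₁-score. In absolute terms the largest accuracy gap is about 37 percentage points (WannaCry).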

Comparing the three open-set models (CNN, Open-CNN, and DEMTR), the DEMTR model has the best recognition performance, the Open-CNN model the second best, and the CNN model the worst. The CNN model's low accuracy and F₁-score indicate that it is unreasonable to use the prediction probability directly as the criterion for judging the unknown class. The explanation is that a softmax-based neural network assigns high confidence even to misclassified samples, i.e., it is overconfident in its predictions. Open-CNN can identify the unknown class to some extent because it uses the OpenMax layer and outputs the predicted probability of an unknown class.

Tests on Multiple Types of Unknown Attacks

To investigate the impact of unknown attacks of different classes on the proposed algorithm, we drew the curve of the F₁-score and openness to show the change in the F₁-score with the increase in openness. Openness is an important concept in open-set recognition problems that indicates how “open” the problem is. In this experiment, if N and K are used to represent the number of known and unknown classes, the openness can be more accurately expressed as:

\text{openness} = 1 - \sqrt{\frac{2 N}{2 N + K}}

(14)

In the experiment, the recognition classifier is trained on the known class training set, and unknown classes are then added gradually to the known class test set for testing. As there are 10 unknown classes, K ranges from 0 to 10, with a larger value implying a larger openness. For each openness value, K new classes are randomly selected from the unknown class test set, and the final F₁-score is calculated by averaging over ten random selections.
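Equation (14) can be evaluated for the setting used here, with N = 10 known classes and K between 0 and 10 unknown classes. The function and variable names below are illustrative.

```python
import math

def openness(n_known, k_unknown):
    """Equation (14): 1 - sqrt(2N / (2N + K))."""
    return 1.0 - math.sqrt(2 * n_known / (2 * n_known + k_unknown))

# Openness for each point on the curve: N = 10 known classes, K = 0..10.
OPENNESS_POINTS = [openness(10, k) for k in range(11)]
```

The openness is 0 when no unknown class is present and grows monotonically to about 0.18 when all 10 unknown classes are included, which bounds the horizontal range over which the models are compared.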

Figure 5 demonstrates that the proposed DEMTR model achieves the best performance. According to Figure 5, when the openness is 0 (i.e., no unknown classes have been added yet), the F₁-scores of the CNN model and the CNN_LSTM model exceed 95%. This indicates that traditional deep learning-based malware traffic recognition models can achieve good performance when the test data does not contain any unknown class. However, once unknown classes are added to the test set, the recognition performance of CNN and CNN_LSTM rapidly deteriorates. As the proportion of unknown classes in the test set gradually increases with the increase in openness, the F₁-score curves of the four models all decline. However, the proposed method has the smallest decrease, and the gap with the other methods keeps widening, which demonstrates its robustness in detecting unknown classes. It is worth noting that the closed-set performance of Open-CNN is considerably lower than that of the other methods. The reason is that Open-CNN directly modifies the activation layer vector and adds a new class named unknown at inference, which may negatively affect the accurate prediction of known class instances.

5. Conclusions

In this study, an uncertainty-aware open-set recognition method for malware traffic is proposed to solve the problem that traditional closed-set traffic classification models misclassify unknown classes with high confidence. First, using the original traffic data as features for malicious traffic identification can retain all the feature information of the original packets and avoid information loss caused by manual feature extraction. Next, the Deep Evidence Malware Traffic Recognition model (DEMTR) is built to quantify both multi-classification probability and predictive uncertainty. Predictive uncertainty is used to distinguish unknown samples from known samples. The efficiency of the proposed method is verified experimentally on real traffic datasets. The experimental results prove that the predictive uncertainty generated by DEMTR can better reflect the credibility of predictions, contributing to detecting samples with semantic shifts. Overall, the proposed method adapts well to open-set scenarios while maintaining high performance in traditional closed-set recognition settings.

Author Contributions

Conceptualization, X.L.; methodology, X.L. and J.F.; validation, X.L. and J.X.; data curation, X.L. and R.W.; writing—original draft preparation, X.L.; writing—review and editing, X.L., D.L., and Z.Q.; visualization, X.L. and H.J.; funding acquisition, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Project of China, grant number 2019QY1302.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors would like to express their gratitude to EditSprings (https://www.editsprings.com/ (accessed on 16 December 2022)) for the expert linguistic services provided.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Xu, X.; Hu, Q.; Jie, Z. Cyber Intrusion Detection Based on a Mutative Scale Chaotic Bat Algorithm with Backpropagation Neural Network. Secur. Commun. Netw. 2022.
2. Rezaei, S.; Liu, X. Deep learning for encrypted traffic classification: An overview. IEEE Commun. Mag. 2019, 57, 76–81.
3. Este, A.; Gringoli, F.; Salgarelli, L. Support vector machines for TCP traffic classification. Comput. Netw. 2009, 53, 2476–2490.
4. Zhang, J.; Chen, X.; Xiang, Y.; Zhou, W.; Wu, J. Robust network traffic classification. IEEE/ACM Trans. Netw. 2014, 23, 1257–1270.
5. Feng, Y.Y. Research of Network Intrusion Detection Methods. Ph.D. Thesis, North University, Taiyuan, China, 2021.
6. Rudd, E.M.; Rozsa, A.; Gunther, M.; Boult, T.E. A survey of stealth malware attacks, mitigation measures, and steps toward autonomous open world solutions. IEEE Commun. Surv. Tutor. 2016, 19, 1145–1172.
7. Steve, C.; Coleman, C.; Rudd, E.M.; Boult, T.E. Open set intrusion recognition for fine-grained attack categorization. In Proceedings of the 2017 IEEE International Symposium on Technologies for Homeland Security (HST), Waltham, MA, USA, 25–26 April 2017; IEEE: Piscataway, NJ, USA, 2017.
8. Zhang, Y.; Niu, J.; Guo, D.; Teng, Y.; Bao, X. Unknown network attack detection based on open set recognition. Procedia Comput. Sci. 2020, 174, 387–392.
9. Yin, C.; Zhu, Y.; Fei, J.; He, X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 2017, 5, 21954–21961.
10. Scheirer, W.J.; Rocha, A.; Sapkota, A.; Boult, T.E. Toward open set recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1757–1772.
11. Scheirer, W.J.; Jain, L.P.; Boult, T.E. Probability models for open set recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2317–2324.
12. Jain, L.P.; Scheirer, W.J.; Boult, T.E. Multi-class open set recognition using probability of inclusion. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014.
13. Bendale, A.; Boult, T.E. Towards open set deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
14. Ge, Z.; Demyanov, S.; Garnavi, R. Generative openmax for multi-class open set classification. arXiv 2017, arXiv:1707.07418.
15. Ditria, L.; Meyer, B.J.; Drummond, T. OpenGAN: Open set generative adversarial networks. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
16. Dmitri, B.; Shapira, B.; Rokach, L.; Bar, A. Unknown malware detection using network traffic classification. In Proceedings of the 2015 IEEE Conference on Communications and Network Security (CNS), Florence, Italy, 28–30 September 2015; IEEE: Piscataway, NJ, USA, 2015.
17. Niyaz, Q.; Sun, W.; Javaid, A.; Alam, M. A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), New York, NY, USA, 24 May 2016.
18. Chen, Y.; Li, Z.; Shi, J.; Gou, G.; Liu, C.; Xiong, G. Not afraid of the unseen: A siamese network based scheme for unknown traffic discovery. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; IEEE: Piscataway, NJ, USA, 2020.
19. Gawlikowski, J.; Tassi, C.R.N.; Ali, M.; Lee, J.; Humt, M.; Feng, J.; Kruspe, A.; Triebel, R.; Jung, P.; Zhu, X.X. A survey of uncertainty in deep neural networks. arXiv 2021, arXiv:2107.03342.
20. Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 15 March 2017.
21. Sensoy, M.; Kaplan, L.; Kandemir, M. Evidential deep learning to quantify classification uncertainty. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 3 December 2017.
22. Anderson, J.P. Technical Report: Computer Security Threat Monitoring and Surveillance; James P. Anderson Company: Washington, DC, USA, 1980.
23. Moon, J.; Kim, J.; Shin, Y.; Hwang, S. Confidence-aware learning for deep neural networks. In Proceedings of the International Conference on Machine Learning, ICML, Vienna, Austria, 12–18 July 2020.
24. Chen, M.H.; Zhu, Y.F.; Lu, B.; Zhai, Y.; Li, D. Classification of application type of encrypted traffic based on Attention-CNN. Comput. Sci. 2021, 48, 325–332.
25. Audun, J. Subjective Logic: A Formalism for Reasoning under Uncertainty; Springer: Berlin/Heidelberg, Germany, 2018.
26. Mukhoti, J.; Kulharia, V.; Sanyal, A.; Golodetz, S.; Torr, P.; Dokania, P. Calibrating deep neural networks using focal loss. In Proceedings of the Advances in Neural Information Processing Systems 33, Seattle, WA, USA, 6–12 December 2020; pp. 15288–15299.
27. Krishnan, R.; Tickoo, O. Improving model calibration with accuracy versus uncertainty optimization. In Proceedings of the Advances in Neural Information Processing Systems, Seattle, WA, USA, 6–12 December 2020; pp. 18237–18248.
28. Malware Capture Facility Project. Available online: https://www.stratosphereips.org/datasets-malware (accessed on 16 December 2022).
29. Buczak, A.L.; Guven, E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tutor. 2015, 18, 1153–1176.
30. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural network. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015.
31. Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016.
32. Zhang, Y.; Chen, X.; Jin, L.; Wang, X.; Guo, D. Network intrusion detection: Based on deep hierarchical network and original flow data. IEEE Access 2019, 7, 37004–37016.

Figure 1. The framework of the proposed method.

Figure 2. The structure of the DEMTR model.

Figure 3. Examples of the Dirichlet distribution (three-class classification is taken as an example, where the sample label is the first category). The Dirichlet distribution differs depending on whether the model's predictions are (a) accurate and certain (AC), (b) accurate and uncertain (AU), (c) inaccurate and certain (IC), or (d) inaccurate and uncertain (IU).

Figure 4. Histogram of uncertainty distribution for (a) BNN SVI, (b) MC Dropout and (c) the proposed DEMTR.

Figure 5. The relationship between F₁-score and openness.

Table 1. The details of known and unknown attacks used in experiments.

Dataset	Known Attacks	Number	Unknown Attacks	Number
MCFP	Dridex	54331	njRAT	51324
	Sathurbot	55402	HTBot	52069
	TrickBot	51489	Hancitor	5421
	Emotet	43913	CoinMiner	53461
	Trojan_Downloader	52645	WebCompanion	23451
	Locky	49708	WannaCry	44290
	Trojan_Dynamer	65029	Simda	34162
	Mirai	54994	Miuref	17177
	Sality	51034	neeris	35790
	Tinba	58038	Vawtrak	46977
	total	536583	total	364122
	$D_{tr} : D_{va} : D_{te} = 8 : 1 : 1$		$D_{te}$

Table 2. The results of hyper-parameter experiments.

$n_{1}$ (rows) \ $n_{2}$ (columns)	100	300	500	700	900	1100	1300	1500
5	46.50	51.19	46.85	46.59	48.82	53.35	47.19	54.04
10	57.78	57.77	61.39	55.41	55.85	59.45	59.63	51.08
15	49.97	60.57	64.85	60.73	64.99	57.61	57.40	59.13
20	59.81	58.80	66.54	66.24	55.14	65.16	65.67	57.04
25	48.41	62.88	64.76	65.21	52.49	60.81	55.55	59.22
30	56.17	60.20	64.31	65.35	61.71	64.61	64.05	72.43

Table 3. Performance comparison between the proposed DEMTR model and the baselines in detecting single-type unknown attacks.

	Accuracy (%)				F₁-Score (%)
Unknown Attack	CNN	CNN_LSTM	Open-CNN	DEMTR	CNN	CNN_LSTM	Open-CNN	DEMTR
njRAT	50.00	50.19	52.94	72.50	68.28	70.44	71.16	81.44
HTBot	49.65	49.83	51.71	71.14	66.78	67.86	65.99	78.28
Hancitor	88.85	89.18	88.85	89.85	85.50	85.68	85.91	90.18
CoinMiner	49.01	49.18	57.38	83.73	69.74	71.02	73.79	85.65
WebCompanion	68.08	68.33	71.21	84.39	77.93	78.43	79.05	87.97
WannaCry	53.59	53.79	53.57	90.44	77.12	79.18	80.85	90.57
Simda	59.78	59.99	72.98	73.46	74.33	75.17	84.18	85.25
Miuref	74.11	74.38	78.62	79.17	78.82	80.43	83.68	84.69
neeris	58.69	58.90	63.34	64.86	76.04	75.56	79.77	80.16
Vawtrak	52.16	52.35	52.46	69.33	72.07	72.01	72.57	78.28

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

