Article

Functionality-Preserving Adversarial Machine Learning for Robust Classification in Cybersecurity and Intrusion Detection Domains: A Survey

Computer Science Research Centre, University of the West of England, Bristol BS16 1QY, UK
* Authors to whom correspondence should be addressed.
J. Cybersecur. Priv. 2022, 2(1), 154-190; https://doi.org/10.3390/jcp2010010
Submission received: 31 January 2022 / Revised: 14 March 2022 / Accepted: 15 March 2022 / Published: 17 March 2022
(This article belongs to the Collection Machine Learning and Data Analytics for Cyber Security)

Abstract

Machine learning has become widely adopted as a strategy for dealing with a variety of cybersecurity issues, ranging from insider threat detection to intrusion and malware detection. However, by their very nature, machine learning systems can introduce vulnerabilities to a security defence whereby a learnt model is unaware of so-called adversarial examples that may intentionally result in mis-classification and therefore bypass a system. Adversarial machine learning has been a research topic for over a decade and is now an accepted but open problem. Much of the early research on adversarial examples has addressed issues related to computer vision, yet as machine learning continues to be adopted in other domains, then likewise it is important to assess the potential vulnerabilities that may occur. A key part of transferring to new domains relates to functionality-preservation, such that any crafted attack can still execute the original intended functionality when inspected by a human and/or a machine. In this literature survey, our main objective is to address the domain of adversarial machine learning attacks and examine the robustness of machine learning models in the cybersecurity and intrusion detection domains. We identify the key trends in current work observed in the literature, and explore how these relate to the research challenges that remain open for future works. Inclusion criteria were: articles related to functionality-preservation in adversarial machine learning for cybersecurity or intrusion detection with insight into robust classification. Generally, we excluded works that are not yet peer-reviewed; however, we included some significant papers that make a clear contribution to the domain. There is a risk of subjective bias in the selection of non-peer reviewed articles; however, this was mitigated by co-author review. We selected the following databases with a sizeable computer science element to search and retrieve literature: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, SpringerLink, and Google Scholar. The literature search was conducted up to January 2022. We have striven to ensure a comprehensive coverage of the domain to the best of our knowledge. We have performed systematic searches of the literature, noting our search terms and results, and following up on all materials that appear relevant and fit within the topic domains of this review. This research was funded by the Partnership PhD scheme at the University of the West of England in collaboration with Techmodal Ltd.

1. Introduction

Machine learning (ML) has become widely adopted as a strategy for dealing with a variety of cybersecurity issues. Cybersecurity domains particularly suited to ML include: intrusion detection and prevention [1], network traffic analysis [2], malware analysis [3,4], user behaviour analytics [5], insider threat detection [6], social engineering detection [7], spam detection [8], detection of malicious social media usage [9], health misinformation [10], climate misinformation [11], and more generally “Fake News” [12]. These are essentially classification problems. Papernot et al. [13] stated that most ML models can be described mathematically as functions $h_\theta(x)$ with an input $x$, parameterized by a vector $\theta \in \Theta$, although some models such as k-nearest neighbor are non-parametric. The output of the function $h_\theta(x)$ is the model’s prediction of some property of interest for the given input $x$. The input $x$ is usually represented as a vector of values called features. The space of functions $\mathcal{H} = \{ x \mapsto h_\theta(x) \mid \theta \in \Theta \}$ defines the set of candidate hypotheses. In supervised learning, the parameters are adjusted to align model predictions $h_\theta(x)$ with the expected output $y$. This is achieved by minimizing a loss function that captures the dissimilarity between $h_\theta(x)$ and the corresponding $y$. Model performance must then be validated against a separate test dataset to confirm that the model also generalizes well to unseen data. Classification ML systems find a function $f$ that maps a vector $x$ to its corresponding class $y$.
Dhar et al. [14] noted that few studies analyzed the complexity of models and the associated trade-offs between accuracy and complexity. The complexity of an algorithm is often expressed in Big-O notation. They reviewed models, stating that the number of features and activations has an effect on memory usage and computational complexity. Moreover, they argued that accuracy alone cannot justify the choice of model type, particularly in regard to deep neural networks (DNNs); however, we consider that the risks involved in inaccurate predictions will vary across domains. In security domains, greater accuracy may be considered critical, possibly assuaging concerns regarding the computational complexity of models.
Critically, ML systems are increasingly trusted within cyber physical systems [15], such as power stations, factories, and oil and gas industries. In such complex physical environments, the potential damage that could be caused by a vulnerable system might even be life threatening [16]. Despite our reliance and trust in ML systems, the inherent nature of machine learning—learning to identify patterns—is in itself a potential attack vector for adversaries wishing to circumvent ML-based system detection processes. Adversarial examples are problematic for many ML algorithms and models including random forests (RF) and naive Bayes (NB) classifiers; however, we focus on artificial neural networks and particularly deep neural networks. Artificial neural networks (ANN) are inspired by the network of neurons in the human brain. ANNs are useful because they can generalize from a finite set of examples, essentially mapping a large input space (infinite for continuous inputs) to a range of discrete outputs. Unfortunately, in common with other ML algorithms, neural networks are vulnerable to attacks using carefully crafted perturbations to inputs, including evasion and poisoning attacks. In recent work, carefully crafted inputs described as “adversarial examples” are considered possible in ANN because of these inherent properties that exist within neural networks [17], such as:
  • The semantic information of the model is held across the model and not localised to specific neurons;
  • Neural networks learn input–output mappings that are discontinuous (and discontiguous).
These properties mean that even extremely small perturbations of an input could cause a neural network to provide a misclassified output. Given that neural networks have these properties, we reasonably expect our biological neural networks to suffer misclassifications, and/or to have evolved mitigations. Human brains are more complex than current artificial neural networks, yet they suffer a type of misclassification (illusory perception), in the form of face pareidolia [18,19]. This strengthens the case that the properties of neural networks are a source of adversarial examples (AE). In cybersecurity-related domains it has been seen how adversaries exploit adversarial examples, using carefully-crafted noise to evade detection through misclassification [20,21].
In this way, an adversarial arms race exists between adversaries and defenders. The recent SolarWinds supply chain attack [22,23] identified in December 2020 indicates the reliance that organisations have on intrusion detection software, and the presence of advanced persistent threats (APTs) with the expertise and resources to attack organisations’ network defenses. Adversarial machine learning is a critical area of research. If not addressed, there is increasing potential for novel attack strategies that seek to exploit the inherent weaknesses that exist within machine learning models. For this reason, this survey addresses the issues related to the robustness of machine learning models against adversarial attacks across the cybersecurity domain, where problems of functionality-preservation are recognized. While we use a case study of a network-based intrusion detection system (NIDS), these issues might be applicable in other areas where ML systems are used. We focus on papers detailing adversarial attacks and defenses. Attacks are further classified by attack type, attack objective, domain, model, knowledge required, and constraints. Defenses are further categorised by defense type, domain, and model. In the domain of network traffic analysis, adversaries need to evade detection methods. A suitable network firewall will reject adversarial traffic and malformed packets while accepting legitimate traffic. Therefore, successful adversarial examples must be crafted to comply with domain constraints such as those related to the transmission control protocol/internet protocol (TCP-IP) stack. Moreover, adversaries wish to preserve the functionality of their attacks. A successful attack must not lose functionality at the expense of evading a classifier. The essence of a simple adversarial attack is that a malicious payload evades detection by masquerading as benign. We refer to this characteristic as functionality-preserving. Compared to domains such as computer vision whereby the image modification is only to fool human vision sensors, adversarial attacks in other domains are significantly more challenging to fool both a human and/or system-based sensor.
The major contributions of this paper are:
  • We conduct a survey of the literature to identify the trends and characteristics of published works on adversarial learning in relation to cybersecurity, addressing both attack vectors and defensive strategies;
  • We address the issue of functionality-preservation in adversarial learning in contrast to domains such as computer vision, whereby a malformed input must suitably fool a system process as well as a human user such that the original functionality is maintained despite some modification;
  • We summarise this relatively-new research domain to address the future research challenges associated with adversarial machine learning across the cybersecurity domain.
The remainder of this paper is structured as follows: Section 2 provides an overview of other important surveys; Section 3 discusses background material; Section 4 details the literature survey; Section 5 details our results; Section 6 provides our discussion, and the conclusion summarises our findings and identifies research challenges.

2. Related Works

Corona et al. [24] provided a useful overview of intrusion detection systems. They predicted greater use of machine learning for intrusion detection and called for further investigation into adversarial machine learning. We now consider a number of related academic surveys that have been presented in the last five years with a focus on adversarial examples, security, and intrusion detection.

2.1. Secure and Trustworthy Systems

Machine learning systems are used in increasingly diverse areas including those of cyber-security. Trust in these systems is essential. Hankin and Barrère [25] note that there are many aspects to trustworthiness: reliability, trust, dependability, privacy, resilience, and safety. Adversaries ranging from solo hackers to state-sponsored APTs have an interest in attacking these systems. Successful attacks against machine learning models mean that systems are vulnerable and therefore potentially dangerously deployed in cyber-security domains. Cho et al. [26] proposed a framework considering the security, trust, reliability and agility metrics of computer systems; however, they did not specifically consider adversarial machine learning, or robustness to adversarial examples.

2.2. Adversarial ML in General

Papernot et al. [13] noted that the security and privacy of ML is an active but nascent area of research. In this early work, they systematized their findings on security and privacy in machine learning. They noted that a science for understanding many of the vulnerabilities of ML and countermeasures is slowly emerging. They analysed ML systems using the classical confidentiality, integrity and availability (CIA) model. They analysed: training in adversarial settings; inference in adversarial settings; and robust, fair, accountable, and private ML models. Through their analysis, they identified a total of eight key takeaways that point towards two related notions of sensitivity. The sensitivity of learning models to their training data is essential to privacy-preserving ML, and similarly the sensitivity to inference data is essential to secure ML. Central to both notions of sensitivity is the generalization error (i.e., the gap between performance on training and test data). They focused on attacks and defenses for machine learning systems and hoped that understanding the sensitivity of modern ML algorithms to the data they analyse will foster a science of security and privacy in machine learning. They argued that the generalization error of models is key to secure and privacy-preserving ML.
Zhang and Li [27] discussed opportunities and challenges arising from adversarial examples. They introduced adversarial examples and surveyed state-of-the-art adversarial example generation methods and defenses, before raising future research opportunities and challenges. They noted three challenges for the construction of adversarial examples:
  • The difficulty of building a generalizable method;
  • The difficulty in controlling the size of perturbation (too small will not result in adversarial examples, and too large can easily be perceived);
  • Difficulty in maintaining adversarial stability in real-world applications (some adversarial examples do not hold for transformations such as blurring).
They identified two challenges for defense against adversarial examples. First, black-box attacks do not require knowledge of the model architecture and therefore cannot be easily resisted by modifying the model architecture or parameters. Second, defenses are often specific to an attack method and are less suitable as a general defense. Defenses against one attack method do not easily defend against adversarial examples based on other methods for generating adversarial examples. They subsequently identified three opportunities:
  • Construction of adversarial examples with high transferability (high confidence);
  • Construction of adversarial examples without perturbing the target image; they suggested that perturbation size will affect the success rate and transferability of adversarial examples;
  • Considering and modeling physical transformations (translation, rotation, brightness, and contrast).
Their focus was on the visual domain and they did not specifically discuss IDS or functionality-preserving adversarial attacks.
Apruzzese et al. [28] examined adversarial examples and considered realistic attacks, highlighting that most literature considers adversaries who have complete knowledge about the classifier and are free to interact with the target systems. They further emphasized that few works consider “realizable” perturbations that take account of domain and/or real-world constraints. There is perhaps a perception that the threat from adversarial attacks is low based on the assumption that much prior knowledge of the system is required. This approach has some merit; however, this could be an over-confident position to take. Their idea was that realistically the adversary has less knowledge of the system. This conflicts with Shannon’s maxim [29] and Kerckhoffs’ second cryptographic principle [30], which states that the fewer secrets the system contains, the higher its safety. The pessimistic “complete knowledge” position is often used in cryptographic studies; in cryptographic applications it is considered safe because it is a bleak expectation. This expectation is also realistic since we must expect well-resourced adversaries to eventually discover or acquire all details of the system. Many adversarial example papers assume complete knowledge; this is, however, unlikely to always be the case, perhaps leading some to believe models are more secure against adversarial examples. However, the transferability property of adversarial examples means that complete knowledge is not required for successful attacks, and black-box attacks are possible with no prior knowledge of machine learning models. An adversary may only learn through interacting with the model. We must therefore account for the level of knowledge required by an adversary, including white-box, black-box, and gray-box knowledge paradigms.

2.3. Intrusion Detection

Wu et al. [31] considered several types of deep learning systems for network attack detection, including supervised and unsupervised models, and they compared the efficiency and effectiveness of different attack detection methods using two intrusion detection datasets: the “KDD Cup 99” dataset and an improved version known as NSL-KDD [32,33]. These two datasets have been used widely in the past by academic researchers; however, they do not fairly represent modern network traffic analysis problems due to concept-drift. Networks have increasing numbers of connected devices, increasing communications per second, and new applications using the network. The use of computer networks and the Internet has changed substantially in twenty years. The continued introduction of IPv6, network address translation, Wi-Fi, mobile 5G networks, and cloud providers has changed network infrastructure [34]. Furthermore, the Internet is now increasingly used for financial services. Akamai [35] reported that financial services now see millions or tens of millions of attacks each day. These attacks were less common twenty years ago. Furthermore, social media now constitutes much internet traffic and most social media platforms were founded after the KDD Cup 99 and NSL-KDD datasets were introduced. For example, Facebook, YouTube, and Twitter were founded in 2004, 2005, and 2006, respectively. This limits the validity of some research using outdated datasets. Therefore, we suggest research should use modern datasets that represent modern network traffic.
Kok et al. [36] analysed intrusion detection systems (IDS) that use a machine learning approach. They specifically considered the datasets used, the ML algorithms, and the evaluation metrics. They warned that some researchers are still using datasets first introduced decades ago (e.g., KDD Cup 99, NSL-KDD). They warned that this trend could result in no or insufficient progress on IDS. This would ultimately lead to the untenable position of obsolete IDS while intrusion attacks continue to evolve along with user behaviour and the introduction of new technologies. Their paper did not consider adversarial examples or robustness of ML models. Alatwi and Morisset [37] tabulated a list of network intrusion datasets in the literature that we extend in Table 1.
Martins et al. [51] considered adversarial machine learning for intrusion detection and malware scenarios, noting that IDS are typically signature-based, and that machine learning approaches are being widely employed for intrusion detection. They described five “tribes” of ML algorithms before detailing some fundamentals of adversarial machine learning, including commonly used distance metrics: $L_\infty$, $L_0$, and $L_2$. They subsequently described common white-box methods to generate adversarial examples, including: the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (L-BFGS), the fast gradient sign method (FGSM), the Jacobian-based saliency map attack (JSMA), Deepfool, and Carlini and Wagner attacks (C&W). They also considered black-box methods using generative adversarial networks (GANs). Traditional GANs sometimes suffer problems of mode collapse; Wasserstein generative adversarial networks (WGANs) solve some of these problems. They introduced the zeroth-order optimization attack (ZOO) as a black-box method. ZOO estimates the gradient and optimises an attack by iteratively adding perturbations to features. They noted that most attacks have been initially tested in the image domain, but can be applied to other types of data, which poses a security threat. Furthermore, they considered that there is a trade-off when choosing an adversarial attack. For example, JSMA is more computationally intensive than FGSM but modifies fewer features. They considered JSMA to be the most realistic attack because it perturbs fewer features. When considering defenses, they tabulated advantages and disadvantages of common defenses. For example, feature squeezing is effective in image scenarios, but unsuitable for other applications because compression methods would result in data loss for tabular data. They noted that GANs are a very powerful technique that can result in effective adversarial attacks where the samples follow a similar distribution to the original data but cause misclassification.

2.4. Cyber-Physical Systems

Cyber-physical systems (CPSs) rely on computational systems to create actuation of physical devices. The range of devices is increasing from factory operations to power stations, autonomous vehicles, and healthcare operations. Shafique et al. [52] considered such smart cyber-physical systems. They discussed reliability and security vulnerabilities of machine learning systems, including hardware trojans, side channel attacks, and adversarial machine learning. This is important, because system aging and harsh operating environments mean CPSs are vulnerable to numerous security and reliability concerns. Advanced persistent threats could compromise the training or deployment of CPSs through stealthy supply-chain attacks. A single vulnerability is sufficient for an adversary to cause a misclassification that could lead to drastic effects in a CPS (e.g., an incorrect steering decision of an autonomous vehicle could cause a collision). We consider that vulnerabilities in ML could lead to a range of unwanted effects in CPSs, including those that could lead to life-threatening consequences [16]. The Stuxnet worm is an example of malware with dire consequences.

2.5. Contributions of This Survey

Our main objectives are:
  • Collect and collate current knowledge regarding robustness and functionality-preserving attacks in cybersecurity domains;
  • Formulate key takeaways based on our presentation of the information, aiming to assist understanding of the field.
This survey aims to complement existing work while addressing clear differences, by also studying robustness to adversarial examples, specifically in functionality-preserving use cases. Most previous work aimed to improve the accuracy of models or examine the effect of adversarial examples. Instead, we consider the robustness of models to adversarial examples.
Machine learning systems are already widely adopted in cybersecurity. Indeed, with increasing network traffic, automated network monitoring using ML is becoming essential. Modern computer networks carry private personal and corporate data including financial transactions. These data are an attractive lure to cyber-criminals. Adversaries may wish to steal or disturb data. Malware, spyware, and ransomware threats are endemic on many computer networks. IDS help keep networks safe; however, an adversarial arms race exists, and it is likely that adversaries, including advanced persistent threats, are developing new ways to evade network defenses. Some research has evaded intrusion detection classifiers using adversarial examples.
We identify that while adversarial examples in the visual domain are well understood, less work has focused on how adversarial examples can be applied to network traffic analysis and other non-visual domains than on the machine learning models used for image and object recognition. For example, convolutional neural networks (CNNs) are well researched, whereas other model types used for intrusion detection, e.g., recurrent neural networks (RNNs), receive less attention. The generation of adversarial examples to fool IDS is more complicated than in visual domains because the features include discrete and non-continuous values [53]. Compounding the defense against adversarial examples is the overconfident assumption that successful adversarial examples require “complete knowledge” of the model and parameters. On the contrary, black-box attacks are possible with no or limited knowledge of the model. Most defenses so far proposed consider the visual domain and most are ineffective against strong and black-box attacks. This survey addresses the problem of adversarial machine learning across cyber-security domains. Further research is required to head off future mature attack methods that could facilitate more complex and destructive attacks.

3. Background

Here we provide further background on some key concepts that are related to adversarial learning, to support the reader of this survey. We cover the topics of model training, robustness, common adversarial example algorithms, adversary capabilities, goals, and attack methods.

3.1. Model Training

It is important to consider the dataset on which models are trained, because the trustworthiness and quality of a model are impacted by the distribution, quality, quantity, and complexity of dataset training samples [54]. Biased models are more susceptible to adversarial examples. Therefore, models must be trained on unbiased training data; however, Johnson et al. considered that the absolute number of training samples may be more important than the ratio of class imbalance [55]. For example, a small percentage of a large number of samples is sufficient to train a model regardless of high class imbalance (e.g., 1% malicious samples in 1 million network flows yields 10,000 samples). Unfortunately, cybersecurity datasets are often prone to bias, in part because of limited samples of some malicious traffic (e.g., zero-day attacks) and large amounts of benign traffic. Sheatsley et al. [56] state that biased distributions enable successful adversarial examples with the modification of very few features.

3.1.1. Resampling

Three common data-level techniques tackle biased datasets by resampling:
  • Oversampling: Random samples of minority classes are duplicated until the bias of majority classes is compensated;
  • Undersampling: Random samples from the majority class are discarded until the bias of majority classes is compensated;
  • Hybrid Sampling: Combines modest oversampling of minority classes and modest undersampling of majority classes aiming to give better model performance than applying either technique alone.
Algorithm-level techniques tackling dataset bias commonly employ cost-sensitive learning where a class penalty or weight is considered or decision thresholds are shifted to reduce bias [55].
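To make the data-level techniques concrete, the following is a minimal NumPy sketch of random over- and undersampling; the toy dataset, class labels, and helper functions are illustrative assumptions rather than part of any surveyed work.

```python
# Illustrative sketch: random over- and undersampling of a binary-labelled
# dataset using NumPy only. Class 0 (benign) is the majority, class 1
# (malicious) the minority.
import numpy as np

rng = np.random.default_rng(0)

def oversample(X, y, minority=1):
    """Duplicate random minority samples until both classes are the same size."""
    idx_min = np.where(y == minority)[0]
    idx_maj = np.where(y != minority)[0]
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.concatenate([idx_maj, idx_min, extra])
    return X[keep], y[keep]

def undersample(X, y, minority=1):
    """Discard random majority samples until both classes are the same size."""
    idx_min = np.where(y == minority)[0]
    idx_maj = np.where(y != minority)[0]
    keep_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
    keep = np.concatenate([keep_maj, idx_min])
    return X[keep], y[keep]

# Toy data: roughly 1% malicious flows, as in the class-imbalance example above.
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)
X_over, y_over = oversample(X, y)
X_under, y_under = undersample(X, y)
print(np.bincount(y), np.bincount(y_over), np.bincount(y_under))
```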

3.1.2. Loss Functions

When training a model the goal is to minimize the loss function through use of an optimizer that adjusts the weights at each training step. Common optimizers include stochastic gradient descent (SGD), adaptive moments (Adam), and root mean squared propagation (RMSProp). Commonly, a regularizer is employed during training to ensure the model generalizes well to new data. A dropout layer is often employed as a regularizer.
The loss function must be chosen carefully: for binary classification binary_crossentropy (Equation (1)) is usual; for multiclass classification problems categorical_crossentropy (Equation (2)) or mean_squared_error (Equation (3)) are suitable.
$f_{\mathrm{binary\_crossentropy}}(y) = -\,y_i^{\mathrm{label}} \log\left(y_i^{\mathrm{prediction}}\right) - \left(1 - y_i^{\mathrm{label}}\right) \log\left(1 - y_i^{\mathrm{prediction}}\right)$ (1)
$f_{\mathrm{categorical\_crossentropy}}(y) = -\sum_{i=1}^{\mathrm{categories}} y_i^{\mathrm{label}} \log\left(y_i^{\mathrm{prediction}}\right)$ (2)
$f_{\mathrm{mean\_squared\_error}}(y) = \frac{1}{\mathrm{categories}} \sum_{i=1}^{\mathrm{categories}} \left(y_i^{\mathrm{label}} - y_i^{\mathrm{prediction}}\right)^2$ (3)
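The following is a minimal NumPy sketch of the three losses in Equations (1)–(3); it is illustrative only, and frameworks such as Keras provide equivalent built-in implementations.

```python
# Minimal NumPy sketch of the loss functions in Equations (1)-(3); variable
# names follow the equations rather than any particular library.
import numpy as np

def binary_crossentropy(y_label, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_label * np.log(y_pred) + (1 - y_label) * np.log(1 - y_pred))

def categorical_crossentropy(y_label, y_pred, eps=1e-12):
    # y_label is one-hot encoded; y_pred is a predicted probability distribution.
    return -np.sum(y_label * np.log(np.clip(y_pred, eps, 1.0)))

def mean_squared_error(y_label, y_pred):
    return np.mean((np.asarray(y_label) - np.asarray(y_pred)) ** 2)

print(binary_crossentropy(1, 0.9))                            # small loss: confident and correct
print(categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1]))
print(mean_squared_error([0, 1, 0], [0.1, 0.8, 0.1]))
```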

3.1.3. Cross Validation

Cross validation [57] is a widely used data resampling method to assess the generalizability of a model and to prevent over-fitting. Cross validation often involves stratified random sampling, meaning the sampling method retains the class proportions in the learning set. In leave-one-out cross validation, each sample is used in turn as the validation set. The test error approximates the true prediction error; however, it has high variance. Moreover, its computational cost can be high for large datasets. k-fold cross validation aims to optimise the bias/variance trade-off. In k-fold cross validation, the dataset is randomly split into k equal-size partitions. A single partition is retained for test data, and the remaining $k-1$ partitions are used for training. The cross validation steps are reiterated until each partition is used once for validation, as shown in Figure 1. The results are averaged across all iterations to produce an estimation of the performance of the model (Equation (4)). Refaeilzadeh et al. highlighted risks of elevated Type I errors (false positives). With larger values of k, variance is reduced. Moreover, bias also reduces because the model is trained on more of the dataset. We posit that resampling techniques could be used to improve robustness against adversarial examples.
$CV_{(k)} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{MSE}_i$ (4)
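As an illustration of Equation (4), the following sketch performs k-fold cross validation with scikit-learn on a synthetic regression dataset; the data and model choice are assumptions made purely for the example.

```python
# Sketch of k-fold cross validation matching Equation (4), using scikit-learn's
# KFold splitter and a simple regressor on synthetic data.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=500)

k = 5
mse_per_fold = []
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    mse_per_fold.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

cv_k = np.mean(mse_per_fold)   # CV_(k) = (1/k) * sum of the per-fold MSE values
print(cv_k)
```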

3.1.4. Bootstrapping

Bootstrapping is resampling with replacement, and is often used to statistically quantify the performance of a model, to determine if a model is statistically significantly better than other models.
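A brief illustrative sketch of bootstrapping follows, assuming a synthetic vector of per-sample correctness in place of a real model's predictions.

```python
# Illustrative bootstrap sketch: resample per-sample correctness with
# replacement to estimate a confidence interval for a model's accuracy.
import numpy as np

rng = np.random.default_rng(0)
correct = rng.random(1000) < 0.92          # stand-in for per-sample correctness of a model

boot_acc = [rng.choice(correct, size=correct.size, replace=True).mean()
            for _ in range(2000)]
low, high = np.percentile(boot_acc, [2.5, 97.5])
print(f"accuracy {correct.mean():.3f}, 95% bootstrap CI [{low:.3f}, {high:.3f}]")
```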

3.2. Robustness

Robustness can be defined as the performance of well-trained models facing adversarial examples [58]. Essentially, robustness considers how sensitive a model’s output is to a change in the input. The robustness of a model is related to the generalization error of the model. There is a recognised trade-off between accuracy and robustness in machine learning. That is, highly accurate models are less robust to adversarial examples. Machine learning models in adversarial domains must be both highly accurate and robust. Therefore, improving the robustness of machine learning models enables safer deployment of ML systems across a wider range of domains.
To critically evaluate and make fair comparisons, robustness metrics are necessary. Common metrics include precision (Equation (5)), recall (Equation (6)), and F1-score (Equation (7)).
$\mathrm{Precision} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalsePositives}}$ (5)
$\mathrm{Recall} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalseNegatives}}$ (6)
$\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (7)
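A minimal sketch computing Equations (5)–(7) directly from raw confusion-matrix counts follows; the counts used are hypothetical.

```python
# Sketch computing Equations (5)-(7) from confusion-matrix counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical IDS results: 90 attacks caught, 10 false alarms, 20 attacks missed.
print(precision_recall_f1(tp=90, fp=10, fn=20))   # (0.9, ~0.818, ~0.857)
```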
Other possible useful metrics to evaluate robustness include: the Lipschitzian property, which monitors the changes in the output with respect to small changes to inputs; and CLEVER (cross-Lipschitz extreme value for network robustness), which is an extreme value theory (EVT)-based robustness score for large-scale deep neural networks (DNNs). The proposed CLEVER score is attack-agnostic and computationally feasible for large neural networks improving on the Lipschitzian property metric [59]. Table 2 details some advantages and disadvantages of some robustness metrics.

3.3. Common Adversarial Example Algorithms

There are numerous algorithms to produce adversarial examples. Szegedy et al. [17] used a box-constrained limited-memory BFGS (L-BFGS) algorithm. Other methods include FGSM [63] and iterative derivatives, including the basic iterative method (BIM) and projected gradient descent (PGD). JSMA optimises for the minimal number of altered features ($L_0$). The Deepfool algorithm [62] optimises for the root-mean-square (Euclidean distance, $L_2$). Carlini and Wagner [64] proposed powerful C&W attacks optimizing for the $L_0$, $L_2$, and $L_\infty$ distance metrics. There are many algorithms to choose from. Furthermore, Papernot et al. [65] developed a software library for the easy generation of adversarial examples. There are now a number of similar libraries that can be used to generate adversarial examples, as shown in Table 3.
Moreover, algorithms such as FGSM that modify all features are unlikely to preserve functionality. Algorithms such as JSMA that modify a small subset of features are not guaranteed to preserve functionality; however, with fewer modified features, the likelihood improves. Checking for and keeping only examples that preserve functionality is possible, although it is a time-consuming and inelegant solution. A potentially better solution could ensure only functionality-preserving adversarial examples are generated.
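To illustrate the general shape of such algorithms, the sketch below applies a single FGSM-style step against a toy logistic-regression "model" with assumed weights; real attacks on DNNs obtain the same input gradient through automatic differentiation, typically via the libraries listed in Table 3.

```python
# Minimal FGSM-style sketch (illustrative, not tied to any specific library):
# a single-step perturbation of an input against a logistic-regression model
# with known weights, i.e. a white-box setting.
import numpy as np

w = np.array([1.5, -2.0, 0.5, 3.0])     # assumed model weights
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def fgsm(x, y_true, eps):
    # Gradient of the binary cross-entropy loss with respect to the input x.
    grad_x = (predict_proba(x) - y_true) * w
    return x + eps * np.sign(grad_x)

x = np.array([0.2, 0.4, 0.1, 0.9])      # original (e.g., malicious) feature vector
x_adv = fgsm(x, y_true=1.0, eps=0.1)    # push the prediction away from "malicious"
print(predict_proba(x), predict_proba(x_adv))
```

Note that, as discussed above, this single step perturbs every feature, which is exactly why FGSM-style attacks are unlikely to preserve functionality without additional constraints.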
When considering the robustness of machine learning models, we first must consider the threat model. We must consider how much the adversary knows about the classifier, ranging from no knowledge to perfect knowledge. Adversaries may have a number of different goals:
  • Accuracy degradation (where the adversary wants to sabotage the effectiveness of the overall classifier accuracy);
  • Targeted misclassification (where the adversary wants a particular instance to be misclassified as another given class);
  • Untargeted misclassification (where the adversary wants a particular instance to be misclassified as any other class).
We now consider the attack surface. In IDS, the attack surface can be considered as an end-to-end pipeline, with varying vulnerabilities and potential for compromise at each stage of the pipeline.
In one basic pipeline, as shown in Figure 2, the raw network traffic on network interfaces is collected as packet capture files (PCAPs), which are then processed into network flows. There are different applications that could be used to process PCAPs into network flows. CICFlowMeter [69] is a network traffic flow generator and analyser that has been used in cyber-security datasets [70,71] and produces bidirectional flows with over 80 statistical network traffic features. The generated flows are unlabelled and so must be labelled manually with the traffic type, typically benign/malicious, although multiclasses could be labelled given sufficient information including attack type, IP source and destination dyad, duration, and start time. Finally, the labelled flows are used to train the model. Repetitive training cycles could enable detection of new attacks; however, the cyclic nature of the training means that an adversary could attack any iteration of training. Furthermore, an adversary could choose to attack any point in the pipeline. The training data used to train the model generally consist of feature-vectors and expected outputs, although some researchers are considering unsupervised learning models. The collection and validation of these data offer an attack surface. Separately, the inference phase also offers an attack surface. It is interesting to note that the size of the feature set a machine learning model uses can be exploited as an attack surface. A fundamental issue is that each feature processed by a model may be modified by an adversary. Moreover, Sarker et al. [72] noted that the computational complexity of a model can be reduced by reducing the feature dimensions. Large feature sets include more features and hence provide more opportunities to an adversary for manipulation. Almomani et al. [73] indicated that accuracy can be maintained with fewer features, and McCarthy et al. [74] indicated that more features tend to reduce the necessary size of perturbations. Therefore, larger feature sets are more readily perturbed than smaller feature sets, which have fewer modifiable features and hence require larger perturbations.
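As a schematic of this pipeline, the sketch below expresses the stages of Figure 2 as placeholder functions; the bodies are deliberately empty, and the feature-extraction stage merely stands in for a tool such as CICFlowMeter.

```python
# Schematic sketch of the basic IDS training pipeline in Figure 2 as function
# stages; the function bodies are placeholders.
from typing import List, Tuple

def capture_traffic(interface: str) -> List[bytes]:
    """Collect raw packets (PCAPs) from a network interface."""
    return []                               # placeholder

def extract_flows(pcaps: List[bytes]) -> List[dict]:
    """Aggregate packets into bidirectional flows with statistical features."""
    return []                               # placeholder

def label_flows(flows: List[dict]) -> List[Tuple[dict, str]]:
    """Attach ground-truth labels (benign/malicious) to each flow."""
    return [(f, "benign") for f in flows]   # placeholder

def train_model(labelled: List[Tuple[dict, str]]):
    """Fit a classifier on the labelled flows; retrained on a cycle."""
    ...

# Each stage is part of the attack surface: poisoning can target labelling and
# training, while evasion targets the deployed model at inference time.
train_model(label_flows(extract_flows(capture_traffic("eth0"))))
```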

3.4. Threat Model—Adversary Capabilities

Adversaries are constrained by their skills, knowledge, tools, and access to the system under attack. An insider threat might have access to the classification model and other associated knowledge, whereas an external threat might only be able to examine data packets. While the attack surface may be the same for both adversaries, the insider threat is potentially a much stronger adversary because they have greater knowledge and access. Adversary capabilities mean that attacks can be split into three scenarios: white-box, black-box, and gray-box.
In white-box attacks, an adversary has access to all machine learning model parameters. In black-box attacks, the adversary has no access to the machine learning model’s parameters. Adversaries in black-box scenarios may therefore use a different model, or no model at all, to generate adversarial examples. The strategy depends on successfully transferring adversarial examples to the target model. Gray-box attacks consider scenarios where an adversary has some, but incomplete, knowledge of the system. White-box and black-box are most commonly considered.

3.5. Threat Model—Adversary Goals

Adversaries aim to subvert a model through attacking its confidentiality, integrity, or availability. Confidentiality attacks attempt to expose the model or the data encapsulated within. Integrity attacks occur when an adversary attempts to control the output of the model, for example, to misclassify some adversarial traffic and therefore allow it to pass a detection process. Availability attacks could misclassify all traffic types, or degrade a model’s confidence, consistency, performance, and access. In this way, an integrity attack resembles a subset of availability attacks, since an incorrect response is similar in nature to a correct response being unavailable; however, the complete unavailability of a response would likely be more easily noticed than decreases in confidence, consistency, or performance. The goals of an adversary may be different but are often achieved with similar methods.

3.6. Threat Model—Common Attack Methods

Figure 3 shows some common categories of adversarial machine learning attack methods, which we explore in this section.

3.6.1. Poisoning

In a poisoning attack, an adversary with access to the training data or training procedure manipulates it, implanting an attack during the training phase so that the model is trained on adversarial training data. This is achieved with carefully crafted noise or sometimes random noise. Unused or dormant neurons in a trained deep neural network (DNN) signify that a model can learn more; essentially, an increased number of neurons allows for a greater set of distinct decision boundaries forming distinct classifications of data. The under-utilised degrees of freedom in the learned model could potentially be used for unexpected classification of inputs. That is, the model could learn to provide selected outputs based on adversarial inputs. These neurons have very small weights and biases. However, the existence of such neurons allows successful poisoning attacks through training the model to behave differently for poisoned data. This suggests that distillation [75] could be effective at preventing poisoning attacks, because smaller models have lower knowledge capacity and likely fewer unused neurons. Distillation reduces the number of neurons that contribute to a model by transferring knowledge from a large model to a smaller model. Despite initial analyses indicating a reduction in the success of adversarial attacks, Carlini [64] experimented with three powerful adversarial attacks and a high-confidence adversarial example in a transferability attack, and found that distillation does not eliminate adversarial examples and provides little security benefit over undistilled networks in relation to powerful attacks. Unfortunately, they did not specifically consider poisoning attacks. Additional experiments could determine whether distillation is an effective defense against poisoning attacks.
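For illustration only, the sketch below shows a far simpler data-poisoning variant (label flipping) than the backdoor-style attacks discussed above; the synthetic data and logistic-regression model are assumptions for the example.

```python
# Illustrative label-flipping poisoning sketch: a fraction of malicious training
# samples is relabelled as benign before training, degrading the learnt model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 4)), rng.normal(2, 1, (500, 4))])
y = np.array([0] * 500 + [1] * 500)           # 0 = benign, 1 = malicious

def train_and_score(X, y_train):
    clf = LogisticRegression(max_iter=1000).fit(X, y_train)
    return clf.score(X, y)                    # accuracy against the true labels

y_poisoned = y.copy()
flip = rng.choice(np.where(y == 1)[0], size=150, replace=False)
y_poisoned[flip] = 0                          # poison: malicious samples labelled benign

print("clean   :", train_and_score(X, y))
print("poisoned:", train_and_score(X, y_poisoned))
```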

3.6.2. Evasion

In evasion attacks, the adversary is often assumed to have no access to the training data. Instead, adversaries exploit their knowledge of the model and its parameters, aiming to minimise the cost function of adversarial noise, which, when combined with the input, causes changes to the model output. Untargeted attacks lead to a random incorrect output, targeted attacks lead to a specific incorrect output, and an attack may disrupt the model by changing the confidence of the output class. In the visual domain, the added noise is often imperceptible to humans. In non-visual domains such as intrusion detection, this problem may be much more challenging, since even small modifications may corrupt network packets and may cause firewalls to drop these malformed packets. This highlights the need for functionality preservation in adversarial learning as a clear distinction from vision-based attacks that exploit the human visual system.

3.6.3. Transferability

The transferability property of adversarial examples means that adversarial examples generated against one model will likely also work against other models trained for the same purpose. The second model need not have the same architecture or underlying model as the first and need not be trained on the same data. The transferability property of adversarial examples can form the basis for some black-box attacks where a surrogate model is used to generate adversarial examples that are subsequently presented to the target model.
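The sketch below illustrates this surrogate-model pattern under assumed synthetic data: adversarial examples are crafted with an FGSM-style step on a logistic-regression surrogate (whose input gradient is analytic) and then presented to a separately trained random-forest target.

```python
# Sketch of a transferability-based black-box attack: craft adversarial
# examples against a surrogate model the attacker controls, then present them
# to a separately trained target model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (1000, 6)), rng.normal(1.5, 1, (1000, 6))])
y = np.array([0] * 1000 + [1] * 1000)

target = RandomForestClassifier(random_state=0).fit(X[::2], y[::2])       # defender's model
surrogate = LogisticRegression(max_iter=1000).fit(X[1::2], y[1::2])       # attacker's model

# FGSM-style step on the surrogate for a batch of malicious (class 1) points.
X_mal = X[y == 1][:200]
grad = (surrogate.predict_proba(X_mal)[:, 1] - 1.0)[:, None] * surrogate.coef_[0]
X_adv = X_mal + 0.5 * np.sign(grad)

print("target detection rate, original:   ", target.predict(X_mal).mean())
print("target detection rate, adversarial:", target.predict(X_adv).mean())
```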

4. Methodology

In this section, we describe our approach to conducting an effective and meaningful survey of the literature.
Eligibility Criteria We determined the search terms that would lead to the most relevant articles; the chosen terms are detailed in Table 4.
We expect these to result in good coverage of the relevant literature. We searched each database using the identified search terms. The literature search was conducted up to September 2021. Generally, we have chosen to exclude works that have not yet been peer-reviewed, such as those appearing on arXiv, unless deemed by the authors as a significant paper that makes a clear contribution to the subject domain. We collated the searches and any subsequent duplicates were removed. Each paper was screened by reading the title and abstract to determine the relevance. Inclusion criteria were: the article is related to functionality preservation in adversarial machine learning for cybersecurity or intrusion detection with insight into robust classification.
From this large list, we specifically focused on adversarial machine learning attacks and defenses, narrowing the literature down to relevant papers. Our selection process was roughly based on the preferred reporting items for systematic reviews and meta-analyses (PRISMA) framework [76]. Figure 4 details our selection process.
Information Sources We selected the following databases with a sizeable computer science element to search and retrieve literature: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, SpringerLink, and Google Scholar.

5. Results

In this section, we describe the results of our search and selection process. We further describe our classification scheme, and tabulate and discuss our findings, including adversarial attacks in traditional and cybersecurity domains of malware, IDS, and CPS. We included 146 relevant papers in this survey.

5.1. Classification Scheme

We classify attacks by attack type, attack objective (targeted/untargeted), domain, model, knowledge required, and whether any constraints are placed on the adversarial examples. Defenses are classified by type, domain, and model. We summarise the attacks in Table 5.

5.2. Adversarial Example Attacks

The attacks we focus on exploit adversarial examples that cause differences in the output of neural networks. Adversarial examples were discovered by Szegedy et al. [17]. Adversarial examples are possible in ANNs as a consequence of the properties of neural networks; however, they are also possible for other ML models. This complicates mitigation efforts, and adversarial examples can be found for networks explicitly trained on adversarial examples [102]. Furthermore, adversarial examples can be algorithmically generated, e.g., using gradient descent. Moreover, adversarial examples are often transferable, that is, an adversarial example presented to a second machine learning model trained on a subset of the original dataset may also cause the second network to misclassify the adversarial example.

5.2.1. Adversarial Examples—Similarity Metrics

In the visual domain, distance metrics are widely used to judge how similar two inputs are, and therefore how easily the differences might be perceived. The following metrics are commonly used to describe the difference between normal and adversarial inputs (a short sketch computing each of them follows the list):
  • Number of altered pixels ($L_0$);
  • Euclidean distance ($L_2$, root-mean-square);
  • Maximum change to any of the co-ordinates ($L_\infty$).
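A short NumPy sketch computing these three norms for a hypothetical original and perturbed feature vector follows.

```python
# Sketch computing the three distance metrics between an original and a
# perturbed feature vector.
import numpy as np

x = np.array([0.2, 0.4, 0.1, 0.9, 0.0])
x_adv = np.array([0.2, 0.5, 0.1, 0.7, 0.0])
delta = x_adv - x

l0 = np.count_nonzero(delta)          # number of altered features/pixels
l2 = np.linalg.norm(delta, ord=2)     # Euclidean distance
linf = np.max(np.abs(delta))          # largest change to any single feature

print(l0, l2, linf)                   # 2, ~0.224, 0.2
```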
Human perception may not be the best criterion to judge a successful adversarial input. A successful attack in a vision ML task may be to fool a human. Success in an ML-based system is to fool some other detection routine, while conforming to the expected inputs of the system. For example, a malicious packet must remain malicious after any perturbation has been applied. If a perturbed packet is very close to the original packet, this would only be considered successful if it also retained its malicious properties, and hence its intended function.

5.2.2. Adversarial Examples—Types of Attack

White-Box Attacks: White-box attacks are most commonly achieved through gradient descent to increase the loss function of the target model. The algorithmic generation of adversarial examples is possible. Moreover, Papernot et al. [65] developed a software library for the easy generation of adversarial examples, and other libraries are now available. An early gradient descent approach was proposed by Szegedy et al. [17] using a box-constrained L-BFGS. Given an original image, this method finds a different image that is classified differently, whilst remaining similar to the original image. Gradient descent is used by many different algorithms; however, algorithms have been designed to be optimized for different distance metrics. There are numerous gradient descent algorithms that produce adversarial examples; they can differ in their optimization and computational complexity. We note the relative computational complexity of common adversarial example algorithms in Table 6 (adapted from [27]). High success rates correlate with high computational complexity. We expect this correlation to be more pronounced for functionality-preserving attacks. FGSM [63] was improved by Kurakin et al. [103], who refined the fast gradient sign by taking multiple smaller steps. This iterative granular approach improves on FGSM by limiting the difference between the original and adversarial inputs. This often results in adversarial inputs with a predictably smaller $L_\infty$ metric. However, FGSM modifies all parameters. This is problematic for features that must remain unchanged or for discrete features such as application programming interface (API) calls. JSMA differs from FGSM in that it optimises to minimize the total number of modified features ($L_0$ metric). In this greedy algorithm, individual features are chosen with the aim of step-wise increasing the target classification in each iteration. The gradient is used to generate a saliency map, modelling each feature’s impact towards the resulting classification. Large values significantly increase the likelihood of classification as the target class. Thus, the most important feature is modified at each stage. This process continues until the input is successfully classified as the target class, or a threshold number of pixels is reached. This algorithm results in adversarial inputs with fewer modified features. The Deepfool algorithm [62] similarly uses gradient descent but optimises for the root-mean-square, also known as Euclidean distance ($L_2$). This technique simplifies the task of shifting an input over a decision boundary by assuming a linear hyper-plane separates each class. The optimal solution is derived through analysis and subsequently an adversarial example is constructed; however, neural network decision boundaries are not truly linear. Therefore, subsequent repetitions may be required until a true adversarial image is found.
The optimizations for different distance metrics are types of constraint: maximum change to any feature ($L_\infty$); minimal root-mean-square ($L_2$); minimal number of altered features ($L_0$). Constrained adversarial examples are important for functionality-preserving attacks. Additional constraints for specific domains are likely required, and this remains an open avenue for further research.
Most gradient descent algorithms were originally presented in the visual domain and used on images and pixel values. The pixel values of images are often presented as continuous values (0–255). The use of adversarial examples with discrete data values is less well explored and remains an interesting avenue for further research.
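As an illustration of an $L_0$-constrained, functionality-aware perturbation, the sketch below uses a heavily simplified, JSMA-inspired greedy loop on a toy logistic-regression model with assumed weights, restricting changes to a whitelist of modifiable features; it is not the original JSMA algorithm.

```python
# JSMA-inspired sketch (a simplification, not the original algorithm): greedily
# perturb one feature at a time, restricted to a whitelist of modifiable
# features so that protocol- or functionality-critical fields stay untouched.
import numpy as np

w = np.array([1.2, -0.8, 2.5, 0.3, 1.9])       # assumed model weights
b = -0.5
modifiable = [0, 1, 3]                          # indices an attacker may legally change

def proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def greedy_attack(x, step=0.5, budget=10):
    x_adv = x.copy()
    for _ in range(budget):
        grad = (proba(x_adv) - 1.0) * w         # loss gradient for true class 1
        i = max(modifiable, key=lambda j: abs(grad[j]))
        x_adv[i] += step * np.sign(grad[i])     # move the single most salient feature
        if proba(x_adv) < 0.5:                  # stop once classified as benign
            break
    return x_adv

x = np.array([0.5, 0.2, 0.8, 0.1, 0.6])
x_adv = greedy_attack(x)
print(proba(x), proba(x_adv), np.count_nonzero(x_adv - x))   # evades while altering 1 feature
```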
Black-Box Attacks: Researchers have also considered black-box attacks that do not rely on gradient descent. Black-box techniques commonly rely on the transferability of adversarial examples, yet Table 5 shows that few researchers employed this property. Other common black-box techniques include GANs and genetic algorithms (GAs). Sharif et al. [104] proposed a method of attacking DNNs with a general framework to train an attack generator or generative adversarial network (GAN). GANs can be trained to produce new, robust, and inconspicuous adversarial examples. Attacks like that of Biggio et al. [77] are more suitable for the security domain, where assessing the security of algorithms and systems under worst-case attacks is needed [105,106].
An important consideration in attacks against intrusion detection systems is that attackers cannot perform simple oracle queries against an intrusion detection system and must minimize the number of queries to decrease the likelihood of detection. Apruzzese et al. [28] further note that the output of the target model is not directly observable by the attacker; however, exceptions occur where detected malicious traffic is automatically stopped or dropped, or where the attacker gains access to/or knowledge of the system.
Gray-box attacks consider scenarios where an adversary has only partial knowledge of the system. Biggio et al. [77] highlighted the threat from skilled adversaries with limited knowledge; more recently, gray-box attacks have received some attention: Kuppa et al. [92] considered malicious users of the system with knowledge of the features and architecture of the system, recognizing that attackers may differ in their level of knowledge of the system. Labaca-Castro et al. [99] used universal adversarial perturbations, showing that unprotected systems remain vulnerable even under limited knowledge scenarios. Li et al. [101] considered limited knowledge attacks against cyber physical systems and successfully deployed universal adversarial perturbations where attackers have incomplete knowledge of measurements across all sensors.
Building on Simple Adversarial Examples: Table 5 shows that much research considers simple adversarial examples, while less research considers sequences of adversarial examples or transferability. We chose to classify attacks as either a simple adversarial example, a sequence of adversarial examples, or a transferable adversarial example, to clarify the complexity level of each attack type. A simple adversarial example is sufficient to alter the output of a simple classifier. Lin et al. [82] suggested that using adversarial examples strategically could affect the specific critical outputs of a machine learning system. Sequences of adversarial examples consist of two or more adversarial examples and are more challenging to craft than simple adversarial examples; Lin et al. [82] further suggested an enchanting attack to lure a machine learning system to a target state through crafting a series of adversarial examples. Researchers are starting to consider sequences of adversarial examples and the transferability of adversarial examples. The table shows that both remain under-represented, providing opportunities for further research.

5.2.3. Adversarial Examples—Attack Objectives

There is a distinction between the objectives of attacks: targeted or untargeted. An attack objective might be to cause a classifier to misclassify an input as any other class (untargeted) or to misclassify an input as a specific class (targeted). In the cyber-security domain, IDS often focus on binary classification: malicious or benign. For binary classification the effect of targeted and untargeted attacks is the same. More complex multi-class IDS can help network analysts triage or prioritise different types of intrusions. Network analysts would certainly treat a distributed denial of service (DDoS) attack differently than a BotNet or infiltration attempt. Adversaries could gain significant advantage through targeted attacks, for example, by camouflaging an infiltration attack as a comparatively less serious network intrusion.
Recent research goes beyond adversarial examples causing misclassification of a single input. Moosavi-Dezfooli et al. [107] further showed the existence of untargeted universal adversarial perturbation (UAP) vectors for images, and ventured that this is problematic for classifiers deployed in real-world and hostile environments. In the cyber-security domain, Labaca-Castro et al. [99] demonstrated UAPs in the feature space of malware detection. They showed that UAPs have similar effectiveness to adversarial examples generated for specific inputs. Sheatsley et al. [56] looked at UAPs in the constrained domain of intrusion detection. Adversaries need only calculate one UAP that could be applied to multiple inputs. Pre-calculation of a UAP could enable faster network attacks (DDoS) that would otherwise require too much calculation time. Table 5 shows that most research considers untargeted attacks. Targeted attacks are less represented in the literature. Furthermore, UAPs are a more recent avenue for research.

5.2.4. Adversarial Examples in Traditional Domains

Table 5 shows that attacks in the visual domain were the subject of much early research, and the visual domain continues to attract researchers; however, researchers are beginning to consider attacks against other DNN systems such as machine learning models for natural language processing, with some considering semantic preserving attacks.
In visual domains, features are generally continuous. For example, pixel values range from 0 to 255. A consensus exists in the visual domain that adversarial examples are undetectable to humans. Moreover, the application domain is clearly interrelated with the choice of machine learning model. Models such as CNNs are appropriate for visual-based tasks, whereas RNNs are appropriate for sequence-based tasks. We discuss model types in Section 5.2.6.
Some models, such as recurrent neural networks, cannot be attacked using traditional attack algorithms; however, some research aims to discover new methods to attack these systems. Papernot et al. [78] noted that because RNNs handle time sequences by introducing cycles to their computational graphs, applying traditional adversarial example algorithms is challenging, as the cycles prevent direct computation of the gradients. They adapted adversarial example algorithms for RNNs and evaluated the performance of their adversarial samples; if the model is differentiable, FGSM can be applied even to RNN models. They used a case study of a binary classifier (positive or negative) for movie reviews. They defined an algorithm that iteratively modifies words in the input sentence to produce an adversarial sequence that is misclassified by a well-trained model. They noted that their attacks are white-box attacks, requiring access to, or knowledge of, the model parameters. Szegedy et al. [17] discovered the transferability of adversarial examples, noting that the same perturbation can cause a different network, trained on a different subset of the dataset, to misclassify the same input. This property has serious implications: adversarial examples generated against a surrogate model under the adversary’s control can successfully attack the target model, meaning that full access to a model is not required to attack it (black-box).

5.2.5. Adversarial Examples in Cyber-Security Domains

Adversarial examples (AE) have been shown to exist in many domains. Indeed, no domain identified (so far) is immune to adversarial examples [56]. Researchers are beginning to consider cyber-security domains (Figure 5) where features are often a mixture of categorical, continuous, and discrete. Some research focuses on adversarial example attacks against IDS, although few studies specifically consider functionality-preserving attacks.
In the visual domain, we briefly discussed the consensus that adversarial examples are undetectable to humans. However, it is unclear how this idea should be translated to other domains. Carlini [64] held that, strictly speaking, adversarial examples must be similar to the original input. However, Sheatsley et al. [56] noted that research in non-visual domains provides domain-specific definitions: perturbed malware must preserve its malware functionality [56], perturbations in audio must be nearly inaudible [56], and perturbed text must preserve its meaning. Sheatsley et al. further offered a definition for adversarial examples in intrusion detection: perturbed network flows must maintain their attack behaviour. We consider that human perception may not be the best criterion for defining adversarial examples in cyber-security domains. Indeed, human perception in some domains might be immaterial. For example, only very skilled engineers could perceive network packets in any meaningful way even with the use of network analysis tools. Furthermore, users likely cannot perceive a difference between the execution of benign or malicious software. After malware is executed, the effects are clear; however, during malware execution users often suspect nothing wrong. We therefore consider that while fooling human perception remains a valid ambition, it is critical that adversarial perturbations in cyber-security domains preserve functionality and behaviour.
In the cyber-security domain, traditional gradient descent algorithms may be insufficient; algorithms that preserve functionality are required. Moreover, some models used in the cyber-security domain are distinct from those used for purely visual problems. For example, RNNs are useful for analysing time sequences of network traffic. We now consider recent functionality-preserving attacks in the cybersecurity domains of malware, intrusion detection, and CPS. We further examine functionality-preserving attacks in Table 7.
Malware: Hu and Tan [84] proposed a novel algorithm to generate adversarial sequences to attack an RNN-based malware detection system. They claimed that algorithms adapted for RNNs are limited because they are not truly sequential. They considered a system that detects malicious API sequences. Generating adversarial examples effective against such systems is non-trivial because API sequences are discrete values: there is a finite set of valid API calls, and changing even a single letter in an API call creates an invalid call that will fail and likely crash the program. Therefore, any perturbation of an API call must result in a set of valid API calls. They proposed an algorithm based around a generative RNN and a substitute RNN. The generative RNN takes an API sequence as input and generates an adversarial API sequence. The substitute RNN is trained on benign sequences and the outputs of the generative RNN, and the generative model aims to minimize the predicted malicious probability. Subsequently, adversarial sequences were presented to six different models. Following adversarial perturbation, the majority of the malware was not detected by any of the victim RNNs. The authors noted that even when the adversarial generation algorithm and the victim RNN were implemented with different models and trained on different training sets, the majority of the adversarial examples successfully attacked the victim RNN through the transferability property of adversarial examples. For an MLP-based detector, they reported a true positive rate (TPR) of 94.89% that fell to 0.00% under adversarial perturbation.
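The following sketch is not Hu and Tan's generative RNN; it merely illustrates the discrete-validity constraint they describe: a perturbation may only insert calls drawn from a set of valid API names (assumed here to be behaviour-neutral, and entirely hypothetical), never mutate an existing call character by character.

```python
import random

# Hypothetical whitelist of valid API call names assumed to be
# side-effect-free in the target context.
NO_OP_APIS = ["GetTickCount", "Sleep", "GetCurrentProcessId"]

def insert_noop_calls(api_sequence, n_insertions, seed=0):
    """Insert valid, assumed-neutral API calls at random positions.
    The original calls and their relative order are preserved, so the
    intended behaviour is unchanged; only the observed sequence differs."""
    rng = random.Random(seed)
    seq = list(api_sequence)
    for _ in range(n_insertions):
        pos = rng.randrange(len(seq) + 1)
        seq.insert(pos, rng.choice(NO_OP_APIS))
    return seq

malicious_trace = ["CreateFileW", "WriteFile", "CloseHandle"]
adversarial_trace = insert_noop_calls(malicious_trace, n_insertions=4)
```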
Demetrio et al. [98] preserved the functionality of malware while evading static Windows malware detectors. Their attacks exploit the structure of the portable executable (PE) file format. Their framework has three categories of functionality-preserving manipulations: structural, behavioural, and padding. Some of their attacks work by injecting unexecuted (benign) content into new sections of the PE file, or at the end of the malware file. The attacks are formulated as a constrained minimization problem optimizing the trade-off between the probability of evading detection and the size of the injected content. Their experiments successfully evaded two Windows malware detectors with few queries and a small payload size. Furthermore, they discovered that their attacks transferred to other Windows malware products. We note that the creation of new sections provides a larger attack surface that may be populated with adversarial content. They reported that their section-injection attack was able to drastically decrease the detection rate (e.g., from an original detection rate of 93.5% to 30.5%, also significantly outperforming their random attack at 85.5%).
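The sketch below illustrates only the simplest padding-style manipulation: bytes appended after the end of a PE file form the overlay, which is neither mapped nor executed, so functionality is preserved. Demetrio et al. additionally optimise the injected content and inject into new sections, which requires rewriting PE headers and is not shown here; all file paths are placeholders.

```python
# Minimal sketch of a padding-style manipulation: bytes appended to the
# end of a PE file become part of the "overlay", which is neither mapped
# nor executed, so the program's functionality is preserved. Real attacks
# optimise the injected bytes against the detector; here we simply append
# content taken from a benign file.
def append_overlay(malware_path, benign_path, out_path, payload_size=4096):
    with open(benign_path, "rb") as f:
        payload = f.read(payload_size)      # benign content to inject
    with open(malware_path, "rb") as f:
        original = f.read()
    with open(out_path, "wb") as f:
        f.write(original + payload)         # original bytes left untouched

# append_overlay("sample.exe", "benign_program.exe", "sample_padded.exe")
```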
Labaca-Castro et al. [53] presented a gradient-based method to generate valid executable files that preserve their intended malicious functionality. They noted that malware evasion is a current area of adversarial learning research. Evading the classifier is often the foremost objective; however, the perturbations must also be carefully crafted to preserve the functionality of malware. They noted that removing objects from a PE file often leads to corrupt files. Therefore, they only implement additive or modifying perturbations. Their gradient-based attack relies on complete knowledge of the system with the advantage that the likelihood of evasion can be calculated and maximised. Furthermore, they stated that their system only generates valid executable malware files.
Intrusion Detection: Usama et al. [94] used a generative adversarial network (GAN) to generate functionality-preserving adversarial examples. They noted that adversarial examples aiming to evade IDS should not invalidate network traffic features. A typical GAN, composed of two neural networks (a generator G and a discriminator D), was used to construct adversarial examples that masquerade as benign but still functionally probe the network. Their attack was able to evade an IDS while preserving the intended behaviour. They suggested that adversarial training using GAN-generated adversarial examples improved the robustness of their model. They reported F1-scores of 89.03 (original), 40.86 (after attack), 78.49 (after adversarial training), and an improved 83.56 after GAN-based adversarial training.
Wang et al. [117] noted that relatively few researchers are addressing adversarial examples against IDS. They proposed an ensemble defense for network intrusion detection that integrates GANs and adversarial retraining. Their training framework improved robustness while maintaining accuracy on unperturbed samples. Unfortunately, they evaluated their defences only against traditional attack algorithms: FGSM, the basic iterative method (BIM), DeepFool, and JSMA; they did not specifically consider functionality-preserving adversarial examples. They further recognised the importance of using recent datasets for intrusion detection. They reported F1-scores for three classifiers and a range of adversarial example algorithms. For example, the F1-score for an ensemble classifier tested on clean data was 0.998 compared to 0.746 for JSMA. Among all classifiers, the ensemble classifier achieved superior F1-scores under all conditions.
Huang et al. [95] noted that it is more challenging to generate DDoS adversarial examples because of their discrete properties. They noted that work in the visual domain cannot be directly applied to adversarial examples for intrusion detection of DDoS. The input to their algorithm is a series of packets. This makes it difficult to optimize the distance between the original and adversarial sample while guaranteeing the validity of each packet. They proposed two black-box methods to generate DDoS adversarial examples against LSTM-based intrusion detection systems: genetic algorithm (GA) and probability weighted packet saliency attack (PWPSA). Each method modifies the original input, either inserting or modifying packets. The GA method evolves a population of DDoS samples and selects adversarial examples from the population. The PWPSA method finds the most important packet in the sequence and replaces it with a different “best packet” for this position. Both methods produce adversarial examples that can successfully evade their DDoS intrusion detection model. They reported success rates for their different attacks against different detectors. For example, success results for detector D: GA-Replace 91.37%, GA-Insert 74.5%, PWPSA-Replace 88.9%, and PWPSA-Insert 67.17%.
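The following toy genetic-algorithm loop is written in the spirit of the insertion variant described above, not as a reproduction of it; `detector` is assumed to return the probability that a packet sequence is malicious, and `benign_packets` is a hypothetical pool of valid packets that do not alter the attack's effect.

```python
import random

def ga_insert(seq, detector, benign_packets,
              pop_size=20, generations=50, n_insert=3, seed=0):
    """Toy GA for an insertion-style attack: evolve where to insert which
    benign packets so that the detector's malicious score drops. Original
    packets are never modified or reordered, preserving attack behaviour."""
    rng = random.Random(seed)

    def random_individual():
        return [(rng.randrange(len(seq) + 1), rng.choice(benign_packets))
                for _ in range(n_insert)]

    def apply(ind):
        out = list(seq)
        for pos, pkt in sorted(ind, key=lambda g: g[0], reverse=True):
            out.insert(pos, pkt)                 # insert back-to-front
        return out

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: detector(apply(ind)))
        if detector(apply(population[0])) < 0.5:  # evasion threshold reached
            break
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]   # crossover
            if rng.random() < 0.3:                               # mutation
                child[rng.randrange(n_insert)] = (
                    rng.randrange(len(seq) + 1), rng.choice(benign_packets))
            children.append(child)
        population = parents + children
    return apply(population[0])
```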
Cyber-Physical Systems: Cai et al. [100] warned that adversarial examples have consequences for system safety because they can cause systems to provide incorrect outputs. They presented a detection method for adversarial examples in CPS. They used a case study of an advanced emergency braking system, where a DNN estimates the distance to an obstacle. Their adversarial example detection method uses a variational auto-encoder to predict a target variable (distance) and compare it with a new input. Any anomalies are considered adversarial. Furthermore, adversarial example detectors for CPS must function efficiently in a real-time monitoring environment and maintain low false alarm rates. They reported that since the p-values for the adversarial examples are almost 0, the number of false alarms is very small and the detection delay is smaller than 10 frames or 0.5 s.
CPS include critical national infrastructure, such as power grids, water treatment plants, and transportation. Li et al. [101] asserted that adversarial examples could exploit vulnerabilities in CPS with potentially severe consequences; however, such adversarial examples must satisfy real-world constraints (commonly linear inequality constraints). For example, meter readings downstream may never be larger than meter readings upstream. Adversarial examples breaking such constraints are noticeably anomalous. Risks to CPS arising from adversarial examples are not yet fully understood. Furthermore, algorithms and models from other domains may not readily apply because of distributed sensors and inherent real-world constraints. Nevertheless, Li et al. successfully generated adversarial examples that meet such linear constraints in power grid and water treatment system case studies. Their evaluation shows that even with the constraints imposed by the physical systems, their approach still effectively generates adversarial examples, significantly decreasing detection accuracy; for example, they reported accuracy under adversarial conditions as low as 0%.
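Such linear inequality constraints can be expressed as A x ≤ b and checked cheaply, as in the following sketch for a hypothetical two-meter example in which the downstream reading may never exceed the upstream reading.

```python
import numpy as np

# Constraint of the form A @ x <= b. For x = [upstream, downstream],
# the row [-1, 1] encodes downstream - upstream <= 0, i.e., the
# downstream meter may never read higher than the upstream meter.
A = np.array([[-1.0, 1.0]])
b = np.array([0.0])

def satisfies_constraints(x, A, b, tol=1e-9):
    return bool(np.all(A @ x <= b + tol))

x_orig = np.array([10.0, 8.0])
x_adv = x_orig + np.array([0.0, 3.0])   # naive perturbation of the downstream meter

print(satisfies_constraints(x_orig, A, b))  # True
print(satisfies_constraints(x_adv, A, b))   # False: downstream 11.0 > upstream 10.0
```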

5.2.6. Adversarial Examples and Model Type

We classify models based on their architecture into four broad types: multi-layer perceptron (MLP), CNN, RNN, and RF. Ali et al. [118] observed that some deep learning architectures are more robust than others. They noted that CNN and RNN detectors are more robust than MLP and hybrid detectors, based on low attack success rates and high query counts. Architecture plays a role in this robustness: CNNs can learn contextual features due to their structure, and RNNs are temporally deeper, and thus both demonstrate greater robustness.
Unsurprisingly, research on CNNs coincides with research in the visual domain, as shown in Table 5. The majority of adversarial example research on RNNs has until recently focused on the text or natural language domain; however, RNNs are also useful in the cybersecurity domain and researchers have recently considered adversarial example attacks against RNN-based IDS.
Other promising research shows that radial basis function neural networks (RBFNN) are more robust to adversarial examples [119]. RBFNNs fit a non-linear curve during training, as opposed to fitting linear decision boundaries; commonly, they transform the input into a space in which the classes become linearly separable. The non-linear nature of RBFNNs could be one potential direction for adversarial example research, and powerful attacks able to subvert RBFNNs would improve our understanding of decision boundaries. Goodfellow et al. [63] argued that the primary cause of neural networks’ vulnerability to adversarial perturbation is their linear nature. However, RBFNNs are less commonly deployed and are therefore not further discussed.

5.2.7. Adversarial Examples and Knowledge Requirement

The majority of the research focus is on white-box attacks, as shown in Table 5, perhaps because such attacks are known to be efficient and effective. Less research focuses on black-box attacks and few studies recognise gray-box attacks that need only partial model knowledge. Gray-box attacks will likely have advantages over black-box attacks. Adversaries will undoubtedly use any and all information available to them.
We classify attacks by the knowledge required by the adversary. White-box attacks are likely the most effective and efficient method of attack, because the adversary has complete knowledge of the model architecture and information on how the model was trained. Such knowledge is harder to attain, although it might be gained through insider threats [120] or model extraction attacks [121]. Extracted models might be a feasible proxy on which to develop and test adversarial examples.
Notwithstanding the efficiency of white-box attacks, effective black-box attacks are possible. Black-box (or oracle) attacks require no knowledge of the model: adversaries only need the ability to query the model and receive its output. Typical black-box attacks include GAs [95] and GANs [89,97].
Gray-box attacks require only limited model knowledge, perhaps including knowledge of the features used by the model. This is a realistic prospect, as adversaries will likely have or gain at least partial knowledge of the model.

5.2.8. Adversarial Example Constraints

Table 5 shows little research considering constraints of any sort. Much research on IDS ignores constraints; however, network traffic is highly constrained by protocols, and some network firewalls may drop malformed packets. Furthermore, it is insufficient that well-formed adversarial examples progress past firewalls. They must also retain their intended functionality.
Stringent constraints exist in the cyber-security domain, and extreme care must be taken to create valid adversarial examples. For example, in IDS, adversaries must conform to the protocol specifications of the TCP/IP stack.
We classify adversarial example constraints into three groups: (1) box constraints, simple constraints where values must remain within certain values; (2) sparse constraints, where a maximum number of features can be modified, the most extreme version being where only one feature can be modified; and (3) functionality-preserving constraints, where adversarial examples must retain their original functionality. For example, malware must function as malware when perturbed to evade a malware detector, and DDoS attacks must function as DDoS attacks when perturbed to evade detection. Functionality-preserving adversarial examples are an interesting avenue for further research.
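The sketch below illustrates how the first two constraint groups can be enforced mechanically, with a feature mask standing in (crudely) for functionality preservation: features whose modification would break behaviour are simply not perturbed. The mask, ranges, and values are illustrative only.

```python
import numpy as np

def constrain_perturbation(x, delta, lower, upper, k, mutable_mask):
    """Apply (1) box constraints via clipping, (2) a sparsity constraint by
    keeping only the k largest-magnitude changes, and (3) a crude stand-in
    for functionality preservation: features flagged immutable stay untouched."""
    delta = delta * mutable_mask              # never touch critical features
    keep = np.argsort(np.abs(delta))[-k:]     # indices of the k largest changes
    sparse = np.zeros_like(delta)
    sparse[keep] = delta[keep]
    return np.clip(x + sparse, lower, upper)

x = np.array([0.2, 0.8, 0.5, 0.1])
delta = np.array([0.3, -0.4, 0.05, 0.2])
mutable = np.array([1.0, 1.0, 0.0, 1.0])      # third feature deemed critical
x_adv = constrain_perturbation(x, delta, 0.0, 1.0, k=2, mutable_mask=mutable)
```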

5.3. Defenses against Adversarial Examples

Figure 6 shows some common defence types that we explore in this section. We further detail proposed defenses against adversarial examples in Table 8.
It is hard to defend against adversarial examples. Users expect ML models to give correct outputs for all possible inputs, but because the range of possible inputs is so large, it is difficult to guarantee correct model behaviour for every input. Some researchers have explored the possibility of exercising all neurons during training [132]. Furthermore, consideration must be given to how adversaries might react when faced with a defense: researchers in secure machine learning must evaluate whether defenses remain secure against adversaries with knowledge of those defenses.
We classify the suggested defenses against adversarial examples into the following groups: pre-processing, adversarial training, architectural, detection, distillation, testing, ensembles, and game theory.

5.3.1. Pre-Processing as a Defense against Adversarial Examples

Some promising research considers transformations, such as translation, additive noise, blurring, cropping, and resizing, which often occur naturally with cameras and scanners in the visual domain. Such transformations have shown initial success as a defense in the visual domain. These initial successes have prompted some researchers to discount security concerns; for example, Graese et al. [123] overreached by declaring adversarial examples an “academic curiosity” rather than a security threat. This position misunderstands the threat from adversarial examples, which remain a concern for cyber-security researchers.
Eykholt et al. [143] noted the creation of perturbations in physical space that survive more challenging physical conditions (distance, pose, and lighting). Such transformations are appropriate for images; however, they may make little sense in cyber-security domains. For example, what would it mean to rotate or blur a network packet? Nevertheless, inspiration could be taken from pre-processing methods in the visual domain, and adapting them to cyber-security and other non-visual domains is an interesting avenue for research.

5.3.2. Adversarial Training as a Defense against Adversarial Examples

Szegedy et al. [17] found that robustness to adversarial examples can be improved by training a model on a mixture of adversarial examples and unperturbed samples. Specific vulnerabilities in the training data can be identified through exploring UAPs, and identified vulnerabilities could potentially be addressed with adversarial training. We recognise that adversarial training is a simple method aiming to improve robustness; however, it is potentially a cosmetic solution: the problem of adversarial examples cannot be solved only through ever greater amounts of adversarial examples in the training data. Tramèr et al. [102] found that adversarial training is imperfect and can be bypassed, and black-box attacks have been shown to evade models subject to adversarial training. Adversarial training has some merit because it is a simple method to improve robustness; it is unfortunately not a panacea and should be bolstered by other defenses. Future research could combine adversarial training with other techniques. We warn that models used in cyber-security or other critical domains should not rely solely on adversarial training.
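For concreteness, a minimal adversarial-training loop is sketched below in PyTorch, mixing clean samples with FGSM-perturbed copies; it illustrates the general idea rather than any specific paper's recipe, and the model and data are placeholders.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """One-step FGSM used to craft perturbed training copies on the fly."""
    x_adv = x.clone().detach().requires_grad_(True)
    nn.functional.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [(torch.rand(32, 20), torch.randint(0, 2, (32,))) for _ in range(100)]

for x, y in loader:                        # stands in for a real data loader
    x_adv = fgsm(model, x, y, eps=0.05)    # adversarial copies of this batch
    x_mix = torch.cat([x, x_adv])          # train on clean + adversarial
    y_mix = torch.cat([y, y])
    loss = nn.functional.cross_entropy(model(x_mix), y_mix)
    opt.zero_grad()
    loss.backward()
    opt.step()
```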

5.3.3. Architectural Defenses against Adversarial Examples

Some research, rather than modifying a model’s training data, investigated defenses through hardening the architecture of the model. This could involve changing model parameters or adding new layers. In Table 8, we classify such defenses as architectural.
Many white-box attacks rely on the quality of the gradient. Some research considers how the model’s weights can be used to disrupt adversarial examples. Amer and Maul [135] modified convolutional neural networks (CNN), adding a weight map layer. Their proposed layer easily integrates into existing CNNs. A weight mapping layer may be inserted between other CNN layers, thus increasing the network’s robustness to both noise and gradient-based adversarial attacks.
Other research aims to block algorithms from using weight transport and back-propagation to generate adversarial examples. Lillicrap et al. [122] proposed a mechanism called “feedback alignment”, which introduces a separate feedback path via random fixed synaptic weights. Feedback alignment blocks the generation of adversarial examples that rely on the gradient because it uses the separate feedback path rather than weight transport.
Techniques to improve accuracy could similarly help harden models. For example, dropout can improve accuracy when used during training. It is particularly useful where there is limited training data and over-fitting is more likely to occur. Wang et al. [134] proposed hardening DNN using defensive dropout at test time. Unfortunately, there is inherently a trade-off between defensive dropout and test accuracy; however, a relatively small decrease in test accuracy can significantly reduce the success rate of attacks. Such hardening techniques force successful attacks to use larger perturbations, which in turn may be more readily recognized as adversarial.
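A small PyTorch sketch of the test-time (defensive) dropout idea follows; whether to use a single stochastic forward pass or to average several is a design choice, and the dropout rate trades robustness against clean accuracy.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Dropout(p=0.3),           # defensive dropout layer
                      nn.Linear(64, 2))
model.eval()                                       # everything else in eval mode

# Re-enable only the dropout modules so randomness persists at test time.
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()

x = torch.rand(1, 20)
with torch.no_grad():
    prediction = model(x).argmax(dim=1)            # a single stochastic pass
```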
Defenses that block gradient-based attacks complicate the generation of adversarial examples; however, like adversarial training, these defenses could be bypassed. In particular, black-box attacks and transferability-based attacks are not blocked by such defenses. A more promising defense, “defensive dropout”, can block both black-box and transferability-based attacks.

5.3.4. Detecting Adversarial Examples

Much research has considered the best way to detect adversarial examples. If adversarial examples can be detected they could be more easily deflected, and perhaps even the original input could be salvaged and correctly classified. Grosse et al. [124] proposed a statistical test to detect adversarial examples before they are input into machine learning models. They observed that adversarial examples are unrepresentative of the distribution, and lie in unexpected regions of a model’s output surface. Their proposed outlier detection system relies on the statistical separation of adversarial examples. They subsequently evaluated their model against adaptive strategies and strong black-box strategies.
Metzen et al. [125] proposed a binary classifier “detector subnetwork” aiming to distinguish between genuine data and adversarial examples. The detection of adversarial examples does not unequivocally lead to correct classification; however, the effect of adversarial examples could perhaps be mitigated through fallback solutions, for example, by requesting human intervention. After successfully detecting adversarial examples in their experiments, they later bypassed their defenses by generating adversarial examples that fool both detector and classifier. They further proposed a training procedure called “dynamic adversary training” as a countermeasure to their attack against the detector.
Feinman et al. [126] also detected adversarial examples by considering which artifacts of adversarial examples could aid detection. They considered two complementary features used to detect adversarial examples: density estimates and Bayesian uncertainty estimates. They evaluated these features on CNNs trained on the MNIST and CIFAR-10 datasets and effectively detected adversarial examples with an ROC-AUC of 92.6%. They further suggested that their method could be used in RNNs, a suggestion bolstered by Gal and Ghahramani’s [144] assertion that Bayesian approximation using dropout can be applied to RNN networks.
Meng et al. [127] proposed a framework, “MagNet”, to detect adversarial examples. This framework precedes the classifier it defends and has two components: (1) a detector that finds and discards any out-of-distribution examples (those significantly far from the manifold boundary); and (2) a reformer that aims to find close approximations to inputs before forwarding the approximations to the classifier. Their system generalizes well because it learns to detect adversarial examples without knowledge of how they were generated. They also proposed a defense against gray-box attacks, where the adversary has knowledge of the deployed defenses: a number of auto-encoders (or reformers) are trained, and at test time a single auto-encoder is selected at random.
Xu et al. [128] proposed “feature squeezing” as a strategy to detect adversarial examples by squeezing out unnecessary features in the input. Through comparing predictions of the original and feature squeezed inputs, adversarial examples are identified if the difference between the two predictions meets a threshold. Two feature-squeezing methods are used: (1) Reducing the colour bit-depth of the image; (2) Spatial smoothing. An adversary may adapt and circumvent this defense; however, the defense may frustrate the adversary because it changes the problem the adversary must overcome.
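A sketch of the two squeezers and the threshold comparison is given below for image-like inputs in [0, 1]; `model_predict` is assumed to return a probability vector, and the L1 threshold is a placeholder that would be calibrated on validation data.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def squeeze_detect(model_predict, x, bits=4, smooth=2, threshold=1.0):
    """Flag x as adversarial if predictions on the original and squeezed
    inputs differ by more than `threshold` in L1 distance."""
    p_orig = model_predict(x)
    p_bits = model_predict(reduce_bit_depth(x, bits))
    p_smooth = model_predict(median_filter(x, size=smooth))
    score = max(np.abs(p_orig - p_bits).sum(),
                np.abs(p_orig - p_smooth).sum())
    return score > threshold
```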
Rosenberg et al. [136] considered the feature squeezing defense designed for CNNs and proposed “sequence squeezing”, which is adapted for RNNs. Adversarial examples are similarly detected by running the classifier twice: once on the original sequence, and once for the sequence-squeezed input. An input is identified as adversarial if the difference in the confidence scores meets a threshold value.
Zhang et al. [141] proposed an image classifier based on a variational auto-encoder. They trained two models, each on half the dataset: a target model and a surrogate model. On the surrogate model they generated three types of strong transfer-based adversarial examples: L0, L2, and L∞. Analysis of their model using the CIFAR-10, MNIST, and Fashion-MNIST datasets found that it achieves state-of-the-art accuracy with significantly better robustness. Their work is in the visual domain; however, their ideas may transfer to other domains such as intrusion detection.
We have discussed some architectural defenses against adversarial examples and, in particular, methods for detecting adversarial examples. Carlini and Wagner [145] showed that adversarial examples are harder to detect than often assumed and that they do not exhibit intrinsic properties that reliably distinguish them; moreover, many detection methods can be broken by choosing good attacker-loss functions. Grosse et al. [124] noted that adversarial defenses exist within an arms race and that guarantees against future attacks are difficult because adversaries may adapt to the defenses by adopting new strategies. Meng et al. [127] advocated that defenses against adversarial examples should be independent of any particular attack. We have seen that human-in-the-loop solutions could be useful where few cases need human intervention; however, repeated requests might quickly overwhelm human operators given large numbers of adversarial examples, for example, as might be seen in network traffic analysis.

5.3.5. Defensive Testing

Adversarial examples cause unexpected behaviour. Recent research considers testing deep learning systems. Pei et al. [146] aimed to discover unusual or unexpected behaviour of a neural network through systematic testing. They produced test data by solving a joint optimization problem. Their tests aim to trigger different behaviours and activate a high proportion of neurons in a neural network. Their method finds corner-cases where incorrect behaviour is exhibited, for example, malware masquerading as benign. They claimed to expose more inputs and types of unexpected behaviour than adversarial examples. They further used the generated inputs to perform adversarial training. As a defense we question the practicability of triggering all neurons in larger neural networks; however, as an attack, their method could produce different types of adversarial inputs.
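The following sketch illustrates the underlying neuron-coverage measure using PyTorch forward hooks on ReLU outputs; the toy model, test batch, and activation threshold are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                      nn.Linear(32, 16), nn.ReLU(),
                      nn.Linear(16, 2))

activations = {}
for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        # Record each ReLU layer's output for every forward pass.
        module.register_forward_hook(
            lambda m, inp, out, name=name:
                activations.setdefault(name, []).append(out.detach()))

x = torch.rand(256, 10)                 # a batch of test inputs
model(x)

threshold = 0.0
covered = total = 0
for outs in activations.values():
    acts = torch.cat(outs)              # shape: (n_inputs, n_neurons)
    covered += (acts > threshold).any(dim=0).sum().item()
    total += acts.shape[1]
print(f"neuron coverage: {covered / total:.1%}")
```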
Other researchers are considering similar techniques to generate test data. Tian et al. [132] evaluated a tool for automatically detecting erroneous behaviour, generating test inputs designed to maximise the number of activated neurons using realistic driving conditions, including blurring, rain, and fog. Zhang et al. [133] proposed a system to automatically synthesize large amounts of diverse driving scenes, including weather conditions, using GANs. We consider GANs useful for generating adversarial inputs. GANs should implicitly learn domain constraints.

5.3.6. Multi-Classifier Systems

Biggio et al. [105] highlighted that robustness against adversarial examples can be improved through the careful use of ensemble classifiers, for example, by using rejection-based mechanisms. Indeed, Biggio et al. had earlier implemented a multi-classifier system (MCS) [147], which was hardened using randomisation. Randomising the decision boundary makes a classifier harder to evade: since the attacker has less information on the exact position of the decision boundary, they must make either overly conservative or overly risky choices when generating adversarial examples.
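A minimal sketch of such randomisation is shown below, assuming an ensemble of trained classifiers exposing a scikit-learn-style `predict_proba` method; a rejection rule is included as one possible fallback when the chosen member is not confident.

```python
import random

def randomized_predict(classifiers, x, reject_threshold=0.6, seed=None):
    """Answer each query with a randomly chosen ensemble member, and reject
    (e.g., defer to a human analyst) when its confidence is low. The attacker
    cannot know which decision boundary will answer a given query."""
    rng = random.Random(seed)
    clf = rng.choice(classifiers)
    probs = clf.predict_proba([x])[0]
    if probs.max() < reject_threshold:
        return None                      # rejection: escalate for review
    return int(probs.argmax())
```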

5.3.7. Game Theory

Zhou et al. [139] considered game theoretic modeling of adversarial machine learning problems, for which many different models have been proposed. Some aim to optimise the feature set by selecting high-quality features, thus making adversarial attacks more difficult. Game theoretic models have also been proposed to address more complex situations with many adversaries of different types. Equilibrium strategies are acceptable to both players, and neither has an incentive to change; therefore, assuming rational adversaries, game theory-based approaches allowing a Nash equilibrium could potentially end the evolutionary arms race.

5.3.8. Adversarial Example Defenses in Cybersecurity Domains

We discussed domains in Section 5.2.4 and Section 5.2.5. Most research on defenses against adversarial examples has focused on the visual domain. Comparatively little research has so far considered defenses in cybersecurity domains such as intrusion detection and malware analysis. Applying current defenses in the visual domain to other domains might efficiently kickstart research into defenses for other domains. Effective defenses against adversarial examples could help enable the use of ML models in cybersecurity and other adversarial environments.
Different model types are suited to different domains, and we consider that different model types may require different defenses. Again, we classify models into four types: MLP, CNN, RNN, and RF.

6. Discussion and Conclusions

ML systems are deployed in complex environments, including cybersecurity and critical national infrastructure. Such systems attract the interest of powerful advanced persistent threats that may target them. Crucially, we must address robustness against functionality-preserving adversarial examples before novel attack strategies exploit inherent weaknesses in critical ML models.
Machine learning and adversarial learning are becoming increasingly recognised by the research community, given the rapid uptake of ML models in a whole host of application domains. To put this in context, 2975 papers related to machine learning and adversarial learning were published on arXiv in the last 12 months (October 2020–September 2021). Over recent years, the number of papers published on this topic has grown substantially. According to Carlini, who maintains the blog post “A Complete List of All (arXiv) Adversarial Example Papers” [148], the cumulative number of adversarial example papers neared 4000 in 2021. It is therefore evident that there is a lot of interest and many researchers active in this area. Not all papers in this list are useful or relevant; we pass no judgement on their quality but merely aim to clarify the research landscape and draw important research to the fore. The majority of prior research has been applied to the visual domain. Seminal contributions have been made by Szegedy et al. [17], Goodfellow et al. [63], Carlini et al. [64], and Papernot [79]. It is clear that the visual domain continues to be well researched.
We conducted an extensive survey of the academic literature in relation to functionality-preservation in adversarial machine learning. We derived a classification based on both attack and defense. We considered possible robustness metrics. Moreover, we considered model training and data-level techniques that could help improve robustness by tackling biased datasets.
Our analysis of functionality-preservation methods finds that gradient-based methods may be less suitable where functionality preservation and other constraints apply, and that methods modifying large numbers of features are less likely to preserve functionality. We found that GANs and genetic algorithms are suitable for functionality-preserving attacks. We subsequently discussed defense strategies against functionality-preserving adversarial examples. We found that functionality-preserving adversarial machine learning is an open research topic. Finally, we identify some key future directions and research challenges in functionality-preserving machine learning.

Future Directions and Research Challenges

We now discuss future research challenges. Few researchers address the problem of transferability, which remains a key area of concern because hard-to-attack models are nevertheless susceptible to transferable adversarial examples generated against easy-to-attack models. Breaking the transferability of adversarial examples is a key challenge for the research community. Currently, defensive dropout [134] at test time is a promising defense. Adversarial example detection is a useful area of research.
We consider the area of functionality-preserving adversarial examples under-explored. Research into improving robustness against such adversarial examples is an area requiring urgent research. We suggest that adapting defenses used in the visual domain and CNN models to other model types such as RNNs could offer potential solutions. Caution should be exercised when adapting defenses in the visual domain to other domains. For example, denoising defenses may not apply directly to discrete or noncontinuous data.
Constraints on adversarial examples are not limited to preserving the functionality of malware or IDS attacks. CPSs model the real world, where linear and other physical constraints must be respected. Adversarial examples that do not respect domain constraints risk marking themselves as obvious anomalies.
Concept-drift is a real concern for cybersecurity [1], as new attacks and techniques are discovered daily. As the model and the current state of the art diverge, the model suffers from hidden technical debt. Therefore, the model must be retrained to reflect the current state-of-the-art attacks and new network traffic patterns [149]. Researchers might develop and use more up-to-date datasets. Further avenues for research include semi-supervised/unsupervised ML and active learning methods that continuously update the underlying model and do not rely on labelled datasets.
We identify that data-level techniques such as resampling, balancing datasets, and cross validation could affect robustness against adversarial examples. Further research is required to explore how the bias-variance trade-off can affect robustness.
We prioritise the areas of future research, setting the agenda for research in this area. Critical areas include breaking the transferability of adversarial examples, which would ideally be applicable across domains. Non-visual domains, including cybersecurity and cyber-physical systems, have been under-explored, and this oversight should be rectified urgently. Further research on transformations in non-visual domains could provide useful knowledge. Advancing the detection of adversarial examples in cybersecurity, intrusion detection, and cyber-physical systems will yield benefits beyond cybersecurity and may be applicable in other non-visual domains. Moreover, research is required in areas beyond instance classifiers: RNNs and reinforcement learning have been under-explored. More research is required to understand the use of domain constraints and functionality-preserving adversarial examples, and further research is needed towards effective countermeasures.
Additionally, we consider that more research attention could be given to dataset resampling strategies as a defence against adversarial examples. There is also a need for better robustness metrics. Some researchers simply state accuracy, and others report the better F1-score; however, the F1-score is still biased by the unbalanced datasets that are widespread in intrusion detection, partly due to large numbers of benign samples, and could lead to a false sense of security. Researchers should adopt stronger metrics such as CLEVER [59] or empirical robustness [62].
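As an example of a stronger metric, empirical robustness in the sense of Moosavi-Dezfooli et al. [62] averages the relative size of the minimal perturbations an attack needs; the sketch below assumes the adversarial examples have already been computed by some attack.

```python
import numpy as np

def empirical_robustness(X, X_adv, success_mask):
    """Average ||x_adv - x||_2 / ||x||_2 over the samples the attack fooled.
    Larger values mean the model needs larger perturbations to be evaded."""
    diffs = np.linalg.norm(X_adv - X, axis=1)
    norms = np.linalg.norm(X, axis=1)
    ratios = diffs[success_mask] / norms[success_mask]
    return ratios.mean() if ratios.size else np.inf
```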
Adversarial machine learning is a critical area of research. If not addressed, there is increasing potential for novel attack strategies that seek to exploit the inherent weaknesses that exist within machine learning models; however, few works consider “realisable” perturbations that take account of domain and/or real-world constraints. Successful adversarial examples must be crafted to comply with domain and real-world constraints. This may be challenging since even small modifications may corrupt network packets that are likely to be dropped by firewalls. This necessitates functionality preservation in adversarial learning.
We propose that human perception may not be the best criterion for analyzing adversarial examples. In cybersecurity domains we propose that adversarial examples must preserve functionality. Traditionally, adversarial examples are thought of as having imperceptible noise. That is, that humans cannot perceive the difference between the original and perturbed inputs. Indeed, human perception in some domains might be immaterial. For example, strategic attacks triggered at crucial moments might cause damage to CPS before any human could reasonably act.
In cyber-security domains traditional gradient descent algorithms may be insufficient, although JSMA may be reasonable because it perturbs few features. Stringent constraints exist in the cyber-security domain and extreme care must be taken to create valid adversarial examples. We offer some guidelines for generating functionality-preserving adversarial examples. Functionality-preserving adversarial examples should: only perform legitimate transformations; respect mathematical dependencies, real-world, and domain constraints; minimize the number of perturbed features and restrict modification to non-critical features; and where possible retain the original payload and/or packet order.
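A sketch of a validity gate applying these guidelines to a perturbed flow record is given below; the field names, valid ranges, and set of immutable fields are hypothetical and would in practice come from the protocol specification and the semantics of the attack.

```python
IMMUTABLE_FIELDS = {"payload_hash", "dst_port", "protocol"}   # hypothetical

FIELD_RANGES = {                                              # hypothetical
    "ttl": (1, 255),
    "window_size": (0, 65535),
    "pkt_len": (20, 65535),
}

def is_valid_perturbation(original, perturbed, max_changes=3):
    """Reject perturbations that touch critical fields, leave the valid
    ranges of the protocol, or modify too many features at once."""
    changed = [k for k in original if perturbed.get(k) != original[k]]
    if any(k in IMMUTABLE_FIELDS for k in changed):
        return False
    if len(changed) > max_changes:
        return False
    for k, (lo, hi) in FIELD_RANGES.items():
        if k in perturbed and not (lo <= perturbed[k] <= hi):
            return False
    return True
```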
Defences against adversarial examples must consider that adversaries are likely to adapt by adopting new strategies. Many researchers propose adversarial training to improve robustness. Adversarial training is a simple method aiming to improve robustness; however, it is potentially a cosmetic solution: the problem of adversarial examples cannot be solved only through ever greater numbers of adversarial examples in the training data. Adversarial training, if used, must be bolstered by other defenses. Interesting defence strategies include randomisation: randomising decision boundaries makes evasion more difficult because attackers have less information on the exact position of a decision boundary. They must therefore make too conservative or too risky choices when generating adversarial examples.
Game theoretic models could be used to address more complex situations with many adversaries of different types as found in intrusion detection. Equilibrium strategies acceptable to both defender and adversary mean neither has an incentive to change. Therefore, assuming rational opponents, game theory-based approaches allowing a Nash equilibrium could potentially end the evolutionary arms race, although it is difficult to conceive a world where no advantage is possible.
Current promising defenses such as dropout exchange a relatively small decrease in accuracy for a significant reduction of successful attacks, even successfully blocking black-box and transferability-based attacks. Hardening techniques force successful attacks to use larger perturbations, which in turn may be more readily recognized as adversarial.
In a broader cybersecurity context, risks arising from adversarial examples are not yet fully understood. Furthermore, algorithms and models from other domains may not readily apply because of distributed sensors and inherent real-world constraints. It is uncertain whether current defences are sufficient. Furthermore, adversarial example detectors must function efficiently in a real-time monitoring environment while maintaining low false alarm rates.
Many academic researchers use old datasets that do not fairly represent modern network traffic analysis problems due to concept-drift. Problems of labelling data and retraining systems provide an impetus to explore unsupervised and active learning. Unfortunately, adversarial attacks are possible on active learning systems [150]. Lin et al. [82] described an enchanting attack to lure a machine learning system to a target state through crafting a series of adversarial examples. It is conceivable that similar attacks could lure anomaly detection systems towards normalizing and accepting malicious traffic.

Key Future Research Challenges

Adversarial ML is a critical area of research. Researchers must address the robustness of ML models against adversarial examples, allowing safer deployment of ML models across cybersecurity domains. Better robustness metrics should be used and developed. We find that the traditional benchmark of human perception may be less relevant to functionality preservation. Moreover, traditional gradient descent algorithms may be insufficient to generate functionality-preserving attacks, and adversaries may use other methods such as GANs. Therefore, defences against gradient descent algorithms may likewise be insufficient. Defences must consider reactive adversaries who adapt to defences. Randomisation of decision boundaries can make evasion more difficult, and research into multi-classifier systems could help thwart evasion attacks by making classifiers harder to evade. Dropout is currently a promising defense against adversarial examples, although multiple defenses may be required and a combination of defenses will likely offer better defense capability. Game-theory approaches could potentially end the adversarial arms race by achieving a Nash equilibrium. Concept-drift requires further research. Many researchers are using outdated datasets; simply using newer datasets could postpone problems of concept-drift and is a good first step. Unsupervised/semi-supervised and active learning could potentially offer longer-term solutions to concept-drift, aiming for models to learn and detect novel attack methods. Transferability of adversarial examples remains an open issue, and more research here has the potential to disrupt many attack strategies. More research is required in the area of functionality-preserving adversarial attacks, recognising the limits and trade-offs between functionality-preserving adversarial examples and their ability to evade classification; moreover, research into adversarial attacks in other constrained domains could improve robustness against complex attacks.
We offer these insights and hope that this survey offers other researchers a base for exploring the areas of robustness and functionality-preserving adversarial examples.

Author Contributions

Conceptualization, A.M., P.A., E.G. and P.L.; methodology, A.M.; formal analysis, A.M.; investigation, A.M.; writing—original draft preparation, A.M.; writing—review and editing, A.M., P.A., E.G. and P.L.; visualization, A.M.; supervision, P.A., E.G. and P.L.; funding acquisition, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Partnership PhD scheme at the University of the West of England in collaboration with Techmodal Ltd.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Andresini, G.; Pendlebury, F.; Pierazzi, F.; Loglisci, C.; Appice, A.; Cavallaro, L. INSOMNIA: Towards Concept-Drift Robustness in Network Intrusion Detection. In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec), ACM, Virtual Event, Korea, 15 November 2021. [Google Scholar]
  2. Raghuraman, C.; Suresh, S.; Shivshankar, S.; Chapaneri, R. Static and dynamic malware analysis using machine learning. In First International Conference on Sustainable Technologies for Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2020; pp. 793–806. [Google Scholar]
  3. Berger, H.; Hajaj, C.; Dvir, A. Evasion Is Not Enough: A Case Study of Android Malware. In International Symposium on Cyber Security Cryptography and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2020; pp. 167–174. [Google Scholar]
  4. Hou, R.; Xiang, X.; Zhang, Q.; Liu, J.; Huang, T. Universal Adversarial Perturbations of Malware. In International Symposium on Cyberspace Safety and Security; Springer: Berlin/Heidelberg, Germany, 2020; pp. 9–19. [Google Scholar]
  5. Parshutin, S.; Kirshners, A.; Kornijenko, Y.; Zabiniako, V.; Gasparovica-Asite, M.; Rozkalns, A. Classification with LSTM Networks in User Behaviour Analytics with Unbalanced Environment. Autom. Control. Comput. Sci. 2021, 55, 85–91. [Google Scholar] [CrossRef]
  6. Le, D.C.; Zincir-Heywood, N. Exploring anomalous behaviour detection and classification for insider threat identification. Int. J. Netw. Manag. 2021, 31, e2109. [Google Scholar] [CrossRef]
  7. Biswal, S. Real-Time Intelligent Vishing Prediction and Awareness Model (RIVPAM). In Proceedings of the 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), Dublin, Ireland, 14–18 June 2021; pp. 1–2. [Google Scholar]
  8. Kumar, N.; Sonowal, S.; Nishant. Email Spam Detection Using Machine Learning Algorithms. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 108–113.
  9. Kiela, D.; Firooz, H.; Mohan, A.; Goswami, V.; Singh, A.; Ringshia, P.; Testuggine, D. The hateful memes challenge: Detecting hate speech in multimodal memes. arXiv 2020, arXiv:2005.04790. [Google Scholar]
  10. Bin Naeem, S.; Kamel Boulos, M.N. COVID-19 misinformation online and health literacy: A brief overview. Int. J. Environ. Res. Public Health 2021, 18, 8091. [Google Scholar] [CrossRef]
  11. Coan, T.; Boussalis, C.; Cook, J.; Nanko, M. Computer-assisted detection and classification of misinformation about climate change. SocArXiv 2021, 1–12. [Google Scholar]
  12. Khanam, Z.; Alwasel, B.; Sirafi, H.; Rashid, M. Fake News Detection Using Machine Learning Approaches. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Jeju Island, Korea, 12–14 March 2021; IOP Publishing: Jaipur, India, 2021; Volume 1099, p. 012040. [Google Scholar]
  13. Papernot, N.; McDaniel, P.; Sinha, A.; Wellman, M.P. Sok: Security and privacy in machine learning. In Proceedings of the2018 IEEE European Symposium on Security and Privacy (EuroS&P), London, UK, 24–26 April 2018; pp. 399–414. [Google Scholar]
  14. Dhar, S.; Guo, J.; Liu, J.; Tripathi, S.; Kurup, U.; Shah, M. On-device machine learning: An algorithms and learning theory perspective. arXiv 2019, arXiv:1911.00623. [Google Scholar] [CrossRef]
  15. Gu, X.; Easwaran, A. Towards Safe Machine Learning for CPS: Infer Uncertainty from Training Data. In Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, Association for Computing Machinery, New York, NY, USA, 16–18 April 2019; pp. 249–258. [Google Scholar] [CrossRef] [Green Version]
  16. Ghafouri, A.; Vorobeychik, Y.; Koutsoukos, X. Adversarial regression for detecting attacks in cyber-physical systems. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  17. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  18. Wardle, S.G.; Taubert, J.; Teichmann, L.; Baker, C.I. Rapid and dynamic processing of face pareidolia in the human brain. Nat. Commun. 2020, 11, 1–14. [Google Scholar] [CrossRef]
  19. Summerfield, C.; Egner, T.; Mangels, J.; Hirsch, J. Mistaking a house for a face: Neural correlates of misperception in healthy humans. Cereb. Cortex 2006, 16, 500–508. [Google Scholar] [CrossRef]
  20. Huang, Y.; Verma, U.; Fralick, C.; Infantec-Lopez, G.; Kumar, B.; Woodward, C. Malware Evasion Attack and Defense. In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Portland, OR, USA, 24–27 June 2019; pp. 34–38. [Google Scholar] [CrossRef] [Green Version]
  21. Ayub, M.A.; Johnson, W.A.; Talbert, D.A.; Siraj, A. Model Evasion Attack on Intrusion Detection Systems using Adversarial Machine Learning. In Proceedings of the 2020 54th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 18–20 March 2020; pp. 1–6. [Google Scholar] [CrossRef]
  22. Satter, R. Experts Who Wrestled with SolarWinds Hackers say Cleanup Could Take Months-or Longer; Reuters: New York, NY, USA, 2020. [Google Scholar]
  23. Sirota, S. Air Force response to SolarWinds hack: Preserve commercial partnerships, improve transparency into security efforts. Inside Cybersecur. 2021. [Google Scholar]
  24. Corona, I.; Giacinto, G.; Roli, F. Adversarial attacks against intrusion detection systems: Taxonomy, solutions and open issues. Inf. Sci. 2013, 239, 201–225. [Google Scholar] [CrossRef]
  25. Hankin, C.; Barrère, M. Trustworthy Inter-connected Cyber-Physical Systems. In International Conference on Critical Information Infrastructures Security; Springer: Berlin/Heidelberg, Germany, 2020; pp. 3–13. [Google Scholar]
  26. Cho, J.H.; Xu, S.; Hurley, P.M.; Mackay, M.; Benjamin, T.; Beaumont, M. Stram: Measuring the trustworthiness of computer-based systems. ACM Comput. Surv. (CSUR) 2019, 51, 1–47. [Google Scholar] [CrossRef]
  27. Zhang, J.; Li, C. Adversarial examples: Opportunities and challenges. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2578–2593. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Apruzzese, G.; Andreolini, M.; Ferretti, L.; Marchetti, M.; Colajanni, M. Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems. Digit. Threat. Res. Pract. 2021. [Google Scholar] [CrossRef]
  29. Shannon, C.E. Communication theory of secrecy systems. Bell Syst. Tech. J. 1949, 28, 656–715. [Google Scholar] [CrossRef]
  30. Taran, O.; Rezaeifar, S.; Voloshynovskiy, S. Bridging machine learning and cryptography in defence against adversarial attacks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
  31. Wu, Y.; Wei, D.; Feng, J. Network attacks detection methods based on deep learning techniques: A survey. Secur. Commun. Netw. 2020, 2020, 8872923. [Google Scholar] [CrossRef]
  32. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
  33. McHugh, J. Testing intrusion detection systems: A critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory. ACM Trans. Inf. Syst. Secur. (TISSEC) 2000, 3, 262–294. [Google Scholar] [CrossRef]
  34. Cerf, V.G. 2021 Internet Perspectives. IEEE Netw. 2021, 35, 3. [Google Scholar] [CrossRef]
  35. McKeay, M. Akamai State of the Internet/Security: A Year in Review. Available online: http://akamai.com/soti (accessed on 15 September 2021).
  36. Kok, S.; Abdullah, A.; Jhanjhi, N.; Supramaniam, M. A review of intrusion detection system using machine learning approach. Int. J. Eng. Res. Technol. 2019, 12, 8–15. [Google Scholar]
  37. Alatwi, H.A.; Morisset, C. Adversarial Machine Learning In Network Intrusion Detection Domain: A Systematic Review. arXiv 2021, arXiv:2112.03315. [Google Scholar]
  38. Revathi, S.; Malathi, A. A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection. Int. J. Eng. Res. Technol. (IJERT) 2013, 2, 1848–1853. [Google Scholar]
  39. Gharaibeh, M.; Papadopoulos, C. DARPA 2009 intrusion detection dataset. Colo. State Univ. Tech. Rep. 2014. [Google Scholar]
  40. Garcia, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014, 45, 100–123. [Google Scholar] [CrossRef]
  41. Song, J.; Takakura, H.; Okabe, Y.; Eto, M.; Inoue, D.; Nakao, K. Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation. In Proceedings of the first Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, Salzburg, Austria, 10–13 April 2011; pp. 29–36. [Google Scholar]
  42. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 military communications and information systems conference (MilCIS), Canberra, Australia, 10 November 2015; pp. 1–6. [Google Scholar]
  43. Almomani, I.; Al-Kasasbeh, B.; Al-Akhras, M. WSN-DS: A dataset for intrusion detection systems in wireless sensor networks. J. Sens. 2016, 2016, 4731953. [Google Scholar] [CrossRef] [Green Version]
  44. Niyaz, Q.; Sun, W.; Javaid, A.Y. A deep learning based DDoS detection system in software-defined networking (SDN). arXiv 2016, arXiv:1611.07400. [Google Scholar] [CrossRef] [Green Version]
  45. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 2018, 1, 108–116. [Google Scholar]
  46. Antonakakis, M.; April, T.; Bailey, M.; Bernhard, M.; Bursztein, E.; Cochran, J.; Durumeric, Z.; Halderman, J.A.; Invernizzi, L.; Kallitsis, M.; et al. Understanding the mirai botnet. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 1093–1110. [Google Scholar]
  47. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of things for network forensic analytics: Bot-iot dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [Google Scholar] [CrossRef] [Green Version]
  48. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An ensemble of autoencoders for online network intrusion detection. arXiv 2018, arXiv:1802.09089. [Google Scholar]
  49. Janusz, A.; Kałuza, D.; Chądzyńska-Krasowska, A.; Konarski, B.; Holland, J.; Ślęzak, D. IEEE BigData 2019 cup: Suspicious network event recognition. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, 9–12 December 2019; pp. 5881–5887. [Google Scholar]
  50. Ferriyan, A.; Thamrin, A.H.; Takeda, K.; Murai, J. Generating Network Intrusion Detection Dataset Based on Real and Encrypted Synthetic Attack Traffic. Appl. Sci. 2021, 11, 7868. [Google Scholar] [CrossRef]
  51. Martins, N.; Cruz, J.M.; Cruz, T.; Abreu, P.H. Adversarial machine learning applied to intrusion and malware scenarios: A systematic review. IEEE Access 2020, 8, 35403–35419. [Google Scholar] [CrossRef]
  52. Shafique, M.; Naseer, M.; Theocharides, T.; Kyrkou, C.; Mutlu, O.; Orosa, L.; Choi, J. Robust machine learning systems: Challenges, current trends, perspectives, and the road ahead. IEEE Des. Test 2020, 37, 30–57. [Google Scholar] [CrossRef]
  53. Labaca-Castro, R.; Biggio, B.; Dreo Rodosek, G. Poster: Attacking malware classifiers by crafting gradient-attacks that preserve functionality. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 2565–2567. [Google Scholar]
  54. Gonzalez-Cuautle, D.; Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, L.K.; Portillo-Portillo, J.; Olivares-Mercado, J.; Perez-Meana, H.M.; Sandoval-Orozco, A.L. Synthetic minority oversampling technique for optimizing classification tasks in botnet and intrusion-detection-system datasets. Appl. Sci. 2020, 10, 794. [Google Scholar] [CrossRef] [Green Version]
  55. Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 1–54. [Google Scholar] [CrossRef]
  56. Sheatsley, R.; Papernot, N.; Weisman, M.; Verma, G.; McDaniel, P. Adversarial Examples in Constrained Domains. arXiv 2020, arXiv:2011.01183. [Google Scholar]
  57. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. Encycl. Database Syst. 2009, 5, 532–538. [Google Scholar]
  58. Bai, T.; Luo, J.; Zhao, J.; Wen, B. Recent Advances in Adversarial Training for Adversarial Robustness. arXiv 2021, arXiv:2102.01356. [Google Scholar]
  59. Weng, T.W.; Zhang, H.; Chen, P.Y.; Yi, J.; Su, D.; Gao, Y.; Hsieh, C.J.; Daniel, L. Evaluating the robustness of neural networks: An extreme value theory approach. arXiv 2018, arXiv:1801.10578. [Google Scholar]
  60. Goodfellow, I. Gradient masking causes clever to overestimate adversarial perturbation size. arXiv 2018, arXiv:1804.07870. [Google Scholar]
  61. Weng, T.W.; Zhang, H.; Chen, P.Y.; Lozano, A.; Hsieh, C.J.; Daniel, L. On extensions of clever: A neural network robustness evaluation algorithm. In Proceedings of the 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Anaheim, CA, USA, 26–28 November 2018; pp. 1159–1163. [Google Scholar]
  62. Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  63. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  64. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (sp), San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
65. Papernot, N.; Faghri, F.; Carlini, N.; Goodfellow, I.; Feinman, R.; Kurakin, A.; Xie, C.; Sharma, Y.; Brown, T.; Roy, A.; et al. Technical report on the CleverHans v2.1.0 adversarial examples library. arXiv 2016, arXiv:1610.00768. [Google Scholar]
  66. Rauber, J.; Brendel, W.; Bethge, M. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv 2017, arXiv:1707.04131. [Google Scholar]
67. Nicolae, M.I.; Sinn, M.; Tran, M.N.; Buesser, B.; Rawat, A.; Wistuba, M.; Zantedeschi, V.; Baracaldo, N.; Chen, B.; Ludwig, H.; et al. Adversarial Robustness Toolbox v1.0.0. arXiv 2018, arXiv:1807.01069. [Google Scholar]
68. Ding, G.W.; Wang, L.; Jin, X. AdverTorch v0.1: An adversarial robustness toolbox based on PyTorch. arXiv 2019, arXiv:1902.07623. [Google Scholar]
  69. Lashkari, A.H.; Zang, Y.; Owhuo, G.; Mamun, M.; Gil, G. CICFlowMeter. Available online: https://www.unb.ca/cic/research/applications.html (accessed on 19 February 2021).
  70. Habibi Lashkari, A.; Draper Gil, G.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of Tor Traffic using Time based Features. In Proceedings of the 3rd International Conference on Information Systems Security and Privacy-ICISSP, Porto, Portugal, 19–21 February 2017; pp. 253–262. [Google Scholar] [CrossRef]
  71. Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A. Characterization of Encrypted and VPN Traffic using Time-related Features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy-ICISSP, Rome, Italy, 19–21 February 2016; pp. 407–414. [Google Scholar] [CrossRef]
  72. Sarker, I.H.; Abushark, Y.B.; Alsolami, F.; Khan, A.I. Intrudtree: A machine learning based cyber security intrusion detection model. Symmetry 2020, 12, 754. [Google Scholar] [CrossRef]
  73. Almomani, O. A feature selection model for network intrusion detection system based on PSO, GWO, FFA and GA algorithms. Symmetry 2020, 12, 1046. [Google Scholar] [CrossRef]
  74. McCarthy, A.; Andriotis, P.; Ghadafi, E.; Legg, P. Feature Vulnerability and Robustness Assessment against Adversarial Machine Learning Attacks. In Proceedings of the 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), Dublin, Ireland, 14–18 June 2021; pp. 1–8. [Google Scholar] [CrossRef]
  75. Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597. [Google Scholar]
  76. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, 1–9. [Google Scholar] [CrossRef]
  77. Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; Roli, F. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2013; pp. 387–402. [Google Scholar]
  78. Papernot, N.; McDaniel, P.; Swami, A.; Harang, R. Crafting adversarial input sequences for recurrent neural networks. In Proceedings of the MILCOM 2016-2016 IEEE Military Communications Conference, Baltimore, MD, USA, 1–3 November 2016; pp. 49–54. [Google Scholar]
  79. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The Limitations of Deep Learning in Adversarial Settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS P), Saarbrucken, Germany, 21–24 March 2016; pp. 372–387. [Google Scholar]
  80. Jia, R.; Liang, P. Adversarial Examples for Evaluating Reading Comprehension Systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 2021–2031. [Google Scholar]
  81. Zhao, Z.; Dua, D.; Singh, S. Generating Natural Adversarial Examples. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  82. Lin, Y.C.; Hong, Z.W.; Liao, Y.H.; Shih, M.L.; Liu, M.Y.; Sun, M. Tactics of adversarial attack on deep reinforcement learning agents. arXiv 2017, arXiv:1703.06748. [Google Scholar]
  83. Rigaki, M. Adversarial Deep Learning against Intrusion Detection Classifiers; Luleå University of Technology: Luleå, Sweden, 2017. [Google Scholar]
84. Hu, W.; Tan, Y. Black-box attacks against RNN based malware detection algorithms. In Proceedings of the Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  85. Homoliak, I.; Teknös, M.; Ochoa, M.; Breitenbacher, D.; Hosseini, S.; Hanacek, P. Improving Network Intrusion Detection Classifiers by Non-payload-Based Exploit-Independent Obfuscations: An Adversarial Approach. EAI Endorsed Trans. Secur. Saf. 2018, 5, e4. [Google Scholar] [CrossRef]
  86. Rosenberg, I.; Shabtai, A.; Rokach, L.; Elovici, Y. Generic black-box end-to-end attack against state of the art API call based malware classifiers. In International Symposium on Research in Attacks, Intrusions, and Defenses; Springer: Berlin/Heidelberg, Germany, 2018; pp. 490–510. [Google Scholar]
  87. Wang, Z. Deep learning-based intrusion detection with adversaries. IEEE Access 2018, 6, 38367–38384. [Google Scholar] [CrossRef]
  88. Warzyński, A.; Kołaczek, G. Intrusion detection systems vulnerability on adversarial examples. In Proceedings of the 2018 Innovations in Intelligent Systems and Applications (INISTA), Thessaloniki, Greece, 3–5 July 2018; pp. 1–4. [Google Scholar]
  89. Lin, Z.; Shi, Y.; Xue, Z. Idsgan: Generative adversarial networks for attack generation against intrusion detection. arXiv 2018, arXiv:1809.02077. [Google Scholar]
  90. Yang, K.; Liu, J.; Zhang, C.; Fang, Y. Adversarial examples against the deep learning based network intrusion detection systems. In Proceedings of the MILCOM 2018-2018 IEEE Military Communications Conference (MILCOM), Los Angeles, CA, USA, 29–31 October 2018; pp. 559–564. [Google Scholar]
  91. Su, J.; Vargas, D.V.; Sakurai, K. One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 2019, 23, 828–841. [Google Scholar] [CrossRef] [Green Version]
  92. Kuppa, A.; Grzonkowski, S.; Asghar, M.R.; Le-Khac, N.A. Black box attacks on deep anomaly detectors. In Proceedings of the 14th International Conference on Availability, Reliability and Security, Canterbury, UK, 26–29 August 2019; pp. 1–10. [Google Scholar]
  93. Ibitoye, O.; Shafiq, O.; Matrawy, A. Analyzing adversarial attacks against deep learning for intrusion detection in IoT networks. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6. [Google Scholar]
94. Usama, M.; Asim, M.; Latif, S.; Qadir, J. Generative adversarial networks for launching and thwarting adversarial attacks on network intrusion detection systems. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24–28 June 2019; pp. 78–83. [Google Scholar]
  95. Huang, W.; Peng, X.; Shi, Z.; Ma, Y. Adversarial Attack against LSTM-based DDoS Intrusion Detection System. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 686–693. [Google Scholar]
  96. Ogawa, Y.; Kimura, T.; Cheng, J. Vulnerability Assessment for Machine Learning Based Network Anomaly Detection System. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, 28–30 September 2020; pp. 1–2. [Google Scholar]
  97. Chen, J.; Gao, X.; Deng, R.; He, Y.; Fang, C.; Cheng, P. Generating Adversarial Examples against Machine Learning based Intrusion Detector in Industrial Control Systems. IEEE Trans. Dependable Secur. Comput. 2020; PrePrints. [Google Scholar] [CrossRef]
  98. Demetrio, L.; Biggio, B.; Lagorio, G.; Roli, F.; Armando, A. Functionality-preserving black-box optimization of adversarial windows malware. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3469–3478. [Google Scholar] [CrossRef]
  99. Labaca-Castro, R.; Muñoz-González, L.; Pendlebury, F.; Rodosek, G.D.; Pierazzi, F.; Cavallaro, L. Universal Adversarial Perturbations for Malware. arXiv 2021, arXiv:2102.06747. [Google Scholar]
  100. Cai, F.; Li, J.; Koutsoukos, X. Detecting adversarial examples in learning-enabled cyber-physical systems using variational autoencoder for regression. In Proceedings of the 2020 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 21 May 2020; pp. 208–214. [Google Scholar]
  101. Li, J.; Yang, Y.; Sun, J.S.; Tomsovic, K.; Qi, H. Conaml: Constrained adversarial machine learning for cyber-physical systems. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, Hong Kong, China, 7–11 June 2021; pp. 52–66. [Google Scholar]
  102. Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; McDaniel, P. Ensemble adversarial training: Attacks and defenses. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  103. Kurakin, A.; Goodfellow, I.; Bengio, S.; Dong, Y.; Liao, F.; Liang, M.; Pang, T.; Zhu, J.; Hu, X.; Xie, C.; et al. Adversarial attacks and defences competition. In The NIPS’17 Competition: Building Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2018; pp. 195–231. [Google Scholar]
  104. Sharif, M.; Bhagavatula, S.; Bauer, L.; Reiter, M.K. A General Framework for Adversarial Examples with Objectives. ACM Trans. Priv. Secur. 2019, 22. [Google Scholar] [CrossRef]
  105. Biggio, B.; Roli, F. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognit. 2018, 84, 317–331. [Google Scholar] [CrossRef] [Green Version]
  106. Gilmer, J.; Adams, R.P.; Goodfellow, I.; Andersen, D.; Dahl, G.E. Motivating the rules of the game for adversarial example research. arXiv 2018, arXiv:1807.06732. [Google Scholar]
  107. Moosavi-Dezfooli, S.M.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1765–1773. [Google Scholar]
  108. Chernikova, A.; Oprea, A. Fence: Feasible evasion attacks on neural networks in constrained environments. arXiv 2019, arXiv:1909.10480. [Google Scholar]
  109. Apruzzese, G.; Colajanni, M.; Marchetti, M. Evaluating the effectiveness of adversarial attacks against botnet detectors. In Proceedings of the 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, 26–28 September 2019; pp. 1–8. [Google Scholar]
110. Hashemi, M.J.; Cusack, G.; Keller, E. Towards evaluation of NIDSs in adversarial setting. In Proceedings of the 3rd ACM CoNEXT Workshop on Big Data, Machine Learning and Artificial Intelligence for Data Communication Networks, Orlando, FL, USA, 9 December 2019; pp. 14–21. [Google Scholar]
  111. Papadopoulos, P.; Essen, O.T.v.; Pitropakis, N.; Chrysoulas, C.; Mylonas, A.; Buchanan, W.J. Launching Adversarial Attacks against Network Intrusion Detection Systems for IoT. J. Cybersecur. Priv. 2021, 1, 14. [Google Scholar] [CrossRef]
  112. Anthi, E.; Williams, L.; Rhode, M.; Burnap, P.; Wedgbury, A. Adversarial attacks on machine learning cybersecurity defences in industrial control systems. J. Inf. Secur. Appl. 2021, 58, 102717. [Google Scholar] [CrossRef]
  113. Han, D.; Wang, Z.; Zhong, Y.; Chen, W.; Yang, J.; Lu, S.; Shi, X.; Yin, X. Evaluating and Improving Adversarial Robustness of Machine Learning-Based Network Intrusion Detectors. IEEE J. Sel. Areas Commun. 2021, 39, 2632–2647. [Google Scholar] [CrossRef]
  114. Duy, P.T.; Khoa, N.H.; Nguyen, A.G.T.; Pham, V.H. DIGFuPAS: Deceive IDS with GAN and Function-Preserving on Adversarial Samples in SDN-enabled networks. Comput. Secur. 2021, 109, 102367. [Google Scholar] [CrossRef]
  115. Alhajjar, E.; Maxwell, P.; Bastian, N. Adversarial machine learning in network intrusion detection systems. Expert Syst. Appl. 2021, 186, 115782. [Google Scholar] [CrossRef]
116. Han, D.; Wang, Z.; Zhong, Y.; Chen, W.; Yang, J.; Lu, S.; Shi, X.; Yin, X. Practical Traffic-Space Adversarial Attacks on Learning-Based NIDSs. arXiv 2020, arXiv:2005.07519. [Google Scholar]
  117. Wang, J.; Pan, J.; AlQerm, I.; Liu, Y. Def-IDS: An Ensemble Defense Mechanism Against Adversarial Attacks for Deep Learning-based Network Intrusion Detection. In Proceedings of the 2021 International Conference on Computer Communications and Networks (ICCCN), Athens, Greece, 19–22 July 2021; pp. 1–9. [Google Scholar]
  118. Ali, H.; Khan, M.S.; AlGhadhban, A.; Alazmi, M.; Alzamil, A.; Al-utaibi, K.; Qadir, J. Analyzing the Robustness of Fake-news Detectors under Black-box Adversarial Attacks. IEEE Access 2021, 9, 81678–81692. [Google Scholar] [CrossRef]
  119. Chenou, J.; Hsieh, G.; Fields, T. Radial Basis Function Network: Its Robustness and Ability to Mitigate Adversarial Examples. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 5–7 December 2019; pp. 102–106. [Google Scholar]
  120. Wei, W.; Liu, L.; Loper, M.; Truex, S.; Yu, L.; Gursoy, M.E.; Wu, Y. Adversarial examples in deep learning: Characterization and divergence. arXiv 2018, arXiv:1807.00051. [Google Scholar]
121. Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M.K.; Ristenpart, T. Stealing machine learning models via prediction APIs. In Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), Austin, TX, USA, 10–12 August 2016; pp. 601–618. [Google Scholar]
  122. Lillicrap, T.P.; Cownden, D.; Tweed, D.B.; Akerman, C.J. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 2016, 7, 1–10. [Google Scholar] [CrossRef]
  123. Graese, A.; Rozsa, A.; Boult, T.E. Assessing Threat of Adversarial Examples on Deep Neural Networks. In Proceedings of the 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 18–20 December 2016; pp. 69–74. [Google Scholar] [CrossRef] [Green Version]
  124. Grosse, K.; Manoharan, P.; Papernot, N.; Backes, M.; McDaniel, P. On the (statistical) detection of adversarial examples. arXiv 2017, arXiv:1702.06280. [Google Scholar]
  125. Metzen, J.H.; Genewein, T.; Fischer, V.; Bischoff, B. On detecting adversarial perturbations. arXiv 2017, arXiv:1702.04267. [Google Scholar]
  126. Feinman, R.; Curtin, R.R.; Shintre, S.; Gardner, A.B. Detecting adversarial samples from artifacts. arXiv 2017, arXiv:1703.00410. [Google Scholar]
  127. Meng, D.; Chen, H. Magnet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 135–147. [Google Scholar]
  128. Xu, W.; Evans, D.; Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv 2017, arXiv:1704.01155. [Google Scholar]
  129. Lee, H.; Han, S.; Lee, J. Generative adversarial trainer: Defense to adversarial perturbations with gan. arXiv 2017, arXiv:1705.03387. [Google Scholar]
  130. Samangouei, P.; Kabkab, M.; Chellappa, R. Defense-gan: Protecting classifiers against adversarial attacks using generative models. arXiv 2018, arXiv:1805.06605. [Google Scholar]
  131. Dhillon, G.S.; Azizzadenesheli, K.; Lipton, Z.C.; Bernstein, J.; Kossaifi, J.; Khanna, A.; Anandkumar, A. Stochastic activation pruning for robust adversarial defense. arXiv 2018, arXiv:1803.01442. [Google Scholar]
  132. Tian, Y.; Pei, K.; Jana, S.; Ray, B. Deeptest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May–3 June 2018; pp. 303–314. [Google Scholar]
  133. Zhang, M.; Zhang, Y.; Zhang, L.; Liu, C.; Khurshid, S. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In Proceedings of the 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), Montpellier, France, 3–7 September 2018; pp. 132–142. [Google Scholar]
  134. Wang, S.; Wang, X.; Zhao, P.; Wen, W.; Kaeli, D.; Chin, P.; Lin, X. Defensive dropout for hardening deep neural networks under adversarial attacks. In Proceedings of the International Conference on Computer-Aided Design, San Diego, CA, USA, 5–8 November 2018; pp. 1–8. [Google Scholar]
  135. Amer, M.; Maul, T. Weight Map Layer for Noise and Adversarial Attack Robustness. arXiv 2019, arXiv:1905.00568. [Google Scholar]
  136. Rosenberg, I.; Shabtai, A.; Elovici, Y.; Rokach, L. Defense methods against adversarial examples for recurrent neural networks. arXiv 2019, arXiv:1901.09963. [Google Scholar]
  137. Apruzzese, G.; Andreolini, M.; Marchetti, M.; Venturi, A.; Colajanni, M. Deep reinforcement adversarial learning against botnet evasion attacks. IEEE Trans. Netw. Serv. Manag. 2020, 17, 1975–1987. [Google Scholar] [CrossRef]
138. Apruzzese, G.; Colajanni, M.; Ferretti, L.; Marchetti, M. Addressing adversarial attacks against security systems based on machine learning. In Proceedings of the 2019 11th International Conference on Cyber Conflict (CyCon), Tallinn, Estonia, 28–31 May 2019; Volume 900, pp. 1–18. [Google Scholar]
  139. Zhou, Y.; Kantarcioglu, M.; Xi, B. A survey of game theoretic approach for adversarial machine learning. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1259. [Google Scholar] [CrossRef]
  140. Apruzzese, G.; Andreolini, M.; Colajanni, M.; Marchetti, M. Hardening random forest cyber detectors against adversarial attacks. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 4, 427–439. [Google Scholar] [CrossRef]
  141. Zhang, C.; Tang, Z.; Zuo, Y.; Li, K.; Li, K. A robust generative classifier against transfer attacks based on variational auto-encoders. Inf. Sci. 2021, 550, 57–70. [Google Scholar] [CrossRef]
  142. Wang, N.; Chen, Y.; Hu, Y.; Lou, W.; Hou, Y.T. MANDA: On Adversarial Example Detection for Network Intrusion Detection System. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10. [Google Scholar]
143. Song, D.; Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Tramer, F.; Prakash, A.; Kohno, T. Physical adversarial examples for object detectors. In Proceedings of the 12th USENIX Workshop on Offensive Technologies (WOOT 18), Baltimore, MD, USA, 13–14 August 2018. [Google Scholar]
  144. Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning; PMLR: New York, NY, USA, 2016; pp. 1050–1059. [Google Scholar]
  145. Carlini, N.; Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; pp. 3–14. [Google Scholar]
  146. Pei, K.; Cao, Y.; Yang, J.; Jana, S. Deepxplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, 28–31 October 2017; pp. 1–18. [Google Scholar]
  147. Biggio, B.; Fumera, G.; Roli, F. Adversarial pattern classification using multiple classifiers and randomisation. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR); Springer: Berlin/Heidelberg, Germany, 2008; pp. 500–509. [Google Scholar]
  148. Carlini, N. A Complete List of All (Arxiv) Adversarial Example Papers. Available online: https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html (accessed on 30 January 2022).
  149. Sculley, D.; Holt, G.; Golovin, D.; Davydov, E.; Phillips, T.; Ebner, D.; Chaudhary, V.; Young, M.; Crespo, J.F.; Dennison, D. Hidden technical debt in machine learning systems. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2503–2511. [Google Scholar]
  150. Shu, D.; Leslie, N.O.; Kamhoua, C.A.; Tucker, C.S. Generative adversarial attacks against intrusion detection systems using active learning. In Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning, Linz, Austria, 13 July 2020; pp. 1–6. [Google Scholar]
Figure 1. k-fold cross validation.
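Figure 1 illustrates k-fold cross validation [57]: the data are partitioned into k folds, and each fold is held out once for evaluation while the model is trained on the remaining k-1 folds. As a minimal sketch, assuming a scikit-learn environment and using synthetic placeholders in place of a real intrusion detection corpus:

```python
# Minimal k-fold cross-validation sketch; X and y are synthetic placeholders for a
# tabular intrusion detection dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.random((1000, 20))            # placeholder features
y = rng.integers(0, 2, size=1000)     # placeholder labels (0 = benign, 1 = attack)

scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])      # train on k-1 folds
    preds = clf.predict(X[test_idx])         # evaluate on the held-out fold
    scores.append(f1_score(y[test_idx], preds))

print(f"Mean F1 over {len(scores)} folds: {np.mean(scores):.3f}")
```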
Figure 2. End-to-end pipeline for network intrusion detection system.
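Figure 2 shows an end-to-end pipeline for a network intrusion detection system, from traffic capture and flow/feature extraction (for example with a tool such as CICFlowMeter [69]) through preprocessing to classification. The sketch below covers only the modelling stages and assumes flow features have already been exported to a CSV file; the file name "flows.csv" and the "Label" column are hypothetical placeholders.

```python
# Illustrative NIDS modelling pipeline (scaling + classifier) over pre-extracted flow features.
# "flows.csv" and the "Label" column are hypothetical placeholders for a flow-feature export.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("flows.csv")
X = df.drop(columns=["Label"]).select_dtypes("number")   # numeric flow features only
y = (df["Label"] != "BENIGN").astype(int)                # binary label: attack vs. benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),               # normalise flow features
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```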
Figure 3. Common adversarial machine learning attacks.
Figure 4. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).
Figure 5. Common machine learning tasks in cyber security.
Figure 6. Common defence types against adversarial machine learning.
Table 1. Datasets used in the literature.

Work | Dataset | Network | Year | Attack Categories
[32] | KDD Cup 99 | Traditional | 1999 | DoS, Probe, User to Root, Remote to User
[38] | NSL-KDD | Traditional | 2009 | DoS, Probe, User to Root, Remote to User
[39] | DARPA | Traditional | 2009 | DDoS, Malware, Spambots, Scans, Phishing
[40] | CTU-13 | Traditional | 2011 | Botnet
[41] | Kyoto | Traditional | 2015 | Botnet
[42] | UNSW-NB15 | Traditional | 2016 | Backdoors, Fuzzers, DoS, Generic, Shellcode, Reconnaissance, Worms, Exploits, Analysis
[43] | WSN-DS | Wireless | 2016 | Greyhole, Blackhole, Scheduling, Flooding
[44] | SDN Traffic | SDN | 2016 | DDoS
[45] | CICIDS2017 | Traditional | 2017 | DoS, DDoS, SSH-Patator, Web, PortScan, FTP-Patator, Bot
[46] | Mirai | IoT | 2017 | Botnet
[45] | CICIDS2018 | Traditional | 2018 | Bruteforce Web, DoS, DDoS, Botnet, Infiltration
[45] | CICDDoS2019 | Traditional | 2019 | DDoS
[47] | Bot-IoT | IoT | 2018 | DDoS, DoS, OS Service Scan, Keylogging, Data Exfiltration
[48] | Kitsune | IoT | 2018 | Recon, Man-in-the-Middle, DoS, Botnet Malware
[49] | IEEE BigData Cup | Traditional | 2019 | N/A
[50] | HIKARI-2021 | IoT | 2021 | Brute force, Brute force (XMLRPC), Vulnerability probing, Synthetic Traffic
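Many of the datasets in Table 1 are heavily imbalanced towards benign traffic, which is one motivation for oversampling techniques such as SMOTE [54] and for the broader class-imbalance concerns surveyed in [55]. A hedged sketch for inspecting and rebalancing such a dataset, assuming the imbalanced-learn package is available and using a placeholder file name and label column:

```python
# Inspect class balance and apply SMOTE oversampling; "ids_dataset.csv" and the "Label"
# column are hypothetical placeholders for any CSV-based dataset listed in Table 1.
import pandas as pd
from imblearn.over_sampling import SMOTE

df = pd.read_csv("ids_dataset.csv")
X = df.drop(columns=["Label"]).select_dtypes("number")     # numeric features only
y = df["Label"]

print(y.value_counts(normalize=True).round(4))              # reveals the class imbalance

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)     # synthetic minority oversampling
print(pd.Series(y_res).value_counts(normalize=True).round(4))  # classes are now balanced
```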
Table 2. Robustness metrics.

Work | Metric | Advantages | Disadvantages
N/A | F1-Score | Commonly used by researchers. | Biased by the majority class.
[59] | CLEVER | Attack-agnostic and computationally feasible. | Less suited to black-box attacks and to settings where gradient masking occurs [60]; however, extensions to CLEVER help mitigate these scenarios [61].
[62] | Empirical robustness | Suitable for very deep neural networks and large datasets. | N/A
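Because the F1-score can be biased by the majority class, a detector that simply predicts the dominant (benign) class can appear strong on accuracy, or on class-weighted averages, while failing entirely on the minority attack class. A brief illustration with synthetic labels and scikit-learn metrics:

```python
# Why accuracy and averaged F1 must be read with class imbalance in mind (synthetic example).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 950 + [1] * 50)   # 95% benign, 5% attack
y_pred = np.zeros_like(y_true)            # a "detector" that always predicts benign

print("Accuracy:             ", accuracy_score(y_true, y_pred))                           # 0.95
print("F1 (attack class):    ", f1_score(y_true, y_pred, pos_label=1, zero_division=0))   # 0.0
print("F1 (weighted average):", f1_score(y_true, y_pred, average="weighted"))             # inflated by the majority class
```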
Table 3. Libraries for generating adversarial examples.

Work | Library Name | Year | Advantages | Disadvantages
[65] | CleverHans | 2016 | Recently updated to v4.0.0; well used by the community. MIT license. | Can be complicated to configure.
[66] | Foolbox | 2017 | Fast generation of adversarial examples. MIT license. | Large number of open issues.
[67] | Adversarial Robustness Toolbox | 2018 | Well maintained and supported. Supports most widely used machine learning frameworks. Extensive attacks and model robustness tools are supported. | Does not support all models.
[68] | AdverTorch | 2019 | GNU Lesser General Public License. | Few active contributors.
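As a rough indication of how these libraries are used in practice, the following sketch crafts FGSM adversarial examples with the Adversarial Robustness Toolbox [67] against a scikit-learn classifier. The data are a synthetic placeholder, and the module paths reflect the ART 1.x layout, so they should be checked against the installed version.

```python
# Hedged sketch: FGSM adversarial examples via the Adversarial Robustness Toolbox (ART 1.x).
import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

rng = np.random.default_rng(0)
X = rng.random((500, 20))                          # placeholder features scaled to [0, 1]
y = (X @ rng.normal(size=20) > 0).astype(int)      # placeholder labels with learnable structure

model = LogisticRegression(max_iter=1000).fit(X, y)
classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))

attack = FastGradientMethod(estimator=classifier, eps=0.1)
X_adv = attack.generate(x=X)

print("Clean accuracy:      ", model.score(X, y))
print("Adversarial accuracy:", model.score(X_adv, y))
```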
Table 4. Topics and associated search terms used in this survey.

Topic | Search Query
Cyber security/intrusion detection | (“cyber security” OR “intrusion detection” OR “IDS”)
Adversarial machine learning attacks and defences | (“adversarial machine learning” OR “machine learning” OR “adversarial example”) AND (“attack” OR “defence”)
Robustness/functionality preservation | ((“robustness” OR “generalization error” OR “accuracy” OR “f1score” OR “f-score” OR “TPR” OR “FPR”) OR ((“functionality” OR “payload”) AND “preservation”))
Table 5. Chronologically ordered summary of adversarial example attacks. The full table additionally classifies each attack by Type (AE, Sequence of AEs, Transferability), Objective (Targeted, Untargeted), Domain (Visual, Cybersecurity, Text), Model (MLP, CNN, RNN), Knowledge (White-Box, Black-Box, Gray-Box), and Constraint (Box, Sparse, Func-Preserving).

Work | Year | Attack
[17] | 2014 | L-BFGS
[77] | 2013 | Gradient Descent
[78] | 2016 | Adversarial Sequences
[79] | 2016 | JSMA
[62] | 2016 | DeepFool
[80] | 2017 | AddSent, AddOneSent
[81] | 2018 | GAN
[82] | 2017 | Enchanting Attack
[82] | 2017 | Strategic Attack
[64] | 2017 | C&W (L0, L2, L∞)
[83] | 2017 | FGSM, JSMA
[84] | 2018 | Generative RNN
[85] | 2018 | NPBO
[86] | 2018 | GADGET
[87] | 2018 | JSMA, FGSM, DeepFool, C&W
[88] | 2018 | FGSM
[89] | 2018 | IDSGAN
[90] | 2018 | ZOO, GAN
[91] | 2019 | One Pixel Attack
[92] | 2019 | Manifold Approximation
[93] | 2019 | FGSM, BIM, PGD
[94] | 2019 | GAN Attack
[95] | 2020 | PWPSA
[95] | 2020 | GA
[96] | 2020 | One Pixel Attack
[97] | 2020 | Opt Attack, GAN Attack
[98] | 2021 | GAMMA
[99] | 2021 | UAP
[100] | 2020 | Variational Autoencoder
[101] | 2021 | Best-Effort Search
Table 6. Computational complexity of common adversarial example algorithms.

Method | Computational Complexity | Success Rate
L-BFGS | High | High
FGSM | Low | Low
JSMA | High | High
DeepFool | Low | Low
One-pixel | Low | Low
C&W Attack | High | High
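The complexity differences in Table 6 largely reflect how many gradient evaluations each method requires: FGSM takes a single signed gradient step, whereas iterative methods such as BIM/PGD and optimisation-based attacks such as C&W repeat gradient computations many times per example. A hedged, framework-free sketch of the single-step versus iterative pattern, using a hand-written logistic regression model and synthetic data:

```python
# Single-step FGSM vs. iterative BIM on a hand-written logistic regression model.
# Everything here is synthetic and illustrative; gradients are for the binary cross-entropy loss.
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=20), 0.1           # fixed "trained" model parameters
x = rng.random(20)                        # one sample with features in [0, 1]
y = 0                                     # its true label

def loss_grad_wrt_x(x, y):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))    # sigmoid output
    return (p - y) * w                         # d(cross-entropy)/dx

eps = 0.1

# FGSM: one gradient evaluation, one signed step.
x_fgsm = np.clip(x + eps * np.sign(loss_grad_wrt_x(x, y)), 0.0, 1.0)

# BIM: many small signed steps, each needing a fresh gradient (hence the higher cost).
x_bim, alpha = x.copy(), eps / 10
for _ in range(10):
    x_bim = x_bim + alpha * np.sign(loss_grad_wrt_x(x_bim, y))
    x_bim = np.clip(np.clip(x_bim, x - eps, x + eps), 0.0, 1.0)   # stay in the eps-ball and valid range

print("L-inf perturbation (FGSM):", np.max(np.abs(x_fgsm - x)))
print("L-inf perturbation (BIM): ", np.max(np.abs(x_bim - x)))
```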
Table 7. Functionality-preservation in cybersecurity and intrusion detection.

Work | Year | Domain | Generation Method | Realistic Constraints | Findings
[53] | 2019 | Malware | Gradient-based | Minimal content additions/modifications | Experiments showed that gradient information can be used to find optimal sequences of transformations without rendering the malware sample corrupt.
[94] | 2019 | IDS | GAN | Preserve functionality | The proposed adversarial attack successfully evades the IDS while preserving functional behavior and network traffic features.
[108] | 2019 | IDS | Gradient-based | Respects mathematical dependencies and domain constraints | Evasion attacks achieved by inserting around a dozen network connections.
[109] | 2019 | IDS | Random modification of four features: flow duration, sent bytes, received bytes, exchanged packets | Retains internal logic | Feature removal is an insufficient defense against functionality-preserving attacks, which are possible by modifying very few features.
[110] | 2019 | IDS | Legitimate transformations: split, delay, inject | Packets must retain malicious intent; transformations must obey the underlying protocols | Detection rate dropped by up to 70% for packet-level features and up to 68% for flow-level features.
[56] | 2020 | IDS (flows) | Jacobian-based method (JSMA) | Obeys TCP/IP constraints | Biased distributions with low dimensionality enable constrained adversarial examples; even when constrained to five random features, around 50% of adversarial examples succeed.
[95] | 2020 | IDS (packets) | Valid packet | Minimal modification/insertion of packets | Experimental results show powerful and effective functionality-preserving attacks; more accurate models are more susceptible to adversarial examples.
[98] | 2021 | Malware | Injection of unexecuted benign content | Minimal injected content | The section-injection attack can decrease the detection rate; the analysis highlights that commercial products can be evaded via transfer attacks.
[101] | 2021 | CPS | Best-effort search | Real-world linear inequality constraints | Best-effort search algorithms effectively generate adversarial examples meeting linear constraints; constrained adversarial examples significantly decrease detection accuracy.
[111] | 2021 | IDS | FGSM | Minimal perturbation of each feature | Functionality preservation is not reported and is less likely because all features are perturbed.
[112] | 2021 | CPS/ICS | JSMA | Minimal number of perturbed features | Functionality preservation is not reported but is more likely because relatively few features are perturbed.
[113] | 2021 | IDS | PSO-based mutation | Original traffic retained and packet order unchanged | Measured attack effect, malicious behavior, and attack efficiency.
[114] | 2021 | IDS | GAN | Preserves the functional features of attack traffic | The F1 score dropped from around 99% to zero under DIGFuPAS adversarial examples.
[115] | 2021 | IDS | PSO/GA/GAN | Only modifies features for which network functionality is retained | It is unrealistic to assume an adversary can alter all traffic features; perturbations are constrained to features that do not break functionality.
[116] | 2020 | IDS | GAN/PSO | Original traffic and packet order retained | Detection performance and robustness should both be considered in feature extraction systems.
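A recurring pattern across the works in Table 7 is that perturbations are restricted to a subset of mutable features and clipped to valid value ranges so that the underlying attack traffic or binary still functions. The sketch below shows that masking-and-clipping pattern in isolation; the feature names, bounds, and perturbation values are purely illustrative and are not taken from any of the cited works.

```python
# Hedged sketch of a functionality-preserving perturbation: only selected flow features are
# modified, and values are clipped to plausible ranges; names and bounds are illustrative only.
import numpy as np

features = ["flow_duration", "fwd_bytes", "bwd_bytes", "dst_port", "flag_count"]
x = np.array([12.5, 4096.0, 1024.0, 443.0, 3.0])       # one malicious flow (placeholder values)

mutable = np.array([1, 1, 1, 0, 0], dtype=bool)         # e.g., never touch dst_port or flag_count
lower = np.array([0.0, 64.0, 0.0, 0.0, 0.0])            # per-feature validity bounds
upper = np.array([3600.0, 1e7, 1e7, 65535.0, 32.0])

def constrained_perturb(x, delta):
    """Apply a candidate perturbation only to mutable features, then clip to valid ranges."""
    x_adv = x.copy()
    x_adv[mutable] += delta[mutable]
    return np.clip(x_adv, lower, upper)

delta = np.array([5.0, 2048.0, -512.0, 100.0, 10.0])     # candidate perturbation from any attack
x_adv = constrained_perturb(x, delta)
print(dict(zip(features, x_adv)))                        # dst_port and flag_count are unchanged
```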
Table 8. Chronologically ordered summary of defenses against adversarial examples. The full table additionally classifies each defense by Type (Pre-Process, Detection, Adv-Training, Testing, Architectural, Distillation, Ensemble, Game Theory), Domain (Visual, Cybersecurity, Text), and Model (MLP, CNN, RNN, RF).

Work | Year | Defense
[63] | 2014 | Adversarial Training
[75] | 2016 | Distillation as a Defense
[122] | 2016 | Feedback Alignment
[123] | 2016 | Assessing Threat
[124] | 2017 | Statistical Test
[125] | 2017 | Detector Subnetwork
[126] | 2017 | Artifacts
[127] | 2017 | MagNet
[128] | 2017 | Feature Squeezing
[129] | 2017 | GAT (Generative Adversarial Trainer)
[102] | 2018 | EAT (Ensemble Adversarial Training)
[130] | 2018 | Defense-GAN
[103] | 2018 | Assessing Threat
[131] | 2018 | Stochastic Activation Pruning
[132] | 2018 | DeepTest
[133] | 2018 | DeepRoad
[134] | 2018 | Defensive Dropout
[117] | 2021 | Def-IDS
[105] | 2018 | Multi-Classifier System
[135] | 2019 | Weight Map Layers
[136] | 2019 | Sequence Squeezing
[109] | 2019 | Feature Removal
[137] | 2020 | Adversarial Training
[138] | 2020 | Adversarial Training
[139] | 2019 | Game Theory
[140] | 2020 | Hardening
[141] | 2021 | Variational Auto-encoder
[142] | 2021 | MANDA
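Adversarial training [63], one of the most common defences in Table 8, augments training with adversarial examples crafted against the current model so that the decision boundary is pushed away from them. A hedged, self-contained sketch using FGSM and a hand-written logistic regression (synthetic data, NumPy only):

```python
# Hedged sketch of adversarial training: each update mixes clean samples with FGSM-perturbed
# versions crafted against the current model (hand-written logistic regression, NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > np.median(X @ true_w)).astype(float)   # synthetic but learnable labels

w, b, lr, eps = np.zeros(20), 0.0, 0.5, 0.05

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(300):
    # Craft FGSM examples against the current parameters (gradient of the loss w.r.t. inputs).
    p = sigmoid(X @ w + b)
    X_adv = np.clip(X + eps * np.sign((p - y)[:, None] * w[None, :]), 0.0, 1.0)

    # One gradient step on the union of clean and adversarial examples.
    X_mix, y_mix = np.vstack([X, X_adv]), np.concatenate([y, y])
    p_mix = sigmoid(X_mix @ w + b)
    w -= lr * X_mix.T @ (p_mix - y_mix) / len(y_mix)
    b -= lr * np.mean(p_mix - y_mix)

clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y > 0.5))
print("Clean accuracy after adversarial training:", round(float(clean_acc), 3))
```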
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
