Article

Adversarial Attack for Deep Steganography Based on Surrogate Training and Knowledge Diffusion

1 School of Cyberspace Security, Hainan University, Haikou 570228, China
2 Key Laboratory of Internet Information Retrieval of Hainan Province, Hainan University, Haikou 570228, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6588; https://doi.org/10.3390/app13116588
Submission received: 25 April 2023 / Revised: 24 May 2023 / Accepted: 25 May 2023 / Published: 29 May 2023

Abstract:
Deep steganography (DS), using neural networks to hide one image in another, has performed well in terms of invisibility, embedding capacity, etc. Current steganalysis methods for DS can only detect or remove secret images hidden in natural images and cannot analyze or modify secret content. Our technique is the first approach to not only effectively prevent covert communications using DS, but also analyze and modify the content of covert communications. We proposed a novel adversarial attack method for DS considering both white-box and black-box scenarios. For the white-box attack, several novel loss functions were applied to construct a gradient- and optimizer-based adversarial attack that could delete and modify secret images. As a more realistic case, a black-box method was proposed based on surrogate training and a knowledge distillation technique. All methods were tested on the Tiny ImageNet and MS COCO datasets. The experimental results showed that the proposed attack method could completely remove or even modify the secret image in the container image while maintaining the latter’s high quality. More importantly, the proposed adversarial attack method can also be regarded as a new DS approach.

1. Introduction

Steganography is the art of covered or hidden writing. Modern steganography is an effective method of embedding confidential information in digital media for end-to-end covert communication [1,2]. Carriers, e.g., text, video, voice, and digital images, are used to carry secret information, with digital images being the most commonly employed. The carrier image, also known as the cover image, can be embedded with secret information; the resulting image is generally called a container image. Image steganography systems have three fundamental properties, namely, imperceptibility, security, and the capacity to hide information [3]. Imperceptibility has the highest priority of these three attributes, as the secret information embedded in the image should not be perceivable by the human eye or statistical tools [4]. Security means that the embedded secret information cannot be detected by methods such as statistics or removed after detection. The goal of an efficient steganographic system is to increase the payload capacity, measured in bits per pixel (bpp) [5], as much as possible while ensuring invisibility and security. Steganography is easily used for covert communication between spies, criminals, and members of terrorist organizations [6]. In general, both communicating parties embed confidential information into pictures, such as remote-sensing pictures, commercial source code, malware, and action plans, and transmit them through open channels so that they are not easily discovered by third parties.
As a method for combatting steganography, steganalysis aims to detect or remove secret information contained in container images and can be divided into passive and active methods [7,8]. Passive steganalysis algorithms only analyze whether the image samples contain secret information [7]. Active steganalysis algorithms, also known as steganographic firewalls, attempt to remove secret information within container images [8], while minimizing the impact on the quality of the container image [9].
Recently, deep learning has proven to be promising in computer vision. Deep-learning-based image steganography, built mostly on deep neural networks, has demonstrated promising performance in terms of embedding capacity compared to traditional methods [10]. In addition to hiding one or more images in another image of the same size [11,12,13,14], deep neural networks can also hide binary information in an image [15,16], and the embedded binary information can be transmitted in light fields [17,18,19]. Generally, the backbone network of image-to-image deep-learning-based models is an autoencoder, which is trained in an end-to-end manner. When the network is well-trained, the sender can encode a secret image into a container image of the same size through the encoder. The receiver can use the decoder provided by the sender to recover the secret image from the container image. Hayes et al. [20] showed that deep-learning-based steganography is difficult to detect using traditional passive attack methods.
Despite benefiting from deep learning techniques, deep steganography (DS) also introduces adversarial examples, which could be vulnerable to adversarial attacks. Furthermore, DS and adversarial attacks are naturally connected [21]. We attempted to use adversarial attack technology to achieve attack capabilities that are difficult to reach through traditional steganalysis methods.
To the best of our knowledge, current steganalysis methods for DS are limited to the elimination of secret images and cannot modify these images or analyze their content [22,23,24]. An attack method for deep steganography is proposed that takes advantage of the fact that deep learning models are vulnerable to adversarial attacks [25,26]. DS secret image removal, modification, and content analysis were achieved using the proposed method. The main contributions of this paper are as follows:
  • To the best of our knowledge, we are the first group to apply the white-box attack method to deep steganographic models by constructing an adversarial perturbation. Attackers can remove or modify secret images by superimposing imperceptible adversarial perturbations onto container images (Figure 1).
  • A deep steganalysis threat model in a black-box scenario is also proposed based on surrogate training and knowledge distillation to extend the above white-box attack method to the black-box environment, thus achieving the removal, modification, and content analysis of secret images in a black-box environment.
  • In general, current DS methods are based on encoder-decoder networks, and the container image is obtained through the encoding network [11,12,13,14,15,16,17,18,19]. We are the first group to propose a novel DS scheme based on adversarial attacks instead of encoding networks.

2. Related Works

2.1. Deep Steganography

Baluja et al. [11] first proposed an end-to-end DS model based on convolutional networks for hiding an RGB image in another RGB image of the same size, achieving a higher concealment capacity of 24 bpp as compared to traditional methods (less than 0.5 bpp) [27]. Their method adopted looser constraints without the need to decode the secret information perfectly and further reduced the difference between the reconstructed image and the secret image while keeping the difference between the container image and the cover image as small as possible. Zhu et al. [15] proposed HiDDeN, a network for hiding binary information in images, which introduced noise layers to enhance the robustness of the model for both steganography and image watermarking. HiDDeN also improved the anti-steganalysis capability by introducing a steganalysis network as an adversary. However, the bit error rate of the reconstructed binary information was still high, and the types of noise layers were limited; thus, the network struggled to deal with unknown noises. Luo et al. [16] reduced the bit error rate of reconstructed information by introducing message coding instead of directly embedding binary messages. Additionally, they adopted adversarial networks to generate distortion in order to achieve robustness without modeling the distortion. Qin et al. [28] proposed a method that used CNNs and GANs to achieve coverless steganography. Shang et al. [29] proposed a DS scheme and improved the security of the algorithm through adversarial example techniques. Zhu et al. [30] proposed a new image-hiding convolutional neural network based on a residual network and pixel shuffle combined with image encryption. Wang et al. [13] first introduced the Transformer into image steganography and achieved higher image quality. Inspired by universal adversarial examples [31], Zhang et al. [19] proposed a universal deep hiding model (UDH) to explore the generation of cover-independent perturbations in order to hide a secret image in different unknown cover images. UDH can be adapted for the copyright protection of videos. In addition to steganography on digital images, printed and digitally displayed photos can also be used to hide imperceptible digital data. Deep photographic steganography [17] and light field messaging (LFM) [18] are robust to the image perturbations introduced by physical printing and photography. Chen et al. [32] proposed a low-frequency image DS method to improve the robustness of the model. Yin et al. [33] proposed an image DS method with a separable fine-tuning network structure, which solved the problem of precision loss introduced by rounding the stego matrix. Pan et al. [34] proposed a deep-reinforcement-learning-based image DS method that adaptively hides secrets locally. Wang et al. [35] proposed a DS technology based on a capsule network that can carry additional data during steganography.

2.2. Deep Steganalysis

Jung et al. [22] proposed an active steganalysis method (PixelSteganalysis) for deep learning steganography by training an analyzer to detect suspicious areas and modifying the pixels of the area with an eraser. In fact, if some pixel blocks of the container image are removed, the corresponding area in the secret image will also be removed in DS. Xiang et al. [23] proposed a secret image removal attack method based on pixel block removal and restoration. Inspired by adversarial examples, Zhong et al. [24] regarded steganographic information as an adversarial perturbation and proposed a deep steganographic document image removal attack method based on a denoising autoencoder [36,37]. Although the above methods were robust for removing secret images to a certain extent, they were unsuited to more complex attack scenarios, e.g., modifying or analyzing the content of a secret image.

2.3. White-Box Adversarial Attacks

Adversarial examples introduce visually imperceptible perturbations to the original image, resulting in an erroneous output from the deep model [25]. Szegedy et al. [25] first revealed the adversarial example problem of deep neural networks (DNNs). Ilyas et al. [38] noted that adversarial examples are useful features rather than bugs for classification models. According to the attacker's knowledge, methods for generating adversarial examples can be divided into white-box and black-box attacks [39]. White-box attacks assume that the adversary knows everything related to the target neural network model, including the training data, model architecture, hyperparameters, number of layers, activation functions, and model weights; representative examples are gradient-based algorithms (FGSM [26] and PGD [40]) and optimization-based algorithms (CW [41]). Several researchers have proposed adversarial patch attacks, which realize adversarial attacks on person detection and face recognition models by constraining the perturbation range and modeling distortion [42,43,44].

2.4. Black-Box Adversarial Attacks

Black-box attacks assume that the adversary has no knowledge about the target neural network model. The adversary, acting as a standard user, only knows the output of the model (label or confidence score) [39]. This assumption is common for attacks on online machine learning services, e.g., Google Cloud AI. In this setting, the attacker cannot construct an adversarial example by directly calculating the gradient through backpropagation as in the white-box setting [45]. To solve this problem, query-based black-box adversarial attacks were proposed [46,47,48]. The attacker must send a large number of queries to the target model to obtain the confidence score or category of the output and construct adversarial samples through gradient estimation [47,48] or random search [46]. However, in the case of deep steganalysis, the output of the target model cannot be obtained by querying. Transfer-based attacks use surrogate models to compute gradients indirectly without the need to query the target model [49,50,51] and attempt to generate highly transferable adversarial examples. Attacking an ensemble of surrogate models is a common way to improve the transferability of adversarial examples [52,53].

3. Adversarial Attack Method for Deep Steganography

3.1. Preliminaries

In general, end-to-end DS methods are mostly based on autoencoders. The encoder takes the secret image S and the cover image C simultaneously as input and outputs the container image C′. The decoder takes the container image C′ generated by the encoder as input, and its output is the revealed image S′. The encoder-decoder network is trained to minimize the differences between (C, S) and (C′, S′), as described in Equation (1).
min d(C, C′) + β · d(S, S′)
where d is the visual distance (e.g., mean square error (MSE), structural similarity index measure (SSIM), or a linear combination of both) between two images [11,12,14].
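To make Equation (1) concrete, the sketch below shows how this joint loss might be computed for one batch in PyTorch; the encoder/decoder modules, the input concatenation, and the value of β are illustrative assumptions rather than the exact configuration used in [11,12,14].

```python
import torch
import torch.nn.functional as nnf

def ds_training_loss(encoder, decoder, cover, secret, beta=0.75):
    """Joint loss of Equation (1): d(C, C') + beta * d(S, S'), with d taken as MSE.

    `encoder` and `decoder` are assumed to be nn.Module instances; images are
    float tensors of shape (B, 3, H, W) with values in [0, 1].
    """
    container = encoder(torch.cat([cover, secret], dim=1))  # C' = Enc(C, S)
    revealed = decoder(container)                           # S' = F(C')
    cover_loss = nnf.mse_loss(container, cover)             # d(C, C')
    secret_loss = nnf.mse_loss(revealed, secret)            # d(S, S')
    return cover_loss + beta * secret_loss
```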

3.2. Attack on the Container Image

In this setting, the attacker has knowledge of the parameters of the decoder F and is able to modify container images.

3.2.1. Adversarial Attack to Destroy the Secret Image

The purpose of active steganalysis is to destroy the secret image S hidden in the container image C′ without compromising the container image quality. For a given container image C′, the goal is to generate a small additive perturbation δ (‖δ‖∞ < ε), making it difficult for the receiver to recover the secret image from the modified container image C″ (C″ = C′ + δ). This can be achieved by maximizing the visual distance metric between the revealed images S′ and S″ (Equation (2)), where F is the decoder, F(C′) = S′, and F(C″) = S″.
max_{‖δ‖∞ ≤ ε} d(F(C′), F(C″))
Active steganalysis requires imperceptibility, that is, as little perturbation of the container image as possible [9]. The L∞ constraint ‖C″ − C′‖∞ ≤ ε is introduced to meet this requirement: the perturbation δ is projected back onto the ε-ball so that the pixel changes remain within the limited range. Using MSE as the visual distance metric, the objective function L_ut of the attack is as follows:
L_ut = mse(F(C′), F(C″))
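The following sketch illustrates a gradient-based untargeted attack in the spirit of Algorithm 1 (adapted from PGD [40]): the objective L_ut is maximized while the perturbation is projected back into the ε-ball after every step. The step size and iteration count are illustrative assumptions; only the ε values (0.03/0.06) correspond to settings used later in the experiments.

```python
import torch
import torch.nn.functional as nnf

def untargeted_attack(decoder, container, eps=0.03, alpha=0.005, steps=50):
    """Maximize mse(F(C'), F(C'')) subject to ||C'' - C'||_inf <= eps (Equations (2)-(3))."""
    with torch.no_grad():
        revealed_clean = decoder(container)                 # S' = F(C'), fixed reference
    delta = torch.zeros_like(container, requires_grad=True)
    for _ in range(steps):
        revealed_adv = decoder(container + delta)           # S'' = F(C' + delta)
        loss = nnf.mse_loss(revealed_adv, revealed_clean)   # decoding error to maximize
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()              # gradient ascent step
            delta.clamp_(-eps, eps)                         # project back onto the eps-ball
            delta.add_(container).clamp_(0, 1).sub_(container)  # keep C'' a valid image
        delta.grad.zero_()
    return (container + delta).detach()                     # attacked container C''
```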

3.2.2. Adversarial Attack to Modify the Secret Image

The current active steganalysis methods [22,23,24] for DS can only realize removal attacks; they cannot change the revealed image to an arbitrary image specified by the attacker by slightly modifying the container image C′. Given a container image C′ and an arbitrary target image T specified by the attacker, the targeted attack can be described as generating a small additive perturbation δ (‖δ‖∞ < ε) so that the image S″ revealed by the receiver approximates the target image T (Equation (4)).
min_{‖δ‖∞ ≤ ε} d(T, F(C″))
F is the decoder and F(C″) = S″. From the perspective of attack results, the process of a targeted attack is equivalent to the superposition of removal and re-steganography. Recalling the process of DS, encoding a secret image into a container image requires an encoder. The object of a targeted attack is the decoder F, and the attacker needs to guess how to encode an arbitrary specified secret image based only on the decoder F. Therefore, targeted attacks are more difficult than untargeted attacks. The objective function L_t of such an attack is as follows:

L_t = mse(F(C″), T)
Unlike Algorithm 1, which solves for δ iteratively under the L∞ constraint, Algorithm 2 directly optimizes C″. The objective function of Algorithm 2 is defined as follows for the untargeted attack L_ut (Equation (5)) and the targeted attack L_t (Equation (6)), respectively:
L_ut = mse(C′, C″) − β1 · mse(F(C″), C′)
L_t = mse(C′, C″) − β2 · mse(C″, T) + β3 · mse(F(C″), T)
where C″ = tanh(ω); ω is a matrix to be optimized; mse(C′, C″) is a penalty that constrains the distance between C′ and C″; and the β terms are hyperparameters used to make a trade-off between the invisibility of the attack and the quality of the decoded image. Algorithms 1 and 2 are adapted from PGD [40] and CW [41], respectively; the main difference lies in their objective functions.
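For the optimizer-based variant in the spirit of Algorithm 2 (adapted from CW [41]), the attacked container can be parameterized through a tanh change of variables and optimized directly with Adam. The sketch below keeps only a container-fidelity term and a decoding term (a simplified form of Equation (6)); the weight, learning rate, iteration count, and the mapping of tanh(ω) to the [0, 1] pixel range are assumptions, not the authors' exact settings.

```python
import torch
import torch.nn.functional as nnf

def targeted_attack_cw(decoder, container, target, beta=10.0, lr=0.01, steps=200):
    """Optimize omega so that C'' stays close to C' while F(C'') approaches the target T."""
    # Start from omega such that the tanh parameterization approximately reproduces C'.
    omega = torch.atanh((container.clamp(1e-4, 1 - 1e-4) * 2) - 1).detach().requires_grad_(True)
    opt = torch.optim.Adam([omega], lr=lr)
    for _ in range(steps):
        attacked = (torch.tanh(omega) + 1) / 2              # C'' mapped into [0, 1]
        loss = nnf.mse_loss(attacked, container) \
               + beta * nnf.mse_loss(decoder(attacked), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ((torch.tanh(omega) + 1) / 2).detach()           # attacked container C''
```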

3.3. Adversarial Attack Based on Surrogate Training and Knowledge Distillation

The attacker does not have access to the decoder F, training dataset, model structure, hyperparameters, etc., which is more in line with real-world situations. The most critical detail is that the attacker does not have knowledge of decoder F; therefore, the attacker cannot calculate the gradient accurately through backpropagation as in a white-box scenario to optimize the objective functions. Furthermore, in the usual threat models of black-box attacks, the attacker is assumed to be able to send many queries to the model [46,47,48]. However, this is unreasonable in deep steganalysis. Therefore, defining a suitable black-box threat model for deep steganalysis is necessary.
First, assume that an attacker can obtain multiple instances created by the target model. Baluja et al. [11] investigated whether an attacker could train a network to reveal secret images without having access to the original network after the deployment of a deep steganographic system. They proposed a possible attack scenario wherein an attacker is able to obtain multiple container image instances C′ created by the target system, with at least one of either the cover image C or the secret image S included in each instance.
Algorithm 1: Container Image Attack
Algorithm 2: Container Image Attack
Moreover, the transferability of DS can be exploited to reduce the difficulty of obtaining instances created by the target model. As with the transferability of adversarial examples [25], DS also exhibits transferability, i.e., the decoder of one model can partially recover the secret image encoded by another model (Figure 2). The attacker can not only roughly discern the secret image but may also find the original secret image S corresponding to the revealed image S*. This helps the attacker to collect enough instances (C′, S or S′). Therefore, the following variant of the black-box scenario is proposed as a threat model for deep steganalysis.

3.3.1. Threat Model for Deep Steganography

Attacker’s knowledge: The attacker does not have knowledge of the training dataset, model structure, model parameters, or hyperparameters of the target model.
Attacker’s capabilities: The attacker cannot query the model. The attacker can obtain multiple instances created by the target model. Under this threat model, a method based on surrogate training and knowledge distillation is proposed to realize the removal, modification, and content analysis of secret images in a black-box scenario.

3.3.2. Initialization through Surrogate Training

Surrogate models can be trained to enable black-box transfer adversarial attacks. However, only black-box untargeted transfer attacks can be achieved through surrogate training alone; black-box targeted transfer attacks are more challenging, as the surrogate model needs to fit the target model more accurately. In order to solve the problem of secret image content analysis and targeted attack under the premise of obtaining only a small number of instances [11], a targeted black-box transfer attack method based on surrogate training and knowledge distillation is proposed. Unlike model ensembling [52,53], which is used to improve the transferability of adversarial examples across multiple models, the goal of surrogate training here is for the surrogate model to approximate the target model as closely as possible, without pursuing transferability to multiple models.
First, the attacker initializes the surrogate model F′ using a surrogate dataset. Then, multiple instances (C′, S) or (C′, S′) created by the target model are used to perform knowledge distillation [54] in order to optimize the surrogate model F′ so that it gradually approaches the target model F (Figure 3).

3.3.3. Optimization of Surrogate Model Based on Knowledge Distillation

Hinton et al. [54] first proposed knowledge distillation as a neural network compression technique. They showed how to compress an already trained classification model with many parameters into a model with fewer parameters while preserving the accuracy of the original model as much as possible.
In contrast to the classification task, the concept of knowledge distillation was adopted herein as follows. The decoder F of the target model was used as the teacher model, the decoder F′ of the pre-trained initial surrogate model was used as the student model, and the outputs of the models were regarded as logits. Knowledge distillation in the traditional sense calculates the cross-entropy between the class probabilities output by the teacher model and the student model at a high temperature T as the loss function for training the student model [54]. Here, the mean square error between the logits was used as the loss function for training the student model. Hinton et al. [54] proved that when the temperature T approaches infinity, matching the class probability outputs is equivalent to matching the logits. In contrast to the instances proposed in [11], a distinction was made between the secret image S and the revealed image S′. Although the two are very close, the student model can better approximate the teacher model by learning the teacher's output S′. In cases where S′ could not be obtained, S was used instead of S′. The training objective of the student model F′ is described by Equation (8), where d is the mean square error loss function.
min d(F(C′), F′(C′))
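A minimal sketch of the distillation step described by Equation (8), assuming the attacker holds a small batch of container images C′ collected from the target system together with the corresponding teacher outputs S′ (or the original secrets S when S′ is unavailable); the optimizer, learning rate, and epoch count are illustrative.

```python
import torch
import torch.nn.functional as nnf

def distill_surrogate(surrogate, containers, teacher_outputs, epochs=100, lr=1e-4):
    """Fine-tune the surrogate decoder F' so that F'(C') matches F(C') (Equation (8)).

    `containers`      : tensor of container images C' produced by the target model.
    `teacher_outputs` : the corresponding revealed images S' = F(C'), or the secrets S
                        used in their place when S' cannot be obtained.
    """
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nnf.mse_loss(surrogate(containers), teacher_outputs)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return surrogate
```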

4. Experiment

4.1. Experimental Setup

4.1.1. Datasets

We evaluated the performance of the proposed system on two datasets: Tiny ImageNet [55] and MS COCO [56]. The former comprises 200 categories (500 for each class) and was downsized to 64 × 64 colored images. The latter is a large-scale object detection, segmentation, key-point detection, and captioning dataset containing 80 object categories, and the images were randomly cropped to a size of 64 × 64. The training and test sets for both datasets contained 100,000 and 10,000 images, respectively.

4.1.2. Evaluation Metrics

The peak signal-to-noise ratio (PSNR) [57], structural similarity index measure (SSIM) [58], and average pixel error (APE) [11] were applied to evaluate the performance. MSE was used to measure the pixel errors between images, but it ignored the structural relationship between pixels. PSNR is an indicator widely used to evaluate image quality and determine the level of image distortion or noise. It is a logarithmic representation of MSE. SSIM is used to measure the structural similarity between images and includes three comparison items: brightness, contrast, and structural similarity. APE is used to measure the average pixel error between two images.
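For reference, the three metrics can be computed per image pair roughly as follows (a sketch assuming 8-bit RGB images stored as NumPy arrays and a recent scikit-image release; this is not the authors' evaluation code).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(img_a, img_b):
    """Return PSNR, SSIM, and average pixel error (APE) between two uint8 RGB images."""
    psnr = peak_signal_noise_ratio(img_a, img_b, data_range=255)
    ssim = structural_similarity(img_a, img_b, channel_axis=-1, data_range=255)
    ape = float(np.mean(np.abs(img_a.astype(np.float64) - img_b.astype(np.float64))))
    return psnr, ssim, ape
```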

4.1.3. Deep Steganography

We evaluated the performance of the proposed system on classic deep steganographic networks. Two convolutional encoder-decoder networks [11] with different depths and widths were trained on the above two datasets. After around 40 epochs of training, the loss of the DS model no longer decreased appreciably. We trained for 60 epochs and chose the best results. To evaluate the transferability of DS across different models, the model trained on Tiny-ImageNet [55] was set as the target model F of the attack, and the model trained on MS COCO [56] was used as the initial surrogate model F′. The performances of the target model F and the surrogate model F′ on their corresponding test sets are shown in Table 1.

4.1.4. Baselines

Gaussian noise (GN) and Gaussian blurring (GB) were applied to perturb the container images. Different intensities of GN were tested, and the kernel size and standard deviation of GB were set to 5 and 3, respectively.
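The two baselines can be reproduced with standard operations, for example as below (a sketch for tensors in [0, 1]; the blur uses kernel size 5 and standard deviation 3 as stated, while the noise scale is only a placeholder for the paper's GN intensity settings).

```python
import torch
from torchvision.transforms import GaussianBlur

def gaussian_noise(container, sigma=0.3):
    """Additive Gaussian noise baseline (GN); `sigma` stands in for the GN intensity label."""
    return (container + sigma * torch.randn_like(container)).clamp(0, 1)

def gaussian_blur(container):
    """Gaussian blurring baseline (GB) with kernel size 5 and standard deviation 3."""
    return GaussianBlur(kernel_size=5, sigma=3.0)(container)
```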

4.2. White-Box Attack Results

In the white-box scenario, both attack Algorithms 1 and 2 realized untargeted (Figure 4) and targeted attacks (Figure 5). The attack performance of Algorithm 2 was better, but it produced a significant difference in color saturation between the attacked image and the container image, which was more obvious in the targeted attack (column 5, Attacked, in Figure 5). Algorithm 1 demonstrated better imperceptibility while still ensuring the success of the attack. The strength of the attack increased with the disturbance radius ε: as ε increased from 0.03 to 0.06, the revealed image S″ lost more details (Figure 4).
One thousand and twenty-four images in the testing dataset of Tiny-ImageNet were randomly selected to evaluate the white-box untargeted (Table 2) and targeted (Table 3) attacks of Algorithm 1. The untargeted attack of Algorithm 1 resulted in less perturbation to the original container image and produced a greater decoding error compared to GN and GB. The targeted attack could be regarded as the secret image encoding process of DS. Comparing Table 1 and Table 3, the performance of the targeted attack of Algorithm 1 was close to that of DS. Therefore, Algorithm 1 is actually a novel DS method.

4.3. Black-Box Attack Results

For untargeted attacks in a black-box scenario, the attacker can leverage adversarial examples calculated from the initial surrogate model F′ and attack the target model F by exploiting the transferability of adversarial examples. Figure 6 shows that the untargeted attack transferred easily and successfully. As in the white-box scenario, a larger disturbance radius ε led to a larger decoding error. The quantitative results on the Tiny-ImageNet testing dataset for the black-box untargeted attack of Algorithm 1 are shown in Table 4.

Secret Image Analysis and Black-Box Target Attack

Through knowledge distillation, the surrogate model F′ could more closely approximate the target model F. As the number of instances (C′, S or S′) obtained from the target model increased, the surrogate model came closer to the target model. Figure 7 and Table 5 show how the approximation between the models improves with the number of instances.
Knowledge distillation was performed using different numbers of instances, and the approximation between the two models was evaluated over the entire dataset. The ALD (average logits distance) of the model outputs was used to measure the model similarity (Equation (9)).
ALD = (1/n) Σ mse(F(C′), F′(C′))
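Equation (9) averages the squared decoder disagreement over the n evaluated container images; a minimal sketch (the decoder names and per-image batching are assumptions):

```python
import torch
import torch.nn.functional as nnf

def average_logits_distance(target_decoder, surrogate_decoder, containers):
    """ALD: mean over container images of mse(F(C'), F'(C')) (Equation (9))."""
    with torch.no_grad():
        distances = [nnf.mse_loss(target_decoder(c.unsqueeze(0)),
                                  surrogate_decoder(c.unsqueeze(0)))
                     for c in containers]
    return torch.stack(distances).mean().item()
```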
Through knowledge distillation, the attacker can use the optimized surrogate model F′ to analyze the secret image. As the number of instances increased, the recovered secret image became clearer (Figure 8). Figure 8 shows that the attacker only needed to obtain 32 instances (about 0.03% of the training set) to accurately decode the secret image without having access to the target model.
Black-box targeted attacks require more precise gradients, and thus more instances were needed to achieve the ideal attack effect. As shown in Figure 9 and Table 6, when the attacker had obtained 128 instances, the ideal attack effect was basically achieved.
Figure 10 shows the L2-distance for different values of ε. The L2-distance was calculated between the decoded image and the target image or the secret image. For the targeted attack, as the number of iterations increases, the decoded image gets closer to the target image. For the untargeted attack, as the number of iterations increases, the decoded image moves farther away from the secret image, which means that it is difficult to recover the correct secret image after the attack. The larger the ε, the better the results of the attacks.
In contrast to current DS steganalysis methods, our approach achieved multiple attack capabilities, including deleting secret images (DSI), modifying secret images (MSI), and analyzing secret image content (ASI), as shown in Table 7.

4.4. Discussion

4.4.1. Untargeted Attacks Are a Special Case of Targeted Attacks

The goal of an untargeted attack (Equation (2)) is to make the container image as difficult to decode as possible, i.e., to make the MSE value between the decoded image S″ and the secret image S as high as possible. For each pixel of S, if the pixel value is less than 255/2, then the MSE reaches its maximum when the pixel value at the corresponding position in S″ equals 255. Similarly, if the pixel value is greater than 255/2, the MSE reaches its maximum when the pixel value at the corresponding position in S″ equals 0. This implicitly specifies a binary target image T, derived from the mapping function g (Equation (10)), T = g(S), where x_{i,j} represents the pixel value at row i and column j. As shown in Figure 4, the decoded images after the attack were all close to binary images.
g(x_{i,j}) = 0 if x_{i,j} > 255/2; 255 if x_{i,j} ≤ 255/2
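Equation (10) amounts to an inverted thresholding of the secret image; a one-line sketch for 8-bit pixel values:

```python
import numpy as np

def implicit_binary_target(secret):
    """g(S): map each pixel to 0 if it exceeds 255/2 and to 255 otherwise (Equation (10))."""
    return np.where(secret > 255 / 2, 0, 255).astype(np.uint8)
```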

4.4.2. Adversarial Attack Is a New Deep Steganography Method

In general, current DS approaches are based on encoding and decoding networks [11,12,13,14,15,16,17,18,19], and the steganography of secret images requires an encoding network. We showed that the encoding of secret images can also be achieved through targeted adversarial attacks, even without an encoder. Table 8 compares the performance of DS (Table 1) and targeted adversarial attacks (Table 3). Evidently, adversarial attacks performed competitively in relation to DS. By fine-tuning the number of iterations and the step size of the attack (Algorithm 1), the steganographic performance of the adversarial attack could be improved further.
Adversarial attacks and DS procedures are two sides of the same coin. Both adversarial attacks and DS methods are based on the ability of deep models to easily extract features that are difficult for the human eye to perceive. During the training process of a DS model [11], one could use adversarial attacks to replace the encoding stage of DS. This new DS approach (Figure 11) no longer depends on the structure of the encoding-decoding network [11,12,13,14,15,16,17,18,19]. The training method of the DS model is described in Algorithm 3. Only the cover image is input into the network, and the container image can be obtained by an iterative adversarial attack on the cover image (Figure 11). Since this method was not the main contribution of this study, we will address it further in follow-up work.
Algorithm 3: DS based on adversarial attacks
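A schematic training loop for this encoder-free DS variant, in the spirit of Algorithm 3, might alternate a targeted adversarial attack (standing in for the encoding phase) with a decoder update; the `targeted_attack` routine, optimizer, and hyperparameters below are assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as nnf

def train_decoder_only_ds(decoder, loader, targeted_attack, eps=0.06, epochs=10, lr=1e-4):
    """Alternate (i) encoding by targeted adversarial attack and (ii) decoder optimization."""
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for cover, secret in loader:
            # Encoding phase: perturb the cover so the current decoder reveals the secret.
            container = targeted_attack(decoder, cover, secret, eps=eps)  # detached C''
            # Decoding phase: update the decoder to better reveal the secret from C''.
            loss = nnf.mse_loss(decoder(container), secret)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return decoder
```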

4.4.3. Adversarial Training of the DS Model

In the training process of the DS model, Gaussian noise (GN) and adversarial attack perturbation (AAP) can be used to improve the robustness, but at the cost of accuracy (Figure 12). A comparison between columns 4 and 6 shows that the DS model had a certain level of robustness after noise enhancement. However, the green-framed image indicates that the quality of the image encoded by the DS model was greatly reduced after noise enhancement, and the outline of the secret image is visible, which is unacceptable for DS. After applying GN for robustness enhancement, the DS model was more robust to GN. However, after introducing AAP during training, model convergence became difficult. After adversarial training, the model had a degree of adversarial robustness, but the loss in accuracy led to an almost complete loss of invisibility for the DS model.

5. Conclusions

Inspired by adversarial examples, we proposed a novel deep steganalysis method based on adversarial attacks, realizing the application of secondary steganography to container images. Our method can effectively destroy, decode, and even modify DS-based covert communication content, preventing such an approach from being exploited by criminals. The results indicated that the untargeted adversarial attack is a special case of the targeted adversarial attack, and that the targeted adversarial attack can be regarded as a new DS method. Nevertheless, the adversarial-attack-based DS performed worse than encoding-decoding-network-based DS. In the future, we will study the use of adversarial attacks to achieve improved DS performance and more functions. Further investigation into the design of DS schemes based on adversarial attacks, and of DS methods that are robust to adversarial examples, is warranted. We conclude that adversarial attacks are themselves a kind of DS, which suggests that DS systems could adopt the method of adversarial perturbation construction and do not have to be based on an encoding-decoding network structure.

Author Contributions

Conceptualization, F.T. and C.C.; methodology, F.T. and H.L.; software, F.T., H.L. and B.Z.; investigation, C.C. and J.S.; writing—original draft preparation, F.T., J.S. and C.C.; writing—review and editing, C.C., L.W. and J.S.; supervision, C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Joint Funds of the National Natural Science Foundation of China (No. U19B2044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Morkel, T.; Eloff, J.H.; Olivier, M.S. An overview of image steganography. In Proceedings of the Fifth Annual Information Security South Africa Conference (ISSA2005), Sandton, South Africa, 29 June–1 July 2005; pp. 1–11. [Google Scholar]
  2. Kadhim, I.J.; Premaratne, P.; Vial, P.J.; Halloran, B. Comprehensive survey of image steganography: Techniques, Evaluations, and trends in future research. Neurocomputing 2019, 335, 299–326. [Google Scholar] [CrossRef]
  3. Li, B.; He, J.; Huang, J.; Shi, Y.Q. A survey on image steganography and steganalysis. J. Inf. Hiding Multim. Signal Process. 2011, 2, 142–172. [Google Scholar]
  4. Cox, I.; Miller, M.; Bloom, J.; Fridrich, J.; Kalker, T. Digital Watermarking and Steganography; Morgan Kaufmann: Burlington, MA, USA, 2007. [Google Scholar]
  5. Abraham, A.; Paprzycki, M. Significance of steganography on data security. In Proceedings of the International Conference on Information Technology: Coding and Computing, Las Vegas, NV, USA, 5–7 April 2004; Volume 2, pp. 347–351. [Google Scholar]
  6. Fridrich, J.; Goljan, M.; Du, R. Detecting LSB steganography in color, and gray-scale images. IEEE Multimed. 2001, 8, 22–28. [Google Scholar] [CrossRef]
  7. Johnson, N.F.; Jajodia, S. Steganalysis of images created using current steganography software. In Proceedings of the Information Hiding: Second International Workshop, IH’98, Portland, OR, USA, 14–17 April 1998; pp. 273–289. [Google Scholar]
  8. Amritha, P.; Sethumadhavan, M.; Krishnan, R. On the Removal of Steganographic Content from Images. Def. Sci. J. 2016, 66, 574. [Google Scholar] [CrossRef]
  9. Hosam, O. Attacking image watermarking and steganography-a survey. Int. J. Inf. Technol. Comput. Sci. 2019, 11, 23–37. [Google Scholar] [CrossRef]
  10. Zhang, C.; Lin, C.; Benz, P.; Chen, K.; Zhang, W.; Kweon, I.S. A brief survey on deep learning based data hiding, steganography and watermarking. arXiv 2021, arXiv:2103.01607v2. [Google Scholar]
  11. Baluja, S. Hiding images in plain sight: Deep steganography. Adv. Neural Inf. Process. Syst. 2017, 30, 2069–2079. [Google Scholar]
  12. Baluja, S. Hiding images within images. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 1685–1697. [Google Scholar] [CrossRef] [PubMed]
  13. Wang, Z.; Zhou, M.; Liu, B.; Li, T. Deep Image Steganography Using Transformer and Recursive Permutation. Entropy 2022, 24, 878. [Google Scholar] [CrossRef]
  14. Chen, F.; Xing, Q.; Fan, C. Multilevel Strong Auxiliary Network for Enhancing Feature Representation to Protect Secret Images. IEEE Trans. Ind. Inform. 2021, 18, 4577–4586. [Google Scholar] [CrossRef]
  15. Zhu, J.; Kaplan, R.; Johnson, J.; Fei-Fei, L. Hidden: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 657–672. [Google Scholar]
  16. Luo, X.; Zhan, R.; Chang, H.; Yang, F.; Milanfar, P. Distortion agnostic deep watermarking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13548–13557. [Google Scholar]
  17. Tancik, M.; Mildenhall, B.; Ng, R. Stegastamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2117–2126. [Google Scholar]
  18. Wengrowski, E.; Dana, K. Light field messaging with deep photographic steganography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1515–1524. [Google Scholar]
  19. Zhang, C.; Benz, P.; Karjauv, A.; Sun, G.; Kweon, I.S. Udh: Universal deep hiding for steganography, watermarking, and light field messaging. Adv. Neural Inf. Process. Syst. 2020, 33, 10223–10234. [Google Scholar]
  20. Hayes, J.; Danezis, G. Generating steganographic images via adversarial training. Adv. Neural Inf. Process. Syst. 2017, 30, 1954–1963. [Google Scholar]
  21. Zhang, C.; Benz, P.; Karjauv, A.; Kweon, I.S. Universal adversarial perturbations through the lens of deep steganography: Towards a fourier perspective. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 3296–3304. [Google Scholar]
  22. Jung, D.; Bae, H.; Choi, H.S.; Yoon, S. Pixelsteganalysis: Pixel-wise hidden information removal with low visual degradation. IEEE Trans. Dependable Secur. Comput. 2023, 20, 331–342. [Google Scholar] [CrossRef]
  23. Xiang, T.; Liu, H.; Guo, S.; Zhang, T. PEEL: A Provable Removal Attack on Deep Hiding. arXiv 2021, arXiv:2106.02779. [Google Scholar]
  24. Zhong, S.; Weng, W.; Chen, K.; Lai, J. Deep-learning steganalysis for removing document images on the basis of geometric median pruning. Symmetry 2020, 12, 1426. [Google Scholar] [CrossRef]
  25. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199. [Google Scholar]
  26. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  27. Pevnỳ, T.; Filler, T.; Bas, P. Using high-dimensional image models to perform highly undetectable steganography. In Proceedings of the Information Hiding: 12th International Conference, IH 2010, Calgary, AB, Canada, 28–30 June 2010; pp. 161–177. [Google Scholar]
  28. Qin, J.; Wang, J.; Tan, Y.; Huang, H.; Xiang, X.; He, Z. Coverless image steganography based on generative adversarial network. Mathematics 2020, 8, 1394. [Google Scholar] [CrossRef]
  29. Shang, Y.; Jiang, S.; Ye, D.; Huang, J. Enhancing the security of deep learning steganography via adversarial examples. Mathematics 2020, 8, 1446. [Google Scholar] [CrossRef]
  30. Zhu, X.; Lai, Z.; Zhou, N.; Wu, J. Steganography with High Reconstruction Robustness: Hiding of Encrypted Secret Images. Mathematics 2022, 10, 2934. [Google Scholar] [CrossRef]
  31. Moosavi-Dezfooli, S.M.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1765–1773. [Google Scholar]
  32. Chen, H.; Zhu, T.; Zhao, Y.; Liu, B.; Yu, X.; Zhou, W. Low-frequency Image Deep Steganography: Manipulate the Frequency Distribution to Hide Secrets with Tenacious Robustness. arXiv 2023, arXiv:2303.13713. [Google Scholar]
  33. Yin, X.; Wu, S.; Wang, K.; Lu, W.; Zhou, Y.; Huang, J. Anti-rounding Image Steganography with Separable Fine-tuned Network. IEEE Trans. Circuits Syst. Video Technol. 2023. [Google Scholar] [CrossRef]
  34. Pan, W.; Yin, Y.; Wang, X.; Jing, Y.; Song, M. Seek-and-hide: Adversarial steganography via deep reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7871–7884. [Google Scholar] [CrossRef] [PubMed]
  35. Wang, Z.; Feng, G.; Wu, H.; Zhang, X. Data hiding during image processing using capsule networks. Neurocomputing 2023, 537, 49–60. [Google Scholar] [CrossRef]
  36. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef]
  37. Liao, F.; Liang, M.; Dong, Y.; Pang, T.; Hu, X.; Zhu, J. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1778–1787. [Google Scholar]
  38. Ilyas, A.; Santurkar, S.; Tsipras, D.; Engstrom, L.; Tran, B.; Madry, A. Adversarial examples are not bugs, they are features. Adv. Neural Inf. Process. Syst. 2019, 32, 125–136. [Google Scholar]
  39. Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824. [Google Scholar] [CrossRef]
  40. Mądry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
  41. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (sp), San Jose, CA, USA, 22–24 May 2017; pp. 39–57. [Google Scholar]
  42. Thys, S.; Van Ranst, W.; Goedemé, T. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
  43. Xu, K.; Zhang, G.; Liu, S.; Fan, Q.; Sun, M.; Chen, H.; Chen, P.Y.; Wang, Y.; Lin, X. Adversarial t-shirt! evading person detectors in a physical world. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 665–681. [Google Scholar]
  44. Komkov, S.; Petiushko, A. Advhat: Real-world adversarial attack on arcface face id system. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 819–826. [Google Scholar]
  45. Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Black-box adversarial attacks with limited queries and information. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2137–2146. [Google Scholar]
  46. Brendel, W.; Rauber, J.; Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv 2017, arXiv:1712.04248. [Google Scholar]
  47. Chen, P.Y.; Zhang, H.; Sharma, Y.; Yi, J.; Hsieh, C.J. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; pp. 15–26. [Google Scholar]
  48. Cheng, M.; Singh, S.; Chen, P.; Chen, P.Y.; Liu, S.; Hsieh, C.J. Sign-opt: A query-efficient hard-label adversarial attack. arXiv 2019, arXiv:1909.10773. [Google Scholar]
  49. Byun, J.; Cho, S.; Kwon, M.J.; Kim, H.S.; Kim, C. Improving the transferability of targeted adversarial examples through object-based diverse input. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15244–15253. [Google Scholar]
  50. Li, M.; Deng, C.; Li, T.; Yan, J.; Gao, X.; Huang, H. Towards transferable targeted attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 641–649. [Google Scholar]
  51. Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9185–9193. [Google Scholar]
  52. Liu, Y.; Chen, X.; Liu, C.; Song, D. Delving into transferable adversarial examples and black-box attacks. arXiv 2016, arXiv:1611.02770. [Google Scholar]
  53. Li, Y.; Bai, S.; Zhou, Y.; Xie, C.; Zhang, Z.; Yuille, A. Learning transferable adversarial examples via ghost networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11458–11465. [Google Scholar]
  54. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  55. Le, Y.; Yang, X. Tiny imagenet visual recognition challenge. In CS 231N; Stanford University: Stanford, CA, USA, 2015. [Google Scholar]
  56. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  57. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  58. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Typical encoder-decoder network of a deep steganography model. Line 1 (attack path): delete the secret image. Line 2 (attack path): modify the secret image. Line 3 (normal path): the receiver can recover the correct secret image.
Figure 2. Transferability of DS. The first four columns are the encoded and decoded results of the target model, and the fifth column is the decoded image of the surrogate model for the container image C′.
Figure 3. Surrogate training and knowledge distillation. The student model (surrogate model) is first initialized by a surrogate dataset. Then, several instances of the teacher model (the target model) are used to further refine the student model.
Figure 4. White-box untargeted attack based on Algorithm 1 (ε = 0.03 and ε = 0.06) and Algorithm 2 (lr = 0.01). The first four columns contain the results of target model encoding and decoding. Column 5 contains the results C″ of Algorithm 1 (ε = 0.03)'s attack on the encoded image C′ of the target model, and column 6 contains the decoding results of the image C″ (column 5) produced by the target model. Similarly, columns 7 and 8 present the attack effects of Algorithm 1 (ε = 0.06). Column 9 presents the results of Algorithm 2 (lr = 0.01)'s attack on the encoded image C′ of the target model. Column 10 contains the decoding results of the target model for image C″ (column 9).
Figure 5. White-box targeted attack based on Algorithm 1 (ε = 0.03) and Algorithm 2 (lr = 0.01). The first two columns present the results of target model decoding. Column 7 contains the images that the attacker wants the target model to decode. Column 3 contains the results C″ of Algorithm 1 (ε = 0.03)'s attack on the encoded image C′ of the target model, and column 4 contains the decoding results of the image C″ (column 3) produced by the target model. Column 5 presents the results of Algorithm 2 (lr = 0.01)'s attack on the encoded image C′ of the target model. Column 6 presents the decoding results of the target model for image C″ (column 5).
Figure 6. Black-box untargeted attack based on Algorithm 1. The first two columns present the results of target model decoding. Column 3 contains the results C″ of Algorithm 1 (ε = 0.03)'s attack on the encoded image C′ of the target model, and column 4 contains the decoding results of the image C″ (column 3) by the target model. Similarly, columns 5 and 6 present the attack effects of Algorithm 1 (ε = 0.06).
Figure 7. Knowledge distillation between F and F′. The horizontal axis shows the number of instances obtained from the teacher model (target model). The vertical axis shows the approximation between the teacher model (target model) and the student model (surrogate model) after knowledge distillation with the corresponding number of instances.
Figure 8. Content analysis of secret images. S_n (n = 0, 32, …, 2048) represents the secret image decoded by the surrogate model F′ when using n instances for knowledge distillation.
Figure 9. Black-box targeted attack. S_n (n = 0, 32, …, 2048) represents the decoding result after a black-box targeted attack of Algorithm 1 (ε = 0.06) when n instances are used for knowledge distillation.
Figure 10. L2-distance for different values of ε. White-box methods (top) and black-box method (bottom). Targeted attack (left) and untargeted attack (right).
Figure 11. DS based on adversarial attack. The targeted adversarial attack of Algorithm 1 replaces the encoding process of the DS model. Encoding phase: the decoding loss guides the targeted adversarial attack to generate perturbations. Decoding phase: the decoder's parameters are optimized according to the decoding loss. The adversarial attack and the model's decoding alternate during the training phase.
Figure 12. Robustness of the DS Model. Trained without (left) and with (right) noise. Evaluated with GN (top) and AAP (bottom). Columns 1 and 2 present the encoding and decoding images of the DS model without noise training, respectively. Column 3 shows the container image after GN attack and AAP attack, while Column 4 contains decoded images. Column 5 presents the encoded images of the DS model trained with GN and AAP noise, respectively, while column 6 contains the corresponding decoded images.
Table 1. Performance of DS on the target model and the surrogate model.

         (C, C′)                  (S, S′)
         PSNR    SSIM   APE       PSNR    SSIM   APE
F        36.09   0.98   3.02      34.07   0.98   3.88
F′       35.83   0.98   3.01      32.96   0.98   4.18
Table 2. White-box untargeted attacks.

            (C′, C′ + δ)              (S, S″)
            PSNR    SSIM   APE        PSNR    SSIM   APE
GN (0.3)    33.26   0.95   4.35       12.86   0.47   45.04
GN (0.5)    30.59   0.92   5.98       9.02    0.29   71.89
GB          27.92   0.94   6.85       15.40   0.80   35.58
δ = 0.03    37.32   0.98   3.33       4.76    0.16   136.24
δ = 0.06    32.46   0.95   5.51       3.22    0.04   166.27
Table 3. White-box targeted attacks.

            (C′, C′ + δ)              (S″, Target)
            PSNR    SSIM   APE        PSNR    SSIM   APE
δ = 0.03    39.09   0.99   2.53       23.58   0.86   11.43
δ = 0.06    37.88   0.98   2.66       30.49   0.95   5.74
Table 4. Black-box untargeted attacks.

            (C′, C′ + δ)              (S, S″)
            PSNR    SSIM   APE        PSNR    SSIM   APE
GN (0.3)    33.26   0.95   4.35       12.86   0.47   45.04
GN (0.5)    30.59   0.92   5.98       9.02    0.29   71.89
GB          27.92   0.94   6.85       15.40   0.80   35.58
δ = 0.03    36.99   0.98   3.53       8.76    0.34   76.53
δ = 0.06    31.86   0.94   6.12       6.22    0.18   104.94
Table 5. Knowledge distillation between F and F′.

Number of instances   32       64       128      256      512      1024     2048
ALD                   0.0276   0.0124   0.0055   0.0026   0.0015   0.0010   0.0006
Table 6. Black-box targeted attacks.

Instance      δ       (C′, C′ + δ)              (S″, Target)
numbers               PSNR    SSIM   APE        PSNR    SSIM   APE
0 (Initial)   0.03    40.92   0.99   1.94       11.85   0.35   52.44
              0.06    38.05   0.98   2.59       12.19   0.45   49.81
32            0.03    40.23   0.99   2.13       16.48   0.58   29.85
              0.06    37.91   0.98   2.54       17.64   0.69   25.61
64            0.03    39.46   0.99   2.38       19.38   0.72   19.79
              0.06    37.73   0.98   2.64       20.97   0.77   17.21
128           0.03    39.37   0.99   2.43       21.81   0.78   14.51
              0.06    37.72   0.98   2.67       24.66   0.88   11.22
256           0.03    39.60   0.99   2.35       22.60   0.80   13.17
              0.06    37.98   0.98   2.61       27.14   0.88   8.48
512           0.03    39.57   0.99   2.35       23.65   0.82   11.43
              0.06    37.85   0.98   2.65       28.13   0.93   7.46
1024          0.03    39.73   0.99   2.30       24.19   0.81   10.67
              0.06    38.07   0.98   2.64       28.91   0.91   6.85
2048          0.03    39.73   0.99   2.30       24.04   0.84   10.86
              0.06    37.90   0.98   2.66       29.44   0.93   6.50
Table 7. Comparison with current steganalysis methods for DS. We use 🗸 and × to denote achievable and unattainable, respectively.

                     DSI    MSI    ASI
Jung et al. [22]     🗸      ×      ×
Xiang et al. [23]    🗸      ×      ×
Zhong et al. [24]    🗸      ×      ×
Ours                 🗸      🗸      🗸
Table 8. Performance of DS (DH) and targeted adversarial attacks (TAA).

                   (C, C′)                  (S, S′)
                   PSNR    SSIM   APE       PSNR    SSIM   APE
DH                 36.09   0.98   3.02      34.07   0.98   3.88
TAA (δ = 0.06)     37.88   0.98   2.66      30.49   0.95   5.74
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tao, F.; Cao, C.; Li, H.; Zou, B.; Wang, L.; Sun, J. Adversarial Attack for Deep Steganography Based on Surrogate Training and Knowledge Diffusion. Appl. Sci. 2023, 13, 6588. https://doi.org/10.3390/app13116588

