Article

Interpretation of Latent Codes in InfoGAN with SAR Images

1 School of Electronic Engineering, Xidian University, Xi’an 710071, China
2 Faculty of Electrical Engineering, University of Montenegro, 81000 Podgorica, Montenegro
3 National Key Laboratory of Science and Technology on Aerospace Intelligence Control, Beijing Aerospace Automatic Control Institute, Beijing 100854, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(5), 1254; https://doi.org/10.3390/rs15051254
Submission received: 16 December 2022 / Revised: 20 February 2023 / Accepted: 23 February 2023 / Published: 24 February 2023
(This article belongs to the Special Issue Intelligent Remote Sensing Data Interpretation)

Abstract

Generative adversarial networks (GANs) can synthesize abundant photo-realistic synthetic aperture radar (SAR) images. Some modified GANs (e.g., the InfoGAN) are even able to edit specific properties of the synthesized images by introducing latent codes. This capability is crucial for SAR image synthesis since the targets in real SAR images exhibit different properties due to the imaging mechanism. Despite the success of the InfoGAN in manipulating properties, a clear explanation of how the latent codes affect the synthesized properties is still lacking; thus, editing specific properties usually relies on empirical trials, which are unreliable and time-consuming. In this paper, we show that the latent codes are almost disentangled and affect the properties of SAR images in a nonlinear manner. By introducing property estimators for the latent codes, we are able to decompose the complex causality between the latent codes and different properties. Both qualitative and quantitative experimental results demonstrate that the property value can be computed by the property estimators; inversely, the required latent codes can be computed given the desired properties. Unlike the original InfoGAN, which only provides the visual trend between properties and latent codes, the properties of SAR images can be manipulated numerically through the latent codes as users expect.

1. Introduction

Synthetic aperture radar (SAR) is considered a well-established technology for providing day-and-night and weather-independent remote sensing images. Therefore, it is widely used in geological exploration, ocean research, disaster monitoring, the military, environmental and earth system monitoring, etc. [1,2,3,4,5,6]. However, SAR remains an expensive imaging modality because the expenditure for airplane flights or satellite launches is much higher than for other optical or infrared imaging devices [7,8]. Hence, the cost of obtaining abundant SAR images is quite high.
To obtain SAR images in an efficient, effective, and economical manner, numerous generative models are utilized to synthesize images, such as the variational auto-encoder (VAE) [9,10,11], the generative adversarial network (GAN) [12,13,14,15,16], and diffusion models [17,18,19,20,21]. The VAE encodes an image from a target distribution into a low-dimensional latent space [9,10]; the decoder then takes that latent representation and reproduces the original image [9]. The GAN’s generator directly samples from a relatively low-dimensional random variable and produces an image, and the discriminator predicts whether the produced image belongs to the target distribution or not [12]. Diffusion models are inspired by nonequilibrium thermodynamics. They define a Markov chain of diffusion steps that slowly adds random noise to the data and then learn to reverse the diffusion process to construct desired data samples from the noise [17,18]. In this paper, we focus on the Information Maximization GAN (InfoGAN).
The GAN was proposed by Goodfellow et al. and contains a generator network, G, and a discriminator network, D [12,22]. The generator manages to approximate the real data distribution from a random distribution, and the discriminator estimates the probability that the input sample is a real image rather than one synthesized by the generator. Such optimization is achieved through a minimax two-player game; thus, it is termed “adversarial”. It should be noted that the GAN only adopts a simple noise vector as the input to G without imposing any restrictions on how the generator uses this noise [22]. In this case, the direction of image generation can hardly be controlled as we expect since the noise is used by the generator in a highly entangled way [23]. However, SAR images naturally include some semantically meaningful properties due to the imaging mechanism. For instance, rotation, translation, and scaling of the target usually emerge with different view angles between the radar and the target [13]. To control the generation direction of the GAN, X. Chen et al. proposed the InfoGAN, which disentangles the input noise by introducing latent codes [24]. A strong correlation between the latent codes and those properties is established by maximizing their mutual information during the InfoGAN’s training.
Although the InfoGAN can generate SAR images with semantically meaningful properties via latent codes, the relation between properties and latent codes still lacks a clear interpretation [23,25]. This raises two questions: (1) How is the property value obtained given a set of latent codes? (2) How are satisfying latent codes obtained given a desired property value? In this paper, we introduce several property estimators to interpret the relation between properties and latent codes in different cases. The results show that a single latent code relates to a given property approximately through a tanh function, while multiple latent codes combine to edit different properties in a complex nonlinear manner. The main contribution of this paper is a clear interpretation of the relation between properties and latent codes, which makes it possible to edit the properties analytically by manipulating latent codes as users expect.
This paper aims to provide a numerical interpretation of the relation between some properties of generated SAR images and latent codes in the InfoGAN. The highlight of this work is that users can control those properties by manipulating latent codes. In the original InfoGAN, the relation between properties and latent codes is observed only empirically. The rest of this paper is organized as follows. Section 2 introduces how these properties emerge in SAR imaging and the mechanism of the InfoGAN. Section 3 describes how to quantify the relation between properties and latent codes. In Section 4, experimental results are provided and analyzed with fully simulated, semi-simulated, and real SAR images (with/without a background) in various cases. Section 5 provides some discussion on the experiments. Section 6 concludes this paper.

2. Background Knowledge and Motivation

2.1. Basic SAR Principles

A radar image is obtained by transmitting repeated pulses and processing the echoes returned from the target [26,27,28,29,30,31,32]. A common choice for the pulse is a linear-frequency-modulated continuous-wave (LFM-CW) signal, transmitted in the form of a series of chirps. The received signal, which is scattered from a target, is delayed and changed in amplitude compared to the transmitted signal, thereby carrying information about the target position and reflectivity. The received signal from an elementary (point) scatterer, after appropriate mixing with the transmitted signal, demodulation, compensation, and residual video phase filtering, is of the form [1]
S(m, t) = \sigma \exp\!\left( j \omega_0 \frac{2 d(t)}{c} \right) \exp\!\left( j 2\pi \frac{B}{T_r} (t - m T_r) \frac{2 d(t)}{c} \right)
where σ is the reflection coefficient of the scattering point, ω_0 is the radar operating frequency, exp(j ω_0 2d(t)/c) is the scattering phase, and exp(j 2π (B/T_r)(t − mT_r) 2d(t)/c) describes the phase variation due to the varying distance. The transmission and receiving procedure is repeated every T_r seconds (the pulse repetition interval—PRI).
In SAR imaging, the radar platform movement is crucial in producing a high-resolution image. Therefore, SAR systems are based on a pulsed radar installed on a platform with a forward movement. The distance between the radar moving at a constant velocity v and a point target on the ground can be described as [2]
d(t) = \sqrt{d_0^2 + (v t)^2}
where t = 0 is the time of closest approach, when the distance is minimal, d(0) = d_0. Assume M pulses are transmitted and N range cells are inside a pulse interval, with range sampling instants t = nT_s. The received echo signal can form an M × N data matrix of complex samples. The column dimension corresponds to the range direction. Note that the radar acquires a range line in each PRI, thus forming the row dimension of the data matrix, termed the azimuth direction. In the case of multi-point targets, the superposition principle applies. Therefore, the raw SAR data are the echoes from the illuminated scene (of multiple points or even continuous targets) sampled in both the range direction and the azimuth direction.
Different from optical sensors, however, raw SAR data do not provide any visible information on the scene [1]. It is only after the basic SAR processing steps that an image is obtained. In a very simplified way, the complete processing can be understood as two separate matched filter operations along the range and azimuth dimensions; instead of performing a convolution in the time domain, multiplication in the frequency domain is adopted due to the much lower computational load. The first step is to compress the transmitted chirp signals into a short pulse. Azimuth compression follows the same basic reasoning; that is, the signal is convolved with its reference function, which is the complex conjugate of the response expected from a point target on the ground. The SAR image is efficiently calculated using, for example, the two-dimensional fast Fourier transform (FFT) [33].
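For illustration only, the following minimal sketch shows these two frequency-domain matched-filtering steps applied to a raw data matrix. It ignores range cell migration correction and other practical corrections, and the function name, argument names, and reference signals are our own assumptions rather than the processing chain used for the data in this paper.

```python
import numpy as np

def basic_sar_compression(raw, range_ref, azimuth_ref):
    """Highly simplified two-step matched filtering of raw SAR data.

    raw         : complex array of shape (M, N) (M pulses x N range samples)
    range_ref   : reference chirp for range compression (length <= N)
    azimuth_ref : reference response for azimuth compression (length <= M)
    """
    M, N = raw.shape

    # range compression: multiply by the conjugate chirp spectrum along each row
    range_compressed = np.fft.ifft(
        np.fft.fft(raw, axis=1) * np.conj(np.fft.fft(range_ref, n=N)), axis=1)

    # azimuth compression: the same operation along each column
    image = np.fft.ifft(
        np.fft.fft(range_compressed, axis=0)
        * np.conj(np.fft.fft(azimuth_ref, n=M))[:, None], axis=0)

    return np.abs(image)
```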
To know a target or scene for analysis, detection, or classification, it is desirable to have its SAR image acquired from different positions [34,35]. Different relative viewing angles (resulting from changes in flight direction or target movement in different revisits) result in a kind of target rotation in SAR images. The radar revisits could also be conducted from different distances to the target, or the target could move between revisits, resulting in a kind of target shifting and/or scaling in the SAR image. These kinds of target changes in radar images will be referred to as properties of the target, as illustrated in Figure 1. In some cases, numerous revisits or observations may be expensive or, in hostile or unique environments, even impossible. Then, it would be of interest to use the available set of data and try to synthesize new possible images, preferably with controlled properties, defined by, for example, different rotations, translations, and scalings, that would at the same time fully correspond to the existing data. To this aim, we will present and apply the GAN and InfoGAN.

2.2. GAN and InfoGAN

The main task of a generative adversarial network is to train a transposed convolutional neural network to produce images that match real images x_n from a set P [12,36]. It means that the GAN learns a generator (transposed convolutional neural network), denoted by G, to synthesize images close to P by feeding the generator with a noise vector z, commonly Gaussian or uniformly distributed. G(z) denotes an image from the set of generated images, P_G. The generator is trained against an adversarial discriminator network, D, whose structure corresponds to a convolutional neural network with the aim to distinguish (discriminate) whether the sample image at the input of the discriminator is from the true dataset of images, P, or from the generator-produced set of images, P_G. The basic structure of a GAN is included in Figure 2.
After both networks, the generator and the discriminator, are initialized by random weights, the training process is defined based on the loss function. First, we will consider the discriminator only. At its input, we have an image (as is common for a convolutional neural network), either a sample image x from the set of real data, P, or a synthesized image, G(z), produced by the generator with a random input noise, z. The output of the discriminator is a scalar denoted by D(·). It is either D(x) or D(G(z)). The output value of the discriminator is normalized such that 0 ≤ D(x), D(G(z)) ≤ 1. The aim of the discriminator is to discriminate the cases when the input is (i) a real image from P(x) or (ii) a generated “fake” (synthesized) image G(z), by learning to produce the output values D(x) close to 1 and the values D(G(z)) close to 0. The target signal, which will be used during the supervised learning, will be denoted by y_x. It assumes the following values:
  • y x = 1 if the input to the discriminator is a real image x from the set P ( x ) ;
  • y x = 0 if the input to the discriminator is a synthesized image, G ( z ) , being output from the generator.
The value of the target signal, y x , is provided at the output of the discriminator as a reference signal for the loss function calculation during the training process. A simple loss function could be in a quadratic form
L(D) = y_x D^2(x) + (1 - y_x)(1 - D(G(z)))^2 .
This function assumes only one of two values, L ∈ {D^2(x), (1 − D(G(z)))^2}. Since 0 ≤ D(x), D(G(z)) ≤ 1, the loss function will reach its maximum value L(D) = 1 for any input to the discriminator, either x or G(z), if D(x) = 1 and D(G(z)) = 0. Therefore, by maximizing the loss function L(D), we can achieve the ideal discriminator performance.
In a vanilla GAN, the cross-entropy form of the loss function is used (with the same aim and the same qualitative analysis as for the quadratic loss function) [37]. The cross-entropy loss is defined by y_x log D(x) + (1 − y_x) log(1 − D(G(z))), with the learning process for the discriminator neural network defined as
\max_D L(D) = \max_D \{ y_x \log D(x) + (1 - y_x) \log(1 - D(G(z))) \} .
It is easy to conclude that, for 0 ≤ D(x), D(G(z)) ≤ 1, this loss function achieves its maximum L(D) = 0 when D(x) = 1 and D(G(z)) = 0.
The maximization of the cross-entropy loss function is commonly conducted over a set (mini-batch) of input real images, x_1, x_2, …, x_m, and generated images, G(z_1), G(z_2), …, G(z_m). The expression for the cross-entropy loss function will also be simplified by omitting y_{x_i}. Namely, it will be assumed that the input to the discriminator is fed by alternating x_1 and G(z_1), then x_2 and G(z_2), and so on in succession until x_m and G(z_m). In this way, we may write the two loss function terms, (i) log D(x_i) for x_i and (ii) log(1 − D(G(z_i))) for G(z_i), as log D(x_i) + log(1 − D(G(z_i))), for each i = 1, 2, …, m. The mean value over 2m images (m real images and m generated images) is then defined by
\max_D L(D) = \max_D \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(x_i) + \log(1 - D(G(z_i))) \right] .
After the discriminator is trained (in the first cycle) based on the loss function (5), its weights are frozen and the generator network is trained in this cycle. Since the generator does not have any knowledge about the real images, the term log D(x) is not used in the loss function for training the generator weights (only generated images are used, when y_{x_i} = 0). The aim of the generator is to produce images as similar to those from the set P(x) as possible. Within the loss function framework, this aim will be achieved if the generator can close the gap between the discriminator output values D(x) and D(G(z)) as much as possible. Since it cannot change D(x), this should be done by increasing the value of D(G(z)) toward 1 or, in other words, by making the new loss function L(G) = log(1 − D(G(z))) as small as possible, that is (within the same mini-batch), find
\min_G \left\{ \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(z_i))) \right\} .
After the generator is trained in this way (in the first cycle), its weights are frozen and the discriminator network is trained again within the second cycle. These cycles are continued for a defined number of epochs, after which the GAN is assumed to be fully trained. In the ideal case, after the training is finished, the discriminator will not be able to discriminate the real images from the images synthesized by the generator, meaning it will produce the output D(x) = D(G(z)) = 1/2 and the loss function value of form (5) will be L(D) = 2 log(1/2) = −log 4 ≈ −1.39.
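A minimal sketch of one such alternating training cycle, following (5) and (6), is given below. The networks netG and netD and the two optimizers are hypothetical placeholders, and netD is assumed to output probabilities in (0, 1); in practice, numerically stabler variants (e.g., binary cross-entropy with logits or the non-saturating generator loss) are commonly used instead of the raw logarithms.

```python
import torch

def gan_training_cycle(netG, netD, real_batch, optG, optD, noise_dim=62):
    """One alternating discriminator/generator update, following Eqs. (5) and (6)."""
    m = real_batch.size(0)

    # discriminator step: maximize (1/m) * sum[ log D(x_i) + log(1 - D(G(z_i))) ]
    z = torch.randn(m, noise_dim)
    fake = netG(z).detach()                  # generator weights are frozen in this step
    loss_d = -(torch.log(netD(real_batch)) + torch.log(1.0 - netD(fake))).mean()
    optD.zero_grad(); loss_d.backward(); optD.step()

    # generator step: minimize (1/m) * sum[ log(1 - D(G(z_i))) ]
    z = torch.randn(m, noise_dim)
    loss_g = torch.log(1.0 - netD(netG(z))).mean()
    optG.zero_grad(); loss_g.backward(); optG.step()

    return loss_d.item(), loss_g.item()
```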
The combined loss function of GAN for both the discriminator and the generator can be summarized by the following expression:
\min_G \max_D L(G, D) = \mathbb{E}_x\{ \log D(x) \} + \mathbb{E}_z\{ \log(1 - D(G(z))) \} .
It is clear from (7) that no restrictions are imposed on the input noise; thus, the properties are highly entangled in the generated images. To generate images with semantically meaningful properties, the InfoGAN introduces latent codes, c = [c_1, c_2, …, c_n], and a classifier, Q, with an architecture that shares trainable parameters with the discriminator. The purpose of the classifier is to maximize the mutual information I(c; G(z, c)) between c and G(z, c), defined as follows:
I(c; G(z, c)) = H(c) - H(c \mid G(z, c))
where H(c) = −∑_i p(c_i) log p(c_i) is the entropy of c = [c_1, c_2, …, c_n]. The mutual information I(c; G(z, c)) means that if c and G(z, c) are independent, then I(c; G(z, c)) = 0, because knowing c reveals nothing about G(z, c) (degrading to the classic GAN); by contrast, if c and G(z, c) are strongly related, then maximal mutual information is attained. It means that the information in the latent code c should not be lost in the generation process. Hence, the information-regularized loss function is as follows:
\min_G \max_D L_I(G, D) = \mathbb{E}_x\{ \log D(x) \} + \mathbb{E}_z\{ \log(1 - D(G(z))) \} - \lambda I(c; G(z, c)) .
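In the original InfoGAN, the mutual information term is maximized through a variational lower bound in which Q approximates the posterior over the codes. The sketch below illustrates how this term can be computed for continuous codes, under the common assumption that Q predicts the mean of a factored Gaussian with a fixed unit variance (so the bound reduces to a scaled squared error); the function and variable names are ours, not the authors' implementation.

```python
import torch

def info_regularizer(Q, fake_images, codes, lam=1.0):
    """Variational lower-bound term for I(c; G(z, c)) with continuous latent codes.

    Q is assumed to predict the mean of a factored Gaussian over the codes; with a
    fixed unit variance the negative log-likelihood reduces to a squared error.
    """
    c_hat = Q(fake_images)                                   # predicted code means
    nll = 0.5 * ((codes - c_hat) ** 2).sum(dim=1).mean()
    return lam * nll                                         # added to the G/Q objective
```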
Figure 2 shows the architecture of an InfoGAN. To show the difference between the GAN and InfoGAN vividly, we provide some generated images from the two networks in Figure 3. Here, we set one latent code, c_1, in the InfoGAN and show the generated images corresponding to 25 values of c_1 uniformly distributed within [−1, 1]. We further utilize a commonly used quantitative measurement, i.e., the Fréchet Inception Distance (FID) [38,39], to evaluate the quality of the generated SAR images produced by the GAN and InfoGAN, respectively. The FID measures the similarity between two sets of images (z_1 and z_2), and it is defined as follows:
\mathrm{FID}(z_1, z_2) = \| \mu_{z_1} - \mu_{z_2} \|_2^2 + \mathrm{Tr}\left[ \Psi_{z_1} + \Psi_{z_2} - 2(\Psi_{z_1} \Psi_{z_2})^{1/2} \right] ,
where ‖·‖_2^2 denotes the squared L_2 norm, μ is the mean of a dataset, Tr denotes the trace of a matrix, and Ψ_{z_1} and Ψ_{z_2} refer to the covariance matrices of z_1 and z_2, respectively. Hence, a small value of the FID means a high similarity between two datasets (FID = 0 only when z_1 is completely the same as z_2). We computed the FID between the SAR images generated by the two GAN models and the training SAR images, as shown in Table 1. It is clear that the images generated by the two GANs are almost equally similar to the training images (slightly favoring the InfoGAN), while only the InfoGAN enables property manipulation using latent codes.
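For reference, the FID in (10) can be computed directly from the feature statistics of the two image sets; in the standard definition, the features are Inception-network embeddings, which are assumed to be precomputed in the minimal sketch below.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(features_1, features_2):
    """FID between two sets of feature vectors (rows are samples), following Eq. (10)."""
    mu1, mu2 = features_1.mean(axis=0), features_2.mean(axis=0)
    cov1 = np.cov(features_1, rowvar=False)
    cov2 = np.cov(features_2, rowvar=False)

    cov_mean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(cov_mean):          # discard tiny imaginary parts from sqrtm
        cov_mean = cov_mean.real

    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cov_mean))
```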

3. Methodology

Next, we will consider SAR images of the target taken with various setups and relate them to the latent codes in the InfoGAN. The aim is to train the InfoGAN to synthesize the available images with various target properties and to produce new ones by changing the latent codes. This process can be controlled by relating the latent codes to the SAR image transformations. Cases with one and two properties will be considered. In the analysis of one property, we will use one or two latent codes, while in the case of two properties, two latent codes are used.

3.1. Property Measurement

When the radar illuminates a target (for example, a vehicle, a ship, or any other object of interest) in two different visits, the SAR images may differ due to different viewing angles, target maneuvering, or different distances between the radar and the target in these two illuminations. The changes in the radar image can be described by a rotation (with possible changes in the reflectivity or visibility of some scatterers in the target). Another possible change in the SAR image results from a change in the distance between the radar and the target and may be described by a scaling of the target in the SAR image (with possible changes in the image structure due to the fusing or separation of close scatterers at the given resolution). This will be referred to as the scaling property. In addition, the relative target position can change between two illuminations, causing shifts in the radar image.
To quantify these properties of radar images, we should introduce their relative measures with respect to one SAR image, assumed to be the reference image. To this aim, we will use the cross-correlation function to evaluate the similarity between two images [40]. Assume X and Y are two images of the same size, N × N . The cross-correlation between these two images, r ( X , Y ) , is defined as
r(X, Y) = \frac{\sum_{i}\sum_{j} (X(i,j) - \bar{X})(Y(i,j) - \bar{Y})}{\sqrt{\sum_{i}\sum_{j} (X(i,j) - \bar{X})^2 \; \sum_{i}\sum_{j} (Y(i,j) - \bar{Y})^2}}
\bar{X} = \frac{1}{N^2}\sum_{i}\sum_{j} X(i,j), \quad \bar{Y} = \frac{1}{N^2}\sum_{i}\sum_{j} Y(i,j)
where X̄ and Ȳ denote the means of the images X and Y, and the denominator normalizes the cross-correlation so that its maximum value is 1. The summation range is from 1 to N for all sums in (11) and (12). It can be observed that r(X, Y) will be 1 if X = Y, and r(X, Y) will assume a value smaller than 1 as X becomes more different from Y.
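A direct implementation of (11) and (12) is straightforward; the following minimal sketch assumes two equally sized image arrays.

```python
import numpy as np

def cross_correlation(X, Y):
    """Normalized cross-correlation of two equally sized images, Eqs. (11) and (12)."""
    Xc = X - X.mean()
    Yc = Y - Y.mean()
    return float((Xc * Yc).sum() / np.sqrt((Xc ** 2).sum() * (Yc ** 2).sum()))
```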
If we want to use the cross-correlation to measure the translation of a target in image I_j with respect to the reference image I_0, we translate the reference image I_0 by different d_x (in steps Δd_x) and d_y (in steps Δd_y), denoted by T_δ{I_0}, and take as the translation parameter the position (d_x, d_y) at which the maximum of the function r(T_δ{I_0}, I_j) is found
\delta_S(j) = \arg\max_{\delta} \{ r(T_{\delta}\{I_0\}, I_j) \} ,
where δ S is, in general, a vector, with corresponding shifts in the direction of the range and cross-range [6].
Similarly, we say that the original image is rotated for δ R when the maximum of the cross-correlation between the reference image, rotated for an angle δ R , and the considered image I j is found, that is
\delta_R(j) = \arg\max_{\delta} \{ r(R_{\delta}\{I_0\}, I_j) \} ,
where now R δ { I 0 } denotes the reference image rotated for an angle δ R ( j ) . The rotated and reference image may differ in reflectivity, meaning that the maximum value of the cross-correlation will not be equal to one. To reduce the influence of the variations in the reflectivity during the rotations, we can introduce a threshold (limiting) or even consider only the support functions (the support function of an image assumes value 0 where the image is 0 or close to 0 and 1 otherwise) of the considered objects. The rotation parameter is then calculated as
\delta_R(j) = \arg\max_{\delta} \{ r(R_{\delta}\{H_T\{I_0\}\}, H_T\{I_j\}) \} ,
where H T { I } denotes the limited version of the image I , with a threshold T, that is
H_T\{I(i,j)\} = \begin{cases} I(i,j), & \text{for } I(i,j) \le T \\ T, & \text{for } I(i,j) > T . \end{cases}
Finally, the scaling property is defined in the same way, as the position of the maximum correlation between the considered image I j and the scaled reference image S δ { I 0 } for a scaling parameter δ , that is
\delta_A(j) = \arg\max_{\delta} \{ r(S_{\delta}\{I_0\}, I_j) \} .
Having introduced measures of the various image transformations, we are now ready to relate them to the latent codes in the InfoGAN. A minimal sketch of this correlation-based property measurement is given below.
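The sketch below illustrates the searches in (13) and (15), reusing the cross_correlation helper defined above; the limit helper implements (16). It relies on the scipy.ndimage transforms, and the function names and the grid of candidate transformations are our own assumptions. The scaling estimator of (17) follows the same pattern with a size-preserving rescaling of the reference image.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def limit(I, T):
    """H_T{I}: limit the image at the threshold T, Eq. (16)."""
    return np.minimum(I, T)

def estimate_rotation(ref, img, angles, T=np.inf):
    """delta_R: angle of the rotated reference that best matches img, Eq. (15)."""
    scores = [cross_correlation(limit(rotate(ref, a, reshape=False), T), limit(img, T))
              for a in angles]
    return angles[int(np.argmax(scores))]

def estimate_translation(ref, img, shifts):
    """delta_S: shift (dy, dx) of the reference that best matches img, Eq. (13)."""
    scores = [cross_correlation(shift(ref, s), img) for s in shifts]
    return shifts[int(np.argmax(scores))]
```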

3.2. Relation of the Properties and Latent Codes

As mentioned above, we have three combinations of property–latent code pairs, i.e., one property—one latent code, one property—two latent codes, and two properties—two latent codes. It is necessary to clarify that the cross-correlation would not be sensitive enough to gauge each individual property if we combined all three properties together. To simplify the issue and avoid entanglement among latent codes, we only consider one and two latent codes. This is the reason why these three combinations are set in our experiments.
One property—one latent code: Next, we assume that the InfoGAN is trained with P real SAR images when one of the considered properties (for example, the relative angle of a target with respect to the radar direction) changes. After the learning process, the InfoGAN is able to synthesize the corresponding SAR images, in an ideal case the same as the real original images, with the latent code c 1 , being related to the property change in the particular SAR images. After the learning process has finished, we generate a new set of K latent code values c 1 = [ c 1 ( 1 ) , c 1 ( 2 ) , , c 1 ( K ) ] T . Then, a set of images is generated using the values c 1 ( k ) , k = 1 , 2 , , K , and random input noises z k . The obtained images are denoted by
I_k = G(z_k, c_1(k)), \quad k = 1, 2, \ldots, K .
Then, we use one of the measures (13), (15), or (17) to calculate the measure of properties for each synthesized SAR image from the set. The relative measure of the rotation with respect to the reference image I 0 is calculated using
\delta_R(1) = \arg\max_{\delta} \{ r(R_{\delta}\{H_T\{I_0\}\}, H_T\{I_1\}) \}
\delta_R(2) = \arg\max_{\delta} \{ r(R_{\delta}\{H_T\{I_0\}\}, H_T\{I_2\}) \}
\vdots
\delta_R(K) = \arg\max_{\delta} \{ r(R_{\delta}\{H_T\{I_0\}\}, H_T\{I_K\}) \}
(a) Linear model: For a rough analysis, we consider a linear model approximating the relation between the obtained measure of rotation and the latent code used to produce the corresponding image
\hat{\delta}_R(k) = v_1 c_1(k) + v_0, \quad k = 1, 2, \ldots, K ,
where v 0 and v 1 are two unknown parameters. To estimate them, we can write a matrix form of these equations
\hat{\boldsymbol{\delta}}_R = \begin{bmatrix} \hat{\delta}_R(1) \\ \hat{\delta}_R(2) \\ \vdots \\ \hat{\delta}_R(K) \end{bmatrix} = \begin{bmatrix} c_1(1) & 1 \\ c_1(2) & 1 \\ \vdots & \vdots \\ c_1(K) & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_0 \end{bmatrix} = \mathbf{A}\mathbf{V} ,
where A is a matrix with a column of latent codes and a column of ones, and V = [v_1, v_0]^T.
Now, we can obtain the optimal parameters v 0 and v 1 by optimizing the following equation:
\mathbf{V} = \arg\min_{\mathbf{V}} \| \boldsymbol{\delta}_R - \hat{\boldsymbol{\delta}}_R \|_2^2
where δ_R represents the column vector of the values obtained from (19) and δ̂_R is given by (21). The solution is
\mathbf{V} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T \boldsymbol{\delta}_R .
After the relation between the considered property (rotation) and latent code is established, we can now use it to calculate a satisfying value of the latent code c 1 to produce a SAR image, I d , for any desired rotation angle δ R d ,
c_1 = \frac{\delta_R^d - v_0}{v_1} ,
as I d = G ( z , c 1 ) .
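A minimal sketch of the least-squares fit (21)–(23) and the inversion (24) is given below; the latent code values and the measured rotations are synthetic stand-ins (not measured data), and the variable names are our own.

```python
import numpy as np

# latent codes used for generation and rotations measured via Eq. (19)
c1 = np.linspace(-1.5, 1.5, 30)                # example latent code values
delta_r = 40.0 * np.tanh(1.2 * c1) + 40.0      # stand-in for the measured angles

# least-squares fit of the linear model (20): delta_r ~ v1 * c1 + v0
A = np.column_stack([c1, np.ones_like(c1)])
v1, v0 = np.linalg.lstsq(A, delta_r, rcond=None)[0]

# latent code for a desired rotation angle, Eq. (24)
delta_r_desired = 45.0
c1_desired = (delta_r_desired - v0) / v1
```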
The linear model is very simple; however, as will be seen from the experiments, it can be used as a rough model only. Namely, the true relation between rotation and latent code is nonlinear, governed by the nonlinearity in the InfoGAN.
(b) Nonlinear model: From the experiments, we concluded that a general sigmoid-type function (following the sigmoid-like activations at the output of the neural network) is quite an appropriate model for the relation between the physical properties of the SAR image and the latent codes. Here, the tanh form of the sigmoid is used. A nonlinear model of, for example, rotation, with one latent code c_1 could be written as follows:
\hat{\delta}_R(k) = v_3 \tanh(v_1 c_1(k) + v_2) + v_0, \quad k = 1, 2, \ldots, K .
The solution to the minimization problem (22) cannot be obtained in an analytic form in this case. However, the tools for the numerical solution of this problem are well developed in all programming environments. Therefore, we may say that the values of V = [v_0, v_1, v_2, v_3]^T can be obtained from the set of K nonlinear equations in (25). After the model coefficients, V, are found, we can again easily find a latent code c_1 to generate a SAR image, I_d, with a desired parameter δ_R^d, as
c_1 = \frac{1}{v_1}\left[ \tanh^{-1}\!\left( \frac{\delta_R^d - v_0}{v_3} \right) - v_2 \right] ,
and then synthesize I_d = G(z, c_1).
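The nonlinear fit (25) and the inversion (26) can be obtained with standard curve-fitting tools; a sketch with the same stand-in data as in the linear example above follows (names and initial guesses are our own assumptions).

```python
import numpy as np
from scipy.optimize import curve_fit

def tanh_model(c, v0, v1, v2, v3):
    """Nonlinear one-code model of Eq. (25)."""
    return v3 * np.tanh(v1 * c + v2) + v0

# latent codes and stand-in measured rotations, as in the linear sketch
c1 = np.linspace(-1.5, 1.5, 30)
delta_r = 40.0 * np.tanh(1.2 * c1) + 40.0

popt, _ = curve_fit(tanh_model, c1, delta_r,
                    p0=[delta_r.mean(), 1.0, 0.0, (delta_r.max() - delta_r.min()) / 2.0])
v0, v1, v2, v3 = popt

# latent code for a desired rotation angle, Eq. (26)
delta_r_desired = 45.0
c1_desired = (np.arctanh((delta_r_desired - v0) / v3) - v2) / v1
```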
One property—two latent codes: In SAR images, after the basic property change, we can expect other changes to occur as well (such as changes in the reflectivity and visibility of scatterers). This means that, even with one geometric property change, we may still use more than one latent code. Now, we extend the analysis to two latent codes, c_1 and c_2. The linear model for a two-latent-code space can be expressed as
\hat{\delta}_R(k_1, k_2) = v_2 c_2(k_2) + v_1 c_1(k_1) + v_0, \quad k_1, k_2 = 1, 2, \ldots, K .
If we form a stacked column vector δ̂_R with K² elements δ̂_R(k_1, k_2), a K² × 3 matrix A with rows [c_2(k_2), c_1(k_1), 1], and the column vector of unknown coefficients V = [v_2, v_1, v_0]^T, then the solution is again obtained in the form V = (A^T A)^{-1} A^T δ_R.
In this case, the latent code values for a given property, for example, rotation δ R d , are not unique since all combinations of the latent codes are along the line
v_2 c_2 + v_1 c_1 = \delta_R^d - v_0
in the c 1 - c 2 plane, which will produce the same desired rotation δ R d . The desired rotation can be obtained by fixing one latent code, c 1 or c 2 , and calculating the other latent code value.
For two latent codes, the nonlinear model is of the form
\hat{\delta}_R(k_1, k_2) = v_4 \tanh(v_1 c_1(k_1) + v_2 c_2(k_2) + v_3) + v_0, \quad k_1, k_2 = 1, 2, \ldots, K .
The optimization of parameters v 4 , v 3 , v 2 , v 1 , and v 0 is conducted using common nonlinear fitting tools. The line for a desired δ R d is obtained in the form
v_1 c_1 + v_2 c_2 = \tanh^{-1}\!\left( \frac{\delta_R^d - v_0}{v_4} \right) - v_3 .
Again, a desired δ R d can be achieved with all pairs of ( c 1 , c 2 ) on the previous line.
In the nonlinear model, we further introduce a quadratic term in the argument of the tanh function as
\delta_R(k_1, k_2) = v_7 \tanh\!\left( P_R(c_1(k_1), c_2(k_2)) \right) + v_0, \quad k_1, k_2 = 1, 2, \ldots, K ,
where P_R(c_1(k_1), c_2(k_2)) = v_1 c_1^2(k_1) + v_2 c_2^2(k_2) + v_3 c_1(k_1) c_2(k_2) + v_4 c_1(k_1) + v_5 c_2(k_2) + v_6, for k_1, k_2 = 1, 2, …, K. For a desired δ_R^d, (c_1, c_2) should satisfy the following relation
P_R(c_1, c_2) = \tanh^{-1}\!\left( \frac{\delta_R^d - v_0}{v_7} \right) ,
meaning that all combinations of the latent codes lie along a quadratic-form curve. Namely, (31) is a general quadratic equation, producing conic sections (circles, ellipses, parabolas, and hyperbolas) in the c_1–c_2 plane, depending on the specific values of the parameters v_0, v_1, v_2, …, v_7.
Two properties—two latent codes: For a simultaneous change in two properties, we will use two codes and a nonlinear model. In the nonlinear model, we will use a linear argument form of the tanh function and a quadratic argument of this function. In the case of the linear argument, we will use the model
\delta_R(k_1, k_2) = v_4 \tanh(v_1 c_1(k_1) + v_2 c_2(k_2) + v_3) + v_0 ,
\delta_S(k_1, k_2) = v_9 \tanh(v_6 c_1(k_1) + v_7 c_2(k_2) + v_8) + v_5 , \quad k_1, k_2 = 1, 2, \ldots, K .
The quadratic argument model is of the form
\delta_R(k_1, k_2) = v_7 \tanh\!\left( P_R(c_1(k_1), c_2(k_2)) \right) + v_0 ,
\delta_S(k_1, k_2) = v_{15} \tanh\!\left( P_S(c_1(k_1), c_2(k_2)) \right) + v_8 , \quad k_1, k_2 = 1, 2, \ldots, K ,
where the polynomial arguments for the two properties are defined by
P_R(c_1(k_1), c_2(k_2)) = v_1 c_1^2(k_1) + v_2 c_2^2(k_2) + v_3 c_1(k_1) c_2(k_2) + v_4 c_1(k_1) + v_5 c_2(k_2) + v_6 ,
P_S(c_1(k_1), c_2(k_2)) = v_9 c_1^2(k_1) + v_{10} c_2^2(k_2) + v_{11} c_1(k_1) c_2(k_2) + v_{12} c_1(k_1) + v_{13} c_2(k_2) + v_{14} ,
for k 1 , k 2 = 1 , 2 , , K . These two systems are independently solved for the corresponding sets of coefficients in the model.
In this case, the desired SAR image is generated at the intersection of the lines producing the desired rotation, δ R d , and scaling, δ S d , since for each of them we get the corresponding lines as in (29) and (31).
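A sketch of this intersection search is given below: the two quadratic-argument equations are solved numerically for the pair (c_1, c_2) that simultaneously produces a desired rotation and scaling. The coefficient values are stand-ins for fitted model parameters, and the function and variable names are our own.

```python
import numpy as np
from scipy.optimize import fsolve

def quad_arg(c1, c2, w):
    """Quadratic argument of the tanh, Eqs. (35)-(36); w holds the six coefficients."""
    return (w[0] * c1**2 + w[1] * c2**2 + w[2] * c1 * c2
            + w[3] * c1 + w[4] * c2 + w[5])

# stand-in fitted coefficients (in practice obtained from the nonlinear fits)
rot = {"scale": 40.0, "offset": 40.0, "w": [0.1, -0.2, 0.3, 1.0, 0.5, 0.0]}
scl = {"scale": 0.8,  "offset": 1.2,  "w": [-0.1, 0.2, 0.1, 0.4, 1.0, 0.0]}

def residuals(c, rot_d, scl_d):
    """Mismatch between the modeled and desired rotation/scaling for latent codes c."""
    c1, c2 = c
    return [rot["scale"] * np.tanh(quad_arg(c1, c2, rot["w"])) + rot["offset"] - rot_d,
            scl["scale"] * np.tanh(quad_arg(c1, c2, scl["w"])) + scl["offset"] - scl_d]

# latent codes producing, e.g., a rotation of 45 degrees and a scaling of 1.3
c1_d, c2_d = fsolve(residuals, x0=[0.0, 0.0], args=(45.0, 1.3))
```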
All the previous setups will be illustrated and explained in detail in the next section dealing with the experimental results. It is worth noting that motion error is a key problem in the practical application of SAR image formation [41]. Specifically, SAR images will be unfocused or blurred if there are motion errors. Here, we clarify that the relation between motion error and latent codes is beyond the scope of this paper because (1) motion error is a complex issue whose physical behavior could be too difficult for one or two latent codes to capture; and (2) motion error is also difficult to gauge numerically, while the objective of this paper is to provide a numerical interpretation of the relation between properties (i.e., properties that can be gauged numerically and easily) and latent codes. Nonetheless, it is still an important issue worth studying in the future and could become feasible to interpret by introducing more sophisticated estimators and regularization.

4. Experiments

Dataset: In our experiments, four kinds of datasets are utilized as shown in Figure 4 and Table 2:
  • Simulated SAR images: This dataset contains SAR images produced by a simulation model, retaining the scattering characteristics with rotation, translation, and scaling.
  • Semi-simulated SAR images: In this dataset, real images are manually rotated, translated, and scaled; thus, it is termed semi-simulated SAR images. It is worth noting that the purpose of this dataset, which does not retain scattering characteristics, is to demonstrate the validity of our method in a clear and intuitive manner. The conclusions are also applicable to other datasets.
  • Real SAR images without background: This dataset consists of SAR images from MSTAR, a popular open-access dataset of SAR images. The background of each SAR image is removed by self-matching CAM.
  • Real SAR images with background: This dataset is the same as the above except that the background is retained.
The above four datasets have their specific purposes in the following experiments. The simulated data are used to comprehensively demonstrate the validity of the numerical relation computed by the property estimators because the properties of these images are known a priori. The semi-simulated data provide images of real objects with precisely defined properties. The third and fourth datasets are used to test the performance of the property estimators on realistic SAR images without property annotations (the estimation of the ground-truth properties is described below).
InfoGAN architecture: The generator G contains one fully connected layer and four transposed convolutional layers. The input z to the generator is a one-dimensional vector concatenating pure noise and latent codes, of length N_z (N_z = N_N + N_C), where N_N and N_C denote the lengths of the noise and latent codes, respectively. Unless otherwise specified, N_z = 62 in this paper. N_C equals the number of classes and latent codes. The discriminator D contains four convolutional layers and one fully connected layer. The classifier Q contains four convolutional layers and two fully connected layers. D and Q share the parameters of all convolutional layers. In our experiments, there are at most two latent codes; thus, two single neurons are set in the output layer of Q. Table 3 and Table 4 show the details of G, D, and Q, respectively. To avoid modifying the InfoGAN’s architecture, we assign a zero weight to the loss term of the second latent code when only one latent code is required.
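For illustration, a sketch of this structure for 28 × 28 single-channel images is given below. The channel widths, kernel sizes, and activations are our own assumptions and not necessarily the values listed in Table 3 and Table 4; only the layer counts and the shared convolutional trunk follow the description above.

```python
import torch
import torch.nn as nn

NOISE_DIM, CODE_DIM = 62, 2          # assumed lengths of the noise and latent codes

class Generator(nn.Module):
    """One fully connected layer followed by four transposed convolutions (28x28 output)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(NOISE_DIM + CODE_DIM, 128 * 7 * 7), nn.ReLU())
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 7 -> 14
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 14 -> 28
            nn.ConvTranspose2d(32, 16, 3, stride=1, padding=1), nn.ReLU(),    # 28 -> 28
            nn.ConvTranspose2d(16, 1, 3, stride=1, padding=1), nn.Sigmoid())  # 28 -> 28

    def forward(self, z_and_c):
        return self.deconv(self.fc(z_and_c).view(-1, 128, 7, 7))

class SharedTrunk(nn.Module):
    """Four convolutional layers shared by the discriminator D and the classifier Q."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.1),     # 28 -> 14
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),     # 14 -> 7
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1))    # 7 -> 3

    def forward(self, x):
        return self.conv(x).flatten(1)

trunk = SharedTrunk()
D_head = nn.Sequential(nn.Linear(128 * 3 * 3, 1), nn.Sigmoid())              # real/fake probability
Q_head = nn.Sequential(nn.Linear(128 * 3 * 3, 128), nn.LeakyReLU(0.1),
                       nn.Linear(128, CODE_DIM))                             # predicted latent codes

def netD(x): return D_head(trunk(x))
def netQ(x): return Q_head(trunk(x))
```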
In the following experiments, the simulated images are of size 28 × 28 pixels, while the real data images are downsampled to this size. The learning process for the InfoGAN lasted about 10 min over 10,000 iterations on a laptop computer with a 3.2 GHz CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 3070 GPU. Larger images can be processed in the same way with some increase in the computation time. It should be pointed out that 10,000 iterations were chosen from an empirical observation of the generated SAR images during the training process, as shown in Figure 5. It shows that, in the early stages of training (50 and 500 iterations), the generated images are quite rough, even in the basic shape of the object. When the number of iterations reaches 5000, some details are captured but are still not perfect. At 10,000 iterations, the details are further completed; thus, we chose 10,000 as the number of iterations. It is worth pointing out that overfitting is an important and challenging problem in GAN training, whereas there are few trustworthy and robust overfitting-checking algorithms. Generally, a GAN is recognized as acceptable when the generator can produce visually satisfactory images and the discriminator is not completely fooled by the generator. The InfoGANs in the following experiments are also checked in this manner.

4.1. Simulated SAR Images

The SAR images of a ship are simulated in this experiment. The radar operating frequency is f_0 = 157 GHz, T_r = 93.75 μs, with 28 pulses and 28 range cells inside a pulse. The target is illuminated from different angles (or the target is rotated) with an angle from 10° to 70° with respect to the line of flight. For the first experiment, only the rotation is considered since it is the most complex property for simulated SAR images, as discussed in Section 2.
The InfoGAN described above (Table 3 and Table 4) was trained with only one latent code, c_1, activated. To begin with, only 13 training images (5° step) were used to train the InfoGAN. The reason for setting 13 is to demonstrate that the InfoGAN’s continuous latent code can capture the trend of how properties change with a limited number of training images. In fact, we started from thousands of training samples and succeeded in manipulating the properties. Then, we gradually reduced the number of training images to seek the minimum number needed to obtain acceptable results. Finally, we found that about 13 is basically enough for this rotation range in this dataset. Using a small number of images to train the InfoGAN increases the practical value of this method. Therefore, we first set only a few samples (13 SAR images) for training to show that the InfoGAN can learn the relation between properties and latent codes from a limited number of training samples. After the InfoGAN was trained, we tested various values of c_1 and generated new SAR images. The resulting images covered almost the whole rotation angle range. This means that some rotation angles not appearing in training can be synthesized by manipulating the latent code c_1, with examples shown in Figure 6.
For a detailed analysis of the relation between the rotation angle, δ_R, and the latent code, c_1, the number of training images was increased to 121 within the same range from 10° to 70° with respect to the line of flight.
After the InfoGAN was trained, we generated a set of images corresponding to various values of the latent code, c_1(1), …, c_1(K), K = 30, uniformly sampled from the interval [−1.5, 1.5]. After the SAR images were synthesized using these latent code values, the rotation angles, δ_R(k), k = 1, 2, …, K, were measured for the obtained SAR images with each latent code, using (19), and the parameters V of the linear and nonlinear models were calculated by Equation (23) or by solving the system (25), respectively. The linear model solution is shown in Figure 7 (top-left) with a green line, while the measured angles δ_R(k) are given by dots. This panel shows that the rotation angle changes in an approximately linear way with respect to c_1. A direct comparison of the measured angle, δ_R(k), and the angle estimated by the linear model, δ̂_R(k), is shown in Figure 7 (bottom-left). The procedure was repeated with the nonlinear model (25), and the corresponding results are shown in Figure 7 (top-right) and Figure 7 (bottom-right). It is clear that the nonlinear model performs better than the linear model, which will be even more evident in the next experiments.
Finally, the model was tested with four desired rotation angles, δ_R^d = 21.67°, 33.33°, 45.33°, and 56.67°. The latent code values, c_1, for these rotations were calculated using (26). Then, the InfoGAN produced the synthesized SAR images shown in Figure 7 (bottom row). The estimated rotations δ_R(k) were obtained from (19). They are within a few degrees of the desired values.

4.2. Real Object from a SAR Image with Simulated Properties

After the simulated SAR examples and before a real data example, as an intermediate step, we consider a SAR image from the real dataset MSTAR [42] (a popular public SAR image dataset, which will be elaborated on in the next subsection); however, to fully control the transformations, we produce new images by rotating, scaling, and shifting the assumed real SAR image. Unless otherwise specified, the background in each SAR image has been removed before all experiments by using self-matching CAM [43,44]. Recall that the geometrical transformations will, in general, be referred to as the properties. As in Section 3, we set three cases for the considered images and the InfoGAN: (1) one property—one latent code; (2) one property—two latent codes; (3) two properties—two latent codes. Here, we particularly clarify that this kind of manual rotation/translation/scaling differs from the scattering behavior in real scenarios. The purpose of this toy dataset is to show how latent codes affect geometric properties in a clear and intuitive manner. The results for real properties in real SAR images are analyzed in the next subsection.

4.2.1. One Property—One Latent Code

All three properties were considered separately: for rotation, a real SAR image was analytically rotated from −30 to 30 degrees to obtain 601 images; for translation, the target in a real image was translated from −6 to 6 pixels from the original position to obtain 151 images; for scaling, the target in a real image was scaled from 0.5 to 2 times the original size to obtain 301 images. After the InfoGAN was trained independently with the three datasets (in three separate experiments), we synthesized new images corresponding to various values of the latent code, c_1(1), …, c_1(K), K = 30, uniformly sampled from the interval [−1.0, 1.0] for each property. Then, the properties δ_R, δ_S, and δ_A can be measured by (19), and the estimated properties δ̂_R, δ̂_S, and δ̂_A can be calculated using (20) and (25). The comparison of the measured and estimated properties shows that the nonlinear estimator performs better than the linear estimator in all cases, especially for rotation (top-right) and scaling (bottom-right), as shown in Figure 8. For each case, we synthesized SAR images for four desired δ_R^d, δ_S^d, and δ_A^d, respectively, using c_1 calculated by (26). The estimated properties of the synthesized SAR images, δ_R, δ_S, and δ_A, were measured by (19). We can see that the agreement is good in all considered cases.

4.2.2. One Property—Two Latent Codes

Now, we introduce two latent codes c 1 and c 2 to train the InfoGAN with input images exhibiting one-property variations in order to check whether one property will remain within one latent code or will propagate to the other latent code as well. We use completely the same data as in Section 4.2.1, i.e., the only difference is that two latent codes are considered here. Taking rotation, for instance, we have generated 900 images with δ R ( k 1 , k 2 ) , k 1 , k 2 = 1 , 2 , , 30 , from the InfoGAN trained with both c 1 and c 2 activated. Figure 9 reveals that the value of a specific property is spread over the available latent codes and therefore is determined by multiple pairs of c 1 and c 2 , because the solution to (31) is not unique, as discussed in Section 3.
To show this relation vividly, we generated several SAR images using selected values of c_1 and c_2, as shown in Figure 9 (bottom-right). In this panel, consisting of 3 × 3 images, the first and second images in the top row have different c_1 and c_2 but both result in the same δ_R = 20°. In comparison, the third image in the top row shows δ_R = 25° with c_1 = 0.5 and c_2 = 0.0. This comparison further demonstrates that the solution to (27) is not unique. This conclusion is also applicable to δ_S and δ_A, as shown in the second and third rows; thus, it is feasible to retain or change any property by manipulating c_1 and c_2. Finally, the properties measured by (19) and the estimated properties using (30) are compared in Figure 10 to validate the performance of the estimator (only the nonlinear model is considered because the relation between one property and two latent codes is obviously much more complex than the linear model). The results show that δ̂_R, δ̂_S, and δ̂_A, calculated by (30), basically match δ_R, δ_S, and δ_A, respectively, even though the accuracy is slightly lower than in Figure 9.

4.2.3. Two Properties—Two Latent Codes

In this experiment, we consider two entangled properties emerging in each training SAR image simultaneously. Firstly, we generate three combinations of training data: rotation–translation, rotation–scaling, and translation–scaling. For rotation–translation, there are 3721 training images with 61 rotation angles uniformly dividing [−60°, 60°] and 61 translation pixels uniformly dividing [−6, 6]. For rotation–scaling, there are 1891 training images with 31 scaling factors uniformly dividing [0.5, 2] and 61 rotation angles uniformly dividing [−60°, 60°]. For translation–scaling, there are 3751 training images with 121 translation pixels uniformly dividing [−6, 6] and 31 scaling factors uniformly dividing [0.5, 2]. We generated 900 images for each property combination using different combinations of c_1 and c_2 and show their relation in Figure 11 and Figure 12. Next, we conduct an experiment to visualize how to edit the entangled properties by manipulating c_1 and c_2. In each case, we select 9 combinations of c_1 and c_2 at the intersections of two contour lines (green dots in the bottom-left of Figure 11 and Figure 12). The SAR images synthesized using these (c_1, c_2), shown in the bottom-right, demonstrate that if c_1 and c_2 move along one curve, only the property corresponding to this curve changes while the other property remains unchanged. Furthermore, given two desired properties, for example, δ_R^d and δ_S^d, the satisfying combination of c_1 and c_2 is unique in a certain range (the green dots). Thus, it is feasible to precisely edit either a single property or two properties simultaneously by manipulating c_1 and c_2 as we expect.

4.3. Real SAR Images with Suppressed Background

The real measured dataset is the MSTAR dataset with SAR images of stationary ground targets, released by the MSTAR program supported by the Defense Advanced Research Projects Agency (DARPA) of the United States [42]. The MSTAR dataset includes 2536 SAR images for training and 2636 for testing, with 10 classes of vehicles. We chose 60 images of the 2S1 (self-propelled artillery) with rotation angles (with respect to a reference SAR image) within [34, 44]. The images are downsampled to a size of 28 × 28 pixels.
After the InfoGAN was trained with only c 1 activated, the same experiments as for simulated SAR images were conducted, as shown in Figure 13. We can see that the latent code c 1 , after the training process, is associated with the SAR image rotation. The modeling of the rotation angle and the latent code was performed using the linear and nonlinear model (Figure 13, top row). While the linear model is simple, the nonlinear model fits the data better. Finally, the model was used to synthesize new SAR images for a given desired rotation angle, δ R d . The obtained images are shown in the bottom row of Figure 13 for four desired angles. The estimated rotation angles, δ ^ R , of the SAR images synthesized with c 1 calculated by (26), are given in this panel as well, and we can see that they are close to the desired ones, δ R d .

4.4. SAR Images with Background

Furthermore, we conducted the same experiments with real SAR images but without removing the background, and the results are similar to the previous experiment, as shown in Figure 13, where the measured and modeled rotation angles are shown (with respect to the reference SAR image). Four synthesized SAR images with the desired rotation, controlled by the latent code values, are given in Figure 13 (bottom). The experiment with the included background was repeated with two latent codes in the InfoGAN. Some synthesized SAR images are shown in Figure 14. As can be seen from this figure, the latent code c_1 controls the rotation, while the latent code c_2, in this case, takes control of the background intensity. Thus, if we want to obtain images with suppressed backgrounds, we can use high values of c_2.

4.5. Robustness and Generalization Analysis on Other SAR Datasets

Here, we introduce another dataset, AIR-SARShip-1.0 (released by the Chinese Academy of Sciences and the University of Chinese Academy of Sciences), to further demonstrate the robustness and generalization of the proposed method. AIR-SARShip-1.0 comprises 31 Gaofen-3 satellite SAR images, including harbors, islands, reefs, and the sea surface in different conditions. The backgrounds include various scenarios, such as the nearshore and open sea. We selected the SAR image indexed as 05_8_21 from AIR-SARShip-1.0 and cropped a slice of a ship target as a baseline image, shown in Figure 15. Then, we manually imposed the three properties on the baseline image, as in Section 4. Specifically, there are 30 images uniformly dividing the rotation range (from 1° to 30°), 15 images uniformly dividing the translation range (from 1 to 15 pixels), and 15 images uniformly dividing the scaling range (from 1 to 1.8), as shown in Figure 15.
Next, we implemented the same experiments on these three groups of SAR images to interpret the relation between each property and the latent code, c_1. Figure 16 presents the synthesized SAR images and the estimated values of the properties. A conclusion similar to that of the previous experiments can be drawn, which further proves the robustness and generalization of our method on different datasets.

5. Discussion

The experiments were carried out with four datasets: simulated images, real objects from SAR images with simulated properties, SAR images with suppressed backgrounds, and SAR images with backgrounds. In the first experimental setup, the results demonstrate that the relation between a single latent code and one property matches a sigmoid function. In the second case, the results show that quadratic terms in the argument are required to capture the more complex relations when two latent codes are considered. The third and fourth experimental setups further demonstrate that this conclusion is applicable to real SAR images. Therefore, it is possible to synthesize SAR images with these properties by manipulating latent codes according to the relation interpreted by our proposed method.

6. Conclusions

This paper sheds some light on interpreting the relation between different properties of SAR images and latent codes in the InfoGAN. In general, the unclear relation between properties and latent codes is modeled in a numerical manner by proposing property estimators. Specifically, the trend of how properties vary with latent codes can be measured mathematically, i.e., the corresponding property can be computed for a specific combination of latent codes, and the latent codes can also be computed given some desired properties. In this case, it is feasible to produce a large number of photo-realistic SAR images with numerically controlled properties by manipulating the latent codes in the InfoGAN, which could alleviate the shortage of data for deep learning techniques with SAR images.

Author Contributions

Conceptualization, Z.F.; methodology, Z.F. and M.D.; software, Z.F., M.D. and L.S.; validation, M.Z.; resources, H.J. and X.Z.; writing—original draft preparation, Z.F.; visualization, Z.F., X.Z. and X.C.; supervision, M.Z., H.J. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 62276204 and 61871301. The APC was funded by Xianda Zhou.

Data Availability Statement

The simulated data of the ship model were provided by the University of Montenegro, which is not open access. The MSTAR data (latest version) are an open-access SAR image set, which can be downloaded from the website https://www.sdms.afrl.af.mil/index.php?collection=mstar (accessed on 3 August 2022). The AIR-SARShip-1.0 dataset is an open-access dataset, which can be downloaded at http://radars.ie.ac.cn/web/data/getData?dataType=SARDataset (accessed on 1 December 2019).

Acknowledgments

The authors are thankful to the editors and reviewers for their help in improving the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ender, J.; Amin, M.G.; Fornaro, G.; Rosen, P.A. Recent Advances in Radar Imaging. IEEE Signal Process. Mag. 2014, 31, 15. [Google Scholar] [CrossRef]
  2. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A Tutorial on Synthetic Aperture Radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43. [Google Scholar] [CrossRef] [Green Version]
  3. Song, L.; Bai, B.; Li, X.; Niu, G.; Liu, Y.; Zhao, L. Space-Time Varying Plasma Sheath Effect on Hypersonic Vehicle-borne SAR Imaging. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 4527–4539. [Google Scholar] [CrossRef]
  4. Ge, B.; An, D.; Chen, L.; Wang, W.; Feng, D.; Zhou, Z. Ground Moving Target Detection and Trajectory Reconstruction Methods for Multi-Channel Airborne Circular SAR. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 2900–2915. [Google Scholar] [CrossRef]
  5. Berizzi, F.; Martorella, M.; Giusti, E. Radar Imaging for Maritime Observation; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  6. Popović, V.; Djurović, I.; Stanković, L.; Thayaparan, T.; Daković, M. Autofocusing of SAR Images Based on Parameters Estimated from the PHAF. Signal Process. 2010, 90, 1382–1391. [Google Scholar] [CrossRef]
  7. Franceschetti, G.; Guida, R.; Iodice, A.; Riccio, D.; Ruello, G. Efficient Simulation of Hybrid Stripmap/Spotlight SAR Raw Signals from Extended Scenes. IEEE Trans. Geosci. Remote Sens. 2004, 42, 2385–2396. [Google Scholar] [CrossRef]
  8. Ding, B.; Wen, G.; Huang, X.; Ma, C.; Yang, X. Data Augmentation by Multilevel Reconstruction Using Attributed Scattering Center for SAR Target Recognition. IEEE Geosci. Remote Sens. Lett. 2017, 14, 979–983. [Google Scholar] [CrossRef]
  9. Diederik, P.; Kingma, M.W. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
10. Qian, D.; Cheung, W.K. Learning Hierarchical Variational Autoencoders With Mutual Information Maximization for Autoregressive Sequence Modeling. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1949–1962.
11. Jin, F.; Sengupta, A.; Cao, S. mmFall: Fall Detection Using 4-D mmWave Radar and a Hybrid Variational RNN AutoEncoder. IEEE Trans. Autom. Sci. Eng. 2022, 19, 1245–1257.
12. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 63, 139–144.
13. Doi, K.; Sakurada, K.; Onishi, M.; Iwasaki, A. GAN-Based SAR-to-Optical Image Translation with Region Information. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 2069–2072.
14. Du, S.; Hong, J.; Wang, Y.; Qi, Y. A High-Quality Multicategory SAR Images Generation Method With Multiconstraint GAN for ATR. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
15. Liu, Q.; Zhou, H.; Xu, Q.; Liu, X.; Wang, Y. PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10227–10242.
16. Xie, W.; Cui, Y.; Li, Y.; Lei, J.; Du, Q.; Li, J. HPGAN: Hyperspectral Pansharpening Using 3-D Generative Adversarial Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 463–477.
17. Nichol, A.; Dhariwal, P.; Ramesh, A.; Shyam, P.; Mishkin, P.; McGrew, B.; Sutskever, I.; Chen, M. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arXiv 2021, arXiv:2112.10741v3.
18. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125.
19. Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.; Ghasemipour, S.K.S.; Ayan, B.K.; Mahdavi, S.S.; Lopes, R.G.; et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv 2022, arXiv:2205.11487.
20. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695.
21. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
22. Pan, Z.; Yu, W.; Yi, X.; Khan, A.; Yuan, F.; Zheng, Y. Recent Progress on Generative Adversarial Networks (GANs): A Survey. IEEE Access 2019, 7, 36322–36333.
23. Yang, C.; Shen, Y.; Zhou, B. Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis. Int. J. Comput. Vis. 2021, 129, 1451–1466.
24. Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; p. 29.
25. Schwegmann, C.P.; Kleynhans, W.; Salmon, B.P.; Mdakane, L.W.; Meyer, R.G. Synthetic Aperture Radar Ship Discrimination, Generation and Latent Variable Extraction Using Information Maximizing Generative Adversarial Networks. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2263–2266.
26. Martorella, M.; Giusti, E.; Demi, L.; Zhou, Z.; Cacciamano, A.; Berizzi, F.; Bates, B. Target Recognition by Means of Polarimetric ISAR Images. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 225–239.
27. Wu, Q.; Zhang, Y.D.; Amin, M.G.; Himed, B. High-resolution Passive SAR Imaging Exploiting Structured Bayesian Compressive Sensing. IEEE J. Sel. Top. Signal Process. 2015, 9, 1484–1497.
28. Papson, S.; Narayanan, R.M. Classification via the Shadow Region in SAR Imagery. IEEE Trans. Aerosp. Electron. Syst. 2012, 48, 969–980.
29. Stanković, L.; Brajović, M.; Stanković, I.; Ioana, C.; Daković, M. Reconstruction Error in Nonuniformly Sampled Approximately Sparse Signals. IEEE Geosci. Remote Sens. Lett. 2021, 18, 28–32.
30. Stanković, L. ISAR Image Analysis and Recovery with Unavailable or Heavily Corrupted Data. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 2093–2106.
31. Brisken, S.; Martorella, M.; Mathy, T.; Wasserzier, C.; Worms, J.G.; Ender, J.H. Motion Estimation and Imaging with a Multistatic ISAR System. IEEE Trans. Aerosp. Electron. Syst. 2014, 50, 1701–1714.
32. Arnous, F.I.; Narayanan, R.M.; Li, B.C. Application of Multidomain Data Fusion, Machine Learning and Feature Learning Paradigms Towards Enhanced Image-based SAR Class Vehicle Recognition. In Proceedings of the Radar Sensor Technology XXV, International Society for Optics and Photonics, Online, 12–17 April 2021; Volume 11742, p. 1174209.
33. Franceschetti, G.; Schirinzi, G. A SAR Processor Based on Two-dimensional FFT Codes. IEEE Trans. Aerosp. Electron. Syst. 1990, 26, 356–366.
34. Zhang, S.; Pavel, M.S.R.; Zhang, Y.D. Crossterm-free Time-frequency Representation Exploiting Deep Convolutional Neural Network. Signal Process. 2022, 192, 108372.
35. Belloni, C.; Balleri, A.; Aouf, N.; Le Caillec, J.M.; Merlet, T. Explainability of Deep SAR ATR Through Feature Analysis. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 659–673.
36. Fahimi, F.; Dosen, S.; Ang, K.K.; Mrachacz-Kersting, N.; Guan, C. Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface. IEEE Trans. Neural Networks Learn. Syst. 2021, 32, 4039–4051.
37. Song, R.; Huang, Y.; Xu, K.; Ye, X.; Li, C.; Chen, X. Electromagnetic Inverse Scattering With Perceptual Generative Adversarial Networks. IEEE Trans. Comput. Imaging 2021, 7, 689–699.
38. O’Reilly, J.A.; Asadi, F. Pre-trained vs. Random Weights for Calculating Fréchet Inception Distance in Medical Imaging. In Proceedings of the 2021 13th Biomedical Engineering International Conference (BMEiCON), Ayutthaya, Thailand, 19–21 November 2021; pp. 1–4.
39. Sekar, A.; Perumal, V. CFC-GAN: Forecasting Road Surface Crack Using Forecasted Crack Generative Adversarial Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21378–21391.
40. Chen, S.J.; Shen, H.L. Multispectral Image Out-of-Focus Deblurring Using Interchannel Correlation. IEEE Trans. Image Process. 2015, 24, 4433–4445.
41. Pu, W. SAE-Net: A Deep Neural Network for SAR Autofocus. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
42. The Sensor Data Management System, MSTAR Database. Available online: https://www.sdms.afrl.af.mil/index.php?collection=mstar (accessed on 3 August 2022).
43. Feng, Z.; Zhu, M.; Stanković, L.; Ji, H. Self-matching CAM: A Novel Accurate Visual Explanation of CNNs for SAR Image Interpretation. Remote Sens. 2021, 13, 1772.
44. Feng, Z.; Ji, H.; Stanković, L.; Fan, J.; Zhu, M. SC-SM CAM: An Efficient Visual Interpretation of CNN for SAR Images Target Recognition. Remote Sens. 2021, 13, 4139.
Figure 1. Synthetic aperture radar setup with various relative positions of the radar and the target. The mechanism of SAR imaging (left). The emergence of scaling of the target in a SAR image (middle). The emergence of rotation and translation of the target in a SAR image (right).
Figure 2. The architecture of GAN and InfoGAN. The basic GAN is obtained by excluding the red blocks and latent codes c.
Figure 3. The comparison of generated SAR images between GAN and InfoGAN. The rotation angles are not controllable in GAN (left). The rotation angles are highly related to the latent code c_1 (right).
Figure 4. Illustration of SAR image samples from the four datasets considered in the experimental setup: simulated SAR images with different viewing angles (top row); a radar image from the MSTAR dataset, with suppressed background, rotated by various angles (second row); SAR images from the MSTAR dataset corresponding to different viewing angles of the same target, with suppressed background (third row); SAR images from the MSTAR dataset corresponding to different viewing angles with a background (bottom row).
Figure 5. Some images generated by InfoGAN with different numbers of iterations in the training process: 50 iterations (first); 500 iterations (second); 5000 iterations (third); 10,000 iterations (fourth).
Figure 6. Real and synthesized SAR images for various rotation angles. The first, fourth, and seventh images (marked by red squares) are SAR images used for the training of the InfoGAN, while the second, third, fifth, and sixth images are the SAR images synthesized by the InfoGAN with the latent code values c_1 = 0.8, 0.6, 0.3, 0.5, respectively.
Figure 7. The results for the estimated and modeled rotation angle for the SAR images synthesized by the InfoGAN trained with simulated SAR images. The rotation angles in SAR images as a function of the latent code c_1, measured by cross-correlation (black dots), and the values estimated with a linear model (green line) (top-left). The rotation angles in SAR images as a function of the latent code c_1, measured by cross-correlation (black dots), and the values estimated with a nonlinear model (yellow line) (top-right). Comparison of the angle values measured by cross-correlation with the ones obtained using the linear model (blue dots), where the red line denotes the ideal case δ̂_R(k) = δ_R(k) for all k (middle-left). Comparison of the measured angle values with the ones obtained using the nonlinear model (blue dots) (middle-right). The synthesized SAR images using c_1 calculated by (26) for four desired rotation angles, δ_R^d = 21.67, 33.33, 45.33, 56.67 (bottom row). The estimated rotations of the synthesized SAR images, δ_R(k), are calculated using (19); they are close to the desired ones.
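To make the workflow summarized in Figure 7 concrete, the sketch below fits a property estimator from measured (c_1, rotation) pairs and then inverts it for a desired angle. This is only an illustration: the cubic polynomial, the helper names, and the toy measurements are assumptions, and the actual estimator and its inverse are given by Equations (19) and (26) in the main text.

```python
# Illustrative fit-then-invert sketch (not the paper's exact Equations (19)/(26)).
# The (c1, rotation) pairs would come from cross-correlation against a
# reference image; here toy data stand in for those measurements.
import numpy as np
from scipy.optimize import brentq

def fit_estimator(c1_values, measured_rotation, degree=3):
    """Least-squares polynomial model delta_R = f(c1); the degree is an assumption."""
    return np.polynomial.Polynomial.fit(c1_values, measured_rotation, degree)

def invert_estimator(model, desired_rotation, lo=-1.0, hi=1.0):
    """Find a latent code c1 in [lo, hi] that produces the desired rotation."""
    return brentq(lambda c: model(c) - desired_rotation, lo, hi)

# Toy example (for illustration only).
c1 = np.linspace(-1.0, 1.0, 21)
rot = 40.0 * c1 + 5.0 * c1**3          # pretend these were measured by cross-correlation
model = fit_estimator(c1, rot)
c1_needed = invert_estimator(model, desired_rotation=21.67)
```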
Figure 8. The results for the measured and modeled rotation (top), translation (middle), and scaling (bottom) for the SAR images synthesized by the InfoGAN trained with the second dataset. In each case, we show the relation between c_1 and the considered property (dots), approximations using linear (green line in left subplots) and nonlinear models (yellow line in right subplots), and synthesized SAR images using c_1 calculated by (26) for four desired δ_R^d, δ_S^d, and δ_A^d. The estimated properties of the synthesized SAR images, δ_R, δ_S, and δ_A, are measured by (19). They are close to the desired ones.
Figure 9. The relation between each property and two latent codes. The relation between rotation angle δ_R and c_1, c_2 (top-left). The relation between translation pixels δ_S and c_1, c_2 (top-right). The relation between scaling δ_A and c_1, c_2 (bottom-left). The synthesized SAR images corresponding to (c_1, c_2) labeled below each image, except for the original image (marked by a red square) (bottom-right). In this panel (bottom-right), the first two images in the top row exhibit the same rotation angle δ_R with different c_1 and c_2, i.e., c_1 = 0.0, c_2 = 1.0 and c_1 = 0.5, c_2 = 0.5, both resulting in a rotation of 20°. The third image in the top row shows δ_R = 25° with c_1 = 0.5 and c_2 = 0.0. These results further demonstrate that the solution to (31) is not unique; thus, it is possible to retain or change a property by manipulating c_1 and c_2. This conclusion also applies to translation δ_S and scaling δ_A, as shown in the second and third rows (bottom-right).
Figure 10. The comparison of the three estimated properties δ̂_R, δ̂_S, and δ̂_A, obtained using (30), and the measured ones, δ_R, δ_S, and δ_A, obtained using (19). The relation between δ_R (dots) and the two latent codes c_1 and c_2 (different colors denote different values of c_2) (top-left). The δ̂_R values are shown with blue lines; they are close to δ_R. The comparison of δ_R and δ̂_R (top-right). The results for δ_S and δ̂_S are shown in the middle-right and middle-left images, respectively. The results for δ_A and δ̂_A are shown in the bottom-right and bottom-left images, respectively.
Figure 11. The relation between rotation–translation and two latent codes. The relation between rotation angle δ_R and c_1, c_2 (top-left). The relation between translation δ_S and c_1, c_2 (top-right). The overlapped curves of the above two contours, as well as some selected intersections (green dots) (bottom-left). The synthesized SAR images with (c_1, c_2) corresponding to the coordinates of the green dots in the former contour (bottom-right). Here, nine combinations of c_1 and c_2 are selected and labeled as a, b, c, d, e, f, g, h, and i in the contour maps.
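The contour-intersection procedure of Figure 11 can be mimicked with a simple grid search, sketched below. The two surfaces and the function names are toy assumptions standing in for the fitted two-code estimators (Equation (30) in the main text); the sketch only illustrates how a (c_1, c_2) pair that matches two desired properties at once can be located.

```python
# Grid search for a (c1, c2) pair matching a desired rotation and translation,
# mirroring the contour-intersection idea of Figure 11. Both surfaces below are
# assumed toy models, not the estimators fitted in the paper.
import numpy as np

def rotation_surface(c1, c2):        # assumed stand-in for delta_R(c1, c2)
    return 30.0 * c1 + 15.0 * c2

def translation_surface(c1, c2):     # assumed stand-in for delta_S(c1, c2)
    return 8.0 * c1 - 6.0 * c2 + 4.0 * c1 * c2

def find_codes(desired_rot, desired_trans, n=401):
    grid = np.linspace(-1.0, 1.0, n)
    c1_grid, c2_grid = np.meshgrid(grid, grid)
    # The minimum of the combined squared deviation lies near the intersection
    # of the contour lines delta_R = desired_rot and delta_S = desired_trans.
    cost = (rotation_surface(c1_grid, c2_grid) - desired_rot) ** 2 \
         + (translation_surface(c1_grid, c2_grid) - desired_trans) ** 2
    idx = np.unravel_index(np.argmin(cost), cost.shape)
    return c1_grid[idx], c2_grid[idx]

c1, c2 = find_codes(desired_rot=20.0, desired_trans=3.0)
```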
Figure 12. The relation between translation–scaling and two latent codes. The relation between translation δ_S and c_1, c_2 (top-left). The relation between scaling δ_A and c_1, c_2 (top-right). The overlapped curves of the above two contours, as well as some selected intersections (green dots) (bottom-left). The synthesized SAR images with (c_1, c_2) corresponding to the coordinates of the green dots in the former contour (bottom-right).
Figure 13. The results for the estimated and modeled rotation angle for the SAR images synthesized by the InfoGAN trained with real SAR images. The rotation angles in SAR images as a function of the latent code c_1, measured by cross-correlation (black dots), and the values estimated with a linear model (green line) (top-left). The rotation angles in SAR images as a function of the latent code c_1, measured by cross-correlation (black dots), and the values estimated with a nonlinear model (yellow line) (top-right). The synthesized SAR images using c_1 calculated by (26) for four desired rotation angles, δ_R^d = 20, 10, 5, 10 (bottom row). The estimated rotations of the synthesized SAR images, δ_R(k), are calculated using (19).
Figure 14. The synthesized SAR images (with background). Two latent codes are used.
Figure 15. Some samples from the AIR-SARShip-1.0 dataset. A large-scale SAR image indexed as 05_8_21 in AIR-SARShip-1.0 (left). A slice of one ship (marked by a green box in the left subfigure) with three properties manually manipulated (right).
Figure 16. Some experimental results on the AIR-SARShip-1.0 dataset. The first, second, and third rows are SAR images produced by InfoGAN with one latent code, c_1 ∈ [−1, 1], corresponding to rotation, translation, and scaling, respectively. The fourth row shows the comparison between the measured rotation (left), translation (middle), and scaling (right), shown as blue dots, and the corresponding estimated properties (red curve).
Table 1. The FID of images generated by two GANs and training images.

Model      FID
GAN        18.74
InfoGAN    17.59
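For reference, the FID values in Table 1 follow the standard Fréchet distance between Gaussian fits of feature activations for real and generated images. A minimal sketch is given below; it assumes the activation matrices from a pretrained feature extractor are already available, and the extractor itself (and its preprocessing) is not shown.

```python
# Minimal FID sketch: Frechet distance between Gaussian fits of two sets of
# feature activations (shape (N, D)), e.g., Inception features of real and
# generated SAR images. Feature extraction is assumed to be done elsewhere.
import numpy as np
from scipy import linalg

def frechet_distance(act_real, act_fake):
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    sigma_r = np.cov(act_real, rowvar=False)
    sigma_f = np.cov(act_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)  # matrix square root
    covmean = covmean.real                                    # drop numerical imaginary parts
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```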
Table 2. The detailed information of each dataset.

Dataset                   Property                    Spatial Size    Number of Samples
simulated                 rotation                    28 × 28         13/121
semi-simulated            rotation                    28 × 28         601
semi-simulated            translation                 28 × 28         151
semi-simulated            scaling                     28 × 28         301
semi-simulated            rotation and translation    28 × 28         3721
semi-simulated            rotation and scaling        28 × 28         1891
semi-simulated            translation and scaling     28 × 28         3751
real without background   rotation                    28 × 28         60
real with background      rotation                    28 × 28         60
Table 3. The architecture of the generator, G.

Layer               Input Shape      Output Shape     Activation
Fully connected     N_z              6272
Reshape             6272             7 × 7 × 128
BatchNormalize      7 × 7 × 128      7 × 7 × 128      Sigmoid
TransposedConv2D    7 × 7 × 128      14 × 14 × 128
BatchNormalize      14 × 14 × 128    14 × 14 × 128    Sigmoid
TransposedConv2D    14 × 14 × 128    28 × 28 × 64
BatchNormalize      28 × 28 × 64     28 × 28 × 64     Sigmoid
TransposedConv2D    28 × 28 × 64     28 × 28 × 32
BatchNormalize      28 × 28 × 32     28 × 28 × 32     Sigmoid
TransposedConv2D    28 × 28 × 32     28 × 28 × 1      Sigmoid
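A minimal PyTorch sketch of the generator in Table 3 is given below. The table does not specify kernel sizes, strides, or padding, so the values here are assumptions chosen only to reproduce the listed feature-map sizes; the H × W × C shapes in the table become channels-first tensors in the code, and the default input length is likewise an assumption.

```python
# Sketch of the Table 3 generator (kernel/stride/padding values are assumptions).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_z=74):  # n_z: length of noise + latent codes (assumed value)
        super().__init__()
        self.fc = nn.Linear(n_z, 7 * 7 * 128)                      # N_z -> 6272
        self.net = nn.Sequential(
            nn.BatchNorm2d(128), nn.Sigmoid(),
            nn.ConvTranspose2d(128, 128, 4, stride=2, padding=1),  # 7x7 -> 14x14
            nn.BatchNorm2d(128), nn.Sigmoid(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 14x14 -> 28x28
            nn.BatchNorm2d(64), nn.Sigmoid(),
            nn.ConvTranspose2d(64, 32, 3, stride=1, padding=1),    # keeps 28x28
            nn.BatchNorm2d(32), nn.Sigmoid(),
            nn.ConvTranspose2d(32, 1, 3, stride=1, padding=1),     # single-channel image
            nn.Sigmoid(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 7, 7)  # reshape 6272 -> 7x7x128 (channels first)
        return self.net(x)
```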
Table 4. The architecture of the discriminator, D, and the classifier, Q.

Layer                  Input Shape     Output Shape    Activation
Conv2D                 28 × 28 × 1     14 × 14 × 32    Leaky ReLU
Conv2D                 14 × 14 × 32    7 × 7 × 64      Leaky ReLU
Conv2D                 7 × 7 × 64      4 × 4 × 128     Leaky ReLU
Conv2D                 4 × 4 × 128     4 × 4 × 256     Leaky ReLU
Flatten                4 × 4 × 256     4096
D: Fully connected     4096            1               Sigmoid
Q: Fully connected     4096            128
   Fully connected     128             N_C             Sigmoid
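A matching sketch of the shared discriminator D and classifier Q in Table 4 follows, again with assumed kernel sizes, strides, and padding chosen to reproduce the listed shapes. D and Q share the convolutional trunk and differ only in their fully connected heads, as in the InfoGAN architecture of Figure 2.

```python
# Sketch of the Table 4 discriminator/classifier (kernel/stride/padding values are assumptions).
import torch
import torch.nn as nn

class DiscriminatorQ(nn.Module):
    def __init__(self, n_codes=2):  # n_codes: N_C, number of latent codes (assumed value)
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),     # 28 -> 14
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),    # 14 -> 7
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),   # 7 -> 4
            nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.LeakyReLU(0.2),  # keeps 4x4
            nn.Flatten(),                                                     # 4*4*256 = 4096
        )
        self.d_head = nn.Sequential(nn.Linear(4096, 1), nn.Sigmoid())         # real/fake score
        self.q_head = nn.Sequential(nn.Linear(4096, 128),
                                    nn.Linear(128, n_codes), nn.Sigmoid())    # latent-code estimates

    def forward(self, x):
        feats = self.trunk(x)
        return self.d_head(feats), self.q_head(feats)
```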
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
