Earth observation technology has evolved considerably, and remote sensing (RS) images have become a core medium of technical analysis. RS is applicable to a variety of fields, such as agricultural production, ecological monitoring, military reconnaissance, and geological surveys. However, the light reflected from ground objects and captured by optical sensors inevitably undergoes atmospheric absorption and scattering. The detected RS data may be partially or even completely obscured by clouds, which impedes global surveying and mapping missions and thereby weakens the potential of RS to explore the Earth. Approaches for reconstructing cloud-contaminated images have been heavily investigated and can be categorized as conventional or deep learning-based.
1.1. Conventional Cloud Removal Approaches
Depending on whether ground objects are completely obscured by clouds, conventional cloud removal can be further divided into two cases: thin and thick cloud removal. Thin clouds do not entirely block the transmission of electromagnetic waves reflected from the Earth's surface, which is a prerequisite for reconstructing distorted regions using the spatial correlation, frequency differences, and spectral complementarity between cloudy and cloud-free regions. Interpolation is a representative technique for cloud removal based on spatial characteristics, such as Kriging interpolation [1,2] and neighborhood pixel interpolation [3,4]. Others have used frequency-domain filtering to enhance the spectral information of ground objects in the high-frequency range and suppress clouds in the low-frequency range [5,6,7]. Multi-spectral images contain bands with high cloud transmittance and high correlation with neighboring bands. The haze-optimized transformation (HOT) method proposed in [8] corrects the band-value deviation caused by clouds by exploiting the high correlation between the red and blue bands in clear regions. Nonetheless, these clear regions require manual selection; for this reason, Xu et al. [9] developed a linear cloud removal method based on the cirrus band. Other methods [10,11] rely on spectral analysis to remove the thin cloud component from the visible spectrum. In addition, He et al. [12] proposed the dark channel prior (DCP) defogging algorithm, which was subsequently adapted to thin cloud removal [13].
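To illustrate this family of prior-based methods, the following is a minimal DCP-style recovery on a toy image (pure Python on nested lists; the window size, haze weight `omega`, floor `t0`, and the choice of the global per-channel maximum as atmospheric light are illustrative assumptions, not the settings used in [12,13]):

```python
# Minimal dark-channel-prior (DCP) sketch on a tiny RGB image.
# Pixels are (r, g, b) tuples of reflectances in [0, 1].

def dark_channel(img, win=1):
    """Min over channels and a (2*win+1)^2 spatial neighborhood."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = []
            for di in range(-win, win + 1):
                for dj in range(-win, win + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        vals.append(min(img[ii][jj]))  # min over RGB
            out[i][j] = min(vals)
    return out

def dcp_recover(img, omega=0.95, t0=0.1):
    """Estimate transmission from the dark channel and invert the haze model
    I = J * t + A * (1 - t), i.e. J = (I - A) / t + A."""
    h, w = len(img), len(img[0])
    # Toy atmospheric light: global per-channel maximum.
    A = [max(img[i][j][c] for i in range(h) for j in range(w)) for c in range(3)]
    norm = [[tuple(p[c] / A[c] for c in range(3)) for p in row] for row in img]
    dark = dark_channel(norm)
    rec = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            t = max(1.0 - omega * dark[i][j], t0)  # transmission estimate
            rec[i][j] = tuple((img[i][j][c] - A[c]) / t + A[c] for c in range(3))
    return rec
```

On a uniformly hazy toy image, the recovered radiances fall below the observed ones, as expected when the additive haze component is removed.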
However, if surface radiation is absorbed by thick, concentrated clouds before reaching the sensor, the spectral information of a single image is insufficient for a detailed reconstruction of the ground objects, and the above methods are no longer applicable. Instead, conventional thick cloud removal methods integrate the complementary advantages of multiple images to supply the missing information. These methods are mainly divided into multi-temporal methods [14,15,16,17,18,19,20,21] and multi-source methods [22,23,24,25]. Multi-temporal methods introduce additional observations from different acquisition times to reassemble RS images; however, such methods require consistent land cover types across all images. Cheng et al. [15] proposed a spatio-temporal Markov random field to determine the optimal clean pixels with which to replace cloudy pixels from auxiliary images. Li et al. [18] presented two extended multi-temporal dictionary learning algorithms to recover the missing information in MODIS data. Based on the low rank of multi-temporal image sequences, a discriminative robust principal component analysis (RPCA) model was devised, which assigns penalty weights to cloud pixels and then reconstructs cloudy images [19]. Ji et al. [20] and Lin et al. [21] introduced a sparsity term to describe the cloud component and exploited the latent multi-temporal relationship to estimate the missing pixels. Multi-source methods, in essence, rely on data acquired by different sensors over the same location to reconstruct RS images. Synthetic aperture radar (SAR) operates day and night, and its radar waves have a strong penetrating ability; the collected SAR images are rich in spatial detail and serve as auxiliary data for innovative cloud removal methods. Exploiting the cloud-penetrating property of SAR signals, Eckardt et al. [22], Li et al. [23], and Zhu et al. [24] combined the intact structural information in SAR images to reconstruct regions with missing optical information. Moreover, Landsat-MODIS data fusion was used to fill the missing regions consistently with their context [25].
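The core operation shared by many multi-temporal methods can be reduced to mask-guided compositing: pixels flagged as cloudy in the target image are replaced by co-registered clean pixels from an auxiliary acquisition. A minimal sketch follows (pure Python on toy data; real methods such as [15,19] add radiometric normalization and spatio-temporal modeling on top of this basic step):

```python
def composite(target, auxiliary, cloud_mask):
    """Replace pixels flagged as cloudy (mask == 1) with the co-registered
    auxiliary acquisition's pixels; keep original clear pixels untouched."""
    h, w = len(target), len(target[0])
    return [[auxiliary[i][j] if cloud_mask[i][j] else target[i][j]
             for j in range(w)] for i in range(h)]

# Toy single-band example: one saturated cloud pixel (value 255).
target = [[10, 10], [255, 10]]
mask = [[0, 0], [1, 0]]
aux = [[11, 12], [13, 14]]
print(composite(target, aux, mask))  # [[10, 10], [13, 10]]
```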
The above conventional methods have achieved good performance but suffer from some deficiencies. For thin cloud removal methods, due to their reliance on prior assumptions, their modeling performance is limited and it is difficult to moderately remove clouds. Most reconstructions combined with multiple image methods fail to consistently achieve high-quality and timely results, and cannot address the increasing demand for high-quality RS images from cloud removal. Moreover, due to the differences in imaging principles and spatial resolutions, images obtained by different sensors are prone to spectral distortion and texture information loss during the fusion process.
1.2. Deep Learning-Based Cloud Removal Approaches
Nowadays, deep learning has been widely applied to cloud removal in RS images and significantly improves the removal results compared with conventional methods. Specifically, deep learning-based methods learn a nonlinear mapping from cloud-contaminated images to cloud-free images. The convolutional neural network (CNN) has shown impressive performance in image processing tasks, and several CNN-based methods have been applied to cloud removal. Zhang et al. [26] proposed the unified spatial-temporal-spectral deep convolutional neural network (STSCNN) to remove thick clouds. Similarly, a CNN based on the U-Net [27] architecture was used to estimate a thickness coefficient map and achieve cloud removal [28]; however, image details are inevitably lost due to the up-sampling and down-sampling operations. Li et al. [29] utilized residual convolutional and deconvolutional operations to better preserve useful information. Chen et al. [30] presented the content-texture-spectral CNN (CTS-CNN) to reconstruct regions under clouds in ZY-3 satellite images. Dai et al. [31] used a novel gated convolution to process cloud pixels and clear pixels separately, achieving consistency of deep and shallow features between global and local areas.
A breakthrough in cloud removal came with the use of the powerful image generation capabilities of the generative adversarial network (GAN) to reconstruct cloud-contaminated regions; integrating a CNN model into the generator has since become mainstream. The multi-spectral conditional generative adversarial network (McGAN) [32] integrated the visible bands (red, green, blue) and the near-infrared (NIR) band to remove thin clouds and accurately predict the color of ground objects; this work was further extended by the introduction of edge filtering [33]. Singh et al. [34] designed Cloud-GAN, a cloud removal model based on the cycle-consistent generative adversarial network (CycleGAN) [35], which uses the bidirectional mapping between cloudy and cloud-removed images and therefore does not require paired cloudy and cloud-free images for training. The method of Sarukkai et al. [36] captures the correlation between multi-temporal images using a spatio-temporal generator network (STGAN) to restore images efficiently. Subsequently, some advanced techniques have been employed to optimize network performance; for example, Pan [37], Yu et al. [38], and Xu et al. [39] embedded attention mechanisms into GANs to better characterize the distribution and features of clouds and improve the restoration of cloudy regions.
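The unpaired training idea behind CycleGAN-style methods rests on a cycle-consistency loss: mapping a cloudy image to a cloud-free one and back again should reproduce the input. A minimal sketch follows (pure Python; the toy `G`/`F` lambdas stand in for the two generator networks and are purely illustrative):

```python
def l1(a, b):
    """Mean absolute error between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x_cloudy, y_clear, G, F):
    """L_cyc = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1:
    both round trips should reproduce their inputs, so no paired
    cloudy/cloud-free examples are needed during training."""
    return l1(F(G(x_cloudy)), x_cloudy) + l1(G(F(y_clear)), y_clear)

# Toy generators that are exact inverses, so the loss is ~0.
G = lambda img: [p - 0.2 for p in img]  # "remove" additive cloud brightness
F = lambda img: [p + 0.2 for p in img]  # "add" it back
loss = cycle_consistency_loss([0.9, 0.8], [0.3, 0.4], G, F)
```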
SAR-optical image fusion has also been adopted in some deep learning-based cloud removal methods, as the large differences between the two modalities are well attenuated by deep models. Sentinel-1 SAR images were used as auxiliary data in Meraner et al. [40] and Ebel et al. [41] to remove clouds from Sentinel-2 images. A new idea for multi-source image fusion translates the SAR image into a corresponding simulated optical image and then adopts a fusion network to fuse the cloudy image, the SAR image, and the simulated optical image into a cloud-removed image [42,43]. Other strategies approach cloud removal from a novel perspective: Wen et al. [44] achieved thin cloud removal in the YUV color space, while Zhang et al. [45] deployed a trained CNN-based fusion model on Google Earth Engine (GEE) to remove clouds from Sentinel-2 images.
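Working in YUV is attractive because thin clouds mainly raise the luminance channel, leaving chrominance comparatively intact, so a correction can be applied to Y alone. The standard ITU-R BT.601 conversion underlying this kind of decomposition is sketched below (the actual correction model in [44] is more involved than a channel split):

```python
def rgb_to_yuv(r, g, b):
    """ITU-R BT.601 full-range RGB -> YUV (Y is luminance)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse BT.601 conversion, applied after correcting Y."""
    r = y + 1.13983 * v
    g = y - 0.39465 * u - 0.58060 * v
    b = y + 2.03211 * u
    return r, g, b
```

A thin-cloud correction in this space would lower Y in contaminated pixels while keeping (U, V), then convert back to RGB.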
Although various deep learning-based cloud removal methods have emerged and facilitated the widespread use of RS images, some critical issues remain. Specifically, the development of deep learning-based cloud removal frameworks has not yet fully matched the properties of RS images. The methods in [26,28,29,30,32,33,34,35,36,37,38,39,44] fit a unidirectional mapping from cloudy to cloud-free images without distinguishing between cloudy and non-cloudy regions, and use plain convolution layers to extract features, thus neglecting the reconstruction of local regions. Meanwhile, with the exception of [31,40,41,42,43], these approaches use per-pixel loss functions during network training, which constrain the global generation of pixels without paying additional attention to the reconstructed regions; there is then no guarantee that the reconstructed regions will be semantically compatible with the surrounding clear regions. To mitigate this problem, the methods in [31,40,41,42,43] introduce a cloud mask into the loss function to guide the reconstruction of local regions. However, in [31,42], the treatment of cloudy and non-cloudy regions only partitions the corresponding region mapping from a global view, without maximally preserving the original information of the non-cloudy regions. Another point worth considering is that the direct fusion of SAR and optical RS images aggravates the burden on the network because of the lower resolution and speckle noise of SAR images.
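The distinction drawn above can be made concrete: a plain per-pixel loss treats every pixel equally, whereas a mask-weighted variant up-weights the cloudy regions that actually need reconstruction. A minimal sketch follows (pure Python; the weighting scheme and `alpha` are illustrative assumptions, not the compound loss proposed in this work):

```python
def per_pixel_l1(pred, target):
    """Plain per-pixel L1 loss over flat pixel lists: every pixel equal."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def mask_weighted_l1(pred, target, cloud_mask, alpha=5.0):
    """L1 loss in which pixels under cloud (mask == 1) are weighted
    alpha times more than clear pixels, focusing training on the
    regions that must be reconstructed."""
    weights = [alpha if m else 1.0 for m in cloud_mask]
    num = sum(w * abs(p - t) for w, p, t in zip(weights, pred, target))
    return num / sum(weights)

# A large error hidden under cloud dominates the weighted loss but is
# diluted in the plain per-pixel average.
pred, target, mask = [0.5, 0.2], [0.0, 0.2], [1, 0]
```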
To address these problems, we propose a GAN-based framework, dubbed DC-GAN-CL, which incorporates a distortion coding network and compound loss functions for RS image restoration tasks. The main contributions of this work are as follows.
A new cloud removal methodology for RS images is proposed, with a novel generator that employs the symmetric cascade structure of U-Net together with two newly embedded modules for better reconstruction, and a multitask loss function designed to restore repetitive details and complex textures.
We convert the implicit data prior learned from the image into an explicit network prior: a distortion coding module for cloud pixels is introduced into the generator to encode the distortion factors of cloudy images into parameters usable for network training. A feature refinement module, consisting of two tandem phases (extraction and fusion), is applied to optimize the integration and transmission of optical information.
Dedicated to exploiting the properties of RS images, we develop compound loss functions that maximally adapt to cloud removal in three main aspects, namely, model training, constraints on coherent image semantics, and locally adaptive reconstruction.
The remainder of this work is structured as follows. Section 2 describes the proposed method in detail, including the network architecture, additional modules, and loss functions. The experimental settings, results, ablation study, and parameter sensitivity analysis are provided in Section 3. Section 4 reiterates and discusses the previous experiments and introduces additional datasets to demonstrate the generalization capability of the proposed method. Finally, the conclusion is given in Section 5.