1. Introduction
Due to limitations in storage space and transmission bandwidth, digital images are usually compressed to remove redundancy [
1]. Generally, compression methods trade image quality for higher compression rates. However, in high-performance scientific applications, such quality degradation is unacceptable. Thus, near-lossless compression methods are usually employed in remote-sensing applications to balance image quality against compression rate. JPEG-LS is the lossless/near-lossless image compression standard developed by the Joint Photographic Experts Group (JPEG); it supports both lossless and near-lossless compression [
2], where the NEAR parameter bounds the maximum absolute reconstruction error of each pixel. For clarity, "JPEG-LS" in the following refers to its near-lossless mode. Owing to its low complexity, JPEG-LS has been widely used in remote sensing applications [
3]. Although an image compressed by JPEG-LS exhibits milder quality degradation than one compressed by general lossy schemes (e.g., JPEG), JPEG-LS still causes noticeable banding artifacts in flat image areas. As shown in
Figure 1, these banding artifacts not only cause information loss but also degrade visual quality, which may severely affect high-performance remote sensing applications. Hence, there is an urgent need for restoring JPEG-LS compressed remote-sensing images.
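To make the NEAR guarantee concrete, the following is a minimal sketch of the scalar prediction-error quantization rule used by near-lossless JPEG-LS (a toy illustration of the error bound, not the full codec; function names are ours):

```python
import numpy as np

def near_quantize(err, near):
    """Quantize prediction errors as in near-lossless JPEG-LS.

    The quantization bin width is 2*NEAR + 1, which guarantees that the
    reconstruction error of every pixel is bounded by +/- NEAR.
    """
    return np.sign(err) * ((np.abs(err) + near) // (2 * near + 1))

def near_dequantize(q, near):
    """Map quantized error indices back to reconstructed error values."""
    return q * (2 * near + 1)
```

With NEAR = 0 the scheme is fully lossless; increasing NEAR widens the bins, raising the compression rate at the cost of the per-pixel error bound.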
The research community has proposed a lot of compressed image restoration methods, including model-based methods [
4,
5,
6,
7,
8] and learning-based methods [
9,
10,
11,
12,
13,
14,
15]. Specifically, a fully convolutional neural network (FCN) [
16] has achieved state-of-the-art results in recent years. However, the existing methods all focus on lossy compression schemes and social media images. As far as we know, there is no specialized research work on JPEG-LS compressed remote sensing image restoration.
To this end, we develop this initial line of work for JPEG-LS compressed remote sensing image restoration. Because of the following two core problems, our task is much more difficult than general lossy compressed image restoration: (1) Due to the requirement of low information loss, JPEG-LS compressed images exhibit only slight degradation, manifested by small differences in pixel values from their corresponding references. However, the pixel value of high-bit remote sensing images varies widely. Thus, our task is not only a problem of high-precision restoration but also of bridging small pixel-value gaps from wide numerical ranges. (2) Most remote sensing images cover many flat areas that contain little context information. However, as shown in
Figure 1, JPEG-LS compressed banding artifacts generated from run-length coding [
17] usually occur in such flat areas. Thus, our task may lack context information when removing JPEG-LS banding artifacts.
To deal with the above problems, we propose a novel CNN network called CARNet. Its core idea is a context-aware residual learning mechanism. It has the following three key components: First, we design a scale-invariant baseline to realize high-accuracy pixel-value recovery. Since the pixel value of high-bit remote sensing images may vary widely, directly learning the latent clean image may amplify slight degradation caused by near-lossless compression. Hence, we consider residual mapping learning may be more efficient for our task. Inspired by [
18], we propose a scale-invariant baseline to mine residual features. Scale-invariant here means learning in a fixed scale space, which avoids the pixel-value reconstruction errors introduced by changes of image scale. Hence, our baseline provides the spatial accuracy that residual learning needs to recover minor degradation with high precision. Second, we propose a context-aware subnet to supplement context information. Our scale-invariant baseline may perform well in learning the residual mapping, but, owing to its limited receptive field, it may be unable to extract rich context information. Hence, we design a context-aware subnet that focuses on mining context information. It provides large receptive fields for exploring context information, and thus shows promising results in JPEG-LS banding artifact removal. Third, we propose a prior-guided feature fusion mechanism to ease the information flow between the two stages above. The scale-invariant baseline and the context-aware subnet address the two core problems of JPEG-LS compressed remote sensing image restoration, respectively. Directly fusing their features and using the fused features to reconstruct the latent clean image may not work well. Hence, we progressively integrate context features into our scale-invariant baseline. This scheme forms a prior-guided reconstruction that provides better features for restoration. Further, we notice that the gradient angles of banding artifacts concentrate on two fixed values, since the gradients of the horizontal bands point vertically. This is a noteworthy local feature of JPEG-LS compressed remote sensing images. By utilizing this JPEG-LS compression prior, we design special loss functions that strengthen the overall supervision in gradient-value space, further improving restoration performance.
Alternatively, researchers usually employ two commonly used Reference (R) Image Quality Assessment (IQA) algorithms (i.e., PSNR and SSIM [
19]) to quantitatively evaluate restoration performance. However, these metrics are no longer suitable for our study due to the particularity of JPEG-LS-degraded remote sensing images, which manifests in two aspects: (1) Because of the near-lossless character of JPEG-LS compression, the SSIM scores of JPEG-LS-degraded images under different compression rates are very close (numerical differences do not appear until the third decimal place), so it is hard to distinguish the perceptual quality of different JPEG-LS-degraded images from such tiny numerical differences. (2) R IQA models may become unreliable when the original reference image is degraded [
20]. Due to the particular perspective of remote sensing images, the evaluation results of PSNR and SSIM in our task may be inconsistent with human judgment. To this end, we propose novel R IQA algorithms, LS-PSNR and LS-SSIM, to provide a better quantitative assessment for our research. Specifically, we first design novel R IQA models (G-PSNR and G-SSIM) in gradient-value space and then combine them with the pixel-value-space R IQA models (i.e., PSNR and SSIM). LS-PSNR and LS-SSIM may be viewed as conditioning the pixel-value scores on the gradient-value scores, where the predicted G-PSNR and G-SSIM scores serve as prior knowledge. Through this conditioning, our R IQA models greatly expand the differences among predicted scores. Additionally, by utilizing gradient priors of JPEG-LS banding artifacts, our R IQA models show promise in predicting human quality judgments.
Furthermore, we prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. Experiments show that our method sets the state-of-the-art for JPEG-LS near-lossless compressed remote sensing image restoration. The contributions of this paper are highlighted as follows:
We develop the initial line of work on JPEG-LS near-lossless compressed remote sensing image restoration.
We propose a novel CNN network, called CARNet, to deal with new challenges in this initial line of work. Its core idea is a context-aware residual learning mechanism. Further, we design special loss functions to further improve restoration performance by utilizing JPEG-LS compression priors.
We propose novel R IQA algorithms, called LS-PSNR and LS-SSIM, to provide better assessment results for our research by utilizing special characteristics of JPEG-LS banding artifacts.
We prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. Experiments show that our method sets the state-of-the-art for JPEG-LS near-lossless compressed remote sensing image restoration.
2. Related Work
Some early works [
21,
22,
23] treat compression artifact removal as a denoising problem by modeling compression artifacts as additive noise. These works consider only the smoothness or regularity of pixel intensities, so edges and textures in their restored images tend to be over-smoothed. Other works [
24,
25,
26] treat compression artifact removal as an image inverse problem. These methods further consider the nonstationarity of image content, but they ignore the content-correlated characteristic of the compression noise. Further, due to ill-posedness, prior knowledge is required to regularize the solutions of their methods.
Recently, CNNs have been widely used for low-level image processing problems and have achieved excellent results. CNN-based restoration for compressed images was first introduced by Dong et al. [
9]. However, their small-scale network limits the receptive field, and its training converges slowly. Then, DnCNN [
11] boosts performance on general blind image restoration tasks. Later, a wavelet transform-based network, MWCNN [
27], brings further improvements. In other explorations, a deep convolutional sparse coding network [
28] combines model-based methods with deep CNN. Additionally, a Dual-domain Multi-scale CNN (DMCNN) [
14] is proposed for JPEG compressed image restoration by enlarging the receptive fields in both the pixel and DCT domains. Their model shows promising restoration results, but its network architecture is highly redundant. Alternatively, some works [
13,
29] propose feed-forward fully convolutional residual networks trained within a generative adversarial framework. However, restoration results produced by such networks are often not vivid, and training a generative adversarial network is usually arduous.
Since the assessment of restoration results consists of quantitative and qualitative evaluation, the most-recent works move towards two genres. On the one hand, some works focus on improving quantitative accuracy. For example, inspired by spatial-wise convolution for shift-invariance, Fan et al. [
30] propose a “scale-wise convolution” that convolves across multiple scales to achieve scale-invariance. Their network shows that properly modeling scale-invariance in neural networks can bring significant benefits to image restoration performance. On the other hand, some works focus on improving perceptual visual quality. Ehrlich et al. [
31] proposes QGAC that adopts a quantization table to make a single model able to correct JPEG artifacts at any compression rate. Additionally, Jiang et al. [
32] presents FBCNN that can achieve flexible JPEG image restoration by manual control of compression quality factor. Further, Zamir et al. [
15] proposes a multi-stage architecture that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall recovery process into more manageable steps.
Based on the above research, we develop this initial line of work for JPEG-LS compressed remote sensing image restoration. We refer to the proposed network as CARNet; it can achieve accurate restoration while performing well in banding artifact removal by adopting a context-aware residual learning mechanism.
3. Method
In this section, we introduce the proposed method: first the CARNet architecture, then the loss functions, and finally the novel R IQA algorithms.
3.1. CARNet Framework
The framework of the proposed CARNet is shown in
Figure 2. The entire network is an end-to-end system that takes a JPEG-LS near-lossless compressed image
C as input and directly generates the output image
O. The network is fairly straightforward, with each component designed to achieve a specific task. As illustrated, our model contains three components: a scale-invariant baseline, a context-aware subnet, and a prior-guided reconstruction. To describe the learning process concisely, we write Conv(·) for a convolutional layer and δ(·) for PReLU [
33] activation.
3.1.1. Scale-Invariant Baseline
Since a near-lossless compressed image degrades only slightly, it shows small pixel-value differences from its reference, and its restoration requires high accuracy. Generally speaking, learning small residual values converges more easily than directly regressing large pixel values, especially for remote sensing images, which are usually 10-bit to 12-bit and span wide numerical ranges. Hence, we propose a scale-invariant baseline to achieve residual learning. As shown in
Figure 2, it takes compressed image
C as input, obtains low-level feature
F, then extracts the basic residual features using five res-blocks, where

F = δ(Conv(C)),

and each res-block’s output F_i can be represented as

F_i = F_{i−1} + Conv(δ(Conv(F_{i−1}))),

where i = 1, …, 5 and F_0 = F. Without any downsampling operation, our baseline extracts residual features from full-resolution inputs. This scale-invariant design ensures an accurate residual mapping and thus achieves high-accuracy restoration. Further, our network learns small pixel-value differences through a deep network, which makes it prone to vanishing gradients. Based on [
11,
34], the residual mapping also simplifies the convergence process.
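The baseline above can be sketched in PyTorch; the channel width and the exact res-block internals (two 3×3 convolutions with a PReLU between them) are our assumptions, not the paper's verified configuration:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: x + Conv(PReLU(Conv(x))), spatial size preserved."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class ScaleInvariantBaseline(nn.Module):
    """Head convolution followed by five res-blocks, with no
    downsampling, so features stay at the input resolution."""
    def __init__(self, ch=64):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.PReLU())
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(5)])

    def forward(self, c):
        return self.blocks(self.head(c))
```

Because every layer preserves the spatial size, residual features are learned in a fixed scale space, matching the scale-invariance argument above.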
3.1.2. Context-Aware Subnet
Remote sensing images usually cover large flat areas that present little obvious context information. However, most JPEG-LS banding artifacts occur in exactly these flat areas. Because it learns in a fixed scale space, our scale-invariant baseline may lack the receptive field needed to gather enough context information for banding artifact removal in such areas. To this end, we propose the context-aware subnet, which mines context information by adopting various techniques to enlarge receptive fields. As shown in
Figure 3, our context-aware subnet is a U-Net-like structure consisting of downsampling convolutions, dilated convolutions, and pixel-shuffle upsampling convolutions, all of which are effective designs for enlarging the receptive field of the whole network. Further, we notice that image gradient features contain rich contextual information. Thus, rather than taking only the compressed image as input, our context-aware subnet adopts gradient maps as additional input priors. With these schemes, our context-aware subnet has a receptive field large enough to effectively mine context information for JPEG-LS banding artifact removal in flat areas.
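A minimal sketch of such a subnet follows, assuming a 3-channel input (the compressed image plus two gradient maps); the channel widths and dilation rates are our assumptions:

```python
import torch
import torch.nn as nn

class ContextAwareSubnet(nn.Module):
    """U-Net-like context miner: stride-2 downsampling enlarges the
    receptive field, dilated convolutions enlarge it further without
    losing more resolution, and PixelShuffle restores the input size."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.PReLU())
        self.dilated = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.PReLU())
        self.up = nn.Sequential(
            nn.Conv2d(ch, ch * 4, 3, padding=1),
            nn.PixelShuffle(2),  # (N, 4*ch, H/2, W/2) -> (N, ch, H, W)
            nn.PReLU())

    def forward(self, x):
        return self.up(self.dilated(self.down(x)))
```

PixelShuffle upsampling is chosen over transposed convolution here because it tends to avoid checkerboard artifacts, which matters when the target is itself an artifact-removal task.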
3.1.3. Prior-Guided Reconstruction
Since context information is a supplement to residual features, we do not directly adopt the fused features to reconstruct the latent clean image but use a prior-guided feature fusion mechanism to propagate context information from our context-aware subnet to later stages. As shown in
Figure 2, once we obtain the context features, we adopt them to guide the residual feature learning in our baseline. To give full play to the guiding role of the context prior, we first integrate the context features into the basic residual features by a concatenation operation; we then extract context-aware residual features R using another three res-blocks. Finally, the learned context-aware residual features R are fed into three convolutional layers to generate the output image O:

O = C + Conv(Conv(Conv(R))).
This scheme forms a prior-guided reconstruction, which eases the information flow among stages. Hence, our context-aware subnet can provide context information that is lacking in residual mapping, which helps the whole network to achieve great performance in both accurate restoration and banding artifact removal.
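The fusion stage can be sketched as follows; the 1×1 fusion convolution, the channel widths, and the residual skip from C to O are our assumptions inferred from the residual-learning design, not the paper's verified layout:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PriorGuidedReconstruction(nn.Module):
    """Concatenate context features with baseline residual features,
    refine with three res-blocks, then reconstruct with three convs."""
    def __init__(self, res_ch=64, ctx_ch=32):
        super().__init__()
        self.fuse = nn.Conv2d(res_ch + ctx_ch, res_ch, 1)  # merge concat
        self.blocks = nn.Sequential(*[ResBlock(res_ch) for _ in range(3)])
        self.tail = nn.Sequential(
            nn.Conv2d(res_ch, res_ch, 3, padding=1), nn.PReLU(),
            nn.Conv2d(res_ch, res_ch // 2, 3, padding=1), nn.PReLU(),
            nn.Conv2d(res_ch // 2, 1, 3, padding=1))

    def forward(self, res_feat, ctx_feat, c):
        f = self.fuse(torch.cat([res_feat, ctx_feat], dim=1))
        r = self.blocks(f)           # context-aware residual features
        return c + self.tail(r)      # residual output: O = C + residual
```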
3.2. Loss Function
We design a loss function L consisting of three components and minimize it during network training:

L = L_MSE + λ1 · L_1 + λ2 · L_GA,

where λ1 and λ2 are balancing weights. We use the mean squared error (MSE) as the major loss to supervise the whole network:

L_MSE = (1/n) Σ_i (O_i − I_i)²,

where
I is the ground-truth image, O is the network output, and
n is the number of pixels.
Considering the difficulty of learning small pixel-value differences, we add the L1 norm to strengthen the overall supervision in pixel-value space:

L_1 = (1/n) Σ_i |O_i − I_i|.

Further, we notice that the banding artifact has a strong local gradient feature: its gradient angles concentrate on two fixed values, since the gradients of the horizontal bands point vertically. Thus, based on this prior, we propose the gradient angle loss L_GA to further strengthen the overall supervision in gradient-value space:

L_GA = (1/n) Σ_i |A_I(i) − A_O(i)|,

where A_I and A_O
represent the gradient angles of the ground truth and output, respectively. Additionally, we adopt a Sobel operator to calculate image gradients.
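The composite loss can be sketched in PyTorch as follows; the L1 form of the gradient-angle term and the weights lam1, lam2 are our assumptions, since the paper's exact values are not given here:

```python
import torch
import torch.nn.functional as F

# 3x3 Sobel kernels, shaped (out_ch, in_ch, kH, kW) for F.conv2d
SOBEL_X = torch.tensor([[[[-1., 0., 1.],
                          [-2., 0., 2.],
                          [-1., 0., 1.]]]])
SOBEL_Y = SOBEL_X.transpose(2, 3)

def gradient_angle(x, eps=1e-8):
    """Per-pixel gradient angle of a (N, 1, H, W) image via Sobel."""
    gx = F.conv2d(x, SOBEL_X, padding=1)
    gy = F.conv2d(x, SOBEL_Y, padding=1)
    return torch.atan2(gy, gx + eps)

def carnet_loss(out, gt, lam1=1.0, lam2=1.0):
    l_mse = F.mse_loss(out, gt)                                 # L_MSE
    l_one = F.l1_loss(out, gt)                                  # L_1
    l_ga = F.l1_loss(gradient_angle(out), gradient_angle(gt))   # L_GA
    return l_mse + lam1 * l_one + lam2 * l_ga
```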
3.3. R IQA Algorithm
JPEG-LS compression artifacts appear as horizontal bands with distinct image gradient characteristics: (1) as banding artifacts become severe, the mean deviation between horizontal and vertical image gradients increases continuously; (2) most gradient angles in banding-artifact areas take one of two fixed values, since the gradients of horizontal bands point vertically. We design two novel R IQA models in gradient-value space by utilizing these characteristics. Based on the first characteristic, and similar to PSNR, we propose G-PSNR, which computes the difference in the deviation of horizontal and vertical image gradients between C and I. Based on the second characteristic, and similar to SSIM, we propose G-SSIM, which computes the structural similarity of the image gradient angles of C and I. G-PSNR and G-SSIM can assess the degradation severity caused by JPEG-LS banding artifacts but may be limited in evaluating the pixel-value similarity between C and I. Thus, going one step further, we combine G-PSNR and G-SSIM with PSNR and SSIM to obtain LS-PSNR and LS-SSIM, which fuse image-similarity evaluation and artifact-severity assessment. The specific calculation process is as follows.
As illustrated in
Figure 4, given an input image
I and its compressed version
C, PSNR and SSIM scores are first generated to account for the perceptual quality difference between I and C in pixel-value space. Then, a gradient component predicts the horizontal gradients G_h and vertical gradients G_v of I and C, respectively, followed by the calculation of the gradient angle A. Later, G-PSNR and G-SSIM scores are generated to account for the perceptual quality difference between I and C in gradient-value space:

A = arctan(G_v / G_h),    (8)
G-SSIM = SSIM(A_I, A_C),
G-PSNR = 10 · log₁₀(R² / D),    (9)

where A_I and A_C represent the gradient angles of I and C, respectively, SSIM(·, ·) indicates the SSIM function used to predict SSIM scores, D represents the difference in the mean deviation of the horizontal and vertical gradients between I and C, and R indicates the range of the mean deviations. The mean deviation is computed through the window-mean method:

d_I = μ(G_h^I) − μ(G_v^I),    (10)
d_C = μ(G_h^C) − μ(G_v^C),    (11)
D = mean((d_I − d_C)²),
R = max(d_I) − min(d_I),    (12)

where μ(G_h^I), μ(G_v^I), μ(G_h^C), and μ(G_v^C) indicate the window means of G_h^I, G_v^I, G_h^C, and G_v^C, respectively, computed with a fixed-size averaging window. Last but not least, the LS-PSNR and LS-SSIM calculation combines the gradient-value scores with the pixel-value scores:

LS-PSNR = α · PSNR(I, C) + (1 − α) · G-PSNR,    (13)
LS-SSIM = β · SSIM(I, C) + (1 − β) · G-SSIM,

where PSNR(·, ·) indicates the PSNR function used to predict PSNR scores, and α and β are combination weights fixed empirically in our study. The formal computation program of LS-PSNR is shown in Algorithm 1.
Algorithm 1 LS-PSNR
Input: original image I, compressed image C, combination parameters
Output: LS-PSNR
1: compute PSNR between I and C
2: compute G_h^I, G_v^I, G_h^C, and G_v^C using the Sobel operator
3: compute μ(G_h^I), μ(G_v^I), μ(G_h^C), and μ(G_v^C) using an average filter
4: compute R using Equation (12)
5: if R < 0 then
6:     raise ValueError: “R must be ≥ 0.”
7: end if
8: compute D using Equations (10) and (11)
9: compute G-PSNR using Equation (9)
10: compute LS-PSNR using Equation (13)
11: return LS-PSNR
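The steps of Algorithm 1 can be sketched in NumPy; the peak value (4095 for 12-bit data), the weight alpha, the 3×3 window, and the squared-deviation form of D are our assumptions:

```python
import numpy as np

def _sobel(img):
    """3x3 Sobel horizontal/vertical gradients (edge-padded)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gh = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gv = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return gh, gv

def _win_mean(img, k=3):
    """k x k window mean via shifted sums (edge-padded), k odd."""
    r = k // 2
    p = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def ls_psnr(I, C, peak=4095.0, alpha=0.5, eps=1e-12):
    """LS-PSNR: pixel-value PSNR combined with gradient-space G-PSNR."""
    mse = np.mean((I - C) ** 2)
    psnr = 10 * np.log10(peak ** 2 / (mse + eps))
    gh_i, gv_i = _sobel(I)
    gh_c, gv_c = _sobel(C)
    d_i = _win_mean(gh_i) - _win_mean(gv_i)      # Eq. (10)
    d_c = _win_mean(gh_c) - _win_mean(gv_c)      # Eq. (11)
    R = d_i.max() - d_i.min()                    # Eq. (12)
    if R < 0:
        raise ValueError("R must be >= 0.")
    D = np.mean((d_i - d_c) ** 2)
    g_psnr = 10 * np.log10(R ** 2 / (D + eps))   # Eq. (9)
    return alpha * psnr + (1 - alpha) * g_psnr   # Eq. (13)
```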
The proposed R IQA models have several merits. They may be viewed as conditioning the pixel-value scores on the gradient-value scores, where the predicted G-PSNR or G-SSIM score serves as “prior” knowledge of JPEG-LS banding artifacts. Hence, the predicted scores of our R IQA models show promise in estimating the severity of JPEG-LS banding artifacts, which provides a better evaluation of restoration performance.
3.4. Dataset
We have collected a large dataset of 10-bit and 12-bit panchromatic remote sensing images, with resolutions ranging from 5353 × 17,144 to 16,296 × 16,968. For each bit depth, we randomly select some images as test data and use the remainder as training data. To train our model to adapt to different compression ratios, we prepare the corresponding degraded images using JPEG-LS compression at different NEAR settings (i.e., 8, 12, and 16). Further, due to limited computing resources, we crop the high-resolution remote sensing images into patches of a uniform size. After the above processing, we obtain a large remote sensing image dataset consisting of 51,966 10-bit image patches and 14,715 12-bit image patches, each with three JPEG-LS compressed versions at the different NEAR values.
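The patch-cropping step can be sketched as follows (a minimal non-overlapping tiler; the patch size and remainder-discarding policy are our assumptions):

```python
import numpy as np

def crop_patches(img, size):
    """Split a 2-D image into non-overlapping size x size patches,
    discarding any border remainder that does not fill a full patch."""
    h, w = img.shape
    rows, cols = h // size, w // size
    return (img[:rows * size, :cols * size]
            .reshape(rows, size, cols, size)
            .swapaxes(1, 2)               # -> (rows, cols, size, size)
            .reshape(-1, size, size))     # -> (rows*cols, size, size)
```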
Alternatively, to evaluate the performance of our R IQA models in indicating the similarity between a remote sensing image and its JPEG-LS compressed version, we prepared a manually labeled dataset consisting of 200 image pairs. Each pair contains a 12-bit remote-sensing image and its corresponding JPEG-LS-degraded version (NEAR = 16). Additionally, each pair is marked with a Mean Opinion Score (MOS) indicating the images’ similarity; the MOS is the arithmetic average of three experts’ scores.
Figure 5 presents the distribution of our MOS-labeled dataset. Additionally,
Figure 6 shows visual examples of different MOS scores.
5. Conclusions
In this paper, we propose a novel CNN model, CARNet, to explore the restoration of JPEG-LS compressed remote sensing images. It shows promise in solving the challenging problems of our study through a context-aware residual learning mechanism. Specifically, it achieves high-accuracy restoration by adopting a scale-invariant baseline to learn the residual mapping; it performs well in JPEG-LS banding artifact removal by using a context-aware subnet to enlarge receptive fields; and it eases the information flow among stages by utilizing a prior-guided feature fusion mechanism. In addition, we propose novel R IQA models, LS-PSNR and LS-SSIM, to provide better evaluation results for our study. By adopting the characteristics of JPEG-LS banding artifacts as priors, our R IQA models reliably predict human quality judgments and effectively distinguish tiny similarity differences among JPEG-LS-degraded images. Further, we prepare a new dataset of JPEG-LS compressed remote sensing images to supplement existing benchmark data. The evaluation results indicate that our work is the current state of the art among CNN-based methods. However, our method requires training a new model for each compression ratio, which is time-consuming and computationally intensive. Hence, our future work will focus on designing a framework that can accommodate a wide range of JPEG-LS compression ratios.