Article

Dual Image Deblurring Using Deep Image Prior

by Chang Jong Shin, Tae Bok Lee and Yong Seok Heo
1 Department of Artificial Intelligence, Ajou University, Suwon 16499, Korea
2 Department of Electrical and Computer Engineering, Ajou University, Suwon 16499, Korea
* Author to whom correspondence should be addressed.
Electronics 2021, 10(17), 2045; https://doi.org/10.3390/electronics10172045
Submission received: 16 July 2021 / Revised: 16 August 2021 / Accepted: 18 August 2021 / Published: 24 August 2021

Abstract

Blind image deblurring, one of the main problems in image restoration, is a challenging, ill-posed problem. Hence, it is important to design a prior to solve it. Recently, deep image prior (DIP) has shown that convolutional neural networks (CNNs) can be a powerful prior for a single natural image. Previous DIP-based deblurring methods exploited CNNs as a prior when solving the blind deblurring problem and performed remarkably well. However, these methods do not fully utilize the given multiple blurry images and perform poorly on severely blurred images, because their architectures are strictly designed to utilize a single image. In this paper, we propose a method called DualDeblur, which uses dual blurry images to generate a single sharp image. DualDeblur jointly utilizes the complementary information of multiple blurry images to capture image statistics for a single sharp image. Additionally, we propose an adaptive L2_SSIM loss that enhances both pixel accuracy and structural properties. Extensive experiments show the superiority of our method over previous methods in both qualitative and quantitative evaluations.

1. Introduction

Motion blur is a common artifact caused by the relative motion between the camera and the scene during exposure. In practice, images obtained from cameras in mobile embedded systems are often blurred because they are usually captured with hand-held cameras. The unwanted blur artifacts not only degrade image quality but also result in the loss of important information in the image. Consequently, blurry images deteriorate the performance of various computer vision tasks, such as image classification [1,2,3], object detection [4,5,6], and segmentation [7,8,9]. Accordingly, numerous image deblurring studies have been proposed to remove blur artifacts and restore sharp images.
Given a blurry image y, the blur process is typically modeled as a convolution operation of a latent sharp image x and a blur kernel k as follows:
y = k ⊗ x + n,
where ⊗ denotes the convolution operator and n is the noise. The goal of blind image deblurring is to estimate the sharp image and the blur kernel simultaneously when the blur kernel is unknown. This is a classical ill-posed problem because x and k can have multiple solutions. Owing to the ill-posed nature of the problem, conventional deblurring studies constrain the solution space by leveraging various priors and regularizers.
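To make the blur model concrete, the snippet below is a minimal PyTorch sketch of Equation (1): a blurry observation is obtained by convolving a sharp image with a blur kernel and adding noise. The image size, box kernel, and noise level are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def blur(x, k, noise_std=0.01):
    """Convolve a sharp image x (1, C, H, W) with a blur kernel k (kh, kw) and add noise."""
    c = x.shape[1]
    weight = k[None, None].repeat(c, 1, 1, 1)     # depthwise kernel, one copy per channel
    pad = (k.shape[-2] // 2, k.shape[-1] // 2)    # keeps the spatial size for odd kernels
    y = F.conv2d(x, weight, padding=pad, groups=c)
    return y + noise_std * torch.randn_like(y)

# Example: a normalized 21 x 21 box kernel applied to a random stand-in "sharp" image.
x = torch.rand(1, 3, 255, 255)
k = torch.ones(21, 21) / (21 * 21)
y = blur(x, k)
print(y.shape)   # torch.Size([1, 3, 255, 255])
```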
Recently, extensive studies [10,11,12,13,14] based on deep learning (DL) have been performed on image deblurring. Most of them employ deep convolutional neural networks (CNNs) and train them on a large-scale dataset of blurry/sharp image pairs [15]. CNNs implicitly learn more general priors by capturing natural image statistics from a large number of blurry/sharp image pairs. DL-based methods have provided superior results. However, collecting such a large dataset is difficult and expensive [16]. In contrast to DL-based data-driven approaches, Ulyanov et al. [17] proposed the deep image prior (DIP), which is based on self-supervised learning, and showed that a CNN can capture the low-level statistics of a single natural image. Their method performed remarkably well in low-level vision tasks, such as denoising, super-resolution, and inpainting. Inspired by this, Ren et al. [18] suggested the SelfDeblur framework to solve the single-image blind deblurring problem. Given a single blurry image, SelfDeblur estimates the latent sharp image and the blur kernel simultaneously by jointly optimizing an image generator network and a kernel estimator network. However, SelfDeblur cannot perform deblurring in the case of multiple blurry images, because its architecture is strictly designed to leverage only the internal statistics of a single blurry image. Although using multiple observations for image deblurring is beneficial [19,20], most self-supervised learning approaches do not fully leverage the internal information of given multiple images.
We propose a method called DualDeblur that aims to restore a single sharp image from two given blurry observations. In many practical scenarios, we can capture multiple images of the same physical scene, obtaining multiple blurry images under various conditions. For example, consider the two blurry images shown in Figure 1b,c. They share the same latent sharp image, shown in Figure 1a. Thus, the sharp images restored from Figure 1b,c should be the same, and we can further constrain the solution space. Specifically, our DualDeblur comprises a single image generator and two blur kernel estimators. The image generator aims to estimate the sharp image that is latent in both blurry images. Each blur kernel estimator estimates the blur kernel for one blurry image. Thereafter, we jointly optimize the image generator and blur kernel estimators by comparing the reblurred images with the given blurry images, where the reblurred images are generated by the blur process of the predicted image and the estimated blur kernels. Through this joint optimization process, our image generator learns a strong prior for a single sharp image by using the complementary information of multiple images.
In addition, we propose an adaptive L2_SSIM loss to enhance both pixel-wise accuracy and structural details. Most DIP-based methods use the L2 loss to minimize the difference in pixel values between the target image and the restored image. In our task, simply using the L2 loss may deteriorate the restoration performance because the target image is blurry; thus, the L2 loss is insufficient to restore detailed textures. Hence, many restoration methods replace the L2 loss with a structural loss, such as the SSIM loss [9], MS-SSIM loss [22], and FSIM loss [23]. However, using only the SSIM loss has several limitations. SSIM does not consider pixel-wise accuracy; therefore, comparing corrupted structures may lead to unexpected resulting images. To tackle this, our adaptive L2_SSIM loss adjusts the weight at each training step through a weighted sum that considers the characteristics of L2 and SSIM. At the beginning of training, most of the weight is placed on L2, and this weight decays exponentially with the iterations. Hence, pixel-wise accuracy is ensured by focusing on L2 in the early stages of training, which prevents unexpected structures in the resulting images. In the remaining stages of training, we exponentially increase the weight of the SSIM loss to preserve the structural properties. Through this process, our reconstruction loss ensures both pixel-wise accuracy and structural properties.
Figure 1 shows the effectiveness of our method. Generally, large blurs often occur when images are taken with fast camera movement in night environments (see Figure 1b,c). In this case, previous classical methods often fail to restore sharp images, as shown in Figure 1d,e, because the priors utilized in these methods are subjective and cannot accurately capture the intrinsic distribution of natural images and blur kernels [24]. As shown in Figure 1f,g, SelfDeblur [18] also fails to estimate the kernel for severely blurred images and does not appropriately deblur them. In contrast, the proposed DualDeblur successfully estimates two blur kernels using two severely blurred images and generates a superior resulting image with rich textures. Our experiments show that DualDeblur performs better than other comparative methods, both quantitatively and qualitatively.
The following are the main contributions of this study:
  • We propose a DIP-based deblurring method called DualDeblur that uses two blurry images of the same scene. The complementary information of the two images is jointly exploited during optimization.
  • We propose an adaptive L2_SSIM loss that adjusts the weights of L2 and SSIM at each optimization step, ensuring both pixel-wise accuracy and structural properties in the deblurred image.
  • The experimental results show that our method is quantitatively and qualitatively superior to previous methods.

2. Related Works

In this section, we briefly introduce the existing image deblurring methods based on optimization and DL [25].

2.1. Optimization-Based Image Deblurring

Image deblurring, one of the classical inverse problems, aims to restore a sharp latent image from a given blurry image. Owing to the ill-posed nature of the deblurring problem, most traditional methods constrain the solution space by using various priors or regularizers, such as TV regularization [26,27], gradient priors [21], sparsity priors [28], gradient sparsity priors [29], Gaussian scale mixture priors [30], hyper-Laplacian priors [31], ℓ1/ℓ2-norms [32], variational Bayes approximations [33,34], ℓ0-norms [35,36], patch-based statistical priors [37,38], adaptive sparse priors [19], and dark channel priors [39]. By taking advantage of these priors, traditional methods jointly estimate the sharp image and blur kernel from the blurry image. However, most of these methods heavily rely on the accurate selection of regularizers or priors. Furthermore, when the blur kernel is large and complex, these methods often fail to restore the sharp image.

2.2. DL-Based Image Deblurring

Recently, DL-based methods [25] have been widely developed to solve the image deblurring problem. Early DL-based deblurring methods [40,41] focused only on estimating blur kernels using DL. Sun et al. [40] proposed to predict the probabilistic distribution of motion blur at the patch level using a CNN. Chakrabarti [41] presented a CNN to predict the complex Fourier coefficients of a deconvolution filter to be applied to the input patch for restoration. Unlike traditional approaches of using CNNs for kernel estimation, Nah et al. [10] proposed to directly predict the deblurred output without an additional kernel estimation process by using multi-scale CNNs. Motivated by the multi-scale approach, Tao et al. [12] proposed to reduce the memory size using a long short-term memory (LSTM)-based scale-recurrent network. Zhang et al. [14] proposed a multi-level CNN that uses a multi-patch hierarchy as input to exploit a localized-to-coarse multi-patch approach. Ulyanov et al. [17] suggested DIP, showing that CNNs can work satisfactorily as priors for a single image. However, DIP is limited in capturing the characteristics of the blur kernel, because the DIP network consists of CNNs that contain only image statistics [18]. To tackle this, Ren et al. [18] suggested SelfDeblur to solve the blind deblurring problem. SelfDeblur [18] adopted a CNN to capture image statistics and, to overcome the aforementioned drawback of DIP, employed a fully connected network (FCN) to model the prior of the blur kernel. Although SelfDeblur [18] effectively solves the blind deblurring problem, its structure can only handle a single image and cannot appropriately utilize multiple images. In contrast to SelfDeblur, our DualDeblur is designed with a structure that can utilize multiple images that share a single sharp image.

3. Proposed Method

In this section, we describe the blur process for two blurry images and the proposed DualDeblur framework, which uses two blurry images. Additionally, we introduce an adaptive L2_SSIM loss that considers both pixel-wise accuracy and perceptual properties. Subsequently, we summarize the optimization process of the proposed method.

3.1. DualDeblur

Given two blurry observations y_1 and y_2, the blur process can be formulated as follows:
y_1 = k_1 ⊗ x + n_1,   y_2 = k_2 ⊗ x + n_2,
where x denotes a latent sharp image, and k_1 and k_2 represent the blur kernels corresponding to each blurry observation, respectively. Our DualDeblur predicts a single sharp image x using the two blurry images y_1 and y_2. As depicted in Figure 2, DualDeblur consists of an image generator f_θx(·) and blur kernel estimators f_θk1(·) and f_θk2(·). Table 1 presents the detailed architecture of our image generator f_θx(·). The image generator is learned as a network x̂ = f_θx(z_x) mapping the uniform noise input z_x to an image x̂. Table 2 shows our kernel estimators f_θk1(·) and f_θk2(·). The blur kernel estimator f_θk1(·) is learned as a network k̂_1 = f_θk1(z_k1) mapping the 1-D uniform noise vector z_k1 to a 2-D reshaped blur kernel k̂_1; similarly, f_θk2(·) is learned as a network k̂_2 = f_θk2(z_k2) mapping the 1-D uniform noise vector z_k2 to a 2-D reshaped blur kernel k̂_2. The networks f_θk1(·) and f_θk2(·) form a dual architecture designed for the two blurry images, and k̂_1 and k̂_2 are the estimated blur kernels corresponding to y_1 and y_2, respectively. DualDeblur jointly optimizes f_θx(·), f_θk1(·), and f_θk2(·) by comparing y_1 with k̂_1 ⊗ x̂ and y_2 with k̂_2 ⊗ x̂ through the proposed loss function, as explained in the following.
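The following is a minimal PyTorch sketch of this setup. The kernel estimators follow the FCN layout of Table 2; the image generator is a small convolutional stand-in for the skip-connected encoder-decoder of Table 1 (an assumption made for brevity), and the tensor sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelEstimator(nn.Module):
    """FCN mapping a 1-D uniform noise vector to a normalized 2-D blur kernel (as in Table 2)."""
    def __init__(self, kernel_size, z_dim=200, hidden=1000):
        super().__init__()
        self.kernel_size = kernel_size
        self.fc = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, kernel_size * kernel_size), nn.Softmax(dim=-1))

    def forward(self, z):
        k = self.fc(z)                                   # non-negative, sums to 1
        return k.view(self.kernel_size, self.kernel_size)

class ImageGenerator(nn.Module):
    """Small convolutional stand-in for the generator of Table 1, mapping noise to [0, 1]."""
    def __init__(self, in_ch=8, out_ch=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, out_ch, 1), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)

def reblur(x_hat, k_hat):
    """Reblurred image (estimated kernel convolved with the generated image)."""
    c = x_hat.shape[1]
    weight = k_hat[None, None].repeat(c, 1, 1, 1)
    return F.conv2d(x_hat, weight, padding=k_hat.shape[-1] // 2, groups=c)

# Forward pass with illustrative sizes: a 255 x 255 gray image, 21 x 21 and 27 x 27 kernels.
f_x, f_k1, f_k2 = ImageGenerator(), KernelEstimator(21), KernelEstimator(27)
z_x, z_k1, z_k2 = torch.rand(1, 8, 255, 255), torch.rand(200), torch.rand(200)
x_hat = f_x(z_x)
y1_hat = reblur(x_hat, f_k1(z_k1))
y2_hat = reblur(x_hat, f_k2(z_k2))
print(x_hat.shape, y1_hat.shape, y2_hat.shape)
```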

3.2. Adaptive L2_SSIM Loss

In this subsection, we propose an adaptive L2_SSIM loss to enhance both pixel-wise accuracy and perceptual properties. We adjust the weights at each training step with a weighted sum that considers the properties of L2 and SSIM. First, we introduce the L2 and SSIM losses.
When solving the restoration problem, the L2 loss is usually used and is formulated as follows:
L_2 = Σ_{i=1}^{2} ‖ k̂_i ⊗ x̂ − y_i ‖²,
where i denotes the i-th observation. The L2 loss increases pixel-wise accuracy by minimizing the differences in pixel values between the target image and the restored image. However, with the L2 loss alone, the output image tends to be blurry and lacks high-frequency textures [42,43]. In our case, using only L2 is even worse because both y and k ⊗ x are blurry images. To overcome this limitation, the SSIM loss, which preserves perceptual features, is also used. SSIM captures the luminance, contrast, and structure of an image [9]. Here, L_SSIM is formulated as follows:
L_SSIM = Σ_{i=1}^{2} ( 1 − SSIM( k̂_i ⊗ x̂, y_i ) ).
However, because the SSIM loss does not consider pixel-wise accuracy, collapsed structures in the blurry observations may lead to unexpected structures in the resulting image. Therefore, we propose an adaptive L2_SSIM loss that preserves the strengths of each loss and compensates for their weaknesses. The proposed adaptive L2_SSIM loss (L_L2_SSIM) is formulated as follows:
L_L2_SSIM(t) = ω(t) · α · L_2 + ( 1 − ω(t) ) · L_SSIM,   ω(t) = exp( −t / γ ),
where ω(t) denotes a weighting function that adjusts the weights of the L2 and SSIM losses at each step t, α represents a parameter that adjusts the scale of the L2 loss, and γ denotes a parameter that adjusts the range of steps affected by the L2 loss. At the beginning of training, the L2 loss accounts for most of the total weight, so the optimization focuses on pixel-wise accuracy and does not produce unexpected structures. As training proceeds, we reduce the weight of the L2 loss and increase that of the L_SSIM loss to preserve the structural content of the image. As a result, our reconstruction loss not only increases pixel-wise accuracy but also preserves the structural details of the image. The effectiveness of the proposed reconstruction loss is demonstrated in the ablation study in Section 4.5.
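A minimal sketch of this loss is given below, assuming the weighting function ω(t) = exp(−t/γ). The SSIM term uses a simplified uniform-window implementation rather than the standard Gaussian-window SSIM, and the L2 term uses a sum reduction following Equation (3) (a mean reduction may be used in practice), so the exact values are illustrative.

```python
import math
import torch
import torch.nn.functional as F

def ssim(a, b, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM over an image pair in [0, 1], computed with a uniform window."""
    pad = window // 2
    mu_a = F.avg_pool2d(a, window, 1, pad)
    mu_b = F.avg_pool2d(b, window, 1, pad)
    var_a = F.avg_pool2d(a * a, window, 1, pad) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, window, 1, pad) - mu_b ** 2
    cov = F.avg_pool2d(a * b, window, 1, pad) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return (num / den).mean()

def adaptive_l2_ssim(reblurred, blurry, t, alpha=10.0, gamma=100.0):
    """Weighted sum of L2 and (1 - SSIM) over the two observations at iteration t."""
    w = math.exp(-t / gamma)                 # assumption: the L2 weight decays with t
    l2 = sum(F.mse_loss(r, y, reduction="sum") for r, y in zip(reblurred, blurry))
    l_ssim = sum(1.0 - ssim(r, y) for r, y in zip(reblurred, blurry))
    return w * alpha * l2 + (1.0 - w) * l_ssim

# Example with two random "reblurred"/"blurry" pairs at iteration t = 50.
pairs = [(torch.rand(1, 1, 255, 255), torch.rand(1, 1, 255, 255)) for _ in range(2)]
loss = adaptive_l2_ssim([p[0] for p in pairs], [p[1] for p in pairs], t=50)
print(loss.item())
```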
The final optimization process of DualDeblur is summarized in Algorithm 1. Here, T denotes the total number of training iterations, and θ_k1, θ_k2, and θ_x represent the network parameters corresponding to f_θk1(·), f_θk2(·), and f_θx(·), respectively. DualDeblur estimates a restored image and two blur kernels. Thereafter, it generates two reblurred images using a convolution operation and compares them with y_1 and y_2, respectively, through the L_L2_SSIM loss in Equation (5). By optimizing all the networks simultaneously, the image generator f_θx(·) jointly utilizes the complementary information of the two blurry images. Finally, we obtain the restored image and blur kernels after T iterations.
Algorithm 1 DualDeblur optimization process
Input: blurry images y_1, y_2 and the number of iterations T
Output: restored image x̂, estimated blur kernels k̂_1 and k̂_2
1: Sample z_x, z_k1, and z_k2 from a uniform distribution
2: for t = 1 to T do
3:     perturb z_x
4:     x̂ = f_θx^(t−1)(z_x)
5:     k̂_1 = f_θk1^(t−1)(z_k1)
6:     k̂_2 = f_θk2^(t−1)(z_k2)
7:     Compute the gradients of θ_x^(t−1), θ_k1^(t−1), and θ_k2^(t−1) w.r.t. L_L2_SSIM(t)
8:     Update θ_x^t, θ_k1^t, and θ_k2^t using ADAM [44]
9: end for
10: x̂ = f_θx^T(z_x), k̂_1 = f_θk1^T(z_k1), and k̂_2 = f_θk2^T(z_k2)
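Under the same assumptions as the earlier sketches (tiny stand-in networks, illustrative sizes, and a plain MSE reconstruction term in place of the full adaptive L2_SSIM loss), the loop below shows the structure of Algorithm 1: perturb z_x, run the three networks, reblur, and update all parameters jointly with Adam. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reblur(x_hat, k_hat):
    c = x_hat.shape[1]
    w = k_hat[None, None].repeat(c, 1, 1, 1)
    return F.conv2d(x_hat, w, padding=k_hat.shape[-1] // 2, groups=c)

# Stand-in networks (see the sketches in Section 3.1 for layouts closer to Tables 1 and 2).
f_x = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(32, 1, 1), nn.Sigmoid())
def make_fk(k):
    return nn.Sequential(nn.Linear(200, 1000), nn.ReLU(),
                         nn.Linear(1000, k * k), nn.Softmax(dim=-1))
f_k1, f_k2 = make_fk(21), make_fk(27)

y1, y2 = torch.rand(1, 1, 255, 255), torch.rand(1, 1, 255, 255)   # blurry inputs
z_x = torch.rand(1, 8, 255, 255)
z_k1, z_k2 = torch.rand(200), torch.rand(200)

params = list(f_x.parameters()) + list(f_k1.parameters()) + list(f_k2.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

for t in range(1, 101):                          # T = 5000 in the paper; 100 here
    opt.zero_grad()
    z_in = z_x + 0.001 * torch.randn_like(z_x)   # step 3: perturb z_x
    x_hat = f_x(z_in)
    k1_hat = f_k1(z_k1).view(21, 21)
    k2_hat = f_k2(z_k2).view(27, 27)
    loss = F.mse_loss(reblur(x_hat, k1_hat), y1) + F.mse_loss(reblur(x_hat, k2_hat), y2)
    loss.backward()
    opt.step()

print(loss.item(), x_hat.shape)
```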

4. Experimental Results

4.1. Dataset

To evaluate the performance of our method, we used two image deblurring benchmark datasets: the Levin test set [33] and the Lai test set [45]. The proposed method solves the deblurring problem by using two observations, and two scenarios are possible: the two observations are degraded by a similar degree of blur (soft pairs), or the degrees of blur are very different from each other (hard pairs). To simulate these cases, we divided each test set into soft and hard pairs and used them for evaluation. The two test sets are described in the following.
1. Levin test set [33]: In their seminal work, Levin et al. [33] provided 8 blur kernels with sizes of k × k, where k = 13, 15, 17, 19, 21, 23, 27, and 4 sharp images, resulting in 32 blurry gray-scale images of size 255 × 255. To evaluate our method, we divided the pairs into soft and hard pairs on the basis of the difference in blur kernel size: if the difference was less than 5 pixels, we classified the image pair as a soft pair, and otherwise as a hard pair (a pairing sketch is given after this list). Following this pipeline, we randomly selected 7 soft pairs and 7 hard pairs, totaling 14 blurry pairs per image; in short, we prepared a total of 56 pairs of blurry images for evaluation. The composition of the Levin test set [33] is described in detail in Table 3. Specifically, the soft pairs comprised [13, 15], [15, 17], [17, 19], [19, 21], [21, 23a], [21, 23b], and [23a, 23b]. Here, each number represents the blur kernel size k. For example, [13, 15] means that blur kernels of size 13 × 13 and 15 × 15 are paired. Because the Levin test set contains two blur kernels with a size of 23 × 23, we denote them as 23a and 23b. The hard pairs comprised [13, 27], [15, 27], [17, 27], [19, 27], [21, 27], [23a, 27], and [23b, 27].
2. Lai test set [45]: We further compared our method on the Lai test set [45], which contains RGB images of various sizes. The Lai test set comprises 4 blur kernels and 25 sharp images, resulting in 100 blurry images. It is divided into five categories, Manmade, Natural, People, Saturated, and Text, with 20 images per category. The sizes of the 4 blur kernels are 31 × 31, 51 × 51, 55 × 55, and 75 × 75. Thus, we prepared one soft pair (i.e., [51, 55]) and 4 hard pairs (i.e., [31, 51], [31, 75], [51, 75], and [55, 75]). As described in Table 3, there are 25 sharp images and 5 blur kernel pairs; thus, a total of 125 pairs of blurry images are used for evaluation.
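The rule behind the soft/hard split can be sketched as follows; the paper then manually selects 7 soft and 7 hard pairs per image from these candidates, so the snippet only illustrates the classification criterion on the Levin kernel sizes.

```python
# Classify candidate kernel pairs as "soft" (size difference below 5 pixels) or "hard".
from itertools import combinations

sizes = [13, 15, 17, 19, 21, 23, 23, 27]          # two Levin kernels share the size 23
soft = [(a, b) for a, b in combinations(sizes, 2) if abs(a - b) < 5]
hard = [(a, b) for a, b in combinations(sizes, 2) if abs(a - b) >= 5]
print(len(soft), "soft candidates,", len(hard), "hard candidates")
```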

4.2. Implementation Details

We implemented our DualDeblur using PyTorch [46]. The networks were optimized using Adam [44] with a learning rate of 1 × 10⁻², β1 = 0.9, and β2 = 0.999. In our experiments, the total number of iterations was 5000, and the learning rate was decayed by a factor of 0.5 at 2000, 3000, and 4000 iterations. We empirically set the values of α and γ in Equation (5) to α = 10 and γ = 100. Following [17,18], we sampled the initial z_x, z_k1, and z_k2 from a uniform distribution with a fixed random seed of 0. All experiments with our model were conducted using a single NVIDIA TITAN RTX GPU.
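A sketch of these optimization settings in PyTorch is shown below; `model_parameters` is a placeholder for the combined parameters of the three networks, and the forward/backward pass is elided.

```python
import torch

model_parameters = [torch.nn.Parameter(torch.zeros(1))]      # placeholder parameters
optimizer = torch.optim.Adam(model_parameters, lr=1e-2, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2000, 3000, 4000], gamma=0.5)      # halve the lr at these steps

for t in range(5000):
    # ... forward pass, loss, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()
print(optimizer.param_groups[0]["lr"])   # 1e-2 * 0.5**3 = 1.25e-3 after 5000 steps
```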

4.3. Comparison on the Levin Test Set

For the Levin test set [33], we compared our DualDeblur with existing blind deconvolution methods (i.e., Krishnan et al. [32], Levin et al. [33], Cho & Lee [30], Xu & Jia [21], Sun et al. [37], Zuo et al. [29], and Pan-DCP [39]) and a DIP-based deblurring method (i.e., SelfDeblur [18]). Ref. [34] was used as the non-blind deconvolution to generate the final results of the previous methods. For quantitative comparison, we calculated the PSNR and SSIM [9] metrics using the code provided by [18]. Moreover, we report the FSIM [23] and LPIPS [43] distance to evaluate perceptual similarity. We also compared the error ratio [34], which is the ratio of the sum-of-squared-differences error of deconvolution with the estimated kernels to that of deconvolution with the ground-truth kernels.
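For reference, the PSNR used here can be computed as below, assuming images scaled to [0, 1]; SSIM, FSIM, LPIPS, and the error ratio come from the respective reference implementations and are not reproduced.

```python
import torch

def psnr(x_hat, x, max_val=1.0):
    """Peak signal-to-noise ratio (in dB) between a restored image and its ground truth."""
    mse = torch.mean((x_hat - x) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

print(psnr(torch.rand(1, 1, 255, 255), torch.rand(1, 1, 255, 255)))
```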
We computed the average PSNR, SSIM, error ratio, FSIM and LPIPS on the Levin test set for various methods (see Table 4). For a fair comparison, we reported the results for the soft and hard pairs that contained each kernel.
With the advantage of using multiple images, the results of our method were significantly superior to those of the previous methods in terms of all metrics. Specifically, our PSNR was 8.00 higher than that of the second-highest method, SelfDeblur [18], our SSIM was 0.0542 higher than that of the second-highest method, Zuo et al. [29], and our FSIM was 0.0378 higher than that of the second-highest method, Sun et al. [37]. Our method also showed superior performance in terms of the LPIPS distance compared to the other methods. Note that our method performed remarkably well regardless of the difference in blur kernel size between the two given images. Our experimental results show that the average results of the hard pairs are slightly better than those of the soft pairs. We believe this is because the complementary information between the two images is important for deblurring, and the hard pairs often include more complementary information than the soft pairs. In Figure 3, we compare our soft- and hard-pair results with the previous methods; the results of the previous methods correspond to input 1 in Figure 3. Ours {1,2} is the soft-pair result of input 1 and input 2, and ours {1,3} is the hard-pair result of input 1 and input 3. Our method outperforms the other methods in restoring sharp edges and fine details for both soft and hard pairs. The blur kernels estimated by DualDeblur are considerably closer to the ground truth.
As shown in Table 5, we measured the inference time and the number of model parameters of our method and SelfDeblur [18]. We measured the average inference time for a single image using the Levin test set [33]. The inference times of our model and SelfDeblur [18] were measured on a PC with an NVIDIA TITAN RTX GPU, whereas the other methods were measured on a PC with a 3.30 GHz Intel(R) Xeon(R) CPU, as reported in [18]. Our model has a longer inference time and more parameters than SelfDeblur [18] because it optimizes three networks, whereas SelfDeblur [18] optimizes two.
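A sketch of how the average per-image inference (optimization) time can be measured on the GPU is shown below; `deblur_one_image` is a hypothetical wrapper around the full DualDeblur optimization for one blurry pair.

```python
import time
import torch

def time_one_run(deblur_one_image, inputs):
    """Wall-clock time for one full optimization, synchronizing the GPU around the call."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()          # make sure previously queued GPU work is done
    start = time.perf_counter()
    deblur_one_image(*inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start

# times = [time_one_run(deblur_one_image, pair) for pair in levin_pairs]
# print(sum(times) / len(times))          # average inference time per image
```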

4.4. Comparison on the Lai Test Set

For the Lai test set [45], our method was compared with those of Cho and Lee [30], Xu and Jia [21], Xu et al. [35], Michaeli et al. [38], Perrone et al. [27], Pan-DCP [39], and SelfDeblur [18]. In the previous methods, after blur kernel estimation, ref. [47] was applied as the non-blind deconvolution for the Saturated category and ref. [31] for the other categories. In Table 6, our DualDeblur achieves better quantitative metrics than the previous methods. Our average results on the Lai test set [45] are 7.72 higher in PSNR and 0.2136 higher in SSIM than the second-highest SelfDeblur [18]. The LPIPS results show that our method restores perceptually higher-quality images than the other methods. Additionally, our method performs better for all blur kernels, which shows that the proposed DualDeblur handles large and diverse images well. Both our soft and hard pairs outperform the results of the previous methods.
Figures 4 and 5 show, through a qualitative comparison, that our DualDeblur is visually superior to the previous methods. The kernels estimated by our DualDeblur are highly accurate compared with those of the other methods. Although other methods suffer from blur or ringing artifacts, our results are perceptually superior with rich texture (see the details in Figure 4). Additionally, Figure 5 shows the high-quality details of our result; only our method accurately reconstructs the stripes of the tie.
In Figure 6, our method shows superior results when using two blurry images that cannot be deblurred by the previous methods. In other words, our method performs deblurring by jointly using two severely damaged blurry images, each of which contains little information. In the third row of Figure 6, SelfDeblur [18] fails to estimate the blur kernels for both input 1 and input 2, whereas our method estimates the blur kernels and the final image well.

4.5. Ablation Study

To investigate the effectiveness of the proposed dual architecture and adaptive L2_SSIM loss, we conducted ablation studies. After equalizing the loss, we compared the dual architecture (called DualDeblur-A) with [18] to investigate the effect of the dual architecture. Furthermore, we demonstrated the effectiveness of our adaptive L2_SSIM loss by comparing models optimized using L_L2_SSIM with those using only L_2 or L_SSIM. Models DualDeblur-B and DualDeblur-C have the same architecture as DualDeblur-A; however, DualDeblur-B uses only L_2 in Equation (3) and DualDeblur-C uses only L_SSIM in Equation (4) for optimization. Finally, we define DualDeblur as the model using the proposed L_L2_SSIM in Equation (5). The quantitative and qualitative comparisons are shown in Table 7 and Figure 7, respectively.

4.5.1. Effects of Dual Architecture

Unlike SelfDeblur [18], which performs deblurring with a single observation, our method leverages multiple observations via a dual architecture. In our experiments, DualDeblur-A, which uses a dual architecture, significantly improved the deblurring performance compared to SelfDeblur (see (a) and (b) in Table 7). The PSNR and SSIM results of DualDeblur-A increased by 2.68 and 0.0098, respectively, compared to those of SelfDeblur. For FSIM and LPIPS, the results of DualDeblur-A are also better than those of SelfDeblur by 0.0738 and 0.0334, respectively. This indicates that using multiple images is more helpful for deblurring than using a single image and that the proposed method effectively handles multiple images during the deblurring procedure. The results of DualDeblur-A and DualDeblur-B (see Table 7) show that the performance of DualDeblur-B, which omits TV regularization, is similar to that of DualDeblur-A. These results show that the dual architecture works well without an additional regularizer.

4.5.2. Effects of Adaptive L2_SSIM Loss

The proposed adaptive L2_SSIM loss, formulated as a weighted sum of L_2 and L_SSIM, focuses on restoring the intensity values per pixel first and then gradually restores the structure. By using the proposed adaptive L2_SSIM loss, we aim to exploit the advantages of the L_2 and L_SSIM loss functions and complement their limitations. To demonstrate the effectiveness of the adaptive L2_SSIM loss, we compare the performance of DualDeblur optimized with various loss functions: (1) DualDeblur-B using the L_2 loss, (2) DualDeblur-C using the L_SSIM loss, and (3) DualDeblur using the L_L2_SSIM loss.
When optimizing our model using only the L_2 loss, the quantitative results are the worst in PSNR and SSIM among the dual-architecture variants (see Table 7). As shown in Figure 7, the results of our method using only the L_2 loss are overly smooth and fail to restore details. To overcome this, we employed the structural loss (L_SSIM) in our method to enhance the perceptual quality and structural details in local regions [48]. Figure 7 also shows that using L_SSIM helps restore details of the image better than using only the L_2 loss. However, L_SSIM does not restore accurate pixel intensities, and corrupted structures in the blurry observations may lead to unexpected structures in the resulting images.
In contrast, Figure 7 shows that the results with our adaptive L2_SSIM loss L_L2_SSIM not only restore accurate pixel values but also restore the details and sharp edges of the image. As shown in Table 7, DualDeblur achieves the best results in most metrics, including PSNR, SSIM, and LPIPS, except FSIM. Specifically, the results of DualDeblur show that the average PSNR increases by 5.26 and 1.78 compared with those of DualDeblur-B and DualDeblur-C, respectively. In addition, the average SSIM of DualDeblur is 0.0212 higher than that of the second-highest DualDeblur-C, its average FSIM is 0.0197 lower than that of the highest DualDeblur-A, and its average LPIPS is 0.0287 better than that of the second-best DualDeblur-A. Figure 8a demonstrates the effectiveness of our adaptive L2_SSIM loss, which outperforms all other losses at every iteration. Figure 8b shows the change of ω(t) in Equation (5), the weight of the adaptive L2_SSIM loss, over the training iterations. As mentioned earlier, L_2 is weighted more than L_SSIM in the initial iterations, and the weight of L_SSIM increases exponentially.
As shown in Table 8, we conducted various experiments on α and γ in Equation (5). The results show that the model with α = 10 and γ = 100 gives the best results for both PSNR and SSIM, whereas the model with α = 50 and γ = 200 is the best for FSIM and LPIPS. We selected the model with α = 10 and γ = 100 because PSNR and SSIM are the most commonly used metrics.

5. Conclusions

In this paper, we proposed the DualDeblur framework to restore a single sharp image using multiple blurry images. Our framework adopts a dual architecture to utilize the complementary information of two blurry images for obtaining a single sharp image. We proposed an adaptive L2_SSIM loss to ensure both pixel accuracy and structural details. For a practical and accurate performance evaluation, we divided the blur pairs into soft and hard pairs. Extensive comparisons demonstrated the superior results of our DualDeblur compared to previous methods in both quantitative and qualitative evaluations.

Author Contributions

Conceptualization, C.J.S., T.B.L. and Y.S.H.; software, C.J.S.; validation, C.J.S.; investigation, C.J.S. and T.B.L.; writing—original draft preparation, C.J.S.; writing—review and editing, C.J.S., T.B.L. and Y.S.H.; supervision, Y.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1C1C1007446), and in part by the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991014091).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  3. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  4. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–16 December 2015; pp. 1440–1448. [Google Scholar]
  5. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  7. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  8. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  9. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Nah, S.; Hyun Kim, T.; Mu Lee, K. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3883–3891. [Google Scholar]
  11. Su, S.; Delbracio, M.; Wang, J.; Sapiro, G.; Heidrich, W.; Wang, O. Deep video deblurring for hand-held cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1279–1288. [Google Scholar]
  12. Tao, X.; Gao, H.; Shen, X.; Wang, J.; Jia, J. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8174–8182. [Google Scholar]
  13. Zhang, J.; Pan, J.; Ren, J.; Song, Y.; Bao, L.; Lau, R.W.; Yang, M.H. Dynamic scene deblurring using spatially variant recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2521–2529. [Google Scholar]
  14. Zhang, H.; Dai, Y.; Li, H.; Koniusz, P. Deep stacked hierarchical multi-patch network for image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5978–5986. [Google Scholar]
  15. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Multi-stage progressive image restoration. arXiv 2021, arXiv:2102.02808. [Google Scholar]
  16. Quan, Y.; Chen, M.; Pang, T.; Ji, H. Self2self with dropout: Learning self-supervised denoising from single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1890–1898. [Google Scholar]
  17. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9446–9454. [Google Scholar]
  18. Ren, D.; Zhang, K.; Wang, Q.; Hu, Q.; Zuo, W. Neural blind deconvolution using deep priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3341–3350. [Google Scholar]
  19. Zhang, H.; Wipf, D.; Zhang, Y. Multi-image blind deblurring using a coupled adaptive sparse prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–27 June 2013; pp. 1051–1058. [Google Scholar]
  20. Rav-Acha, A.; Peleg, S. Two motion-blurred images are better than one. Pattern Recognit. Lett. 2005, 26, 311–317. [Google Scholar] [CrossRef]
  21. Xu, L.; Jia, J. Two-phase kernel estimation for robust motion deblurring. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 157–170. [Google Scholar]
  22. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402. [Google Scholar]
  23. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Wang, H.; Yue, Z.; Zhao, Q.; Meng, D. A Deep Variational Bayesian Framework for Blind Image Deblurring. arXiv 2021, arXiv:2106.02884. [Google Scholar]
  25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  26. Chan, T.F.; Wong, C.K. Total variation blind deconvolution. IEEE Trans. Image Process. 1998, 7, 370–375. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Perrone, D.; Favaro, P. Total variation blind deconvolution: The devil is in the details. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2909–2916. [Google Scholar]
  28. Fergus, R.; Singh, B.; Hertzmann, A.; Roweis, S.T.; Freeman, W.T. Removing camera shake from a single photograph. In ACM SIGGRAPH 2006 Papers; Association for Computing Machinery: New York, NY, USA, 2006; pp. 787–794. [Google Scholar]
  29. Zuo, W.; Ren, D.; Zhang, D.; Gu, S.; Zhang, L. Learning iteration-wise generalized shrinkage–thresholding operators for blind deconvolution. IEEE Trans. Image Process. 2016, 25, 1751–1764. [Google Scholar] [CrossRef]
  30. Cho, S.; Lee, S. Fast motion deblurring. In ACM SIGGRAPH Asia 2009 Papers; Association for Computing Machinery: New York, NY, USA, 2009; pp. 1–8. [Google Scholar]
  31. Krishnan, D.; Fergus, R. Fast image deconvolution using hyper-Laplacian priors. Adv. Neural Inf. Process. Syst. 2009, 22, 1033–1041. [Google Scholar]
  32. Krishnan, D.; Tay, T.; Fergus, R. Blind deconvolution using a normalized sparsity measure. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 233–240. [Google Scholar]
  33. Levin, A.; Weiss, Y.; Durand, F.; Freeman, W.T. Understanding and evaluating blind deconvolution algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1964–1971. [Google Scholar]
  34. Levin, A.; Weiss, Y.; Durand, F.; Freeman, W.T. Efficient marginal likelihood optimization in blind deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 2657–2664. [Google Scholar]
  35. Xu, L.; Zheng, S.; Jia, J. Unnatural l0 sparse representation for natural image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1107–1114. [Google Scholar]
  36. Pan, J.; Hu, Z.; Su, Z.; Yang, M.H. l_0-regularized intensity and gradient prior for deblurring text images and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 342–355. [Google Scholar] [CrossRef] [PubMed]
  37. Sun, L.; Cho, S.; Wang, J.; Hays, J. Edge-based blur kernel estimation using patch priors. In Proceedings of the IEEE International Conference on Computational Photography, Cambridge, MA, USA, 19–21 April 2013; pp. 1–8. [Google Scholar]
  38. Michaeli, T.; Irani, M. Blind deblurring using internal patch recurrence. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 783–798. [Google Scholar]
  39. Pan, J.; Sun, D.; Pfister, H.; Yang, M.H. Deblurring images via dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2315–2328. [Google Scholar] [CrossRef] [PubMed]
  40. Sun, J.; Cao, W.; Xu, Z.; Ponce, J. Learning a convolutional neural network for non-uniform motion blur removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 7–9 May 2015; pp. 769–777. [Google Scholar]
  41. Chakrabarti, A. A neural approach to blind motion deblurring. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 221–235. [Google Scholar]
  42. Sajjadi, M.S.; Scholkopf, B.; Hirsch, M. Enhancenet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, Honolulu, HI, USA, 21–26 July 2017; pp. 4491–4500. [Google Scholar]
  43. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  44. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  45. Lai, W.S.; Huang, J.B.; Hu, Z.; Ahuja, N.; Yang, M.H. A comparative study for single image blind deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1701–1709. [Google Scholar]
  46. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
  47. Whyte, O.; Sivic, J.; Zisserman, A. Deblurring shaken and partially saturated images. Int. J. Comput. Vis. 2014, 110, 185–201. [Google Scholar] [CrossRef]
  48. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 2016, 3, 47–57. [Google Scholar] [CrossRef]
Figure 1. Visual quality comparison. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are blurry image 1 and blurry image 2). (a) Ground-truth image. (b) Blurry image with kernel size 55 × 55. (c) Blurry image with kernel size 75 × 75. (d,e) Results of [21] corresponding to (b,c), respectively. In (d), PSNR is 15.33 and in (e), PSNR is 14.45. (f,g) Results of [18] corresponding to (b,c), respectively. In (f), PSNR is 21.03 and in (g), PSNR is 20.15. (h) Our result. In (h), PSNR is 26.82.
Figure 2. Architecture of the proposed DualDeblur.
Figure 3. Qualitative comparisons on the Levin test set [33]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 4. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 5. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 6. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 7. Ablation study. Qualitative comparisons on the Levin test set [33]. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 8. (a) PSNR versus number of training iterations for ablation study. (b) Value of ω ( t ) versus number of training iterations.
Table 1. Architecture of f_θx(·). We adopt a U-Net [7] with skip connections as the architecture of f_θx(·). Conv2d represents a 2-D convolution operation, "lReLU" denotes a leaky ReLU, and ⊕ denotes channel-wise concatenation. Kernel (m, n × n, p) represents the number of filters m, the filter size n × n, and the padding p. We implement downsampling with stride 2 and upsampling with bilinear interpolation. C represents the number of image channels, and W_x × H_x the image size.
Input: z_x (8 × W_x × H_x) sampled from a uniform distribution
Output: latent image x̂ (C × W_x × H_x)

| Encoder | Operation | Kernel | In | Out | Decoder | Operation | Kernel | In | Out |
|---|---|---|---|---|---|---|---|---|---|
| Encoder 1 | Conv2d, lReLU | 128, 3 × 3, 1 | z_x | e_1 | Decoder 1 | Conv2d, lReLU | 128, 3 × 3, 1 | e_5 ⊕ s_5 | d_1 |
| Skip 1 | Conv2d, lReLU | 16, 3 × 3, 1 | e_1 | s_1 | | | | | |
| Encoder 2 | Conv2d, lReLU | 128, 3 × 3, 1 | e_1 | e_2 | Decoder 2 | Conv2d, lReLU | 128, 3 × 3, 1 | d_1 ⊕ s_4 | d_2 |
| Skip 2 | Conv2d, lReLU | 16, 3 × 3, 1 | e_2 | s_2 | | | | | |
| Encoder 3 | Conv2d, lReLU | 128, 3 × 3, 1 | e_2 | e_3 | Decoder 3 | Conv2d, lReLU | 128, 3 × 3, 1 | d_2 ⊕ s_3 | d_3 |
| Skip 3 | Conv2d, lReLU | 16, 3 × 3, 1 | e_3 | s_3 | | | | | |
| Encoder 4 | Conv2d, lReLU | 128, 3 × 3, 1 | e_3 | e_4 | Decoder 4 | Conv2d, lReLU | 128, 3 × 3, 1 | d_3 ⊕ s_2 | d_4 |
| Skip 4 | Conv2d, lReLU | 16, 3 × 3, 1 | e_4 | s_4 | | | | | |
| Encoder 5 | Conv2d, lReLU | 128, 3 × 3, 1 | e_4 | e_5 | Decoder 5 | Conv2d, lReLU | 128, 3 × 3, 1 | d_4 ⊕ s_1 | d_5 |
| Skip 5 | Conv2d, lReLU | 16, 3 × 3, 1 | e_5 | s_5 | | | | | |
| | | | | | Output layer | Conv2d, Sigmoid | C, 1 × 1, 0 | d_5 | x̂ |
Table 2. Architecture of f_θki(·). We adopt an FCN as each blur kernel estimator network f_θki(·). W_ki × H_ki represents the blur kernel size. f_θki(·) takes a 200-dimensional input and has 1000 nodes in the hidden layer and W_ki × H_ki nodes in the last layer. The 1-D output is reshaped to the 2-D blur kernel size.
Input: z_ki (200) sampled from a uniform distribution; blur kernel size W_ki × H_ki
Output: blur kernel k_i (W_ki × H_ki)

| FCN | Operation |
|---|---|
| Layer 1 | Linear (200, 1000), ReLU |
| Layer 2 | Linear (1000, W_ki × H_ki), SoftMax |
Table 3. Configurations of the Levin test set [33] and the Lai test set [45].

| Test Set | # GT Images | # Blur Kernels | # Blurry Images | # Soft Pairs | # Hard Pairs | # Total Pairs |
|---|---|---|---|---|---|---|
| Levin test set [33] | 4 | 8 | 32 | 28 | 28 | 56 |
| Lai test set [45] | 25 | 4 | 100 | 25 | 100 | 125 |
Table 4. Quantitative comparisons on the Levin test set [33]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The best results are highlighted. The blur kernel "Avg." denotes the average PSNR, SSIM, error ratio, FSIM, and LPIPS results over all blur kernels.

| Method | Blur Kernel | PSNR ↑ | SSIM ↑ | Error Ratio ↓ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| known k * | 13 | 36.53 | 0.9659 | 1.0000 | 0.8868 | 0.0530 |
| Krishnan et al. * [32] | 13 | 34.88 | 0.9575 | 1.1715 | 0.9116 | 0.0604 |
| Cho & Lee * [30] | 13 | 33.93 | 0.9532 | 1.2536 | 0.8578 | 0.0925 |
| Levin et al. * [34] | 13 | 34.29 | 0.9533 | 1.3454 | 0.8213 | 0.0922 |
| Xu & Jia * [21] | 13 | 34.10 | 0.9532 | 1.2846 | 0.8612 | 0.0939 |
| Sun et al. * [37] | 13 | 36.24 | 0.9659 | 0.9933 | 0.8639 | 0.0685 |
| Zuo et al. * [29] | 13 | 35.28 | 0.9598 | 1.0686 | 0.8449 | 0.0892 |
| Pan-DCP * [39] | 13 | 35.47 | 0.9591 | 1.0690 | 0.8359 | 0.0887 |
| SelfDeblur [18] | 13 | 33.03 | 0.9388 | 1.5078 | 0.8731 | 0.0938 |
| Ours (soft) | 13, 15 | 39.93 | 0.9863 | 0.5942 | 0.9424 | 0.0283 |
| Ours (hard) | 13, 27 | 41.17 | 0.9879 | 0.3475 | 0.9018 | 0.0307 |
| known k * | 15 | 35.33 | 0.9525 | 1.0000 | 0.8167 | 0.0919 |
| Krishnan et al. * [32] | 15 | 34.87 | 0.9481 | 1.0563 | 0.7862 | 0.1201 |
| Cho & Lee * [30] | 15 | 33.88 | 0.9429 | 1.3191 | 0.7891 | 0.1226 |
| Levin et al. * [34] | 15 | 30.94 | 0.8950 | 2.5613 | 0.8003 | 0.1199 |
| Xu & Jia * [21] | 15 | 33.04 | 0.9355 | 1.4272 | 0.7763 | 0.1417 |
| Sun et al. * [37] | 15 | 34.96 | 0.9497 | 1.1277 | 0.7887 | 0.1073 |
| Zuo et al. * [29] | 15 | 34.31 | 0.9442 | 1.1660 | 0.7717 | 0.1281 |
| Pan-DCP * [39] | 15 | 34.19 | 0.9415 | 1.1244 | 0.7495 | 0.1259 |
| SelfDeblur [18] | 15 | 33.80 | 0.9409 | 1.3533 | 0.8000 | 0.1030 |
| Ours (soft) | 15, 17 | 40.41 | 0.9857 | 0.4562 | 0.8770 | 0.0448 |
| Ours (hard) | 15, 27 | 40.90 | 0.9862 | 0.3757 | 0.8177 | 0.0578 |
| known k * | 17 | 33.17 | 0.9386 | 1.0000 | 0.7491 | 0.1176 |
| Krishnan et al. * [32] | 17 | 31.69 | 0.9160 | 1.2328 | 0.7605 | 0.1317 |
| Cho & Lee * [30] | 17 | 31.71 | 0.9203 | 1.1958 | 0.7760 | 0.1334 |
| Levin et al. * [34] | 17 | 29.61 | 0.8892 | 1.6049 | 0.7122 | 0.1613 |
| Xu & Jia * [21] | 17 | 30.54 | 0.9028 | 1.4637 | 0.7443 | 0.1528 |
| Sun et al. * [37] | 17 | 32.67 | 0.9318 | 1.1492 | 0.7584 | 0.1229 |
| Zuo et al. * [29] | 17 | 32.31 | 0.9278 | 1.1495 | 0.7471 | 0.1406 |
| Pan-DCP * [39] | 17 | 31.82 | 0.9215 | 1.2084 | 0.7405 | 0.1397 |
| SelfDeblur [18] | 17 | 33.12 | 0.9275 | 0.9403 | 0.7721 | 0.1251 |
| Ours (soft) | 17, 19 | 40.99 | 0.9876 | 0.3630 | 0.8157 | 0.0565 |
| Ours (hard) | 17, 27 | 40.53 | 0.9864 | 0.2984 | 0.8506 | 0.0454 |
| known k * | 19 | 34.04 | 0.9424 | 1.0000 | 0.8607 | 0.0719 |
| Krishnan et al. * [32] | 19 | 32.87 | 0.9325 | 1.1749 | 0.8257 | 0.0939 |
| Cho & Lee * [30] | 19 | 32.20 | 0.9231 | 1.2596 | 0.8552 | 0.1027 |
| Levin et al. * [34] | 19 | 31.03 | 0.9106 | 1.6047 | 0.8101 | 0.1146 |
| Xu & Jia * [21] | 19 | 32.58 | 0.9294 | 1.1322 | 0.8732 | 0.0999 |
| Sun et al. * [37] | 19 | 32.97 | 0.9312 | 1.2007 | 0.8810 | 0.0747 |
| Zuo et al. * [29] | 19 | 33.28 | 0.9355 | 0.9873 | 0.8750 | 0.9515 |
| Pan-DCP * [39] | 19 | 32.50 | 0.9250 | 1.1536 | 0.8613 | 0.1031 |
| SelfDeblur [18] | 19 | 33.11 | 0.9232 | 1.1142 | 0.8292 | 0.1182 |
| Ours (soft) | 19, 21 | 41.82 | 0.9893 | 0.4726 | 0.7233 | 0.0955 |
| Ours (hard) | 19, 27 | 40.73 | 0.9874 | 0.3351 | 0.7937 | 0.0703 |
| known k * | 21 | 36.41 | 0.9672 | 1.0000 | 0.7725 | 0.1441 |
| Krishnan et al. * [32] | 21 | 30.59 | 0.9249 | 2.9369 | 0.7725 | 0.1021 |
| Cho & Lee * [30] | 21 | 30.46 | 0.9143 | 2.5131 | 0.7926 | 0.1106 |
| Levin et al. * [34] | 21 | 32.26 | 0.9376 | 2.0328 | 0.7239 | 0.1287 |
| Xu & Jia * [21] | 21 | 33.82 | 0.9509 | 1.4399 | 0.8084 | 0.1029 |
| Sun et al. * [37] | 21 | 33.29 | 0.9402 | 1.7488 | 0.8279 | 0.0774 |
| Zuo et al. * [29] | 21 | 33.65 | 0.9515 | 1.5416 | 0.8067 | 0.0942 |
| Pan-DCP * [39] | 21 | 34.49 | 0.9518 | 1.3103 | 0.8008 | 0.0997 |
| SelfDeblur [18] | 21 | 32.52 | 0.9402 | 1.9913 | 0.8058 | 0.0946 |
| Ours (soft) | 21, 23a | 40.39 | 0.9879 | 0.5244 | 0.8751 | 0.0374 |
| Ours (hard) | 21, 27 | 41.94 | 0.9895 | 0.3482 | 0.8702 | 0.0456 |
| known k * | 23a | 35.21 | 0.9573 | 1.0000 | 0.8222 | 0.1169 |
| Krishnan et al. * [32] | 23a | 23.75 | 0.7700 | 4.6599 | 0.8657 | 0.1497 |
| Cho & Lee * [30] | 23a | 28.67 | 0.8856 | 2.3186 | 0.8403 | 0.1276 |
| Levin et al. * [34] | 23a | 30.05 | 0.9126 | 2.0796 | 0.7516 | 0.1419 |
| Xu & Jia * [21] | 23a | 29.48 | 0.8651 | 2.4357 | 0.8494 | 0.1428 |
| Sun et al. * [37] | 23a | 32.48 | 0.9379 | 1.3988 | 0.8690 | 0.0858 |
| Zuo et al. * [29] | 23a | 31.99 | 0.9344 | 1.5303 | 0.8944 | 0.0972 |
| Pan-DCP * [39] | 23a | 32.69 | 0.9361 | 1.2969 | 0.8705 | 0.0949 |
| SelfDeblur [18] | 23a | 34.29 | 0.9478 | 0.9519 | 0.8524 | 0.0757 |
| Ours (soft) | 21, 23b | 40.73 | 0.9880 | 0.4385 | 0.8843 | 0.0365 |
| Ours (hard) | 23b, 27 | 40.80 | 0.9867 | 0.2285 | 0.9167 | 0.0267 |
| known k * | 23b | 33.58 | 0.9493 | 1.0000 | 0.7483 | 0.1153 |
| Krishnan et al. * [32] | 23b | 26.67 | 0.7924 | 2.5681 | 0.8195 | 0.1429 |
| Cho & Lee * [30] | 23b | 27.84 | 0.8510 | 1.6925 | 0.7802 | 0.1529 |
| Levin et al. * [34] | 23b | 29.58 | 0.9012 | 1.4543 | 0.7785 | 0.1379 |
| Xu & Jia * [21] | 23b | 30.35 | 0.9096 | 1.2175 | 0.8744 | 0.1142 |
| Sun et al. * [37] | 23b | 31.98 | 0.9331 | 1.1005 | 0.8653 | 0.0882 |
| Zuo et al. * [29] | 23b | 31.35 | 0.9306 | 1.1356 | 0.8845 | 0.1009 |
| Pan-DCP * [39] | 23b | 31.43 | 0.9267 | 1.2614 | 0.8605 | 0.0935 |
| SelfDeblur [18] | 23b | 33.05 | 0.9304 | 0.9651 | 0.7986 | 0.1091 |
| Ours (soft) | 23a, 23b | 40.74 | 0.9851 | 0.2646 | 0.9092 | 0.0339 |
| Ours (hard) | 23a, 27 | 41.40 | 0.9877 | 0.2700 | 0.8996 | 0.0357 |
| known k * | Avg. | 34.53 | 0.9492 | 1.0000 | 0.7754 | 0.1058 |
| Krishnan et al. * [32] | Avg. | 29.88 | 0.8666 | 2.4523 | 0.8046 | 0.1282 |
| Cho & Lee * [30] | Avg. | 30.57 | 0.8966 | 1.7113 | 0.8051 | 0.1280 |
| Levin et al. * [34] | Avg. | 30.80 | 0.9092 | 1.7724 | 0.7708 | 0.1301 |
| Xu & Jia * [21] | Avg. | 31.67 | 0.9163 | 1.4898 | 0.8253 | 0.1232 |
| Sun et al. * [37] | Avg. | 32.99 | 0.9330 | 1.2847 | 0.8349 | 0.0935 |
| Zuo et al. * [29] | Avg. | 32.66 | 0.9332 | 1.2500 | 0.8361 | 0.1084 |
| Pan-DCP * [39] | Avg. | 32.69 | 0.9284 | 1.2555 | 0.8161 | 0.1114 |
| SelfDeblur [18] | Avg. | 33.07 | 0.9313 | 1.1968 | 0.8086 | 0.1082 |
| Ours (soft) | Avg. | 40.72 | 0.9871 | 0.4448 | 0.8610 | 0.0476 |
| Ours (hard) | Avg. | 41.07 | 0.9874 | 0.3148 | 0.8643 | 0.0446 |
Table 5. Comparison of the average inference time on the Levin test set [33] and the number of model parameters. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result.

| Method | Time (s) | Parameters (M) |
|---|---|---|
| Krishnan et al. * [32] | 8.9400 | - |
| Cho & Lee * [30] | 1.3951 | - |
| Levin et al. * [34] | 78.263 | - |
| Xu & Jia * [21] | 1.1840 | - |
| Sun et al. * [37] | 191.03 | - |
| Zuo et al. * [29] | 10.998 | - |
| Pan-DCP * [39] | 295.23 | - |
| SelfDeblur [18] | 368.57 | 29.1 |
| Ours | 423.49 | 35.9 |
Table 6. Quantitative comparisons on the Lai test set [45]. The methods marked with * adopt [31,47] as the non-blind deconvolution for the final result after kernel estimation: ref. [47] is adopted for the Saturated category, and ref. [31] for the other categories. The best results are highlighted. The blur kernel "Avg." denotes the average PSNR, SSIM, FSIM, and LPIPS results over all blur kernels.

| Method | Blur Kernel | PSNR ↑ | SSIM ↑ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| Cho & Lee * [30] | 31 | 19.60 | 0.6664 | 0.7182 | 0.3855 |
| Xu & Jia * [21] | 31 | 23.70 | 0.8534 | 0.8069 | 0.3099 |
| Xu et al. * [35] | 31 | 22.90 | 0.8077 | 0.7928 | 0.3151 |
| Michaeli et al. * [38] | 31 | 22.02 | 0.7499 | 0.7668 | 0.3492 |
| Perrone et al. * [27] | 31 | 22.12 | 0.8279 | 0.7562 | 0.3501 |
| Pan-L0 * [36] | 31 | 22.58 | 0.8405 | 0.7886 | 0.3267 |
| Pan-DCP * [39] | 31 | 23.38 | 0.8478 | 0.8029 | 0.3580 |
| SelfDeblur [18] | 31 | 22.40 | 0.8345 | 0.8005 | 0.4205 |
| Ours (hard) | 31, 51 | 28.57 | 0.9711 | 0.8056 | 0.1959 |
| Ours (hard) | 31, 75 | 29.09 | 0.9751 | 0.8276 | 0.1691 |
| Cho & Lee * [30] | 51 | 16.74 | 0.4342 | 0.6394 | 0.4996 |
| Xu & Jia * [21] | 51 | 19.69 | 0.6821 | 0.6773 | 0.3982 |
| Xu et al. * [35] | 51 | 19.18 | 0.6603 | 0.6703 | 0.4073 |
| Michaeli et al. * [38] | 51 | 18.07 | 0.4995 | 0.6562 | 0.4791 |
| Perrone et al. * [27] | 51 | 16.21 | 0.4471 | 0.6358 | 0.5002 |
| Pan-L0 * [36] | 51 | 18.08 | 0.6233 | 0.6637 | 0.4271 |
| Pan-DCP * [39] | 51 | 19.69 | 0.6961 | 0.6736 | 0.4475 |
| SelfDeblur [18] | 51 | 21.27 | 0.7748 | 0.7928 | 0.4708 |
| Ours (soft) | 51, 55 | 28.32 | 0.9598 | 0.8034 | 0.2131 |
| Ours (hard) | 51, 75 | 28.78 | 0.9613 | 0.8252 | 0.1781 |
| Cho & Lee * [30] | 55 | 16.99 | 0.4857 | 0.6581 | 0.4863 |
| Xu & Jia * [21] | 55 | 18.98 | 0.6454 | 0.6794 | 0.4179 |
| Xu et al. * [35] | 55 | 18.12 | 0.5859 | 0.6707 | 0.4386 |
| Michaeli et al. * [38] | 55 | 17.66 | 0.4945 | 0.6554 | 0.4942 |
| Perrone et al. * [27] | 55 | 17.33 | 0.5607 | 0.6657 | 0.4545 |
| Pan-L0 * [36] | 55 | 17.19 | 0.5367 | 0.6542 | 0.4602 |
| Pan-DCP * [39] | 55 | 18.71 | 0.6136 | 0.6637 | 0.4520 |
| SelfDeblur [18] | 55 | 20.84 | 0.7590 | 0.7017 | 0.5112 |
| Ours (hard) | 55, 75 | 28.72 | 0.9624 | 0.8337 | 0.1813 |
| Cho & Lee * [30] | Avg. | 17.06 | 0.4801 | 0.6571 | 0.4997 |
| Xu & Jia * [21] | Avg. | 20.18 | 0.7080 | 0.7123 | 0.4121 |
| Xu et al. * [35] | Avg. | 19.23 | 0.6593 | 0.6971 | 0.4278 |
| Michaeli et al. * [38] | Avg. | 18.37 | 0.5181 | 0.6729 | 0.4904 |
| Perrone et al. * [27] | Avg. | 18.48 | 0.6130 | 0.6887 | 0.4568 |
| Pan-L0 * [36] | Avg. | 18.54 | 0.6248 | 0.6888 | 0.4454 |
| Pan-DCP * [39] | Avg. | 19.89 | 0.6656 | 0.6987 | 0.4625 |
| SelfDeblur [18] | Avg. | 20.97 | 0.7524 | 0.7488 | 0.5076 |
| Ours (average) | Avg. | 28.69 | 0.9660 | 0.8191 | 0.1875 |
Table 7. Ablation study on the Levin test set [33]. The best results are highlighted.

| Approach | Loss Fn. | PSNR ↑ | SSIM ↑ | Error Ratio ↓ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| (a) SelfDeblur [18] | L_2 + TV | 33.07 | 0.9438 | 1.2509 | 0.8086 | 0.1082 |
| (b) DualDeblur-A | L_2 + TV | 35.75 | 0.9536 | 0.6921 | 0.8824 | 0.0748 |
| (c) DualDeblur-B | L_2 | 35.63 | 0.9528 | 0.7087 | 0.8816 | 0.0758 |
| (d) DualDeblur-C | L_SSIM | 39.11 | 0.9661 | 0.6226 | 0.7890 | 0.0819 |
| (e) DualDeblur | L_L2_SSIM | 40.89 | 0.9873 | 0.3798 | 0.8627 | 0.0461 |
Table 8. Influence of α and γ in Equation (5) on the Levin test set [33]. The best results are highlighted.

| α | γ | PSNR ↑ | SSIM ↑ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| 1 | 10 | 38.85 | 0.9649 | 0.7770 | 0.0870 |
| 1 | 100 | 39.69 | 0.9766 | 0.7904 | 0.0780 |
| 1 | 200 | 40.65 | 0.9858 | 0.8126 | 0.0660 |
| 10 | 10 | 39.77 | 0.9799 | 0.8073 | 0.0684 |
| 10 | 100 | 40.89 | 0.9873 | 0.8627 | 0.0461 |
| 10 | 200 | 40.70 | 0.9872 | 0.8592 | 0.0487 |
| 50 | 10 | 39.33 | 0.9826 | 0.8610 | 0.0514 |
| 50 | 100 | 39.27 | 0.9818 | 0.8756 | 0.0465 |
| 50 | 200 | 38.96 | 0.9805 | 0.8784 | 0.0459 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
