Article

Dual Image Deblurring Using Deep Image Prior

by Chang Jong Shin, Tae Bok Lee and Yong Seok Heo
1 Department of Artificial Intelligence, Ajou University, Suwon 16499, Korea
2 Department of Electrical and Computer Engineering, Ajou University, Suwon 16499, Korea
* Author to whom correspondence should be addressed.
Electronics 2021, 10(17), 2045; https://doi.org/10.3390/electronics10172045
Submission received: 16 July 2021 / Revised: 16 August 2021 / Accepted: 18 August 2021 / Published: 24 August 2021

Abstract

Blind image deblurring, one of the main problems in image restoration, is a challenging, ill-posed problem. Hence, it is important to design a prior to solve it. Recently, deep image prior (DIP) has shown that convolutional neural networks (CNNs) can be a powerful prior for a single natural image. Previous DIP-based deblurring methods exploited CNNs as a prior when solving the blind deblurring problem and performed remarkably well. However, these methods do not fully utilize the given multiple blurry images and perform poorly on severely blurred images, because their architectures are strictly designed to utilize a single image. In this paper, we propose a method called DualDeblur, which uses dual blurry images to generate a single sharp image. DualDeblur jointly utilizes the complementary information of multiple blurry images to capture image statistics for a single sharp image. Additionally, we propose an adaptive L2_SSIM loss that enhances both pixel accuracy and structural properties. Extensive experiments show the superiority of our method over previous methods in both qualitative and quantitative evaluations.

1. Introduction

Motion blur is a common artifact caused by the relative motion between the camera and the scene during exposure. In practice, images obtained from cameras in mobile embedded systems are often blurred because they are usually captured with hand-held cameras. The unwanted blur artifacts not only degrade image quality but also result in the loss of important information in the image. Consequently, blurry images deteriorate the performance of various computer vision tasks, such as image classification [1,2,3], object detection [4,5,6], and segmentation [7,8,9]. Accordingly, numerous image deblurring studies have been proposed to remove blur artifacts and restore sharp images.
Given a blurry image y, the blur process is typically modeled as a convolution operation of a latent sharp image x and a blur kernel k as follows:
y = k ⊗ x + n,
where ⊗ denotes the convolution operator and n is the noise. The goal of blind image deblurring is to estimate the sharp image and the blur kernel simultaneously when the blur kernel is unknown. This is a classical ill-posed problem because x and k can have multiple solutions. Owing to the ill-posed nature of the problem, conventional deblurring studies constrain the solution space by leveraging various priors and regularizers.
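To make the blur model concrete, the snippet below is a minimal PyTorch sketch of Equation (1): a blurry observation is obtained by convolving a sharp image with a blur kernel and adding noise. The image size, box kernel, and noise level are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def blur(x, k, noise_std=0.01):
    """Convolve a sharp image x (1, C, H, W) with a blur kernel k (kh, kw) and add noise."""
    c = x.shape[1]
    weight = k[None, None].repeat(c, 1, 1, 1)     # depthwise kernel, one copy per channel
    pad = (k.shape[-2] // 2, k.shape[-1] // 2)    # keeps the spatial size for odd kernels
    y = F.conv2d(x, weight, padding=pad, groups=c)
    return y + noise_std * torch.randn_like(y)

# Example: a normalized 21 x 21 box kernel applied to a random stand-in "sharp" image.
x = torch.rand(1, 3, 255, 255)
k = torch.ones(21, 21) / (21 * 21)
y = blur(x, k)
print(y.shape)   # torch.Size([1, 3, 255, 255])
```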
Recently, extensive studies [10,11,12,13,14] based on deep learning (DL) have been performed on image deblurring. Most of them employ deep convolutional neural networks (CNNs) and train them on a large-scale dataset of blurry/sharp image pairs [15]. CNNs implicitly learn more general priors by capturing natural image statistics from a large number of blurry/sharp image pairs. DL-based methods have provided superior results. However, collecting such a large dataset is difficult and expensive [16]. In contrast to DL-based data-driven approaches, Ulyanov et al. [17] proposed the deep image prior (DIP), which is based on self-supervised learning, and showed that a CNN can capture the low-level statistics of a single natural image. Their method performed remarkably well in low-level vision tasks, such as denoising, super-resolution, and inpainting. Inspired by this, Ren et al. [18] suggested the SelfDeblur framework to solve the single-image blind deblurring problem. Given a single blurry image, SelfDeblur estimates the latent sharp image and the blur kernel simultaneously by jointly optimizing an image generator network and a kernel estimator network. However, SelfDeblur cannot perform deblurring in the case of multiple blurry images, because its architecture is strictly designed to leverage only the internal statistics of a single blurry image. Although using multiple observations for image deblurring is beneficial [19,20], most self-supervised learning approaches do not fully leverage the internal information of given multiple images.
We propose a method called DualDeblur that aims to restore a single sharp image from two given blurry observations. In many practical scenarios, we can capture multiple images of the same physical scene, obtaining multiple blurry images under various conditions. For example, consider the two blurry images shown in Figure 1b,c. They share the same latent sharp image, shown in Figure 1a. Thus, the sharp images restored from Figure 1b,c should be the same, and we can further constrain the solution space. Specifically, our DualDeblur comprises a single image generator and two blur kernel estimators. The image generator aims to estimate the sharp image that is latent in both blurry images. Each blur kernel estimator estimates the blur kernel for one blurry image. Thereafter, we jointly optimize the image generator and blur kernel estimators by comparing the reblurred images with the given blurry images, where the reblurred images are generated by the blur process of the predicted image and the estimated blur kernels. Through this joint optimization process, our image generator learns a strong prior for a single sharp image by using the complementary information of multiple images.
In addition, we propose an adaptive L2_SSIM loss to enhance both pixel-wise accuracy and structural details. Most DIP-based methods use the L2 loss to minimize the difference in pixel values between the target image and the restored image. In our task, simply using the L2 loss may deteriorate the restoration performance because the target image is blurry; thus, the L2 loss is insufficient to restore detailed textures. Hence, many restoration methods replace the L2 loss with a structural loss, such as the SSIM loss [9], MS-SSIM loss [22], and FSIM loss [23]. However, using only the SSIM loss has several limitations. SSIM does not consider pixel-wise accuracy; therefore, comparing corrupted structures may lead to unexpected resulting images. To tackle this, our adaptive L2_SSIM loss adjusts the weight at each training step through a weighted sum that considers the characteristics of L2 and SSIM. At the beginning of training, most of the weight is placed on L2, and this weight decays exponentially with the iterations. Hence, pixel-wise accuracy is ensured by focusing on L2 in the early stages of training, which prevents unexpected structures in the resulting images. In the remaining stages of training, we exponentially increase the weight of the SSIM loss to preserve the structural properties. Through this process, our reconstruction loss ensures both pixel-wise accuracy and structural properties.
Figure 1 shows the effectiveness of our method. Generally, large blurs often occur when images are taken with fast camera movement in night environments (see Figure 1b,c). In this case, previous classical methods often fail to restore sharp images, as shown in Figure 1d,e, because the priors utilized in these methods are subjective and cannot accurately capture the intrinsic distribution of natural images and blur kernels [24]. As shown in Figure 1f,g, SelfDeblur [18] also fails to estimate the kernel for severely blurred images and does not appropriately deblur them. In contrast, the proposed DualDeblur successfully estimates two blur kernels using two severely blurred images and generates a superior resulting image with rich textures. Our experiments show that DualDeblur performs better than other comparative methods, both quantitatively and qualitatively.
The following are the main contributions of this study:
  • We propose a DIP-based deblurring method called DualDeblur that uses two blurry images of the same scene. The complementary information of the two images is jointly exploited during optimization.
  • We propose an adaptive L2_SSIM loss that adjusts the weights of L2 and SSIM at each optimization step, ensuring both pixel-wise accuracy and structural properties in the deblurred image.
  • The experimental results show that our method is quantitatively and qualitatively superior to previous methods.

2. Related Works

In this section, we briefly introduce the existing image deblurring methods based on optimization and DL [25].

2.1. Optimization-Based Image Deblurring

Image deblurring, one of the classical inverse problems, aims to restore a sharp latent image from a given blurry image. Owing to the ill-posed nature of the deblurring problem, most traditional methods constrain the solution space by using various priors or regularizers, such as TV regularization [26,27], gradient priors [21], sparsity priors [28], gradient sparsity priors [29], Gaussian scale mixture priors [30], hyper-Laplacian priors [31], ℓ1/ℓ2-norms [32], variational Bayes approximations [33,34], ℓ0-norms [35,36], patch-based statistical priors [37,38], adaptive sparse priors [19], and dark channel priors [39]. By taking advantage of these priors, traditional methods jointly estimate the sharp image and blur kernel from the blurry image. However, most of these methods heavily rely on the accurate selection of regularizers or priors. Furthermore, when the blur kernel is large and complex, these methods often fail to restore the sharp image.

2.2. DL-Based Image Deblurring

Recently, DL-based methods [25] have been widely developed to solve the image deblurring problem. Early DL-based deblurring methods [40,41] focused only on estimating blur kernels using DL. Sun et al. [40] proposed to predict the probabilistic distribution of motion blur at the patch level using a CNN. Chakrabarti [41] presented a CNN to predict the complex Fourier coefficients of a deconvolution filter to be applied to the input patch for restoration. Unlike traditional approaches of using CNNs for kernel estimation, Nah et al. [10] proposed to directly predict the deblurred output without an additional kernel estimation process by using multi-scale CNNs. Motivated by the multi-scale approach, Tao et al. [12] proposed to reduce the memory size using a long short-term memory (LSTM)-based scale-recurrent network. Zhang et al. [14] proposed a multi-level CNN that uses a multi-patch hierarchy as input to exploit a localized-to-coarse multi-patch approach. Ulyanov et al. [17] suggested DIP, showing that CNNs can work satisfactorily as priors for a single image. However, DIP is limited in capturing the characteristics of the blur kernel, because the DIP network consists of CNNs that contain only image statistics [18]. To tackle this, Ren et al. [18] suggested SelfDeblur to solve the blind deblurring problem. SelfDeblur [18] adopted a CNN to capture image statistics and, to overcome the aforementioned drawback of DIP, employed a fully connected network (FCN) to model the prior of the blur kernel. Although SelfDeblur [18] effectively solves the blind deblurring problem, its structure can only handle a single image and cannot appropriately utilize multiple images. In contrast to SelfDeblur, our DualDeblur is designed with a structure that can utilize multiple images that share a single sharp image.

3. Proposed Method

In this section, we describe the blur process for two blurry images and the proposed DualDeblur framework, which uses two blurry images. Additionally, we introduce an adaptive L2_SSIM loss that considers both pixel-wise accuracy and perceptual properties. Subsequently, we summarize the optimization process of the proposed method.

3.1. DualDeblur

Given two blurry observations y_1 and y_2, the blur process can be formulated as follows:
y_1 = k_1 ⊗ x + n_1,   y_2 = k_2 ⊗ x + n_2,
where x denotes a latent sharp image, and k_1 and k_2 represent the blur kernels corresponding to each blurry observation, respectively. Our DualDeblur predicts a single sharp image x using the two blurry images y_1 and y_2. As depicted in Figure 2, DualDeblur consists of an image generator f_θx(·) and blur kernel estimators f_θk1(·) and f_θk2(·). Table 1 presents the detailed architecture of our image generator f_θx(·). The image generator is learned as a network x̂ = f_θx(z_x) mapping the uniform noise input z_x to an image x̂. Table 2 shows our kernel estimators f_θk1(·) and f_θk2(·). The blur kernel estimator f_θk1(·) is learned as a network k̂_1 = f_θk1(z_k1) mapping the 1-D uniform noise vector z_k1 to a 2-D reshaped blur kernel k̂_1; similarly, f_θk2(·) is learned as a network k̂_2 = f_θk2(z_k2) mapping the 1-D uniform noise vector z_k2 to a 2-D reshaped blur kernel k̂_2. The networks f_θk1(·) and f_θk2(·) form a dual architecture designed for the two blurry images, and k̂_1 and k̂_2 are the estimated blur kernels corresponding to y_1 and y_2, respectively. DualDeblur jointly optimizes f_θx(·), f_θk1(·), and f_θk2(·) by comparing y_1 with k̂_1 ⊗ x̂ and y_2 with k̂_2 ⊗ x̂ through the proposed loss function, as explained in the following.
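The following is a minimal PyTorch sketch of this setup. The kernel estimators follow the FCN layout of Table 2; the image generator is a small convolutional stand-in for the skip-connected encoder-decoder of Table 1 (an assumption made for brevity), and the tensor sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelEstimator(nn.Module):
    """FCN mapping a 1-D uniform noise vector to a normalized 2-D blur kernel (as in Table 2)."""
    def __init__(self, kernel_size, z_dim=200, hidden=1000):
        super().__init__()
        self.kernel_size = kernel_size
        self.fc = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, kernel_size * kernel_size), nn.Softmax(dim=-1))

    def forward(self, z):
        k = self.fc(z)                                   # non-negative, sums to 1
        return k.view(self.kernel_size, self.kernel_size)

class ImageGenerator(nn.Module):
    """Small convolutional stand-in for the generator of Table 1, mapping noise to [0, 1]."""
    def __init__(self, in_ch=8, out_ch=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, out_ch, 1), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)

def reblur(x_hat, k_hat):
    """Reblurred image (estimated kernel convolved with the generated image)."""
    c = x_hat.shape[1]
    weight = k_hat[None, None].repeat(c, 1, 1, 1)
    return F.conv2d(x_hat, weight, padding=k_hat.shape[-1] // 2, groups=c)

# Forward pass with illustrative sizes: a 255 x 255 gray image, 21 x 21 and 27 x 27 kernels.
f_x, f_k1, f_k2 = ImageGenerator(), KernelEstimator(21), KernelEstimator(27)
z_x, z_k1, z_k2 = torch.rand(1, 8, 255, 255), torch.rand(200), torch.rand(200)
x_hat = f_x(z_x)
y1_hat = reblur(x_hat, f_k1(z_k1))
y2_hat = reblur(x_hat, f_k2(z_k2))
print(x_hat.shape, y1_hat.shape, y2_hat.shape)
```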

3.2. Adaptive L2_SSIM Loss

In this subsection, we propose an adaptive L2_SSIM loss to enhance both pixel-wise accuracy and perceptual properties. We adjust the weights at each training step with a weighted sum that considers the properties of L2 and SSIM. First, we introduce the L2 and SSIM losses.
When solving the restoration problem, the L2 loss is usually used and is formulated as follows:
L_2 = Σ_{i=1}^{2} ‖ k̂_i ⊗ x̂ − y_i ‖²,
where i denotes the i-th observation. The L2 loss increases pixel-wise accuracy by minimizing the differences in pixel values between the target image and the restored image. However, with the L2 loss alone, the output image tends to be blurry and lacks high-frequency textures [42,43]. In our case, using only L2 is even worse because both y and k ⊗ x are blurry images. To overcome this limitation, the SSIM loss, which preserves perceptual features, is also used. SSIM captures the luminance, contrast, and structure of an image [9]. Here, L_SSIM is formulated as follows:
L_SSIM = Σ_{i=1}^{2} ( 1 − SSIM( k̂_i ⊗ x̂, y_i ) ).
However, because the SSIM loss does not consider pixel-wise accuracy, collapsed structures in the blurry observations may lead to unexpected structures in the resulting image. Therefore, we propose an adaptive L2_SSIM loss that preserves the strengths of each loss and compensates for their weaknesses. The proposed adaptive L2_SSIM loss (L_L2_SSIM) is formulated as follows:
L_L2_SSIM(t) = ω(t) · α · L_2 + ( 1 − ω(t) ) · L_SSIM,   ω(t) = exp( −t / γ ),
where ω(t) denotes a weighting function that adjusts the weights of the L2 and SSIM losses at each step t, α represents a parameter that adjusts the scale of the L2 loss, and γ denotes a parameter that adjusts the range of steps affected by the L2 loss. At the beginning of training, the L2 loss accounts for most of the total weight, so the optimization focuses on pixel-wise accuracy and does not produce unexpected structures. As training proceeds, we reduce the weight of the L2 loss and increase that of the L_SSIM loss to preserve the structural content of the image. As a result, our reconstruction loss not only increases pixel-wise accuracy but also preserves the structural details of the image. The effectiveness of the proposed reconstruction loss is demonstrated in the ablation study in Section 4.5.
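A minimal sketch of this loss is given below, assuming the weighting function ω(t) = exp(−t/γ). The SSIM term uses a simplified uniform-window implementation rather than the standard Gaussian-window SSIM, and the L2 term uses a sum reduction following Equation (3) (a mean reduction may be used in practice), so the exact values are illustrative.

```python
import math
import torch
import torch.nn.functional as F

def ssim(a, b, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM over an image pair in [0, 1], computed with a uniform window."""
    pad = window // 2
    mu_a = F.avg_pool2d(a, window, 1, pad)
    mu_b = F.avg_pool2d(b, window, 1, pad)
    var_a = F.avg_pool2d(a * a, window, 1, pad) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, window, 1, pad) - mu_b ** 2
    cov = F.avg_pool2d(a * b, window, 1, pad) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return (num / den).mean()

def adaptive_l2_ssim(reblurred, blurry, t, alpha=10.0, gamma=100.0):
    """Weighted sum of L2 and (1 - SSIM) over the two observations at iteration t."""
    w = math.exp(-t / gamma)                 # assumption: the L2 weight decays with t
    l2 = sum(F.mse_loss(r, y, reduction="sum") for r, y in zip(reblurred, blurry))
    l_ssim = sum(1.0 - ssim(r, y) for r, y in zip(reblurred, blurry))
    return w * alpha * l2 + (1.0 - w) * l_ssim

# Example with two random "reblurred"/"blurry" pairs at iteration t = 50.
pairs = [(torch.rand(1, 1, 255, 255), torch.rand(1, 1, 255, 255)) for _ in range(2)]
loss = adaptive_l2_ssim([p[0] for p in pairs], [p[1] for p in pairs], t=50)
print(loss.item())
```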
The final optimization process of DualDeblur is summarized in Algorithm 1. Here, T denotes the total number of training iterations, and θ_k1, θ_k2, and θ_x represent the network parameters corresponding to f_θk1(·), f_θk2(·), and f_θx(·), respectively. DualDeblur estimates a restored image and two blur kernels. Thereafter, it generates two reblurred images using a convolution operation and compares them with y_1 and y_2, respectively, through the L_L2_SSIM loss in Equation (5). By optimizing all the networks simultaneously, the image generator f_θx(·) jointly utilizes the complementary information of the two blurry images. Finally, we obtain the restored image and blur kernels after T iterations.
Algorithm 1 DualDeblur optimization process
Input: blurry images y_1, y_2 and the number of iterations T
Output: restored image x̂, estimated blur kernels k̂_1 and k̂_2
1: Sample z_x, z_k1, and z_k2 from a uniform distribution
2: for t = 1 to T do
3:     perturb z_x
4:     x̂ = f_θx^(t−1)(z_x)
5:     k̂_1 = f_θk1^(t−1)(z_k1)
6:     k̂_2 = f_θk2^(t−1)(z_k2)
7:     Compute the gradients of θ_x^(t−1), θ_k1^(t−1), and θ_k2^(t−1) w.r.t. L_L2_SSIM(t)
8:     Update θ_x^t, θ_k1^t, and θ_k2^t using ADAM [44]
9: end for
10: x̂ = f_θx^T(z_x), k̂_1 = f_θk1^T(z_k1), and k̂_2 = f_θk2^T(z_k2)
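Under the same assumptions as the earlier sketches (tiny stand-in networks, illustrative sizes, and a plain MSE reconstruction term in place of the full adaptive L2_SSIM loss), the loop below shows the structure of Algorithm 1: perturb z_x, run the three networks, reblur, and update all parameters jointly with Adam. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reblur(x_hat, k_hat):
    c = x_hat.shape[1]
    w = k_hat[None, None].repeat(c, 1, 1, 1)
    return F.conv2d(x_hat, w, padding=k_hat.shape[-1] // 2, groups=c)

# Stand-in networks (see the sketches in Section 3.1 for layouts closer to Tables 1 and 2).
f_x = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(32, 1, 1), nn.Sigmoid())
def make_fk(k):
    return nn.Sequential(nn.Linear(200, 1000), nn.ReLU(),
                         nn.Linear(1000, k * k), nn.Softmax(dim=-1))
f_k1, f_k2 = make_fk(21), make_fk(27)

y1, y2 = torch.rand(1, 1, 255, 255), torch.rand(1, 1, 255, 255)   # blurry inputs
z_x = torch.rand(1, 8, 255, 255)
z_k1, z_k2 = torch.rand(200), torch.rand(200)

params = list(f_x.parameters()) + list(f_k1.parameters()) + list(f_k2.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

for t in range(1, 101):                          # T = 5000 in the paper; 100 here
    opt.zero_grad()
    z_in = z_x + 0.001 * torch.randn_like(z_x)   # step 3: perturb z_x
    x_hat = f_x(z_in)
    k1_hat = f_k1(z_k1).view(21, 21)
    k2_hat = f_k2(z_k2).view(27, 27)
    loss = F.mse_loss(reblur(x_hat, k1_hat), y1) + F.mse_loss(reblur(x_hat, k2_hat), y2)
    loss.backward()
    opt.step()

print(loss.item(), x_hat.shape)
```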

4. Experimental Results

4.1. Dataset

To evaluate the performance of our method, we used two image deblurring benchmark datasets: the Levin test set [33] and the Lai test set [45]. The proposed method solves the deblurring problem by using two observations, and two scenarios are possible: the two observations are degraded by a similar degree of blur (soft pairs), or the degrees of blur are very different from each other (hard pairs). To simulate these cases, we divided each test set into soft and hard pairs and used them for evaluation. The two test sets are described in the following.
1. Levin test set [33]: In their seminal work, Levin et al. [33] provided 8 blur kernels with sizes of k × k, where k = 13, 15, 17, 19, 21, 23, 27, and 4 sharp images, resulting in 32 blurry gray-scale images of size 255 × 255. To evaluate our method, we divided the pairs into soft and hard pairs on the basis of the difference in blur kernel size: if the difference was less than 5 pixels, we classified the image pair as a soft pair, and otherwise as a hard pair (a pairing sketch is given after this list). Following this pipeline, we randomly selected 7 soft pairs and 7 hard pairs, totaling 14 blurry pairs per image; in short, we prepared a total of 56 pairs of blurry images for evaluation. The composition of the Levin test set [33] is described in detail in Table 3. Specifically, the soft pairs comprised [13, 15], [15, 17], [17, 19], [19, 21], [21, 23a], [21, 23b], and [23a, 23b]. Here, each number represents the blur kernel size k. For example, [13, 15] means that blur kernels of size 13 × 13 and 15 × 15 are paired. Because the Levin test set contains two blur kernels with a size of 23 × 23, we denote them as 23a and 23b. The hard pairs comprised [13, 27], [15, 27], [17, 27], [19, 27], [21, 27], [23a, 27], and [23b, 27].
2. Lai test set [45]: We further compared our method on the Lai test set [45], which contains RGB images of various sizes. The Lai test set comprises 4 blur kernels and 25 sharp images, resulting in 100 blurry images. It is divided into five categories, Manmade, Natural, People, Saturated, and Text, with 20 images per category. The sizes of the 4 blur kernels are 31 × 31, 51 × 51, 55 × 55, and 75 × 75. Thus, we prepared one soft pair (i.e., [51, 55]) and 4 hard pairs (i.e., [31, 51], [31, 75], [51, 75], and [55, 75]). As described in Table 3, there are 25 sharp images and 5 blur kernel pairs; thus, a total of 125 pairs of blurry images are used for evaluation.
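The rule behind the soft/hard split can be sketched as follows; the paper then manually selects 7 soft and 7 hard pairs per image from these candidates, so the snippet only illustrates the classification criterion on the Levin kernel sizes.

```python
# Classify candidate kernel pairs as "soft" (size difference below 5 pixels) or "hard".
from itertools import combinations

sizes = [13, 15, 17, 19, 21, 23, 23, 27]          # two Levin kernels share the size 23
soft = [(a, b) for a, b in combinations(sizes, 2) if abs(a - b) < 5]
hard = [(a, b) for a, b in combinations(sizes, 2) if abs(a - b) >= 5]
print(len(soft), "soft candidates,", len(hard), "hard candidates")
```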

4.2. Implementation Details

We implemented our DualDeblur using PyTorch [46]. The networks were optimized using Adam [44] with a learning rate of 1 × 10⁻², β1 = 0.9, and β2 = 0.999. In our experiments, the total number of iterations was 5000, and the learning rate was decayed by a factor of 0.5 at 2000, 3000, and 4000 iterations. We empirically set the values of α and γ in Equation (5) to α = 10 and γ = 100. Following [17,18], we sampled the initial z_x, z_k1, and z_k2 from a uniform distribution with a fixed random seed of 0. All experiments with our model were conducted using a single NVIDIA TITAN RTX GPU.
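A sketch of these optimization settings in PyTorch is shown below; `model_parameters` is a placeholder for the combined parameters of the three networks, and the forward/backward pass is elided.

```python
import torch

model_parameters = [torch.nn.Parameter(torch.zeros(1))]      # placeholder parameters
optimizer = torch.optim.Adam(model_parameters, lr=1e-2, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2000, 3000, 4000], gamma=0.5)      # halve the lr at these steps

for t in range(5000):
    # ... forward pass, loss, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()
print(optimizer.param_groups[0]["lr"])   # 1e-2 * 0.5**3 = 1.25e-3 after 5000 steps
```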

4.3. Comparison on the Levin Test Set

For the Levin test set [33], we compared our DualDeblur with existing blind deconvolution methods (i.e., Krishnan et al. [32], Levin et al. [33], Cho & Lee [30], Xu & Jia [21], Sun et al. [37], Zuo et al. [29], and Pan-DCP [39]) and a DIP-based deblurring method (i.e., SelfDeblur [18]). Ref. [34] was used as the non-blind deconvolution to generate the final results of the previous methods. For quantitative comparison, we calculated the PSNR and SSIM [9] metrics using the code provided by [18]. Moreover, we report the FSIM [23] and LPIPS [43] distance to evaluate perceptual similarity. We also compared the error ratio [34], which is the ratio of the sum-of-squared-differences error of deconvolution with the estimated kernels to that of deconvolution with the ground-truth kernels.
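For reference, the PSNR used here can be computed as below, assuming images scaled to [0, 1]; SSIM, FSIM, LPIPS, and the error ratio come from the respective reference implementations and are not reproduced.

```python
import torch

def psnr(x_hat, x, max_val=1.0):
    """Peak signal-to-noise ratio (in dB) between a restored image and its ground truth."""
    mse = torch.mean((x_hat - x) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

print(psnr(torch.rand(1, 1, 255, 255), torch.rand(1, 1, 255, 255)))
```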
We computed the average PSNR, SSIM, error ratio, FSIM and LPIPS on the Levin test set for various methods (see Table 4). For a fair comparison, we reported the results for the soft and hard pairs that contained each kernel.
With the advantage of using multiple images, the results of our method were significantly superior to those of the previous methods in terms of all metrics. Specifically, our PSNR was 8.00 higher than that of the second-highest method, SelfDeblur [18], our SSIM was 0.0542 higher than that of the second-highest method, Zuo et al. [29], and our FSIM was 0.0378 higher than that of the second-highest method, Sun et al. [37]. Our method also showed superior performance in terms of the LPIPS distance compared to the other methods. Note that our method performed remarkably well regardless of the difference in blur kernel size between the two given images. Our experimental results show that the average results of the hard pairs are slightly better than those of the soft pairs. We believe this is because the complementary information between the two images is important for deblurring, and the hard pairs often include more complementary information than the soft pairs. In Figure 3, we compare our soft- and hard-pair results with the previous methods; the results of the previous methods correspond to input 1 in Figure 3. Ours {1,2} is the soft-pair result of input 1 and input 2, and ours {1,3} is the hard-pair result of input 1 and input 3. Our method outperforms the other methods in restoring sharp edges and fine details for both soft and hard pairs. The blur kernels estimated by DualDeblur are considerably closer to the ground truth.
As shown in Table 5, we measured the inference time and the number of model parameters of our method and SelfDeblur [18]. We measured the average inference time for a single image using the Levin test set [33]. The inference times of our model and SelfDeblur [18] were measured on a PC with an NVIDIA TITAN RTX GPU, whereas the other methods were measured on a PC with a 3.30 GHz Intel(R) Xeon(R) CPU, as reported in [18]. Our model has a longer inference time and more parameters than SelfDeblur [18] because it optimizes three networks, whereas SelfDeblur [18] optimizes two.
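A sketch of how the average per-image inference (optimization) time can be measured on the GPU is shown below; `deblur_one_image` is a hypothetical wrapper around the full DualDeblur optimization for one blurry pair.

```python
import time
import torch

def time_one_run(deblur_one_image, inputs):
    """Wall-clock time for one full optimization, synchronizing the GPU around the call."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()          # make sure previously queued GPU work is done
    start = time.perf_counter()
    deblur_one_image(*inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start

# times = [time_one_run(deblur_one_image, pair) for pair in levin_pairs]
# print(sum(times) / len(times))          # average inference time per image
```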

4.4. Comparison on the Lai Test Set

For the Lai test set [45], our method was compared with those of Cho and Lee [30], Xu and Jia [21], Xu et al. [35], Michaeli et al. [38], Perrone et al. [27], Pan-DCP [39], and SelfDeblur [18]. In the previous methods, after blur kernel estimation, ref. [47] was applied as the non-blind deconvolution for the Saturated category and ref. [31] for the other categories. In Table 6, our DualDeblur achieves better quantitative metrics than the previous methods. Our average results on the Lai test set [45] are 7.72 higher in PSNR and 0.2136 higher in SSIM than the second-highest SelfDeblur [18]. The LPIPS results show that our method restores perceptually higher-quality images than the other methods. Additionally, our method performs better for all blur kernels, which shows that the proposed DualDeblur handles large and diverse images well. Both our soft and hard pairs outperform the results of the previous methods.
Figures 4 and 5 show, through a qualitative comparison, that our DualDeblur is visually superior to the previous methods. The kernels estimated by our DualDeblur are highly accurate compared with those of the other methods. Although other methods suffer from blur or ringing artifacts, our results are perceptually superior with rich texture (see the details in Figure 4). Additionally, Figure 5 shows the high-quality details of our result; only our method accurately reconstructs the stripes of the tie.
In Figure 6, our method shows superior results when using two blurry images that cannot be deblurred by the previous methods. In other words, our method performs deblurring by jointly using two severely damaged blurry images, each of which contains little information. In the third row of Figure 6, SelfDeblur [18] fails to estimate the blur kernels for both input 1 and input 2, whereas our method estimates the blur kernels and the final image well.

4.5. Ablation Study

To investigate the effectiveness of the proposed dual architecture and adaptive L2_SSIM loss, we conducted ablation studies. After equalizing the loss, we compared the dual architecture (called DualDeblur-A) with [18] to investigate the effect of the dual architecture. Furthermore, we demonstrated the effectiveness of our adaptive L2_SSIM loss by comparing models optimized using L_L2_SSIM with those using only L_2 or L_SSIM. Models DualDeblur-B and DualDeblur-C have the same architecture as DualDeblur-A; however, DualDeblur-B uses only L_2 in Equation (3) and DualDeblur-C uses only L_SSIM in Equation (4) for optimization. Finally, we define DualDeblur as the model using the proposed L_L2_SSIM in Equation (5). The quantitative and qualitative comparisons are shown in Table 7 and Figure 7, respectively.

4.5.1. Effects of Dual Architecture

Unlike SelfDeblur [18], which performs deblurring with a single observation, our method leverages multiple observations via a dual architecture. In our experiments, DualDeblur-A, which uses a dual architecture, significantly improved the deblurring performance compared to SelfDeblur (see (a) and (b) in Table 7). The PSNR and SSIM results of DualDeblur-A increased by 2.68 and 0.0098, respectively, compared to those of SelfDeblur. For FSIM and LPIPS, the results of DualDeblur-A are also better than those of SelfDeblur by 0.0738 and 0.0334, respectively. This indicates that using multiple images is more helpful for deblurring than using a single image and that the proposed method effectively handles multiple images during the deblurring procedure. The results of DualDeblur-A and DualDeblur-B (see Table 7) show that the performance of DualDeblur-B, which omits TV regularization, is similar to that of DualDeblur-A. These results show that the dual architecture works well without an additional regularizer.

4.5.2. Effects of Adaptive L2_SSIM Loss

The proposed adaptive L2_SSIM loss, formulated as a weighted sum of L_2 and L_SSIM, focuses on restoring the intensity values per pixel first and then gradually restores the structure. By using the proposed adaptive L2_SSIM loss, we aim to exploit the advantages of the L_2 and L_SSIM loss functions and complement their limitations. To demonstrate the effectiveness of the adaptive L2_SSIM loss, we compare the performance of DualDeblur optimized with various loss functions: (1) DualDeblur-B using the L_2 loss, (2) DualDeblur-C using the L_SSIM loss, and (3) DualDeblur using the L_L2_SSIM loss.
When optimizing our model using only the L_2 loss, the quantitative results are the worst in PSNR and SSIM among the dual-architecture variants (see Table 7). As shown in Figure 7, the results of our method using only the L_2 loss are overly smooth and fail to restore details. To overcome this, we employed the structural loss (L_SSIM) in our method to enhance the perceptual quality and structural details in local regions [48]. Figure 7 also shows that using L_SSIM helps restore details of the image better than using only the L_2 loss. However, L_SSIM does not restore accurate pixel intensities, and corrupted structures in the blurry observations may lead to unexpected structures in the resulting images.
In contrast, Figure 7 shows that the results with our adaptive L2_SSIM loss L_L2_SSIM not only restore accurate pixel values but also restore the details and sharp edges of the image. As shown in Table 7, DualDeblur achieves the best results in most metrics, including PSNR, SSIM, and LPIPS, except FSIM. Specifically, the results of DualDeblur show that the average PSNR increases by 5.26 and 1.78 compared with those of DualDeblur-B and DualDeblur-C, respectively. In addition, the average SSIM of DualDeblur is 0.0212 higher than that of the second-highest DualDeblur-C, its average FSIM is 0.0197 lower than that of the highest DualDeblur-A, and its average LPIPS is 0.0287 better than that of the second-best DualDeblur-A. Figure 8a demonstrates the effectiveness of our adaptive L2_SSIM loss, which outperforms all other losses at every iteration. Figure 8b shows the change of ω(t) in Equation (5), the weight of the adaptive L2_SSIM loss, over the training iterations. As mentioned earlier, L_2 is weighted more than L_SSIM in the initial iterations, and the weight of L_SSIM increases exponentially.
As shown in Table 8, we conducted various experiments on α and γ in Equation (5). The results show that the model with α = 10 and γ = 100 gives the best results for both PSNR and SSIM, whereas the model with α = 50 and γ = 200 is the best for FSIM and LPIPS. We selected the model with α = 10 and γ = 100 because PSNR and SSIM are the most commonly used metrics.

5. Conclusions

In this paper, we proposed the DualDeblur framework to restore a single sharp image using multiple blurry images. Our framework adopts a dual architecture to utilize the complementary information of two blurry images for obtaining a single sharp image. We proposed an adaptive L2_SSIM loss to ensure both pixel accuracy and structural details. For a practical and accurate performance evaluation, we divided the blur pairs into soft and hard pairs. Extensive comparisons demonstrated the superior results of our DualDeblur compared to previous methods in both quantitative and qualitative evaluations.

Author Contributions

Conceptualization, C.J.S., T.B.L. and Y.S.H.; software, C.J.S.; validation, C.J.S.; investigation, C.J.S. and T.B.L.; writing—original draft preparation, C.J.S.; writing—review and editing, C.J.S., T.B.L. and Y.S.H.; supervision, Y.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1C1C1007446), and in part by the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991014091).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  3. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  4. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–16 December 2015; pp. 1440–1448. [Google Scholar]
  5. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  7. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  8. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  9. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Nah, S.; Hyun Kim, T.; Mu Lee, K. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3883–3891. [Google Scholar]
  11. Su, S.; Delbracio, M.; Wang, J.; Sapiro, G.; Heidrich, W.; Wang, O. Deep video deblurring for hand-held cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1279–1288. [Google Scholar]
  12. Tao, X.; Gao, H.; Shen, X.; Wang, J.; Jia, J. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8174–8182. [Google Scholar]
  13. Zhang, J.; Pan, J.; Ren, J.; Song, Y.; Bao, L.; Lau, R.W.; Yang, M.H. Dynamic scene deblurring using spatially variant recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2521–2529. [Google Scholar]
  14. Zhang, H.; Dai, Y.; Li, H.; Koniusz, P. Deep stacked hierarchical multi-patch network for image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5978–5986. [Google Scholar]
  15. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Multi-stage progressive image restoration. arXiv 2021, arXiv:2102.02808. [Google Scholar]
  16. Quan, Y.; Chen, M.; Pang, T.; Ji, H. Self2self with dropout: Learning self-supervised denoising from single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1890–1898. [Google Scholar]
  17. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9446–9454. [Google Scholar]
  18. Ren, D.; Zhang, K.; Wang, Q.; Hu, Q.; Zuo, W. Neural blind deconvolution using deep priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3341–3350. [Google Scholar]
  19. Zhang, H.; Wipf, D.; Zhang, Y. Multi-image blind deblurring using a coupled adaptive sparse prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–27 June 2013; pp. 1051–1058. [Google Scholar]
  20. Rav-Acha, A.; Peleg, S. Two motion-blurred images are better than one. Pattern Recognit. Lett. 2005, 26, 311–317. [Google Scholar] [CrossRef]
  21. Xu, L.; Jia, J. Two-phase kernel estimation for robust motion deblurring. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 157–170. [Google Scholar]
  22. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402. [Google Scholar]
  23. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Wang, H.; Yue, Z.; Zhao, Q.; Meng, D. A Deep Variational Bayesian Framework for Blind Image Deblurring. arXiv 2021, arXiv:2106.02884. [Google Scholar]
  25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  26. Chan, T.F.; Wong, C.K. Total variation blind deconvolution. IEEE Trans. Image Process. 1998, 7, 370–375. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Perrone, D.; Favaro, P. Total variation blind deconvolution: The devil is in the details. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2909–2916. [Google Scholar]
  28. Fergus, R.; Singh, B.; Hertzmann, A.; Roweis, S.T.; Freeman, W.T. Removing camera shake from a single photograph. In ACM SIGGRAPH 2006 Papers; Association for Computing Machinery: New York, NY, USA, 2006; pp. 787–794. [Google Scholar]
  29. Zuo, W.; Ren, D.; Zhang, D.; Gu, S.; Zhang, L. Learning iteration-wise generalized shrinkage–thresholding operators for blind deconvolution. IEEE Trans. Image Process. 2016, 25, 1751–1764. [Google Scholar] [CrossRef]
  30. Cho, S.; Lee, S. Fast motion deblurring. In ACM SIGGRAPH Asia 2009 Papers; Association for Computing Machinery: New York, NY, USA, 2009; pp. 1–8. [Google Scholar]
  31. Krishnan, D.; Fergus, R. Fast image deconvolution using hyper-Laplacian priors. Adv. Neural Inf. Process. Syst. 2009, 22, 1033–1041. [Google Scholar]
  32. Krishnan, D.; Tay, T.; Fergus, R. Blind deconvolution using a normalized sparsity measure. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 233–240. [Google Scholar]
  33. Levin, A.; Weiss, Y.; Durand, F.; Freeman, W.T. Understanding and evaluating blind deconvolution algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1964–1971. [Google Scholar]
  34. Levin, A.; Weiss, Y.; Durand, F.; Freeman, W.T. Efficient marginal likelihood optimization in blind deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 2657–2664. [Google Scholar]
  35. Xu, L.; Zheng, S.; Jia, J. Unnatural l0 sparse representation for natural image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1107–1114. [Google Scholar]
  36. Pan, J.; Hu, Z.; Su, Z.; Yang, M.H. l_0-regularized intensity and gradient prior for deblurring text images and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 342–355. [Google Scholar] [CrossRef] [PubMed]
  37. Sun, L.; Cho, S.; Wang, J.; Hays, J. Edge-based blur kernel estimation using patch priors. In Proceedings of the IEEE International Conference on Computational Photography, Cambridge, MA, USA, 19–21 April 2013; pp. 1–8. [Google Scholar]
  38. Michaeli, T.; Irani, M. Blind deblurring using internal patch recurrence. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 783–798. [Google Scholar]
  39. Pan, J.; Sun, D.; Pfister, H.; Yang, M.H. Deblurring images via dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2315–2328. [Google Scholar] [CrossRef] [PubMed]
  40. Sun, J.; Cao, W.; Xu, Z.; Ponce, J. Learning a convolutional neural network for non-uniform motion blur removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 7–9 May 2015; pp. 769–777. [Google Scholar]
  41. Chakrabarti, A. A neural approach to blind motion deblurring. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 221–235. [Google Scholar]
  42. Sajjadi, M.S.; Scholkopf, B.; Hirsch, M. Enhancenet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, Honolulu, HI, USA, 21–26 July 2017; pp. 4491–4500. [Google Scholar]
  43. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  44. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  45. Lai, W.S.; Huang, J.B.; Hu, Z.; Ahuja, N.; Yang, M.H. A comparative study for single image blind deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1701–1709. [Google Scholar]
  46. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
  47. Whyte, O.; Sivic, J.; Zisserman, A. Deblurring shaken and partially saturated images. Int. J. Comput. Vis. 2014, 110, 185–201. [Google Scholar] [CrossRef]
  48. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 2016, 3, 47–57. [Google Scholar] [CrossRef]
Figure 1. Visual quality comparison. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are blurry image 1 and blurry image 2). (a) Ground-truth image. (b) Blurry image with kernel size 55 × 55. (c) Blurry image with kernel size 75 × 75. (d,e) Results of [21] corresponding to (b,c), respectively. In (d), PSNR is 15.33 and in (e), PSNR is 14.45. (f,g) Results of [18] corresponding to (b,c), respectively. In (f), PSNR is 21.03 and in (g), PSNR is 20.15. (h) Our result. In (h), PSNR is 26.82.
Figure 2. Architecture of the proposed DualDeblur.
Figure 3. Qualitative comparisons on the Levin test set [33]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 4. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 5. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 6. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 7. Ablation study. Qualitative comparisons on the Levin test set [33]. The input image for each method is denoted as { } (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).
Figure 8. (a) PSNR versus number of training iterations for ablation study. (b) Value of ω ( t ) versus number of training iterations.
Table 1. Architecture of f_θx(·). We adopt a U-Net [7] with skip connections as the architecture of f_θx(·). Conv2d represents a 2-D convolution operation, "lReLU" denotes a leaky ReLU, and ⊕ denotes channel-wise concatenation. Kernel (m, n × n, p) represents the number of filters m, the filter size n × n, and the padding p. We implement downsampling with stride 2 and upsampling with bilinear interpolation. C represents the number of image channels, and W_x × H_x the image size.
Input: z_x (8 × W_x × H_x) sampled from a uniform distribution
Output: latent image x̂ (C × W_x × H_x)

| Encoder | Operation | Kernel | In | Out | Decoder | Operation | Kernel | In | Out |
|---|---|---|---|---|---|---|---|---|---|
| Encoder 1 | Conv2d, lReLU | 128, 3 × 3, 1 | z_x | e_1 | Decoder 1 | Conv2d, lReLU | 128, 3 × 3, 1 | e_5 ⊕ s_5 | d_1 |
| Skip 1 | Conv2d, lReLU | 16, 3 × 3, 1 | e_1 | s_1 | | | | | |
| Encoder 2 | Conv2d, lReLU | 128, 3 × 3, 1 | e_1 | e_2 | Decoder 2 | Conv2d, lReLU | 128, 3 × 3, 1 | d_1 ⊕ s_4 | d_2 |
| Skip 2 | Conv2d, lReLU | 16, 3 × 3, 1 | e_2 | s_2 | | | | | |
| Encoder 3 | Conv2d, lReLU | 128, 3 × 3, 1 | e_2 | e_3 | Decoder 3 | Conv2d, lReLU | 128, 3 × 3, 1 | d_2 ⊕ s_3 | d_3 |
| Skip 3 | Conv2d, lReLU | 16, 3 × 3, 1 | e_3 | s_3 | | | | | |
| Encoder 4 | Conv2d, lReLU | 128, 3 × 3, 1 | e_3 | e_4 | Decoder 4 | Conv2d, lReLU | 128, 3 × 3, 1 | d_3 ⊕ s_2 | d_4 |
| Skip 4 | Conv2d, lReLU | 16, 3 × 3, 1 | e_4 | s_4 | | | | | |
| Encoder 5 | Conv2d, lReLU | 128, 3 × 3, 1 | e_4 | e_5 | Decoder 5 | Conv2d, lReLU | 128, 3 × 3, 1 | d_4 ⊕ s_1 | d_5 |
| Skip 5 | Conv2d, lReLU | 16, 3 × 3, 1 | e_5 | s_5 | | | | | |
| | | | | | Output layer | Conv2d, Sigmoid | C, 1 × 1, 0 | d_5 | x̂ |
Table 2. Architecture of f_θki(·). We adopt an FCN as each blur kernel estimator network f_θki(·). W_ki × H_ki represents the blur kernel size. f_θki(·) takes a 200-dimensional input and has 1000 nodes in the hidden layer and W_ki × H_ki nodes in the last layer. The 1-D output is reshaped to the 2-D blur kernel size.
Input: z_ki (200) sampled from a uniform distribution; blur kernel size W_ki × H_ki
Output: blur kernel k_i (W_ki × H_ki)

| FCN | Operation |
|---|---|
| Layer 1 | Linear (200, 1000), ReLU |
| Layer 2 | Linear (1000, W_ki × H_ki), SoftMax |
Table 3. Configurations of the Levin test set [33] and the Lai test set [45].

| Test Set | # GT Images | # Blur Kernels | # Blurry Images | # Soft Pairs | # Hard Pairs | # Total Pairs |
|---|---|---|---|---|---|---|
| Levin test set [33] | 4 | 8 | 32 | 28 | 28 | 56 |
| Lai test set [45] | 25 | 4 | 100 | 25 | 100 | 125 |
Table 4. Quantitative comparisons on the Levin test set [33]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The best results are highlighted. The blur kernel "Avg." denotes the average PSNR, SSIM, error ratio, FSIM, and LPIPS results over all blur kernels.

| Method | Blur Kernel | PSNR ↑ | SSIM ↑ | Error Ratio ↓ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| known k * | 13 | 36.53 | 0.9659 | 1.0000 | 0.8868 | 0.0530 |
| Krishnan et al. * [32] | 13 | 34.88 | 0.9575 | 1.1715 | 0.9116 | 0.0604 |
| Cho & Lee * [30] | 13 | 33.93 | 0.9532 | 1.2536 | 0.8578 | 0.0925 |
| Levin et al. * [34] | 13 | 34.29 | 0.9533 | 1.3454 | 0.8213 | 0.0922 |
| Xu & Jia * [21] | 13 | 34.10 | 0.9532 | 1.2846 | 0.8612 | 0.0939 |
| Sun et al. * [37] | 13 | 36.24 | 0.9659 | 0.9933 | 0.8639 | 0.0685 |
| Zuo et al. * [29] | 13 | 35.28 | 0.9598 | 1.0686 | 0.8449 | 0.0892 |
| Pan-DCP * [39] | 13 | 35.47 | 0.9591 | 1.0690 | 0.8359 | 0.0887 |
| SelfDeblur [18] | 13 | 33.03 | 0.9388 | 1.5078 | 0.8731 | 0.0938 |
| Ours (soft) | 13, 15 | 39.93 | 0.9863 | 0.5942 | 0.9424 | 0.0283 |
| Ours (hard) | 13, 27 | 41.17 | 0.9879 | 0.3475 | 0.9018 | 0.0307 |
| known k * | 15 | 35.33 | 0.9525 | 1.0000 | 0.8167 | 0.0919 |
| Krishnan et al. * [32] | 15 | 34.87 | 0.9481 | 1.0563 | 0.7862 | 0.1201 |
| Cho & Lee * [30] | 15 | 33.88 | 0.9429 | 1.3191 | 0.7891 | 0.1226 |
| Levin et al. * [34] | 15 | 30.94 | 0.8950 | 2.5613 | 0.8003 | 0.1199 |
| Xu & Jia * [21] | 15 | 33.04 | 0.9355 | 1.4272 | 0.7763 | 0.1417 |
| Sun et al. * [37] | 15 | 34.96 | 0.9497 | 1.1277 | 0.7887 | 0.1073 |
| Zuo et al. * [29] | 15 | 34.31 | 0.9442 | 1.1660 | 0.7717 | 0.1281 |
| Pan-DCP * [39] | 15 | 34.19 | 0.9415 | 1.1244 | 0.7495 | 0.1259 |
| SelfDeblur [18] | 15 | 33.80 | 0.9409 | 1.3533 | 0.8000 | 0.1030 |
| Ours (soft) | 15, 17 | 40.41 | 0.9857 | 0.4562 | 0.8770 | 0.0448 |
| Ours (hard) | 15, 27 | 40.90 | 0.9862 | 0.3757 | 0.8177 | 0.0578 |
| known k * | 17 | 33.17 | 0.9386 | 1.0000 | 0.7491 | 0.1176 |
| Krishnan et al. * [32] | 17 | 31.69 | 0.9160 | 1.2328 | 0.7605 | 0.1317 |
| Cho & Lee * [30] | 17 | 31.71 | 0.9203 | 1.1958 | 0.7760 | 0.1334 |
| Levin et al. * [34] | 17 | 29.61 | 0.8892 | 1.6049 | 0.7122 | 0.1613 |
| Xu & Jia * [21] | 17 | 30.54 | 0.9028 | 1.4637 | 0.7443 | 0.1528 |
| Sun et al. * [37] | 17 | 32.67 | 0.9318 | 1.1492 | 0.7584 | 0.1229 |
| Zuo et al. * [29] | 17 | 32.31 | 0.9278 | 1.1495 | 0.7471 | 0.1406 |
| Pan-DCP * [39] | 17 | 31.82 | 0.9215 | 1.2084 | 0.7405 | 0.1397 |
| SelfDeblur [18] | 17 | 33.12 | 0.9275 | 0.9403 | 0.7721 | 0.1251 |
| Ours (soft) | 17, 19 | 40.99 | 0.9876 | 0.3630 | 0.8157 | 0.0565 |
| Ours (hard) | 17, 27 | 40.53 | 0.9864 | 0.2984 | 0.8506 | 0.0454 |
| known k * | 19 | 34.04 | 0.9424 | 1.0000 | 0.8607 | 0.0719 |
| Krishnan et al. * [32] | 19 | 32.87 | 0.9325 | 1.1749 | 0.8257 | 0.0939 |
| Cho & Lee * [30] | 19 | 32.20 | 0.9231 | 1.2596 | 0.8552 | 0.1027 |
| Levin et al. * [34] | 19 | 31.03 | 0.9106 | 1.6047 | 0.8101 | 0.1146 |
| Xu & Jia * [21] | 19 | 32.58 | 0.9294 | 1.1322 | 0.8732 | 0.0999 |
| Sun et al. * [37] | 19 | 32.97 | 0.9312 | 1.2007 | 0.8810 | 0.0747 |
| Zuo et al. * [29] | 19 | 33.28 | 0.9355 | 0.9873 | 0.8750 | 0.9515 |
| Pan-DCP * [39] | 19 | 32.50 | 0.9250 | 1.1536 | 0.8613 | 0.1031 |
| SelfDeblur [18] | 19 | 33.11 | 0.9232 | 1.1142 | 0.8292 | 0.1182 |
| Ours (soft) | 19, 21 | 41.82 | 0.9893 | 0.4726 | 0.7233 | 0.0955 |
| Ours (hard) | 19, 27 | 40.73 | 0.9874 | 0.3351 | 0.7937 | 0.0703 |
| known k * | 21 | 36.41 | 0.9672 | 1.0000 | 0.7725 | 0.1441 |
| Krishnan et al. * [32] | 21 | 30.59 | 0.9249 | 2.9369 | 0.7725 | 0.1021 |
| Cho & Lee * [30] | 21 | 30.46 | 0.9143 | 2.5131 | 0.7926 | 0.1106 |
| Levin et al. * [34] | 21 | 32.26 | 0.9376 | 2.0328 | 0.7239 | 0.1287 |
| Xu & Jia * [21] | 21 | 33.82 | 0.9509 | 1.4399 | 0.8084 | 0.1029 |
| Sun et al. * [37] | 21 | 33.29 | 0.9402 | 1.7488 | 0.8279 | 0.0774 |
| Zuo et al. * [29] | 21 | 33.65 | 0.9515 | 1.5416 | 0.8067 | 0.0942 |
| Pan-DCP * [39] | 21 | 34.49 | 0.9518 | 1.3103 | 0.8008 | 0.0997 |
| SelfDeblur [18] | 21 | 32.52 | 0.9402 | 1.9913 | 0.8058 | 0.0946 |
| Ours (soft) | 21, 23a | 40.39 | 0.9879 | 0.5244 | 0.8751 | 0.0374 |
| Ours (hard) | 21, 27 | 41.94 | 0.9895 | 0.3482 | 0.8702 | 0.0456 |
| known k * | 23a | 35.21 | 0.9573 | 1.0000 | 0.8222 | 0.1169 |
| Krishnan et al. * [32] | 23a | 23.75 | 0.7700 | 4.6599 | 0.8657 | 0.1497 |
| Cho & Lee * [30] | 23a | 28.67 | 0.8856 | 2.3186 | 0.8403 | 0.1276 |
| Levin et al. * [34] | 23a | 30.05 | 0.9126 | 2.0796 | 0.7516 | 0.1419 |
| Xu & Jia * [21] | 23a | 29.48 | 0.8651 | 2.4357 | 0.8494 | 0.1428 |
| Sun et al. * [37] | 23a | 32.48 | 0.9379 | 1.3988 | 0.8690 | 0.0858 |
| Zuo et al. * [29] | 23a | 31.99 | 0.9344 | 1.5303 | 0.8944 | 0.0972 |
| Pan-DCP * [39] | 23a | 32.69 | 0.9361 | 1.2969 | 0.8705 | 0.0949 |
| SelfDeblur [18] | 23a | 34.29 | 0.9478 | 0.9519 | 0.8524 | 0.0757 |
| Ours (soft) | 21, 23b | 40.73 | 0.9880 | 0.4385 | 0.8843 | 0.0365 |
| Ours (hard) | 23b, 27 | 40.80 | 0.9867 | 0.2285 | 0.9167 | 0.0267 |
| known k * | 23b | 33.58 | 0.9493 | 1.0000 | 0.7483 | 0.1153 |
| Krishnan et al. * [32] | 23b | 26.67 | 0.7924 | 2.5681 | 0.8195 | 0.1429 |
| Cho & Lee * [30] | 23b | 27.84 | 0.8510 | 1.6925 | 0.7802 | 0.1529 |
| Levin et al. * [34] | 23b | 29.58 | 0.9012 | 1.4543 | 0.7785 | 0.1379 |
| Xu & Jia * [21] | 23b | 30.35 | 0.9096 | 1.2175 | 0.8744 | 0.1142 |
| Sun et al. * [37] | 23b | 31.98 | 0.9331 | 1.1005 | 0.8653 | 0.0882 |
| Zuo et al. * [29] | 23b | 31.35 | 0.9306 | 1.1356 | 0.8845 | 0.1009 |
| Pan-DCP * [39] | 23b | 31.43 | 0.9267 | 1.2614 | 0.8605 | 0.0935 |
| SelfDeblur [18] | 23b | 33.05 | 0.9304 | 0.9651 | 0.7986 | 0.1091 |
| Ours (soft) | 23a, 23b | 40.74 | 0.9851 | 0.2646 | 0.9092 | 0.0339 |
| Ours (hard) | 23a, 27 | 41.40 | 0.9877 | 0.2700 | 0.8996 | 0.0357 |
| known k * | Avg. | 34.53 | 0.9492 | 1.0000 | 0.7754 | 0.1058 |
| Krishnan et al. * [32] | Avg. | 29.88 | 0.8666 | 2.4523 | 0.8046 | 0.1282 |
| Cho & Lee * [30] | Avg. | 30.57 | 0.8966 | 1.7113 | 0.8051 | 0.1280 |
| Levin et al. * [34] | Avg. | 30.80 | 0.9092 | 1.7724 | 0.7708 | 0.1301 |
| Xu & Jia * [21] | Avg. | 31.67 | 0.9163 | 1.4898 | 0.8253 | 0.1232 |
| Sun et al. * [37] | Avg. | 32.99 | 0.9330 | 1.2847 | 0.8349 | 0.0935 |
| Zuo et al. * [29] | Avg. | 32.66 | 0.9332 | 1.2500 | 0.8361 | 0.1084 |
| Pan-DCP * [39] | Avg. | 32.69 | 0.9284 | 1.2555 | 0.8161 | 0.1114 |
| SelfDeblur [18] | Avg. | 33.07 | 0.9313 | 1.1968 | 0.8086 | 0.1082 |
| Ours (soft) | Avg. | 40.72 | 0.9871 | 0.4448 | 0.8610 | 0.0476 |
| Ours (hard) | Avg. | 41.07 | 0.9874 | 0.3148 | 0.8643 | 0.0446 |
Table 5. Comparison of the average inference time on the Levin test set [33] and the number of model parameters. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result.

| Method | Time (s) | Parameters (M) |
|---|---|---|
| Krishnan et al. * [32] | 8.9400 | - |
| Cho & Lee * [30] | 1.3951 | - |
| Levin et al. * [34] | 78.263 | - |
| Xu & Jia * [21] | 1.1840 | - |
| Sun et al. * [37] | 191.03 | - |
| Zuo et al. * [29] | 10.998 | - |
| Pan-DCP * [39] | 295.23 | - |
| SelfDeblur [18] | 368.57 | 29.1 |
| Ours | 423.49 | 35.9 |
Table 6. Quantitative comparisons on the Lai test set [45]. The methods marked with * adopt [31,47] as the non-blind deconvolution for the final result after kernel estimation: ref. [47] is adopted for the Saturated category, and ref. [31] for the other categories. The best results are highlighted. The blur kernel "Avg." denotes the average PSNR, SSIM, FSIM, and LPIPS results over all blur kernels.

| Method | Blur Kernel | PSNR ↑ | SSIM ↑ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| Cho & Lee * [30] | 31 | 19.60 | 0.6664 | 0.7182 | 0.3855 |
| Xu & Jia * [21] | 31 | 23.70 | 0.8534 | 0.8069 | 0.3099 |
| Xu et al. * [35] | 31 | 22.90 | 0.8077 | 0.7928 | 0.3151 |
| Michaeli et al. * [38] | 31 | 22.02 | 0.7499 | 0.7668 | 0.3492 |
| Perrone et al. * [27] | 31 | 22.12 | 0.8279 | 0.7562 | 0.3501 |
| Pan-L0 * [36] | 31 | 22.58 | 0.8405 | 0.7886 | 0.3267 |
| Pan-DCP * [39] | 31 | 23.38 | 0.8478 | 0.8029 | 0.3580 |
| SelfDeblur [18] | 31 | 22.40 | 0.8345 | 0.8005 | 0.4205 |
| Ours (hard) | 31, 51 | 28.57 | 0.9711 | 0.8056 | 0.1959 |
| Ours (hard) | 31, 75 | 29.09 | 0.9751 | 0.8276 | 0.1691 |
| Cho & Lee * [30] | 51 | 16.74 | 0.4342 | 0.6394 | 0.4996 |
| Xu & Jia * [21] | 51 | 19.69 | 0.6821 | 0.6773 | 0.3982 |
| Xu et al. * [35] | 51 | 19.18 | 0.6603 | 0.6703 | 0.4073 |
| Michaeli et al. * [38] | 51 | 18.07 | 0.4995 | 0.6562 | 0.4791 |
| Perrone et al. * [27] | 51 | 16.21 | 0.4471 | 0.6358 | 0.5002 |
| Pan-L0 * [36] | 51 | 18.08 | 0.6233 | 0.6637 | 0.4271 |
| Pan-DCP * [39] | 51 | 19.69 | 0.6961 | 0.6736 | 0.4475 |
| SelfDeblur [18] | 51 | 21.27 | 0.7748 | 0.7928 | 0.4708 |
| Ours (soft) | 51, 55 | 28.32 | 0.9598 | 0.8034 | 0.2131 |
| Ours (hard) | 51, 75 | 28.78 | 0.9613 | 0.8252 | 0.1781 |
| Cho & Lee * [30] | 55 | 16.99 | 0.4857 | 0.6581 | 0.4863 |
| Xu & Jia * [21] | 55 | 18.98 | 0.6454 | 0.6794 | 0.4179 |
| Xu et al. * [35] | 55 | 18.12 | 0.5859 | 0.6707 | 0.4386 |
| Michaeli et al. * [38] | 55 | 17.66 | 0.4945 | 0.6554 | 0.4942 |
| Perrone et al. * [27] | 55 | 17.33 | 0.5607 | 0.6657 | 0.4545 |
| Pan-L0 * [36] | 55 | 17.19 | 0.5367 | 0.6542 | 0.4602 |
| Pan-DCP * [39] | 55 | 18.71 | 0.6136 | 0.6637 | 0.4520 |
| SelfDeblur [18] | 55 | 20.84 | 0.7590 | 0.7017 | 0.5112 |
| Ours (hard) | 55, 75 | 28.72 | 0.9624 | 0.8337 | 0.1813 |
| Cho & Lee * [30] | Avg. | 17.06 | 0.4801 | 0.6571 | 0.4997 |
| Xu & Jia * [21] | Avg. | 20.18 | 0.7080 | 0.7123 | 0.4121 |
| Xu et al. * [35] | Avg. | 19.23 | 0.6593 | 0.6971 | 0.4278 |
| Michaeli et al. * [38] | Avg. | 18.37 | 0.5181 | 0.6729 | 0.4904 |
| Perrone et al. * [27] | Avg. | 18.48 | 0.6130 | 0.6887 | 0.4568 |
| Pan-L0 * [36] | Avg. | 18.54 | 0.6248 | 0.6888 | 0.4454 |
| Pan-DCP * [39] | Avg. | 19.89 | 0.6656 | 0.6987 | 0.4625 |
| SelfDeblur [18] | Avg. | 20.97 | 0.7524 | 0.7488 | 0.5076 |
| Ours (average) | Avg. | 28.69 | 0.9660 | 0.8191 | 0.1875 |
Table 7. Ablation study on the Levin test set [33]. The best results are highlighted.

| Approach | Loss Fn. | PSNR ↑ | SSIM ↑ | Error Ratio ↓ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| (a) SelfDeblur [18] | L_2 + TV | 33.07 | 0.9438 | 1.2509 | 0.8086 | 0.1082 |
| (b) DualDeblur-A | L_2 + TV | 35.75 | 0.9536 | 0.6921 | 0.8824 | 0.0748 |
| (c) DualDeblur-B | L_2 | 35.63 | 0.9528 | 0.7087 | 0.8816 | 0.0758 |
| (d) DualDeblur-C | L_SSIM | 39.11 | 0.9661 | 0.6226 | 0.7890 | 0.0819 |
| (e) DualDeblur | L_L2_SSIM | 40.89 | 0.9873 | 0.3798 | 0.8627 | 0.0461 |
Table 8. Influence of α and γ in Equation (5) on the Levin test set [33]. The best results are highlighted.

| α | γ | PSNR ↑ | SSIM ↑ | FSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| 1 | 10 | 38.85 | 0.9649 | 0.7770 | 0.0870 |
| 1 | 100 | 39.69 | 0.9766 | 0.7904 | 0.0780 |
| 1 | 200 | 40.65 | 0.9858 | 0.8126 | 0.0660 |
| 10 | 10 | 39.77 | 0.9799 | 0.8073 | 0.0684 |
| 10 | 100 | 40.89 | 0.9873 | 0.8627 | 0.0461 |
| 10 | 200 | 40.70 | 0.9872 | 0.8592 | 0.0487 |
| 50 | 10 | 39.33 | 0.9826 | 0.8610 | 0.0514 |
| 50 | 100 | 39.27 | 0.9818 | 0.8756 | 0.0465 |
| 50 | 200 | 38.96 | 0.9805 | 0.8784 | 0.0459 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
