Article

Multi-Scale Cyclic Image Deblurring Based on PVC-Resnet

College of Engineering, Anhui Agricultural University, Hefei 230036, China
* Author to whom correspondence should be addressed.
Photonics 2023, 10(8), 862; https://doi.org/10.3390/photonics10080862
Submission received: 16 June 2023 / Revised: 14 July 2023 / Accepted: 15 July 2023 / Published: 25 July 2023
(This article belongs to the Special Issue Recent Advances in Optical Metrology)

Abstract

Aiming at the non-uniform blurring of images caused by optical system defects or external interference factors, such as camera shake, defocus, and fast object movement, a multi-scale cyclic image deblurring model based on a parallel void convolution-Resnet (PVC-Resnet) is proposed in this paper, in which a multi-scale recurrent network architecture and a coarse-to-fine strategy are used to restore blurred images. The backbone network is built on the Unet codec architecture, and a PVC-Resnet module combining parallel dilated convolution with a residual network is constructed in the encoder of the backbone network. The convolutional receptive field is expanded with parallel dilated convolution to extract richer global features. In addition, a multi-scale feature extraction module is designed to extract the shallow features of different-scale targets in blurred images, and the extracted features are then sent to the backbone network for feature refinement. The SSIM loss function and the L1 loss function are combined into the SSIM-L1 joint loss function to optimize the overall network and ensure that the image restoration at each stage can be optimized. The experimental results show that, averaged over different datasets, the proposed model achieves a peak signal-to-noise ratio (PSNR) as high as 32.84 dB, a structural similarity (SSIM) of 0.9235, and a statistical structural similarity (Stat-SSIM) of 0.9249. The deblurred images generated by this method are superior to those of the methods proposed by Nah et al., Kupyn et al., and Cho S J et al., especially on the calibration board dataset. The model proposed in this paper applies parallel dilated convolution and the SSIM-L1 joint loss function to improve network performance, so that the edge and texture details of the restored images are clearer.

1. Introduction

In recent years, with the increase in computing power and the development of vision algorithms, images have become an important data source: useful information and features can be extracted by processing and analyzing images in materials science [1], biology [2], medicine [3], and other application fields. However, optical imaging systems [4] are limited by their own physical characteristics, such as the shape and material of the lens and the shooting position. Scattering and refraction occur when light passes through a lens or reflector, and vision sensors may blur the images they acquire in real time, changing image resolution and contrast. This hinders the extraction of important information from the images and degrades the performance of vision algorithms [5,6]. Therefore, image deblurring is of great significance in the field of optical imaging and plays an important role in improving image quality.
Traditional image deblurring methods usually recover the sharp version of a blurred image by estimating the blurring kernel [7]. They model different types of uniform, non-uniform, and depth-aware blurring kernels [8], impose various constraints to solve for the blurring kernel using prior information about the image, and finally recover the corresponding sharp image from the given blurred image. Xu et al. [9] assumed that the blur function has a generalized sparse mathematical expression and then performed a deconvolution operation to obtain an estimate of the sharp image. Zhou et al. [10] proposed an alternative global modeling method for image deblurring, which decouples image degradation and content components to a certain extent by using the Fourier transform as a prior for image degradation, and employs Fourier spatial interaction modeling and Fourier channel evolution in a customized core design. Hayashi et al. [11] proposed a method for estimating the motion-blur parameters of the point spread function (PSF), using a filter built from the convolution of a motion-blur PSF and an out-of-focus-blur PSF to invert the blurred image and sharpen it, which proved effective. Most traditional works recover blurred images by using a hypothetical blurring kernel and natural image priors during deblurring, but the blurring function is usually unknown and obtaining the optimal blurring kernel is difficult, which directly limits overall deblurring performance.
The development of emerging technologies such as optical computing [12] and deep learning provides new solutions for image deblurring. Optical computing techniques can accelerate image processing by exploiting the nonlinear properties of optical devices and operating directly in the optical domain [13]. Unlike traditional methods, optical computing methods can process blurred images directly without estimating blur kernels. For example, with devices such as optical phase modulators [14] or spatial light modulators [15], operations including nonlinear filtering, inverse filtering, regularization, and least squares can be implemented in the optical domain, thereby directly recovering the sharp version of a blurred image. Adabi et al. [16] proposed a scalable and learnable despeckling framework that organizes digital filters in an intelligent way and can find the most suitable speckle-reduction algorithm for a given image. Convolutional neural network (CNN)-based image deblurring methods have also achieved remarkable success. Early CNN-based methods used a CNN as a blurring-kernel estimator and constructed a two-stage framework of CNN-based kernel estimation followed by deconvolution-based deblurring [17], but the downsampling of the image reduced the resolution of small-scale target information in the blurred image, resulting in the loss of some details of small targets in the deblurred image. To restore small targets in blurred images, Nah et al. [18] proposed a coarse-to-fine deblurring method that uses a multi-scale CNN to deblur the image directly. Tao et al. [19] proposed a scale-recurrent network (SRN) with shared parameters, which reduces the number of model parameters and effectively improves training efficiency. Building on SRN, Gao et al. [20] proposed a parameter selective sharing strategy and a nested skip connection structure to achieve blind restoration of blurred images in dynamic scenes. Compared with traditional CNNs, multi-scale models can improve deblurring performance, but their coarse-to-fine strategies mainly rely on increasing network depth or sharing network weights, which increases computational complexity.
Recently, deblurring models based on generative adversarial networks (GANs) have been studied; they perform image deblurring in a single-scale manner by constructing a generator and a discriminator and defining a game between the two. Kupyn et al. proposed DeblurGAN [21], a kernel-free blind motion deblurring method based on a conditional adversarial network [22] and optimized with a multi-component loss function, which also benefits detection on blurred images. Zhao et al. [23] proposed a lightweight and real-time unsupervised blind image deblurring baseline, called Frequency Domain Contrast Loss Constrained Lightweight CycleGAN (FCL-GAN), in which two new cooperative units, the Lightweight Domain Conversion Unit (LDCU) and the Parameterless Frequency Domain Contrast Unit (PFCU), were designed, and its performance was demonstrated on multiple datasets. GAN-based image deblurring methods have proven effective but suffer from training instability. As the number of network layers increases, problems such as information loss and overfitting arise, seriously degrading model performance. To address these problems, some studies have found that introducing residual blocks into CNNs allows wider networks to be built, and residual learning enables the network to learn more features from each convolutional layer [24,25,26]. Based on this advantage, Sharif et al. [27] proposed an end-to-end scale-recurrent deep network for multi-modal image deblurring, in which a new residual dense block with spatial asymmetric attention is designed; the experimental results show clear advantages in qualitative and quantitative evaluation. It has also been demonstrated that different levels of blur can be better handled from multi-scale images [28]. Lin et al. [29] used a transformer-based deep learning model for image deblurring and used an imaging model to generate 16 different blurring kernels to adapt the model to different degrees of blur; the results show that the method can remove different degrees of blurring and can handle images containing both clear and blurred targets. Accordingly, various CNN-based deblurring methods have adopted this idea, feeding blurred images of different scales into each sub-network [30], and the coarse-to-fine design principle has proven effective for image deblurring.
To restore targets of different scales in blurred images and to obtain the global features of blurred images, a multi-scale cyclic image deblurring model based on PVC-Resnet is proposed in this paper. Resnet is introduced into the network to avoid gradient vanishing and information loss in the deep neural network. According to the characteristics of blurred images, parallel dilated convolution is applied to expand the convolutional receptive field and improve the deblurring ability of the network model. A multi-scale feature extraction module is proposed to extract the shallow information of different-sized targets in blurred images. In addition, to compensate for the loss of necessary resolution information of small targets caused by downsampling, this paper uses a multi-scale cyclic method that downscales the blurred images and performs feature fusion between different scales by upsampling. To address the brightness and color deviation caused by the SSIM loss function, it is combined with the L1 norm loss to construct the joint SSIM-L1 loss, so that the deblurred image is closer to the ground-truth image in edge detail, brightness, and color.

2. Materials and Methods

2.1. Data Description

Two datasets are used in this paper: the GoPro dataset and the calibration board dataset. The GoPro dataset was created by Nah et al. [18]. It simulates complex camera shakes and object motions; the motion-blurred scenes include pedestrian and vehicle motion, and the dataset is widely used in the field of image deblurring. It consists of one-to-one pairs of real blurred images and ground-truth images taken by a high-speed camera, comprising 3214 blurred images of size 1280 × 720, of which 2103 are training images and 1111 are test images. Some examples of the GoPro dataset are shown in Figure 1.
The calibration board dataset is generated with MATLAB and is commonly used in fields such as camera calibration and image distortion correction. Using the calibration board dataset allows the accurate internal and external parameters of the camera to be obtained, which helps recover image details and textures and improves the effectiveness of deblurring. In addition, the known geometric structure of the calibration board can be used to evaluate the reconstruction accuracy and fidelity of different deblurring methods, allowing a better comparison of their performance. The generated images are augmented by geometric transformations such as rotation, scaling, and tilting. After screening, 2400 clear calibration board images are obtained. To make the blurred images closer to actual shooting conditions, a blurring algorithm is applied to the generated calibration board images, finally yielding 800 defocus-blurred images and 1600 motion-blurred images. Some examples of the calibration board dataset are shown in Figure 2.
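As a concrete illustration of how such defocus and motion blur could be synthesized, the sketch below applies a disk kernel and a rotated line kernel with OpenCV. The kernel sizes, angles, and file name are illustrative assumptions; the paper does not specify its exact blurring routine.

```python
# Hypothetical sketch of synthesizing the two blur types in Section 2.1;
# kernel sizes and angles are assumptions, not values from the paper.
import cv2
import numpy as np

def defocus_blur(img, radius=5):
    """Approximate defocus blur with a normalized disk (circular) kernel."""
    k = 2 * radius + 1
    kernel = np.zeros((k, k), np.float32)
    cv2.circle(kernel, (radius, radius), radius, 1.0, -1)   # filled disk
    kernel /= kernel.sum()
    return cv2.filter2D(img, -1, kernel)

def motion_blur(img, length=15, angle=30.0):
    """Approximate linear motion blur with a rotated line kernel."""
    kernel = np.zeros((length, length), np.float32)
    kernel[length // 2, :] = 1.0                             # horizontal line
    center = (length / 2 - 0.5, length / 2 - 0.5)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))   # rotate the line
    kernel /= kernel.sum()
    return cv2.filter2D(img, -1, kernel)

sharp = cv2.imread("calib_board.png")           # a generated calibration image (hypothetical path)
blurred_defocus = defocus_blur(sharp, radius=4)
blurred_motion = motion_blur(sharp, length=21, angle=45.0)
```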

2.2. Multi-Scale Cyclic Deblurring Model Based on PVC-Resnet

The structure of the multi-scale cyclic deblurring model based on PVC-Resnet proposed in this paper is shown in Figure 3. The Unet codec architecture is used as the backbone network. To improve the deblurring ability of the model, a multi-scale feature extraction module and a PVC-Resnet module are designed in the network encoder. The multi-scale feature extraction module extracts shallow features of different-sized targets in blurred images by connecting multiple convolutional layers in parallel. The PVC-Resnet module combines parallel dilated convolution with a residual network: the residual structure passes the input directly to the output through cross-layer connections, avoiding gradient vanishing and information loss in the deep network and thereby improving performance and training efficiency, while the dilated convolution effectively expands the convolutional receptive field without increasing the amount of computation and extracts both global and local information from blurred images. Moreover, to make up for the loss of necessary resolution information caused by downsampling, the blurred image is restored through a multi-scale cycle. As shown in Figure 3, in the coding stage the model takes blurred images at different scales as input. Shallow features of the smallest-scale blurred image are first extracted by the multi-scale feature extraction module and then fed into stage 3 of the backbone network for refinement. After upsampling, the output C3 is fused with the next-scale blurred image and passed through the multi-scale feature extraction module into stage 2 of the backbone network for further refinement. After upsampling, C2 is fused with the largest-scale blurred image B1, and the deblurred image C1 is finally output through a complete pass of the backbone network. At the same time, the output of each cycle is supervised by a clear image of the corresponding scale, and the SSIM-L1 joint loss between them is computed to ensure that the image restoration at each stage can be optimized.
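The following is a minimal sketch of this coarse-to-fine cycle, assuming three scales and a generic backbone callable; the routing of features through specific backbone stages and the multi-scale feature extraction module are simplified away, so it should be read as an illustration of the recurrence rather than the authors' implementation.

```python
# A minimal sketch of the multi-scale cycle: process the coarsest scale first,
# then upsample the previous restoration and fuse it with the next-scale input.
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_scale_deblur(backbone, blurred, num_scales=3):
    outputs = []
    prev = None
    for s in reversed(range(num_scales)):                     # coarsest scale first
        scale = 1.0 / (2 ** s)
        b_s = blurred if scale == 1.0 else F.interpolate(
            blurred, scale_factor=scale, mode="bilinear", align_corners=False)
        # upsample the previous (coarser) restoration to the current resolution;
        # at the coarsest scale the blurred input itself serves as the placeholder
        up = b_s if prev is None else F.interpolate(
            prev, size=b_s.shape[-2:], mode="bilinear", align_corners=False)
        inp = torch.cat([b_s, up], dim=1)                     # 6-channel input at every scale
        prev = backbone(inp)                                  # restored image at this scale
        outputs.append(prev)                                  # supervised by a clear image of matching scale
    return outputs

backbone = nn.Conv2d(6, 3, 3, padding=1)                      # placeholder for the Unet-style backbone
restored_per_scale = multi_scale_deblur(backbone, torch.rand(1, 3, 256, 256))
```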

2.2.1. Backbone Network Design

The backbone network is based on the Unet encoder-decoder architecture, composed of coding network blocks and decoding network blocks. The Unet network concatenates feature maps along the channel dimension through skip connections, which retains more location and feature information and outperforms other network structures in image deblurring tasks. The coding network block consists of a convolution layer, the PVC-Resnet module, and a max-pooling layer; Stage 0–Stage 5 in Table 1 show the structure of the coding network. The convolution layer extracts features from the input feature map, and the max-pooling layer downsamples the input feature map by a factor of two to reduce the spatial resolution of the image. The PVC-Resnet module combines parallel dilated convolution with a residual network, using the Resnet structure to improve the deblurring performance of the model; according to the characteristics of blurred images, parallel dilated convolution is added to expand the convolutional receptive field and obtain the global features of the image. The decoding network block consists of an upsampling layer and a convolution layer, whose structure is shown in Stage 6–Stage 9 of Table 1. The encoder and decoder are connected across layers through long skip connections, which effectively aggregate the context information of the shallow and deep networks and compensate for the information loss caused by downsampling in the coding stage.

2.2.2. Multi-Scale Feature Extraction Module

Due to camera jitter and target motion during imaging, the collected images are blurred to varying degrees at different positions. The network model therefore needs strong feature extraction capability and must adapt to targets of different sizes in blurred images. Generally, a shallow network has a small convolutional receptive field, which retains the resolution information required for small targets, while a deep network has a large receptive field suitable for large targets. Therefore, this paper proposes a multi-scale feature extraction module consisting of parallel branches with different numbers of convolutional layers to achieve multi-scale feature extraction of different-scale targets in blurred images. The structure of the multi-scale feature extraction module is shown in Figure 4. The feature information of different-sized targets in the blurred image is extracted by four parallel branches. Each branch contains a different number of CBR modules, each of which consists of a 3 × 3 convolutional layer, a batch normalization layer, and a rectified linear unit. As shown in Figure 4, from left to right the branches contain two CBR modules, four CBR modules, and one CBR module, respectively, and each of these branches is followed by a 1 × 1 convolution to adjust the number of channels for concatenation of the feature maps. To reduce computation, the first convolution layer is shared between the first and second branches, which extract the blur of larger targets in the blurred image. The third branch is shallow and is used to extract the blur of small targets. The fourth branch contains only a 1 × 1 convolution, which captures the global and local features of each pixel in the blurred image. Finally, the features extracted by the four branches are fused to enhance the weights of useful features and suppress useless ones, completing the feature extraction of multi-scale targets in blurred images.
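A hedged PyTorch sketch of this module is given below; the channel widths, the exact way the first convolution layer is shared, and the concatenation-plus-1 × 1 fusion are assumptions, since the paper specifies only the branch depths.

```python
# Sketch of the four-branch multi-scale feature extraction module of Figure 4,
# under the assumptions stated above.
import torch
import torch.nn as nn

def cbr(in_ch, out_ch):
    """One CBR unit: 3x3 convolution, batch normalization, ReLU."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class MultiScaleFeatureExtractor(nn.Module):
    def __init__(self, in_ch=3, mid_ch=32, out_ch=32):
        super().__init__()
        self.shared = cbr(in_ch, mid_ch)                       # first CBR, shared by branches 1 and 2
        self.branch1 = nn.Sequential(cbr(mid_ch, mid_ch),      # 2 CBRs total (incl. shared)
                                     nn.Conv2d(mid_ch, out_ch, 1))
        self.branch2 = nn.Sequential(cbr(mid_ch, mid_ch), cbr(mid_ch, mid_ch),
                                     cbr(mid_ch, mid_ch),      # 4 CBRs total (incl. shared)
                                     nn.Conv2d(mid_ch, out_ch, 1))
        self.branch3 = nn.Sequential(cbr(in_ch, mid_ch),       # shallow branch for small targets
                                     nn.Conv2d(mid_ch, out_ch, 1))
        self.branch4 = nn.Conv2d(in_ch, out_ch, 1)             # 1x1 only, per-pixel features
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)           # feature fusion

    def forward(self, x):
        s = self.shared(x)
        feats = [self.branch1(s), self.branch2(s), self.branch3(x), self.branch4(x)]
        return self.fuse(torch.cat(feats, dim=1))

features = MultiScaleFeatureExtractor()(torch.rand(1, 3, 128, 128))
```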

2.2.3. PVC-Resnet Module

When a blurred image is fed into the neural network, downsampling operations such as strided convolution and pooling are applied to increase the receptive field and reduce computation. These operations enlarge the receptive field but reduce the spatial resolution, causing the loss of the internal data structure and spatial hierarchy of the image. Therefore, based on the Resnet network, this paper proposes a parallel void convolution-Resnet (PVC-Resnet) module in the encoder of the backbone network. The PVC-Resnet module introduces dilated convolution into the Resnet structure, which expands the receptive field, extracts richer global features, and enhances the deblurring ability of the network model. Dilated convolution introduces a dilation rate into the standard convolution layer, defining the spacing between kernel elements as the kernel slides over the data; this increases the receptive field without increasing computation and makes full use of the information contained in the feature map. The relationship between the actual receptive field size and the dilation rate is defined as follows:
$N = k + (k - 1) \times (d - 1)$ (1)
where d represents the dilation rate, that is, the spacing between adjacent elements of the convolution kernel; (d − 1) is the number of zeros inserted between them; k is the size of the convolution kernel; and N is the actual receptive field size on the feature map. The receptive field sizes obtained after adding dilated convolution are shown in Table 2.
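As a quick check of this relationship, the short snippet below evaluates Equation (1) for a 3 × 3 kernel at the dilation rates listed in Table 2.

```python
# Effective kernel size N = k + (k - 1) * (d - 1) for a 3x3 convolution.
def receptive_field(k, d):
    return k + (k - 1) * (d - 1)

for d in (1, 2, 4):
    print(f"k=3, dilation={d} -> receptive field {receptive_field(3, d)}")
# prints 3, 5 and 9, matching Table 2
```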
For the deblurring task, the receptive field should be large enough to capture severe large-scale blurring, but too large a dilation rate destroys the continuity of image information. With this in mind, this paper constructs a parallel dilated convolution module based on the Resnet structure, as shown in Figure 5. By combining dilated convolution with standard convolution, the spatial information missed by the dilated convolution is effectively supplemented, giving the whole network better continuity.
As shown in Figure 5, S2, S3, and S4 are composed of multiple end-to-end residual units BTNK1 and BTNK2, whose structures are also given in Figure 5. In BTNK2, the input x passes through three convolutional layers with their BN layers and ReLU activation functions to obtain the mapping F(x), which is then added to x through a skip connection to produce the final mapping F(x) + x. Compared with BTNK2, BTNK1 has an additional 1 × 1 convolution branch that adjusts the number of channels between the input x and the output F(x) to match the input and output dimensions, finally yielding the mapping F(x) + G(x). The structural advantages of BTNK1 and BTNK2 allow shallow features to be mapped directly to deep layers, which effectively alleviates the loss of target texture details caused by conventional convolutional and fully connected layers during image deblurring.
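The following is a minimal sketch of these building blocks, assuming a standard 1 × 1 → 3 × 3 → 1 × 1 bottleneck with a parallel pair of standard and dilated 3 × 3 convolutions in the middle; the channel widths and dilation rate are assumptions rather than values reported in the paper.

```python
# Sketch of the PVC-Resnet building blocks under the assumptions stated above.
import torch
import torch.nn as nn

class ParallelDilatedConv(nn.Module):
    """Parallel standard + dilated 3x3 convolutions, summed to preserve continuity."""
    def __init__(self, ch, dilation=2):
        super().__init__()
        self.std = nn.Conv2d(ch, ch, 3, padding=1)
        self.dil = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)
        self.bn = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.std(x) + self.dil(x)))

class BTNK(nn.Module):
    """Residual bottleneck; BTNK1 uses a 1x1 projection shortcut, BTNK2 an identity shortcut."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        mid = out_ch // 2
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            ParallelDilatedConv(mid, dilation),
            nn.Conv2d(mid, out_ch, 1), nn.BatchNorm2d(out_ch))
        # projection G(x) when dimensions differ (BTNK1), identity otherwise (BTNK2)
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

block1 = BTNK(32, 64)     # BTNK1-style: channel counts differ
block2 = BTNK(64, 64)     # BTNK2-style: identity shortcut
y = block2(block1(torch.rand(1, 32, 64, 64)))
```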

2.2.4. Loss Function

The design of the loss function affects the performance of the network. In this paper, a joint loss function is built to optimize the network parameters. It is defined as the combination of the structural similarity (SSIM) loss and the L1 norm loss and is calculated as:
$L_{loss} = \alpha L_{SSIM} + (1 - \alpha)\, G_{\alpha}\, L_{l1}$ (2)
In the equation, $L_{loss}$ is the value of the joint loss function, $L_{SSIM}$ is the SSIM loss, $L_{l1}$ is the L1 loss, $\alpha$ is a constant, usually set to 0.84, and $G_{\alpha}$ is the Gaussian distribution parameter. In practice, a Gaussian function is used to compute the mean, variance, and covariance of the image instead of traversing individual pixels, which is more efficient.
The L1 norm loss function is also called the least absolute deviation (LAD). It minimizes the sum S of the absolute differences between the target values $Y_i$ and the estimated values $f(x_i)$, calculated as follows:
$S = \sum_{i=1}^{n} \left| Y_i - f(x_i) \right|$ (3)
The SSIM loss is based on the structural similarity index, which takes brightness, contrast, and structure into account and is consistent with human perception. In general, results optimized with SSIM preserve detail better.
Research shows that the SSIM loss alone tends to cause brightness changes and color deviation, although it retains high-frequency information and restores image edges and details well, whereas the L1 loss better preserves brightness and color. Therefore, this paper adopts a joint loss function combining the SSIM loss and the L1 loss.
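A hedged sketch of this joint loss is shown below. The 11 × 11 Gaussian window with σ = 1.5 and the stability constants follow common SSIM practice and are assumptions, as the paper does not report them; the Gaussian weighting of the L1 term corresponds to the $G_{\alpha}$ factor in Equation (2).

```python
# Sketch of the SSIM-L1 joint loss of Equation (2) with alpha = 0.84,
# under the window/constant assumptions stated above.
import torch
import torch.nn.functional as F

def gaussian_window(size=11, sigma=1.5, channels=3):
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, -1)
    return (g.t() @ g).expand(channels, 1, size, size).contiguous()

def ssim_l1_loss(pred, target, alpha=0.84, c1=0.01 ** 2, c2=0.03 ** 2):
    ch = pred.shape[1]
    win = gaussian_window(channels=ch).to(pred.device)
    conv = lambda t: F.conv2d(t, win, padding=5, groups=ch)   # local Gaussian statistics
    mu_x, mu_y = conv(pred), conv(target)
    sigma_x = conv(pred * pred) - mu_x ** 2
    sigma_y = conv(target * target) - mu_y ** 2
    sigma_xy = conv(pred * target) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    l_ssim = 1.0 - ssim_map.mean()
    l1_map = conv((pred - target).abs())                      # Gaussian-weighted L1 term
    return alpha * l_ssim + (1 - alpha) * l1_map.mean()

loss = ssim_l1_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```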

3. Experiment and Analysis

3.1. Experimental Environment and Parameter Settings

The experiments were conducted on an Intel Xeon E5-2699 v3 CPU (2.30 GHz) and an NVIDIA Quadro P2000 GPU with 5 GB of video memory. The software environment is a 64-bit Windows operating system, using Python as the programming language and PyTorch as the deep learning framework.
In network training, batch training is used: the training and validation sets are divided into multiple batches, and one pass of all training images through the network model constitutes one epoch. The network is initialized by loading pre-trained weights. The initial learning rate is 1 × 10−3, and the Adam algorithm is used to compute an adaptive learning rate for each weight parameter during optimization.
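A minimal sketch of this training configuration is given below; the placeholder network, random tensors, batch size, epoch count, and stand-in loss are assumptions used only to keep the snippet self-contained, whereas the actual model is trained with the SSIM-L1 joint loss and multi-scale supervision described in Section 2.2.4.

```python
# Minimal training-step sketch: Adam optimizer, initial learning rate 1e-3.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 3, 3, padding=1))          # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)       # adaptive per-parameter learning rate

blurred = torch.rand(4, 3, 64, 64)    # one mini-batch of blurred patches (dummy data)
sharp = torch.rand(4, 3, 64, 64)      # matching ground-truth patches (dummy data)

for epoch in range(2):                # one epoch = one pass over all training images
    optimizer.zero_grad()
    restored = model(blurred)
    loss = F.l1_loss(restored, sharp)  # stand-in; the paper uses the SSIM-L1 joint loss
    loss.backward()
    optimizer.step()
```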

3.2. Evaluation Indicators

The evaluation criteria used in this paper include the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), statistical structural similarity (Stat-SSIM), and the deblurring time for a single image. PSNR is the most common and widely used objective image-quality index; it evaluates image quality according to the error between corresponding pixels and is expressed in dB, with a larger value indicating less distortion and greater similarity between the two images. SSIM measures the similarity of two images, taking brightness, contrast, structure, and human perception into account; in general, results with higher SSIM preserve detail better. Stat-SSIM is an improved algorithm based on SSIM that accounts for statistical variations in the image and uses the image histogram to adjust the normalization coefficients of the luminance and contrast factors, enabling a more accurate assessment of image quality. Its calculation is shown in Equation (4):
$\text{Stat-SSIM}(x, y) = \frac{1}{n_p n_c} \sum_{p} \sum_{c} \left( \frac{2\mu_x \mu_y + \varepsilon_\mu}{\mu_x^2 + \mu_y^2 + \varepsilon_\mu} \right)_{pc} \times \left( \frac{2E[(x - \mu_x)(y - \mu_y)] + \varepsilon_\sigma}{E[(x - \mu_x)^2] + E[(y - \mu_y)^2] + \varepsilon_\sigma} \right)_{pc}$ (4)
where $\mu_i$ is shorthand for the expected value $E[i]$ of image i, computed by convolution with an 11 × 11 Gaussian kernel of standard deviation 3. All other expectation values in the equation are computed by convolving their arguments with the same Gaussian filter. $\varepsilon_\mu$ and $\varepsilon_\sigma$ are two regulators that limit the maximum resolution of the fractions and impose cutoffs on the mean and variance expectation values, respectively; in particular, when both the numerator and denominator of a fraction are much smaller than the corresponding $\varepsilon$, the output is close to 1. The results are finally averaged over the whole image containing $n_p$ pixels and $n_c$ channels.
The image expectation value μ is calculated as shown in Equation (5):
$\mu_x = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} x(m, n)$ (5)
σ represents the standard deviation of the image and is calculated as shown in Equation (6):
$\sigma_x = \left( \frac{1}{H \times W - 1} \sum_{m=1}^{H} \sum_{n=1}^{W} \left( x(m, n) - \mu_x \right)^2 \right)^{\frac{1}{2}}$ (6)
where (m, n) denotes the image position coordinates, and H, W represent the height and width of the image, respectively.
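For reference, a hedged sketch of the PSNR and Stat-SSIM computations is given below; the local statistics use the 11 × 11, σ = 3 Gaussian stated above, while the values of the $\varepsilon$ regulators are assumptions.

```python
# Sketch of the PSNR and Stat-SSIM metrics under the assumptions stated above.
import torch
import torch.nn.functional as F

def psnr(x, y, max_val=1.0):
    mse = F.mse_loss(x, y)
    return 10 * torch.log10(max_val ** 2 / mse)

def _gauss_win(size=11, sigma=3.0, ch=3):
    t = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-t ** 2 / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, -1)
    return (g.t() @ g).expand(ch, 1, size, size).contiguous()

def stat_ssim(x, y, eps_mu=1e-4, eps_sigma=1e-4):
    ch = x.shape[1]
    win = _gauss_win(ch=ch).to(x.device)
    conv = lambda t: F.conv2d(t, win, padding=5, groups=ch)   # local Gaussian expectation
    mu_x, mu_y = conv(x), conv(y)
    var_x = conv((x - mu_x) ** 2)
    var_y = conv((y - mu_y) ** 2)
    cov = conv((x - mu_x) * (y - mu_y))
    lum = (2 * mu_x * mu_y + eps_mu) / (mu_x ** 2 + mu_y ** 2 + eps_mu)
    struct = (2 * cov + eps_sigma) / (var_x + var_y + eps_sigma)
    return (lum * struct).mean()   # average over all pixels and channels

x, y = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
print(psnr(x, y).item(), stat_ssim(x, y).item())
```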

3.3. Analysis and Comparison of Experimental Results

3.3.1. Ablation Experiment

To verify the effectiveness of the PVC-Resnet module and the SSIM-L1 joint loss function proposed in this paper, ablation experiments were designed: the SSIM-L1 + Resnet, L1 + PVC-Resnet, SSIM + PVC-Resnet, and SSIM-L1 + PVC-Resnet models were tested on the two sample datasets, and the results are shown in Figure 6. As can be seen from the detailed comparison images, lacking the enlarged convolutional receptive field provided by parallel dilated convolution, the SSIM-L1 + Resnet model has the worst deblurring effect, and its recovered images show defects of varying severity in the edge details. Compared with the SSIM-L1 + Resnet model, the L1 + PVC-Resnet model is slightly better, but some texture details in its recovered images are still not clear enough. The SSIM + PVC-Resnet model recovers the edges and details of blurred images better, but its results deviate slightly from the ground-truth images in brightness and color. In contrast, the SSIM-L1 + PVC-Resnet model proposed in this paper expands the receptive field through parallel dilated convolution and uses the SSIM-L1 joint loss to compute the error between the prediction and the label data; its restored images are clearer, with sharper edges and details, are closer to the ground-truth images in brightness and color, and show the best deblurring effect.
Table 3 shows the image deblurring test results of the different structural models. From Table 3, it can be seen that the running time for single-image deblurring differs little between the models on the Gopro dataset and the calibration board dataset, but the SSIM-L1 + PVC-Resnet model proposed in this paper outperforms the SSIM-L1 + Resnet, L1 + PVC-Resnet, and SSIM + PVC-Resnet models in terms of PSNR, SSIM, and Stat-SSIM. On PSNR, the SSIM-L1 + PVC-Resnet model is 2.95 dB and 4.68 dB higher than the SSIM-L1 + Resnet model on the two datasets, 3.10 dB and 1.47 dB higher than the L1 + PVC-Resnet model, and 0.53 dB and 0.74 dB higher than the SSIM + PVC-Resnet model, respectively. On SSIM, it is 0.0174 and 0.0253 higher than the SSIM-L1 + Resnet model, 0.0067 and 0.0031 higher than the L1 + PVC-Resnet model, and 0.0024 and 0.0078 higher than the SSIM + PVC-Resnet model. The Stat-SSIM of the SSIM-L1 + PVC-Resnet model on the Gopro and calibration board datasets is 0.9226 and 0.9271, higher than that of the SSIM-L1 + Resnet model by 0.0385 and 0.0353, respectively. The analysis shows that, compared with the SSIM-L1 + Resnet model, the SSIM-L1 + PVC-Resnet model improves significantly in PSNR, SSIM, and Stat-SSIM: the parallel dilated convolution expands the convolutional receptive field, so the feature information output by the convolution layer covers a larger range than that of ordinary convolution, enhancing the deblurring ability of the network. Comparing the SSIM-L1 + PVC-Resnet, L1 + PVC-Resnet, and SSIM + PVC-Resnet models shows that the proposed SSIM-L1 joint loss achieves higher PSNR, SSIM, and Stat-SSIM than the single L1 or SSIM loss, indicating that the joint loss restores blurred images more effectively.

3.3.2. Comparative Analysis of Performance of Different Deblurring Models

To verify the effectiveness of the network proposed in this paper, this experiment tests the models of Nah et al. [18], Kupyn et al. [21], and Cho S J et al. [26] on the GoPro dataset and the calibration board dataset and evaluates their strengths and weaknesses using the PSNR and SSIM indicators. The test results are shown in Figure 7; from left to right are the blurred image, the ground-truth image, Nah et al.'s method, Kupyn et al.'s method, Cho S J et al.'s method, and our method. From the detailed comparison, the deblurred images generated by the methods of Nah et al. and Kupyn et al. show obvious ringing artifacts, i.e., pseudo-edges that arise from oscillations when high-frequency information is added to enhance edges and details during image sampling and reconstruction. Compared with these two methods, the images recovered by our method have more accurate and clearer feature contours, fewer artifacts, and better digit legibility. Compared with the method of Cho S J et al., our method shows a slight improvement in lighting and brightness, is closer to the real image, and produces a smoother and more perceptually pleasing overall texture.
Table 4 shows the image deblurring test results of the different methods. From Table 4, it can be seen that on the Gopro dataset and the calibration board dataset, the model proposed in this study outperforms the methods of Nah et al., Kupyn et al., and Cho S J et al. in terms of PSNR, SSIM, and Stat-SSIM; its runtime for single-image deblurring is longer than that of Kupyn et al. and Cho S J et al. but shorter than that of Nah et al. The PSNR of our method is 3.45 dB and 1.62 dB higher than that of Nah et al., 1.88 dB and 1.35 dB higher than that of Kupyn et al., and 0.23 dB and 0.85 dB higher than that of Cho S J et al. On SSIM, the proposed method is 0.0227 and 0.0053 higher than Nah et al., 0.0064 and 0.0131 higher than Kupyn et al., and 0.0049 and 0.0045 higher than Cho S J et al. On Stat-SSIM, it is 0.036 and 0.0176 higher than Nah et al., 0.0213 and 0.0104 higher than Kupyn et al., and 0.0047 and 0.0016 higher than Cho S J et al., respectively. The data analysis shows that the PVC-Resnet-based multi-scale cyclic image deblurring model proposed in this paper can effectively compensate for the feature information missed due to an insufficient receptive field during deblurring and improves the deblurring effect.

4. Conclusions

In this paper, we propose a PVC-Resnet-based multi-scale cyclic image deblurring model to achieve high-quality single-image deblurring. We make full use of the codec structure to build a backbone network based on the PVC-Resnet residual module and use parallel dilated convolution to expand the convolutional receptive field and improve the deblurring performance at each scale. A multi-scale feature extraction module is proposed to extract feature information of targets at different scales in blurred images, which is then fed to the backbone network for feature refinement. The SSIM-L1 joint loss function, constructed by combining the SSIM loss and the L1 loss, is used to optimize the overall network parameters and further improve deblurring performance. Finally, the multi-scale cyclic architecture gradually fuses the extracted image features of different scales to make up for the loss of necessary resolution information caused by downsampling and to enhance the deblurring effect. Compared with the methods of Nah et al., Kupyn et al., and Cho S J et al., the deblurred images generated by our model achieve the best PSNR, SSIM, and Stat-SSIM, especially on the calibration board dataset, although its single-image deblurring runtime is slightly longer than that of Kupyn et al. and Cho S J et al. The model can provide algorithmic support for restoring images blurred by factors such as camera jitter, defocus, and rapid object movement. In subsequent research, the network will be made lightweight to improve single-image deblurring efficiency while maintaining deblurring performance.

Author Contributions

Conceptualization, K.Z. and J.L.; methodology, K.Z. and M.C.; software, K.Z. and K.L.; validation, J.L. and M.C.; formal analysis, H.Z.; investigation, K.Z.; resources, K.L.; data curation, H.Z.; writing—original draft preparation, K.Z.; writing—review and editing, J.L. and D.Z.; visualization, M.C.; supervision, D.Z.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (No. 32201665), the Key R&D Program of China (2022YFD2001801-3), the Natural Science Foundation of Anhui (No. 2108085MC96) and the Key R&D Program of Anhui (No. 202004a06020016).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vel, R.; Bhatt, A.; Priyanka, A.; Gauthaman, A.; Anilkumar, V.; Safeena, A.S.; Ranjith, S. DEAE-Cellulose-based composite hydrogel for 3D printing application: Physicochemical, mechanical, and biological optimization. Mater. Today Commun. 2022, 33, 104335. [Google Scholar] [CrossRef]
  2. Gu, S.H.; Yu, C.H.; Song, Y.; Kim, N.Y.; Sim, E.; Choi, J.Y.; Song, D.H.; Hur, G.H.; Shin, Y.K.; Jeong, S.T. A small interfering RNA lead targeting RNA-dependent RNA-polymerase effectively inhibit the SARS-CoV-2 infection in golden syrian hamster and rhesus macaque. bioRxiv 2020. [Google Scholar] [CrossRef]
  3. Zang, K.; Hui, L.; Wang, M.; Huang, Y.; Zhu, X.; Yao, B. TIM-3 as a prognostic marker and a potential immunotherapy target in human malignant tumors: A meta-analysis and bioinformatics validation. Front. Oncol. 2021, 11, 579351. [Google Scholar] [CrossRef] [PubMed]
  4. Park, J.; Brady, D.J.; Zheng, G.; Tian, L.; Gao, L. Review of bio-optical imaging systems with a high space-bandwidth product. Adv. Photonics 2021, 3, 044001. [Google Scholar] [CrossRef] [PubMed]
  5. Harmeling, S.; Hirsch, M.; Schölkopf, B. Space-variant single-image blind deconvolution for removing camera shake. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; pp. 829–837. [Google Scholar]
  6. Hirsch, M.; Schuler, C.J.; Harmeling, S.; Schölkopf, B. Fast removal of non-uniform camera shake. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), Barcelona, Spain, 6–13 November 2011; pp. 463–470. [Google Scholar]
  7. Gupta, A.; Joshi, N.; Zitnick, C.L.; Cohen, M.; Curless, B. Single image deblurring using motion density functions. In Proceedings of the the 11th European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; pp. 171–184. [Google Scholar]
  8. Torres, G.F.; Kämäräinen, J. Depth-Aware Image Compositing Model for Parallax Camera Motion Blur. In Proceedings of the Scandinavian Conference on Image Analysis, Levi, Finland, 18–21 April 2023; pp. 279–296. [Google Scholar]
  9. Xu, L.; Zheng, S.C.; Jia, J.Y. Unnatural L0 sparse representation for natural image deblurring. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1107–1114. [Google Scholar]
  10. Zhou, M.; Huang, J.; Guo, C.L.; Li, C. Fourmer: An Efficient Global Modeling Paradigm for Image Restoration. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 June 2023; pp. 42589–42601. [Google Scholar]
  11. Hayashi, T.; Tsubouchi, T. Estimation and sharpening of blur in degraded images captured by a camera on a moving object. Sensors 2022, 22, 1635. [Google Scholar] [CrossRef] [PubMed]
  12. Wang, H.; Ouyang, S.; Shen, Y.; Chen, X. Ternary Optical Computer: An Overview and Recent Developments. In Proceedings of the 2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Xi’an, China, 10–12 December 2021; pp. 82–87. [Google Scholar]
  13. Berger, K.; Machwitz, M.; Kycko, M.; Kefauver, S.C.; Van Wittenberghe, S.; Gerhards, M.; Verrelst, J.; Atzberger, C.; van der Tol, C.; Damm, A.; et al. Multi-sensor spectral synergies for crop stress detection and monitoring in the optical domain: A review. Remote Sens. Environ. 2022, 280, 113198. [Google Scholar] [CrossRef] [PubMed]
  14. Wang, Y.; Xu, H.; Zhu, H.; Chen, X.; Wang, Y. Pixel-wise Phase Unwrapping with Adaptive Reference Phase Estimation for 3-D Shape Measurement. IEEE Trans. Instrum. Meas. 2023, 72, 5006309. [Google Scholar] [CrossRef]
  15. Benea-Chelmus, I.C.; Meretska, M.L.; Elder, D.L.; Tamagnone, M.; Dalton, L.R.; Capasso, F. Electro-optic spatial light modulator from an engineered organic layer. Nat. Commun. 2021, 12, 5928. [Google Scholar] [CrossRef] [PubMed]
  16. Adabi, S.; Rashedi, E.; Clayton, A.; Mohebbi-Kalkhoran, H.; Chen, X.W.; Conforto, S.; Nasiriavanaki, M. Learnable despeckling framework for optical coherence tomography images. J. Biomed. Opt. 2018, 23, 016013. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Cheng, J.; Zhu, W.; Li, J.; Xu, G.; Chen, X.; Yao, C. Restoration of atmospheric turbulence-degraded short-exposure image based on convolution neural network. Photonics 2023, 10, 666. [Google Scholar] [CrossRef]
  18. Nah, S.; Kim, T.H.; Lee, K.M. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 257–265. [Google Scholar]
  19. Tao, X.; Gao, H.Y.; Shen, X.Y.; Wang, J.; Jia, J.Y. Scale-Recurrent Network for Deep Image Deblurring. In Proceedings of the 2018 IEEE/CVF Conf. On Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8174–8182. [Google Scholar]
  20. Gao, H.Y.; Tao, X.; Shen, X.Y.; Jia, J.Y. Dynamic scene deblurring with parameter selective sharing and nested skip connections. In Proceedings of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3843–3851. [Google Scholar]
  21. Kupyn, O.; Budzan, V.; Mykhailych, M.; Mishkin, D.; Matas, J. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the 2018 IEEE/CVF Conf. On Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8183–8192. [Google Scholar]
  22. Huang, X.; Li, Q.; Tai, Y.; Chen, Z.; Liu, J.; Shi, J.; Liu, W. Time series forecasting for hourly photovoltaic power using conditional generative adversarial network and Bi-LSTM. Energy 2022, 246, 123403. [Google Scholar] [CrossRef]
  23. Zhao, S.; Zhang, Z.; Hong, R.; Xu, M.; Yang, Y.; Wang, M. FCL-GAN: A lightweight and real-time baseline for unsupervised blind image deblurring. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 6220–6229. [Google Scholar]
  24. Wightman, R.; Touvron, H.; Jégou, H. Resnet strikes back: An improved training procedure in timm. arXiv 2021, arXiv:2110.00476. [Google Scholar]
  25. Miao, M.; Zheng, L.; Xu, B.; Yang, Z.; Hu, W. A multiple frequency bands parallel spatial–temporal 3D deep residual learning framework for EEG-based emotion recognition. Biomed. Signal Process. Control. 2023, 79, 104141. [Google Scholar] [CrossRef]
  26. Cho, S.J.; Ji, S.W.; Hong, J.P.; Jung, S.W.; Ko, S.J. Rethinking coarse-to-fine approach in single image deblurring. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4641–4650. [Google Scholar]
  27. Sharif, S.M.A.; Naqvi, R.A.; Mehmood, Z.; Hussain, J.; Ali, A.; Lee, S.W. MedDeblur: Medical Image Deblurring with Residual Dense Spatial-Asymmetric Attention. Mathematics 2022, 11, 115. [Google Scholar] [CrossRef]
  28. Liu, S.; Wang, H.; Wang, J.; Pan, C. Blur-kernel bound estimation from pyramid statistics. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 1012–1016. [Google Scholar] [CrossRef]
  29. Lin, H.; Ma, L.; Hu, Q.; Zhang, X.; Xiong, Z.; Han, H. Single image deblurring for pulsed laser range-gated imaging system with multi-slice Integration. Photonics 2022, 9, 642. [Google Scholar] [CrossRef]
  30. Suin, M.; Purohit, K.; Rajagopalan, A.N. Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3606–3615. [Google Scholar]
Figure 1. Samples of GoPro dataset: (a) blurry image; (b) clear image.
Figure 2. Samples of calibration board dataset: (a) defocus blur; (b) motion blur.
Figure 3. Multi-scale circular image deblurring model based on PVC-Resnet.
Figure 4. Multi-scale feature extraction module.
Figure 5. PVC-Resnet module.
Figure 6. Subjective comparison of different structural models for deblurring.
Figure 7. Comparison of subjective effects of the deblurring results of each model.
Table 1. Detailed information of the model.

Stage | Module Type | Conv Kernel Size | Step Size | Input Size
Stage 0 | Conv2D + BN + Relu | 3 × 3 | 1 | 1280 × 720 × 3
Stage 1 | PVC-Resnet | 3 × 3, 1 × 1 | 1 | 1280 × 720 × 32
Stage 1 | maxpooling | 3 × 3 | 2 | 1280 × 720 × 32
Stage 2 | PVC-Resnet | 3 × 3, 1 × 1 | 1 | 640 × 360 × 64
Stage 2 | maxpooling | 3 × 3 | 2 | 640 × 360 × 64
Stage 3 | PVC-Resnet | 3 × 3, 1 × 1 | 1 | 320 × 180 × 128
Stage 3 | maxpooling | 3 × 3 | 2 | 320 × 180 × 128
Stage 4 | PVC-Resnet | 3 × 3, 1 × 1 | 1 | 160 × 80 × 256
Stage 4 | maxpooling | 3 × 3 | 2 | 160 × 80 × 256
Stage 5 | Conv2D + BN + Relu | 3 × 3 | 1 | 80 × 40 × 512
Stage 5 | Conv2D + BN + Relu | 3 × 3 | 1 | 80 × 40 × 512
Stage 5 | Conv2D + BN + Relu | 3 × 3 | 1 | 80 × 40 × 512
Stage 5 | Conv2D + BN + Relu | 3 × 3 | 1 | 80 × 40 × 512
Stage 6 | UpSampling2D | 2 × 2 | - | 80 × 40 × 512
Stage 6 | Conv2D + BN + Relu | 3 × 3 | 1 | 160 × 80 × 256
Stage 6 | Concat | - | - | 160 × 80 × 256, 160 × 80 × 256
Stage 6 | Conv2D + BN + Relu | 3 × 3 | 1 | 160 × 80 × 256
Stage 7 | UpSampling2D | 2 × 2 | - | 160 × 80 × 256
Stage 7 | Conv2D + BN + Relu | 3 × 3 | 1 | 320 × 180 × 128
Stage 7 | Concat | - | - | 320 × 180 × 128, 320 × 180 × 128
Stage 7 | Conv2D + BN + Relu | 3 × 3 | 1 | 320 × 180 × 128
Stage 8 | UpSampling2D | 2 × 2 | - | 320 × 180 × 128
Stage 8 | Conv2D + BN + Relu | 3 × 3 | 1 | 640 × 360 × 64
Stage 8 | Concat | - | - | 640 × 360 × 64
Stage 8 | Conv2D + BN + Relu | 3 × 3 | 1 | 640 × 360 × 64
Stage 9 | UpSampling2D | 2 × 2 | - | 640 × 360 × 64
Stage 9 | Conv2D + BN + Relu | 3 × 3 | 1 | 1280 × 720 × 32
Stage 9 | Concat | - | - | 1280 × 720 × 32, 1280 × 720 × 32
Stage 9 | Conv2D + BN + Relu | 3 × 3, 1 × 1 | 1 | 1280 × 720 × 3
Table 2. The receptive field size of the convolution kernel under different d values.

Conv Layer | d | Size of Conv Kernel | Receptive Field
Conv-1 | 1 | 3 | 3
Conv-2 | 2 | 3 | 5
Conv-3 | 4 | 3 | 9
Table 3. Evaluation index scores of different structural models.

Gopro dataset:
Metric | SSIM-L1 + Resnet | L1 + PVC-Resnet | SSIM + PVC-Resnet | SSIM-L1 + PVC-Resnet
PSNR/dB | 28.34 | 28.19 | 30.76 | 31.29
SSIM | 0.8968 | 0.9075 | 0.9118 | 0.9142
Stat-SSIM | 0.8841 | 0.9106 | 0.9120 | 0.9226
Time/s | 0.2896 | 0.3029 | 0.2961 | 0.2957

Calibration board dataset:
Metric | SSIM-L1 + Resnet | L1 + PVC-Resnet | SSIM + PVC-Resnet | SSIM-L1 + PVC-Resnet
PSNR/dB | 29.70 | 32.91 | 33.64 | 34.38
SSIM | 0.9074 | 0.9296 | 0.9249 | 0.9327
Stat-SSIM | 0.8918 | 0.9115 | 0.9077 | 0.9271
Time/s | 0.2764 | 0.2986 | 0.2850 | 0.2814
Table 4. Image deblurring test results of each model.

Gopro dataset:
Metric | Nah | Kupyn | Cho S J | Ours
PSNR/dB | 27.84 | 29.41 | 31.06 | 31.29
SSIM | 0.8915 | 0.9078 | 0.9093 | 0.9142
Stat-SSIM | 0.8866 | 0.9013 | 0.9179 | 0.9226
Time/s | 0.3376 | 0.1903 | 0.2382 | 0.2957

Calibration board dataset:
Metric | Nah | Kupyn | Cho S J | Ours
PSNR/dB | 32.76 | 33.03 | 33.53 | 34.38
SSIM | 0.9274 | 0.9196 | 0.9282 | 0.9327
Stat-SSIM | 0.9095 | 0.9167 | 0.9255 | 0.9271
Time/s | 0.3162 | 0.1739 | 0.2053 | 0.2814
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
