Article

Combining Deep Image Prior and Second-Order Generalized Total Variance for Image Inpainting

School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang 471000, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(14), 3201; https://doi.org/10.3390/math11143201
Submission received: 13 June 2023 / Revised: 16 July 2023 / Accepted: 19 July 2023 / Published: 21 July 2023

Abstract

Image inpainting is a crucial task in computer vision that aims to restore missing and occluded parts of damaged images. Deep-learning-based image inpainting methods have gained popularity in recent research. One such method is the deep image prior, which is unsupervised and does not require a large number of training samples. However, the deep image prior method often encounters overfitting problems, resulting in blurred image edges. In contrast, the second-order total generalized variation can effectively protect the image edge information. In this paper, we propose a novel image restoration model that combines the strengths of both the deep image prior and the second-order total generalized variation. Our model aims to better preserve the edges of the image structure. To solve the resulting optimization problem effectively, we employ the augmented Lagrangian method and the alternating direction method of multipliers. Numerical experiments show that the proposed method can repair images more effectively, retain more image details, and achieve higher performance than some recent methods in terms of peak signal-to-noise ratio and structural similarity.

1. Introduction

Image inpainting is a technique for improving image quality; it is a hot topic in image processing research and is widely used in scientific and engineering applications. When images are recorded or stored, differences and distortions inevitably arise between the observed image and the real scene, degrading image quality and losing important information. Image inpainting addresses issues such as missing semantic information, object occlusion, and content corruption. It repairs the missing or blurred parts of a damaged image by learning effective pixel and feature information around the missing regions and generating fill-in content consistent with the original image.
In the development of image inpainting technology, the total variation (TV) method was first proposed by Rudin et al. [1] in 1992, originally for image denoising. Chan et al. [2] subsequently extended the TV model to image inpainting and proposed an image inpainting method based on it. Later, many algorithms were developed specifically for image inpainting, mainly to remove smudges, scratches, and overlapping text ([3,4,5,6,7]). These algorithms fill in the missing information by propagating linear structures into the target area, but the propagation introduces blurring artifacts, which become more evident when filling large areas. Criminisi et al. [8] therefore combined the advantages of texture synthesis with inpainting techniques designed for small gaps and proposed a new exemplar-based algorithm that can effectively remove large objects in images and fill the resulting regions.
Since 2014, convolutional encoder-decoder structures and generative adversarial networks (GANs) have driven significant progress in image inpainting. Pathak et al. [9] introduced the context encoder (CE), an unsupervised feature learning algorithm based on the GAN idea. The context encoder can understand the semantic information of the image to some extent and generate content based on the information surrounding the hole. However, it has limited ability to perceive and extract long-range information, which leads to a loss of structural information and texture blur when repairing large missing regions. To address this, Iizuka et al. [10] employed stacked dilated convolutions to expand the network's receptive field and capture global information in an image. Zeng et al. [11] proposed an enhancement model for high-resolution image inpainting, the aggregated contextual transformations GAN, which addresses the structure distortion and texture blur that arise when using a GAN for high-resolution image restoration. Earlier deep-learning-based inpainting methods applied a standard convolution network to the corrupted image, which often produced color discrepancies or blurring artifacts; in response, Liu et al. [12] suggested using partial convolution so that only valid pixels are used for inpainting. While these methods can fill incomplete regions of an image, they struggle to repair the image structure when important parts are missing. To tackle this challenge, Yu et al. [13] introduced a deep generative model that not only synthesizes new image structures but also uses surrounding pixel features as a reference to improve prediction. Nazeri et al. [14] proposed a two-stage model that divides the inpainting problem into structure prediction and image completion, aiming to reconstruct the lost structure effectively. In contrast to such two-stage architectures, Zeng et al. [15] introduced the pyramid-context encoder network (PEN-Net), which applies an attention mechanism to a single-stage GAN generator and improves image quality by exploiting multi-scale features rather than stacking multiple GAN networks. To address overly smooth textures and lost structural information, Li et al. [16] presented a progressive reconstruction of visual structure (PRVS) network, which gradually reconstructs the structure and the related pixel features, resulting in better restoration. For random masks, Yu et al. [17] proposed a generative inpainting system that handles free-form masks. Liu et al. [18] introduced the probabilistically diverse GAN (PD-GAN), which generates multiple restoration results with diverse content and high visual fidelity for inputs with arbitrary masks. Zeng et al. [19] proposed a generative inpainting model with an auxiliary contextual reconstruction branch to enhance the repair of missing regions. In deep-learning-based inpainting, the optimal size of the surrounding area needed to inpaint different types of missing regions varies, and a very large receptive field is not always suitable for preserving local structures and textures; Quan et al. [20] therefore devised a three-stage inpainting framework with local and global refinement, which connects two subnetworks with small and large receptive fields and can handle different types of missing regions effectively. Other applications based on deep learning can be found in references [21,22].
Since deep-learning methods train a deep neural network on a set of training samples, a large number of high-quality samples is crucial for good performance, yet collecting enough samples is often difficult in practice. In contrast to conventional supervised deep learning, Ulyanov et al. [23] proposed an unsupervised method called deep image prior (DIP), which uses the network itself as a regularizer for the inverse problem: it does not require a large collection of real images and works directly on the observed image. Subsequently, researchers ([24,25,26]) have focused on theoretical analysis and performance enhancement of DIP. In [27], the authors added an explicit anisotropic TV term to the minimization problem, and the combination of DIP with TV-based regularization has shown promising performance for X-ray imaging [28] and computed tomography reconstruction [29]. However, the TV regularization term tends to produce a staircase effect, so small details of the image are lost. Later, Mataev et al. [30] enhanced the DIP framework by introducing regularization by denoising (RED) as a prior. In another study, Arican et al. [31] proposed a neural architecture search (NAS) strategy designed specifically for image processing within the DIP framework; it requires less training than conventional NAS methods and implements image-specific NAS effectively, reducing the search space and improving efficiency.
Up to now, the DIP method has been extensively studied for image processing [32,33]. However, the lack of a constraint term in DIP makes it difficult to find the optimal solution, so overfitting occurs in practice, resulting in fuzzy artifacts in the repaired image and loss of edge information. In image processing tasks based on TV regularization, using only first-order derivatives often leads to the staircase effect and blurs the image structure. To address this, researchers have explored higher-order derivatives. For example, Bredies et al. [34] found that higher-order derivatives better capture details such as the texture of image edges and proposed the total generalized variation (TGV) model to mitigate the staircase effect while preserving the edge structure of images. In particular, the second-order TGV strikes a balance between first- and second-order derivatives. Therefore, to overcome the shortcomings of the TV regularization term and combine the advantages of DIP and second-order TGV, this paper introduces the second-order TGV regularization term into DIP and establishes a new image restoration model whose regularization term better captures the edge and texture information of the image. For the new model, we adopt the augmented Lagrangian method and the alternating direction method of multipliers (ADMM) to solve the optimization problem effectively. The numerical results show that, as a regularization term, the second-order TGV better protects image edge details and makes the restored image clearer, outperforming the DIP-based model.
The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 presents and discusses the proposed model and algorithm and shows how to solve the ADMM sub-steps efficiently. Section 4 reports numerical experimental results of the new method and compares them with other models. Finally, Section 5 concludes the paper.

2. Related Work

In [8], Criminisi et al. proposed a method for determining the repair order based on a priority assigned to each region, so that the areas most in need of repair are addressed first. The Criminisi algorithm is known for its simplicity, fast processing speed, and satisfactory results. In practice, however, the algorithm's estimation of image edges may be influenced by high-frequency information such as image texture, which can lead to an incorrect prioritization of repair blocks and affect the final repair results.
With the development of deep-learning methods, they have become widely used in image processing. However, these methods often require a large amount of high-quality training samples, which can be difficult to obtain, and ground-truth data are scarce for most images, making supervised learning hard to apply. To address this issue, Ulyanov et al. [23] introduced DIP, a method that does not rely on a large number of training samples and can be employed to solve challenging ill-posed inverse problems in imaging. In particular, for the image inpainting problem, given an image $f$ whose missing pixels are indicated by a binary mask $m$, the goal is to reconstruct the lost data. This can be formulated as the following minimization problem:
$\min_{\theta} \frac{1}{2}\left\| \left(f - T_{\theta}(z)\right) \odot m \right\|_{2}^{2},$  (1)
where $\odot$ denotes element-wise (Hadamard) multiplication, $T_{\theta}(z)$ is a fixed convolutional neural network (CNN) generator with weights $\theta$, and $z$ is usually a random input vector sampled from a uniform distribution. Since DIP has no constraint term, it is difficult to find the optimal solution, and overfitting occurs; some texture details of the image are lost when it is used for image inpainting. Researchers have therefore improved its performance by adding regularization terms to DIP. For example, Cascarano et al. [35] proposed to improve DIP by adding an explicit prior to Equation (1), that is, combining DIP and TV regularization, which is expressed as the following problem:
$\min_{\theta} \frac{\lambda}{2}\left\| T_{\theta}(z) - f \right\|_{2}^{2} + \mu \sum_{i=1}^{n} \left\| \left(\nabla T_{\theta}(z)\right)_{i} \right\|_{2},$  (2)
where $\nabla$ is the gradient operator. Since the TV regularization term is first order, it can only effectively approximate piecewise constant functions and tends to produce staircase effects in smooth regions of the image. Therefore, higher-order variational models have been widely studied in image processing tasks [36]. The TGV model proposed by Bredies et al. can effectively approximate polynomial functions of arbitrary order, which not only removes the staircase effect but also protects the edges and small structures of the image. TGV also has excellent properties such as rotational invariance, lower semicontinuity, and convexity. The second-order TGV, which involves both first- and second-order derivatives, seeks an optimal auxiliary vector field in $BD(\Omega)$ by minimizing the objective function, and the optimal solution adapts to the smoothness of the image region. It can adaptively balance the first and second derivatives, using the first derivative to preserve details in edge and texture areas. $\mathrm{TGV}_{\alpha}^{2}$ is expressed as follows:
$\mathrm{TGV}_{\alpha}^{2}(u) = \min_{v \in BD(\Omega)} \alpha_{1} \int_{\Omega} \left| \nabla u - v \right| \, dx + \alpha_{2} \int_{\Omega} \left| \varepsilon(v) \right| \, dx,$  (3)
where $BD(\Omega)$ denotes the space of vector fields of bounded deformation, $\varepsilon(v) = \frac{1}{2}\left(\nabla v + \nabla v^{T}\right)$ is the symmetric derivative (a matrix-valued Radon measure), and $\alpha_{1}$ and $\alpha_{2}$ are two positive parameters. This definition balances the first- and second-order derivatives of a function through the ratio of the weights $\alpha_{1}$ and $\alpha_{2}$. TGV has attracted significant attention due to these properties. For instance, Knoll et al. [37] used TGV as a penalty term in MRI problems, in particular for image inpainting and iterative reconstruction of undersampled radial data sets acquired with phased-array coils. Papafitsoros et al. [38] studied a variational problem in the space of functions of bounded Hessian and proposed a higher-order extension of the ROF model by adding a nonsmooth second-order regularization term, which improved performance for image inpainting tasks.
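To make the definition in Equation (3) concrete, the following minimal NumPy sketch evaluates a discrete second-order TGV energy for a given auxiliary vector field v using forward differences; the true TGV value is the minimum of this energy over v. The function names and the simple Neumann-boundary discretization are illustrative assumptions, not the exact discretization used in the paper.

import numpy as np

def forward_diff(u):
    # Forward differences with Neumann boundary; returns (du/dx, du/dy).
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def tgv2_energy(u, v1, v2, alpha1, alpha2):
    # Discrete analogue of Equation (3) for a fixed vector field v = (v1, v2);
    # TGV_alpha^2(u) is the minimum of this quantity over all v.
    gx, gy = forward_diff(u)
    first = alpha1 * np.sum(np.sqrt((gx - v1) ** 2 + (gy - v2) ** 2))
    v1x, v1y = forward_diff(v1)
    v2x, v2y = forward_diff(v2)
    e11, e22, e12 = v1x, v2y, 0.5 * (v1y + v2x)   # symmetric derivative eps(v)
    second = alpha2 * np.sum(np.sqrt(e11 ** 2 + e22 ** 2 + 2.0 * e12 ** 2))
    return first + second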

3. The Proposed Model and Algorithm

3.1. The Proposed Model

To overcome DIP's tendency to blur the structural edges of the target region during image inpainting, and to exploit the advantages of TGV described above, this paper introduces the $\mathrm{TGV}_{\alpha}^{2}$ regularization term into DIP. $\mathrm{TGV}_{\alpha}^{2}$ contains both first- and second-derivative terms, which it can balance adaptively; introducing it into DIP therefore better protects the edge information of the image structure during inpainting. The proposed model is as follows:
$\min_{\theta} \mathrm{TGV}_{\alpha}^{2}\left(T_{\theta}(z)\right) + \frac{\lambda}{2}\left\| \left(f - T_{\theta}(z)\right) \odot m \right\|_{2}^{2},$  (4)
where $\lambda$ balances the regularization term and the data-fidelity term. The model approximates the target solution by $T_{\theta^{*}}(z)$, where $\theta^{*}$ is obtained by applying early stopping to the iterative optimization scheme used to solve (4). Compared with DIP, the new model incorporates the $\mathrm{TGV}_{\alpha}^{2}$ regularization term, an image prior containing first- and second-order derivative terms: the first-order derivative protects the image edges, while the second-order derivative reduces the staircase effect. By balancing the two, the model enhances edge protection and represents texture details accurately during the inpainting process. Additionally, unlike supervised deep-learning methods, the new model inherits from DIP the property of not needing extensive training samples. To solve the new model efficiently, we use the augmented Lagrangian method to handle (4) and devise a flexible ADMM algorithm for the resulting optimization problem; the Legendre–Fenchel transform is applied to the subproblems to simplify their treatment. Numerical results demonstrate that the proposed algorithm effectively preserves edge and texture details while inpainting missing information, outperforming the DIP method and other existing techniques.
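As an illustration of how the data-fidelity term of Equation (4) can be evaluated in practice, the following PyTorch sketch uses a small placeholder convolutional generator standing in for $T_{\theta}$; the actual encoder-decoder architecture and hyperparameters of DIP are not reproduced here, so TinyGenerator and masked_fidelity are purely illustrative names.

import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    # Placeholder for the convolutional generator T_theta used by DIP;
    # the real network in the paper is a deeper encoder-decoder.
    def __init__(self, in_ch=32, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

def masked_fidelity(f, m, T, z, lam):
    # Data-fidelity term of Equation (4): (lambda/2) * ||(f - T_theta(z)) .* m||_2^2,
    # where m is the binary mask (1 = observed pixel, 0 = missing pixel).
    return 0.5 * lam * torch.sum(((f - T(z)) * m) ** 2)

# Example usage (shapes are illustrative):
# z = torch.randn(1, 32, 256, 256)
# loss = masked_fidelity(f, m, TinyGenerator(), z, lam=1.0)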

3.2. Algorithm of the Proposed Model

In this paper, the ADMM algorithm is used to solve the minimization problem. Owing to ADMM's flexible, modular structure, any prior (explicit or implicit) can be embedded by modifying the regularization-related substeps. In the numerical experiments, we compare the results of the new model with those of other models.
Let $u = T_{\theta}(z)$; then the augmented Lagrangian function of problem (4) is as follows:
$L(\theta, u, v) = \alpha_{1} \int_{\Omega} \left| \nabla u - v \right| \, dx + \alpha_{2} \int_{\Omega} \left| \varepsilon(v) \right| \, dx + \frac{\lambda}{2}\left\| \left(f - T_{\theta}(z)\right) \odot m \right\|_{2}^{2} + \frac{\beta}{2}\left\| u - T_{\theta}(z) + b \right\|_{2}^{2},$  (5)
where $\lambda$, $\beta$ are positive parameters, and $b$ is the function associated with the Lagrange multipliers. After appropriately initializing the variables according to the ADMM framework, the $(k+1)$-th iteration of the algorithm is as follows:
$\theta^{k+1} = \arg\min_{\theta} \frac{\lambda}{2}\left\| \left(f - T_{\theta}(z)\right) \odot m \right\|_{2}^{2} + \frac{\beta}{2}\left\| u^{k} - T_{\theta}(z) + b^{k} \right\|_{2}^{2},$  (6)
$u^{k+1} = \arg\min_{u} \alpha_{1} \int_{\Omega} \left| \nabla u - v^{k} \right| \, dx + \frac{\beta}{2}\left\| u - T_{\theta^{k+1}}(z) + b^{k} \right\|_{2}^{2},$  (7)
$v^{k+1} = \arg\min_{v} \alpha_{1} \int_{\Omega} \left| \nabla u^{k+1} - v \right| \, dx + \alpha_{2} \int_{\Omega} \left| \varepsilon(v) \right| \, dx,$  (8)
$b^{k+1} = b^{k} + u^{k+1} - T_{\theta^{k+1}}(z).$  (9)
Equation (6) is solved inexactly using the Adam iterative scheme [39].
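A possible PyTorch realization of this inexact theta-update is sketched below; it simply runs a few Adam iterations on the two quadratic terms of Equation (6). The function name, the learning rate, and the default of 50 inner iterations (as in the experiments of Section 4) are illustrative assumptions.

import torch

def theta_step(T, z, f, m, u_k, b_k, lam, beta, n_inner=50, lr=1e-3):
    # Inexact minimization of Equation (6) over the network weights theta via Adam [39].
    opt = torch.optim.Adam(T.parameters(), lr=lr)
    for _ in range(n_inner):
        opt.zero_grad()
        out = T(z)
        loss = 0.5 * lam * torch.sum(((f - out) * m) ** 2) \
             + 0.5 * beta * torch.sum((u_k - out + b_k) ** 2)
        loss.backward()
        opt.step()
    return T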
By applying the Legendre–Fenchel transform, Equation (7) can be rewritten as follows:
$\arg\min_{u} \alpha_{1} \int_{\Omega} \left| \nabla u - v^{k} \right| \, dx + \frac{\beta}{2}\left\| u - T_{\theta^{k+1}}(z) + b^{k} \right\|_{2}^{2} = \arg\min_{u} \max_{p \in P} \left\langle \nabla u - v^{k}, p \right\rangle + \frac{\beta}{2}\left\| u - T_{\theta^{k+1}}(z) + b^{k} \right\|_{2}^{2},$
where $P = \left\{ p = (p_{1}, p_{2}) : \|p\|_{\infty} \le \alpha_{1} \right\}$, $p$ is the dual variable, and $\|p\|_{\infty} = \sup_{x \in \Omega} \sqrt{p_{1}^{2} + p_{2}^{2}}$. The updates of $u$ and $p$ are expressed as follows:
$u^{k+1} = \frac{u^{k} + \tau \, \mathrm{div}\, p^{k} + \tau \beta \left( T_{\theta^{k+1}}(z) - b^{k} \right)}{1 + \tau \beta},$
$p^{k+1} = \mathrm{proj}_{P}\left( p^{k} + \gamma \left( \nabla u^{k+1} - v^{k} \right) \right),$
where $\mathrm{proj}_{P}(\tilde{p}) = \tilde{p} / \max\left(1, \|\tilde{p}\| / \alpha_{1}\right)$, and $\tau$, $\gamma$ are positive parameters.
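The u- and p-updates above can be coded directly once discrete gradient and divergence operators are fixed. The NumPy sketch below uses forward differences and their negative adjoint; the operator discretization and function names are illustrative assumptions rather than the authors' exact implementation.

import numpy as np

def grad(u):
    # Forward differences with Neumann boundary conditions.
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(p1, p2):
    # Discrete divergence, the negative adjoint of grad (backward differences).
    d = np.zeros_like(p1)
    d[0, :] += p1[0, :]
    d[1:-1, :] += p1[1:-1, :] - p1[:-2, :]
    d[-1, :] += -p1[-2, :]
    d[:, 0] += p2[:, 0]
    d[:, 1:-1] += p2[:, 1:-1] - p2[:, :-2]
    d[:, -1] += -p2[:, -2]
    return d

def update_u_p(u, p1, p2, v1, v2, Tz, b, alpha1, beta, tau, gamma):
    # u-step followed by projection of the dual variable p onto {||p|| <= alpha1}.
    u_new = (u + tau * div(p1, p2) + tau * beta * (Tz - b)) / (1.0 + tau * beta)
    gx, gy = grad(u_new)
    p1t = p1 + gamma * (gx - v1)
    p2t = p2 + gamma * (gy - v2)
    scale = np.maximum(1.0, np.sqrt(p1t ** 2 + p2t ** 2) / alpha1)
    return u_new, p1t / scale, p2t / scale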
Equation (8) can likewise be rewritten by applying the Legendre–Fenchel transform:
$\arg\min_{v} \alpha_{1} \int_{\Omega} \left| \nabla u^{k+1} - v \right| \, dx + \alpha_{2} \int_{\Omega} \left| \varepsilon(v) \right| \, dx = \arg\min_{v} \max_{q \in Q} \left\langle \nabla u^{k+1} - v, p^{k+1} \right\rangle + \left\langle \varepsilon(v), q \right\rangle,$
where $Q = \left\{ q = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix} : \|q\|_{\infty} \le \alpha_{2} \right\}$, $q$ is the dual variable, and $\|q\|_{\infty} = \sup_{x \in \Omega} \sqrt{q_{11}^{2} + q_{12}^{2} + q_{21}^{2} + q_{22}^{2}}$.
Finally, the update formulas for $v$ and $q$ are obtained as follows:
$v^{k+1} = v^{k} + t \left( p^{k+1} + \mathrm{div}\, q^{k} \right),$
$q^{k+1} = \mathrm{proj}_{Q}\left( q^{k} + \delta \, \varepsilon(v^{k+1}) \right),$
where $\mathrm{proj}_{Q}(\tilde{q}) = \tilde{q} / \max\left(1, \|\tilde{q}\| / \alpha_{2}\right)$, and $t$, $\delta$ are positive parameters.
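Analogously, the v- and q-updates only require the symmetric derivative eps(v) and the divergence of a symmetric tensor field. A NumPy sketch under the same illustrative discretization as above is given here; the helper names are again assumptions made for the example.

import numpy as np

def sym_grad(v1, v2):
    # Symmetric derivative eps(v) = (grad v + grad v^T) / 2 with forward differences;
    # returns the components (e11, e22, e12), with e21 = e12.
    def d(u):
        gx = np.zeros_like(u)
        gy = np.zeros_like(u)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        return gx, gy
    v1x, v1y = d(v1)
    v2x, v2y = d(v2)
    return v1x, v2y, 0.5 * (v1y + v2x)

def sym_div(q11, q22, q12):
    # Row-wise divergence of a symmetric tensor field using backward differences,
    # serving as the (negative) adjoint of sym_grad up to boundary terms.
    def dT(p1, p2):
        d = np.zeros_like(p1)
        d[0, :] += p1[0, :]
        d[1:-1, :] += p1[1:-1, :] - p1[:-2, :]
        d[-1, :] += -p1[-2, :]
        d[:, 0] += p2[:, 0]
        d[:, 1:-1] += p2[:, 1:-1] - p2[:, :-2]
        d[:, -1] += -p2[:, -2]
        return d
    return dT(q11, q12), dT(q12, q22)

def update_v_q(v1, v2, p1, p2, q11, q22, q12, alpha2, t, delta):
    # v-step followed by projection of the dual variable q onto {||q|| <= alpha2}.
    dq1, dq2 = sym_div(q11, q22, q12)
    v1n = v1 + t * (p1 + dq1)
    v2n = v2 + t * (p2 + dq2)
    e11, e22, e12 = sym_grad(v1n, v2n)
    q11t = q11 + delta * e11
    q22t = q22 + delta * e22
    q12t = q12 + delta * e12
    scale = np.maximum(1.0, np.sqrt(q11t ** 2 + q22t ** 2 + 2.0 * q12t ** 2) / alpha2)
    return v1n, v2n, q11t / scale, q22t / scale, q12t / scale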
Algorithm 1 summarizes the whole procedure of the proposed model.
Algorithm 1 ADMM algorithm for the proposed model (4)
Input: parameters $\alpha_{1}, \alpha_{2}, \lambda, \beta, \tau, \gamma, t, \delta$ and initial values $\theta^{0}, u^{0}, v^{0}, p^{0}, q^{0}, b^{0}$.
For $k = 0, \ldots, K$ do:
  Compute $\theta^{k+1}$ by $\theta^{k+1} = \arg\min_{\theta} \frac{\lambda}{2}\left\| (f - T_{\theta}(z)) \odot m \right\|_{2}^{2} + \frac{\beta}{2}\left\| u^{k} - T_{\theta}(z) + b^{k} \right\|_{2}^{2}$;
  Compute $u^{k+1}$ by $u^{k+1} = \frac{u^{k} + \tau \, \mathrm{div}\, p^{k} + \tau \beta ( T_{\theta^{k+1}}(z) - b^{k} )}{1 + \tau \beta}$;
  Compute $p^{k+1}$ by $p^{k+1} = \mathrm{proj}_{P}\left( p^{k} + \gamma ( \nabla u^{k+1} - v^{k} ) \right)$;
  Compute $v^{k+1}$ by $v^{k+1} = v^{k} + t ( p^{k+1} + \mathrm{div}\, q^{k} )$;
  Compute $q^{k+1}$ by $q^{k+1} = \mathrm{proj}_{Q}\left( q^{k} + \delta \, \varepsilon(v^{k+1}) \right)$;
  Compute $b^{k+1}$ by $b^{k+1} = b^{k} + u^{k+1} - T_{\theta^{k+1}}(z)$;
end for
Output: $T_{\theta^{k+1}}(z)$.
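For orientation, the fragments sketched in the previous subsections can be assembled into an outer loop that mirrors Algorithm 1. The driver below assumes a single-channel image and a matching generator, reuses the illustrative functions theta_step, update_u_p and update_v_q defined above, and uses placeholder parameter values; it is a structural sketch of the algorithm, not the authors' implementation.

import numpy as np
import torch

def admm_inpaint(T, z, f_np, m_np, K=20,
                 alpha1=1.0, alpha2=2.0, lam=1.0, beta=10.0,
                 tau=0.05, gamma=0.05, t=0.05, delta=0.05):
    # Outer ADMM loop following Algorithm 1 (single-channel image for simplicity).
    f = torch.from_numpy(f_np).float()
    m = torch.from_numpy(m_np).float()
    u = f_np * m_np                      # initialize u with the observed pixels
    b = np.zeros_like(u)
    v1, v2 = np.zeros_like(u), np.zeros_like(u)
    p1, p2 = np.zeros_like(u), np.zeros_like(u)
    q11, q22, q12 = np.zeros_like(u), np.zeros_like(u), np.zeros_like(u)
    for k in range(K):
        T = theta_step(T, z, f, m, torch.from_numpy(u).float(),
                       torch.from_numpy(b).float(), lam, beta)       # Equation (6)
        Tz = T(z).detach().squeeze().numpy()
        u, p1, p2 = update_u_p(u, p1, p2, v1, v2, Tz, b,
                               alpha1, beta, tau, gamma)              # u and p updates
        v1, v2, q11, q22, q12 = update_v_q(v1, v2, p1, p2, q11, q22, q12,
                                           alpha2, t, delta)          # v and q updates
        b = b + u - Tz                                                # Equation (9)
    return T(z).detach()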

4. Numerical Experiments

In this section, we test different images with different masks. In our algorithm, we performed 50 Adam iterations to solve the $\theta^{k+1}$ subproblem (6) in the original variable and manually adjusted the parameters to achieve the best effect (the test images are shown in Figure 1). Due to space limitations, we present only some numerical results and use red boxes to mark and enlarge the regions with large differences. Finally, the image quality evaluation indexes PSNR and SSIM are calculated to evaluate the effectiveness of the proposed model. To illustrate the performance of the new model, it is compared with advanced image inpainting methods ([19,23]) and a classical method [8].
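For reference, the two quality indexes reported below can be computed with scikit-image; the helper below is a small illustrative example (the channel_axis argument applies to recent scikit-image versions and to color images, and should be dropped for grayscale inputs).

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, reference):
    # PSNR and SSIM between the inpainted image and the clean reference (values in [0, 1]).
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, data_range=1.0, channel_axis=-1)
    return psnr, ssim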

4.1. Parameters Selection

In our experiments, the parameters are adjusted manually to achieve the best results. For $\lambda$ and $\beta$, we find that $0.1 < \lambda < 10$ and $0 < \beta < 50$ yield better repair results. In addition, we observe that the smaller $\lambda$ is, the better the recovery of image edge details, while the opposite holds for $\beta$. For $\tau$ and $t$, the results of the proposed method are relatively stable when they lie in the range $[0, 0.1]$.
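As an example, one setting inside the ranges above could look as follows; since the values were tuned manually per image in our experiments, these numbers are only illustrative and not prescriptive.

# Illustrative parameter choice within the reported ranges; not prescriptive.
params = {
    "lam": 1.0,    # 0.1 < lambda < 10; smaller values favour edge-detail recovery
    "beta": 10.0,  # 0 < beta < 50; behaves oppositely to lambda
    "tau": 0.05,   # tau in [0, 0.1]
    "t": 0.05,     # t in [0, 0.1]
}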

4.2. Experimental Results

Figure 2 shows the results for the Kate image, whose clean version is corrupted by the Kate mask; the results are compared with DIP and [8]. As seen in Figure 2b, the important structure of the face has been repaired, but the enlarged Figure 2e shows that the outline of the lips is slightly blurred and the edges of the teeth have not been repaired. In Figure 2c, the occluding text is clearly not removed, and the details of the lips are not restored. Figure 2d is the result of the new model; thanks to the adaptive regularization, the image edge details are protected as much as possible, making the lip and tooth edges clearer. In summary, better recovery is achieved by adding the TGV-based regularization term, and Table 1 shows that the PSNR and SSIM values have increased.
Figure 3 shows the repair results for the Vase image, where the proposed model is compared with DIP and the traditional model [8]. The details of the balustrade in the center of Figure 3b are not restored, and the edge of the balustrade in the enlarged Figure 3e is incomplete. The method of [8] does not repair according to the structural texture around the occluded part in Figure 3c; the texture of the table edge and the railing are not recovered. In Figure 3d, the new model repairs the image more completely, and Figure 3g shows the restored details: the edge of the table connects well with the railing, the transverse texture of the rear fence is fully repaired, and the edges are better maintained, giving a better visual result than the other methods.
The results of several models on the Library image are compared in Figure 4. The floor in Figure 4b is not repaired well: texture that belongs to the bookshelf is filled onto the ground, the edges of the books on the bookshelf are not restored, and the texture looks messy. In Figure 4c, the results are slightly distorted; for example, the structure of a bookshelf appears in the left window, and the enlarged Figure 4f shows that the ground and bookshelf are not repaired well. In Figure 4d, the textural restoration of the bookshelf is relatively complete, and in Figure 4g the new model maintains the edges of the books well; the restored floor is more similar to the surrounding structure and visually better. Overall, our method restores the structural details better and protects the edges of the image without destroying its important information, and it is visually superior to the other models.
Figure 5 shows the Boat image inpainting results. Figure 5b is the result of the DIP method; to obtain its best repaired image, we saved the reconstructed image at each DIP iteration and stopped the algorithm when a visually good image was found. The edges of the hull in Figure 5b are not clear, and the structural edges of the bow are blurred in the enlarged Figure 5e. Figure 5c is the result of [8]; in the enlarged Figure 5f, the letters on the ship are missing, and the hull edges are fuzzy and discontinuous. Figure 5d is the restored image of the new model, which recovers structural edges and details well: in Figure 5g, the structural edge of the ship is relatively clear and the lettering is restored relatively completely. From these comparisons, the proposed model provides better inpainting results; the edges are clearer, the fine structures are reconstructed better, and PSNR and SSIM are improved, so our model is superior to DIP and the traditional model.
The Walk image is tested in Figure 6. The inpainting result in Figure 6b is slightly fuzzy, which is more obvious in the enlarged Figure 6e. In Figure 6f, structure from other parts of the image is copied into the missing region, and some structural blocks appear on the ground, so the overall visual effect is not good. In Figure 6g, the occluded part restored by the new model is more similar to the surrounding structure, and the edges of the surrounding tree shadows are kept smoother, giving a clearer result with better visual quality.
Figure 7, Figure 8 and Figure 9 test several damaged images, and their results are shown with enlargements. As seen in the enlarged views of Figure 7, the repair in Figure 7e is incomplete: the vegetation texture is not reproduced, and large artifacts appear. In Figure 7g, the repair is closer to the surrounding structure and looks clearer and better. The models in Figure 8 do not fill the black block in the middle very well; there are color artifacts in Figure 8e,f, and the connection at the mountain edge is incomplete. In comparison, the new model in Figure 8g better protects the edges of the mountain and has fewer artifacts, which is visually better than the other models overall. From the enlarged view of Figure 9, the repair in Figure 9e is not clear, and the texture details of the vegetation are not reproduced. In Figure 9f, structure that does not belong to this region is filled in; it appears that part of the street lamp is copied here. In Figure 9g, the new model repairs according to the structure around the missing part, making the texture details of the vegetation clearer and the overall visual effect better.
In Figure 10 and Figure 11, we zoom in to show the recovered area. It can be seen that the texture of the mountain in Figure 10e is fuzzy, and there are many clumps, and the texture is very messy in Figure 10f. The restoration results of Figure 10g are clearer, and the structure and texture information of the mountain is more similar to the surrounding mountain, so the new model has a better restoration effect than the other two models. In Figure 11e, the DIP method restores the pixel information of the window to the missing area. In Figure 11f, it can be observed that the vegetation above the repaired part of the window appears discontinuous and lacks realism. In Figure 11g, the image is repaired based on the vegetation information around the missing area. Although it may not appear very clear, the repaired image looks more natural and visually closer to reality. This indicates that the new model has a better repair effect compared to both DIP and [19].
To demonstrate the performance of the new model, Figure 12 provides PSNR curves for the Boat image under Bernoulli masks with missing-pixel probabilities of 0.4, 0.7, and 0.9. The first row is the DIP result, and the second row is the result of the new model. The PSNR curves show that the new model is relatively stable, requires far fewer iterations than DIP, and reaches its peak PSNR in the early iterations, confirming the effectiveness and stability of the new model for the image inpainting task.
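A Bernoulli mask of the kind used in Figure 12 can be generated as in the following sketch, where p_missing is the missing-pixel probability (0.4, 0.7, or 0.9 in the figure); the function name and the fixed seed are illustrative choices.

import numpy as np

def bernoulli_mask(shape, p_missing, seed=0):
    # Each pixel is dropped independently with probability p_missing;
    # the mask uses 1 for observed pixels and 0 for missing ones.
    rng = np.random.default_rng(seed)
    return (rng.random(shape) >= p_missing).astype(np.float32)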
In order to evaluate the performance of the new model in this paper more intuitively, we calculated the values of the image quality evaluation index PSNR and SSIM, as shown in Table 1. It can be seen from Table 1 that the numerical experimental results of the proposed new method are all higher than those of other methods. Specifically, the average value of PSNR of the new model is 28.919, which is 0.737 higher than the DIP method and 1.451 higher than [8], and similarly, the average value of SSIM is 0.954, which is 0.01 higher than the DIP method and 0.046 higher than [8]. These numerical results indicate the effectiveness of the proposed method in image inpainting.

5. Conclusions

In this paper, we propose a new model that extends the classical DIP framework with TGV as a regularization term. The new model balances the first- and second-order derivatives of the function to better protect image edges and provide more reliable recovery. We use the augmented Lagrangian method to handle the proposed model and design a flexible ADMM algorithm for the optimization problem, applying the Legendre–Fenchel transform to the subproblems to make them easier to solve. Numerical experiments show that the model restores object-occluded images better and protects the edges of the image structure during restoration, producing clearer edges and textures. Compared with other methods, it provides better restorations of damaged images and improves visual quality.

Author Contributions

Conceptualization, S.Y. and J.X.; methodology, S.Y. and J.X.; software, S.Y. and Y.G.; validation, S.Y. and J.X.; writing—original draft preparation, S.Y.; writing—review and editing, J.X., Y.F., Y.G. and X.W.; visualization, S.Y., Y.F. and X.W.; supervision, J.X.; project administration, J.X.; funding acquisition, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Science and Technology Research Project of Henan Province of China (Nos. 222102210053, 232102210111), the Key Scientific Research Project of Colleges and Universities in Henan Province (No. 22A120006), the Open Research Fund of the National Earth Observation Data Center (No. NODAOP2022004), and the Graduate Student Innovation Fund Project of Henan University of Science and Technology (No. CXJJ-2021-KJ12).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
  2. Chan, T.F.; Shen, J. Nontexture Inpainting by Curvature-Driven Diffusions. J. Vis. Commun. Image Represent. 2001, 12, 436–449. [Google Scholar] [CrossRef]
  3. Ballester, C.; Caselles, V.; Verdera, J.; Bertalmio, M.; Sapiro, G. A variational model for filling-in gray level and color images. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; IEEE: Toulouse, France, 2001; Volume 1, pp. 10–16. [Google Scholar]
  4. Bertalmio, M.; Bertozzi, A.L.; Sapiro, G. Navier-stokes, fluid dynamics, and image and video inpainting. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; IEEE: Toulouse, France, 2001; Volume 1, p. I. [Google Scholar]
  5. Bertalmio, M.; Sapiro, G.; Caselles, V.; Ballester, C. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, 1 July 2000; pp. 417–424. [Google Scholar]
  6. Shen, J.; Chan, T.F. Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math. 2002, 62, 1019–1043. [Google Scholar] [CrossRef] [Green Version]
  7. Masnou, S.; Morel, J.M. Level lines based disocclusion. In Proceedings of the International Conference on Image Processing, Chicago, IL, USA, 7 October 1998; IEEE: Toulouse, France, 1998; pp. 259–263. [Google Scholar]
  8. Criminisi, A.; Perez, P.; Toyama, K. Region Filling and Object Removal by Exemplar-Based Image Inpainting. IEEE Trans. Image Process. 2004, 13, 1200–1212. [Google Scholar] [CrossRef]
  9. Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature Learning by Inpainting. arXiv 2016, arXiv:1604.07379. [Google Scholar]
  10. Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. (TOG) 2017, 36, 107.1–107.14. [Google Scholar] [CrossRef]
  11. Zeng, Y.; Fu, J.; Chao, H.; Guo, B. Aggregated Contextual Transformations for High-Resolution Image Inpainting. arXiv 2021, arXiv:2104.01431. [Google Scholar] [CrossRef] [PubMed]
  12. Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image Inpainting for Irregular Holes Using Partial Convolutions; Springer: Cham, Switzerland, 2018. [Google Scholar]
  13. Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative Image Inpainting with Contextual Attention; IEEE: Toulouse, France, 2018. [Google Scholar]
  14. Nazeri, K.; Ng, E.; Joseph, T.; Qureshi, F.; Ebrahimi, M. Edgeconnect: Structure guided image inpainting using edge prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 3265–3274. [Google Scholar]
  15. Zeng, Y.; Fu, J.; Chao, H.; Guo, B. Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  16. Li, J.; He, F.; Zhang, L.; Du, B.; Tao, D. Progressive Reconstruction of Visual Structure for Image Inpainting. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  17. Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T. Free-form image inpainting with gated convolution. arXiv 2018, arXiv:1806.03589. [Google Scholar]
  18. Liu, H.; Wan, Z.; Huang, W.; Song, Y.; Han, X.; Liao, J. PD-GAN: Probabilistic Diverse GAN for Image Inpainting. arXiv 2021, arXiv:2105.02201. [Google Scholar]
  19. Zeng, Y.; Lin, Z.; Lu, H.; Patel, V.M. CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction. In Proceedings of the International Conference on Computer Vision. arXiv 2021, arXiv:2011.12836. [Google Scholar]
  20. Quan, W.; Zhang, R.; Zhang, Y.; Li, Z.; Wang, J.; Yan, D.M. Image Inpainting with Local and Global Refinement. IEEE Trans. Image Process. 2022, 31, 2405–2420. [Google Scholar] [CrossRef] [PubMed]
  21. Zhang, Y.D.; Dong, Z.; Wang, S.H.; Yu, X.; Gorriz, J.M. Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation. Inf. Fusion 2020, 64, 149–187. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Deng, L.; Zhu, H.; Wang, W.; Ren, Z.; Zhou, Q.; Lu, S.; Sun, S.; Zhu, Z.; Gorriz, J.M.; et al. Deep Learning in Food Category Recognition. Inf. Fusion 2023, 98, 101859. [Google Scholar] [CrossRef]
  23. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep Image Prior. Int. J. Comput. Vis. 2020, 128, 1867–1888. [Google Scholar] [CrossRef]
  24. Arridge, S.; Maass, P.; Ktem, O.; Schnlieb, C.B. Solving inverse problems using data-driven models. Acta Numer. 2019, 28, 1–174. [Google Scholar] [CrossRef] [Green Version]
  25. Dittmer, S.; Kluth, T.; Maass, P.; Baguer, D.O. Regularization by Architecture: A Deep Prior Approach for Inverse Problems; Springer: New York, NY, USA, 2020. [Google Scholar]
  26. Cheng, Z.; Gadelha, M.; Maji, S.; Sheldon, D. A Bayesian Perspective on the Deep Image Prior. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  27. Liu, J.; Sun, Y.; Xu, X.; Kamilov, U.S. Image Restoration Using Total Variation Regularized Deep Image Prior. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019. [Google Scholar]
  28. Van Veen, D.; Jalal, A.; Soltanolkotabi, M.; Price, E.; Vishwanath, S.; Dimakis, A.G. Compressed Sensing with Deep Image Prior and Learned Regularization. arXiv 2018, arXiv:1806.06438. [Google Scholar]
  29. Baguer, D.O.; Leuschner, J.; Schmidt, M. Computed tomography reconstruction using deep image prior and learned reconstruction methods. Inverse Probl. 2020, 36, 094004. [Google Scholar] [CrossRef]
  30. Mataev, G.; Milanfar, P.; Elad, M. DeepRED: Deep image prior powered by RED. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Toronto, ON, Canada, 25 October 2019. [Google Scholar]
  31. Ersin Arican, M.; Kara, O.; Bredell, G.; Konukoglu, E. ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior. arXiv 2021, arXiv:2111.15362. [Google Scholar]
  32. Antorán, J.; Barbano, R.; Leuschner, J.; Hernández-Lobato, J.M.; Jin, B. A Probabilistic Deep Image Prior for Computational Tomography. arXiv 2022, arXiv:2203.00479. [Google Scholar]
  33. Ho, K.; Gilbert, A.; Jin, H.; Collomosse, J. Neural Architecture Search for Deep Image Prior. Comput. Graph. 2021, 98, 188–196. [Google Scholar] [CrossRef]
  34. Bredies, K.; Kunisch, K.; Pock, T. Total generalized variation. SIAM J. Imaging Sci. 2010, 3, 492–526. [Google Scholar] [CrossRef] [Green Version]
  35. Cascarano, P.; Sebastiani, A.; Comes, M.C.; Franchini, G.; Porta, F. Combining weighted total variation and deep image prior for natural and medical image restoration via ADMM. In Proceedings of the 2021 21st International Conference on Computational Science and Its Applications (ICCSA), Cagliari, Italy, 13–16 September 2021; IEEE: Toulouse, France, 2021; pp. 39–46. [Google Scholar]
  36. Ferstl, D.; Reinbacher, C.; Ranftl, R.; Rüther, M.; Bischof, H. Image guided depth upsampling using anisotropic total generalized variation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 993–1000. [Google Scholar]
  37. Knoll, F.; Bredies, K.; Pock, T.; Stollberger, R. Second order total generalized variation (TGV) for MRI. Magn. Reson. Med. 2011, 65, 480–491. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  38. Papafitsoros, K.; Schönlieb, C.B. A combined first and second order variational approach for image reconstruction. J. Math. Imaging Vis. 2014, 48, 308–338. [Google Scholar] [CrossRef] [Green Version]
  39. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Figure 1. Test images: (a) Kate, (b) Vase, (c) Library, (d) Boat, (e) Air, (f) Farm, (g) Zoo, (h) Walk, (i) Cliff, (a1) Kate mask, (b1) Vase mask, (c1) Library mask, (d1) Boat mask, (e1) Air mask, (f1) Farm mask, (g1) Zoo mask, (h1) Walk mask, (i1) Cliff mask, (j) House, and (j1) House mask.
Figure 2. Experimental results for the Kate image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 3. Experimental results for the Vase image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 4. Experimental results for the Library image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 5. Experimental results for the Boat image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 6. Experimental results for the Walk image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 7. Experimental results for the Farm image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 8. Experimental results for the Air image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 9. Experimental results for the Zoo image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [8], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 10. Experimental results for the Cliff image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [19], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 11. Experimental results for the House image. (a) Masked damaged image, (b) inpainting result of DIP, (c) inpainting result of [19], (d) inpainting result of the new method, (e) enlarged local region of (b), (f) enlarged local region of (c), and (g) enlarged local region of (d).
Figure 12. PSNR curves for the Boat image under Bernoulli masks with different missing-pixel probabilities.
Table 1. PSNR and SSIM values of the repaired images under different models, reported as (PSNR, SSIM).

Image     DIP                [8]                Ours
Kate      (39.694, 0.990)    (38.362, 0.879)    (40.709, 0.993)
Vase      (29.469, 0.965)    (28.534, 0.894)    (30.328, 0.977)
Library   (19.425, 0.853)    (18.856, 0.812)    (20.180, 0.876)
Boat      (28.101, 0.890)    (27.663, 0.881)    (28.841, 0.907)
Walk      (31.872, 0.982)    (30.986, 0.967)    (32.195, 0.988)
Farm      (30.715, 0.968)    (30.107, 0.958)    (31.285, 0.976)
Air       (17.047, 0.921)    (16.659, 0.914)    (17.317, 0.929)
Zoo       (29.136, 0.984)    (28.579, 0.965)    (30.499, 0.991)

