Article

Discretization of Learned NETT Regularization for Solving Inverse Problems

by Stephan Antholzer and Markus Haltmeier *
Department of Mathematics, University of Innsbruck, Technikerstrasse 13, 6020 Innsbruck, Austria
* Author to whom correspondence should be addressed.
J. Imaging 2021, 7(11), 239; https://doi.org/10.3390/jimaging7110239
Submission received: 11 October 2021 / Revised: 5 November 2021 / Accepted: 8 November 2021 / Published: 15 November 2021
(This article belongs to the Special Issue Inverse Problems and Imaging)

Abstract: Deep learning based reconstruction methods deliver outstanding results for solving inverse problems and are therefore becoming increasingly important. A recently introduced class of learning-based reconstruction methods is the so-called NETT (Network Tikhonov regularization), which contains a trained neural network as regularizer in generalized Tikhonov regularization. The existing analysis of NETT considers fixed operators and fixed regularizers and analyzes the convergence as the noise level in the data approaches zero. In this paper, we extend the framework and the analysis considerably to reflect various practical aspects and take into account the discretization of the data space, the solution space, the forward operator and the neural network defining the regularizer. We show the asymptotic convergence of the discretized NETT approach for decreasing noise levels and discretization errors. Additionally, we derive convergence rates and present numerical results for a limited data problem in photoacoustic tomography.

1. Introduction

In this paper, we are interested in neural network based solutions to inverse problems of the form
Find $x$ from data $y^{\delta} = A x + \eta. \quad (1)$
Here $A$ is a potentially non-linear operator between Banach spaces $X$ and $Y$, $y^{\delta}$ are the given noisy data, $x$ is the unknown to be recovered, $\eta$ is the unknown noise perturbation and $\delta \ge 0$ indicates the noise level. Numerous image reconstruction problems, parameter identification tasks and geophysical applications can be stated as such inverse problems [1,2,3,4]. Particular challenges in solving inverse problems are the non-uniqueness of solutions and their instability with respect to perturbations of the given data. To overcome these issues, regularization methods are needed, which select specific solutions and at the same time stabilize the inversion process.

1.1. Reconstruction with Learned Regularizers

One of the most established classes of methods for solving inverse problems is variational regularization, where regularized solutions are defined as minimizers of the generalized Tikhonov functional [2,5,6]
$\mathcal{T}_{y^{\delta},\alpha} \colon X \to [0,\infty] \colon \; x \mapsto \mathcal{D}(A x, y^{\delta}) + \alpha \mathcal{R}(x). \quad (2)$
Here $\mathcal{D}$ is a distance-like function measuring closeness in the data space, $\mathcal{R}$ is a regularization term enforcing regularity of the minimizer and $\alpha$ is the regularization parameter. Taking minimizers of this functional as regularized solutions is also called (generalized) Tikhonov regularization. In the case that $\mathcal{D}$ and the regularizer are defined by Hilbert space norms, (2) is classical Tikhonov regularization, for which the theory is quite complete [1,7]. In particular, in this case, convergence rates, which provide quantitative estimates for the distance between the true noise-free solution and regularized solutions from noisy data, are well known. Convergence rates for non-convex regularizers are derived in [8].
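For orientation, in the simplest finite-dimensional Hilbert space setting, where $\mathcal{D}$ is the squared norm distance and $\mathcal{R}(x) = \|x\|^2$, the minimizer of (2) is given in closed form by the normal equations $(A^{\top}A + \alpha I)x = A^{\top}y^{\delta}$. The following minimal NumPy sketch is purely illustrative; the operator, data and parameter values are made-up placeholders, not quantities from this paper.

```python
import numpy as np

def tikhonov(A, y_delta, alpha):
    """Classical Tikhonov regularization:
    argmin_x ||A x - y_delta||^2 + alpha * ||x||^2,
    computed via the normal equations (A^T A + alpha I) x = A^T y_delta."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)

# Illustrative usage with a synthetic ill-conditioned operator (assumed data).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80)) * np.logspace(0, -6, 80)  # rapidly decaying columns
x_true = rng.standard_normal(80)
y_delta = A @ x_true + 1e-3 * rng.standard_normal(100)
x_alpha = tikhonov(A, y_delta, alpha=1e-4)
```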
Typical regularization techniques are based on simple hand-crafted regularization terms such as the total variation $\|f\|_{\mathrm{TV}} = \int |\nabla f|$ or quadratic Sobolev semi-norms $\|\nabla f\|_2^2 = \int |\nabla f|^2$ on some function space. However, these regularizers are quite simplistic and might not reflect well the actual complexity of the underlying class of functions. Therefore, it has recently been proposed and analyzed in [9] to use machine learning to construct regularizers in a data driven manner. In particular, the strategy in [9] is to construct a data-driven regularizer via the following consecutive steps:
(T1) Choose a family of desired reconstructions $(x_i)_{i=1}^{n}$.
(T2) For some $B \colon Y \to X$, construct undesired reconstructions $(B A x_i)_{i=1}^{n}$.
(T3) Choose a class $(\Phi_\theta)_{\theta \in \Theta}$ of functions (networks) $\Phi_\theta \colon X \to X$.
(T4) Determine $\theta \in \Theta$ with $\Phi_\theta(x_i) \approx x_i$ and $\Phi_\theta(B A x_i) \approx x_i$.
(T5) Define $\mathcal{R}(x) = r(x, \Phi(x))$ with $\Phi = \Phi_\theta$ for some $r \colon X \times X \to [0,\infty]$.
For imaging applications, the function class $(\Phi_\theta)_{\theta \in \Theta}$ can be chosen as convolutional neural networks, which have been demonstrated to provide powerful classes of mappings between image spaces. The function $r$ measures the distance between a potential reconstruction $x$ and the output of the network $\Phi(x)$, and possibly contains additional regularization [10,11]. According to the training strategy in item (T4), the value of the regularizer will be small if the reconstruction is similar to elements in $(x_i)_{i=1}^{n}$ and large for elements in $(B A x_i)_{i=1}^{n}$. A simple example, which we will also use for our numerical results, is the learned regularizer $\mathcal{R}(x) = \|x - \Phi(x)\|^2 + \|x\|_{\mathrm{TV}}$.
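As an illustration of how such a learned regularizer can be evaluated in practice, the following PyTorch sketch implements $\mathcal{R}(x) = \|x - \Phi(x)\|^2 + \|x\|_{\mathrm{TV}}$ for a two-dimensional image. It is a minimal sketch only: the network phi below is an identity placeholder standing in for a trained image-to-image network such as the one used in Section 3.

```python
import torch

def total_variation(x):
    """Anisotropic discrete total variation of a 2D image tensor."""
    dh = (x[1:, :] - x[:-1, :]).abs().sum()   # differences along the first axis
    dv = (x[:, 1:] - x[:, :-1]).abs().sum()   # differences along the second axis
    return dh + dv

def learned_regularizer(x, phi):
    """R(x) = ||x - Phi(x)||_2^2 + ||x||_TV with a (trained) network Phi."""
    return ((x - phi(x)) ** 2).sum() + total_variation(x)

# Placeholder network; in practice phi is a trained convolutional network.
phi = torch.nn.Identity()
x = torch.rand(128, 128)
value = learned_regularizer(x, phi)
```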
Convergence analysis, convergence rates and training strategies for NETT (which stands for Network Tikhonov and refers to variants of (2) where the regularization term is given by a neural network) have been established in [9,11,12]. A different training strategy for learning a regularizer has been proposed in [13,14]. Note that learning the regularizer first and then minimizing the Tikhonov functional is different from variational and iterative networks [15,16,17,18,19,20], where an iterative scheme is applied to unroll the functional $\mathcal{D}_\theta(A x, y^{\delta}) + \alpha \mathcal{R}_\theta(x)$, which is then trained in an end-to-end fashion. Training the regularizer first has the advantage of being more modular, sharing some similarity with plug-and-play techniques [21], and the network training is independent of the forward operator $A$. Moreover, it enables one to derive a convergence analysis as the noise level tends to zero and therefore comes with theoretical recovery guarantees.

1.2. Discrete NETT

The existing analysis of NETT considers minimizers of the Tikhonov functional (2) with a regularizer of the form $\mathcal{R}(x) = r(x, \Phi(x))$ before discretization, typically in an infinite dimensional setting. However, in practice, only finite dimensional approximations of the unknown, the operator and the neural network are available. To address these issues, in this paper we study discrete NETT regularization, which considers minimizers of
$\mathcal{T}_{y^{\delta},\alpha,n} \colon X_n \to [0,\infty] \colon \; x \mapsto \mathcal{D}(A_n x, y^{\delta}) + \alpha \mathcal{R}_n(x). \quad (3)$
Here $(X_n)_{n\in\mathbb{N}}$, $(A_n)_{n\in\mathbb{N}}$ and $(\mathcal{R}_n)_{n\in\mathbb{N}}$ are families of subspaces $X_n \subseteq X$, mappings $A_n \colon X \to Y$ and regularizers $\mathcal{R}_n \colon X \to [0,\infty]$, respectively, which reflect the discretization of all involved operations. We present a full convergence analysis as the noise level $\delta$ converges to zero and $n$, $\alpha$ are chosen accordingly. Discretization of variational regularization has been studied in [22] for the case that $\mathcal{D}$ is given by the norm distance and the regularizer $\mathcal{R}$ is convex and fixed. However, in the case of discrete NETT regularization it is natural to consider regularizers that depend on the discretization, since the regularizer is learned in a discretized setting based on actual data. For that purpose our analysis includes non-convex regularizers that are allowed to depend on the discretization and the noise level.

1.3. Outline

The convergence analysis, including convergence rates, is presented in Section 2. In Section 3 we present numerical results for a non-standard limited data problem in photoacoustic tomography that can be considered as a simultaneous inpainting and artifact removal problem. We conclude with a short summary in Section 4.

2. Convergence Analysis

In this section we study the convergence of (3) and derive convergence rates.

2.1. Well-Posedness

First we state the assumptions that we will use for well-posedness (existence and stability) of minimizing NETT.
Assumption 1 (Conditions for well-posedness).
(W1) $X$, $Y$ are Banach spaces, $X$ is reflexive, and $\mathbb{D} \subseteq X$ is weakly sequentially closed.
(W2) The distance measure $\mathcal{D} \colon Y \times Y \to [0,\infty]$ satisfies
  (a) $\exists \tau \ge 1 \ \forall y_1, y_2, y_3 \in Y \colon \mathcal{D}(y_1,y_2) \le \tau \mathcal{D}(y_1,y_3) + \tau \mathcal{D}(y_3,y_2)$.
  (b) $\forall y_1, y_2 \in Y \colon \mathcal{D}(y_1,y_2) = 0 \Leftrightarrow y_1 = y_2$.
  (c) $\forall y, \tilde{y} \in Y \colon \mathcal{D}(y,\tilde{y}) < \infty \Rightarrow \bigl( \|\tilde{y} - y_k\| \to 0 \Rightarrow \mathcal{D}(y,y_k) \to \mathcal{D}(y,\tilde{y}) \bigr)$.
  (d) $\forall y \in Y \colon \|y_k - y\| \to 0 \Rightarrow \mathcal{D}(y_k,y) \to 0$.
  (e) $\mathcal{D}$ is weakly sequentially lower semi-continuous (wslsc).
(W3) $\mathcal{R} \colon X \to [0,\infty]$ is proper and wslsc.
(W4) $A \colon \mathbb{D} \subseteq X \to Y$ is weakly sequentially continuous.
(W5) $\forall y, \alpha, C \colon \{ x \in X \mid \mathcal{T}_{y,\alpha}(x) \le C \}$ is nonempty and bounded.
(W6) $(X_n)_{n\in\mathbb{N}}$ is a sequence of subspaces of $X$.
(W7) $(A_n)_{n\in\mathbb{N}}$ is a family of weakly sequentially continuous mappings $A_n \colon \mathbb{D} \to Y$.
(W8) $(\mathcal{R}_n)_{n\in\mathbb{N}}$ is a family of proper wslsc regularizers $\mathcal{R}_n \colon X \to [0,\infty]$.
(W9) $\forall y, \alpha, C, n \colon \{ x \in X_n \mid \mathcal{T}_{y,\alpha,n}(x) \le C \}$ is nonempty and bounded.
Conditions (W2)–(W5) are quite standard for Tikhonov regularization in Banach spaces to guarantee the existence and stability of minimizers of the Tikhonov functional and the given conditions are similar to [2,8,9,10,12,23,24]. In particular, (W2) describes the properties that the distance measure D should have. Clearly, the norm distance on Y fulfills these properties. Moreover, (W2a) holds for the norm with τ = 1 since it then corresponds to the triangle inequality. Item (W2c) is the continuity of D ( y , · ) while (W2d) considers the continuity of D ( · , y ) at y. While (W2c) is not needed for existence and convergence of NETT it is required for the stability result as shown in [10] (Example 2.7). On the other hand (W2e) implies that the Tikhonov functional is wslsc which is needed for existence. Assumption (W5) is a coercivity condition; see [9] (Remark 2.4f.) on how to achieve this for a regularizer defined by neural networks. Item (W8) poses some restrictions on the regularizers. For NETT this is not an issue as neural networks used in practice are continuous. Note that for convergence and convergence rates we will require additional conditions that concern the discretization of the reconstruction space, the forward operator and regularizer.
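Note also that the squared norm distance $\mathcal{D}(y_1,y_2) = \|y_1 - y_2\|^2$, which is used in Lemma 2 and in the numerical experiments of Section 3, satisfies (W2a) with $\tau = 2$, since
$\|y_1 - y_2\|^2 \le \bigl( \|y_1 - y_3\| + \|y_3 - y_2\| \bigr)^2 \le 2\|y_1 - y_3\|^2 + 2\|y_3 - y_2\|^2 = 2\,\mathcal{D}(y_1,y_3) + 2\,\mathcal{D}(y_3,y_2).$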
The references [8,9,10,23] all consider general distance measures and allow non-convex regularizers. However, existence and stability of minimizing (2) are shown under assumptions slightly different from (W1)–(W5). Below we therefore give a short proof of the existence and stability results.
Theorem 1 (Existence and Stability). Let Assumption 1 hold. Then for all $y \in Y$, $\alpha > 0$ and $n \in \mathbb{N}$ the following assertions hold true:
(a) $\operatorname{argmin} \mathcal{T}_{y,\alpha,n} \neq \emptyset$.
(b) Let $(y_k)_{k\in\mathbb{N}} \in Y^{\mathbb{N}}$ with $y_k \to y$ and consider $x_k \in \operatorname{argmin} \mathcal{T}_{y_k,\alpha,n}$. Then:
  • $(x_k)_{k\in\mathbb{N}}$ has at least one weak accumulation point.
  • Every weak accumulation point of $(x_k)_{k\in\mathbb{N}}$ is a minimizer of $\mathcal{T}_{y,\alpha,n}$.
(c) The statements in (a), (b) also hold for $\mathcal{T}_{y,\alpha}$ in place of $\mathcal{T}_{y,\alpha,n}$.
Proof. 
Since (W1), (W6)–(W9) for $\mathcal{T}_{y,\alpha,n}$ with fixed $n \in \mathbb{N}$ give the same assumptions as (W1), (W3)–(W5) for the non-discrete counterpart $\mathcal{T}_{y,\alpha}$, it is sufficient to verify (a), (b) for the latter. Existence of minimizers follows from (W1), (W2e), (W3)–(W5), because these items imply that $\mathcal{T}_{y,\alpha}$ is a wslsc coercive functional defined on a nonempty weakly sequentially closed subset of a reflexive Banach space. To show stability, one notes that according to (W2a) and the minimality of $x_k$, for all $x \in X$ we have
$\mathcal{D}(A x_k, y) + \alpha \mathcal{R}(x_k) \le \tau \mathcal{D}(A x_k, y_k) + \alpha \mathcal{R}(x_k) + \tau \mathcal{D}(y, y_k) \le \tau \bigl( \mathcal{D}(A x, y_k) + \alpha \mathcal{R}(x) \bigr) + \tau \mathcal{D}(y, y_k).$
According to (W2c), (W2d), (W5) there exists $x \in X$ such that the right hand side is bounded, which by (W5) shows that $(x_k)_k$ has a weak accumulation point. Following the standard proof of [2] (Theorem 3.23) shows that the weak accumulation points of $(x_k)_k$ are minimizers of $\mathcal{T}_{y,\alpha}$. This uses the fact that the weak topology is indeed weaker than the norm topology, and that the involved functionals are wslsc.    □
In the following we write $x_{\alpha,n}^{\delta}$ for minimizers of $\mathcal{T}_{y^{\delta},\alpha,n}$. For $y \in Y$ we call $x^{+} \in \operatorname{argmin}\{ \mathcal{R}(x) \mid x \in X \wedge A x = y \}$ an $\mathcal{R}$-minimizing solution of $A x = y$.
Lemma 1 (Existence of $\mathcal{R}$-minimizing solutions). Let Assumption 1 hold. For any $y \in A(\mathbb{D})$ an $\mathcal{R}$-minimizing solution of $A x = y$ exists. Likewise, for $n \in \mathbb{N}$ and $y \in A_n(\mathbb{D})$, an $\mathcal{R}_n$-minimizing solution of $A_n x = y$ exists.
Proof. 
Again it is sufficient to verify the claim for $\mathcal{R}$-minimizing solutions. Because $y \in A(\mathbb{D})$, the set $A^{-1}(\{y\}) = \{ x \in X \mid A x = y \}$ is non-empty. Hence we can choose a sequence $(x_k)_{k\in\mathbb{N}}$ in $A^{-1}(\{y\})$ with $\mathcal{R}(x_k) \to \inf\{ \mathcal{R}(x) \mid x \in X \wedge A x = y \}$. Due to (W2b), $(x_k)_{k\in\mathbb{N}}$ is contained in $\{ x \in X \mid \mathcal{D}(A x, y) + \alpha \mathcal{R}(x) \le C \}$ for some $C > 0$, which is bounded according to (W5). By (W1), $X$ is reflexive and therefore $(x_k)_{k\in\mathbb{N}}$ has a weak accumulation point $x^{+}$. From (W1), (W4), (W3) we conclude that $x^{+}$ is an $\mathcal{R}$-minimizing solution of $A x = y$. The case of $\mathcal{R}_n$-minimizing solutions follows analogously.    □

2.2. Convergence

Next we prove that discrete NETT converges as the noise level goes to zero and the discretization as well as the regularization parameter are chosen properly. We write $\mathbb{D}_{n,M} := \{ x \in \mathbb{D} \cap X_n \mid \mathcal{R}_n(x) \le M \}$ and formulate the following approximation conditions for obtaining convergence.
Assumption 2 (Conditions for convergence). The element $x^{+} \in \mathbb{D}$ satisfies the following for all $M > 0$:
(C1) There exists a sequence $(z_n)_{n\in\mathbb{N}}$ with $z_n \in \mathbb{D} \cap X_n$ and $\lambda_n := |\mathcal{R}_n(z_n) - \mathcal{R}(x^{+})| \to 0$.
(C2) $\rho_n := \sup_{x \in \mathbb{D}_{n,M}} |\mathcal{R}_n(x) - \mathcal{R}(x)| \to 0$.
(C3) $\gamma_n := \mathcal{D}(A_n z_n, A x^{+}) \to 0$.
(C4) $a_n := \sup_{x \in \mathbb{D}_{n,M}} |\mathcal{D}(A_n x, A x^{+}) - \mathcal{D}(A x, A x^{+})| \to 0$.
Conditions (C1) and (C3) concern the approximation of the true unknown $x^{+}$ by elements of the discretization spaces, in a way that is compatible with the discretization of the forward operator and the regularizer. Conditions (C2) and (C4) are uniform approximation properties of the operator and the regularizer on $\mathcal{R}_n$-bounded sets.
Theorem 2 (Convergence). Let (W1)–(W9) hold, let $y \in A(\mathbb{D})$ and let $x^{+}$ be an $\mathcal{R}$-minimizing solution of $A x = y$ that satisfies (C1)–(C4). Moreover, suppose $(\delta_k)_{k\in\mathbb{N}} \in (0,\infty)^{\mathbb{N}}$ converges to zero and $(y_k)_{k\in\mathbb{N}} \in Y^{\mathbb{N}}$ satisfies $\mathcal{D}(y, y_k) \le \delta_k$. Choose $(\alpha_k)_{k\in\mathbb{N}}$ and $(n_k)_{k\in\mathbb{N}}$ such that, as $k \to \infty$, we have
$\alpha_k \to 0, \quad (4)$
$n_k \to \infty, \quad (5)$
$\bigl( \delta_k + \mathcal{D}(A_{n_k} z_{n_k}, y) \bigr) / \alpha_k \to 0. \quad (6)$
Then for $x_k \in \operatorname{argmin} \mathcal{T}_{y_k,\alpha_k,n_k}$ the following hold:
(a) $(x_k)_{k\in\mathbb{N}}$ has a weakly convergent subsequence $(x_{\sigma(k)})_{k\in\mathbb{N}}$.
(b) The weak limit of $(x_{\sigma(k)})_{k\in\mathbb{N}}$ is an $\mathcal{R}$-minimizing solution of $A x = y$.
(c) $\mathcal{R}_{\sigma(k)}(x_{\sigma(k)}) \to \mathcal{R}(x)$, where $x$ is the weak limit of $(x_{\sigma(k)})_{k\in\mathbb{N}}$.
(d) If the $\mathcal{R}$-minimizing solution of $A x = y$ is unique, then $x_k \rightharpoonup x^{+}$.
Proof. 
For convenience and with some abuse of notation we use the abbreviations $\mathcal{R}_k := \mathcal{R}_{n_k}$, $A_k := A_{n_k}$, $a_k := a_{n_k}$, $z_k := z_{n_k}$ and $\rho_k := \rho_{n_k}$. Because $x_k$ is a minimizer of the discrete NETT functional $\mathcal{T}_{y_k,\alpha_k,n_k}$, by (W2) we have
$\mathcal{D}(A_k x_k, y_k) + \alpha_k \mathcal{R}_k(x_k) \le \mathcal{D}(A_k z_k, y_k) + \alpha_k \mathcal{R}_k(z_k) \le \tau \mathcal{D}(A_k z_k, y) + \tau \mathcal{D}(y, y_k) + \alpha_k \mathcal{R}_k(z_k) = \tau \mathcal{D}(A_k z_k, y) + \tau \delta_k + \alpha_k \mathcal{R}_k(z_k).$
According to (C1) and (4) we get
$\mathcal{D}(A_k x_k, y_k) \le \tau \bigl( \mathcal{D}(A_k z_k, y) + \delta_k \bigr) + \alpha_k \mathcal{R}_k(z_k), \quad (7)$
$\mathcal{R}_k(x_k) \le \tau \cdot \frac{\mathcal{D}(A_k z_k, y_k) + \delta_k}{\alpha_k} + \mathcal{R}_k(z_k). \quad (8)$
According to (C1), (C3), (5) and (6), the right hand side in (7) converges to zero and the right hand side in (8) converges to $\mathcal{R}(x^{+})$. Together with (C2) we obtain that $\mathcal{R}(x_k) \le \mathcal{R}_k(x_k) + \rho_k$ remains bounded, and that $\mathcal{D}(A x_k, y) \le \mathcal{D}(A_k x_k, y) + a_k \le \tau \mathcal{D}(A_k x_k, y_k) + a_k + \tau \delta_k \to 0$. This shows that $(\mathcal{D}(A x_k, y) + \mathcal{R}(x_k))_{k\in\mathbb{N}}$ is bounded and by (W1), (W9) there exists a weakly convergent subsequence $(x_{\sigma(k)})_{k\in\mathbb{N}}$. We denote the weak limit by $x \in X$. From (W2), (W4) we obtain $A x = y$. The weak lower semi-continuity of $\mathcal{R}$ assumed in (W3) shows
$\mathcal{R}(x) \le \liminf_{k\to\infty} \mathcal{R}(x_{\sigma(k)}) \le \limsup_{k\to\infty} \mathcal{R}(x_{\sigma(k)}) \le \limsup_{k\to\infty} \bigl( \mathcal{R}_{\sigma(k)}(x_{\sigma(k)}) + \rho_k \bigr) \le \mathcal{R}(x^{+}).$
Consequently, $x$ is an $\mathcal{R}$-minimizing solution of $A x = y$ and $\mathcal{R}(x_{\sigma(k)}) \to \mathcal{R}(x)$. If the $\mathcal{R}$-minimizing solution is unique, then $x^{+}$ is the only weak accumulation point of $(x_k)_{k\in\mathbb{N}}$, which concludes the proof.    □

2.3. Convergence Rates

Next we derive quantitative error estimates (convergence rates) in terms of the absolute Bregman distance. Recall that a functional $\mathcal{R} \colon X \to [0,\infty]$ is Gâteaux differentiable at $x \in X$ if the directional derivative $\mathcal{R}'(x)(h) := \lim_{t \to 0} \bigl( \mathcal{R}(x + t h) - \mathcal{R}(x) \bigr)/t$ exists for every $h \in X$. We denote by $\mathcal{R}'(x)$ the Gâteaux derivative of $\mathcal{R}$ at $x$. In [9] we introduced the absolute Bregman distance $\mathcal{B}_{\mathcal{R}}(\cdot, x_*) \colon X \to [0,\infty]$ of a Gâteaux differentiable functional $\mathcal{R} \colon X \to [0,\infty]$ at $x_* \in X$, defined by
$\forall x \in X \colon \; \mathcal{B}_{\mathcal{R}}(x, x_*) := \bigl| \mathcal{R}(x) - \mathcal{R}(x_*) - \mathcal{R}'(x_*)(x - x_*) \bigr|. \quad (9)$
We write $\sup_{y^{\delta}} H(y^{\delta}) := \sup\{ H(y^{\delta}) \mid y^{\delta} \in Y \wedge \mathcal{D}(A x^{+}, y^{\delta}) \le \delta \}$. Convergence rates in terms of the Bregman distance are derived under a smoothness assumption on the true solution in the form of a certain variational inequality. More precisely, we assume the following:
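For differentiable regularizers implemented in an automatic differentiation framework, the absolute Bregman distance (9) can be evaluated numerically. The following PyTorch sketch is illustrative only; the quadratic reg used in the usage example is an arbitrary placeholder, not the learned regularizer of Section 3.

```python
import torch

def absolute_bregman(reg, x, x_star):
    """Absolute Bregman distance |R(x) - R(x*) - R'(x*)(x - x*)| of a
    differentiable functional reg, with R'(x*) obtained by autograd."""
    x_star = x_star.detach().clone().requires_grad_(True)
    r_star = reg(x_star)
    grad_star, = torch.autograd.grad(r_star, x_star)   # Gateaux derivative at x*
    lin = (grad_star * (x - x_star)).sum()             # R'(x*)(x - x*)
    return (reg(x) - r_star - lin).abs().detach()

# Placeholder regularizer for illustration.
reg = lambda u: (u ** 2).sum()
x, x_star = torch.rand(16, 16), torch.rand(16, 16)
b = absolute_bregman(reg, x, x_star)
```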
Assumption 3 (Conditions for convergence rates). The element $x^{+} \in \mathbb{D}$ satisfies the following for all $M, \delta > 0$:
(R1) Items (C1), (C2) hold.
(R2) $\gamma_{n,\delta} := \sup_{y^{\delta}} |\mathcal{D}(A_n z_n, y^{\delta}) - \mathcal{D}(A x^{+}, y^{\delta})| \to 0$.
(R3) $a_{n,\delta} := \sup_{y^{\delta}} \sup_{x \in \mathbb{D}_{n,M}} |\mathcal{D}(A_n x, y^{\delta}) - \mathcal{D}(A x, y^{\delta})| \to 0$.
(R4) $\mathcal{R}$ is Gâteaux differentiable at $x^{+}$.
(R5) There exist a concave, continuous, strictly increasing $\varphi \colon [0,\infty) \to [0,\infty)$ with $\varphi(0) = 0$ and constants $\epsilon, \beta > 0$ such that for all $x \in X$
$|\mathcal{R}(x) - \mathcal{R}(x^{+})| \le \epsilon \;\Rightarrow\; \beta\,\mathcal{B}_{\mathcal{R}}(x, x^{+}) \le \mathcal{R}(x) - \mathcal{R}(x^{+}) + \varphi\bigl( \mathcal{D}(A x, A x^{+}) \bigr).$
According to (R5), the inverse function $\varphi^{-1} \colon [0,\infty) \to [0,\infty)$ exists and is convex. We denote by $\varphi^{*}(s) := \sup\{ s t - \varphi^{-1}(t) \mid t \ge 0 \}$ its Fenchel conjugate.
Proposition 1 (Error estimates). Let $y \in A(\mathbb{D})$ and let $x^{+}$ be an $\mathcal{R}$-minimizing solution of $A x = y$ such that (W1)–(W9) and (R1)–(R5) are satisfied. For $y^{\delta} \in Y$ with $\mathcal{D}(y, y^{\delta}) \le \delta$ let $x_{\alpha,n}^{\delta} \in \operatorname{argmin} \mathcal{T}_{y^{\delta},\alpha,n}$. Then, for sufficiently small $\delta, \alpha > 0$ and sufficiently large $n \in \mathbb{N}$, we have the error estimate
$\mathcal{B}_{\mathcal{R}}(x_{\alpha,n}^{\delta}, x^{+}) \lesssim \frac{a_{n,\delta} + \gamma_{n,\delta} + \delta}{\alpha} + \rho_n + \lambda_n + \varphi(\tau\delta) + \frac{\varphi^{*}(\tau\alpha)}{\tau\alpha}. \quad (10)$
Proof. 
According to Theorem 2 we can assume $|\mathcal{R}(x_{\alpha,n}^{\delta}) - \mathcal{R}(x^{+})| \le \epsilon$, and with (R5) we obtain
$\alpha\beta\,\mathcal{B}_{\mathcal{R}}(x_{\alpha,n}^{\delta}, x^{+}) \le \alpha\mathcal{R}(x_{\alpha,n}^{\delta}) - \alpha\mathcal{R}(x^{+}) + \alpha\varphi\bigl(\mathcal{D}(A x_{\alpha,n}^{\delta}, y)\bigr) \le \alpha\mathcal{R}_n(x_{\alpha,n}^{\delta}) - \alpha\mathcal{R}_n(z_n) + \alpha\rho_n + \alpha\lambda_n + \alpha\varphi\bigl(\mathcal{D}(A x_{\alpha,n}^{\delta}, y)\bigr) \le \mathcal{D}(A_n z_n, y^{\delta}) - \mathcal{D}(A_n x_{\alpha,n}^{\delta}, y^{\delta}) + \alpha\rho_n + \alpha\lambda_n + \alpha\varphi\bigl(\mathcal{D}(A x_{\alpha,n}^{\delta}, y)\bigr) \le \delta - \mathcal{D}(A x_{\alpha,n}^{\delta}, y^{\delta}) + \gamma_{n,\delta} + a_{n,\delta} + \alpha\rho_n + \alpha\lambda_n + \alpha\varphi(\tau\delta) + \alpha\varphi\bigl(\tau\mathcal{D}(A x_{\alpha,n}^{\delta}, y^{\delta})\bigr) \le \delta + \gamma_{n,\delta} + a_{n,\delta} + \alpha\rho_n + \alpha\lambda_n + \alpha\varphi(\tau\delta) + \tau^{-1}\varphi^{*}(\tau\alpha).$
For the second inequality we used (C1) and (C2). We have $\mathcal{D}(A_n x_{\alpha,n}^{\delta}, y^{\delta}) + \alpha\mathcal{R}_n(x_{\alpha,n}^{\delta}) \le \mathcal{D}(A_n z_n, y^{\delta}) + \alpha\mathcal{R}_n(z_n)$ and thus obtain an estimate for $\mathcal{R}_n(x_{\alpha,n}^{\delta}) - \mathcal{R}_n(z_n)$, which we used for the third inequality. For the fourth inequality we used (R2), (R3) and the concavity of $\varphi$. Finally, we used Young's inequality $\alpha\varphi(\tau t) \le t + \tau^{-1}\varphi^{*}(\tau\alpha)$ for the last step.    □
Remark 1.
The error estimate (10) includes the approximation quality of the discrete or inexact forward operator $A_n$ and of the discrete or inexact regularizer $\mathcal{R}_n$, described by $a_{n,\delta}$ and $\rho_n$, respectively. What might be unexpected at first is the appearance of the two additional quantities $\lambda_n$ and $\gamma_{n,\delta}$. These factors both arise from the approximation of $X$ by the finite dimensional spaces $X_n$, where $\gamma_{n,\delta}$ reflects the approximation accuracy in the image of the operator $A$ and $\lambda_n$ the approximation accuracy with respect to the true regularization functional $\mathcal{R}$. Note that in the case where the forward operator, the regularizer and the solution space $X$ are given precisely, we have $a_{n,\delta} = \gamma_{n,\delta} = \lambda_n = \rho_n = 0$. In this particular case we recover the estimate derived for NETT in [9].
Theorem 3 (Convergence rates). Let the assumptions of Proposition 1 hold, consider the parameter choice rule $\alpha(\delta) \sim \delta/\varphi(\delta)$ and let the approximation errors satisfy $a_{n,\delta} + \gamma_{n,\delta} = O(\delta)$ and $\rho_n + \lambda_n = O(\varphi(\tau\delta))$. Then we have the convergence rate
$\mathcal{B}_{\mathcal{R}}(x_{\alpha(\delta),n(\delta)}^{\delta}, x^{+}) = O(\varphi(\tau\delta)). \quad (11)$
Proof. 
Noting that $\varphi^{*}\bigl(\tau\delta/\varphi(\tau\delta)\bigr)/\delta$ remains bounded as $\delta \to 0$, this directly follows from Proposition 1.    □
Next we verify that a variational inequality of the form (R5) is satisfied with $\varphi(t) = c\sqrt{t}$ under a typical source-like condition.
Lemma 2 (Variational inequality under source condition). Let $\mathcal{R}$, $A$ be Gâteaux differentiable at $x^{+} \in X$, consider the distance measure $\mathcal{D}(y_1,y_2) = \|y_1 - y_2\|^2$ and assume there exist $\eta \in Y$ and constants $c_1, c_2, \epsilon > 0$ with $c_1\|\eta\| < 1$ such that for all $x \in X$ with $|\mathcal{R}(x) - \mathcal{R}(x^{+})| \le \epsilon$ we have
$\mathcal{R}'(x^{+}) = A'(x^{+})^{*}\eta, \quad \|A x - A x^{+} - A'(x^{+})(x - x^{+})\| \le c_1\,\mathcal{B}_{\mathcal{R}}(x, x^{+}), \quad \mathcal{R}(x^{+}) - \mathcal{R}(x) \le c_2\,\|A x - A x^{+}\|. \quad (12)$
Then (R5) holds with $\varphi(t) = (\|\eta\| + 2 c_2)\sqrt{t}$ and $\beta = 1 - c_1\|\eta\|$.
Proof. 
Let $x \in X$ with $|\mathcal{R}(x) - \mathcal{R}(x^{+})| \le \epsilon$. Using the Cauchy–Schwarz inequality and Equation (12), we can estimate
$|\langle \mathcal{R}'(x^{+}), x - x^{+} \rangle| \le \|A'(x^{+})(x - x^{+})\|\,\|\eta\| \le \|A x - A x^{+}\|\,\|\eta\| + \|A x - A x^{+} - A'(x^{+})(x - x^{+})\|\,\|\eta\| \le \|A x - A x^{+}\|\,\|\eta\| + c_1\|\eta\|\,\mathcal{B}_{\mathcal{R}}(x, x^{+}). \quad (13)$
Additionally, if $\mathcal{R}(x) \ge \mathcal{R}(x^{+})$, we have $|\mathcal{R}(x) - \mathcal{R}(x^{+})| = \mathcal{R}(x) - \mathcal{R}(x^{+})$, and on the other hand, if $\mathcal{R}(x) < \mathcal{R}(x^{+})$, we have $|\mathcal{R}(x) - \mathcal{R}(x^{+})| = \mathcal{R}(x) - \mathcal{R}(x^{+}) + 2\bigl(\mathcal{R}(x^{+}) - \mathcal{R}(x)\bigr) \le \mathcal{R}(x) - \mathcal{R}(x^{+}) + 2 c_2\|A x - A x^{+}\|$. Putting this together we get
$\mathcal{B}_{\mathcal{R}}(x, x^{+}) \le |\mathcal{R}(x) - \mathcal{R}(x^{+})| + |\langle \mathcal{R}'(x^{+}), x - x^{+} \rangle| \le \mathcal{R}(x) - \mathcal{R}(x^{+}) + (\|\eta\| + 2 c_2)\|A x - A x^{+}\| + c_1\|\eta\|\,\mathcal{B}_{\mathcal{R}}(x, x^{+}),$
and thus $(1 - c_1\|\eta\|)\,\mathcal{B}_{\mathcal{R}}(x, x^{+}) \le \mathcal{R}(x) - \mathcal{R}(x^{+}) + (\|\eta\| + 2 c_2)\|A x - A x^{+}\|$.    □
Corollary 1 (Convergence rates under source condition). Let the conditions of Lemma 2 hold and suppose that
$\alpha(\delta) \sim \sqrt{\delta}, \quad |\mathcal{R}_{n(\delta)}(z_{n(\delta)}) - \mathcal{R}(x^{+})| = O(\sqrt{\delta}), \quad \sup\{ |\mathcal{R}_{n(\delta)}(x) - \mathcal{R}(x)| \mid x \in \mathbb{D}_{n(\delta),M} \} = O(\sqrt{\delta}),$
$\|A_{n(\delta)} z_{n(\delta)} - A x^{+}\| = O(\sqrt{\delta}), \quad \sup\{ \|A_{n(\delta)} x - A x\| \mid x \in \mathbb{D}_{n(\delta),M} \} = O(\sqrt{\delta}), \quad \sup\{ \|A_{n(\delta)} x\| \mid x \in \mathbb{D}_{n(\delta),M} \} < \infty.$
Then we have the convergence rates result
$\mathcal{B}_{\mathcal{R}}(x_{\alpha(\delta),n(\delta)}^{\delta}, x^{+}) = O(\sqrt{\delta}). \quad (14)$
Proof. 
This follows from Theorem 3 and Lemma 2. Note that we use $\|\cdot\|$ in the conditions above, while $\mathcal{D}(y_1,y_2) = \|y_1 - y_2\|^2$ uses the squared norm $\|\cdot\|^2$, and thus the approximation rates for the terms concerning $A_{n(\delta)}$ are of order $\sqrt{\delta}$ instead of $\delta$ as in Theorem 3.    □
In Corollary 1, the approximation quality of the discrete operator A n and the discrete and inexact regularization functional R n need to be of the same order.

3. Application to a Limited Data Problem in PAT

Photoacoustic Tomography (PAT) is an emerging non-invasive coupled-physics biomedical imaging technique with high contrast and high spatial resolution [25,26]. It works by illuminating a semi-transparent sample with short optical pulses, which causes heating of the sample followed by expansion and the subsequent emission of an acoustic wave. Sensors outside the sample measure the acoustic wave, and these measurements are then used to reconstruct the initial pressure $f \colon \mathbb{R}^d \to \mathbb{R}$, which provides information about the interior of the object. The cases $d = 2$ and $d = 3$ are relevant for applications in PAT. Here we only consider the case $d = 2$ and assume a circular measurement geometry. The 2D case arises, for example, when using integrating line detectors in PAT [26].

3.1. Discrete Forward Operator

The pressure data $p \colon \mathbb{R}^2 \times [0,\infty) \to \mathbb{R}$ satisfies the wave equation $(\partial_t^2 - \Delta) p(r,t) = 0$ for $(r,t) \in \mathbb{R}^2 \times (0,\infty)$ with initial data $p(\cdot,0) = f$ and $\partial_t p(\cdot,0) = 0$. In the case of a circular measurement geometry one assumes that $f$ vanishes outside the unit disc $D_1 := \{ r \in \mathbb{R}^2 \mid \|r\| < 1 \}$ and that the measurement sensors are located on the boundary $\partial D_1 = S^1$. We assume that the phantom does not generate any data for some region $I \subseteq D_1$, for example when the acoustic pressure generated inside $I$ is too small to be recorded. This masked PAT problem consists in recovering the function $f$ from sampled noisy measurements of $g = W(\mathbb{1}_{I^c} f)$, where $W$ denotes the solution operator of the wave equation and $\mathbb{1}_{I^c}$ the indicator function of $I^c := \mathbb{R}^2 \setminus I$. Note that the resulting inverse problem can be seen as the combination of an inpainting problem and an inverse problem for the wave equation.
In order to implement the PAT forward operator we use a basis ansatz $f(r) = \sum_{i=1}^{N \times N} x_i\,\psi(r - r_i)$, where $x_i \in \mathbb{R}$ are basis coefficients, $\psi \colon \mathbb{R}^2 \to \mathbb{R}$ is a generalized Kaiser–Bessel (KB) function and $r_i = (i-1)/N$ with $i = (i_1, i_2) \in \{1, \dots, N\}^2$. The generalized KB functions are popular in tomographic inverse problems [27,28,29,30] and denote radially symmetric functions with support in $D_R$ defined by
$\psi(r) := \frac{\bigl(1 - \|r\|^2/R^2\bigr)^{m/2}\, I_m\bigl(\gamma\sqrt{1 - \|r\|^2/R^2}\bigr)}{I_m(\gamma)} \quad \text{for } \|r\| \le R.$
Here $I_m$ is the modified Bessel function of the first kind of order $m \in \mathbb{N}$, and the parameters $\gamma > 0$ and $R$ denote the window taper and support radius, respectively. Since $W$ is linear we have $W f = \sum_{i=1}^{N \times N} x_i\, W(\psi(\cdot - r_i))$. For convenience we use a pseudo-3D approach, where we use the 3D solution of $W\psi$ for which there exists an analytical representation [29]. Denote by $s_k$ uniformly spaced sensor locations on $S^1$ and by $t_j > 0$ uniformly sampled measurement times in $[0,2]$. Define the $N_t N_s \times N^2$ model matrix by $W_{N_t(k-1)+j,\,N(i_1-1)+i_2} = W(\psi(\cdot - r_i))(s_k, t_j)$ and the $N^2 \times N^2$ diagonal matrix $M_I$ by $(M_I)_{N(i_1-1)+i_2,\,N(i_1-1)+i_2} = 1$ if $r_i \in I^c$ and zero otherwise. Let $W M_I = U \Sigma V$ be the singular value decomposition. We then consider the discrete forward matrix $A = U \tilde{\Sigma} V$, where $\tilde{\Sigma}$ is the diagonal matrix derived from $\Sigma$ by setting singular values smaller than some $\sigma$ to zero. This allows us to easily calculate $A^{+} = V^{\top} \tilde{\Sigma}^{+} U^{\top}$, where $\tilde{\Sigma}^{+}$ is obtained by inverting all diagonal elements of $\tilde{\Sigma}$ that are greater than zero. In our experiments we use $N = N_t = 128$, $N_s = 150$ and take $I$ fixed as a diagonal stripe of width $0.34$.
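The construction of the truncated forward matrix and its pseudo-inverse can be summarized by the following NumPy sketch. It is illustrative only: the dimensions, the mask and the threshold below are placeholder assumptions, and the actual model matrix $W$ has to be assembled from the analytical wave equation solutions as described above.

```python
import numpy as np

def truncated_forward_and_pinv(W, M_I, sigma_cut):
    """Form A by zeroing singular values of W @ M_I below sigma_cut,
    and the corresponding pseudo-inverse A^+."""
    U, s, Vh = np.linalg.svd(W @ M_I, full_matrices=False)
    keep = s > sigma_cut
    s_trunc = np.where(keep, s, 0.0)
    s_pinv = np.zeros_like(s)
    s_pinv[keep] = 1.0 / s[keep]          # invert only the retained singular values
    A = U @ np.diag(s_trunc) @ Vh
    A_pinv = Vh.T @ np.diag(s_pinv) @ U.T
    return A, A_pinv

# Small synthetic example (placeholder sizes, not the N = 128 setting used here).
rng = np.random.default_rng(0)
W = rng.standard_normal((60, 40))
M_I = np.diag((rng.random(40) > 0.2).astype(float))   # binary mask on the coefficients
A, A_pinv = truncated_forward_and_pinv(W, M_I, sigma_cut=1e-3)
```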

3.2. Discrete NETT

We consider the discrete NETT with discrepancy term $\mathcal{D}(A x, y^{\delta}) = \|A x - y^{\delta}\|_2^2 / 2$ and regularizer given by
$\mathcal{R}^{(m)}(x) = \|x - \Phi^{(m)}(x)\|_2^2 + \beta\,\|x\|_{1,\epsilon}, \quad (15)$
where $\|x\|_{1,\epsilon} := \sum_{i_1,i_2=1}^{128} \sqrt{|x_{i_1+1,i_2} - x_{i_1,i_2}|^2 + |x_{i_1,i_2+1} - x_{i_1,i_2}|^2 + \epsilon^2}$ with $\epsilon > 0$ is a smooth version of the total variation [31] and $\Phi^{(m)}$ is a learnable network. We take $\Phi^{(m)}$ as the U-Net [32] with residual connection, which has first been applied to PAT image reconstruction in [33]. Here $m \in \mathbb{N}$ stands for the number of down-/upsampling steps performed in the U-Net (the original architecture had $m = 4$); larger $m$ yields a deeper network with more parameters. We generate training data consisting of square shaped rings with random profile and random location. See Figure 1 for an example of one such phantom (note that all plots in signal space use the same colorbar) and the corresponding data. We obtain a set of phantoms $x_1, \dots, x_{1000}$ and corresponding basic reconstructions $h_a := A^{+}(A x_a + \eta_a)$, where $A^{+}$ is the pseudo-inverse and $\eta_a$ is Gaussian noise with standard deviation $\sigma\|A x_a\|$ with $\sigma = 0.01$. The networks are trained by minimizing $\sum_{a=1}^{1000} \|\Phi^{(m)}(h_a) - x_a\|_1 + \gamma\|\Phi^{(m)}(x_a) - x_a\|_1$, where we used the Adam optimizer with learning rate $0.01$ and $\gamma = 0.1$. This loss reflects that we want the trained regularizer to give small values for the $x_a$ and large values for the $h_a$. The strategy is similar to [9], but we use the final output of the network for the regularizer as proposed in [34]. To minimize the discrete NETT functional with regularizer (15) we use Algorithm 1, which implements a forward-backward scheme [35]. The most expensive step of this algorithm is the matrix inversion, but since we use a constant step size one also has the option to compute the inverse of the matrix only once and reuse it. Thus one only has to perform two matrix-vector multiplications, which are of order $O(N^2 N_t N_s)$ and $O(N^4)$, since $N^2$ is the dimension of our phantoms. On the other hand, calculating the gradient has similar complexity to applying the neural network, which is of order $O(F^2 L N^2)$ with $F$ the number of convolution channels and $L$ the number of layers.
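The following PyTorch sketch illustrates the smoothed total variation, the regularizer (15) and the training loss described above. The network phi, the tensor shapes and all hyperparameter defaults are placeholders rather than the exact implementation used for the experiments.

```python
import torch

def smooth_tv(x, eps=1e-3):
    """Smoothed total variation: sum over pixels of sqrt(dh^2 + dv^2 + eps^2)."""
    dh = torch.zeros_like(x); dh[:-1, :] = x[1:, :] - x[:-1, :]
    dv = torch.zeros_like(x); dv[:, :-1] = x[:, 1:] - x[:, :-1]
    return torch.sqrt(dh ** 2 + dv ** 2 + eps ** 2).sum()

def nett_regularizer(x, phi, beta):
    """R^(m)(x) = ||x - Phi^(m)(x)||_2^2 + beta * ||x||_{1,eps}."""
    return ((x - phi(x)) ** 2).sum() + beta * smooth_tv(x)

def training_loss(phi, x_clean, h_artifact, gamma=0.1):
    """||Phi(h_a) - x_a||_1 + gamma * ||Phi(x_a) - x_a||_1 for one training pair."""
    return ((phi(h_artifact) - x_clean).abs().sum()
            + gamma * (phi(x_clean) - x_clean).abs().sum())
```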
Algorithm 1: NETT optimization.
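A hedged sketch of the forward-backward scheme that Algorithm 1 implements is given below: an explicit gradient step on the learned regularizer followed by the exact proximal step for the quadratic data term, with the system matrix inverted once since the step size is constant. The parametrization of the step size, the function names and the gradient interface are assumptions, not the authors' exact listing; the gradient of $\mathcal{R}^{(m)}$ can be supplied via PyTorch autograd as described in Section 3.3.

```python
import numpy as np

def nett_forward_backward(A, y, grad_reg, x0, alpha, s=0.25, n_iter=15):
    """Minimize ||A x - y||^2 / 2 + alpha * R(x) by forward-backward splitting.

    Forward step (explicit): gradient step on the regularizer alpha * R.
    Backward step (implicit): proximal map of the data term,
        argmin_z (s/2) ||z - v||^2 + (1/2) ||A z - y||^2
        = (A^T A + s Id)^{-1} (s v + A^T y),
    which reuses the matrix inverse computed once up front.
    """
    n = A.shape[1]
    M_inv = np.linalg.inv(A.T @ A + s * np.eye(n))   # pre-factored once, reused below
    Aty = A.T @ y
    x = x0.copy()
    for _ in range(n_iter):
        v = x - (alpha / s) * grad_reg(x)            # forward step (step size 1/s)
        x = M_inv @ (s * v + Aty)                    # backward (proximal) step on data term
    return x
```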

3.3. Numerical Results

For the numerical results we train two regularizers $\mathcal{R}^{(1)}$ and $\mathcal{R}^{(3)}$ as described in Section 3.2. The networks are implemented in PyTorch [36], which we also use to compute the gradient $\nabla_x \mathcal{R}^{(m)}$. We take $N_{\mathrm{iter}} = 15$, $s = 0.25$ and $x_0 = \Phi^{(m)}(A^{+} y)$ in Algorithm 1 and compute the inverse $(A^{\top} A + s\,\mathrm{Id})^{-1}$ only once and then use it for all examples. We set $\alpha = 0.015$ for the noise-free case, $\alpha = 0.016$ for the low noise case and $\alpha = 0.02$ for the high noise case, respectively, and selected a fixed $\beta = 15$. We expect that the NETT functional will yield better results due to data consistency, which is mainly helpful outside the masked center diagonal.
First we use the phantom from the test data shown in Figure 1. The results using post-processing and NETT are shown in Figure 2. One sees that all results with higher noise than used during training are not very good. This indicates that one should use similar noise levels as in the later application, even for NETT. Figure 3 shows the average error over 10 test phantoms similar to the one in Figure 1. A careful comparison of the numerical convergence rates with the theoretical results of Section 2 is an interesting aspect for future research. To investigate the stability of our method with respect to phantoms that differ from the training data, we created a phantom with different structures, shown in Figure 4. As expected, the post-processing network $\Phi^{(3)}$ is not really able to reconstruct the circle-shaped objects, since they are quite different from the training data, but it also does not break down completely. On the other hand, the NETT approach yields good results due to data consistency.

4. Conclusions

We have analyzed the convergence of a discretized NETT approach and derived convergence rates under certain assumptions on the approximation quality of the involved operators. We performed numerical experiments using a limited data problem for PAT that is the combination of an inverse problem for the wave equation and an inpainting problem. To the best of our knowledge, this is the first time such a problem has been studied with deep learning. The NETT approach yields better results than post-processing for phantoms different from the training data. NETT still fails to recover some missing parts of the phantom in cases where the data contains more noise than the training data. This highlights the relevance of using different regularizers for different noise levels. Finding ways to make the regularizers less dependent on the noise level used during training is a possible future research direction. Another interesting question is whether these results can be combined with approximation error estimates for neural networks, e.g., [37,38]. It seems not obvious how these two approaches can be combined. Furthermore, studying how one can define neural network based regularizers that fulfill (12) might also be an interesting line of future research.

Author Contributions

M.H. proposed the conceptualization, framework and long term vision of the work. M.H. and S.A. developed the ideas, performed the formal analysis and wrote and edited the paper. S.A. conducted the numerical experiments and wrote the software. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by the Austrian Science Fund (FWF), project P 30747-N32.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data and code are freely available upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Engl, H.W.; Hanke, M.; Neubauer, A. Regularization of Inverse Problems. In Mathematics and Its Applications; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1996; Volume 375. [Google Scholar]
  2. Scherzer, O.; Grasmair, M.; Grossauer, H.; Haltmeier, M.; Lenzen, F. Variational methods in imaging. In Applied Mathematical Sciences; Springer: New York, NY, USA, 2009; Volume 167. [Google Scholar]
  3. Natterer, F.; Wübbeling, F. Mathematical Methods in Image Reconstruction. In Monographs on Mathematical Modeling and Computation; SIAM: Philadelphia, PA, USA, 2001; Volume 5. [Google Scholar]
  4. Zhdanov, M.S. Geophysical Inverse Theory and Regularization Problems; Elsevier: Amsterdam, The Netherlands, 2002; Volume 36. [Google Scholar]
  5. Morozov, V.A. Methods for Solving Incorrectly Posed Problems; Springer: New York, NY, USA, 1984. [Google Scholar]
  6. Tikhonov, A.N.; Arsenin, V.Y. Solutions of Ill-Posed Problems; John Wiley & Sons: Washington, DC, USA, 1977. [Google Scholar]
  7. Ivanov, V.K.; Vasin, V.V.; Tanana, V.P. Theory of Linear Ill-Posed Problems and Its Applications, 2nd ed.; Inverse and Ill-posed Problems Series; VSP: Utrecht, The Netherlands, 2002. [Google Scholar]
  8. Grasmair, M. Generalized Bregman distances and convergence rates for non-convex regularization methods. Inverse Probl. 2010, 26, 115014. [Google Scholar] [CrossRef] [Green Version]
  9. Li, H.; Schwab, J.; Antholzer, S.; Haltmeier, M. NETT: Solving inverse problems with deep neural networks. Inverse Probl. 2020, 36, 065005. [Google Scholar] [CrossRef] [Green Version]
  10. Obmann, D.; Nguyen, L.; Schwab, J.; Haltmeier, M. Sparse q-regularization of Inverse Problems Using Deep Learning. arXiv 2019, arXiv:1908.03006. [Google Scholar]
  11. Obmann, D.; Nguyen, L.; Schwab, J.; Haltmeier, M. Augmented NETT regularization of inverse problems. J. Phys. Commun. 2021, 5, 105002. [Google Scholar] [CrossRef]
  12. Haltmeier, M.; Nguyen, L.V. Regularization of Inverse Problems by Neural Networks. arXiv 2020, arXiv:2006.03972. [Google Scholar]
  13. Lunz, S.; Öktem, O.; Schönlieb, C.B. Adversarial Regularizers in Inverse Problems; NIPS: Montreal, QC, Canada, 2018; pp. 8507–8516. [Google Scholar]
  14. Mukherjee, S.; Dittmer, S.; Shumaylov, Z.; Lunz, S.; Öktem, O.; Schönlieb, C.B. Learned convex regularizers for inverse problems. arXiv 2020, arXiv:2008.02839. [Google Scholar]
  15. Adler, J.; Öktem, O. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Probl. 2017, 33, 124007. [Google Scholar] [CrossRef] [Green Version]
  16. Aggarwal, H.K.; Mani, M.P.; Jacob, M. MoDL: Model-based deep learning architecture for inverse problems. IEEE Trans. Med. Imaging 2018, 38, 394–405. [Google Scholar] [CrossRef]
  17. de Hoop, M.V.; Lassas, M.; Wong, C.A. Deep learning architectures for nonlinear operator functions and nonlinear inverse problems. arXiv 2019, arXiv:1912.11090. [Google Scholar]
  18. Kobler, E.; Klatzer, T.; Hammernik, K.; Pock, T. Variational networks: Connecting variational methods and deep learning. In Proceedings of the German Conference on Pattern Recognition, Basel, Switzerland, 12–15 September 2017; Springer: Cham, Switzerland, 2017; pp. 281–293. [Google Scholar]
  19. Yang, Y.; Sun, J.; Li, H.; Xu, Z. Deep ADMM-Net for Compressive Sensing MRI. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 10–18. [Google Scholar]
  20. Shang, Y. Subspace confinement for switched linear systems. Forum Math. 2017, 29, 693–699. [Google Scholar] [CrossRef]
  21. Romano, Y.; Elad, M.; Milanfar, P. The little engine that could: Regularization by denoising (RED). SIAM J. Imaging Sci. 2017, 10, 1804–1844. [Google Scholar] [CrossRef]
  22. Pöschl, C.; Resmerita, E.; Scherzer, O. Discretization of variational regularization in Banach spaces. Inverse Probl. 2010, 26, 105017. [Google Scholar] [CrossRef]
  23. Pöschl, C. Tikhonov Regularization with General Residual Term. Ph.D. Thesis, University of Innsbruck, Innsbruck, Austria, 2008. [Google Scholar]
  24. Tikhonov, A.N.; Leonov, A.S.; Yagola, A.G. Nonlinear ill-posed problems. In Applied Mathematics and Mathematical Computation; Translated from the Russian; Chapman & Hall: London, UK, 1998; Volumes 1, 2 and 14. [Google Scholar]
  25. Kruger, R.; Lui, P.; Fang, Y.; Appledorn, R. Photoacoustic ultrasound (PAUS)—Reconstruction tomography. Med. Phys. 1995, 22, 1605–1609. [Google Scholar] [CrossRef]
  26. Paltauf, G.; Nuster, R.; Haltmeier, M.; Burgholzer, P. Photoacoustic tomography using a Mach-Zehnder interferometer as an acoustic line detector. Appl. Opt. 2007, 46, 3352–3358. [Google Scholar] [CrossRef]
  27. Matej, S.; Lewitt, R.M. Practical considerations for 3-D image reconstruction using spherically symmetric volume elements. IEEE Trans. Med. Imaging 1996, 15, 68–78. [Google Scholar] [CrossRef] [PubMed]
  28. Schwab, J.; Pereverzyev, S., Jr.; Haltmeier, M. A Galerkin least squares approach for photoacoustic tomography. SIAM J. Numer. Anal. 2018, 56, 160–184. [Google Scholar] [CrossRef] [Green Version]
  29. Wang, K.; Schoonover, R.W.; Su, R.; Oraevsky, A.; Anastasio, M.A. Discrete Imaging Models for Three-Dimensional Optoacoustic Tomography Using Radially Symmetric Expansion Functions. IEEE Trans. Med. Imaging 2014, 33, 1180–1193. [Google Scholar] [CrossRef] [Green Version]
  30. Wang, K.; Su, R.; Oraevsky, A.A.; Anastasio, M.A. Investigation of iterative image reconstruction in three-dimensional optoacoustic tomography. Phys. Med. Biol. 2012, 57, 5399. [Google Scholar] [CrossRef] [PubMed]
  31. Acar, R.; Vogel, C.R. Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 1994, 10, 1217. [Google Scholar] [CrossRef]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  33. Antholzer, S.; Haltmeier, M.; Schwab, J. Deep learning for photoacoustic tomography from sparse data. Inverse Probl. Sci. Eng. 2019, 27, 987–1005. [Google Scholar] [CrossRef] [Green Version]
  34. Antholzer, S.; Schwab, J.; Bauer-Marschallinger, J.; Burgholzer, P.; Haltmeier, M. NETT regularization for compressed sensing photoacoustic tomography. In Proceedings of the Photons Plus Ultrasound: Imaging and Sensing 2019, San Francisco, CA, USA, 3–6 February 2019; Volume 10878, p. 108783B. [Google Scholar]
  35. Combettes, P.L.; Pesquet, J.C. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2011; pp. 185–212. [Google Scholar]
  36. Paszke, A.; Gross, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library; NIPS: Montreal, QC, Canada, 2018; pp. 8024–8035. [Google Scholar]
  37. Hornik, K. Some new results on neural network approximation. Neural Netw. 1993, 6, 1069–1072. [Google Scholar] [CrossRef]
  38. Barron, A.R. Approximation and estimation bounds for artificial neural networks. Mach. Learn. 1994, 14, 115–133. [Google Scholar] [CrossRef]
Figure 1. Top from left to right: phantom, masked phantom, and initial reconstruction $A^{+} A x$. The difference between the phantom on the left and the middle one shows the mask region $I \subseteq D_1$ where no data is generated. Bottom from left to right: data without noise, low noise ($\sigma = 0.01$), and high noise ($\sigma = 0.1$).
Figure 2. Top row: reconstructions using the post-processing network $\Phi^{(1)}$. Middle row: NETT reconstructions using $\mathcal{R}^{(1)}$. Bottom row: NETT reconstructions using $\mathcal{R}^{(3)}$. From left to right: reconstructions from data without noise, low noise ($\sigma = 0.01$) and high noise ($\sigma = 0.1$).
Figure 3. Semilogarithmic plot of the mean squared errors of the NETT using $\mathcal{R}^{(1)}$ and $\mathcal{R}^{(3)}$ depending on the noise level. The crosses are the values for the phantoms in Figure 2.
Figure 4. Left column: phantom with a structure not contained in the training data (top) and pseudo-inverse reconstruction (bottom). Middle column: post-processing reconstructions with $\Phi^{(3)}$ using exact (top) and noisy data (bottom). Right column: NETT reconstructions with $\mathcal{R}^{(3)}$ using exact (top) and noisy data (bottom).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

