Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization

Zhu, Hangui; Feng, Weike; Feng, Cunqian; Ma, Teng; Zou, Bo

doi:10.3390/rs15010013

Open AccessArticle

Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization

by

Hangui Zhu

¹

,

Weike Feng

^1,*

,

Cunqian Feng

¹,

Teng Ma

² and

Bo Zou

¹

Air and Missile Defense College, Air Force Engineering University, Xi’an 710051, China

²

National Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(1), 13; https://doi.org/10.3390/rs15010013

Submission received: 9 November 2022 / Revised: 16 December 2022 / Accepted: 19 December 2022 / Published: 21 December 2022

(This article belongs to the Topic Advances in Array Signal Processing with Errors: Models, Algorithms and Applications)

Download

Browse Figures

Versions Notes

Abstract

:

Deep unfolded networks have recently been regarded as an essential way to direction of arrival (DOA) estimation due to the fast convergence speed and high interpretability. However, few consider gridless DOA estimation. This paper proposes two deep unfolded gridless DOA estimation networks to resolve the above problem. We first consider the atomic norm-based 1D and decoupled atomic norm-based 2D gridless DOA models solved by the alternating iterative minimization of variables, respectively. Then, the corresponding deep networks are trained offline after constructing the corresponding complete training datasets. At last, the trained networks are applied to realize the 1D DOA and 2D estimation, respectively. Simulation results reveal that the proposed networks can secure higher 1D and 2D DOA estimation performances while maintaining a lower computational expenditure than typical methods.

Keywords:

direction of arrival (DOA); sparse recovery; atomic norm; alternating iterative minimization; deep unfolding

Graphical Abstract

1. Introduction

The Direction of Arrival (DOA) estimation technique is widely used in radar, sonar, and other fields, and is an important research direction in array signal processing [1,2]. Typical DOA estimation methods include MUSIC (Multiple Signal Classification) [3] and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Technique) [4]. Thanks to the successful application of 1D DOA estimation problems, the subspace super-resolution algorithms are successfully extended to 2D DOA estimation problems, such as 2D unitary estimation of signal parameters via rotational invariance techniques (U-ESPRIT) [5] and the 2D multiple signal classification (2D MUSIC) algorithm [6]. These algorithms can obtain high angular resolution under the requirement of multiple snapshot reception data and known coherent signal sources, but any of the conditions not satisfied will cause its estimation performance to degrade or even fail.

In recent years, effective alternative algorithms based on compressive sensing techniques have been introduced into the field of DOA estimation, and fruitful research results have been achieved. The traditional compressed sensing algorithms, for example, orthogonal matching pursuit (OMP) [7], sparse Bayesian learning (SBL) [8], and other algorithms, divide the possible space of the signal source into a finite number of grid points by dimension. It shows excellent estimation performance when the target angle falls exactly on the established grid, which can be applied to complex scenarios such as single snapshot, signal coherence, and missing data. On the contrary, if the real signal source does not fall on the established grid, it will cause a grid mismatch problem, and the estimation performance will be degraded or even fail. In addition, traditional compressed sensing algorithms must satisfy the pairwise isometry property (PIP) [9] and high-density gridding.

A gridless continuous compressive sensing technique based on the atomic norm theory and the Vandermonde decomposition theorem is proposed to overcome these problems, called atomic norm minimization (ANM) [10,11,12,13,14]. ANM projects the Vandermonde structure of the steering matrix from the observed data to the corresponding semi-definite programming (SDP) model using the Toeplitz matrix and obtains the recovered signal by solving the SDP optimization model to achieve super-resolution. Compared with the conventional compressed sensing algorithm, it does not need to mesh the space domain, and effectively avoids the grid mismatch problem and the PIP limitation. By solving the SDP optimization model using the SDPT3-based CVX/alternating direction method of multipliers (ADMM), the ANM-based method can obtain the recovered signal to achieve super-resolution 1D DOA estimation. However, computational budgets of the CVX solver SDPT3-based method increase with the scale of the problem. The performance of the model-driven ADMM method also depends on the selected regularization parameters, and improper initial values of the parameters affect its convergence speed and accuracy.

According to Caratheodory’s theory, the Vandermonde decomposition of the Toeplitz matrix does not hold in higher dimensional spaces, so the 1D ANM cannot be directly extended to the 2D DOA estimation problem. Fortunately, Chi et al. successfully apply vectorized ANM (VANM) to 2D DOA estimation with a dual Toeplitz matrix with a 2-dimensional Vandermonde structure [15]. However, the vectorization operations increase computational complexity. They cannot be applied to real-world scenarios. The dual 2D ANM starts from the dual problem of the VANM, but does not reduce the high computational costs of the VANM [16]. Luckily, Tian et al. proposed a new decoupled atomic norm minimization algorithm (DANM) [17,18,19] to reduce the heavy computational burden. DANM first replaces the vector atomic set in VANM with the matrix atomic set and then derives the corresponding SDP model, which naturally decouples the dual Toeplitz matrix in VANM into two Toeplitz matrices containing one-dimensional Vandermonde structures. Thereby, the 2D DOA estimation problem is converted to two 1D DOA estimation problems. This algorithm significantly reduces the computational complexity and is several orders of magnitude lower than VANM, which can also be solved by SDPT3-based CVX and ADMM algorithms [20,21]. However, ADMM algorithms are suffering from difficulties of parameter setting in practical applications. Meanwhile, inappropriate parameter settings will decrease the convergence speed and accuracy of the ADMM algorithm, thus adding to the computing complexity and degrading the DOA estimation performance. Even if the proper parameters can be chosen by theoretical analysis and cross-validation method, the fixed parameter settings fail to guarantee optimal convergence.

Recently, the Deep Unfolding/Unrolling (DU) method has been proposed as a solution to the above problems of model-driven SR algorithms inspired by deep learning techniques. With the DU method, a specific SR algorithm is unrolled into a deep neural network by taking the number of iterations of the algorithm as the number of layers and the algorithm’s parameters as the network’s learning parameters. SR-Net is trained on the training data set to determine the optimal parameters, thus improving the convergence speed and accuracy of the SR algorithm [22,23,24,25,26,27,28,29,30,31]. Some deep unfolding networks have successfully solved different DOA estimation problems. For example, The paper implemented fast DOA estimation under array deficiencies using the learned ISTA (LISTA) algorithm based on the iterative soft thresholding algorithm (ISTA) [32]; The paper proposed (position-enabled complex Toeplitz learned iterative shrinkage thresholding algorithm) PACT-LISTA algorithm, which integrates the intrinsic Toeplitz structure of the signal into LISTA, ignores the amplitude and phase information of the signal to be recovered and solves the mutual coupling problem between the antenna array elements using data [33]; In the paper [34], the iterative fixed-point continuation (FPC) algorithm was expanded into a deep neural network DeepFPC, to achieve single-bit DOA estimation. The paper in [35] modeled the 2D-DOA estimation problem for unmanned vehicles as a block sparse recovery problem and then used sparse bayesian learning network (SBLNet) to learn nonlinear features from data received by a large-scale (Multi-Input Multi-Output) MIMO multiple-input system or (Reconfigurable Intelligent Surface) RIS to achieve 2D-DOA and polarization parameter estimation at lower computational complexity. Our group extended the smoothed (SL0) algorithm into SL0-Net to achieve super-resolution DOA estimation [36]. By combining model-driven and data-driven approaches effectively, these DU methods reduce computational complexity dramatically and provide high DOA performance compared with corresponding SR algorithms.

Based on the ideas above, this paper proposes the DU gridless DOA estimation method, and the main ideas and contributions are as follows:

➀: The ANM-ADMM and DANM-ADMM algorithms are implemented to solve the 1D ANM and 2D DANM gridless DOA model, respectively, based on sparse optimization theory to reduce the computational burden of the existing SDPT3-based CVX method to a certain extent.
➁: Aiming to reduce the high computational cost and tackle the problems of intractable parameter presettings of the algorithms, we unfold the algorithms to the multi-layer deep neural network named ANM-ADMM-Net and DANM-ADMM-Net, respectively, based on the DU method. They include several types of network layers and parameters. Depending on the ANM and DANM estimation models, the complete and reasonable dataset construction methods are proposed to train the network with suitable training methods to achieve fast and accurate 1D and 2D DOA estimation, respectively.
➂: Traditional DU methods, such as FISTA and LAMP, are challenging to solve the ANM and DANM models. The proposed ANM-ADMM-Net and DANM-ADMM-Net as the novel DU-DOA framework is the only DU method that can optimize the above models and secure better gridless 1D and 2D DOA estimation performances with lower operational complexity.

The rest of this paper is structured as follows: In Section 2, the array received signal model and gridless DOA model, including the ANM and DANM models, are established briefly. Section 3 introduces the 1D ANM-ADMM and 2D DANM-ADMM algorithms first, then, the network structure, datasets construction method, and training methods of ANM-ADMM-Net and DANM-ADMM-Net are provided in detail. Section 4 verifies the performance and advantages of DU-DOA through simulation experiments compared with the SDPT3-based CVX and ADMM methods. Conclusions are drawn in Section 5.

2. Signal Model and ANM-DOA Model

2.1. 1D Signal Model and Its ANM-DOA Model

Assume that

K_{t}

far-field narrow-band signals incident into a uniform linear array (ULA) composed of

M

array elements. Then, the array received signal can be characterized as

\begin{matrix} Y & = X + N \\ = \sum_{k_{t} = 1}^{K_{t}} a (f_{k_{t}}) s_{k_{t}} + N \\ = \sum_{k_{t} = 1}^{K_{t}} c_{k_{t}} a (f_{k_{t}}) b_{k_{t}} + N \end{matrix}

(1)

where

a (f_{k_{t}}) = {[1, \exp (j 2 π f_{k_{t}}), \dots, \exp (j 2 π (M - 1) f_{k_{t}})]}^{T}

is the steering vector,

f_{k_{t}} = d \sin θ_{k_{t}} / λ

= \sin θ_{k_{t}} / 2

,

λ

is the signal wavelength,

d = λ / 2

is the array spacing,

θ_{k_{t}}

is the angle between the

k_{t}

-th signal source and the array, and

{[\cdot]}^{T}

represents the transposition;

s_{k_{t}} = [s_{k_{t} 1}, s_{k_{t} 2}, \dots, s_{k_{t} L}] \in ℂ^{1 \times L}

is the complex amplitude matrix;

c_{k_{t}} = | | s_{k_{t}} | |_{2} > 0

and

b_{k_{t}} = c_{k_{t}}^{- 1} s_{k_{t}}

with

| | b_{k_{t}} | |_{2} = 1

;

N

is the zero-mean Gaussian matrix white noise.

According to ANM theory [10], the received signal

X

can be seen as a linear combination of some

K_{t}

atoms in the set

A_{M}

. The set

A_{M}

of atoms can be defined as

A_{M} ≜ {A (f_{k_{t}}, b_{k_{t}}) = a (f_{k_{t}}) b_{k_{t}}, f_{k_{t}} \in [0, 0.5]}

(2)

Then, the atomic norm

l_{0}

of

X

is defined as

| | X | |_{A_{M}, 0} = \inf_{K_{t}} {K_{t} | X = \sum_{k_{t} = 1}^{K_{t}} c_{k_{t}} A (f_{k_{t}}, b_{k_{t}}), f_{k_{t}} \in A_{M}, c_{k_{t}} \geq 0}

(3)

So, the frequency can be recovered by dealing with the following optimization problem

\underset{X}{argmin} | | X | |_{A_{M}, 0} s . t . | | Y - X | |_{2}^{2} \leq ε

(4)

Usually, the solution of (4) can be obtained by its dual problem [34,37]. By exploiting the semi-positive, Toeplitz, and low-rank properties of the array covariance matrix

R = E {Y Y^{H}} = T (u)

[10], also (4) can be formulated into the following SDP problem.

\min_{W, u, X} \frac{1}{2} [Tr (W) + Tr (T (u))] s . t . [\begin{matrix} T (u) & X \\ X^{H} & W \end{matrix}] \geq 0, | | Y - X | |_{F}^{2} \leq ε

(5)

where

E {\cdot}

and

Tr (\cdot)

denote the expectation operator and the trace operation, and

T (\cdot)

denotes a mapping operation from a vector

u \in ℂ^{M \times 1}

to a Hermitian Toeplitz matrix

T (u) \in ℂ^{M \times M}

, and

W \in ℂ^{L \times L}

denotes an auxiliary variable matrix.

For problem (5), CVX solver SDPT3 can be utilized to secure optimal

T (u)

efficiently [38]. Then, the results of DOA are estimated by Vandermonde composition of

T (u)

[34,37]. Although completely avoiding gridding, the dual ANM and ANM-CVX methods have extensive computational complexity, failing to meet real-time requirements.

2.2. 2D Signal Model and Its DANM-DOA Model

Consider

K_{t}

spatial far-field narrowband signals acting on a

N \times M

uniform rectangular array (URA) with array element spacing of half wavelength, as shown in Figure 1. The pitch angle

ϕ_{k_{t}}

and azimuth angle

θ_{k_{t}}

of the

k_{t}

-th incident signals are related to the angles

α_{k_{t}}

and

β_{k_{t}}

between the

x

and

y

axes as follows:

ϕ_{k_{t}} = \arcsin (\sqrt{\sin {}_{2}{(α_{k_{t}})} + \sin {}_{2}{(β_{k_{t}})}})

(6)

θ_{k_{t}} = \arctan (\frac{\sin (β_{k_{t}})}{\sin (α_{k_{t}})}

(7)

By finding the angles

α_{k_{t}}

and

β_{k_{t}}

, we can obtain

θ_{k_{t}}

and

ϕ_{k_{t}}

according to Equations (6) and (7). Therefore, this paper uses

α_{k_{t}}

and

β_{k_{t}}

for signal modeling analysis. Then, the array guiding vector, the guiding matrix of dimensions

x

and

y

are

{\begin{cases} a_{x} (w_{x, k_{t}}) = {[1, e^{- j w_{x, k_{t}}}, \dots, e^{- j (N - 1) w_{x, k_{t}}}]}^{T} \\ a_{y} (w_{y, k_{t}}) = {[1, e^{- j w_{y, k_{t}}}, \dots, e^{- j (M - 1) w_{y, k_{t}}}]}^{T} \end{cases}

(8)

{\begin{cases} A_{x} = [a_{x} (w_{x, 1}), a_{x} (w_{x, 2}), \dots, a_{x} (w_{x, K_{t}})] \\ A_{y} = [a_{y} (w_{y, 1}), a_{y} (w_{y, 2}), \dots, a_{y} (w_{y, K_{t}})] \end{cases}

(9)

The single snapshot data can be expressed as:

X = \sum_{k_{t} = 1}^{K_{t}} s_{k_{t}} a_{x} (w_{x, k_{t}}) a_{y}^{T} (w_{y, k_{t}}) = A_{x} S A_{y}^{T}

(10)

where

S = diag (s_{1}, s_{2}, \dots, s_{K_{t}})

is a diagonal amplitude matrix, and

s_{k_{t}}

is the amplitude of the

k_{t}

-th target signal corresponding to the true frequencies of interest

w_{x, k_{t}} = π \sin ϕ_{k_{t}} \cos θ_{k_{t}} = π \sin α_{k_{t}}

and

w_{y, k_{t}} = π \sin ϕ_{k_{t}} \sin θ_{k_{t}} = π \sin β_{k_{t}}

. Two dimensional frequencies sets are

w_{x} = (w_{x, 1}, w_{x, 2}, \dots, w_{x, K_{t}})

,

w_{y} = (w_{y, 1}, w_{y, 2}, \dots, w_{y, K_{t}})

, and the corresponding angles are

\hat{α} = (α_{1}, α_{2}, \dots, α_{K_{t}})

,

\hat{β} = (β_{1}, β_{2}, \dots, β_{K_{t}})

.

2D DOA estimation means that

w_{x, k_{t}}

and

w_{y, k_{t}}

are first recovered from the observations data

X

, then arriving at the angles

α_{k_{t}}

and

β_{k_{t}}

for acquiring pitch angle

ϕ_{k_{t}}

and azimuth angle

θ_{k_{t}}

.

According to [17], the set of atoms

A_{V}

in a matrix form of the received signal data

X

can be expressed as

\begin{matrix} A_{M} & = {a_{x} (w_{x, k_{t}}) a_{y}^{T} (w_{y, k_{t}}), w_{x, k_{t}}, w_{y, k_{t}} \in [- π, π]} \\ = {A_{γ}, γ \in [- π, π] \times [- π, π]} \end{matrix}

(11)

where each atom is a rank l matrix. The atomic norm

l_{1}

of

X

is defined as

| | X | |_{A_{M}} = \inf {\sum_{k_{t}}^{K_{t}} | s_{k_{t}, p} | | X = \sum_{k_{t}}^{K_{t}} A (γ) s_{k_{t}}, A (γ) \in A_{M}}

(12)

Define the one-level Toeplitz matrices

T (u_{x})

and

T (u_{y})

constructed by their first rows

u_{x} \in ℂ^{M \times 1}

, and

u_{y} \in ℂ^{N \times 1}

, respectively. The problem of minimizing Equation (12) can be transformed into addressing the following semidefinite programming (SDP) problem [19]

\begin{array}{l} \underset{u_{x}, u_{y}, X}{\arg \min} \frac{λ}{2 \sqrt{N M}} Tr (T (u_{x})) + Tr (T (u_{y})) \\ s . t . Θ = [\begin{matrix} T (u_{x}) & X_{}^{H} \\ X_{} & T (u_{y}) \end{matrix}] \geq 0, | | Y - X | |_{F}^{2} \leq ε \end{array}

(13)

where

Y

denotes the received data with noise.

After obtaining

T (u_{x})

and

T (u_{y})

, the

x

and

y

-dimensional DOA can be obtained by the following decomposition

T (u_{x}) = A_{x} D_{x} A_{x}^{H}, T (u_{y}) = A_{y} D_{y} A_{y}^{H}

(14)

where

D_{x}

and

D_{y}

are diagonal matrices. The final 2D DOA is obtained by the pairing procedure after obtaining the DOA in each dimension [39].

For the DANM optimization model (13), an off-the-shelf CVX solver SDPT3 can be utilized to secure

T (u_{x})

and

T (u_{y})

, where the dimensionality of the semi-definite constraint matrix is

(N + M) \times (N + M)

[21]. The above directly leads to high computational costs when the dimensions

N

and

M

are unacceptably large. Although the ADMM algorithm in [20,40] is an alternative to solving (13), computational load still needs reconsidered.

3. DU-DOA Estimation

In order to lighten the computational load and enhance the estimation performance efficiently, this paper intends to the DU-DOA method to solve (5) for 1D DOA and (13) for the 2D DOA estimations in this section. According to [22], the DU approach can be seen as an iterative step based on the SR algorithm for designing the structure and parameters of the deep neural network. DU methods such as LISTA, LAMP, and LePOM [22,23,24] can all theoretically achieve the 2D DOA estimation. However, these methods are unsuitable for solving the AMM-DOA and DAMM-DOA models, i.e., (5) and (13). This paper analyzes the ANM-ADMM and the DANM-ADMM algorithms to address the above problem in Section 3.1 and Section 3.3, respectively. The algorithms are expanded into the deep neural network ANM-ADMM-Net and DANM-ADMM-Net to achieve fast and accurate 1D and 2D DOA estimation in Section 3.2 and Section 3.4, respectively.

3.1. 1D ANM-ADMM DOA

To facilitate the solution, (5) is rewritten as (15).

\begin{array}{l} [X, u] = \underset{X, W, u, Θ}{\arg \min} \frac{τ}{2} [Tr (W) + Tr (T (u))] + \frac{1}{2} | | Y - X | |_{F}^{2} \\ s . t . Θ = [\begin{matrix} T (u) & X \\ X^{H} & W \end{matrix}] \geq 0 \end{array}

(15)

where

τ \in ℝ

is the regularization factor.

The augmented Lagrangian function of the problem (15) can be defined via [41].

\begin{array}{l} \underset{X, W, u, Θ \geq 0, Λ}{\arg \min} \frac{τ}{2} [Tr (W) + Tr (T (u))] + \frac{1}{2} | | Y - X | |_{F}^{2} \\ + 〈 Λ, Θ - [\begin{matrix} T (u) & X \\ X^{H} & W \end{matrix}] 〉 + \frac{ρ}{2} {‖ Θ - [\begin{matrix} T (u) & X \\ X^{H} & W \end{matrix}] ‖}_{F}^{2} \\ = \underset{X, W, u, Θ \geq 0, Λ}{\arg \min} \frac{τ}{2} [Tr (W) + Tr (T (u))] + \frac{1}{2} | | Y - X | |_{2}^{2} + Δ \end{array}

(16)

where

Λ \in ℂ^{(M + L) \times (M + L)}

is the Lagrangian multiplier, and

ρ > 0

is the penalty factor.

Note (16) is an unconstrained optimization problem, so that given the received signal

Y

, unknown signal components

X

, and vector

u

can be estimated by alternatively minimizing the cost function of

X

,

W

,

u

,

Θ

and

Λ

[14]. The specific iterative process can be found in Appendix A.

In conclusion, the ADMM algorithm for solving (5) is provided in Algorithm 1.

Algorithm 1: ANM-ADMM-DOA algorithm.

Input:

Y

, iterations

K

, penalty factor

ρ

, regularization factor

τ

.
Initialization:

Θ^{(0)} = 0_{M + L}

and

Λ^{(0)} = 0_{M + L}

.
For

k = 0 : K - 1

do
(1)

X^{(k + 1)} = 1 / (1 + 2 ρ) (Y + 2 Λ_{X}^{(k)} + 2 ρ Θ_{X}^{(k)})

;
(2)

W^{(k + 1)} = ρ^{- 1} Λ_{W}^{(k)} + Θ_{W}^{(k)} - τ / (2 ρ) I_{L}

;
(3)

u^{(k + 1)} = Γ (T^{*} (ρ^{- 1} Λ_{T (u)}^{(k)} + Θ_{T (u)}^{(k)}) - τ / (2 ρ) M e_{1})

;
(4)

{\begin{cases} {\hat{Θ}}^{(k + 1)} = [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}] - ρ^{- 1} Λ^{(k)} \\ {\hat{Θ}}^{(k + 1)} = G diag ({δ_{g}}) G^{- 1}, Θ^{(k + 1)} = G diag ({δ_{g}}_{+}) G^{- 1} \end{cases}

;
(5)

Λ^{(k + 1)} = Λ^{(k)} + ρ (Θ^{(k + 1)} - [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}])

;
End
Output: Recovered signal of interest

X^{(K)}

, the optimal estimate

u^{(K)}

.
Then: Estimate

f_{k_{t}}

and

s_{k_{t}}

by Vandermonde decomposition of

T (u^{(K)} \sqrt{M} / 2)

and compute DOA by

θ_{k_{t}} = asind (2 f_{k_{t}})

.

According to the above derivation, the parameters of the model-driven ADMM algorithm, including the penalty factor

ρ

and the regularization factor

τ

, need to be set in advance, which is a challenge for practical applications. Meanwhile, inappropriate parameter settings will decrease the convergence speed and accuracy of the ADMM algorithm, thus adding to the computing complexity of (5) and degrading the DOA estimation performance. Even if the proper parameters can be chosen by theoretical analysis and cross-validation method [41], the fixed parameter settings fail to guarantee the optimal convergence of the ANM-ADMM algorithm. Based on the idea of the DU method, this paper expands the algorithm into a deep neural network ANM-ADMM-Net and learns its optimal parameters from the constructed data obeying a particular distribution, thus solving the above problems.

It is necessary to note that the optimal estimates

T (u^{(K)})

of the algorithm are the critical points of the following estimation. Therefore, we can consider the output as a label and construct a proper loss function in the next subsection.

3.2. 1D ANM-ADMM-Net DOA

For the optimization problem shown in (5), when the system parameters are given and the complex amplitude and noise all obey a particular distribution, the received data

Y

will also have a particular distribution. At this point, assume that an optimal set of parameter sequences exists so that (5) can be solved quickly and accurately by the ADMM algorithm for all received signals, with the DOA obeying a specific distribution. Therefore, this subsection constructs the network ANM-ADMM-Net to tackle the problems of the ADMM algorithm based on its iterative steps. This network couples the interpretability of the model-driven algorithm and the nonlinear fitting capability of the data-driven deep learning method. Training networks based on a sufficient and complete training data set can obtain optimal iterative parameters, thereby reducing the number of iterations and further enabling higher speed and ameliorated performance of DOA estimation. In the following, the four parts of ANM-ADMM-Net: network structures, dataset construction, network initialization, and training are described thoroughly.

3.2.1. Network Structure

According to the steps in Algorithm 1, the ANM-ADMM algorithm can be mapped to a

K

layer network ANM-ADMM-Net shown in Figure 2, whose inputs are

Y

,

Θ^{(0)} = 0_{M + L}

, and

Λ^{(0)} = 0_{M + L}

, and the learnable parameters are

Ω = {Ω^{(k + 1)}}_{k = 0}^{K - 1} = {ρ_{k + 1}, τ_{k + 1}, η_{k + 1}}_{k = 0}^{K - 1}

, to arrive at the signal components

X^{(K)}

and the Toeplitz matrix

T (u^{(K)})

. The

(k + 1)

-th layer operation of ANM-ADMM-Net can be expressed as

[X^{(k + 1)}, u^{(k + 1)}] = F_{k + 1} {Y, X^{(k)}, W^{(k)}, u^{(k)}, θ^{(k)}, Λ^{(k)}, Ω^{(k + 1)}}

(17)

where

F_{k + 1} {\cdot}

contains five main structural sub-layers, including reconstruction sub-layer

A^{(k + 1)}

, the auxiliary variable update sub-layer

B^{(k + 1)}

, the Toeplitz transform sub-layer

C^{(k + 1)}

, the nonlinear sub-layer

D^{(k + 1)}

, and the multiplier update sub-layer

E^{(k + 1)}

corresponding to (17). The specific description is as follows.

(1): Reconstruction sub-layer $A^{(k + 1)}$ : Taking the output $Λ_{x}^{(k)}$ of the sub-layer $E^{(k)}$ and $Θ_{x}^{(k)}$ of the sub-layer $D^{(k)}$ of $k$ -th layer and received signal $Y$ as the input, the output $X^{(k + 1)}$ is updated as

X^{(k + 1)} = 1 / (1 + 2 ρ_{k + 1}) (Y + 2 Λ_{X}^{(k)} + 2 ρ_{k + 1} Θ_{x}^{(k)})

(18)

where

ρ_{k + 1}

is the learnable parameter. The output

X^{(k + 1)}

of

A^{(k + 1)}

can be used as the input of the sub-layer

D^{(k + 1)}

and

E^{(k + 1)}

in the

(k + 1)

-th layer.

(2): Auxiliary variable update sub-layer $B^{(k + 1)}$ : Taking the output $Θ_{W}^{(k)}$ of the sub-layer $D^{(k + 1)}$ and $Λ_{W}^{(k)}$ of the sub-layer $E^{(k)}$ in the $k$ -th layer are used as inputs, the output of $B^{(k + 1)}$ is given by

W^{(k + 1)} = ρ_{k + 1}^{- 1} Λ_{W}^{(k)} + Θ_{W}^{(k)} - τ_{k + 1} / (2 ρ_{k + 1}) I_{L}

(19)

where

τ_{k + 1}

is the learnable parameter. The output

W^{(k + 1)}

is the input of the sub-layer

D^{(k + 1)}

and

E^{(k + 1)}

in the

(k + 1)

-th layer.

(3): Toeplitz transform sub-layer $C^{(k + 1)}$ : The output $Θ_{T (u)}^{(k)}$ of the sub-layer $D^{(k + 1)}$ and $Λ_{T (u)}^{(k)}$ of the sub-layer $E^{(k)}$ in the $k$ -th layer are used as inputs, and the output is represented by

u^{(k + 1)} = Γ (T^{*} (ρ_{k + 1}^{- 1} Λ_{T (u)}^{(k)} + Θ_{T (u)}^{(k)}) - τ_{k + 1} / (2 ρ_{k + 1}) M e_{1})

(20)

where

u^{(k + 1)}

can be the input of the sub-layer

D^{(k + 1)}

and

E^{(k + 1)}

in the

(k + 1)

-th layer.

(4): Nonlinear sub-layer $D^{(k + 1)}$ : Taking the output $Λ_{T (u)}^{(k)}$ of the sub-layer $E^{(k)}$ of the $k$ -th layer, the output $X^{(k + 1)}$ of the sub-layer $A^{(k)}$ , and the output $u^{(k + 1)}$ of the sub-layer $B^{(k + 1)}$ as the input, the output of $D^{(k + 1)}$ can be given as

{\begin{cases} {\hat{Θ}}^{(k + 1)} = [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}] - ρ_{k + 1}^{- 1} Λ^{(k)} \\ {\hat{Θ}}^{(k + 1)} = G diag ({δ_{g}}) G^{- 1}, Θ^{(k + 1)} = G diag ({δ_{g}}_{+}) G^{- 1} \end{cases}

(21)

where the output

Θ^{(k + 1)}

of

D^{(k + 1)}

can be used as the input of the sub-layer

A^{(k + 2)}

,

B^{(k + 2)}

, and

C^{(k + 2)}

in the

(K + 2)

-th layer.

(5): Multiplier update sub-layer $E^{(k + 1)}$ : Taking the output $Λ^{(k)}$ of the sub-layer $D^{(k)}$ of the $k$ -th layer, the output $X^{(k + 1)}$ , $W^{(k + 1)}$ , $u^{(k + 1)}$ and $Θ^{(k + 1)}$ of the sub-layer $A^{(k + 1)}$ , $B^{(k + 1)}$ , $C^{(k + 1)}$ and $D^{(k + 1)}$ respectively as input, the output of $E^{(k + 1)}$ can be updated by

Λ^{(k + 1)} = Λ^{(k)} + η_{k + 1} (Θ^{(k + 1)} - [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}])

(22)

where the multiplier update rate

η_{k + 1}

is the learnable parameter, and the output

Λ^{(k + 1)}

of

E^{(k + 1)}

can be used as the input in the

(k + 2)

-th layer. It is essential to emphasize that the new parameters

η_{k + 1}

are added to enhance the learning capability further and the performance of ANM-ADMM-Net compared to updating multipliers by

ρ

of ANM-ADMM (as shown in Algorithm 1).

Considering that each sub-layer’s parameters are learned and tuned, there will be

3 K

parameters for

K

layer ANM-ADMM-Net generally, i.e.,

{ρ_{1}, \dots, ρ_{k}}

,

{τ_{1}, \dots, τ_{k}}

,

{η_{1}, \dots, η_{k}}

. Compared with the ANM-ADMM algorithm, where the parameters

ρ_{1} = \dots = ρ_{k}

and

τ_{1} = \dots = τ_{k}

are fixed, this parameter learning strategy of ANM-ADMM-Net has the advantages of superior flexibility and brilliant nonlinear fitting capability [36]. More importantly, the design of the network is guided by the model and is highly interpretable.

3.2.2. Data Construction

The proposed ANM-ADMM-Net is a sparse recovery approach jointly driven by model and data. The key to its effectiveness is constructing a reasonable dataset with generalization capability. By building an adequate and complete dataset, ANM-ADMM-Net is less prone to overfitting during the training process, performing better DOA estimation. Thus, this paper randomly generates the signal data

Y

obeying a particular distribution and forms the received data

X

. Specifically:

(1): Given the array element number $M$ , the frequency range $(f_{\min}, f_{\max}]$ , corresponding angle range $(θ_{\min}, θ_{\max}] =$ $(asind (2 f_{\min}), asind (2 f_{\max})]$ , number of snapshots $L$ .
(2): Given the maximum value of the number of sources $K_{t}$ and randomly generate the sources number $k_{t}$ .
(3): For each $k_{t} (1, 2, \dots, K_{t})$ , its frequency $f_{1 : K_{t}}$ has a uniform distribution U ( $f_{\min}, f_{\max}$ ), and the frequency interval between any two sources in the case of multiple sources needs to satisfy $\min_{i \neq j} | f_{i} - f_{j} | > 1 / M$ [38], then the angle of the source can be obtained as $θ_{1 : K_{t}}$ . The received signal $Y$ is generated according to (1), where $X = A S$ , $A = [a (f_{1}), a (f_{2}), \dots, a (f_{K_{t}})]$ , the complex amplitude matrix $S = [s_{1}, s_{2}, \dots, s_{K_{t}}] \in ℂ^{K_{t} \times L}$ obeys the standard normal distribution and $N$ is Gaussian noise with SNR.
(4): Repeat the above to obtain the set of $D$ received signals ${Y_{d}, X_{d}}_{d = 1}^{D}$ , the set of the frequencies and the angles ${{f_{1 : K_{t}}, θ_{1 : K_{t}}}_{d}}_{d = 1}^{D}$ . The ideal label set vector ${u_{d}}_{d = 1}^{D}$ can be obtained according to Vandermonde decomposition after the dual atomic norm minimization method [34,37].
(5): Divide randomly the above set into a training dataset ${Y_{q}^{train}, u_{q}^{train}}_{q = 1}^{Q = 0.8 D}$ and a testing dataset ${Y_{o}^{test}, u_{o}^{test}}_{o = 1}^{O = 0.2 D}$ .

3.2.3. Network Initialization and Training

Consider the difficulty of mapping

T (\cdot)

in (5),

T^{*} (\cdot)

and eigen-decomposition in Algorithm 1 when training ANM-ADMM-Net. Proper initialization of the parameters

Ω = {ρ_{k + 1}, λ_{k + 1}, γ_{k + 1}}_{k = 0}^{K - 1}

and an appropriate training method, including loss function, optimizer, and learning schedules, will make it easier to reach convergence and avoid falling into a locally optimal solution to a certain extent.

(1): Network initialization

The initial values of the parameters in each layer are set as

ρ_{1 : K} = ρ_{0}

,

τ_{1 : K} = τ_{0}

, and

η_{1 : K} = η_{0}

to enhance the proposed method’s flexibility based on the theoretical analysis [41]. Compared with the ANM-ADMM algorithm with fixed parameter settings, ANM-ADMM-Net will substantially increase the convergence rate (i.e., reduce the number of iterations) and shorten the time to solve (5), with guaranteed convergence performance.

(2): Network training

The Adam algorithm is adopted for learning and tuning the parameters with an initial learning rate of 1 × 10⁻³ to achieve the possible global optimum rapidly. Based on the training dataset

{Y_{q}^{train}, u_{q}^{train}}_{q = 1}^{Q = 0.8 D}

constructed in Section 3.2.2 and given the network layer

K

, the optimal parameters

Ω^{*} = {ρ_{k + 1}^{*}, τ_{k + 1}^{*}, η_{k + 1}^{*}}_{k = 0}^{K - 1}

can be obtained by minimizing the following normalized mean square error (NMSE) loss function using the principle of Back Propagation (BP) [42], i.e.,

Ω^{*} = \arg \min (\sum_{q = 1}^{Q} L_{u}^{q}) / Q

(23)

where

L_{u}^{q} = | | u^{(K)} (Ω, Θ^{(0)}, Λ^{(0)}, Y_{q}^{train}) \sqrt{M} / 2 - u_{q}^{train} | |_{F}^{2} / | | u_{q}^{train} | |_{F}^{2}

,

u^{(K)} (Ω, Θ^{(0)}, Λ^{(0)}, Y_{q}^{train}) \sqrt{M} / 2

denotes the estimated components of the

K

-th Toeplitz transform sub-layer output of the network with parameters

Ω

,

Θ^{(0)} = 0_{M + L}

,

Λ^{(0)} = 0_{M + L}

and

Y_{q}^{train}

as inputs, respectively.

Then, according to the given testing dataset

{Y_{o}^{test}, u_{o}^{test}}_{o = 1}^{O = 0.2 D}

, after obtaining the optimal parameters

Ω^{*}

, the recovered signal

X_{o}^{test}

and the Toeplitz matrix

T (u_{o}^{test})

can be estimated online by

{\begin{cases} X_{o}^{test} = X^{(K)} {Ω^{*}, Θ^{(0)}, Λ^{(0)}, Y_{o}^{test}} \\ u_{o}^{test} = u^{(K)} {Ω^{*}, Θ^{(0)}, Λ^{(0)}, Y_{o}^{test}} \end{cases}

(24)

At the end of the ANM-ADMM-Net DOA method,

T (u_{o}^{test} \sqrt{M} / 2)

can be the input of the Vandermonde decomposition to obtain the frequency

f_{k_{t}}

and amplitude values

s_{k_{t}}

, thereby achieving the estimated DOA

θ_{k_{t}} = asind (2 f_{k_{t}})

.

3.3. 2D DANM-ADMM DOA

To facilitate the solution, (13) is rewritten as (25):

\begin{matrix} [T (u_{x}), T (u_{y})] = & \arg \min_{u_{x}, u_{y}, X} \frac{λ}{2 \sqrt{N M}} Tr (T (u_{x})) + Tr (T (u_{y})) \\ + \frac{1}{2} | | Y - X | |_{2}^{2} s . t . Θ = [\begin{matrix} T (u_{x}) & X_{}^{H} \\ X & T (u_{y}) \end{matrix}] \geq 0 \end{matrix}

(25)

where

λ \in ℝ

is the regularization factor.

Then, the augmented Lagrangian function of the problem (25) can be defined via [41]

\begin{array}{l} \underset{X, u_{x}, u_{y}, Θ, Λ}{\arg \min} \prod (X, u_{x}, u_{y}, Θ, Λ) \\ = \underset{X, u_{x}, u_{y}, Θ, Λ}{\arg \min} \frac{λ}{2 \sqrt{N M}} Tr (T (u_{x})) + Tr (T (u_{y})) + \frac{1}{2} | | Y - X | |_{F}^{2} \\ + 〈 Λ, Θ - [\begin{matrix} T (u_{X}) & X_{}^{H} \\ X & T (u_{y}) \end{matrix}] 〉 + \frac{ρ}{2} {‖ Θ - [\begin{matrix} T (u_{x}) & X_{}^{H} \\ X & T (u_{y}) \end{matrix}] ‖}_{F}^{2} \\ = \underset{X, u_{x}, u_{y}, Θ, Λ}{\arg \min} \frac{λ}{2 \sqrt{N M}} Tr (T (u_{x})) + Tr (T (u_{y})) + \frac{1}{2} | | Y - X | |_{F}^{2} + Δ \end{array}

(26)

where

Λ \in ℂ^{(N + M) \times (N + M)}

is the Lagrangian multiplier, and

ρ > 0

is the penalty factor. Note (26) is an unconstrained optimization problem, so that, given the received signal

Y

, unknown signal components

X

and Toeplitz matrix

T (u_{x})

and

T (u_{y})

can be estimated by minimizing the objective function of

X, u_{x}, u_{y}, Θ, Λ

alternatively.

By the same token as in Algorithm 1, the ADMM algorithm for solving (13) is provided in Algorithm 2.

Algorithm 2: DANM-ADMM DOA algorithm.

Input:

Y

, number of iterations

K

, penalty factor

ρ

, regularization factor

λ

.
Initialization:

Θ_{}^{(0)} = 0_{M + N}

and

λ_{}^{(0)} = 0_{M + N}

.
For

k = 0 : K - 1

do
(1)

X_{}^{(k + 1)} = 1 / (1 + ρ) (Y + Λ_{X}^{(k)} + ρ Θ_{X}^{(k)})

;
(2)

{\begin{cases} {(T (u_{x}))}^{+} = ρ^{- 1} Λ_{T (u_{x})}^{(k)} + Θ_{T (u_{x})}^{(k)} - λ / (2 \sqrt{N M}) N I_{N} \\ u_{x}^{(k + 1)} = Γ (T^{*} (Λ_{T (u_{x})}^{(k)} + ρ Θ_{T (u_{x})}^{(k)}) - λ / (2 \sqrt{N M}) N e_{1}) / ρ \end{cases}

;
(3)

{\begin{cases} {(T (u_{y}))}^{+} = ρ^{- 1} Λ_{T (u_{y})}^{(k)} + Θ_{T (u_{y})}^{(k)} - λ / (2 \sqrt{N M}) M I_{M} \\ u_{y}^{(k + 1)} = Γ (T^{*} (Λ_{T (u_{y})}^{(k)} + ρ Θ_{T (u_{y})}^{(k)}) - λ / (2 \sqrt{N M}) M e_{1}) / ρ \end{cases}

;
(4)

{\begin{cases} {\hat{Θ}}^{(k + 1)} = [\begin{matrix} T (u_{x}^{(k + 1)}) & {(X^{(k + 1)})}^{H} \\ X^{(k + 1)} & T (u_{y}^{(k + 1)}) \end{matrix}] - ρ^{- 1} Λ^{(k)} \\ {\hat{Θ}}^{(k + 1)} = P diag ({δ_{g}}) P^{- 1}, Θ^{(k + 1)} = P diag ({δ_{g}}_{+}) P^{- 1} \end{cases}

;
(5)

Λ^{(k + 1)} = Λ^{(k)} + ρ (Θ^{(k + 1)} - [\begin{matrix} T (u_{x}^{(k + 1)}) & {(X_{}^{(k + 1)})}^{H} \\ X_{}^{(k + 1)} & T (u_{y}^{(k + 1)}) \end{matrix}])

;
End
Output: Recovered signal of interest

X_{}^{(K)}

, the optimal estimate

u_{x}^{(K)}

and

u_{y}^{(K)}

.
Then: Compute matrix

T (u_{x}^{(K)})

and

T (u_{y}^{(K)})

; Retrieve two dimensional frequencies

({\hat{w}}_{x}, {\hat{w}}_{y})

without pairing by root music algorithm or matrix pencil method [38]. Then, final 2D DOA estimation can be obtained by following pairing technique.

j_{i} = \underset{j}{\arg \max} | a_{x}^{T} ({\hat{w}}_{x, i}) X_{}^{(K)} a_{y}^{*} ({\hat{w}}_{y, j}) |

, where

j_{i}

denotes the frequency index of

{\hat{w}}_{y}

that matched to

{\hat{w}}_{x, i}

, i.e.,

{\hat{w}}_{y}^{*}

.
Finally: Obtain the pared

({\hat{w}}_{x}, {\hat{w}}_{y}^{*})

and compute the corresponding angles

({\hat{α}}_{x}, {\hat{β}}_{y}^{*})

, final pitch angle

ϕ_{k_{t}}

and azimuth angle

θ_{k_{t}}

can be calculated by (6) and (7).

where

Θ

are

Λ

Hermitian matrices, and

Θ_{X}

,

Λ_{X} \in ℂ^{N \times M}

,

Θ_{T (u_{x})} \in ℂ^{N \times N}

,

Λ_{T (u_{y})} \in ℂ^{M \times M}

,

Θ = [\begin{matrix} Θ_{T (u_{x})} & Θ_{X}^{H} \\ Θ_{X} & Θ_{T (u_{y})} \end{matrix}]

,

Λ = [\begin{matrix} Λ_{T (u_{x})} & Λ_{X}^{H} \\ Λ_{X} & Λ_{T (u_{y})} \end{matrix}] \in ℂ^{(N + M) \times (N + M)}

;

g = 1, 2 \dots, M + N

,

{δ_{g}}_{+}

denotes that all elements smaller than zero are set to zero; and

P

is an orthogonal matrix satisfying

P P^{T} = I_{M + N}

.

According to the above derivation, the parameters of the model-driven ADMM algorithm, including the penalty factor

ρ

and the regularization factor

λ

, need to be set in advance, which is a challenge for practical applications. Based on the idea of the DU method, this paper also expands the algorithm into a deep neural network DANM-ADMM-Net as Section 3.2, thus solving the above problems. The optimal estimates

T (u_{x}^{(K)})

and

T (u_{y}^{(K)})

can be considered as labels and construct a proper loss function in the next subsection.

3.4. 2D DANM-ADMM-Net DOA

In the following, the four parts of DANM-ADMM-Net: network structures, dataset construction, network initialization, and training, are described thoroughly.

3.4.1. Network Structure

According to the steps in Algorithm 2, the ADMM algorithm can be mapped to a

K

layer network DANM-ADMM-Net, shown in Figure 3, whose inputs are

Y

,

Θ^{(0)} = 0_{M + N}

and

Λ^{(0)} = 0_{M + N}

, and the learnable parameters are

Ω = {Ω^{(k + 1)}}_{k = 0}^{K - 1} = {ρ_{k + 1}, λ_{k + 1}, γ_{k + 1}}_{k = 0}^{K - 1}

, to arrive at the signal components

X_{}^{(K)}

and two Toeplitz matrixes

T (u_{x}^{(K)})

,

T (u_{y}^{(K)})

. The

(k + 1)

-th layer operation of DANM-ADMM-Net can be expressed as

[X_{}^{(k + 1)}, u_{x}^{(k + 1)}, u_{y}^{(k + 1)}] = F_{k + 1} {Y, Θ^{(k)}, Λ^{(k)}, Ω^{(k + 1)}}

(27)

where

F_{k + 1} {\cdot}

contains five main structure sub-layers, including reconstruction sub-layer

A^{(k + 1)}

, the Toeplitz transform sub-layer

B^{(k + 1)}

, the Toeplitz transform sub-layer

C^{(k + 1)}

, the nonlinear sub-layer

D^{(k + 1)}

, and the multiplier update sub-layer

E^{(k + 1)}

. The specific description is as follows.

(1): Reconstruction sub-layer $A^{(k + 1)}$ : Taking the output $Λ_{X}^{(k)}$ of the sub-layer $E^{(k)}$ and $Θ_{x}^{(k)}$ of the sub-layer $D^{(k)}$ of $k$ -th layer and received signal $Y$ as the inputs, the output $X_{}^{(k + 1)}$ is updated as

X_{}^{(k + 1)} = 1 / (1 + ρ_{k + 1}) (Y + Λ_{X}^{(k)} + ρ_{k + 1} Θ_{x}^{(k)})

(28)

where

ρ_{k + 1}

is the learnable parameter. The output

X_{}^{(k + 1)}

of

A^{(k + 1)}

can be used as the input of the sub-layer

D^{(k + 1)}

and

E^{(k + 1)}

in the

(k + 1)

-th layer.

(2): Toeplitz transform sub-layer $B^{(k + 1)}$ : Taking the output $Θ_{T (u_{x})}^{(k)}$ of the sub-layer $D^{(k + 1)}$ and $Λ_{T (u_{x})}^{(k)}$ of the sub-layer $E^{(k)}$ in the $k$ -th layer as inputs, the output of $B^{(k + 1)}$ is represented by

u_{x}^{(k + 1)} = Γ (T^{*} (Λ_{T (u_{x})}^{(k)} + ρ_{k + 1} Θ_{T (u_{x})}^{(k)}) - λ_{k + 1} / (2 \sqrt{N M}) N e_{1}) / ρ_{k + 1}

(29)

where

λ_{k + 1}

is the learnable parameter. The output

u_{x}^{(k + 1)}

is the input of the sub-layer

D^{(k + 1)}

and

E^{(k + 1)}

in the

(k + 1)

-th layer.

(3): Toeplitz transform sub-layer $C^{(k + 1)}$ : Taking the output $Θ_{T (u_{y})}^{(k)}$ of the sub-layer $D^{(k + 1)}$ and $Λ_{T (u_{y})}^{(k)}$ of the sub-layer $E^{(k)}$ in the $k$ -th layer as inputs, the output of $C^{(k + 1)}$ can be expressed as

u_{y}^{(k + 1)} = Γ (T^{*} (Λ_{T (u_{y})}^{(k)} + ρ_{k + 1} Θ_{T (u_{y})}^{(k)}) - λ_{k + 1} / (2 \sqrt{N M}) M e_{1}) / ρ_{k + 1}

(30)

where the output

u_{y}^{(k + 1)}

can be used as the input of the sub-layer

D^{(k + 1)}

and

E^{(k + 1)}

in the

(k + 1)

-th layer.

(4): Nonlinear sub-layer $D^{(k + 1)}$ : Taking the output $Λ_{T (u_{x})}^{(k)}$ of the sub-layer $E^{(k)}$ of the $k$ -th layer, the output $X^{(k + 1)}$ of the sub-layer $A^{(k)}$ , the output $u_{x}^{(k + 1)}$ of the sub-layer $B^{(k + 1)}$ and the output $u_{y}^{(k + 1)}$ of the sub-layer $C^{(k + 1)}$ as inputs, the output of $D^{(k + 1)}$ can be given as

{\begin{cases} {\hat{Θ}}^{(k + 1)} = [\begin{matrix} T (u_{x}^{(k + 1)}) & {(X_{}^{(k + 1)})}^{H} \\ X_{}^{(k + 1)} & T (u_{y}^{(k + 1)}) \end{matrix}] - ρ_{k + 1}^{- 1} Λ^{(k)} \\ {\hat{Θ}}^{(k + 1)} = P diag ({δ_{g}}) P^{- 1}, Θ^{(k + 1)} = P diag ({δ_{g}}_{+}) P^{- 1} \end{cases}

(31)

where the output

Θ^{(k + 1)}

of

D^{(k + 1)}

can be used as the input of the sub-layer

A^{(k + 2)}

,

B^{(k + 2)}

, and

C^{(k + 2)}

in the

(K + 2)

-th layer.

(5): Multiplier update sub-layer $E^{(k + 1)}$ : Taking the output $Λ^{(k)}$ of the sub-layer $E^{(k)}$ of the $k$ -th layer, the output $X_{}^{(k + 1)}$ , $u_{x}^{(k + 1)}$ , $u_{y}^{(k + 1)}$ , $Θ^{(k + 1)}$ of the sub-layer $A^{(k + 1)}$ , $B^{(k + 1)}$ , $C^{(k + 1)}$ , $D^{(k + 1)}$ respectively, as inputs, the output of $E^{(k + 1)}$ can be updated by

Λ^{(k + 1)} = Λ^{(k)} + γ_{k + 1} (Θ^{(k + 1)} - [\begin{matrix} T (u_{x}^{(k + 1)}) & {(X_{}^{(k + 1)})}^{H} \\ X_{}^{(k + 1)} & T (u_{y}^{(k + 1)}) \end{matrix}])

(32)

where multiplier update rate

γ_{k + 1}

is the learnable parameter, the output

Λ^{(k + 1)}

of

E^{(k + 1)}

can be used as the input in the

(k + 2)

-th layer. The new parameters

γ_{k + 1}

are also added to enhance the learning capability further and the performance of DANM-ADMM-Net compared to updating multipliers by

ρ

of DANM-ADMM (as shown in Algorithm 2).

Considering that each sub-layer’s parameters are learned and tuned, there will be

3 K

parameters for

K

layer DANM-ADMM-Net generally, i.e.,

{ρ_{1}, \dots, ρ_{k}}

,

{λ_{1}, \dots, λ_{k}}

,

{γ_{1}, \dots, γ_{k}}

. The parameters

ρ_{1} = \dots = ρ_{k}

and

λ_{1} = \dots = λ_{k}

of the DANM-ADMM algorithm are fixed.

3.4.2. Data Construction

This subsection randomly generates the received data

Y^{k_{t}}

and constructs the dataset. Specifically:

(1): Given the array element number $M$ , the pulse number $N$ , pitch angle $ϕ_{k} \in [ϕ_{\min}, ϕ_{\max}]$ , azimuth angle $θ_{k} \in [θ_{\min}, θ_{\max}]$ , and number of samples $D$ .
(2): Given the maximum value of the number of sources $K_{t}$ and randomly generate the sources number $k_{t}$ .
(3): For each $k_{t} (1, 2, \dots, K_{t})$ , compute $w_{x, k_{t}} = π \sin ϕ_{k_{t}} \cos θ_{k_{t}}$ and $w_{y, k_{t}} = π \sin ϕ_{k_{t}} \sin θ_{k_{t}}$ satisfying $\min_{i \neq j} | w_{x, i} - w_{x, j} | \geq 1 / ⌊ 4 (N - 1) ⌋$ and $\min_{i \neq j} | w_{y, i} - w_{y, j} | \geq 1 / ⌊ 4 (M - 1) ⌋$ [19]. Then received signal ${{Y_{p}^{k_{t}}, X_{p}^{k_{t}}}_{p = 1}^{P}}_{k_{t} = 1}^{K_{t}} = {Y^{d}, X^{d}}_{d = 1}^{D = K_{t} * P}$ is produced by $Y_{p}^{k_{t}} = X_{p}^{k_{t}} + N_{p}$ , where $X_{p}^{k_{t}} = \sum_{k_{t}}^{} s_{k_{t}} a_{x} (w_{x, k_{t}}) a_{y}^{T} (w_{y, k_{t}}) = A_{x} S A_{y}^{T}$ , and amplitude $s_{k_{t}}$ obeys complex standard normal distribution. $N_{p}$ is the complex Gaussian white noise with SNR.
(4): Randomly divide the received signal data ${Y^{d}, X^{d}}_{d = 1}^{D}$ into $Q$ training data ${Y_{q}^{train}, X_{q}^{train}}_{q = 1}^{Q = 0.8 D}$ and $O (D - Q)$ testing data ${Y_{o}^{test}, X_{o}^{test}}_{o = 1}^{O = 0.2 D}$ .
(5): Use the DANM-CVX method to address (13), thereby obtaining the training label set ${X_{q}^{train}, T (u_{x, q}^{train}), T (u_{y, q}^{train})}_{q = 1}^{Q}$ , the testing label set ${X_{o}^{test}, T (u_{x, o}^{test}), T (u_{x, o}^{test})}_{o = 1}^{O}$ according to the pairing technique in Algorithm 2.

3.4.3. Network Initialization and Training

The initial values of the parameters in each layer are set as

ρ_{1 : K} = ρ_{0}

,

λ_{1 : K} = λ_{0}

, and

γ_{1 : K} = γ_{0}

. The Adam algorithm is adopted for learning and tuning the parameters with an initial learning rate of 2 × 10⁻³ to achieve the possible global optimum rapidly. Based on the training dataset

{X_{q}^{train}, T (u_{x, q}^{train}), T (u_{y, q}^{train})}_{q = 1}^{Q}

constructed in Section 3.4.2 and given the number of network layers

K

, the optimal parameters

Ω^{*} = {ρ_{k + 1}^{*}, λ_{k + 1}^{*}, γ_{k + 1}^{*}}_{k = 0}^{K - 1}

can be obtained by minimizing the following NMSE loss function, i.e.,

Ω^{*} = \arg \min \frac{1}{Q} \sum_{q = 1}^{Q} 0.5 L_{T (u_{x})}^{q} + 0.5 L_{T (u_{y})}^{q}

(33)

where

{\begin{cases} L_{T (u_{x})}^{q} = | | T {(u_{x})}^{(K)} (Ω, Θ^{(0)}, Λ^{(0)}, X_{q}^{train}) - T {(u_{x})}_{q}^{train} | |_{2}^{2} / | | T {(u_{x})}_{q}^{train} | |_{F}^{2} \\ L_{T (u_{y})}^{q} = | | T {(u_{y})}^{(K)} (Ω, Θ^{(0)}, Λ^{(0)}, X_{q}^{train}) - T {(u_{y})}_{q}^{train} | |_{F}^{2} / | | T {(u_{y})}_{q}^{train} | |_{F}^{2} \end{cases}

,

T {(u_{x})}^{(K)} (Ω, Θ^{(0)}, Λ^{(0)}, X_{q}^{train})

, and

T {(u_{y})}^{(K)} (Ω, Θ^{(0)}, Λ^{(0)}, X_{q}^{train})

denotes the estimated components of the

K

-th Toeplitz transform sub-layer output of the network with parameters

Ω

,

Θ^{(0)} = 0_{M + N}

,

Λ^{(0)} = 0_{M + N}

and

X_{q}^{train}

as inputs, respectively.

Then, according to the given testing dataset

{X_{o}^{test}, T (u_{x, o}^{test}), T (u_{y, o}^{test})}_{o = 1}^{O}

, after obtaining the optimal parameters

Ω^{*}

, the recovered signal

X_{o}^{test}

and the Toeplitz matrix

T (u_{x, o}^{test})

,

T (u_{y, o}^{test})

can be estimated online by

{\begin{cases} X_{o}^{test} = X^{(K)} {Ω^{*}, X_{o}^{test}, Θ^{(0)}, Λ^{(0)}} \\ T {(u_{x, o}^{test})}^{(K)} = T {(u_{x})}^{(K)} {Ω^{*}, X_{o}^{test}, Θ^{(0)}, Λ^{(0)}} \\ T {(u_{y, o}^{test})}^{(K)} = T {(u_{y})}^{(K)} {Ω^{*}, X_{o}^{test}, Θ^{(0)}, Λ^{(0)}} \end{cases}

(34)

Follow the procedures of the DANM-ADMM algorithm in Algorithm 2 and thereby achieve 2D DOA estimation perfectly at the end of the DANM-ADMM-Net method.

4. Experiment Results

In this section, we evaluate the DU-DOA method based on the ANM-ADMM-Net and DANM-ADMM-Net through simulation experiments. Considering that the ADMM algorithm is the only iterative algorithm for solving the ANM and DANM models, we compare it to traditional 1D and 2D DOA estimation methods with fixed parameters. For the convenience of training and testing, all offline training procedures are implemented based on Python 3.8 with the configuration of Intel(R) Core i7-6246 3.30GHz CPU and NVIDIA Quadro GV100 GPU. Once the optimal parameters after training obtaining, all testing simulations will be implemented based on MATLAB 2020b online.

Since the noise level is generally unknown in the practical application, only the training data without noise are used in the training process of the ADMM network in this paper. Then noise is added to the data in the test data to verify the ADMM algorithm’s performance under different SNRs. Therefore, the training and testing datasets for DOA estimation can be constructed with parameters in Table 1 and Table 2 according to Section 3.2.2 and Section 3.4.2, respectively.

4.1. Network Convergence Analysis

This subsection investigates the convergence performance of the ANM-ADMM-Net and DANM-ADMM-Net under different layers and compares it with the traditional algorithms with fixed iterative parameters.

The iteration parameters of ANM-ADMM are set as

ρ_{0} = 0.5

,

τ_{0} = 0.01

,

η_{0} = 0.5

. Set different network layers

K = 10 ~ 40

, initialize and train network (450 epochs), and the results are shown in Figure 4. Among them, Figure 4a,b shows training NMSEs and testing NMSEs when

K = 40

, and Figure 4c shows the NMSEs of two methods. Based on Figure 4a,b, the training and testing NMSEs of the proposed method decrease with the increase in training time and effectively reach convergence. Figure 4c shows that, as the number of network layers (the number of iterations) increases, the NMSEs of both ANM-ADMM-Net and ANM-ADMM algorithms gradually decrease, and the former is much smaller than the latter. In addition, only when the number of iterations of the ANM-ADMM algorithm is at least fifty to sixty times higher than the ANM-ADMM-Net, their NMSE is equal, which implies the computing complexity required for convergence is reduced.

The iteration parameters of DANM-ADMM algorithm are set as

ρ_{0} = 0.1

,

λ_{0} = 0.1

, and

γ_{0} = 0.1

. Set different network layers

K = 5 ~ 25

, initialize and train DANM-ADMM-Net (600 epochs), and the results are shown in Figure 5. Among them, Figure 5a,b is the training NMSEs and testing NMSEs of DANM-ADMM-Net when

K = 5 / 10

, and Figure 5c shows the NMSEs of both when the number of network layers (iterations)

K = 5 ~ 25

. Figure 5d reveals the NMSEs of algorithm when iterations

K = 5 ~ 300

. Based on Figure 5a,b, the training and testing NMSEs of DANM-ADMM-Net decrease with the increase in training time and effectively reach convergence after 500 training epochs. In addition, Figure 5c shows that as the number of network layers (the number of iterations) increases, the NMSEs of both the DANM-ADMM-Net and DANM-ADMM algorithm decrease. However, the NMSE of the former is much smaller than that of the latter. From Figure 5c,d, the two have similar NMSE only when the number of iterations of the algorithm is fifty times higher than that of network when

K = 10

. Therefore, it can be concluded that the proposed can learn the optimal iteration parameters from the constructed dataset and obtain better convergence performance.

In practical applications, the networks can be trained offline based on different simulation conditions to determine the range of network layers that can obtain better DOA estimation performance and lower computational complexity. Then, the network layers can be selected according to the actual situation.

4.2. DOA Estimation Results Analysis

4.2.1. 1D DOA Estimation Results Analysis

This subsection investigates the performance of ANM-ADMM-Net for DOA estimation and its upsides over the SBL algorithm, dual ANM, ANM-CVX and ANM-ADMM methods with simulation experiments. The root mean square error (RMSE) is evaluated by

M_{c} = 100

Monte Carlo trials.

RMSE = \frac{1}{O} \sum_{o = 1}^{O} \sqrt{\frac{1}{K_{t} M_{c}} \sum_{j = 1}^{M_{c}} \sum_{k_{t} = 1}^{K_{t}} {({\hat{θ}}_{o, k_{t}} - θ_{o, k_{t}})}^{2}}

(35)

The results of different methods are shown in Figure 6, where grid numbers

N = 100

,

K = 200

and

λ = 1 e - 3

for the SBL algorithm [8];

K = 30

,

L = 5

for ANM-ADMM and ANM-ADMM-Net. The results indicate that unsuitable grid division inevitably causes the estimated DOA offset of the SBL algorithm. dual ANM and ANM-CVX methods are always the optimal estimation results. ANM-ADMM cannot recover the amplitude and degree of the signal perfectly due to limited iterations. However, the proposed ANM-ADMM-Net has a better estimation of amplitude and degree than the ANM-ADMM algorithm with fixed parameters, demonstrating the effectiveness of the combination of model-driven and data-driven DU methods.

Figure 7 shows the test RMSEs when the number of network layers

K = 10 ~ 40

. The RMSEs of ANM-ADMM and ANM-ADMM-Net gradually decrease as the number of network layers increases (number of iterations), and the latter is much smaller than the former with fixed parameters. The above results demonstrate that the more layers of the network, the more optimal iterative parameters can be learned from the constructed dataset due to its powerful nonlinear fitting capability and superior flexibility, resulting in better DOA estimation performance. In other words, the DU-gridless DOA method based on ANM-ADMM-Net is suitable for efficiently resolving the different sparsity problems (different target signal problems) at a lower computational cost.

The testing dataset when SNR = 0~60dB is constructed to verify the noise robustness according to Section 3.2.2 (i.e., generate testing data for every SNR and noise regularization factor of dual ANM is 0.1), and the test results when

K = 40

are shown in Figure 7b. It can be seen that when

SNR < 20 dB

, ANM-ADMM is unable to be implemented for estimation. However, the proposed method still performs better under limited network layers/iterations, demonstrating the higher noise robustness of the latter with the optimal parameters. In addition, when

SNR > 40 dB

, the test RMSE of ANM-ADMM-Net tends to be stable and close to the results obtained in the noise-free case. The above implies that even if noise-free data train the network, the proposed method still obtains better parameter results and can perform DOA estimation for the actual array received data containing noise.

4.2.2. 2D DOA Estimation Results Analysis

This subsection investigates the performance of DANM-ADMM-Net for 2D DOA estimation and its upsides over the conventional 2D DOA methods with simulation experiments. The root mean square error (RMSE) is also evaluated by

M_{c} = 100

Monte Carlo trials.

RMSE = \frac{1}{O} \sum_{o = 1}^{O} \sqrt{\frac{1}{K_{t} M_{c}} \sum_{j = 1}^{M_{c}} \sum_{k_{t} = 1}^{K_{t}} {({\hat{ϕ}}_{o, k_{t}} - ϕ_{o, k_{t}})}^{2} + {({\hat{θ}}_{o, k_{t}} - θ_{o, k_{t}})}^{2}}

(36)

The 2D DOA results estimated by different methods when

M = 20

,

K = 20

are shown in Figure 8, where the parameters of the 2D-MUSIC method are set to

SNR = 20 dB

, and the number of snapshots

L = 20

. The results in Figure 8b show that the DANM-ADMM algorithm fails to precisely converge to the ground truth precisely at a finite number of iterations due to improper parameter settings, resulting in some deviations in both azimuth and pitch angles estimated for individual targets. In contrast, the parameters of the proposed method are optimized for data and networks, thus converging to a better result completely at a finite number of layers.

Figure 9a gives the results of the test RMSEs when the number of array elements

M = 20

and the number of network layers

K = 5 ~ 25

. The RMSEs of DANM-ADMM and DANM-ADMM-Net gradually decrease as the number of network layers increases (number of iterations). When

K = 25

, the RMSE of the latter is reduced by 20dB compared to the former with fixed parameters, which indicates that the more layers of the network, the more optimal iterative parameters can be learned from the constructed dataset, resulting in better 2D DOA estimation performance.

The test dataset is constructed by Section 3.4.3 when SNR = 0~60dB (i.e., test data are generated for each SNR) to verify the noise robustness of DANM-ADMM-Net. The results when

K = 25

are shown in Figure 9b. The tested RMSE of both methods decreased with increased SNR under a limited number of network layers/iterations. If we train the network with data when noise exists, for example, SNR = 10 dB, DANM-ADMM-Net performs better than DANM-ADMM under any circumstances. Therefore, to ensure higher noise robustness, one can estimate the signal-to-noise ratio first, and then use the corresponding data containing the noise to train the network.

4.3. Computational Complexity and Running Time Comparsion

This subsection analyzes the computational complexity and running time of SDPT3 based ANM-CVX method and ANM-ADMM-Net at different array elements when network layers

K = 40

. The running time results when

M = 10 ~ 50

,

L = 5

and

M = 10

,

L = 1 ~ 15

are shown in Figure 10. The results of the comparison of DANM-CVX and DANM-ADMM-Net when network layers

K = 25

are also shown in Figure 11.

From the paper [43], it is known that the computational complexity of the solver SDPT3-based CVX method is

q_{1}^{2} q_{2}^{2}

, where

q_{1}

denotes the number of variables and

q_{2}

denotes the dimensionality of the SDP matrix. Therefore, the computational complexity of the CVX method for solving the ANM model is

O {Q {(M + L)}^{2} {(M L + M + L)}^{2}}

and

O {Q {(M + N)}^{2} {(M N + M + N)}^{2}}

for solving the DANM model. In contrast, while that of the ADMM method is

O {Q {(M + L)}^{3}}

and

O {Q {(M + N)}^{3}}

, respectively, where

Q

denotes the number of iterations, which indicates that when the matrix size is large, the computational complexity of the ANM-ADMM and DANM-ADMM methods to solve the ANM model and DANM model, respectively, is much smaller than that of the ANM-CVX and DANM-CVX methods, respectively.

It is important to emphasize that the computational complexity analysis of ANM/DANM-ADMM-Net in this paper does not include the computational costs required for network training, since it is possible to train offline and apply online. Moreover, after training to obtain the optimal parameters, the computational complexity of networks and its ADMM algorithms are identical and differ in the iterative parameters. Therefore, networks and ANM/DANM-ADMM algorithms will have the same computational complexity when applied with the same network layers (number of iterations). However, considering the reduced iterations requirements of networks, the computational complexity is further decreased compared to algorithms.

The results in Figure 10 show that the growth rate of the proposed method’s running time is much lower than that of the ANM-CVX and dual ANM methods, as the number of array elements or snapshots increases. In conclusion, ANM-ADMM-Net can be an alternative in the presence of larger-scale arrays or snapshots and high real-time requirements, and it is consistent with the above theoretical analysis. The same conclusion also can be drawn for DANM-CVX and DANM-ADMM-Net in Figure 11.

5. Conclusions

In this paper, a DU-based gridless DOA method is proposed. ANM-ADMM and DANM-ADMM algorithms are examined in the continuous domain, and the deep neural networks ANM-ADMM-Net and DANM-ADMM-Net are constructed for their problems. Their network structure, data set construction method, network initialization, and training method are introduced. Then, their performance is validated by simulation experiments. The results show that: compared with the existing methods, model-plus-data-driven ANM-ADMM-Net and DNM-ADMM-Net can learn the optimal iteration parameters from the data and quickly obtain more accurate 1D and 2D DOA estimation. The required computational complexity is reduced by 50~60 times and 20 times, respectively.

Author Contributions

Conceptualization, H.Z., W.F. and C.F.; methodology, H.Z., C.F. and W.F.; software, H.Z., B.Z. and T.M.; validation, H.Z., B.Z. and T.M.; formal analysis, W.F., C.F.; investigation, H.Z., W.F.; resources, W.F., H.Z.; data curation, H.Z., B.Z.; writing—original draft preparation, H.Z.; writing—review and editing, W.F., C.F.; visualization, H.Z., B.Z. and T.M.; supervision, T.M., H.Z.; project administration, W.F.; funding acquisition, W.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by: ➀ National Natural Science Foundation of China, No. 62001507; ➁ Young Talent fund of University Association for Science and Technology in Shaanxi, China, No. 20210106.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The following steps are elucidated to prove the iterative process of Algorithm 1. The five minimization subproblems of (16) can be given as

{\begin{cases} (1) : \underset{X}{\arg \min} \frac{1}{2} | | Y - X | |_{F}^{2} + Δ \\ (2) : \underset{W}{\arg \min} \frac{τ}{2} T (W) + Δ \\ (3) : \underset{u}{\arg \min} \frac{τ}{2} T (u) + Δ \\ (4) : \underset{Θ \geq 0}{\arg \min} Δ \\ (5) : \underset{Λ}{\arg \min} 〈 Λ, Θ - [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}] 〉 \end{cases}

(A1)

where

Θ

and

Λ

are Hermitian matrices, and

Θ_{W}

,

Λ_{W} \in ℂ^{L \times L}

,

Θ_{T (u)} \in ℂ^{M \times M}

,

Λ_{T (u)} \in ℂ^{M \times M}

, and

Θ = [\begin{matrix} Θ_{T (u)} & Θ_{X} \\ {(Θ_{X})}^{H} & Θ_{W} \end{matrix}]

,

Λ = [\begin{matrix} Λ_{T (u)} & Λ_{X} \\ {(Λ_{X})}^{H} & Λ_{W} \end{matrix}] \in ℂ^{(M + L) \times (M + L)}

.

Let the gradient of the objective function of the above five subproblems be equal to zero, then the

(k + 1)

-th iteration of the ADMM yields

{\begin{cases} A^{(k + 1)} : X^{(k + 1)} = 1 / (1 + 2 ρ) (Y + 2 Λ_{X}^{(k)} + 2 ρ Θ_{X}^{(k)}) \\ B^{(k + 1)} : W^{(k + 1)} = ρ^{- 1} Λ_{W}^{(k)} + Θ_{W}^{(k)} - τ / (2 ρ) I_{L} \\ C^{(k + 1)} : u^{(k + 1)} = Γ (T^{*} (ρ^{- 1} Λ_{T (u)}^{(k)} + Θ_{T (u)}^{(k)}) - τ / (2 ρ) M e_{1}) \\ D^{(k + 1)} : Θ^{(k + 1)} = G diag ({δ_{g}}_{+}) G^{- 1} \\ E^{(k + 1)} : Λ^{(k + 1)} = Λ^{(k)} + ρ (Θ^{(k + 1)} - [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}]) \end{cases}

(A2)

where

g = 1, 2 \dots, M + N

,

{δ_{g}}_{+}

denotes that all elements smaller than zero are set to zero, and

G

is an orthogonal matrix satisfying

G G^{T} = I_{M + L}

.

For subproblem

C^{(k + 1)}

, its objective function can be derived from

{(T (u))}^{+} = ρ^{- 1} Λ_{T (u)}^{(k)} + Θ_{T (u)}^{(k)} - τ / (2 ρ) I_{M}

(A3)

In truth, the resulting

{(T (u))}^{+}

is a non-Toeplitz matrix at the beginning of the algorithm. Therefore, according to the Toeplitz property of

T (u)

, first, transform

{(T (u))}^{+}

into a vector

u_{}^{(k + 1)}

, and then reassemble it to reach the accurate Toeplitz matrix

T (u_{}^{(k + 1)})

by the mapping

T (\cdot)

in (5) for solving the latter function, i.e.,

u^{(k + 1)} = Γ (T^{*} (ρ^{- 1} Λ_{T (u)}^{(k)} + Θ_{T (u)}^{(k)}) - τ / (2 ρ) M e_{1})

(A4)

where

a = T^{*} (A)

denotes a mapping from a matrix

A \in ℂ^{M \times M}

to a vector

a \in ℂ^{M \times 1}

and satisfying

a_{i} = sum (A_{p, q} | q - p + 1 = i)

,

Γ = diag (\frac{1}{M}, \frac{1}{M - 1}, \dots, \frac{1}{M - (M - 1)})

.

e_{1} \in ℂ^{M \times 1}

is the first column of the identity matrix.

For subproblem

D^{(k + 1)}

, its original objective function can be merged as

\underset{Θ \geq 0}{\arg \min} \frac{ρ}{2} {‖ Θ^{(k + 1)} - [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}] + ρ^{- 1} Λ^{(k)} ‖}_{F}^{2} + c o n s t

(A5)

where

c o n s t = \frac{ρ}{2} {‖ Λ^{(k)} / ρ ‖}_{2}^{2}

,

T (u^{(k + 1)})

is a Toeplitz matrix obtained from subproblem

C^{(k + 1)}

.

Then, the derivative of (41) is obtained by

{\tilde{Θ}}^{(k + 1)} = [\begin{matrix} T (u^{(k + 1)}) & X^{(k + 1)} \\ {(X^{(k + 1)})}^{H} & W^{(k + 1)} \end{matrix}] + ρ^{- 1} Λ^{(k)}

(A6)

Let

{\tilde{Θ}}^{(k + 1)} = G Σ G^{- 1}

, where the diagonal matrix

Σ = d i a g ({δ_{1}, δ_{2}, \dots, δ_{M + L}})

. Consider

Θ \geq 0

, thus there exists

Θ^{(k + 1)} = G diag ({δ_{g}}_{+}) G^{- 1}

(A7)

References

Mao, Z.; Liu, S.; Zhang, Y.D.; Han, L.; Huang, Y. Joint DoA-Range Estimation Using Space-Frequency Virtual Difference Coarray. IEEE Trans. Actions Signal Process. 2022, 70, 2576–2592. [Google Scholar] [CrossRef]
Liu, D.; Zhao, Y.; Zhang, T. Sparsity-Based Two-Dimensional DOA Estimation for Co-Prime Planar Array via Enhanced Matrix Completion. Remote Sens. 2022, 14, 4690. [Google Scholar] [CrossRef]
Pascal, V.; Mestre, X.; Loubaton, P. Performance analysis of an improved MUSIC DOA estimator. IEEE Trans. Signal Process. 2015, 63, 6407–6422. [Google Scholar]
Yu, J.; Li, J.; Sun, B.; Jiang, Y.; Xu, L. Multiple RFI sources location method combining two-dimensional ESPRIT DOA estimation and particle swarm optimization for spaceborne SAR. Remote Sens. 2021, 13, 1207. [Google Scholar] [CrossRef]
Haardt, M.; Zoltowski, M.D.; Mathews, C.P.; Nossek, J. 2D unitary ESPRIT for efficient 2D parameter estimation. In Proceedings of the 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, USA, 9–12 May 1995; Volume 3, pp. 2096–2099. [Google Scholar]
Xu, T.; Wang, X.; Huang, M.; Lan, X.; Sun, L. Tensor-based reduced-dimension music method for parameter estimation in monostatic fda-mimo radar. Remote Sens. 2021, 13, 3772. [Google Scholar] [CrossRef]
Tropp, J.A. Greed is Good: Algorithmic Results for Sparse Approximation. IEEE Trans. Inf. Theory 2004, 50, 2231–2242. [Google Scholar] [CrossRef] [Green Version]
Cao, Z.; Zhou, L.; Dai, J. Sparse Bayesian Approach for DOD and DOA Estimation with Bistatic MIMO Radar. EEE Access 2019, 7, 155335–155346. [Google Scholar] [CrossRef]
Ramasamy, D.; Venkateswaran, S.; Madhow, U. Compressive Parameter Estimation in AWGN. IEEE Trans. Signal Process. 2014, 62, 2012–2027. [Google Scholar] [CrossRef] [Green Version]
Bhaskar, B.N.; Tang, G.; Recht, B. Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Signal Process. 2013, 61, 5987–5999. [Google Scholar] [CrossRef] [Green Version]
Chen, X.; Zhang, X.; Yang, J.; Tan, A. How to overcome basis mismatch: From atomic norm to gridless compressive sensing. Acta Autom. Sin. 2016, 42, 335–346. [Google Scholar]
Chandrasekaran, V.; Recht, B.; Parrilo, P.A.; Willsky, A.S. The convex geometry of linear inverse problems. Found. Comput.-Tional Math. 2012, 12, 805–849. [Google Scholar] [CrossRef] [Green Version]
Candes, E.J.; Romberg, J.K.; Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. A J. Issued Courant Inst. Math. Sci. 2006, 59, 1207–1223. [Google Scholar] [CrossRef]
Liu, S.; Mao, Z.; Zhang, Y.D.; Huang, Y. Rank minimization-based Toeplitz reconstruction for DoA estimation using coprime array. IEEE Commun. Lett. 2021, 25, 2265–2269. [Google Scholar] [CrossRef]
Chi, Y.; Chen, Y. Compressive two-dimensional harmonic retrieval via atomic norm minimization. IEEE Trans. Signal Process. 2014, 63, 1030–1042. [Google Scholar] [CrossRef]
Tang, W.G.; Jiang, H.; Pang, S.X. Grid-free DOD and DOA estimation for MIMO radar via duality-based 2D atomic norm minimization. IEEE Access 2019, 7, 60827–60836. [Google Scholar] [CrossRef]
Tian, Z.; Zhang, Z.; Wang, Y. Low-complexity optimization for two-dimensional direction-of-arrival estimation via decoupled atomic norm minimization. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 3071–3075. [Google Scholar]
Pan, J.; Tang, J.; Niu, Y. Fast two-dimensional atomic norm minimization in spectrum estimation and denoising. arXiv 2018, arXiv:1807.08606. [Google Scholar]
Zhang, Z.; Wang, Y.; Tian, Z. Efficient two-dimensional line spectrum estimation based on decoupled atomic norm minimization. Signal Process. 2019, 163, 95–106. [Google Scholar] [CrossRef] [Green Version]
Tang, W.G.; Jiang, H.; Zhang, Q. ADMM for gridless DOD and DOA estimation in bistatic MIMO radar based on decoupled atomic norm minimization with one snapshot. In Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Ottawa, ON, Canada, 11–14 November 2019; pp. 1–5. [Google Scholar]
Tang, W.G.; Jiang, H.; Pang, S.X. Gridless angle and range estimation for FDA-MIMO radar based on decoupled atomic norm minimization. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 4305–4309. [Google Scholar]
Gregor, K.; Lecun, Y. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 399–406. [Google Scholar]
Liu, J.; Chen, X. ALISTA: Analytic weights are as good as learned weights in LISTA. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Borgerding, M.; Schniter, P.; Rangan, S. AMP-inspired deep networks for sparse linear inverse problems. IEEE Trans. Signal Process. 2017, 65, 4293–4308. [Google Scholar] [CrossRef]
Yang, C.; Gu, Y.; Chen, B.; Rangan, S. Learning proximal operator methods for nonconvex sparse recovery with theoretical guarantee. IEEE Trans. Signal Process. 2020, 68, 5244–5259. [Google Scholar] [CrossRef]
Zhang, J.; Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1828–1837. [Google Scholar]
Chen, X.; Liu, J.; Wang, Z.; Pang, S. Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar] [CrossRef]
Metzler, C.A.; Mousavi, A.; Baraniuk, R.G. Learned D-AMP: Principled neural network based compressive image recovery. arXiv 2017, arXiv:1704.06625. [Google Scholar]
Yang, Y.; Sun, J.; Li, H.; Rangan, S. Deep ADMM-Net for compressive sensing MRI. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 10–18. [Google Scholar]
Dong, W.; Wang, P.; Yin, W.; Shi, G.; Wu, F.; Lu, X. Denoising prior driven deep neural network for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2305–2318. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhu, H.G.; Feng, W.K.; Feng, C.Q.; Liu, W. Deep unfolding based space-time adaptive processing method for airborne radar. J. Radars 2022, 11, 676–691. [Google Scholar] [CrossRef]
Wu, L.; Liu, Z.; Liao, J. DOA Estimation Using an Unfolded Deep Network in the Presence of Array Imperfections. In Proceedings of the 2022 7th International Conference on Signal and Image Processing (ICSIP), Suzhou, China, 20–22 July 2022; pp. 182–187. [Google Scholar]
Guo, Y.; Jin, J.; Wang, Q.; Chen, H.; Liu, W. Position-enabled complex Toeplitz LISTA for DOA estimation with unknow mutual coupling. Signal Process. 2022, 194, 108422. [Google Scholar] [CrossRef]
Xiao, P.; Liao, B.; Deligiannis, N. Deepfpc: A deep unfolded network for sparse signal recovery from 1-bit measurements with application to doa estimation. Signal Process. 2020, 176, 107699. [Google Scholar] [CrossRef]
Wan, L.; Sun, Y.; Sun, L.; Ning, Z.; Rodrigues, J.J.P.C. Deep learning based autonomous vehicle super resolution DOA estimation for safety driving. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4301–4315. [Google Scholar] [CrossRef]
Zhu, H.G.; Feng, C.Q.; Feng, W.K.; Chen, H. A deep learning approach for sparse snapshot doa estimation. J. Signal Process. 2022, 38, 2114–2123. [Google Scholar]
Li, Y.; Chi, Y. Off-the-Grid Line Spectrum Denoising and Estimation with Multiple Measurement Vectors. IEEE Trans. Signal Process. 2015, 64, 1257–1269. [Google Scholar] [CrossRef] [Green Version]
Toh, K.C.; Todd, M.J.; Tütüncü, R.H. On the implementation and usage of SDPT3–a Matlab software package for semidefinite-quadratic-linear programming, version 4.0. In Handbook on Semidefinite, Conic and Polynomial Optimization; Springer: Boston, MA, USA, 2012; pp. 715–754. [Google Scholar]
Yang, L.; Xie, L.; Stoica, P. Vandermonde Decomposition of Multilevel Toeplitz Matrices with Application to Multidimensional Super-Resolution. IEEE Trans. Inf. Theory 2016, 62, 3685–3701. [Google Scholar] [CrossRef] [Green Version]
Tang, W.G.; Jiang, H.; Zhang, Q. Gridless DOD and DOA estimation in bistatic MIMO radar using 2D-ANM and its low complexity algorithms. Digit. Signal Process. 2021, 108, 102900. [Google Scholar] [CrossRef]
Boyd, S.; Parikh, N.; Chu, E. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
Krishnan, K.; Terlaky, T. Interior point and semidefinite approaches in combinatorial optimization. In Graph Theory and Combinatorial Optimization; Springer: Boston, MA, USA, 2005; pp. 101–157. [Google Scholar]

Figure 1. URA signal model.

Figure 2. The network structure of ANM-ADMM-Net.

Figure 3. The network structure of DANM-ADMM-Net.

Figure 4. Convergence performance of ANM-ADMM-Net and its comparison with ANM-ADMM algorithm. (a) Train NMSEs when

M = 10

(b) Test NMSEs when

M = 10

(c) Test NMSEs when

K = 10 - 40

(d) Test NMSEs when

K = 50 - 4050

.

Figure 4. Convergence performance of ANM-ADMM-Net and its comparison with ANM-ADMM algorithm. (a) Train NMSEs when

M = 10

(b) Test NMSEs when

M = 10

(c) Test NMSEs when

K = 10 - 40

(d) Test NMSEs when

K = 50 - 4050

.

Figure 5. Convergence performance of DANM-ADMM-Net and its comparison with DANM-ADMM algorithm. (a) Train NMSEs when

K = 5 / 10

. (b) Test NMSEs when

K = 5 / 10

. (c) Test NMSEs when

K = 5 ~ 25

. (d) Test NMSEs when

K = 5 ~ 300

.

Figure 5. Convergence performance of DANM-ADMM-Net and its comparison with DANM-ADMM algorithm. (a) Train NMSEs when

K = 5 / 10

. (b) Test NMSEs when

K = 5 / 10

. (c) Test NMSEs when

K = 5 ~ 25

. (d) Test NMSEs when

K = 5 ~ 300

.

Figure 6. DOA estimation results of different methods. (a) Two targets. (b) Three targets.

Figure 7. Test RMSEs versus

K

and SNR. (a) versus network layer

K

; (b) versus SNR.

Figure 7. Test RMSEs versus

K

and SNR. (a) versus network layer

K

; (b) versus SNR.

Figure 8. 2D DOA estimation results of different methods. (a) Two targets. (b) Three targets.

Figure 9. Test RMSEs versus

K

and

SNR

; (a) versus network layer

K

. (b) versus

SNR

.

Figure 9. Test RMSEs versus

K

and

SNR

; (a) versus network layer

K

. (b) versus

SNR

.

Figure 10. Running time versus array element number

M

and snapshot number

L

. (a) when

M = 10 ~ 50

; (b) when

L = 1 ~ 15

.

Figure 10. Running time versus array element number

M

and snapshot number

L

. (a) when

M = 10 ~ 50

; (b) when

L = 1 ~ 15

.

Figure 11. Running time of different methods versus array element number

M

.

Figure 11. Running time of different methods versus array element number

M

.

Table 1. Simulation parameters of 1D DOA estimation.

Parameter	Notation	Value
Antenna elements number	$M$	10
Frequency range	$[f_{\min}, f_{\max}]$	(0,0.5]
Maximum signal sources	$K_{t}$	3
Number of snapshots	$L$	1/3/5
Number of samples	$D$	600
Signal-to-noise Ratio	SNR	Inf
Number of training samples	$Q$	480
Number of validation samples	$O$	120

Table 2. Simulation parameters of 2D DOA estimation.

Parameter	Notation	Value
Antenna elements number	$N \times M$	$\times$ 20
Pitch angle range	$ϕ_{k_{t}}$	[0°,90°]
Azimuth angle range	$θ_{k_{t}}$	[0°,90°]
Maximum signal sources	$K_{t}$	3
Number of samples in each source	$P$	200
Number of samples	$D$	600
Signal-to-noise Ratio	SNR	Inf
Number of training samples	$Q$	480
Number of validation samples	$O$	120

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhu, H.; Feng, W.; Feng, C.; Ma, T.; Zou, B. Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization. Remote Sens. 2023, 15, 13. https://doi.org/10.3390/rs15010013

AMA Style

Zhu H, Feng W, Feng C, Ma T, Zou B. Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization. Remote Sensing. 2023; 15(1):13. https://doi.org/10.3390/rs15010013

Chicago/Turabian Style

Zhu, Hangui, Weike Feng, Cunqian Feng, Teng Ma, and Bo Zou. 2023. "Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization" Remote Sensing 15, no. 1: 13. https://doi.org/10.3390/rs15010013

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization

Abstract

1. Introduction

2. Signal Model and ANM-DOA Model

2.1. 1D Signal Model and Its ANM-DOA Model

2.2. 2D Signal Model and Its DANM-DOA Model

3. DU-DOA Estimation

3.1. 1D ANM-ADMM DOA

3.2. 1D ANM-ADMM-Net DOA

3.2.1. Network Structure

3.2.2. Data Construction

3.2.3. Network Initialization and Training

3.3. 2D DANM-ADMM DOA

3.4. 2D DANM-ADMM-Net DOA

3.4.1. Network Structure

3.4.2. Data Construction

3.4.3. Network Initialization and Training

4. Experiment Results

4.1. Network Convergence Analysis

4.2. DOA Estimation Results Analysis

4.2.1. 1D DOA Estimation Results Analysis

4.2.2. 2D DOA Estimation Results Analysis

4.3. Computational Complexity and Running Time Comparsion

5. Conclusions

Author Contributions

Funding

Conflicts of Interest

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI