Article

Off-Grid DoA Estimation via Two-Stage Cascaded Neural Network

1 Department of Electronics and Computer Engineering, Hanyang University, Seoul 04763, Korea
2 Agency for Defense Development, Yuseong P.O. Box 35, Daejeon 34186, Korea
* Author to whom correspondence should be addressed.
Energies 2021, 14(1), 228; https://doi.org/10.3390/en14010228
Submission received: 8 December 2020 / Revised: 29 December 2020 / Accepted: 30 December 2020 / Published: 4 January 2021
(This article belongs to the Special Issue Designs and Algorithms of Localization in Vehicular Networks)

Abstract

This paper introduces an off-grid DoA estimation algorithm based on a two-stage cascaded neural network, which resolves the mismatch between true directions-of-arrival (DoAs) and a discrete angular grid. In the first-stage network, initial DoAs are estimated with a convolutional neural network (CNN), where the initial DoAs are mapped onto the discrete angular grid. To handle the mismatch between the initially estimated DoAs and the true DoAs, the second-stage network estimates a tuning vector that represents the difference between the true DoAs and the nearest discrete angles. Using the tuning vector, the final DoAs are obtained by shifting the initially estimated DoAs by this difference. The limitation on estimation accuracy imposed by the discrete angular grid is thereby removed, and the estimation accuracy can be further enhanced. Simulation results show that adding the second-stage network after the first-stage network improves the estimation accuracy by resolving the mismatch induced by the discretized grid. From an implementation standpoint, the results also show that a CNN with the PReLU activation function is the best option for accurate estimation.

1. Introduction

Direction-of-arrival (DoA) estimation is one of the long-studied research topics in array signal processing. DoA estimation algorithms have been adopted in various applications, such as localization and radar [1]. DoA estimation also plays an essential role in cooperative localization for vehicular networks, where cooperative localization between vehicles requires the relative distances and DoAs of neighboring vehicles [2,3]. Using this information, vehicle positions derived from a global positioning system (GPS) can be refined through cooperative localization [2,3]. As the commercialization of automotive multiple-input multiple-output (MIMO) radar progresses [4], it becomes possible for vehicles to obtain the DoAs of other vehicles using DoA estimation algorithms.
Traditional DoA estimation algorithms such as MUSIC [5] have high estimation accuracy and resolution. However, they require a large number of snapshots and cannot properly estimate the DoAs of coherent signal sources. To overcome these disadvantages, compressive sensing (CS)-based DoA estimation algorithms have been proposed [6,7,8]. Regardless of the type of CS, such as basis pursuit (BP) [9] and sparse Bayesian learning (SBL) [10], CS-based DoA estimation algorithms are robust against low signal-to-noise ratio (SNR) and few snapshots [11]. CS-based DoA estimation exploits a discrete angular grid consisting of angular bases; in practice, however, the true DoAs are unlikely to coincide exactly with the angular bases. The mismatch between the DoAs and the angular bases is generally referred to as grid-mismatch and induces an inevitable estimation error [12].
Aside from the aforementioned studies, machine learning was first applied to DoA estimation in [13,14], and this line of work continues as deep learning theory and methods develop rapidly [15,16]. In [13,14], support vector regression (SVR), a machine learning technique mainly used for pattern recognition problems, approximates the function that maps received signals to DoAs. The results in [13,14] show that DoAs can be successfully estimated with SVR. However, the estimation accuracy of these algorithms is inferior to that of traditional DoA estimation algorithms such as MUSIC [5]: computational resources that could support large datasets were not available at the time, so only small amounts of training data were used, which limited the estimation accuracy.
After the introduction of neural networks (NNs) [16], DoA estimation algorithms based on various types of NN were proposed [17,18,19,20,21], where these algorithms are trained and tested on massive amounts of data. Although the deep neural network (DNN)-based algorithms in [17,18] achieve higher estimation accuracy and resolution than traditional DoA estimation algorithms, they can only estimate a fixed number of DoAs, since the number of outputs of the DNN is fixed. In practice, the number of signal sources is unknown and can vary, so the algorithms in [17,18] are not suitable for practical situations. Unlike [17,18], the algorithms in [19,20] cope with a varying number of DoAs by setting the output of the network to an angular grid, an idea motivated by CS-based DoA estimation. The angular grid represents the discretized angular domain and indicates where the DoAs fall within the grid. In this case, DoAs can be successfully estimated regardless of their number, provided every DoA coincides with one of the discrete angles in the grid. However, the algorithms in [19,20] suffer from the same problem as CS-based DoA estimation: a true DoA may not coincide exactly with a discrete angle in practice, so an inevitable estimation error occurs.
To prevent the inevitable estimation error induced by the mismatch between true DoAs and the discrete angles within the grid, we propose a novel off-grid DoA estimation algorithm based on a two-stage cascaded NN. The design of the two-stage cascaded NN is motivated by the two-step approach in [22]. In the first stage, the DoAs are initially estimated with a convolutional neural network (CNN). As the initial DoAs estimated by the first-stage network are mapped onto the discrete angular grid, they contain the estimation error induced by the mismatch. In the second stage, the mismatch between the true DoAs and the angles within the grid is resolved by a DNN, which estimates a tuning vector representing the difference between the true DoAs and the nearest discrete angles. The final estimated DoAs are obtained by shifting the initial DoAs by this difference. With the two-stage cascaded NN, the estimation accuracy is no longer limited by the discrete angular grid and can be further enhanced. Simulation results show that the two-stage cascaded NN achieves higher estimation accuracy than a single network by resolving the mismatch induced by the discrete angular grid.
Notations: We use lower-case and upper-case bold characters to represent vectors and matrices, respectively, throughout this paper. $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^*$ denote the transpose, conjugate transpose, and complex conjugate, respectively. $\otimes$ denotes the Kronecker product. $\mathrm{Trace}(\cdot)$ denotes the trace of a matrix. $\mathrm{real}(\cdot)$ and $\mathrm{imag}(\cdot)$ denote the real part and imaginary part, respectively. $\mathbf{a}(i)$ denotes the $i$-th element of a vector $\mathbf{a}$, and $\mathbf{A}(i,j)$ denotes the $(i,j)$-th element of a matrix $\mathbf{A}$. $\|\mathbf{a}\|$ denotes the L2 norm of $\mathbf{a}$. $\mathbf{0}_N$ denotes an $N \times 1$ zero vector, and $\mathbf{I}_N$ denotes an $N \times N$ identity matrix. $\mathrm{diag}(\cdot)$ denotes a vector whose entries are the diagonal elements of a given matrix. $\mathbb{C}^{M \times N}$, $\mathbb{R}^{M \times N}$, and $\mathbb{R}_{+}^{M \times N}$ denote $M \times N$ matrices whose elements are complex, real, and real positive numbers, respectively. If $N = 1$, they are $M \times 1$ vectors.

2. Signal Model

We assume that $P$ uncorrelated narrowband signals impinge on a uniform linear array (ULA) with $M$ elements, where the $P$ narrowband signals share the same carrier frequency. The spacing between adjacent antennas is set to half a wavelength $\lambda/2$, where $\lambda$ denotes the wavelength of the signals. The DoAs of the $P$ signals are denoted as $\Theta = [\theta_1, \ldots, \theta_P]^T$. When a narrowband signal impinges on the antenna array, an identical signal arrives at each antenna with a different time delay, where the time delay depends on the DoA of the signal. For a narrowband signal, the time delay can be translated into a phase shift, and the vector that models the phase shift of each antenna according to the DoA is generally known as the array manifold vector or steering vector [23]. The array manifold vector for DoA $\theta$, $\mathbf{a}(\theta)$, is given as in [23]:
$$\mathbf{a}(\theta) = \left[1,\ e^{j\pi\cos\theta},\ \ldots,\ e^{j(M-1)\pi\cos\theta}\right]^T \in \mathbb{C}^{M\times 1}. \tag{1}$$
The array manifold matrix for $P$ signal sources, $\mathbf{A}(\Theta)$, is
$$\mathbf{A}(\Theta) = \left[\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_P)\right] \in \mathbb{C}^{M\times P}. \tag{2}$$
The received signal X can be written as
$$\mathbf{X} = \mathbf{A}(\Theta)\,\mathbf{S} + \mathbf{N} \in \mathbb{C}^{M\times D}, \tag{3}$$
where $D$ is the number of snapshots, $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_P]^T$, and $\mathbf{s}_p \in \mathbb{C}^{D\times 1}$ denotes the $p$-th signal vector. $\mathbf{N}$ is an additive white Gaussian noise matrix whose columns follow $\mathcal{CN}(\mathbf{0}_M, \sigma^2\mathbf{I}_M)$, where $\sigma^2$ denotes the noise power. $\mathbf{R}_X$, the covariance matrix of $\mathbf{X}$, can be defined as
$$\mathbf{R}_X = \mathbb{E}\left[\mathbf{X}\mathbf{X}^H\right] = \mathbf{A}(\Theta)\,\mathbf{R}_S\,\mathbf{A}(\Theta)^H + \sigma^2\mathbf{I}_M \approx \frac{\mathbf{X}\mathbf{X}^H}{D} \in \mathbb{C}^{M\times M}, \tag{4}$$
where $\mathbf{R}_S = \mathbb{E}[\mathbf{S}\mathbf{S}^H]$. As the signal sources are mutually uncorrelated, $\mathbf{R}_S$ is a diagonal matrix whose $p$-th diagonal element equals the power of the $p$-th signal source. We define $\mathbf{z} \in \mathbb{R}_{+}^{P\times 1}$ such that $\mathbf{z} = \mathrm{diag}(\mathbf{R}_S)$. Then, $\mathbf{R}_X$ can be rewritten as
$$\mathbf{R}_X = \sum_{p=1}^{P} \mathbf{z}(p)\,\mathbf{a}(\theta_p)\,\mathbf{a}(\theta_p)^H + \sigma^2\mathbf{I}_M, \tag{5}$$
where $\mathbf{z}(p)$ is the $p$-th element of $\mathbf{z}$, which denotes the power of the $p$-th signal source and satisfies $\mathbf{z}(p) = \mathbb{E}[\mathbf{s}_p^H \mathbf{s}_p]$. The vectorized covariance matrix $\mathbf{y}$ is given by
$$\mathbf{y} = \sum_{p=1}^{P} \mathbf{z}(p)\,\mathbf{a}(\theta_p)^* \otimes \mathbf{a}(\theta_p) + \sigma^2\mathbf{i} = \bar{\mathbf{A}}(\Theta)\,\mathbf{z} + \sigma^2\mathbf{i} = \mathrm{vec}(\mathbf{R}_X) \in \mathbb{C}^{M^2\times 1}, \tag{6}$$
where $\mathrm{vec}(\cdot)$ denotes the vectorization of a matrix, $\bar{\mathbf{A}}(\Theta) = [\mathbf{a}(\theta_1)^* \otimes \mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_P)^* \otimes \mathbf{a}(\theta_P)] \in \mathbb{C}^{M^2\times P}$, and $\mathbf{i} = \mathrm{vec}(\mathbf{I}_M)$.
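As a concrete check of the signal model above, the following sketch simulates the received signal and forms the vectorized covariance following (1)–(6). This is a minimal illustration, not the authors' code; the noise power and random source symbols are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P, D = 8, 2, 256                      # antennas, sources, snapshots (simulation settings)
theta_deg = np.array([80.4, 89.8])       # true DoAs in degrees
sigma2 = 0.1                             # noise power (illustrative assumption)

def manifold(theta_deg, M):
    # Array manifold vector a(theta) = [1, e^{j*pi*cos(theta)}, ..., e^{j(M-1)*pi*cos(theta)}]^T
    m = np.arange(M)
    return np.exp(1j * np.pi * m * np.cos(np.deg2rad(theta_deg)))

A = np.stack([manifold(t, M) for t in theta_deg], axis=1)   # array manifold matrix, M x P

# Unit-power uncorrelated source symbols and additive white Gaussian noise
S = (rng.standard_normal((P, D)) + 1j * rng.standard_normal((P, D))) / np.sqrt(2)
N = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, D)) + 1j * rng.standard_normal((M, D)))
X = A @ S + N                            # received signal, M x D

R_X = X @ X.conj().T / D                 # sample covariance approximating E[X X^H]
y = R_X.flatten(order="F")               # y = vec(R_X), column-stacking, length M^2
```

Note that column-major flattening (`order="F"`) matches the $\mathrm{vec}(\cdot)$ convention for which $\mathrm{vec}(\mathbf{a}\mathbf{a}^H) = \mathbf{a}^* \otimes \mathbf{a}$.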
Following the CS-based DoA estimation and machine learning-based DoA estimation frameworks [6,7,8,19,20], $\mathbf{y}$ can be represented as the product of a discretized angular grid and a sparse vector. Here, the discretized angular grid covers the potential DoAs, which range over $[0^\circ, 180^\circ]$. The discretized angular grid $\boldsymbol{\Phi}$ is given by
$$\boldsymbol{\Phi} = \left[\mathbf{a}(\vartheta_1)^* \otimes \mathbf{a}(\vartheta_1), \ldots, \mathbf{a}(\vartheta_G)^* \otimes \mathbf{a}(\vartheta_G)\right] \in \mathbb{C}^{M^2\times G}, \tag{7}$$
where $G$ denotes the size of the angular grid, and $\vartheta_g$ denotes the $g$-th discrete DoA such that $\vartheta_g = (180g/G)^\circ$. Assuming each of $\theta_1, \ldots, \theta_P$ coincides with one of $\vartheta_1, \ldots, \vartheta_G$, $\mathbf{y}$ can be rewritten as
$$\mathbf{y} = \boldsymbol{\Phi}\,\boldsymbol{\eta} + \sigma^2\mathbf{i} \in \mathbb{C}^{M^2\times 1}. \tag{8}$$
Here, $\boldsymbol{\eta} \in \mathbb{R}^{G\times 1}$ denotes the sparse vector, where the majority of the elements of $\boldsymbol{\eta}$ are 0. If $\theta_p = \vartheta_g$, then $\boldsymbol{\eta}(g) = \mathbf{z}(p)$.
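The grid construction in (7) and the sparse model (8) can be sketched as follows for an on-grid example; the two unit-power sources at 80° and 90° and the noise power are hypothetical illustration values.

```python
import numpy as np

M, G = 8, 180
m = np.arange(M)

def manifold(theta_deg):
    return np.exp(1j * np.pi * m * np.cos(np.deg2rad(theta_deg)))

# Discretized angular grid: columns a(v_g)^* kron a(v_g), with v_g = (180 g / G) degrees
grid_deg = 180.0 * np.arange(1, G + 1) / G
Phi = np.stack([np.kron(manifold(v).conj(), manifold(v)) for v in grid_deg], axis=1)

# On-grid example: unit-power sources at 80 and 90 degrees -> 80th and 90th grid angles
eta = np.zeros(G)
eta[79] = eta[89] = 1.0                  # 0-based indices of the 80th/90th grid points
sigma2 = 0.01
i_vec = np.eye(M).flatten(order="F")     # i = vec(I_M)
y = Phi @ eta + sigma2 * i_vec           # noiseless covariance model of (8)
```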

3. Off-Grid DoA Estimation via Cascaded Convolutional Neural Network

In practice, $\theta_1, \ldots, \theta_P$ may not correspond exactly to the potential DoAs in $\boldsymbol{\Phi}$. For example, when $\theta_p = 75.3^\circ$ and $G = 180$, none of the potential DoAs in $\boldsymbol{\Phi}$ matches $\theta_p$ exactly. Thus, we propose a DoA estimation algorithm based on a two-stage cascaded NN, which resolves the mismatch between the true DoAs and the discrete angular grid. In the first stage, the discrete angles nearest to the true DoAs are estimated by a CNN. In the second stage, the difference between the true DoAs and the nearest discrete angles is estimated by a DNN. The overall structure of the two-stage cascaded NN is given in Figure 1. Throughout the paper, the term first-stage network denotes the network that initially estimates the discrete angles, and the term second-stage network denotes the network that estimates the difference between the true DoAs and the nearest discrete angles.

3.1. First Stage: DoA Estimation via Convolutional Neural Network

The CNN used in the first stage follows the network structure in [20], where 1D convolutional layers extract features from the input and reconstruct $\boldsymbol{\eta}$. As the CNN in [20] does not use pooling, the number of rows of the input should equal the size of the output, which is $G$. To form an input whose number of rows equals $G$, $\tilde{\boldsymbol{\eta}}$ is formulated as follows.
$$\tilde{\boldsymbol{\eta}} = \boldsymbol{\Phi}^{\dagger}\,\mathbf{y} = \boldsymbol{\Phi}^H\left(\boldsymbol{\Phi}\boldsymbol{\Phi}^H\right)^{-1}\mathbf{y} \in \mathbb{C}^{G\times 1}. \tag{9}$$
As all elements of the input have to be real numbers, the input $\boldsymbol{\zeta}$ is given by
$$\boldsymbol{\zeta} = \left[\mathrm{real}(\tilde{\boldsymbol{\eta}}),\ \mathrm{imag}(\tilde{\boldsymbol{\eta}})\right] \in \mathbb{R}^{G\times 2}. \tag{10}$$
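A sketch of the input formation in (9) and (10). One caveat we assume here: for a ULA the columns of $\boldsymbol{\Phi}$ share many repeated spatial lags, so $\boldsymbol{\Phi}\boldsymbol{\Phi}^H$ can be rank-deficient; the sketch therefore uses the Moore–Penrose pseudo-inverse `np.linalg.pinv`, which coincides with $\boldsymbol{\Phi}^H(\boldsymbol{\Phi}\boldsymbol{\Phi}^H)^{-1}$ when that inverse exists.

```python
import numpy as np

M, G = 8, 180
m = np.arange(M)

def manifold(theta_deg):
    return np.exp(1j * np.pi * m * np.cos(np.deg2rad(theta_deg)))

grid_deg = 180.0 * np.arange(1, G + 1) / G
Phi = np.stack([np.kron(manifold(v).conj(), manifold(v)) for v in grid_deg], axis=1)

# Example measurement: one unit-power source sitting exactly on the 80-degree grid point
y = np.kron(manifold(80.0).conj(), manifold(80.0))

eta_tilde = np.linalg.pinv(Phi) @ y                        # eta~ = Phi^dagger y, length G
zeta = np.stack([eta_tilde.real, eta_tilde.imag], axis=1)  # real/imag parts -> G x 2 input
```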
The output of the $k$-th 1D convolutional layer, $\mathbf{c}^{(k)}$, can be represented as in [20]:
$$\mathbf{c}^{(k)} = \mathcal{Z}\left(f_{\mathrm{Act}}\left(\mathbf{W}^{(k)} * \mathbf{c}^{(k-1)} + \mathbf{b}^{(k)}\right)\right), \quad k = 1, \ldots, K, \tag{11}$$
where $*$ denotes the convolution, $\mathbf{c}^{(0)} = \boldsymbol{\zeta}$, and $K$ denotes the number of convolutional layers. $\mathbf{W}^{(k)}$ and $\mathbf{b}^{(k)}$ denote the convolution kernel and the bias of the $k$-th convolutional layer, respectively. $f_{\mathrm{Act}}(\cdot)$ denotes the activation function of the CNN. The padding operator $\mathcal{Z}(\cdot)$ adds zeros on both sides of the output of the activation function so that the size of $\mathbf{c}^{(k)}$ remains constant for $k = 1, \ldots, K$.
The loss function of the CNN is the mean squared error (MSE) $\|\hat{\boldsymbol{\eta}} - \boldsymbol{\eta}\|^2$, where $\hat{\boldsymbol{\eta}}$ denotes the estimate of $\boldsymbol{\eta}$. The convolution kernels and biases that minimize the loss function are given as in [20]:
$$\left\{\hat{\mathbf{W}}^{(k)}, \hat{\mathbf{b}}^{(k)}\right\}_{k=1}^{K} = \underset{\left\{\mathbf{W}^{(k)},\, \mathbf{b}^{(k)}\right\}_{k=1}^{K}}{\operatorname{argmin}} \left\|\hat{\boldsymbol{\eta}} - \boldsymbol{\eta}\right\|^2, \tag{12}$$
where $\hat{\mathbf{W}}^{(k)}$ and $\hat{\mathbf{b}}^{(k)}$ denote the convolution kernel and the bias of the $k$-th convolutional layer that minimize the loss function.
The input data and the output data of the first-stage network can be summarized as follows.
  • Input data: $\boldsymbol{\zeta} \in \mathbb{R}^{G\times 2}$
  • Output data: $\hat{\boldsymbol{\eta}} \in \mathbb{R}^{G\times 1}$, trained by $\boldsymbol{\eta}$
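To make the layer recursion in (11) concrete, the following sketch runs one forward pass of a four-layer 1D CNN with PReLU and zero padding, using randomly initialized (untrained) weights. The kernel lengths and channel counts follow the $25 \times 12$, $15 \times 6$, $5 \times 3$, $3 \times 1$ settings reported in the simulation section, interpreted here as (kernel length × number of filters); that interpretation, and padding the input rather than the activation output, are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def prelu(x, alpha=0.25):
    # PReLU: identity for positive inputs, slope alpha otherwise (alpha is learned in practice)
    return np.where(x > 0, x, alpha * x)

def conv1d_same(c, W, b, alpha=0.25):
    """One layer of (11): convolution + PReLU, zero-padded so the length G is preserved.
    c: (G, C_in), W: (k, C_in, C_out), b: (C_out,)."""
    k, _, C_out = W.shape
    G = c.shape[0]
    pad = (k - 1) // 2
    cp = np.pad(c, ((pad, k - 1 - pad), (0, 0)))   # padding operator Z(.)
    out = np.zeros((G, C_out))
    for g in range(G):
        out[g] = np.tensordot(cp[g:g + k], W, axes=([0, 1], [0, 1])) + b
    return prelu(out, alpha)

G = 180
c = rng.standard_normal((G, 2))                    # stand-in for the input zeta
layers = [(25, 2, 12), (15, 12, 6), (5, 6, 3), (3, 3, 1)]   # (kernel length, C_in, C_out)
for k, ci, co in layers:
    c = conv1d_same(c, 0.1 * rng.standard_normal((k, ci, co)), np.zeros(co))
eta_hat = c[:, 0]                                  # untrained stand-in for eta-hat, length G
```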

3.2. Second Stage: Resolving Mismatch via Deep Neural Network

The second-stage network, which follows the first-stage network, resolves the mismatch by estimating the tuning vector $\boldsymbol{\psi} \in \mathbb{R}^{G\times 1}$, which represents the difference between the true DoAs and the nearest discrete angles. After obtaining $\hat{\boldsymbol{\eta}}$ from the first-stage network, the second-stage network yields $\hat{\boldsymbol{\psi}}$ from $\boldsymbol{\zeta}$ and $\hat{\boldsymbol{\eta}}$, where $\hat{\boldsymbol{\psi}}$ denotes the estimate of $\boldsymbol{\psi}$. Figure 2 illustrates how the DoAs are estimated when $G = 180$ and $\Theta = [80.4^\circ, 89.8^\circ]^T$. If the estimation is accurate, the 80th and 90th elements of $\hat{\boldsymbol{\eta}}$ are $\mathbf{z}(1)$ and $\mathbf{z}(2)$, respectively, and the 80th and 90th elements of $\hat{\boldsymbol{\psi}}$ are $+0.4$ and $-0.2$, respectively. To obtain the final estimated DoAs, the initial DoAs are first estimated from the discretized angular grid by finding peaks in $\hat{\boldsymbol{\eta}}$; then, the initial DoAs are shifted by the corresponding differences in $\hat{\boldsymbol{\psi}}$.
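The combination step can be sketched directly: find the peaks of $\hat{\boldsymbol{\eta}}$, map them to grid angles, and shift each by the corresponding entry of $\hat{\boldsymbol{\psi}}$. The idealized network outputs below reproduce the Figure 2 example; they are hand-constructed, not produced by a trained network.

```python
import numpy as np

G, P = 180, 2
grid_deg = 180.0 * np.arange(1, G + 1) / G     # v_g = (180 g / G) degrees

# Idealized outputs for Theta = [80.4, 89.8] degrees:
eta_hat = np.zeros(G)
eta_hat[79] = eta_hat[89] = 1.0                # peaks on the 80th/90th grid angles (0-based)
psi_hat = np.zeros(G)
psi_hat[79] = +0.4
psi_hat[89] = -0.2

def final_doas(eta_hat, psi_hat, grid_deg, P):
    # pick the P strongest grid points, then shift each grid angle by its tuning value
    idx = np.argsort(eta_hat)[-P:]
    return np.sort(grid_deg[idx] + psi_hat[idx])

doas = final_doas(eta_hat, psi_hat, grid_deg, P)   # recovers [80.4, 89.8]
```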
As the input of the DNN must be a real vector, the input of the DNN, $\boldsymbol{\chi}$, is formulated as
$$\boldsymbol{\chi} = \left[\mathrm{vec}(\boldsymbol{\zeta})^T,\ \hat{\boldsymbol{\eta}}^T\right]^T = \left[\mathrm{real}(\tilde{\boldsymbol{\eta}})^T,\ \mathrm{imag}(\tilde{\boldsymbol{\eta}})^T,\ \hat{\boldsymbol{\eta}}^T\right]^T \in \mathbb{R}^{3G\times 1}. \tag{13}$$
Letting $L$, $\mathbf{d}^{(l)}$, and $I^{(l)}$ denote, respectively, the number of dense layers, the output of the $l$-th layer, and the size of $\mathbf{d}^{(l)}$, the $j$-th element $\mathbf{d}^{(l)}(j)$ is given by
$$\mathbf{d}^{(l)}(j) = \tanh\left(\sum_{i=1}^{I^{(l-1)}} \mathbf{U}^{(l)}(j,i)\,\mathbf{d}^{(l-1)}(i) + \mathbf{v}^{(l)}(j)\right), \quad l = 1, \ldots, L, \tag{14}$$
where $\mathbf{d}^{(0)} = \boldsymbol{\chi}$. $\mathbf{U}^{(l)}$ and $\mathbf{v}^{(l)}$ denote the weights and the bias of the $l$-th layer, respectively. In the second stage, the hyperbolic tangent function $\tanh$ is employed as the activation function because the output vector of the second-stage network must be able to represent negative values. For example, if $G = 180$ so that the angular grid is discretized into units of $1^\circ$, the elements of $\hat{\boldsymbol{\psi}}$ should be able to range between $-0.5^\circ$ and $+0.5^\circ$. Note that ReLU and sigmoid cannot be employed as the activation function of the second-stage network, as they can only yield values equal to or greater than 0.
The loss function of the DNN is the MSE $\|\hat{\boldsymbol{\psi}} - \boldsymbol{\psi}\|^2$. The weights and biases that minimize the loss function are denoted as
$$\left\{\hat{\mathbf{U}}^{(l)}, \hat{\mathbf{v}}^{(l)}\right\}_{l=1}^{L} = \underset{\left\{\mathbf{U}^{(l)},\, \mathbf{v}^{(l)}\right\}_{l=1}^{L}}{\operatorname{argmin}} \left\|\hat{\boldsymbol{\psi}} - \boldsymbol{\psi}\right\|^2, \tag{15}$$
where $\hat{\mathbf{U}}^{(l)}$ and $\hat{\mathbf{v}}^{(l)}$ denote, respectively, the weights and the bias of the $l$-th dense layer that minimize the loss function.
The input data and the output data of the second stage DNN can be summarized as follows.
  • Input data: $\boldsymbol{\chi} \in \mathbb{R}^{3G\times 1}$ ($\boldsymbol{\zeta}$ and $\hat{\boldsymbol{\eta}}$)
  • Output data: $\hat{\boldsymbol{\psi}} \in \mathbb{R}^{G\times 1}$, trained by $\boldsymbol{\psi}$
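A forward pass of the second-stage dense network in (14) can be sketched as follows with random (untrained) weights. The layer sizes follow the stated settings ($L = 4$, 200 nodes per hidden layer); making the output layer size $G$, so that $\hat{\boldsymbol{\psi}}$ has one entry per grid point, is our assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
G, H = 180, 200                      # grid size and nodes per layer (Table 1 settings)
chi = rng.standard_normal(3 * G)     # stand-in for the input chi of (13)

def dense_tanh_stack(d, sizes, rng):
    # d^(l) = tanh(U^(l) d^(l-1) + v^(l)); tanh keeps every output in (-1, 1),
    # which covers the required tuning range [-0.5, +0.5] degrees
    for n_in, n_out in sizes:
        U = 0.1 * rng.standard_normal((n_out, n_in))
        v = np.zeros(n_out)
        d = np.tanh(U @ d + v)
    return d

sizes = [(3 * G, H), (H, H), (H, H), (H, G)]    # L = 4 dense layers, output size G
psi_hat = dense_tanh_stack(chi, sizes, rng)     # untrained stand-in for psi-hat
```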

4. Simulation Results and Discussions

4.1. Simulation Settings

The performance of the proposed algorithm is compared with that of L1-SVD [6] in this section. Although the proposed algorithm is motivated by the work in [22], a comparison with [22] is omitted, as the algorithm in [22] exploits the co-prime array [24] and targets wideband signals. We set $M = 8$ and $P = 2$. For all algorithms, the grid size $G$ is set to 180, so the discrete grid spacing is $1^\circ$. The signal-to-noise ratio (SNR) is defined as
$$\mathrm{SNR} = \frac{\mathrm{Trace}(\mathbf{R}_S)}{\sigma^2 M} = \frac{1}{\sigma^2 M}\sum_{p=1}^{P}\mathbf{z}(p). \tag{16}$$
The root mean square error (RMSE) is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{PQ}\sum_{q=1}^{Q}\sum_{p=1}^{P}\left(\hat{\theta}_p^{\,q} - \theta_p^{\,q}\right)^2}, \tag{17}$$
where $Q$ is the number of Monte Carlo trials for the RMSE calculation, and $\theta_p^{\,q}$ and $\hat{\theta}_p^{\,q}$ denote the true DoA and the estimated DoA of the $p$-th signal source in the $q$-th trial, respectively.
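The RMSE metric of (17) can be computed as below; the sample trial values are hypothetical.

```python
import numpy as np

def rmse(theta_hat, theta_true):
    """RMSE over Q trials and P sources, per (17); inputs are (Q, P) arrays of DoAs in degrees."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    Q, P = theta_true.shape
    return np.sqrt(np.sum((theta_hat - theta_true) ** 2) / (P * Q))

# Two Monte Carlo trials, two sources (hypothetical estimates)
true = [[80.4, 89.8], [80.4, 89.8]]
est  = [[80.4, 89.8], [80.0, 90.0]]
err = rmse(est, true)    # sqrt((0.4^2 + 0.2^2) / 4) = sqrt(0.05)
```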
For training, a total of 106,800 training samples are used, of which 20 percent are used for testing. When generating the received signal for training, the SNR is randomly set between 0 and 10 dB while the number of snapshots is set to 256. DoAs are randomly set within $[30^\circ, 150^\circ]$, and $\mathbf{z} = [1, 1]^T$. Once the DoAs and SNR are determined, the received signal $\mathbf{X}$ is generated as in (3), and the covariance matrix $\mathbf{R}_X$ is formulated as in (4). Using $\mathbf{R}_X$, the input data of the first-stage network $\boldsymbol{\zeta}$ are formulated by following the procedure of (6)–(8) and (10). The training labels of the first-stage network $\boldsymbol{\eta}$ are generated from $\mathbf{z}$ and the DoAs: if $\Theta = [80.4^\circ, 89.8^\circ]^T$ and $\mathbf{z} = [1, 1]^T$, $\boldsymbol{\eta}$ becomes a sparse vector whose 80th and 90th elements are 1. Finally, the first-stage network is trained on 85,440 different pairs of $\boldsymbol{\zeta}$ and $\boldsymbol{\eta}$. The input data of the second-stage network $\boldsymbol{\chi}$ are generated as in (13), where $\boldsymbol{\chi}$ is a combination of the input and output of the first-stage network. The training labels of the second-stage network $\boldsymbol{\psi}$ are generated from the DoAs: if $\Theta = [80.4^\circ, 89.8^\circ]^T$, $\boldsymbol{\psi}$ becomes a sparse vector whose 80th and 90th elements are $+0.4$ and $-0.2$, respectively. The second-stage network is trained on 85,440 different pairs of $\boldsymbol{\chi}$ and $\boldsymbol{\psi}$.
For the first-stage network, a DNN is implemented alongside the CNN to compare performance across network types. When the DNN is used for the first stage, the input $\boldsymbol{\zeta} \in \mathbb{R}^{G\times 2}$ is vectorized, as the DNN only supports vectors for both input and output. The output is identical to $\hat{\boldsymbol{\eta}}$ for both the CNN- and DNN-based first-stage networks. Regardless of the type of network used in the first stage, three activation functions—PReLU, ReLU, and tanh—are tested, and the number of convolutional or dense layers in the first-stage network is set to 4. If the CNN is used in the first stage, the kernel sizes of the layers are set to $25 \times 12$, $15 \times 6$, $5 \times 3$, and $3 \times 1$. If the DNN is used in the first stage, the number of nodes in every layer is set to 200. The number of dense layers in the second-stage network, $L$, equals 4, with 200 nodes in every layer. The numbers of epochs of the first-stage network and the second-stage network are set to 300 and 500, respectively, and the batch size is uniformly set to 64. The overall training settings are summarized in Table 1.

4.2. Performance Analysis According to Hyperparameters

The performance of an NN varies with its hyperparameters, which include the number of epochs and the type of activation function. The performance of the first-stage network is compared with respect to the number of epochs and the type of activation function in Figure 3. Figure 3 shows that the test MSE increases in the order of CNN with PReLU, DNN with PReLU, CNN with tanh, DNN with tanh, and DNN with ReLU, where the test MSE is the loss function evaluated on the test data; PReLU yields a low test MSE while tanh yields a high test MSE. Overall, the test MSE tends to converge around the 200th epoch, except for the DNN with tanh, which converges around the 700th epoch. Moreover, the CNN has a lower test MSE than the DNN when an identical activation function is used. From Figure 3, we conclude that a CNN with PReLU activation is best suited for DoA estimation in the first stage. Note that the test MSE of the second-stage network according to hyperparameters is omitted, as its activation function is fixed to tanh; for the second-stage network, the test MSE converges around the 500th epoch.
Figure 4 shows the outputs of the first-stage network and the second-stage network according to the type of NN and the activation function. Only the results of CNN with PReLU, CNN with ReLU, and DNN with ReLU are presented, as the others have a higher test MSE; the figures in the second row show the true DoAs and the final estimated DoAs determined by the two-stage cascaded NN. In Figure 4, the CNN with PReLU yields distinct peaks around the true DoAs in the first stage and achieves the highest estimation accuracy with the help of the second-stage network. However, the DNN with ReLU yields obscure peaks in the first stage, and the corresponding output of the second stage shows poor performance. From Figure 3 and Figure 4, we conclude that a CNN with PReLU activation is the best option and that the output of the first-stage network affects the output of the second-stage network.

4.3. Analysis on Estimation Accuracy and Resolution

Before presenting the simulation results, the terms used in the figures and analysis are explained to avoid confusion. The term single-stage denotes DoA estimation without the second-stage network, and the term two-stage means that the second-stage network is added after the first-stage network. The type of the first-stage network, either CNN or DNN, is appended after the number of stages. For example, two-stage CNN means that a second-stage network follows the first-stage network and the first-stage network is a CNN. Since Figure 3 shows that PReLU outperforms the other activation functions, the activation function of the first-stage network is fixed to PReLU.
Figure 5 shows the root mean square error (RMSE) of the DoA estimation algorithms as the SNR and the number of snapshots vary. The proposed two-stage CNN achieves the best estimation accuracy among the compared algorithms, except when very few snapshots are used. When training the network, the number of snapshots is fixed to 256, so the condition with fewer snapshots is not covered in training. Thus, the RMSE of L1-SVD becomes lower than that of the two-stage CNN and single-stage CNN when the number of snapshots is below 18. In Figure 5a,b, the RMSE of the single-stage CNN, single-stage DNN, and L1-SVD does not fall below $0.2^\circ$. When the second-stage network is added after the CNN, the RMSE falls below $0.2^\circ$ by resolving the mismatch between the true DoAs and the discrete angular grid. However, when DoAs are incorrectly estimated in the first stage, the second-stage network does not improve the RMSE, as the incorrect output of the first-stage network degrades the performance of the second-stage network.
To analyze the resolution of the DoA estimation algorithms, we present the RMSE as the difference between two adjacent DoAs varies. We define the distance between two adjacent DoAs as $\tau = (\cos\theta_1 - \cos\theta_2)/2$, where $\tau$ is used as a criterion of resolution in [25,26,27]. Figure 6 shows the RMSE versus $\tau$ and demonstrates that the second-stage network helps improve the RMSE. The RMSE of the single-stage CNN and the two-stage CNN converges when $\tau = 0.02$, and the RMSE of the single-stage DNN and the two-stage DNN converges when $\tau = 0.03$. The RMSE of L1-SVD is higher than that of the CNN-based and DNN-based DoA estimation at every $\tau$. From this result, we conclude that the CNN-based DoA estimation has the highest resolution and that the second-stage network further improves the estimation accuracy.

5. Conclusions

In this paper, we propose an off-grid DoA estimation algorithm based on a two-stage cascaded NN that resolves the mismatch induced by the discretized angular grid. The first-stage network employs a CNN, with a covariance matrix resized to fit the output dimension of the CNN used as input. The first-stage network estimates initial DoAs that are represented as discrete angles, so there is a mismatch between the true DoAs and the initially estimated DoAs. The second-stage network employs a DNN, with both the input and output of the first-stage network used as input; its output is the tuning vector representing the difference between the true DoAs and the nearest discrete angles. Using the outputs of both networks, the final estimated DoAs are obtained. Simulation results show that the cascaded NN improves the RMSE by resolving the mismatch induced by the discretized grid. Furthermore, the results show that a CNN with PReLU activation is the best option for DoA estimation considering both estimation accuracy and resolution. The proposed algorithm can be an attractive option for applications such as far-field localization and far-field radar, which require high estimation accuracy since a small DoA estimation error may cause large performance degradation. However, the performance of the proposed algorithm is analyzed only when the number of signal sources is fixed. In practice, the number of signal sources can vary, and the performance of the proposed algorithm may degrade accordingly. Thus, a method that addresses this issue needs to be studied for the practical implementation of NN-based DoA estimation algorithms.

Author Contributions

Conceptualization, H.C.; Methodology, H.C. and H.S.; Software, H.C. and H.S.; Validation, H.C., H.S., and S.K.; Formal analysis, H.C.; Investigation, H.C. and H.S.; Resources, H.C., H.S., J.J., D.L., and S.K.; Data curation, H.C. and H.S.; Writing—original draft preparation, H.C. and S.K.; Visualization, H.C.; Supervision, S.K.; Project administration, J.J., D.L., and S.K.; Funding acquisition, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Agency for Defense Development.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, J.C.; Yao, K.; Hudson, R.E. Source localization and beamforming. IEEE Signal Process. Mag. 2002, 19, 30–39.
  2. Wymeersch, H.; Seco-Granados, G.; Destino, G.; Dardari, D.; Tufvesson, F. 5G mmWave positioning for vehicular networks. IEEE Wirel. Commun. 2017, 24, 80–86.
  3. Eom, J.; Kim, H.; Lee, S.H.; Kim, S. DNN-assisted cooperative localization in vehicular networks. Energies 2019, 12, 2758.
  4. Hasch, J.; Topak, E.; Schnabel, R.; Zwick, T.; Weigel, R.; Waldschmidt, C. Millimeter-wave technology for automotive radar sensors in the 77 GHz frequency band. IEEE Trans. Microw. Theory Tech. 2012, 60, 845–860.
  5. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
  6. Malioutov, D.; Cetin, M.; Willsky, A.S. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 2005, 53, 3010–3022.
  7. Gorodnitsky, I.F.; Rao, B.D. Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. IEEE Trans. Signal Process. 1997, 45, 600–616.
  8. Yang, Z.; Xie, L.; Zhang, C. Off-grid direction of arrival estimation using sparse Bayesian inference. IEEE Trans. Signal Process. 2013, 61, 38–43.
  9. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 1999, 20, 33–61.
  10. Wipf, D.P.; Rao, B.D. Sparse Bayesian learning for basis selection. IEEE Trans. Signal Process. 2004, 52, 2153–2164.
  11. Shen, Q.; Liu, W.; Cui, W.; Wu, S. Underdetermined DOA estimation under the compressive sensing framework: A review. IEEE Access 2016, 4, 8865–8878.
  12. Chi, Y.; Scharf, L.L.; Pezeshki, A.; Calderbank, A.R. Sensitivity to basis mismatch in compressed sensing. IEEE Trans. Signal Process. 2011, 59, 2182–2195.
  13. Pastorino, M.; Randazzo, A. A smart antenna system for direction of arrival estimation based on a support vector regression. IEEE Trans. Antennas Propag. 2005, 53, 2161–2168.
  14. Randazzo, A.; Abou-Khousa, M.A.; Pastorino, M.; Zoughi, R. Direction of arrival estimation based on support vector regression: Experimental validation and comparison with MUSIC. IEEE Antennas Wirel. Propag. Lett. 2007, 6, 379–382.
  15. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  16. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  17. Huang, H.; Yang, J.; Huang, H.; Song, Y.; Gui, G. Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system. IEEE Trans. Veh. Technol. 2018, 67, 8549–8560.
  18. Guo, Y.; Zhang, Z.; Huang, Y.; Zhang, P. DOA estimation method based on cascaded neural network for two closely spaced sources. IEEE Signal Process. Lett. 2020, 27, 570–574.
  19. Liu, Z.; Zhang, C.; Yu, P.S. Direction-of-arrival estimation based on deep neural networks with robustness to array imperfections. IEEE Trans. Antennas Propag. 2018, 66, 7315–7327.
  20. Wu, L.; Liu, Z.; Huang, Z. Deep convolution network for direction of arrival estimation with sparse prior. IEEE Signal Process. Lett. 2019, 26, 1688–1692.
  21. Xiang, H.; Chen, B.; Yang, T.; Liu, D. Improved de-multipath neural network models with self-paced feature-to-feature learning for DOA estimation in multipath environment. IEEE Trans. Veh. Technol. 2020, 69, 5068–5078.
  22. Shen, Q.; Cui, W.; Liu, W.; Wu, S.; Zhang, Y.D.; Amin, M.G. Underdetermined wideband DOA estimation of off-grid sources employing the difference co-array concept. Signal Process. 2017, 130, 299–304.
  23. Krim, H.; Viberg, M. Two decades of array signal processing research: The parametric approach. IEEE Signal Process. Mag. 1996, 13, 67–94.
  24. Vaidyanathan, P.P.; Pal, P. Sparse sensing with co-prime samplers and arrays. IEEE Trans. Signal Process. 2011, 59, 573–586.
  25. Candès, E.; Fernandez-Granda, C. Towards a mathematical theory of super-resolution. Commun. Pure Appl. Math. 2014, 67.
  26. Tang, G.; Bhaskar, B.N.; Shah, P.; Recht, B. Compressed sensing off the grid. IEEE Trans. Inf. Theory 2013, 59, 7465–7490.
  27. Chi, Y.; Ferreira Da Costa, M. Harnessing sparsity over the continuum: Atomic norm minimization for superresolution. IEEE Signal Process. Mag. 2020, 37, 39–57.
Figure 1. The structure of the two-stage cascaded NN. The first-stage network is built from 1D convolutional layers, and the second-stage network from dense layers.
Figure 2. Determination of the final estimated DoAs using the output of the first-stage network, η̂, and the output of the second-stage network, ψ̂.
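The combination step in Figure 2 can be sketched in NumPy. This is a hypothetical illustration, not the authors' code: the grid layout, the function name `final_doas`, and the convention that the tuning vector ψ̂ holds per-grid-point offsets in units of the grid spacing are all assumptions made for the sketch.

```python
import numpy as np

# Assumed setup: a D-point uniform grid over [0, 180) degrees.
D = 256
delta = 180.0 / D                    # grid spacing (degrees)
grid = np.arange(D) * delta          # discrete angular grid

def final_doas(eta_hat, psi_hat, num_sources):
    """Shift the first-stage on-grid peaks by the second-stage offsets."""
    # Pick the num_sources strongest grid points from the first stage...
    idx = np.argsort(eta_hat)[-num_sources:]
    # ...then move each by its tuning offset (in units of the spacing).
    return np.sort(grid[idx] + psi_hat[idx] * delta)

# Toy example: two off-grid sources at 45.4° and 90.9°. The first stage
# marks their nearest grid bins; the tuning vector holds the remainders.
eta = np.zeros(D)
psi = np.zeros(D)
i1, i2 = int(45.4 / delta), int(90.9 / delta)
eta[i1] = eta[i2] = 1.0
psi[i1] = (45.4 - grid[i1]) / delta
psi[i2] = (90.9 - grid[i2]) / delta
print(final_doas(eta, psi, 2))
```

With an exact tuning vector the sketch recovers both off-grid angles, which is the mechanism that lifts the grid-induced accuracy limit described in the abstract.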
Figure 3. The test mean squared error (MSE) over epochs. The convolutional neural network (CNN)-based first-stage network and the deep neural network (DNN)-based first-stage network are compared, and three types of activation functions—PReLU, ReLU, and tanh—are used.
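For reference, the three activation functions compared in Figure 3 can be written in plain NumPy. PReLU's negative-side slope `a` is a learned parameter in the actual networks; the value 0.25 below is just a common initial value, not one taken from the paper.

```python
import numpy as np

def relu(x):
    # Clips negative inputs to zero; gradient is zero for x < 0.
    return np.maximum(x, 0.0)

def prelu(x, a=0.25):
    # Scales negative inputs by a learned slope a, so gradients survive.
    return np.where(x > 0.0, x, a * x)

def tanh(x):
    # Saturates toward ±1 for large |x|.
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), prelu(x), tanh(x))
```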
Figure 4. Outputs of the first-stage network and the second-stage network according to the type of NN and the activation function. The first row presents outputs of the first-stage network, and the second row presents outputs of the second-stage network.
Figure 5. (a) The RMSE for different SNRs, with Q = 1000 and D = 256. (b) The RMSE for different numbers of snapshots, with Q = 1000 and an SNR of 10 dB.
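The RMSE plotted in Figures 5 and 6 is the usual root-mean-squared error over the Monte Carlo trials; a minimal sketch, assuming `est` and `true` hold matched estimated and true DoAs in degrees:

```python
import numpy as np

def rmse(est, true):
    # Root of the mean squared DoA error over all trials and sources.
    err = np.asarray(est) - np.asarray(true)
    return np.sqrt(np.mean(err ** 2))

# Toy example with two sources: errors of 0.2° and −0.3°.
print(rmse([45.2, 90.7], [45.0, 91.0]))
```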
Figure 6. The RMSE versus τ, with Q = 1000 and an SNR of 10 dB. |θ₁ − θ₂| is about 1.7° around 90° when τ = 0.015 and about 5.2° around 90° when τ = 0.045.
Table 1. Simulation settings for training.

                            First Stage                      Second Stage
                            CNN             DNN              DNN
  Activation function       PReLU, ReLU, tanh                tanh
  Number of training data   106,800 (85,440 for training and 21,360 for test)
  Number of layers (K, L)   4
  Kernel size               25 × 12, 15 × 6, 5 × 3, 3 × 1    -
  Number of nodes           -                                200
  Batch size                64
  Number of epochs          300                              500
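Table 1 can be turned into a rough PyTorch sketch of the two networks. This is an assumption-laden illustration, not the authors' implementation: each "kernel size" entry is read as (kernel length × number of filters), the input layout, paddings, the final dense mapping onto the D-point grid, and the sigmoid on the first-stage output are all guesses made to produce a self-contained, shape-consistent model.

```python
import torch
import torch.nn as nn

D = 256  # grid size, matching Figure 5

# First stage (CNN variant): four 1D conv layers with PReLU, reading
# Table 1's kernel sizes as 25x12, 15x6, 5x3, 3x1; "same" padding and
# the closing Linear layer onto the grid are assumptions.
first_stage = nn.Sequential(
    nn.Conv1d(1, 12, kernel_size=25, padding=12), nn.PReLU(),
    nn.Conv1d(12, 6, kernel_size=15, padding=7), nn.PReLU(),
    nn.Conv1d(6, 3, kernel_size=5, padding=2), nn.PReLU(),
    nn.Conv1d(3, 1, kernel_size=3, padding=1), nn.PReLU(),
    nn.Flatten(),
    nn.Linear(D, D), nn.Sigmoid(),          # on-grid spectrum eta_hat
)

# Second stage (DNN): four dense layers of 200 nodes with tanh, per
# Table 1, producing the tuning vector psi_hat over the same grid.
second_stage = nn.Sequential(
    nn.Linear(D, 200), nn.Tanh(),
    nn.Linear(200, 200), nn.Tanh(),
    nn.Linear(200, 200), nn.Tanh(),
    nn.Linear(200, D), nn.Tanh(),
)

x = torch.randn(64, 1, D)                   # one batch (Table 1: 64)
eta_hat = first_stage(x)
psi_hat = second_stage(eta_hat)
print(eta_hat.shape, psi_hat.shape)
```

The cascade — the second stage consuming the first stage's output — mirrors Figure 1; only the layer counts, widths, and activations come from Table 1.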
Chung, H.; Seo, H.; Joo, J.; Lee, D.; Kim, S. Off-Grid DoA Estimation via Two-Stage Cascaded Neural Network. Energies 2021, 14, 228. https://doi.org/10.3390/en14010228