A Full Loading-Based MVDR Beamforming Method by Backward Correction of the Steering Vector and Reconstruction of the Covariance Matrix

Zhou, Jing; Bao, Changchun

doi:10.3390/app13010285



Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Corresponding author: Changchun Bao.

Appl. Sci. 2023, 13(1), 285; https://doi.org/10.3390/app13010285

Submission received: 26 October 2022 / Revised: 20 December 2022 / Accepted: 21 December 2022 / Published: 26 December 2022

(This article belongs to the Special Issue Advances in Speech and Language Processing)


Abstract

To improve the performance of the diagonal loading-based minimum variance distortionless response (MVDR) beamformer, this paper proposes a full loading-based MVDR beamforming method. Unlike conventional diagonal loading methods, the proposed method combines the backward correction of the steering vector of the target source with the reconstruction of the covariance matrix. First, a full loading matrix is constructed by linear combination and used to correct the steering vector of the target source backward. Second, exploiting the spatial sparsity of the sound sources, an appropriate loading matrix is constructed to further suppress interferences. Third, the spatial response power is used to derive a more accurate direction of arrival (DOA) of the target source, which iteratively yields a more accurate steering vector of the target source and a more effective covariance matrix. Simulation results show that the proposed method can effectively suppress interferences and noise.

Keywords:

speech enhancement; beamforming; minimum variance distortionless response (MVDR); backward calibration; interference suppression

1. Introduction

The minimum variance distortionless response (MVDR) beamformer has proved effective for suppressing interferences and noise when an effective covariance matrix of the interference-plus-noise (CMIN) and an accurate steering vector of the target source are available [1,2,3,4,5]. Since both the CMIN and the steering vector of the target source are unknown in the conventional MVDR beamformer (CMB), its application is greatly limited. Many improved MVDR beamforming methods [6,7,8,9] have therefore been proposed. Among them, diagonal loading-based MVDR beamformers [9,10,11,12,13,14,15,16] have recently received considerable attention. Diagonal loading adds a value to the diagonal elements of the covariance matrix of the observed signal (CMOS) to reduce the spread (diffusion) of the noise eigenvalues, which reduces the estimation error of the CMOS and the sensitivity to steering vector mismatch [9,10]. However, an excessively large loading value makes the CMB degenerate into a fixed beamformer, whereas an excessively small one hardly improves the performance of the CMB [11,12].

To find an appropriate loading value, many diagonal loading methods have been proposed in recent decades. They can be divided into three categories. The first category is the single loading parameter-based diagonal loading methods, in which the loading matrix is the product of the loading value and the unit matrix and is added to the CMOS, such as the Hoerl–Kennard–Baldwin (HKB) method [9] and the load-to-white-noise ratio (LNR) method [10]. Although these two methods can reduce the spread of the noise eigenvalues with an appropriate loading value, they are sensitive to specific user-defined parameters. Hence, parameter-free methods have been proposed, such as the spatially matched filter (SMF) method [11] and the bounded perturbation regularization (BPR) method [12]. Nevertheless, these methods usually depend on a priori assumptions or constraints, which can easily make the loading value too large or too small. The second category is the multiple loading parameters-based diagonal loading methods, which usually form a linear combination of the CMOS and the loading matrix, such as the general linear combination (GLC) method [13] and the noise reduction preprocessing into a truncated minimum mean square error (NRP-TMMSE) method [14]. Although these methods offer more flexibility in designing the loading value, they are also sensitive to the number of microphones and snapshots. Moreover, their optimization objectives do not aim to make the loaded covariance matrix approach the CMIN, which limits their ability to mitigate the self-cancellation of the main lobe. The third category is the iterative diagonal loading methods, which try to improve the performance of the CMB by performing diagonal loading iteratively, such as the parameter-free Landweber iteration (PFLI) method [15] and the iterative diagonally loaded sample matrix inverse (IDL-SMI) method [16]. These methods use iteration to compensate for the limitation of a single diagonal loading operation, but they are sensitive to the loading value or to the weighting value used for the iterative update.

Currently available diagonal loading-based MVDR beamforming methods are suitable for cases in which the steering vector of the target source has no or only a small mismatch [17]. However, recent studies have mainly focused on designing the loading value [18,19,20,21] to obtain an effective compromise between noise reduction and interference suppression, whereas the essential problem of steering vector mismatch has been ignored. In addition, diagonal loading-based methods ignore the influence of the off-diagonal elements.

To address these problems, this paper proposes a full loading-based MVDR beamforming method that combines the advantages of multiple loading parameters and iteration. In this method, a full loading matrix is constructed to correct the steering vector of the target source backward, and an appropriate loading matrix is constructed to further suppress the interferences. In addition, the spatial response power is used to derive a more accurate direction of arrival (DOA) of the target source.

The remainder of this paper is organized as follows: Section 2 reviews the conventional diagonal loading methods; Section 3 presents the proposed full loading method; Section 4 presents the simulations and analyzes the results; and Section 5 draws the conclusions.

2. Conventional Diagonal Loading Methods

Consider a free-field, far-field scenario with one target source, G interference sources and environmental noise, in which a uniform linear array (ULA) of M (G < M) microphones picks up the multi-channel speech signals. Assuming that the target source, interference sources and noise are mutually independent, the MVDR beamformer at the kth frequency bin of the lth frame minimizes the output power of the interference plus noise subject to a distortionless constraint on the target source [1,2,3,4,5]:

\min_{w} w^{H} R_{IN} w \quad \text{s.t.} \quad w^{H} a_{0} = 1

(1)

where w ∈ ℂ^(M×1) is the weighting vector of the MVDR beamformer, R_IN ∈ ℂ^(M×M) is the real CMIN, a₀ ∈ ℂ^(M×1) is the real steering vector of the target source, the superscript “H” denotes the conjugate transpose and ℂ denotes the set of complex numbers.

Equation (1) is commonly solved by the Lagrange multiplier method [3,5] as follows:

w = \frac{R_{I N}^{- 1} a_{0}}{a_{0}^{H} R_{I N}^{- 1} a_{0}}

(2)
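The Lagrange-multiplier step behind Equation (2) can be written out as follows. This is a standard sketch rather than a derivation quoted from [3,5]; it treats w and its conjugate w* as independent variables (Wirtinger calculus) and uses the fact that R_IN is Hermitian positive definite, so that a₀^H R_IN⁻¹ a₀ is real and positive:

```latex
\begin{aligned}
\mathcal{L}(w, λ) &= w^{H} R_{IN} w + λ (1 - w^{H} a_{0}) + λ^{*} (1 - a_{0}^{H} w) \\
\frac{\partial \mathcal{L}}{\partial w^{*}} &= R_{IN} w - λ a_{0} = 0 \;\Rightarrow\; w = λ R_{IN}^{-1} a_{0} \\
w^{H} a_{0} &= λ^{*} a_{0}^{H} R_{IN}^{-1} a_{0} = 1 \;\Rightarrow\; λ = \frac{1}{a_{0}^{H} R_{IN}^{-1} a_{0}} \\
w &= \frac{R_{IN}^{-1} a_{0}}{a_{0}^{H} R_{IN}^{-1} a_{0}}
\end{aligned}
```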

Since R_IN is unknown in practice, it is usually replaced by the CMOS R_XX ∈ ℂ^(M×M), i.e.:

R_{I N} \to R_{X X} = \frac{1}{J} \sum_{j = l + 1 - J}^{l} x (j) x^{H} (j)

(3)

where J is the number of snapshots and x(j) ∈ ℂ^(M×1) is the jth snapshot of the observed signal.

Substituting Equation (3) into Equation (2) and using the estimated steering vector â₀ ∈ ℂ^(M×1) of the target source, the weighting vector of the CMB can be expressed as follows:

w_{C M B} = \frac{R_{X X}^{- 1} {\hat{a}}_{0}}{{\hat{a}}_{0}^{H} R_{X X}^{- 1} {\hat{a}}_{0}}

(4)

Diagonal loading methods aim to improve the performance of the CMB by adding values to the diagonal elements of R_XX before the matrix inversion [8,9,11]. The weighting vector w_DL ∈ ℂ^(M×1) of the MVDR beamformer obtained with a single loading parameter-based diagonal loading method therefore takes the general form:

w_{D L} = \frac{{(R_{X X} + ξ I)}^{- 1} {\hat{a}}_{0}}{{\hat{a}}_{0}^{H} {(R_{X X} + ξ I)}^{- 1} {\hat{a}}_{0}}

(5)

where ξ is the loading value and I ∈ ℂ^(M×M) is the unit matrix.

Clearly, ξ is the key parameter for improving the CMB. When ξ = 0, Equation (5) reduces to the CMB. When ξ is much larger than the powers of the sound sources, Equation (5) degenerates into the delay-and-sum (DAS) beamformer [9,10,11,12]. Table 1 lists six diagonal loading methods for the CMB, namely HKB, LNR, SMF, BPR, GLC and NRP-TMMSE, where “‖·‖₂” denotes the 2-norm.
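The role of ξ in Equation (5) can be checked numerically. The following is a minimal NumPy sketch, not the authors' code; the ULA steering model `ula_steering` (angle measured from the array axis, f = 1 kHz), the toy two-source scene and all helper names are assumptions made for illustration. It builds R_XX with Equation (3) and evaluates Equation (5) with ξ = 0 (the CMB) and with a very large ξ, which should approach the DAS weights â₀/M for a steering vector with unit-modulus entries:

```python
import numpy as np

def ula_steering(theta_deg, M=10, d=0.02, f=1000.0, c=340.0):
    """Assumed far-field ULA model; theta is measured from the array axis."""
    m = np.arange(M)
    tau = m * d * np.cos(np.deg2rad(theta_deg)) / c      # relative delay at microphone m
    return np.exp(-2j * np.pi * f * tau)

def sample_covariance(X):
    """Equation (3): X is M x J, one snapshot x(j) per column."""
    return X @ X.conj().T / X.shape[1]

def mvdr_diag_loading(R_xx, a_hat, xi):
    """Equation (5); xi = 0 reproduces the CMB of Equation (4)."""
    M = R_xx.shape[0]
    Ri_a = np.linalg.solve(R_xx + xi * np.eye(M), a_hat)   # (R_xx + xi I)^{-1} a_hat
    return Ri_a / (a_hat.conj() @ Ri_a)

# Hypothetical toy scene: target at 70 deg, interferer at 20 deg, J = 100 snapshots.
rng = np.random.default_rng(0)
M, J = 10, 100
a_t, a_i = ula_steering(70.0), ula_steering(20.0)
s = rng.standard_normal(J) + 1j * rng.standard_normal(J)
v = rng.standard_normal(J) + 1j * rng.standard_normal(J)
noise = 0.1 * (rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J)))
R_xx = sample_covariance(np.outer(a_t, s) + np.outer(a_i, v) + noise)

w_cmb = mvdr_diag_loading(R_xx, a_t, xi=0.0)    # CMB
w_das = mvdr_diag_loading(R_xx, a_t, xi=1e8)    # loading far above all source powers
print(np.allclose(w_das, a_t / M, atol=1e-5))   # True: close to the DAS weights a/M
```

Between these two extremes lies the compromise that the loading methods in Table 1 try to choose automatically.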

From Table 1, we can see that the HKB computes the loading value from the difference between the output b ∈ ℂ^(M×1) of the fixed filter and the output Aη̂ of the blocking filter [9], where A ∈ ℂ^(M×(M−1)) and η̂ ∈ ℂ^((M−1)×1) are the blocking matrix and the corresponding weighting vector, respectively. The performance of the HKB is greatly affected by the number of microphones.

The LNR designs the loading value from the white noise power σ_W² and a specific parameter ζ_LNR (usually set to 10) [10]. By loading a value related to the white noise power, it aims to reduce the spread of the eigenvalues corresponding to noise and to balance noise reduction against interference suppression. However, the LNR is sensitive to the estimated white noise power and to the parameter ζ_LNR.

The SMF uses the output power ā₀^H R_XX ā₀ of the spatially matched filter as the loading value [11], where ā₀ ∈ ℂ^(M×1) is the normalized steering vector of the target source. Since this output power may include the power of the target source, the SMF may produce an excessively large loading value. Therefore, the SMF-based MVDR beamformer easily degenerates into a fixed beamformer.

Building on the HKB, the BPR introduces a perturbation matrix and a constraint factor to avoid the influence of the number of microphones [12]. However, the approximate solution of the constraint factor δ is likely to be too small, which limits its improvement over the CMB.

The GLC combines the CMOS R_XX and the unit matrix I linearly using the non-negative shrinkage parameters α and β, which are obtained by minimizing the error between the loaded covariance matrix and the desired covariance matrix R_EXP ∈ ℂ^(M×M) [13]. The GLC thus offers more flexibility in designing the loading value, but it is also sensitive to the number of microphones and snapshots.

The NRP-TMMSE uses the denoised CMOS R_T ∈ ℂ^(M×M) and convex optimization to obtain the loaded covariance matrix and a more accurate steering vector of the target source [14]; it can be regarded as an improved version of the GLC. However, the optimization objectives of the GLC and NRP-TMMSE do not aim to make the loaded covariance matrix approach the CMIN, which limits their ability to alleviate the self-cancellation of the main lobe.

Moreover, the diagonal loading iteration of the PFLI [15] is built as follows:

\begin{cases} w_{LI}^{\langle 0 \rangle} = \mathbf{0} \\ w_{LI}^{\langle i_{iteration} \rangle} = w_{LI}^{\langle i_{iteration} - 1 \rangle} + α_{LI} ({\hat{a}}_{0} - R_{XX} w_{LI}^{\langle i_{iteration} - 1 \rangle}) \end{cases}

(6)

where w_LI ∈ ℂ^(M×1) is the weighting vector of the beamformer, 0 is the zero vector, ⟨i_iteration⟩ is the iteration index and α_LI (0 < α_LI < ‖R_XX‖₂⁻¹) is a relaxation factor. Clearly, the effectiveness of the PFLI depends on â₀, the choice of α_LI and the number of iterations.
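Equation (6) is a Landweber iteration for the linear system R_XX w = â₀: each step moves w along the residual â₀ − R_XX w, and stopping after a finite number of steps acts as an implicit regularization, so the iteration count plays a role similar to the loading value. The NumPy sketch below illustrates this under assumptions and is not the PFLI reference implementation: the default α_LI = 0.9/‖R_XX‖₂, the final rescaling that enforces w^H â₀ = 1 and the toy covariance are choices made here. With one step the weights reduce to the DAS weights, and with many steps they approach the CMB of Equation (4):

```python
import numpy as np

def pfli_weights(R_xx, a_hat, n_iter=500, alpha=None):
    """Landweber-type iteration of Equation (6) plus a distortionless rescaling (assumption)."""
    if alpha is None:
        alpha = 0.9 / np.linalg.norm(R_xx, 2)   # relaxation factor inside (0, 1/||R_xx||_2)
    w = np.zeros_like(a_hat)                    # w^<0> = 0
    for _ in range(n_iter):
        w = w + alpha * (a_hat - R_xx @ w)      # Equation (6)
    return w / (a_hat.conj() @ w)               # enforce w^H a_hat = 1

# Hypothetical, well-conditioned example: unit noise plus one interferer.
M = 10
m = np.arange(M)
a_t = np.exp(-1j * np.pi * 0.3 * m)             # assumed target steering vector
a_i = np.exp(-1j * np.pi * 0.8 * m)             # assumed interferer steering vector
R = np.eye(M) + np.outer(a_i, a_i.conj())

w_one = pfli_weights(R, a_t, n_iter=1)          # one step gives alpha * a_t, i.e. the DAS weights a/M
w_many = pfli_weights(R, a_t, n_iter=500)       # many steps approach the CMB solution
w_cmb = np.linalg.solve(R, a_t)
w_cmb = w_cmb / (a_t.conj() @ w_cmb)
```

On an ill-conditioned R_XX the same number of steps stops further from the CMB, which is exactly the regularizing effect the iteration relies on.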

Furthermore, the IDL-SMI [16] assumes that diagonal loading improves the CMOS and therefore formulates the backward correction of the steering vector of the target source as follows:

w_{L D} = \frac{R_{L D}^{- 1} {\hat{a}}_{0}}{{\hat{a}}_{0}^{H} R_{L D}^{- 1} {\hat{a}}_{0}} ≜ w_{C M B} = \frac{R_{X X}^{- 1} {\tilde{a}}_{0}}{{\tilde{a}}_{0}^{H} R_{X X}^{- 1} {\tilde{a}}_{0}}

(7)

Here, R_LD denotes the diagonally loaded CMOS. Solving Equation (7) for the corrected steering vector ã₀ of the target source gives:

{\tilde{a}}_{0} = \frac{{\tilde{a}}_{0}^{H} R_{X X}^{- 1} {\tilde{a}}_{0}}{{\hat{a}}_{0}^{H} R_{L D}^{- 1} {\hat{a}}_{0}} R_{X X} R_{L D}^{- 1} {\hat{a}}_{0} = α_{I D L} R_{X X} R_{L D}^{- 1} {\hat{a}}_{0}

(8)

Here, to avoid the influence of the constant α_IDL, ã₀ is normalized as ā₀ = √M ã₀/‖ã₀‖₂.

Since a single application of Equation (8) yields only a limited improvement, the IDL-SMI applies it iteratively, with the following termination conditions [16]:

\begin{array}{l} {‖ {\bar{a}}_{0}^{\langle i_{iteration} \rangle} - {\bar{a}}_{0}^{\langle i_{iteration} - 1 \rangle} ‖}_{2} < δ_{IDL} \\ \frac{| {({\bar{a}}_{0}^{\langle i_{iteration} \rangle})}^{H} {\hat{a}}_{0} |}{{‖ {\bar{a}}_{0}^{\langle i_{iteration} \rangle} ‖}_{2} {‖ {\hat{a}}_{0} ‖}_{2}} \geq \min \left( \frac{| {\hat{a}}_{0, - Δ}^{H} {\hat{a}}_{0} |}{{‖ {\hat{a}}_{0, - Δ} ‖}_{2} {‖ {\hat{a}}_{0} ‖}_{2}}, \frac{| {\hat{a}}_{0, + Δ}^{H} {\hat{a}}_{0} |}{{‖ {\hat{a}}_{0, + Δ} ‖}_{2} {‖ {\hat{a}}_{0} ‖}_{2}} \right) \end{array}

(9)

where δ_IDL is the threshold on the iterative increment, â_{0,−Δ} ∈ ℂ^(M×1) and â_{0,+Δ} ∈ ℂ^(M×1) are the steering vectors corresponding to the angles θ₀ − θ_Δ and θ₀ + θ_Δ, respectively, θ₀ is the estimated DOA of the target source and θ_Δ is a small angular interval. Although the IDL-SMI appears to be very effective in reducing steering vector mismatch, when the diagonal loading does not actually improve the CMOS, the final estimate ā₀^⟨end⟩ ∈ ℂ^(M×1) may deviate from the real steering vector a₀ of the target source.
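The two tests of Equation (9) can be evaluated as below. This is a hypothetical helper written for illustration: the ULA steering model `ula_steering` (d = 0.02 m, c = 340 m/s, f = 2 kHz, angle measured from the array axis) and the threshold δ_IDL = 10⁻³ are assumptions, and the helper simply returns both Boolean results rather than deciding how they are combined. The second test accepts a corrected steering vector only if it is at least as similar to â₀ as the steering vectors at θ₀ ± θ_Δ are:

```python
import numpy as np

def ula_steering(theta_deg, M=10, d=0.02, f=2000.0, c=340.0):
    """Assumed far-field ULA steering vector (angle measured from the array axis)."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * f * m * d * np.cos(np.deg2rad(theta_deg)) / c)

def cos_sim(u, v):
    """|u^H v| / (||u||_2 ||v||_2), the similarity used in Equation (9)."""
    return abs(u.conj() @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def idl_checks(a_new, a_old, a_hat, theta0, theta_delta, delta_idl=1e-3):
    """Evaluate the two tests of Equation (9) for one IDL-SMI iteration (hypothetical helper)."""
    small_step = np.linalg.norm(a_new - a_old) < delta_idl
    bound = min(cos_sim(ula_steering(theta0 - theta_delta), a_hat),
                cos_sim(ula_steering(theta0 + theta_delta), a_hat))
    inside_cone = cos_sim(a_new, a_hat) >= bound
    return small_step, inside_cone

a_hat = ula_steering(80.0)    # estimate built from an assumed initial DOA of 80 deg
print(idl_checks(ula_steering(78.0), ula_steering(78.0), a_hat, 80.0, 5.0))  # small step, inside the cone
print(idl_checks(ula_steering(40.0), ula_steering(78.0), a_hat, 80.0, 5.0))  # large step, outside the cone
```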

3. The Proposed Method

3.1. Framework of the Proposed Method

To address the aforementioned problems, a full loading-based MVDR beamforming method is proposed. As shown in Figure 1, its framework consists of three modules. In Module 1, an improved GLC (IGLC) with a full loading matrix is used to correct the steering vector of the target source backward, thereby reducing the distortion of the target source. In Module 2, an appropriate loading matrix based on the steered response power over the uncertain sets is used to reconstruct the covariance matrix, so that the interferences are further suppressed. In Module 3, a more accurate DOA of the target source is derived from the broadband spatial response power of the designed MVDR beamformer, which makes both the initial steering vector used in Module 1 and the uncertain sets used in Module 2 more accurate. By iterating these three modules until the DOA of the target source converges, a robust MVDR beamformer is obtained.

3.2. The Improved GLC Method

3.2.1. Full Loading of the Covariance Matrix

Since the GLC improves the CMOS by minimizing the error between the loaded covariance matrix R_GLC ∈ ℂ^(M×M) and R_EXP, we adopt it as the basic model. First, the diagonal loading matrix is replaced by a full loading matrix, which can reduce the components of the target source and the error of the off-diagonal elements. The full loading-based covariance matrix is defined as follows:

\begin{aligned} R_{IGLC} &= α R_{LM1} - β R_{LM2} \\ &= [α_{1}, α_{2}, α_{3}] {[R_{XX}, R_{white}, I]}^{T} - [β_{1}, β_{2}, \dots, β_{P}] {[R_{SS,1}, R_{SS,2}, \dots, R_{SS,P}]}^{T} \\ &= α_{1} R_{XX} + α_{2} R_{white} + α_{3} I - \sum_{m = 1}^{P} β_{m} R_{SS,m} \end{aligned}

(10)

where α = [α₁, α₂, α₃] and β = [β₁, β₂, …, β_P] are vectors of non-negative shrinkage parameters, R_LM1 = [R_XX, R_white, I]^T and R_LM2 = [R_SS,1, R_SS,2, …, R_SS,P]^T. R_white ∈ ℂ^(M×M) is the basic covariance matrix of white noise, and R_SS,m ∈ ℂ^(M×M) is the covariance matrix corresponding to the mth eigenvalue of R_XX, which can be approximately regarded as the covariance matrix of the mth source signal. The superscript “T” denotes the transpose, and P is the number of significantly large eigenvalues [22,23].

In Equation (10), R_white is used to reduce the influence of the off-diagonal elements, I realizes the diagonal loading and R_SS,m is used to reduce the components of the target source. Of these, R_white and R_SS,m must be estimated. Here, the noise subspace associated with the smallest eigenvalue is used to estimate R_white. Thus, R_white and R_SS,m are estimated as R_white ≈ λ_M v_M v_M^H and R_SS,m = λ_m v_m v_m^H, respectively, where λ₁ ≥ λ₂ ≥ … ≥ λ_m ≥ … ≥ λ_M are the eigenvalues of R_XX and v_M ∈ ℂ^(M×1) and v_m ∈ ℂ^(M×1) are the eigenvectors associated with λ_M and λ_m, respectively. Hence, the optimization problem of the IGLC is formulated as follows:
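The eigendecomposition-based estimates of R_white and R_SS,m and the full loading of Equation (10) can be sketched in NumPy as follows. The function names and the random test matrix are illustrative assumptions, and α and β are simply taken as inputs here. Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are reversed to match λ₁ ≥ … ≥ λ_M:

```python
import numpy as np

def eig_components(R_xx, P):
    """R_white ~ lambda_M v_M v_M^H and R_SS,m = lambda_m v_m v_m^H for m = 1..P."""
    lam, V = np.linalg.eigh(R_xx)              # ascending eigenvalues of a Hermitian matrix
    lam, V = lam[::-1], V[:, ::-1]             # reorder so that lambda_1 >= ... >= lambda_M
    R_white = lam[-1] * np.outer(V[:, -1], V[:, -1].conj())
    R_ss = [lam[m] * np.outer(V[:, m], V[:, m].conj()) for m in range(P)]
    return lam, R_white, R_ss

def full_loading(R_xx, alpha, beta):
    """Equation (10) with alpha = (a1, a2, a3) and beta = (b1, ..., bP), all non-negative."""
    M = R_xx.shape[0]
    _, R_white, R_ss = eig_components(R_xx, len(beta))
    R = alpha[0] * R_xx + alpha[1] * R_white + alpha[2] * np.eye(M)
    for b, R_m in zip(beta, R_ss):
        R = R - b * R_m                        # subtract the weighted source components
    return R

# Hypothetical Hermitian positive-definite CMOS for illustration.
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
R_xx = B @ B.conj().T + 0.1 * np.eye(6)
lam, R_white, R_ss = eig_components(R_xx, P=6)
R_removed = full_loading(R_xx, alpha=(1.0, 0.0, 0.0), beta=(1.0, 1.0))  # strips the two largest components
```

With β₁ = β₂ = 1 and α = (1, 0, 0), the two dominant rank-one components are removed entirely, which shows how the R_SS,m terms act on the strongest (potentially target) components.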

\min_{α, β} E {{‖ R_{I G L C} - R_{I N} ‖}_{2}^{2}} = \min_{α, β} E {{‖ α_{1} R_{X X} + α_{2} R_{w h i t e} + α_{3} I - \sum_{m = 1}^{P} β_{m} R_{S S, m} - R_{I N} ‖}_{2}^{2}}

(11)

However, the interference-plus-noise covariance matrix R_IN is unknown. Here, the uncertain set-based method [24,25] is used to estimate it. This method uses the spatial power (also called the Capon power) to estimate the covariance matrix of each sound source, so R_IN can be estimated as follows:

{\tilde{R}}_{IN} = R_{XX} - \int_{ϕ_{T}} \frac{a (θ) a^{H} (θ)}{a^{H} (θ) R_{X X}^{- 1} a (θ)} d θ

(12)

where ϕ_T is the uncertain set of the target source, which can be established following the method for estimating the initial DOA of the target source given in [24], and a(θ) ∈ ℂ^(M×1) is the steering vector corresponding to angle θ.
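Equation (12) can be approximated by replacing the integral over ϕ_T with a sum over a grid of angles, weighting each steering vector outer product by its Capon power 1/(a^H(θ) R_XX⁻¹ a(θ)). In this sketch the grid step, the weight `dtheta` assigned to each grid point (the integration measure is not specified in the text), the uncertain set ϕ_T = [68°, 92°], the toy scene and the ULA model are all assumptions:

```python
import numpy as np

def ula_steering(theta_deg, M=10, d=0.02, f=2000.0, c=340.0):
    """Assumed far-field ULA steering vector (angle measured from the array axis)."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * f * m * d * np.cos(np.deg2rad(theta_deg)) / c)

def estimate_r_in(R_xx, target_set, step_deg=1.0, dtheta=1.0):
    """Equation (12), replacing the integral over phi_T by a weighted sum on a grid."""
    M = R_xx.shape[0]
    R_inv = np.linalg.inv(R_xx)
    R_target = np.zeros((M, M), dtype=complex)
    for theta in np.arange(target_set[0], target_set[1] + step_deg / 2, step_deg):
        a = ula_steering(theta, M=M)
        capon = 1.0 / np.real(a.conj() @ R_inv @ a)       # Capon power at theta
        R_target += capon * np.outer(a, a.conj()) * dtheta
    return R_xx - R_target

# Hypothetical scene: target at 70 deg, interferer at 20 deg, J = 200 snapshots.
rng = np.random.default_rng(3)
M, J = 10, 200
a_t, a_i = ula_steering(70.0), ula_steering(20.0)
s, v = rng.standard_normal(J), rng.standard_normal(J)
X = np.outer(a_t, s) + np.outer(a_i, v) + 0.1 * rng.standard_normal((M, J))
R_xx = X @ X.conj().T / J
R_in_tilde = estimate_r_in(R_xx, target_set=(68.0, 92.0))   # assumed phi_T around an 80 deg initial DOA
```

Because every subtracted term is a positive multiple of a rank-one Hermitian matrix, R̃_IN stays Hermitian, but with a large `dtheta` it can lose positive definiteness; this is one reason the text treats R̃_IN only as a guide.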

3.2.2. Solution of the Non-Negative Shrinkage Parameters

Although R̃_IN is not sufficiently accurate, it provides a good guide for the IGLC of Equation (11): once R̃_IN is estimated, the vectors α and β can be determined. Substituting Equation (12) into Equation (11), the optimization problem is rewritten as follows:

\begin{array}{l} \min_{α, β} E {{‖ R_{I G L C} - R_{I N} ‖}_{2}^{2}} = \min_{α, β} E {{‖ α_{1} R_{X X} + α_{2} R_{w h i t e} + α_{3} I - \sum_{m = 1}^{P} β_{m} R_{S S, m} - R_{I N} ‖}_{2}^{2}} \\ = \min_{α, β} E {{‖ α_{2} R_{w h i t e} + α_{3} I - \sum_{m = 1}^{P} β_{m} R_{S S, m} - (1 - α_{1}) R_{I N} + α_{1} (R_{X X} - R_{I N}) ‖}_{2}^{2}} \\ \approx \min_{α, β} 〈 {‖ α_{2} R_{w h i t e} + α_{3} I - \sum_{m = 1}^{P} β_{m} R_{S S, m} - (1 - α_{1}) {\tilde{R}}_{I N} ‖}_{2}^{2} + α_{1}^{2} E {{‖ R_{X X} - {\tilde{R}}_{I N} ‖}_{2}^{2}} 〉 \end{array}

(13)

and:

E {{‖ R_{X X} - {\tilde{R}}_{I N} ‖}_{2}^{2}} \approx \frac{1}{J^{2}} \sum_{j = l + 1 - J}^{l} {‖ x (j) ‖}_{2}^{4} - \frac{1}{J} {‖ {\tilde{R}}_{I N} ‖}_{2}^{2}

(14)

where the elements of α and β are constrained to be non-negative. Equation (13) is a multivariate quadratic optimization problem and can be solved easily [13,26].
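Because every term of Equation (13) is linear in α and β, the problem can be posed as non-negative least squares: stacking the real and imaginary parts of the vectorized matrices gives a real system Aθ ≈ b with θ = [α₁, α₂, α₃, β₁, …, β_P] ≥ 0, and the term α₁² E{‖R_XX − R̃_IN‖₂²} becomes one extra row. The sketch below solves it with projected gradient descent so that only NumPy is needed; it is not the solver used in [13,26]. It assumes the matrix norms are Frobenius norms, clips a negative estimate from Equation (14) to zero, and uses a synthetic case whose exact solution is α = (0, 1, 0.5), β = 0:

```python
import numpy as np

def glc_error_power(X, R_in_tilde):
    """Equation (14); X is M x J with one snapshot per column, Frobenius norms (assumption)."""
    J = X.shape[1]
    return (np.sum(np.linalg.norm(X, axis=0) ** 4) / J**2
            - np.linalg.norm(R_in_tilde, "fro") ** 2 / J)

def solve_shrinkage(R_in_tilde, R_white, R_ss, rho, n_iter=20000):
    """Equation (13) as non-negative least squares in theta = [a1, a2, a3, b1, ..., bP]."""
    M = R_in_tilde.shape[0]
    mats = [R_in_tilde, R_white, np.eye(M)] + [-R for R in R_ss]
    # Real/imaginary stacking makes ||A theta - b||^2 equal to the complex Frobenius error.
    A = np.column_stack([np.concatenate([Q.ravel().real, Q.ravel().imag]) for Q in mats])
    b = np.concatenate([R_in_tilde.ravel().real, R_in_tilde.ravel().imag])
    penalty = np.zeros(A.shape[1])
    penalty[0] = np.sqrt(max(rho, 0.0))        # alpha_1^2 * rho term; rho clipped at 0 (assumption)
    A, b = np.vstack([A, penalty]), np.append(b, 0.0)
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        theta = np.maximum(theta - step * (A.T @ (A @ theta - b)), 0.0)  # projected gradient step
    return theta[:3], theta[3:]

# Synthetic check with orthonormal eigenvectors from a random unitary matrix.
rng = np.random.default_rng(4)
M = 10
Q, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
R_white = np.outer(Q[:, -1], Q[:, -1].conj())
R_ss = [2.0 * np.outer(Q[:, 0], Q[:, 0].conj())]
R_in_tilde = R_white + 0.5 * np.eye(M)         # hypothetical target the model can match exactly
alpha, beta = solve_shrinkage(R_in_tilde, R_white, R_ss, rho=1.0)
X = rng.standard_normal((M, 50)) + 1j * rng.standard_normal((M, 50))
rho_hat = glc_error_power(X, R_in_tilde)       # would be passed as rho in practice
```

A dedicated quadratic-programming or NNLS routine would converge faster; the projected gradient loop is chosen here only to keep the sketch dependency-free.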

3.3. Backward Correction of the Steering Vector of the Target Source

Once R_IGLC is obtained, the steering vector of the target source can be corrected backward in the manner of the IDL-SMI method. Based on Equation (7) and the weighting vector w_IGLC of the IGLC-based MVDR beamformer, the corrected steering vector of the target source can be expressed as follows:

{\tilde{a}}_{0} = μ R_{X X} R_{I G L C}^{- 1} {\hat{a}}_{0}

(15)

where μ = (ã₀^H R_XX⁻¹ ã₀)/(â₀^H R_IGLC⁻¹ â₀).

To avoid the influence of μ, ã₀ is normalized as:

{\bar{a}}_{0} = \frac{\sqrt{M} {\tilde{a}}_{0}}{{‖ {\tilde{a}}_{0} ‖}_{2}}

(16)

The normalized vector ā₀ of Equation (16) is then substituted back into Equation (15) in place of â₀, and this procedure is repeated until the convergence conditions of Equation (9) are satisfied.
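The backward correction of Equations (15) and (16) can be sketched as a short loop. When R_XX and R_IGLC are Hermitian positive definite, μ is a positive real scalar, so the normalization of Equation (16) removes it and the sketch never computes it. Repeatedly applying R_XX R_IGLC⁻¹ followed by normalization behaves like a power iteration, so the loop also applies the similarity test of Equation (9) through a bound `min_cos` and keeps the previous estimate if that bound is violated. The thresholds, `max_iter` and the stand-in for R_IGLC are assumptions for illustration:

```python
import numpy as np

def backward_correct(R_xx, R_load, a_hat, min_cos=0.0, delta=1e-4, max_iter=20):
    """Iterate Equations (15)-(16) with a loaded matrix R_load (R_IGLC in the text)."""
    M = R_xx.shape[0]
    a_bar = np.sqrt(M) * a_hat / np.linalg.norm(a_hat)
    for _ in range(max_iter):
        a_tilde = R_xx @ np.linalg.solve(R_load, a_bar)          # Equation (15) without mu
        a_new = np.sqrt(M) * a_tilde / np.linalg.norm(a_tilde)   # Equation (16)
        cos = abs(a_new.conj() @ a_hat) / (np.linalg.norm(a_new) * np.linalg.norm(a_hat))
        if cos < min_cos:          # similarity test of Equation (9) failed: keep the previous estimate
            break
        step = np.linalg.norm(a_new - a_bar)
        a_bar = a_new
        if step < delta:           # increment test of Equation (9)
            break
    return a_bar

# Hypothetical inputs: a random CMOS and a diagonally loaded stand-in for R_IGLC.
rng = np.random.default_rng(5)
M = 8
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_xx = B @ B.conj().T + np.eye(M)
R_load = R_xx + 2.0 * np.eye(M)
a_hat = np.exp(-1j * np.pi * 0.25 * np.arange(M))
a_bar = backward_correct(R_xx, R_load, a_hat, min_cos=0.9)
```

If R_load equals R_XX, the map is the identity and â₀ is returned unchanged, which matches the intuition that backward correction only acts through the difference between the loaded and unloaded matrices.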

3.4. Reconstruction of the Covariance Matrix

Although Equation (16) yields a more accurate steering vector of the target source, the error between R_IGLC and R_IN cannot be completely eliminated; that is, components of the target source may still remain in R̃_IN. Therefore, a new covariance matrix is reconstructed to reduce the sensitivity to the self-cancellation of the main lobe. The reconstructed covariance matrix is intended to attenuate the components of the target source and to emphasize those of the interference-plus-noise. Based on the eigenvalue decomposition of R_XX, the weights of the target source, interference sources and noise are defined as ρ_target, ρ_interference and ρ_noise, respectively. Moreover, exploiting the spatial sparsity of the sound sources, the uncertain sets of the sound sources are used to reconstruct the covariance matrix, so that the weight over the entire angular space can be expressed as:

\left\{ \begin{array}{l} ρ_{target} = λ_{M} \\ ρ_{interference} = λ_{1} \\ ρ_{noise} = \frac{1}{M - P} \sum_{i = P + 1}^{M} λ_{i} \end{array} \right. \Rightarrow ρ (ϑ) = \begin{cases} ρ_{target}, & ϑ \in ϕ_{T} \\ ρ_{interference}, & ϑ \in \cup_{g = 1}^{G} ϕ_{I,g} \\ ρ_{noise}, & \text{otherwise} \end{cases}

(17)

where ϑ is the angle varying from 0° to 180° and ϕ_{I,g} is the uncertain set of the gth interference source.

Thus, the reconstructed covariance matrix R_rec ∈ ℂ^(M×M) can be calculated from the steered spatial power as follows:

R_{r e c} = R_{I G L C} + \int_{0}^{180} ρ (ϑ) a (ϑ) a^{H} (ϑ) d ϑ

(18)

where a(ϑ) ∈ ℂ^(M×1) is the steering vector corresponding to angle ϑ.

Similarly, to reduce the influence of the scale of R_rec, Equation (18) is normalized as follows:

{\bar{R}}_{r e c} = \frac{⌈ R_{I G L C} ⌉}{⌈ R_{r e c} ⌉} R_{r e c}

(19)

where the symbol “⌈ ⌉” indicates the determinant operation.
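Equations (17) to (19) can be sketched as follows, again replacing the integral by a sum over a 0° to 180° grid. The interval representation of the uncertain sets, the grid step, the per-point weight `dtheta`, the ULA model and the stand-in for R_IGLC are assumptions. The determinant ratio of Equation (19) is evaluated with `numpy.linalg.slogdet` (using determinant magnitudes) to avoid overflow or underflow for larger M:

```python
import numpy as np

def ula_steering(theta_deg, M=10, d=0.02, f=2000.0, c=340.0):
    """Assumed far-field ULA steering vector (angle measured from the array axis)."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * f * m * d * np.cos(np.deg2rad(theta_deg)) / c)

def reconstruct(R_xx, R_iglc, P, target_set, interf_sets, step_deg=1.0, dtheta=1.0):
    """Equations (17)-(19); the integral over 0..180 deg becomes a weighted grid sum."""
    M = R_xx.shape[0]
    lam = np.sort(np.linalg.eigvalsh(R_xx))[::-1]            # lambda_1 >= ... >= lambda_M
    rho_t, rho_i, rho_n = lam[-1], lam[0], lam[P:].mean()    # Equation (17)

    def inside(th, interval):
        return interval[0] <= th <= interval[1]

    R_rec = np.array(R_iglc, dtype=complex)
    for th in np.arange(0.0, 180.0 + step_deg / 2, step_deg):
        if inside(th, target_set):
            rho = rho_t
        elif any(inside(th, s) for s in interf_sets):
            rho = rho_i
        else:
            rho = rho_n
        a = ula_steering(th, M=M)
        R_rec += rho * np.outer(a, a.conj()) * dtheta          # Equation (18)
    # Equation (19): |det R_IGLC| / |det R_rec| computed in the log domain
    _, logdet_iglc = np.linalg.slogdet(R_iglc)
    _, logdet_rec = np.linalg.slogdet(R_rec)
    return np.exp(logdet_iglc - logdet_rec) * R_rec

# Hypothetical inputs: three strong components, target set 65-75 deg, one interferer set 15-25 deg.
rng = np.random.default_rng(6)
M = 10
B = rng.standard_normal((M, 3)) + 1j * rng.standard_normal((M, 3))
R_xx = B @ B.conj().T + 0.05 * np.eye(M)
R_iglc = R_xx + 0.1 * np.eye(M)
R_bar = reconstruct(R_xx, R_iglc, P=3, target_set=(65.0, 75.0), interf_sets=[(15.0, 25.0)])
```

The weighting follows Equation (17): the target sector receives the smallest eigenvalue and the interference sectors the largest, so the reconstructed matrix de-emphasizes the target direction and emphasizes the interference directions.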

3.5. MVDR Beamforming

Combining the corrected steering vector of Equation (16) with the reconstructed covariance matrix of Equation (18), the weighting vector of the proposed MVDR beamforming method is calculated as follows:

w_{p r o p o s e d} = \frac{R_{r e c}^{- 1} {\bar{a}}_{0}}{{\bar{a}}_{0}^{H} R_{r e c}^{- 1} {\bar{a}}_{0}}

(20)

Finally, the enhanced time-domain speech is obtained by applying the beamformer in each frequency bin and performing the inverse short-time Fourier transform.
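Equation (20) is the standard MVDR solution applied to the reconstructed quantities. A minimal NumPy sketch with hypothetical inputs follows. It also illustrates a property of this form of solution: multiplying the covariance matrix by any positive constant, as the normalization of Equation (19) does, cancels between numerator and denominator and leaves w unchanged. The per-bin output w^H x is shown for a few hypothetical STFT snapshots:

```python
import numpy as np

def mvdr(R, a):
    """Equation (20): w = R^{-1} a / (a^H R^{-1} a)."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)

rng = np.random.default_rng(2)
M = 10
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_rec = B @ B.conj().T + np.eye(M)                    # hypothetical Hermitian positive-definite R_rec
a_bar = np.exp(-1j * np.pi * 0.4 * np.arange(M))      # hypothetical corrected steering vector
w = mvdr(R_rec, a_bar)
w_scaled = mvdr(3.7 * R_rec, a_bar)                   # any positive rescaling, e.g. Equation (19)
print(np.allclose(w, w_scaled))                       # True
X_bin = rng.standard_normal((M, 5)) + 1j * rng.standard_normal((M, 5))  # hypothetical snapshots of one bin
y_bin = w.conj() @ X_bin                              # beamformer output w^H x for each frame
```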

3.6. DOA Deduction through the Spatial Response Power and Iteration

Although the procedures in Sections 3.2–3.4 can effectively improve the performance of the CMB, the effectiveness of Equations (15) and (18) depends strongly on the initial DOA of the target source. Therefore, the spatial response power is used to derive a more accurate DOA of the target source. The spatial response power at the kth frequency bin of the lth frame can be expressed as follows:

ψ (ϑ) = w_{p r o p o s e d}^{H} a (ϑ) a^{H} (ϑ) w_{p r o p o s e d}

(21)

A more accurate DOA of the target source is then derived from statistics of the angle at which the spatial response power peaks in each frame. In addition, since Equations (15), (18) and (21) are not optimal, an iterative procedure is used to improve the performance of the proposed method. The iterative procedure is as follows:

Step 1: Calculate the CMOS R_XX through Equation (3);
Step 2: Calculate the loaded covariance matrix R_IGLC through Equation (10);
Step 3: Correct the steering vector of the target source through Equation (15) and normalize it;
Step 4: Reconstruct the covariance matrix R_rec through Equation (18) and normalize it through Equation (19);
Step 5: Calculate the weighting vector w_proposed through Equation (20);
Step 6: Calculate the spatial response power ψ(ϑ) through Equation (21) and derive a new DOA of the target source;
Step 7: Update the DOA of the target source and repeat Steps 2 to 6 until the derived DOA of the target source converges.
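Equation (21) and the peak statistics of Step 6 can be sketched as follows. Since w^H a(ϑ) a^H(ϑ) w = |w^H a(ϑ)|², the spatial response is evaluated on an angular grid; the peak angle is recorded for each frequency bin or frame, and the most frequent peak (the histogram mode) is returned. Taking the mode is one plausible way to collect the peak statistics, not necessarily the authors' exact statistic, and the ULA model and the DAS test weights are assumptions:

```python
import numpy as np

def ula_steering(theta_deg, M=10, d=0.02, f=2000.0, c=340.0):
    """Assumed far-field ULA steering vector (angle measured from the array axis)."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * f * m * d * np.cos(np.deg2rad(theta_deg)) / c)

def spatial_response(w, grid_deg, f):
    """Equation (21): psi(theta) = w^H a(theta) a^H(theta) w = |w^H a(theta)|^2."""
    A = np.stack([ula_steering(th, M=w.shape[0], f=f) for th in grid_deg], axis=1)  # M x K
    return np.abs(w.conj() @ A) ** 2

def doa_from_peaks(weights, freqs, grid_deg):
    """Step 6 sketch: peak angle of psi for each weight vector, then the histogram mode."""
    peaks = [grid_deg[np.argmax(spatial_response(w, grid_deg, f))] for w, f in zip(weights, freqs)]
    values, counts = np.unique(peaks, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical check: DAS weights steered to 70 deg in several bins all peak at 70 deg.
grid = np.arange(0.0, 181.0, 1.0)
freqs = [500.0, 1000.0, 2000.0, 3000.0]
weights = [ula_steering(70.0, f=f) / 10 for f in freqs]
print(doa_from_peaks(weights, freqs, grid))   # 70.0
```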

The mean γ of the DOA differences over the last three iterations and the current iteration is used to judge whether the proposed algorithm has converged, i.e.,

γ = \frac{1}{4} \sum_{κ = τ - 3}^{τ} Δ ϑ_{κ}

(22)

where Δϑ_κ is the difference between the DOAs derived in the κth and (κ − 1)th iterations, and τ is the index of the current iteration. The proposed algorithm is considered converged if γ ≤ 0.1; otherwise, the iteration continues.
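The convergence rule of Equation (22) needs the last four DOA increments, i.e., five DOA estimates. In this sketch, taking absolute differences, counting the initial DOA as iteration 0 and the example trajectory are assumptions:

```python
def has_converged(doa_history, tol=0.1):
    """Equation (22): gamma = (1/4) * sum of the last four DOA increments; converged if gamma <= tol.

    doa_history[k] is the DOA derived in iteration k (index 0 = initial DOA, an assumption);
    absolute differences are used (assumption).
    """
    if len(doa_history) < 5:                  # increments for kappa = tau-3..tau need five DOAs
        return False
    tau = len(doa_history) - 1
    gamma = sum(abs(doa_history[k] - doa_history[k - 1]) for k in range(tau - 3, tau + 1)) / 4
    return gamma <= tol

history = [80.0, 74.0, 72.0, 71.0, 70.5]                    # hypothetical DOA trajectory (degrees)
print(has_converged(history))                               # False: gamma = (6 + 2 + 1 + 0.5) / 4
print(has_converged(history + [70.4, 70.35, 70.3, 70.3]))   # True: gamma is about 0.05
```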

4. Simulations and Analysis

4.1. Simulation Setup

In the simulation, the number of microphones M was set to 10, the element spacing d was set to 0.02 m and the speed of sound was 340 m/s. The TIMIT corpus [27] was used to generate the observed signals with the microphone array signal simulator given in [28]. A total of 200 utterances were randomly selected for the target speech source, and another 200 utterances were randomly selected for the interference speech source. The signal-to-noise ratio (SNR) of white noise was set to 0, 5, 10, 15, 20 and 25 dB. The signal-to-interference ratio (SIR) was set to 0 dB. The DOAs of the target source and the interference source were randomly generated between 0° and 180° with a minimum separation of 25°. The sampling rate was 8 kHz, the frame length was 256 samples and a 256-point fast Fourier transform (FFT) was used. A Blackman–Harris window and a Hamming window, both 256 samples long with 50% overlap, were used for signal analysis and synthesis, respectively.
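The analysis parameters of this setup imply a few derived quantities that are easy to check: a hop of 128 samples for 50% overlap, a bin spacing of fs/N = 31.25 Hz up to fs/2 = 4 kHz, and no spatial aliasing, because the element spacing d = 0.02 m stays below half a wavelength (c/(2f) = 0.0425 m at 4 kHz). The short NumPy computation below only reproduces this arithmetic from the stated values:

```python
import numpy as np

fs, n_fft, M, d, c = 8000, 256, 10, 0.02, 340.0   # values stated in Section 4.1
hop = n_fft // 2                                   # 50% overlap
freqs = np.arange(n_fft // 2 + 1) * fs / n_fft     # centre frequency of bin k (one-sided spectrum)
f_alias = c / (2 * d)                              # spatial aliasing would start above c / (2d)
print(hop, freqs[1], freqs[-1], f_alias)
```

Since f_alias (8.5 kHz) exceeds fs/2, every analysed bin is free of grating lobes for this array.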

The signal-to-interference-plus-noise ratio (SINR) [5], perceptual evaluation of speech quality (PESQ) [29], short-time objective intelligibility (STOI) [30] and speech distortion index (SDI) [2] were used as evaluation measures. The SINR reflects the capability of interference suppression and noise reduction, the PESQ and STOI reflect speech quality and intelligibility, and the SDI reflects the distortion of the target speech. Hence, larger SINR, PESQ and STOI values and a smaller SDI indicate better performance. The proposed method was compared with the CMB of Equation (4), the ideal MVDR beamformer (IMB) of Equation (2), HKB, LNR, SMF, BPR, GLC, NRP-TMMSE, PFLI and IDL-SMI.
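For reference, the SINR and SDI can be computed when the target and interference-plus-noise components of the beamformer output are available separately, as they are in simulation. The sketch below assumes common energy-ratio definitions (the SDI as the energy of the difference between the reference-microphone target signal and the filtered target signal, normalized by the target energy); the exact formulations in [2] and [5] may differ in detail, and PESQ and STOI are not reimplemented:

```python
import numpy as np

def output_sinr_db(y_target, y_interf_noise):
    """Output SINR in dB from the target and interference-plus-noise parts of the output."""
    return 10 * np.log10(np.sum(np.abs(y_target) ** 2) / np.sum(np.abs(y_interf_noise) ** 2))

def speech_distortion_index(x_ref, y_target):
    """SDI: energy of the target distortion relative to the target energy at the reference mic."""
    return np.sum(np.abs(x_ref - y_target) ** 2) / np.sum(np.abs(x_ref) ** 2)

rng = np.random.default_rng(7)
x_ref = rng.standard_normal(8000)                    # hypothetical 1 s of target signal at 8 kHz
print(speech_distortion_index(x_ref, x_ref))         # 0.0: undistorted target
print(speech_distortion_index(x_ref, 0.5 * x_ref))   # 0.25: target attenuated by about 6 dB
print(output_sinr_db(x_ref, 0.5 * x_ref))            # about 6.02 dB
```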

4.2. Comparison of Spectrograms

Figure 2 compares the spectrograms for an SNR of 10 dB and 100 snapshots, with the real DOAs of the target source and interference source at 70° and 20°, respectively. The initial DOA of the target source was 80°, i.e., it had an error of 10°. In Figure 2, the color bar represents the logarithmic amplitude of the spectrum. Figure 2 shows that the IMB effectively recovered the target speech and suppressed the interference source and noise, which confirms that the MVDR beamformer can effectively enhance the target source given an accurate steering vector and an effective CMIN. Figure 2 also shows that the target speech was severely distorted by the CMB and BPR methods, moderately distorted by the HKB and LNR methods and slightly distorted by the remaining methods. Meanwhile, strong residual components of the interference source remained in the outputs of the SMF, GLC, NRP-TMMSE, PFLI and IDL-SMI methods. This indicates that the proposed method outperforms all reference methods except the IMB.

4.3. Comparison of Beampatterns

Beampatterns were used to explain the spectrogram results in Section 4.2. Figure 3 shows the beampatterns, where the color bar represents the beampattern amplitude in dB. Figure 3 shows that the main lobes of the CMB, HKB, LNR, SMF, BPR, GLC and PFLI were not steered toward the real DOA of the target source (70°), and the main lobes of the CMB and BPR were evidently self-cancelled. This explains why the target speech was severely distorted by these two methods. Moreover, since the main lobes of the HKB, LNR, SMF, GLC and NRP-TMMSE were not steered toward the real DOA of the target source, the target speech they recovered was low-pass filtered, which explains the slight distortion in these five methods. Conversely, for the IMB, IDL-SMI and the proposed method, the main lobe pointed close to the real DOA of the target source, so the target speech was well recovered.

However, the SMF, GLC, NRP-TMMSE, PFLI and IDL-SMI did not form effective nulls in the direction of the interference source, which explains the residual interference components. In contrast, the proposed method not only steered the main lobe closer to the real DOA of the target source but also formed a null in the direction of the interference source. Hence, the proposed method performs better in recovering the target source and suppressing the interference source.

4.4. Comparison of Evaluation Measures

To further evaluate the performance and robustness of the proposed method under different noise levels, Figure 4 presents the SINR, PESQ, STOI and SDI versus the input SNR (iSNR). In Figure 4, “Noisy” denotes the observed signal at the reference microphone.

In Figure 4a–c, as the iSNR increased, the output SINR, PESQ and STOI of the IMB, HKB, SMF, BPR, GLC, NRP-TMMSE, PFLI, IDL-SMI and the proposed method also increased. However, the SINR, PESQ and STOI of the CMB and LNR first increased and then decreased, because the self-cancellation of the main lobe becomes more severe as the iSNR increases. Moreover, the proposed method achieved the best SINR, PESQ and STOI among all methods except the IMB, which indicates that it effectively improves the performance of the CMB under different iSNRs. Figure 4d shows the SDI results. As the iSNR increased, the SDIs of the CMB, HKB, LNR and BPR decreased. The reason is that the problem of the self-cancellation of the main lobe is aggravated with the increase in the iSNR. The SDIs of the remaining methods were more stable, because the IMB used the real DOA of the target source, the SMF, GLC, PFLI and NRP-TMMSE degenerated into fixed beamformers, and the IDL-SMI and the proposed method corrected the steering vector of the target source backward. In addition, the proposed method obtained a lower SDI than all methods except the IMB, which indicates stronger robustness under different noise levels.

4.5. Verification of the DOA Deduction through the Spatial Response Power

This section verifies the effectiveness of the DOA deduction based on the spatial response power. Figure 5 shows the histograms and the errors of the derived DOA of the target source for two initial DOA errors, 5° and 10°. The other simulation settings were the same as in Section 4.2.

Case 1: Figure 5a shows the histogram of the positions of the maximum of the spatial response power after the first iteration. The maximum is located at 71.1° rather than at the initial DOA of 75°, which confirms the effectiveness of the backward correction of the steering vector of the target source and of the DOA deduction through the spatial response power. Furthermore, Figure 5b shows that the error of the derived DOA of the target source gradually decreases as the number of iterations increases, indicating that iterating the DOA deduction effectively reduces the error of the estimated DOA.

Case 2: Figure 5c shows that an initial error of 10° is also effectively reduced, to 2.1°, after the first iteration. Similarly, Figure 5d shows that the error of the derived DOA of the target source gradually decreases as the number of iterations increases. This indicates that the proposed method is also applicable when the DOA of the target source is severely mismatched.

4.6. Performance Analysis of the Optimization Modules

To analyze the contribution of the three modules of the proposed method, we measured the performance of the IGLC-based MVDR beamforming method (labeled IGLC), the Module 1-based MVDR beamforming method (labeled Module 1), the Module 1 plus Module 2-based MVDR beamforming method (labeled Modules 1 + 2), and the Module 1 plus Module 2 plus Module 3-based MVDR beamforming method (labeled Modules 1 + 2 + 3). Figure 6 gives the SINR, PESQ, STOI and SDI results versus the iSNR.

The results in Figure 6 show that the IGLC-based MVDR beamforming method outperforms the IDL-SIM method, which confirms the effectiveness of the full loading-based IGLC method. The Module 1-based MVDR beamforming method in turn outperforms the IGLC-based method, which indicates that the IGLC-based backward correction of the steering vector of the target source effectively improves the performance of the CMB. When Module 2 was added to Module 1, the SINR, PESQ, STOI and SDI results improved, which confirms the effectiveness of the reconstruction of the covariance matrix based on the steered response power. Adding Module 3 to Modules 1 and 2 improved these results further, which confirms the effectiveness of deriving the DOA of the target source from the broadband spatial response power. The progressive improvement in Figure 6 also indicates that all three modules are necessary in the proposed method.

5. Conclusions

This paper presented a full loading-based MVDR beamforming method based on the backward correction of the steering vector of the target source and the reconstruction of the covariance matrix. In this method, a full loading matrix was constructed on the principle of diagonal loading; by correcting the off-diagonal elements and eliminating the components of the target source, it brings the loaded covariance matrix closer to the CMIN. To reduce the mismatch between the presumed steering vector and the target source, the weight vectors of the CMB and the full loading-based MVDR beamformer were used to correct the steering vector of the target source backward. Furthermore, based on an uncertainty set and eigenvalue decomposition, a steered covariance matrix was built to further suppress the interference source. Moreover, the spatial response power was used to derive a more accurate DOA of the target source, which iteratively yields a more accurate steering vector of the target source and a more effective covariance matrix. The proposed method was verified on the TIMIT corpus, and the results showed that it effectively improved the performance of the CMB.

Author Contributions

Conceptualization, C.B. and J.Z.; methodology, C.B. and J.Z.; software, J.Z.; validation, C.B. and J.Z.; formal analysis, C.B. and J.Z.; investigation, J.Z.; resources, J.Z.; data curation, J.Z.; writing (original draft preparation), C.B. and J.Z.; writing (review and editing), C.B. and J.Z.; visualization, J.Z.; supervision, C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (Grant No. 61831019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the thorough reviewers.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zoltowski, M. On the performance analysis of the MVDR beamformer in the presence of correlated interference. IEEE Trans. Acoust. Speech Signal Process. 1988, 36, 945–947.
Benesty, J.; Chen, J.; Huang, Y. Microphone Array Signal Processing; Springer-Verlag: Berlin, Germany, 2008; pp. 52–54.
Souden, M.; Benesty, J.; Affes, S. A Study of the LCMV and MVDR Noise Reduction Filters. IEEE Trans. Signal Process. 2010, 58, 4925–4935.
Vorobyov, S.A. Principles of minimum variance robust adaptive beamforming design. Signal Process. 2013, 93, 3264–3277.
Pan, C.; Chen, J.; Benesty, J. Performance Study of the MVDR Beamformer as a Function of the Source Incidence Angle. IEEE/ACM Trans. Audio Speech Lang. Process. 2013, 22, 67–79.
Gu, Y.; Leshem, A. Robust Adaptive Beamforming Based on Interference Covariance Matrix Reconstruction and Steering Vector Estimation. IEEE Trans. Signal Process. 2012, 60, 3881–3885.
Zhao, Y.; Jensen, J.; Jensen, T.; Chen, J.; Christensen, M. Experimental Study of Robust Acoustic Beamforming for Speech Acquisition in Reverberant and Noisy Environments. Appl. Acoust. 2020, 170, 107531.
Li, J.; Stoica, P.; Wang, Z. On robust Capon beamforming and diagonal loading. IEEE Trans. Signal Process. 2003, 51, 1702–1715.
Hoerl, A.E.; Kennard, R.W.; Baldwin, K.F. Ridge Regression: Some Simulations. Communications in Statistics-Theory and Methods; Marcel Dekker Inc.: New York, NY, USA, 1975; Volume 4, pp. 105–123.
Carlson, B. Covariance matrix estimation errors and diagonal loading in adaptive arrays. IEEE Trans. Aerosp. Electron. Syst. 1988, 24, 397–401.
Zhang, M.; Zhang, A.; Yang, Q. Robust Adaptive Beamforming Based on Conjugate Gradient Algorithms. IEEE Trans. Signal Process. 2016, 64, 6046–6057.
Mahadi, M.; Ballal, T.; Moinuddin, M.; Al-Naffouri, T.; Al-Saggaf, U. A Robust LCMP Beamformer with Limited Snapshots. In Proceedings of the 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 1831–1835.
Du, L.; Li, J.; Stoica, P. Fully Automatic Computation of Diagonal Loading Levels for Robust Adaptive Beamforming. IEEE Trans. Aerosp. Electron. Syst. 2010, 46, 449–458.
Ke, Y.; Zheng, C.; Peng, R.; Li, X. Robust Adaptive Beamforming Using Noise Reduction Preprocessing-Based Fully Automatic Diagonal Loading and Steering Vector Estimation. IEEE Access 2017, 5, 12974–12987.
Wang, X.; Liu, W.; Jin, M.; Ding, S. Parameter-Free Landweber Iteration Method for Robust Adaptive Beamforming. Circuits Syst. Signal Process. 2019, 39, 2716–2729.
Jin, W.; Jia, W.; Yao, M. Iterative Diagonally Loaded Sample Matrix Inverse Robust Adaptive Beamforming. J. Electron. Inform. Technol. 2012, 34, 1120–1125.
Huang, Y.; Zhou, M.; Vorobyov, S.A. New Designs on MVDR Robust Adaptive Beamforming Based on Optimal Steering Vector Estimation. IEEE Trans. Signal Process. 2019, 67, 3624–3638.
Zhuang, J.; Ye, Q.; Tan, Q.; Ali, A.H. Low-complexity variable loading for robust adaptive beamforming. Electron. Lett. 2016, 52, 338–340.
Zhang, M.; Chen, X.; Zhang, A. A simple tridiagonal loading method for robust adaptive beamforming. Signal Process. 2018, 157, 103–107.
Muhammad, M.; Li, M.; Abbasi, Q.H.; Goh, C.; Imran, M.A. Adaptive Diagonal Loading Technique to Improve Direction of Arrival Estimation Accuracy for Linear Antenna Array Sensors. IEEE Sensors J. 2022, 22, 10986–10994.
Chen, P.; Gao, J.; Wang, W. Linear Prediction-Based Covariance Matrix Reconstruction for Robust Adaptive Beamforming. IEEE Signal Process. Lett. 2021, 28, 1848–1852.
Pezeshki, A.; Van Veen, B.D.; Scharf, L.L.; Cox, H.; Nordenvaad, M.L. Eigenvalue Beamforming Using a Multirank MVDR Beamformer and Subspace Selection. IEEE Trans. Signal Process. 2008, 56, 1954–1967.
Zhou, M.; Ma, X.; Shen, P.; Sheng, W. Weighted Subspace-Constrained Adaptive Beamforming for Sidelobe Control. IEEE Commun. Lett. 2019, 23, 458–461.
Feng, Y.; Liao, G.; Xu, J.; Zhu, S.; Zeng, C. Robust adaptive beamforming against large steering vector mismatch using multiple uncertainty sets. Signal Process. 2018, 152, 320–330.
Zhang, P.; Yang, Z.; Liao, G.; Jing, G.; Ma, T. An RCB-Like Steering Vector Estimation Method Based on Interference Matrix Reduction. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 636–646.
Zhang, S.; Huang, Y. Complex Quadratic Optimization and Semidefinite Programming. SIAM J. Optim. 2006, 16, 871–890.
Garofolo, J.S.; Lamel, L.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S. DARPA TIMIT Acoustic-Phonetic Continous Speech Corpus CD-ROM. In NIST Speech disc 1-1.1. NASA STI/Recon Technical Report N 93, 27403; NASA: Washington, DC, USA, 1993.
Cheng, R.; Bao, C.; Cui, Z. MASS: Microphone Array Speech Simulator in Room Acoustic Environment for Multi-Channel Speech Coding and Enhancement. Appl. Sci. 2020, 10, 1484.
Rix, A.W.; Beerends, J.G.; Hollier, M.P.; Hekstra, A.P. Perceptual evaluation of speech quality (PESQ)-A new method for speech quality assessment of telephone networks and codecs. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, UT, USA, 7–11 May 2001; pp. 749–752.
Taal, C.H.; Hendriks, R.C.; Heusdens, R.; Jensen, J. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 2125–2136.

Figure 1. The framework of the proposed method.

Figure 2. A comparison of spectrograms. The number of snapshots was 100 and the SNR was 10 dB.

Figure 3. A comparison of beampatterns. The number of snapshots was 100 and the SNR was 10 dB.

Figure 4. The results of the evaluation measures under different iSNRs: (a) SINR; (b) PESQ; (c) STOI; (d) SDI.

Figure 5. Histograms and DOA errors of the target source. (a) Histogram and (b) DOA error of the target source when the error of the initial DOA was 5°; (c) histogram and (d) DOA error of the target source when the error of the initial DOA was 10°.

Figure 6. The evaluation results of the optimization modules under different iSNRs: (a) SINR; (b) PESQ; (c) STOI; (d) SDI.

Table 1. Six Diagonal Loading Methods of the CMB.

Methods	Diagonal Loading	Methods	Diagonal Loading
HKB	$\frac{(M-1)\,(A\hat{\eta}-b)^{2}}{\|\hat{\eta}\|_{2}^{2}}$	LNR	$10^{\zeta_{\mathrm{LNR}}/10}\,\sigma_{w}^{2}$
SMF	$\bar{a}_{0}^{H} R_{XX} \bar{a}_{0}$	BPR	$\frac{\delta\,\|A\hat{\eta}-b\|_{2}}{\|\hat{\eta}\|_{2}}$
GLC	$\min_{\alpha,\beta} \|\alpha R_{XX} + \beta I - R_{EXP}\|_{2}^{2}$	NRP-TMMSE	$\min_{\alpha,\beta} \|\alpha R_{XX} + \beta I - R_{T}\|_{2}^{2}$
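Three of the rules in Table 1 depend only on quantities that appear in the table itself (HKB and BPR also need A, η̂ and b, which are defined elsewhere in the paper, so they are omitted). The sketch below evaluates the LNR and SMF loading levels directly and solves the GLC/NRP-TMMSE fit as a real two-parameter least-squares problem. It interprets the matrix norm as the Frobenius norm and treats the target matrix (R_EXP or R_T) as known; both are assumptions. Practical GLC-type estimators replace that unknown target with a data-driven estimate, and the non-negativity of the shrinkage parameters is not enforced here. All function names are hypothetical.

```python
import numpy as np


def lnr_loading(zeta_lnr_db, noise_power):
    """LNR row of Table 1: loading = 10^(zeta_LNR / 10) * sigma_w^2."""
    return 10.0 ** (zeta_lnr_db / 10.0) * noise_power


def smf_loading(r_xx, a0_bar):
    """SMF row of Table 1: loading = a0_bar^H R_XX a0_bar (real for Hermitian R_XX)."""
    return np.vdot(a0_bar, r_xx @ a0_bar).real


def glc_coefficients(r_xx, r_target):
    """Real (alpha, beta) minimizing ||alpha R_XX + beta I - R_target||_F^2.

    Oracle form: r_target (R_EXP or R_T in Table 1) is assumed to be known.
    """
    eye = np.eye(r_xx.shape[0])
    # Each basis matrix becomes one column of a least-squares design matrix.
    basis = np.stack([r_xx.ravel(), eye.ravel()], axis=1)
    # Split real and imaginary parts so that alpha and beta stay real-valued.
    basis_ri = np.vstack([basis.real, basis.imag])
    target_ri = np.concatenate([r_target.ravel().real, r_target.ravel().imag])
    (alpha, beta), *_ = np.linalg.lstsq(basis_ri, target_ri, rcond=None)
    return alpha, beta


# Synthetic check: a target built as 2 R_XX + 0.5 I should be recovered exactly.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
r_xx = x @ x.conj().T / 200                  # sample covariance, 4 mics, 200 snapshots
r_target = 2.0 * r_xx + 0.5 * np.eye(4)
alpha, beta = glc_coefficients(r_xx, r_target)
print("alpha, beta, equivalent loading beta/alpha:", alpha, beta, beta / alpha)
```

Dividing αR_XX + βI by α shows that a fitted pair with α > 0 corresponds to an ordinary diagonal loading of R_XX with level β/α, which is how the GLC and NRP-TMMSE rows fit alongside the explicit loading levels in the other rows.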


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

