1. Introduction
In recent years, spherical microphone array beamforming has emerged as a significant research area, with applications in three-dimensional sound field reception, indoor sound field analysis, direction of arrival (DOA) estimation, and noise control. Compared with classical linear, rectangular, and circular arrays, the spherical array simplifies spatial filtering and beamforming: it can be designed to enhance target sources in arbitrary directions, and it leverages the elegant mathematical framework of the spherical harmonic transform for array processing [1]. In practical scenarios, spherical microphone arrays often oversample the sound field. The spherical Fourier transform maps these samples to the spherical harmonic domain, where processing is computationally more efficient than in the spatial domain. In addition, the decoupling of frequency-dependent and angle-dependent components in the spherical harmonic domain makes it convenient to design wideband beamformers [2].
In the digital age, acoustic signal processing is becoming increasingly important. Microphone array signal processing is an indispensable part of it and is used in fields such as speech recognition, human–computer interaction, and smart speakers. This article therefore explores microphone array beamforming as a means of meeting the growing communication and perception needs of modern applications. Several recent results are worth noting: Dong et al. [3] proposed an efficient source localization method and applied it to mining engineering; Cantero-Chinchilla et al. [4] applied beamforming to damage localization; and Allegro [5] designed and implemented a novel acoustic system that uses only low-cost off-the-shelf hardware and transmits a single, appropriately designed signal in an inaudible frequency range to perform integrated perception and communication.
When random interference signals impinge on the receiving array, the signal processing system typically employs the adaptive null steering algorithm in the preprocessing stage to mitigate these interferences. However, the convergence speed and effectiveness of this algorithm often fall short, resulting in significant performance degradation. Therefore, designing a beamformer capable of effectively suppressing dynamic interference from sidelobe regions remains an active research area.
Using the spherical harmonic expansion and the orthogonality of the spherical harmonics, the array output can be calculated in the spherical harmonic domain. Working in this domain has a distinct advantage: only the array mode strength needs to be adjusted to model the steering vectors of different array configurations. Rafaely [6] used the delays caused by a single plane wave as beamforming weights and designed a delay-and-sum beamformer in the spherical harmonic domain. This beamformer is highly robust but has poor directivity at low frequencies. The conventional beamformer proposed by Li and Duraiswami [7], which features constant array weights, has been widely applied to plane wave decomposition [8]. Rafaely introduced a beamformer with maximum white noise gain in [9]; it is equivalent to a delay-and-sum beamformer in free-field environments, which helps explain why delay-and-sum beamformers are so widely used: they are reliable and robust. YuKang Liu proposed a superdirective beamformer in [10] that achieves maximum directivity. However, a high directivity index may come at the expense of robustness, and the aforementioned methods do not attempt to balance these two aspects. For this purpose, various design methods for beamformers with mixed objectives have been proposed. For instance, Rafaely [9] presents a mixed-objective design method that achieves a natural balance between directivity and white noise gain [11]. Meyer and Elko [12] presented array weight optimization methods that balance beamforming directivity and robustness, which is useful in practical applications. However, these methods cannot control the sidelobe level of the beam pattern. To address this issue, Rafaely et al. applied the classic Dolph–Chebyshev beam pattern design method (Dolph–Chebyshev) [13] in the spherical harmonic domain; however, this approach does not control the white noise gain, which reduces the robustness of beamformers designed for low frequencies. Shefeng Yan, U. Peter Svensson et al. considered multiple conflicting performance indicators simultaneously, formulating the weight vector design problem in the spherical harmonic domain as a multi-constraint second-order cone programming (SOCP) problem that controls performance indicators such as the sidelobe level [14,15]. However, this approach primarily optimizes a single objective, which limits its ability to achieve overall optimality. In addition, choosing appropriate constraint values is difficult and requires advanced theoretical knowledge and engineering experience from users.
Table 1 summarizes the advantages and disadvantages of the two aforementioned beamforming design methods capable of controlling sidelobe levels. While the studies mentioned above considered only symmetric beampatterns, Rafaely [16] extended the beampattern design methods to non-symmetric cases for a spherical microphone array. The approach was devised for both the spatial and spherical harmonic domains and uses a multiple null-steering method, which creates notches in the beampattern and steers them towards interferences arriving from known directions in order to improve the signal-to-noise ratio. Metaheuristic algorithms are widely used to address high sidelobe levels in collaborative beamforming, because derivative-based optimization techniques often become stuck in local optima and exhaustive search can be time-consuming [17]. In [18,19], the particle swarm optimization (PSO) algorithm and the genetic algorithm (GA), respectively, are applied to beam pattern optimization. Suhanya Jayaprakasam proposed a beam pattern optimization method based on multi-objective NSGA [20]. This method balances the trade-offs between conflicting indicators and removes the need to set constraint parameters manually, as traditional methods require, which makes it easier to use. Overall, this approach significantly improves sidelobe suppression and directivity.
To date, multi-objective optimization of beam patterns in the spherical harmonic domain has received limited attention. Building on the existing literature, this study therefore exploits the advantages of beam design in the spherical harmonic domain and proposes a wideband beamformer design approach based on NSGA-II [21]. The proposed method formulates beam pattern optimization in the spherical harmonic domain as a constrained multi-objective optimization problem and solves it using the NSGA-II algorithm with a constraint handling technique [22]. We also achieve dynamic control of the optimization range of the beamforming weights by exploiting the positive-definite property of the expressions for the white noise gain and directivity index. Unlike traditional design methods in the spherical harmonic domain, our approach simultaneously optimizes three performance indicators: white noise gain, directivity index, and maximum sidelobe level. As a result, the designed beamformers offer better overall performance. Furthermore, the proposed method requires only that the minimum thresholds for the white noise gain and directivity index be preset in order to determine the range of the optimized beamforming weights. The method then yields a set of optimal beamforming weight vectors, from which weights can be selected dynamically according to the requirements of the application.
The remaining sections of this paper are structured as follows: Section 2 introduces the background of the spherical Fourier transform and beamformers in the spherical harmonic domain. Section 3 formulates beam pattern optimization in the spherical harmonic domain as a constrained multi-objective optimization problem and presents the algorithm used to solve it. In Section 4 and Section 5, simulations and real-world experiments, respectively, are conducted to validate the performance of the proposed method. Finally, Section 6 concludes this paper.
2. Background
The present study adopts the conventional Cartesian coordinate system
and the spherical coordinate system
, where the elevation angle
and azimuth angle
are measured in radians from the positive
z-axis and positive
x-axis, respectively. Considering a unit amplitude plane wave arriving from direction
with a wavenumber
, impinging on a spherical array with a radius
and
microphones mounted on its surface; the sound field of the plane wave at a point
on the surface of the sphere can be expressed as follows [
23,
24]:
where
represents the spherical harmonic function of order
and degree
; * denotes the complex conjugate,
denotes the wavenumber relative to the speed of sound
, and
signifies the mode strength of the spherical array, which is contingent upon the array configuration. The commonly employed array configurations include open and rigid spherical arrays, with their corresponding mode strengths determined by equation [
24].
where
is an imaginary unit;
and
are the nth-order spherical Bessel and Hankel functions, respectively;
and
are their derivatives with respect to their arguments, respectively. The spherical harmonics, which serve as solutions to the Helmholtz equation, are defined as follows [
25]:
where
represents the associated Legendre functions. The spherical harmonics form an orthonormal set of functions that satisfies the following property:
where
and
are Kronecker delta functions.
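The mode strengths of the open and rigid spheres introduced above can be evaluated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the common form b_n = 4π iⁿ j_n(ka) for the open sphere and b_n = 4π iⁿ [j_n(ka) − j_n′(ka) h_n(ka) / h_n′(ka)] for the rigid sphere, with the spherical Hankel function of the second kind (the kind depends on the time convention, which is an assumption here). The names `sph_bessel_jy`, `derivative`, `mode_strength`, and `ka` are hypothetical.

```python
import math


def sph_bessel_jy(order, x):
    """Spherical Bessel functions j_n(x) and y_n(x) for n = 0..max(order, 1).

    Closed forms are used for n = 0, 1, then the upward three-term recurrence.
    This is adequate for the low orders and moderate arguments used here.
    """
    j = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for n in range(1, order):
        j.append((2 * n + 1) / x * j[n] - j[n - 1])
        y.append((2 * n + 1) / x * y[n] - y[n - 1])
    return j, y


def derivative(f, n, x):
    """Derivative with respect to x of a spherical Bessel sequence at order n."""
    return -f[1] if n == 0 else f[n - 1] - (n + 1) / x * f[n]


def mode_strength(order, ka, rigid=True):
    """Mode strength b_n(ka) of an open or rigid sphere (assumed convention)."""
    j, y = sph_bessel_jy(order, ka)
    jn = j[order]
    if not rigid:
        return 4 * math.pi * 1j**order * jn
    # Assumption: Hankel function of the second kind, h_n = j_n - i*y_n.
    hn = complex(jn, -y[order])
    djn = derivative(j, order, ka)
    dhn = complex(djn, -derivative(y, order, ka))
    return 4 * math.pi * 1j**order * (jn - djn / dhn * hn)


if __name__ == "__main__":
    for n in range(4):
        print(n, abs(mode_strength(n, 2.0)), abs(mode_strength(n, 2.0, rigid=False)))
```

Only the magnitude of the mode strength is independent of the Hankel-function convention, so quantities that depend on |b_n|² (such as the white noise gain) are unaffected by this choice.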
The spherical Fourier transform of a square integrable function p on the unit sphere, denoted as
, and its corresponding inverse transform can be expressed as [
26]
The application of the spherical Fourier transform (5) to a plane wave, as represented by Equation (1), yields the expression in the spherical harmonic domain for
as follows:
Note that for simplicity, is sometimes also written as .
If we denote the aperture weighting function by
, the array output is given as the integral of the product between the array input signal and the complex conjugated weighting function over the entire sphere [
2]:
where
denotes the spherical Fourier transform coefficients of
.
In practice, the sound pressure is spatially sampled at microphone positions
, where
. The positioning of the microphones must adhere to the following discrete orthogonality condition:
where
is a real number determined by the sampling scheme, and for near uniform sampling, we have
.
To avoid spatial aliasing and achieve accurate sound field reconstruction, the number of microphones must satisfy
, and the reconstruction order must satisfy
. A further analysis of the aliasing error in spherical sampling can be found in [
27].
The discrete spherical Fourier transform of
and the inverse transform are given by
Rafaely introduced multiple spatial sampling schemes in [
2]. For the sake of simplification, we assume in this paper that the microphones are uniformly distributed on the surface of the sphere.
The corresponding array output becomes
where
denotes the array weights and
denotes their spherical Fourier coefficients.
Meyer and Elko proposed a beamforming weight expression in the spherical harmonic domain, which yields beampatterns that are axisymmetric when viewed from an axis of symmetry [
27] and is given by the following expression:
where
is a new real-valued beamforming weight and
represents the viewing direction of the array. Substituting Equations (7) and (13) into Equation (12) yields the simplified array output as
where
is the angle between
and
. The derivation of the above equation uses the addition theorem of spherical harmonics [
28], which is shown below:
Equation (14) can be written in the following matrix form:
where
The weights now govern the response of the array’s beam pattern to unit-amplitude plane waves, and the array output depends only on the incident direction of the plane wave relative to the array’s viewing direction. Consequently, the beam pattern is axially symmetric about the viewing direction and can be conveniently steered to other directions. Furthermore, the mode strength term in Equation (13) cancels the frequency-dependent components of the spherical harmonic-domain wave field in Equation (14). Thus, a single set of array weights yields a frequency-independent beam pattern, which significantly simplifies the broadband beamformer design process.
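To make the axisymmetric, frequency-independent beam pattern concrete, the sketch below evaluates it as a Legendre series in the angle between the arrival and viewing directions. It assumes the standard result of applying the addition theorem, y(Θ) = Σ_n d_n (2n+1)/(4π) P_n(cos Θ), with real weights; the names `legendre`, `beam_pattern`, and `d` are hypothetical.

```python
import math


def legendre(order, x):
    """Legendre polynomials P_0(x)..P_order(x) via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]


def beam_pattern(d, theta):
    """Axisymmetric beam pattern for real weights d[0..N] at angle theta (rad)."""
    p = legendre(len(d) - 1, math.cos(theta))
    return sum(
        dn * (2 * n + 1) / (4 * math.pi) * pn
        for n, (dn, pn) in enumerate(zip(d, p))
    )


if __name__ == "__main__":
    order = 3
    # Equal weights scaled so that the response in the viewing direction is 1.
    d = [4 * math.pi / (order + 1) ** 2] * (order + 1)
    for deg in (0, 45, 90, 135, 180):
        print(deg, beam_pattern(d, math.radians(deg)))
```

Because the wavenumber does not appear in this expression, the same weights give the same pattern at every frequency, which is the property the paragraph above relies on.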
3. Method
The proposed metaheuristic multi-objective beamforming optimization method based on NSGA-II is presented in this section. Firstly, the formulation of the beamforming optimization problem as a multi-objective optimization problem is demonstrated. Secondly, the design concept and specific implementation details of the metaheuristic algorithm are provided.
3.1. Multi-Objective Beamforming Design Model
The optimization objective of this paper is to select the beamforming weights so as to generate an optimal beam pattern with a low sidelobe level, a high directivity index, and a high white noise gain while maintaining a distortionless response in the array viewing direction. We now describe how to formulate this objective as a constrained multi-objective optimization problem.
First, three measures of array performance are presented in conjunction with the simplified Expression (16) for the array output. The first measure is the white noise gain (WNG), which quantifies the improvement in the signal-to-noise ratio (SNR) at the array output relative to that at the input; a higher white noise gain indicates greater robustness of the beamformer. The white noise gain is given by [9]
where
refers to the diagonalization operation.
The second measure is the directivity factor (DF), defined as the ratio between the peak and average values of the squared beam pattern; the directivity index (DI) is the DF expressed in decibels. A larger DF indicates a more directional array response. This expression is also given by [9]
where
The third measure is the sidelobe level. In traditional beam optimization using convex optimization [14,15], the continuous sidelobe region is discretized, and a constraint is imposed on the beam pattern amplitude at each discretized point to control the sidelobe level. In this study, we adopt the maximum sidelobe level (MSL) within the sidelobe region as our third measure, which can be formulated as follows:
where
denotes the sidelobe region, and
represents the total number of discrete points within this region after discretization.
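The three measures can be evaluated directly from the real beamforming weights. The sketch below is illustrative only and rests on stated assumptions: nearly uniform sampling with `num_mics` microphones, the series form DF = (Σ d_n(2n+1))² / Σ d_n²(2n+1), the series form WNG = M/(4π)² · (Σ d_n(2n+1))² / Σ d_n²(2n+1)/|b_n|², and an MSL taken as the largest normalized pattern magnitude over the sampled sidelobe angles. These are common forms in the spherical array literature and are not claimed to be term-for-term the paper's matrix expressions; all function names are hypothetical.

```python
import math


def legendre(order, x):
    """Legendre polynomials P_0(x)..P_order(x) via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]


def beam_pattern(d, theta):
    """Axisymmetric beam pattern for real weights d[0..N]."""
    p = legendre(len(d) - 1, math.cos(theta))
    return sum(
        dn * (2 * n + 1) / (4 * math.pi) * pn
        for n, (dn, pn) in enumerate(zip(d, p))
    )


def directivity_factor(d):
    """Peak over average of the squared beam pattern (linear scale)."""
    peak = sum(dn * (2 * n + 1) for n, dn in enumerate(d)) ** 2
    mean = sum(dn**2 * (2 * n + 1) for n, dn in enumerate(d))
    return peak / mean


def white_noise_gain(d, b_abs2, num_mics):
    """WNG (linear scale); b_abs2[n] is |b_n(ka)|^2 for the chosen array."""
    signal = sum(dn * (2 * n + 1) for n, dn in enumerate(d)) ** 2
    noise = sum(
        dn**2 * (2 * n + 1) / b2 for n, (dn, b2) in enumerate(zip(d, b_abs2))
    )
    return num_mics / (4 * math.pi) ** 2 * signal / noise


def max_sidelobe_level_db(d, sidelobe_angles):
    """Largest pattern magnitude over the sampled sidelobe region, in dB."""
    look = abs(beam_pattern(d, 0.0))
    return max(
        20 * math.log10(max(abs(beam_pattern(d, t)) / look, 1e-12))
        for t in sidelobe_angles
    )


if __name__ == "__main__":
    d = [1.0, 1.0, 1.0, 1.0]
    angles = [math.radians(a) for a in range(60, 181, 2)]
    print(10 * math.log10(directivity_factor(d)), max_sidelobe_level_db(d, angles))
```

With equal weights of order N, the directivity factor evaluates to (N+1)², the known maximum for an order-N array, which serves as a quick sanity check of the series form.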
Then, combined with Equation (16), the distortionless response constraint can be formulated as follows:
Finally, we describe how the range of the beamforming weights is determined. First, minimum thresholds for the directivity factor (
) and the white noise gain (
) are established. These thresholds are then combined with Equation (26), resulting in the following expressions:
Simplifying the above equations, we have
and
The range of the beamforming weights, subject to constraints on white noise gain and directivity index, is ultimately determined as follows:
In the above equation, the vectors and represent the lower and upper bounds of the beamforming weights, respectively, denotes the minimum value operation, and by adjusting the values of and , the range of the beamforming weights can be controlled.
The beamforming optimization problem in the spherical harmonic domain can now be formulated as a multi-objective optimization problem, as presented below:
Although the constrained multi-objective optimization problem described above does not possess a closed-form solution, it can be effectively addressed through the utilization of intelligent optimization algorithms. In this study, we employ the NSGA-II algorithm with constraint handling to tackle this problem. For ease of exposition, we denote as , as , and as in subsequent discussions. Meanwhile, we define the deviation function . Please note the difference between this and the bolded mentioned earlier, as they represent different meanings.
3.2. Ideas and Implementation Details of the Metaheuristic Algorithm
The Nondominated Sorting Genetic Algorithm II (NSGA-II) is a multi-objective optimization algorithm that improves on NSGA [29]. It comprises two main components: nondominated sorting and crowding distance calculation. Nondominated sorting ranks the individuals in the population into successive Pareto front levels, while the crowding distance is used to keep the solutions evenly distributed along the Pareto front. NSGA-II has several advantages: it handles multiple objective functions simultaneously, finds a set of optimal solutions on the Pareto front, and converges quickly and efficiently.
However, the original NSGA-II algorithm is suited only to ordinary multi-objective optimization problems, in which the objective functions typically have no constraints and can be evaluated directly to determine the Pareto fronts. As stated in Section 3.1, the proposed multi-objective beamforming optimization problem has an equality constraint. We therefore adopt the adaptive penalty function and distance measure constraint handling technique proposed by Yonas Gebre Woldesenbet et al. [22] for multi-objective evolutionary algorithms. In this technique, the penalty function and distance measure are adjusted dynamically according to each individual’s objective function values and degree of constraint violation. Because the objective functions are modified in this way, nondominated sorting can identify the best feasible and infeasible solutions. This approach is simple to use, requires no parameter tuning, and has shown good performance in experiments.
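The following sketch illustrates how such modified objective values can be formed. It follows the adaptive penalty scheme as it is commonly summarized (normalized objectives, a normalized violation, and the ratio of feasible individuals in the population) and should be checked against [22] before use; it is not the authors' code. The function name `modified_objectives` is hypothetical, all objectives are assumed to be minimized, and each violation is assumed to be already normalized and nonnegative (zero meaning feasible).

```python
import math


def modified_objectives(objs, violations):
    """Return distance-plus-penalty objective values for each individual.

    objs:       list of objective tuples, one per individual (minimization).
    violations: list of normalized constraint violations, 0.0 if feasible.
    """
    pop = len(objs)
    dims = len(objs[0])
    feasible_ratio = sum(1 for v in violations if v == 0) / pop
    lo = [min(o[i] for o in objs) for i in range(dims)]
    hi = [max(o[i] for o in objs) for i in range(dims)]

    result = []
    for o, v in zip(objs, violations):
        row = []
        for i in range(dims):
            span = hi[i] - lo[i]
            f = (o[i] - lo[i]) / span if span > 0 else 0.0
            # Distance measure: violation only when nothing is feasible,
            # otherwise the distance in (objective, violation) space.
            dist = v if feasible_ratio == 0 else math.hypot(f, v)
            # Penalty: weighted mix of violation and objective terms.
            x_term = 0.0 if feasible_ratio == 0 else v
            y_term = 0.0 if v == 0 else f
            penalty = (1 - feasible_ratio) * x_term + feasible_ratio * y_term
            row.append(dist + penalty)
        result.append(row)
    return result


if __name__ == "__main__":
    print(modified_objectives([(0.0, 10.0), (10.0, 0.0)], [0.0, 0.3]))
```

For the equality constraint of Section 3.1, one possible violation measure (an assumption, not taken from the paper) is the absolute deviation from the distortionless response, set to zero when it falls below a small tolerance.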
The specific steps of the algorithm proposed in this paper are shown as follows.
Step 1. Determine the sidelobe region, the discrete sampling method, and the number of sampling points; the configuration of the spherical array; the wavenumber; the beamforming order; the white noise gain threshold; the directivity factor threshold; and the NSGA-II parameters. Setting large threshold values narrows the range of the optimization variables, so that the algorithm converges faster to the optimal solution.
Step 2. Calculate the lower and upper bounds of the beamforming weights to be optimized using Equations (31) and (32).
Step 3. Initialize the population and generate the initial set of individuals; the number of individuals equals the population size.
Step 4. Calculate the objective function values and the constraint deviation value for each individual in the population based on Equation (33).
Step 5. Apply the constraint handling technique [22] to each individual to calculate the distance measure and penalty function in each objective function dimension; the calculation is described in detail in [22]. The modified objective function value in each objective function dimension is the sum of the penalty function and the distance measure.
Step 6. Perform nondominated sorting [21] on the current population based on the modified objective function values.
Step 7. Assign fitness to individuals based on their Pareto ranking and crowding distance.
Step 8. Use tournament selection to select parents.
Step 9. Generate offspring solutions through simulated binary crossover and polynomial mutation operations [30].
Step 10. Combine the parent and offspring populations into a single set of individuals, perform nondominated sorting on this set, and select individuals based on their fitness to form the new generation.
Step 11. Repeat Steps 4–10 until the maximum number of generations has been reached.
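The ranking used in the sorting and fitness assignment steps above can be sketched as follows. This is a simple front-peeling version of nondominated sorting, not the faster bookkeeping variant described in the NSGA-II literature, together with the usual crowding distance; it assumes all (modified) objectives are minimized. The function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_sort(objs):
    """Split individual indices into fronts, best front first."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [
            i
            for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        ]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance of each index in one front (larger = less crowded)."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][k])
        span = objs[order[-1]][k] - objs[order[0]][k]
        # Boundary solutions are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if span == 0:
            continue
        for prev, cur, nxt in zip(order, order[1:], order[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / span
    return dist


if __name__ == "__main__":
    population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    fronts = nondominated_sort(population)
    print(fronts)
    print(crowding_distance(population, fronts[0]))
```

Individuals are then compared first by front rank and, within a front, by larger crowding distance, which is the ordering used by tournament selection and by the survivor selection in the combined parent and offspring population.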
The NSGA-II parameters in Step 1 mainly include the population size, number of generations, crossover probability, mutation probability, and selection method. A population size between 100 and 300 individuals is recommended: a smaller population may converge prematurely, while a larger one increases computation time. The number of generations should be chosen between 1000 and 3000 according to the complexity of the problem and the available computational resources, and adjusted according to the observed convergence behavior. For low-frequency problems, larger values of these two parameters can be used, while smaller values are recommended for high-frequency problems to reduce computation time. A crossover probability of 0.8 and a mutation probability of 0.1 are recommended as starting points; higher values promote exploration and lower values promote exploitation, and the best balance can be found through experimentation. Tournament selection is recommended as the selection method, as it is commonly used in NSGA-II.
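The variation operators named in the steps above can be sketched as follows. This is a basic, unbounded form of simulated binary crossover and a simple polynomial mutation with clipping to the weight bounds; the distribution index value 20 and the bounds in the example are placeholders, not the settings used in this paper, and the function names are hypothetical.

```python
import random


def sbx_crossover(parent1, parent2, eta, rng):
    """Simulated binary crossover; eta is the crossover distribution index."""
    child1, child2 = [], []
    for a, b in zip(parent1, parent2):
        u = rng.random()
        if u <= 0.5:
            beta = (2 * u) ** (1 / (eta + 1))
        else:
            beta = (1 / (2 * (1 - u))) ** (1 / (eta + 1))
        child1.append(0.5 * ((1 + beta) * a + (1 - beta) * b))
        child2.append(0.5 * ((1 - beta) * a + (1 + beta) * b))
    return child1, child2


def polynomial_mutation(x, lower, upper, eta, prob, rng):
    """Polynomial mutation applied gene by gene with probability prob."""
    mutated = []
    for xi, lo, hi in zip(x, lower, upper):
        if rng.random() < prob:
            u = rng.random()
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            # Perturb proportionally to the allowed range, then clip.
            xi = min(max(xi + delta * (hi - lo), lo), hi)
        mutated.append(xi)
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    c1, c2 = sbx_crossover([0.2, 0.8, 0.5], [0.6, 0.1, 0.9], eta=20, rng=rng)
    print(c1, c2)
    print(polynomial_mutation(c1, [0.0] * 3, [1.0] * 3, eta=20, prob=0.1, rng=rng))
```

A larger distribution index keeps offspring closer to their parents, while a smaller one spreads them out, which is one way the exploration and exploitation balance discussed above can be adjusted.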
Figure 1 shows the workflow diagram of the proposed algorithm.
4. Results and Discussion
This study uses both the NSGA-II algorithm with constraint handling and the multi-objective particle swarm optimization (MOPSO) algorithm [31] with constraint handling to solve the beam pattern optimization problem in the spherical harmonic domain. The solutions obtained are compared with those of conventional optimization algorithms capable of controlling sidelobes, namely the Dolph–Chebyshev beampattern design method (Dolph–Chebyshev) [13] and the optimal minimum sidelobe beamforming method in the spherical harmonic domain (SOCP) [15]. When using the SOCP method, our goal is to maximize the directivity of the array while satisfying preset white noise gain, distortionless response, and sidelobe level constraints. The values of these constraint parameters are chosen using the optimization results of the proposed algorithm as a guide.
The output of the algorithm proposed in this paper is a set of feasible optimal solutions. Therefore, for subsequent simulations and measurements, we select the solution from this set that has the minimum Euclidean distance to the utopia point
. The utopia point is defined as follows:
where
represents the value of solution
with respect to the objective function
, and so on;
represents the set of solutions output by the proposed algorithm. The Euclidean distance between solution
and the utopia point
is defined as follows:
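As a minimal sketch of this selection rule, the code below assumes that every objective is expressed as a quantity to be minimized (for example, the negated WNG and DI in dB together with the MSL in dB) and that the utopia point is the component-wise minimum over the solution set; both are assumptions made for illustration. The function name is hypothetical.

```python
import math


def select_closest_to_utopia(objs):
    """Return (index of chosen solution, utopia point) for a solution set."""
    dims = len(objs[0])
    utopia = [min(o[i] for o in objs) for i in range(dims)]
    distances = [math.dist(o, utopia) for o in objs]
    best = min(range(len(objs)), key=distances.__getitem__)
    return best, utopia


if __name__ == "__main__":
    # Hypothetical (-WNG, -DI, MSL) triples in dB for three Pareto solutions.
    front = [(-12.0, -9.5, -14.0), (-10.5, -10.8, -16.5), (-8.0, -11.2, -17.0)]
    print(select_closest_to_utopia(front))
```

If the objectives have very different ranges, normalizing each one over the solution set before measuring the distance is a common refinement; whether the paper does so is not stated here.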
All simulations and results were derived with the following parameter settings: Rigid spheres equipped with 32 and 36 microphones, uniformly distributed across their surfaces, are employed for third-order and fourth-order beamforming in the spherical harmonic domain, respectively. For NSGA-II, the crossover index is set to , the mutation index is set to , the mutation probability is set to 0.2, the number of generations is set to 3000, and the population size is set to 400. For the MOPSO algorithm, both the population and repository size are set to 200. The inertia weight, personal learning coefficient, global learning coefficient, and mutation rate are set to 0.5, 1, 2, and 0.4, respectively. The array’s viewing direction is , and the sidelobe region is uniformly sampled at intervals of . The WNG and DF thresholds for the proposed algorithm are set to .
Sample Results of the Optimization Process
Firstly, considering a third-order rigid sphere array,
Figure 2a,b illustrate the beampatterns obtained using different beamforming design methods at low frequency (
) and high frequency (
), respectively. At a low frequency, for the Dolph–Chebyshev method, the main-lobe width is set to 70°; for the SOCP method, the minimum WNG constraint is set to 5 and the maximum sidelobe level constraint is set to −10 dB; additionally, the sidelobe region
is defined. Similarly, at high frequency, the main-lobe width for the Dolph–Chebyshev method is set to 60°; for the SOCP method, the minimum WNG constraint is set to 10 and the maximum sidelobe level constraint is set to −10 dB; the sidelobe region is defined as
.
Table 2 compares the main features of these beampatterns, with the best DI, WNG, and MSL values at each frequency highlighted in bold. The Pareto-optimal solutions corresponding to the examples in Figure 2 are illustrated in Figure 3. The figure shows that these Pareto-optimal solutions are predominantly distributed along a curve, which indirectly supports the effectiveness of the proposed method and provides a dependable basis for dynamically selecting beamforming weights according to application requirements.
The results in Figure 2 and Table 2 demonstrate that, at higher frequencies, a beamformer with a narrower main lobe can be obtained while maintaining or even improving the other performance indicators relative to lower frequencies. At high frequency, the DI of the proposed method is only 0.0391 dB lower than that of the SOCP method, while the WNG and MSL improve by 0.676 dB and 2.983 dB, respectively. Similarly, at low frequency, the proposed method shows only slight decreases in DI and WNG (0.029 dB and 0.0193 dB, respectively) but improves the MSL by 1.8437 dB compared with the SOCP method. Overall, the proposed method significantly improves the maximum sidelobe level relative to the SOCP method while nearly maintaining the other performance indicators at both high and low frequencies. In addition, unlike the SOCP method, the proposed method does not require a complex constraint parameter tuning process. Although the computational complexity of the proposed algorithm is high, the calculations can be completed offline and therefore do not affect its practical application. Comparing NSGA-II and MOPSO as the optimization algorithm, the former performs better in terms of white noise gain at low frequency, while the latter performs better in terms of maximum sidelobe level. At high frequency, the MOPSO algorithm improves the WNG and MSL values at the expense of a wider main lobe and a smaller DI. We therefore cannot determine which optimization algorithm is better overall in this case. However, as shown later, NSGA-II outperforms MOPSO when the beamforming order is four.
However, the performance enhancement achieved by the Dolph–Chebyshev method comes at the expense of a wider main lobe; specifically, the former beampattern exhibits a main lobe width of 60°, whereas the latter has a narrower width of only 56°. Similarly, at low frequency, although the beampattern obtained with the Dolph–Chebyshev method has the best DI and MSL values among all three methods considered here, this advantage comes with a compromised WNG. In particular, at low frequency, the Dolph–Chebyshev beampattern has a WNG of −0.6950 dB, which results in very poor noise robustness of the array [15].
Figure 4 shows the beampatterns obtained when the input signal has an SNR of 15 dB at low frequency. The beampattern obtained by the Dolph–Chebyshev method is severely degraded, while the beampatterns of the other two algorithms largely retain their original form.
Finally, as shown in Equation (16), the frequency-dependent component has been removed in advance, so only a single set of array weights is required to achieve a frequency-independent beampattern. This is one of the main advantages of designing a broadband beamformer in the spherical harmonic domain rather than in the spatial domain [32].
Figure 5 shows the frequency-independent beampatterns generated by the weights obtained at
using the proposed algorithm.
In order to verify the effectiveness of the proposed algorithm, we choose a rigid spherical array of order
, and compare the proposed algorithm with the Dolph–Chebyshev and SOCP methods at low frequency (
) and high frequency (
) again. At low frequency, we set the main lobe width to 60° for the Dolph–Chebyshev method and, for the SOCP method, impose a minimum white noise gain constraint of 5 dB and a maximum sidelobe level constraint of −20 dB. The sidelobe region is defined as
. At high frequency, we set the main lobe width to 50° for the Dolph–Chebyshev method and, for the SOCP method, impose a minimum white noise gain constraint of 15 dB and a maximum sidelobe level constraint of −20 dB. The sidelobe region is defined as
. The corresponding beampatterns are presented in
Figure 6. The major characteristics of these beampatterns are compared in Table 3. Table 3b shows that, at high frequency, the proposed algorithm has only a slight loss in DI compared with the two traditional algorithms but a significant improvement in WNG and MSL. With NSGA-II as the optimization algorithm, the result outperforms MOPSO in both DI and MSL, with only a slight loss in WNG. Table 3a shows that, at low frequency, the behavior is similar to that of the third-order rigid spherical array. Although the Dolph–Chebyshev method achieves the best DI and MSL values, it does so at the expense of the array’s noise robustness. Compared with the SOCP and MOPSO methods at low frequency, the proposed algorithm has only a slight loss in DI but a significant improvement in WNG and MSL. Using MOPSO as the optimization algorithm also produces better results than the two traditional algorithms.