Article

Feature Selection Based on Principal Component Regression for Underwater Source Localization by Deep Learning

Department of Electronic Systems, Norwegian University of Science and Technology, 7491 Trondheim, Norway
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(8), 1486; https://doi.org/10.3390/rs13081486
Submission received: 1 March 2021 / Revised: 6 April 2021 / Accepted: 9 April 2021 / Published: 13 April 2021
(This article belongs to the Special Issue Intelligent Underwater Systems for Ocean Monitoring)

Abstract

Underwater source localization is an important task, especially for real-time operation. Recently, machine learning methods have been combined with supervised learning schemes, which opens new possibilities for underwater source localization. However, in many real scenarios, the amount of labeled data is insufficient for purely supervised learning, and the training time of a deep neural network can be very long. To mitigate the shortage of labeled data, we propose a two-step framework for underwater source localization based on a semi-supervised learning scheme. The first step uses a convolutional autoencoder to extract latent features from the whole available dataset. The second step performs source localization via an encoder multi-layer perceptron trained on a limited labeled portion of the dataset. To reduce the training time, we propose an interpretable feature selection (FS) method based on principal component regression, which extracts the features important for underwater source localization using only the source location, without other prior information. The proposed approach is validated on the public dataset SWellEx-96 Event S5. The results show that the framework achieves appealing accuracy and robustness on unseen data, especially as the amount of labeled training data decreases. After FS, the training time is reduced by 95%, and the framework becomes more robust to the choice of receiver depth and more accurate when the amount of labeled training data is extremely limited.

1. Introduction

Underwater source localization is a relevant and challenging task in underwater acoustics. The most popular method for source localization is matched-field processing (MFP) [1], which has inspired several works [2,3,4,5]. One of the major drawbacks of the MFP method is the need to compute many “replica” acoustic fields with different environmental parameters via numerical simulations based on the acoustic propagation model. Accuracy of the results is heavily affected by the amount of prior information about the marine environment (e.g., sound speed profile, geoacoustic parameters, etc.), which unfortunately is often hard to acquire in real scenarios.
Artificial intelligence (AI), and primarily data-driven approaches based on machine learning (ML), has become pervasive in many research fields [6,7]. ML-techniques are commonly divided into supervised and unsupervised learning. The former approach relies on the availability of labeled datasets, i.e., when measurements are paired with ground truth information. The latter refers to the case when unlabeled data are available [8].
Recently, there have been several studies on underwater source localization based on ML using the supervised learning scheme [9,10,11,12,13,14,15,16,17]. The general approach is to use acoustic propagation models to create a huge simulated dataset that covers the real scenario. This approach has two main limitations: firstly, creating such a huge simulated dataset is time consuming and requires large computer storage resources; secondly, the set of environmental parameters used to create the simulated dataset may not account for, or adapt to, environmental changes in a real-world scenario. Handling such changes requires a new simulation process, which is often unrealistic.
Because data-driven ML approaches rely on information extracted from the available data, the ability to exploit both labeled and unlabeled data is crucial in many applications, including underwater source localization. Semi-supervised learning has been proposed to address this issue in computer vision [18] and room acoustics [19,20].
Deep learning is known for its strong performance on many tasks; however, this performance comes at the price of heavy computation. In the study of Niu et al. [15], the training time was six days for their ResNet50-1 model and three days for each of the ResNet50-2-x-D models. Each ResNet50-2-x-R model took 15 days to train.
In real scenarios, the speed of training is vital for real-time localization. To accelerate training, some feature selection (FS) methods have been applied in underwater acoustics [21,22,23,24]. Feature selection aims to find the optimal feature subspace that can express the systematic structure of the raw dataset [21]. Principal component analysis (PCA) is a well-known method that maximizes the variance in each principal direction and removes the correlations among the features of the raw dataset [21,25]. Furthermore, the latent relationship between features can be interpreted by studying the correlation loading plot of PCA [26]. Principal component regression (PCR) is a PCA-based method that can identify the variables significant for the regression target by analyzing the absolute value of the regression coefficients [27,28].
In our study, an interpretable FS method for underwater source localization based on PCR is proposed. To make the situation closer to the real scenario, the neural network is built as a two-step semi-supervised framework and is trained on the data collected by a single hydrophone.
Figure 1 shows the workflow of our approach. The raw data are first preprocessed by the discrete Fourier transform and min-max scaling. PCR is then conducted, and the important features for source localization are selected based on the absolute value of the PCR regression coefficients. Finally, the selected features are fed into the two-step semi-supervised framework for source localization. The framework consists of the encoder of a convolutional autoencoder, which is trained in unsupervised-learning mode, and a 4-layer multi-layer perceptron (MLP), which is trained in supervised-learning mode.
The performance of our approach is assessed on the public dataset SWellEx-96 Event S5 [29].
The objectives of this paper are:
  • Mitigating the problem related to the low number of labeled datasets in many real scenarios.
  • Reducing the training time of the neural network while preserving the localization performance as far as possible.
More specifically, the contributions of our work are:
  • An interpretable approach of FS for underwater acoustic source localization is proposed. This approach can reveal the important features related to sources by only introducing the source location without other prior information.
  • By using the selected features, the training time of the neural networks is significantly reduced with a slight loss of the performance of localization.
  • A semi-supervised two-step framework is used for underwater source localization exploiting both unlabeled and labeled data. The performance of the framework is assessed showing appealing behavior in terms of good performance combined with simple implementation and large flexibility.
The paper is organized as follows: Section 2 describes the theory of PCA and PCR, as well as the FS method; Section 3 presents the two-step framework for underwater source localization; Section 4 introduces the public dataset SWellEx-96 Event S5, the data preprocessing, and the schemes for building the training and test datasets; Section 5 gives a comprehensive comparison of the localization performance of our framework with FS against the control groups; Section 6 interprets the selected features from both the physical and the data-science perspective; and, finally, Section 7 gives the conclusion.

2. The Interpretable FS Method Based on PCR

The training time of a deep neural network could be huge, especially when the depth of the network is large. To reduce the training time, as well as keep the accuracy of the underwater source localization, an interpretable FS method based on PCR is introduced in this section.

2.1. Theory of PCA

PCA [28] refers to the following decomposition of a column-mean-centered data matrix $X$ of size $N \times K$, where $N$ and $K$ represent the number of samples and the number of features, respectively,
$$X = T P^{\mathsf{T}} + E, \tag{1}$$
where $(\cdot)^{\mathsf{T}}$ denotes the matrix transpose, $T$ is a score matrix of size $N \times A$ related to the projections of the matrix $X$ into an $A$-dimensional space, $P$ is a loading matrix of size $K \times A$ related to the projections of the features into the $A$-dimensional space (with $P^{\mathsf{T}} P = I$), and $E$ is a residual matrix of size $N \times K$.
More specifically, the $A$-dimensional space is identified via the singular value decomposition (SVD) of $X$ by selecting the first $A$ principal components (PCs).
Denoting by $X = U S V^{\mathsf{T}}$ the SVD of $X$ and by $\hat{U}$, $\hat{S}$, and $\hat{V}$ the matrices containing the first $A$ columns of $U$, $S$, and $V$, respectively (for $S$, the leading $A \times A$ block), we have
$$T = \hat{U} \hat{S}, \qquad P = \hat{V}, \tag{2}$$
and $\hat{X} = T P^{\mathsf{T}}$ is called the reconstructed data matrix.
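The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `pca_svd` and the synthetic matrix `X_demo` are assumptions introduced here, and only the relations $T = \hat{U}\hat{S}$, $P = \hat{V}$, and $E = X - T P^{\mathsf{T}}$ come from the text.

```python
import numpy as np

def pca_svd(X, n_components):
    """Truncated PCA of X (N x K) via SVD: returns scores T, loadings P, residual E."""
    Xc = X - X.mean(axis=0)                       # column-mean-centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # T = U_hat S_hat  (N x A)
    P = Vt[:n_components].T                       # P = V_hat        (K x A)
    E = Xc - T @ P.T                              # residual         (N x K)
    return T, P, E

# Synthetic data, for illustration only (not the SWellEx-96 data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 8))
T, P, E = pca_svd(X_demo, n_components=3)
```

Because the columns of $P$ are orthonormal, the scores can equivalently be obtained by projecting the centered data onto the loadings, i.e., `T` equals `(X_demo - X_demo.mean(axis=0)) @ P` up to rounding.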

2.2. Theory of PCR

The multiple linear regression (MLR) model is given by
$$y = X \theta + e, \tag{3}$$
where $y$ is the regression target (the source location in this paper) of size $N \times 1$ containing $N$ samples; $X$ is the data matrix defined above; $\theta$ is the vector of regression coefficients of size $K \times 1$; and $e$ contains the unexplained residuals of $y$. Using ordinary least squares regression [30], the regression coefficients $\hat{\theta}_{\mathrm{MLR}}$ of size $K \times 1$ can be estimated as
$$\hat{\theta}_{\mathrm{MLR}} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} y. \tag{4}$$
PCR is the MLR based on the first $A$ PCs extracted from the original data matrix $X$. To estimate the regression parameters $\hat{\theta}_{\mathrm{PCR}}$ of size $A \times 1$, the score matrix $T$ is used instead of $X$ in Equation (4):
$$\hat{\theta}_{\mathrm{PCR}} = (T^{\mathsf{T}} T)^{-1} T^{\mathsf{T}} y. \tag{5}$$
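As a hedged illustration of the PCR estimate, the sketch below regresses a target on the score matrix. The function name `pcr_fit`, the synthetic data, and the centering of the target $y$ are assumptions made here (the text does not state how an intercept is handled). The final lines check a known property: when all $K$ components are kept, mapping the PCR coefficients back through the loadings $P$ reproduces the ordinary least squares solution.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the first n_components PCA scores of X."""
    Xc = X - X.mean(axis=0)                   # column-mean-centering
    yc = y - y.mean()                         # assumption: target is centered too
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]
    P = Vt[:n_components].T
    theta_pcr = np.linalg.solve(T.T @ T, T.T @ yc)   # (T^T T)^{-1} T^T y
    return theta_pcr, T, P

# Synthetic regression problem, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=60)

theta_pcr, T, P = pcr_fit(X_demo, y_demo, n_components=5)
theta_mlr = np.linalg.lstsq(X_demo - X_demo.mean(axis=0),
                            y_demo - y_demo.mean(), rcond=None)[0]
```

With fewer components than features, the two solutions differ: PCR discards the low-variance directions, which regularizes the estimate.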

2.3. Method of FS

The aim of FS is to select a set of important variables in order to accelerate underwater acoustic source localization. Furthermore, PCA and PCR are highly interpretable methods: the correlation between variables can be revealed by investigating the correlation loading plot, and the variables significant for regression can be revealed by the values of the regression coefficients [28].
The method of FS has 5 steps:
  • Mean-centering each column of the data matrix $X$.
  • Conducting SVD on the column-mean-centered data matrix $X$ to calculate the first $A$ PCs ($A = 3$ in this paper) and to build the score matrix $T$ and the loading matrix $P$, following Equation (2).
  • Calculating the regression coefficients $\bar{\theta}$ of size $K \times 1$ for each original variable by
    $$\bar{\theta} = P \hat{\theta}_{\mathrm{PCR}}. \tag{6}$$
  • Ranking the elements of $\bar{\theta}$ by absolute value from high to low, and setting a threshold $\epsilon$ ($\epsilon = 0.02$ in this paper); the variables whose absolute coefficient is equal to or greater than the threshold are the selected features.
  • Constructing a new data matrix $\bar{X}$ of size $N \times M$ from the $M$ selected features.
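The five steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name `select_features`, the centering of $y$, and the toy data are introduced here for illustration. The toy call keeps all components and uses a large threshold so that the outcome is easy to predict; the paper's settings are $A = 3$ and $\epsilon = 0.02$.

```python
import numpy as np

def select_features(X, y, n_components=3, threshold=0.02):
    """PCR-based feature selection: returns selected column indices and theta_bar."""
    Xc = X - X.mean(axis=0)                               # step 1: mean-centering
    yc = y - y.mean()                                     # assumption: center target
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)     # step 2: SVD
    T = U[:, :n_components] * s[:n_components]            # scores
    P = Vt[:n_components].T                               # loadings
    # T^T T is diagonal with entries s^2, so the PCR solve reduces to a division.
    theta_pcr = (T.T @ yc) / (s[:n_components] ** 2)
    theta_bar = P @ theta_pcr                             # step 3: per-variable coefficients
    order = np.argsort(-np.abs(theta_bar))                # step 4: rank by |coefficient|
    keep = order[np.abs(theta_bar[order]) >= threshold]   #         and apply threshold
    return np.sort(keep), theta_bar

# Toy data: only columns 3 and 7 drive the target (illustration only).
rng = np.random.default_rng(1)
X_toy = rng.uniform(size=(200, 10))
y_toy = 2.0 * X_toy[:, 3] - 1.5 * X_toy[:, 7] + 0.01 * rng.normal(size=200)

selected, theta_bar = select_features(X_toy, y_toy, n_components=10, threshold=0.5)
X_bar = X_toy[:, selected]                                # step 5: reduced matrix (N x M)
```

The selected indices are returned in ascending column order so that the reduced matrix preserves the original ordering of the variables (frequencies, in the paper's setting).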

3. The Two-Step Framework for Underwater Source Localization

In many real scenarios, a whole dataset will often consist of a small portion of labeled data and a large portion of unlabeled data (purely acoustic signals). To make the experiment condition closer to real scenarios, in the following, we assume that a large-size dataset is available with most of the data being unlabeled and only a small fraction labeled.

3.1. Step-One: Training a Convolutional Autoencoder

The autoencoder is an unsupervised learning machine, which can be trained based on the unlabeled dataset [31]. The first step of the framework is to train a convolutional autoencoder (CAE) [32]. The role of the CAE is to conduct unsupervised learning since training the CAE does not need labels, which means that the whole dataset can be covered.
The structure of CAE is shown in Figure 2a, where the network consists of an encoder and a decoder. The arrows indicate the direction of the data stream.
The encoder, made of 4 blocks, is used to extract the compressed features from the input data. Each block contains a convolution layer (for extracting features), a batch norm layer (for speeding up training), and a leaky-ReLU layer (for applying a non-linear transform to the data stream). The decoder mirrors the structure of the encoder and is used to reconstruct the input data from the compressed features. The mean squared error (MSE) is the loss function selected to measure the difference between the input and its reconstruction.
It is worth noticing that, in this step, the whole dataset (both the labeled and unlabeled portions) is used to train the network, but only the purely acoustic signals are involved as described later in the paper.

3.2. Step-Two: Training the Encoder-MLP Localizer Based on the Semi-Supervised Learning Scheme

After training the CAE, the second step trains the Encoder-MLP for localization based on the semi-supervised learning scheme. The structure of this model is shown in Figure 2b; it consists of a pre-trained encoder, which extracts the compressed features from the input data, and a 4-layer MLP, which estimates the location of the acoustic source from the compressed features. The MLP consists of four blocks, each containing a dense layer followed by a dropout layer (for regularization) and a non-linear transform function. The sigmoid function is an appropriate choice for the non-linear transform since, during the data preprocessing stage, the regression target, i.e., the horizontal distance between source and receiver, is scaled into the interval $(0, 1)$.
As in Figure 2a, the arrows in Figure 2b indicate the direction of the data stream. Since the encoder has already been trained, its parameters are frozen during the training stage of the second step. After the encoder, the compressed features are fed into the MLP, which provides the estimated source location as output. Finally, the same loss function, i.e., MSE, is used since the localization task is a regression problem.

4. Dataset and Preprocessing

In this section, SWellEx-96 Event S5 is introduced, and the preprocessing method and the schemes for building the training and test datasets are described. Finally, to illustrate the performance of our framework and the proposed FS method, the control groups are created.

4.1. SWellEx-96 Event S5

Vertical linear array (VLA) data from SWellEx-96 Event S5 are used to illustrate the localization performance of our framework. The event was conducted near San Diego, CA, where the acoustic source started its track south of all arrays and proceeded northward at a speed of 5 knots (2.5 m/s). The source had two sub-sources: a shallow one at a depth of 9 m and a deep one at 54 m. The sampling rate of the data was 1500 Hz and the recording time was 75 min. The VLA contained 21 receivers equally spaced between 94.125 m and 212.25 m. The water depth was 216.5 m. The horizontal distance between the source and the VLA is also provided in the dataset. More detailed information about this event can be found in Reference [29].

4.2. Preprocessing and FS

In this paper, the underwater acoustic signals collected by a single receiver are transformed into the frequency domain. We calculate the spectrum of each 1 s slice of the signal, without overlap, and arrange the spectra in a matrix $X$ (the features) of shape $4500 \times 750$, where 4500 is the total number of time-steps (75 min = 4500 s) and 750 is the number of frequencies. Each row of the matrix corresponds to one time-step, and each column corresponds to one frequency.
Besides the acoustic signals, the horizontal distance between the source and the VLA was provided in the original dataset, which can be expressed as a vector y (namely, labels) with the shape of 4500 × 1 , where 4500 indicates the total number of time-steps, and 1 indicates the distance at each time-step.
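The construction of the feature matrix can be sketched as below. Several details are assumptions made for illustration, since the text does not specify them: the magnitude of the real FFT is used as the "spectrum", and the DC bin is dropped so that the 751 bins produced by a 1500-sample slice reduce to the stated 750 columns (the authors may have dropped a different bin). The function name `spectral_features` is hypothetical.

```python
import numpy as np

FS_HZ = 1500      # sampling rate stated in the text
SLICE_S = 1       # slice length in seconds, no overlap

def spectral_features(signal, fs=FS_HZ, slice_s=SLICE_S):
    """Return a (n_slices x 750) matrix of magnitude spectra, one row per 1 s slice."""
    n = fs * slice_s                                   # samples per slice
    n_slices = len(signal) // n
    frames = np.reshape(signal[: n_slices * n], (n_slices, n))
    spectrum = np.abs(np.fft.rfft(frames, axis=1))     # n // 2 + 1 = 751 bins (0..750 Hz)
    return spectrum[:, 1:]                             # assumption: drop the DC bin

# Synthetic 3 s test tone at 100 Hz (illustration only).
t = np.arange(3 * FS_HZ) / FS_HZ
tone = np.sin(2 * np.pi * 100.0 * t)
features = spectral_features(tone)
peak_columns = features.argmax(axis=1)
```

With 1 s slices the bin spacing is 1 Hz, so under the DC-dropping assumption column `j` corresponds to `j + 1` Hz and the 100 Hz tone peaks in column 99 of every row.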
For the training stability of our framework, the features $X$ and labels $y$ are scaled into the interval $(0, 1)$ by the min-max scaling method:
$$X = \frac{X - X_{\min}}{X_{\max} - X_{\min}}, \qquad y = \frac{y - y_{\min}}{y_{\max} - y_{\min}}. \tag{7}$$
After preprocessing, the FS is conducted following the steps in Section 2 based on X and y . Note that the systems with/without min-max scaling before FS have been compared showing that pre-scaling improves the performance.
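A minimal sketch of the min-max scaling follows. It assumes that the minimum and maximum are taken over the whole array (a single global pair of values for $X$, and another for $y$); the text does not say whether the scaling of $X$ is global or per column, so this is one possible reading. The function name and the example distances are hypothetical.

```python
import numpy as np

def min_max_scale(a):
    """Scale an array to the unit interval using its global minimum and maximum."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(), a.max()
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant array")
    return (a - lo) / (hi - lo)

# Hypothetical source-receiver distances in metres (illustration only).
y_demo = np.array([1000.0, 2000.0, 5000.0])
y_scaled = min_max_scale(y_demo)
```

Note that the smallest and largest entries map exactly to 0 and 1, so the scaled values lie in the closed interval between them.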

4.3. Schemes for Building the Dataset

For step-one, the dataset for the CAE is expressed as:
$$[\bar{X}] = [\bar{x}_i]_{i=1}^{N}, \tag{8}$$
where $\bar{X}$ is the feature matrix, $\bar{x}_i$ is a row vector of length $M$ (the number of selected features) corresponding to the $i$th row of $\bar{X}$, and $N$ is the number of time-steps.
For step-two, the dataset for the Encoder-MLP localizer is expressed as:
$$[\bar{X}, y] = [\bar{x}_i, y_i]_{i=1}^{N}, \tag{9}$$
where $\bar{X}$ and $y$ are the features and labels in matrix form, and $y_i$ is the $i$th element of the label vector $y$.

4.4. Schemes of Separating Training and Test Datasets

To illustrate the performance of the semi-supervised framework as the amount of labeled data decreases, 50%, 25%, and 12.5% of the whole labeled dataset are chosen, respectively, as the training dataset of step-two.
Since source localization is a regression task, the labels in the training dataset of step-two should cover the whole interval of the horizontal distance between the source and the receiver. As described above, the total number of time-steps is 4500, indexed by $i \in \{1, 2, \ldots, 4500\}$. The schemes for separating the training and test datasets for step-two are:

4.4.1. Using 50% Data to Build Training Dataset

$$\text{Training dataset}: (\bar{x}_i, y_i)_{i:\, \mathrm{mod}(i, 2) = 1}, \qquad \text{Test dataset}: (\bar{x}_i, y_i)_{i:\, \mathrm{mod}(i, 2) \neq 1}. \tag{10}$$

4.4.2. Using 25% Data to Build Training Dataset

$$\text{Training dataset}: (\bar{x}_i, y_i)_{i:\, \mathrm{mod}(i, 4) = 1}, \qquad \text{Test dataset}: (\bar{x}_i, y_i)_{i:\, \mathrm{mod}(i, 4) \neq 1}. \tag{11}$$

4.4.3. Using 12.5% Data to Build Training Dataset

$$\text{Training dataset}: (\bar{x}_i, y_i)_{i:\, \mathrm{mod}(i, 8) = 1}, \qquad \text{Test dataset}: (\bar{x}_i, y_i)_{i:\, \mathrm{mod}(i, 8) \neq 1}. \tag{12}$$
To show the influence of different depths, receivers no. 1 (top), no. 10 (middle), and no. 21 (bottom) are each used to build the dataset. For each receiver, there are 3 choices of the percentage used to build the training dataset. In total, $3 \times 3 = 9$ training datasets are built.
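The three separation schemes differ only in the modulus, so they can be sketched with a single helper. The function name `split_indices` is an assumption introduced here; the 1-based indexing and the rule $\mathrm{mod}(i, k) = 1$ follow the text. Taking every $k$th time-step keeps the training labels spread over the whole range of source-receiver distances.

```python
import numpy as np

def split_indices(n_steps, k):
    """Return 1-based training and test indices for the rule mod(i, k) == 1."""
    i = np.arange(1, n_steps + 1)        # time-step index i = 1, ..., n_steps
    train = i[i % k == 1]
    test = i[i % k != 1]
    return train, test

N_STEPS = 4500                           # number of 1 s time-steps in the text
sizes = {}
for k in (2, 4, 8):                      # 50%, 25%, and 12.5% schemes
    train_idx, test_idx = split_indices(N_STEPS, k)
    sizes[k] = (len(train_idx), len(test_idx))
```

For 4500 time-steps this gives 2250, 1125, and 563 training samples, respectively; the last is slightly above 12.5% because 4500 is not a multiple of 8. To index a 0-based NumPy array, subtract 1 from the returned indices.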

4.5. Control Group

4.5.1. Control Group for the Semi-Supervised Framework

To make a fair comparison, we trained a neural network with the same structure as the framework by the purely supervised learning scheme.

4.5.2. Control Group for the FS Method

To show the performance of our FS method, a framework without FS is trained in the same way. The matrix $X$ of size $4500 \times 750$ containing all the features is calculated from Equation (7) and used to build the dataset following the schemes described in Section 4.3 and Section 4.4.

5. Performance of Source Localization

In this section, the hyperparameters of the two-step framework are introduced. After that, several experiments are conducted to examine the performance of source localization. Finally, a comprehensive comparison of the localization performance is shown, which demonstrates the benefit of our approach.

5.1. Hyperparameters of the Framework

In Figure 2, the output channel (n_Conv1D) and the kernel size (k_Conv1D) of the 1D-convolutional layer, as well as the input channel (n_Dense1) of the first dense layer, are not fixed. This is because the size of input features varies between datasets collected by different receivers.
For the training dataset without FS,
  • n_Conv1D = 738;
  • k_Conv1D = 13;
  • n_Dense1 = 24,843.
For the training dataset with FS,
  • n_Conv1D = 114;
  • k_Conv1D = $M - 113$;
  • n_Dense1 = 507.
After FS, the number of selected features M is shown in Table 1.
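The listed values are consistent with the output length of an unpadded, stride-1 one-dimensional convolution, which is input length minus kernel size plus one: $750 - 13 + 1 = 738$ without FS, and $M - (M - 113) + 1 = 114$ with FS. The sketch below checks this arithmetic. It is only one reading of the numbers (the text itself calls n_Conv1D an "output channel"), and the value `M_EXAMPLE = 150` is hypothetical because Table 1 is not reproduced here.

```python
def valid_conv_output_length(input_length, kernel_size):
    """Output length of a 1D convolution with stride 1 and no padding."""
    return input_length - kernel_size + 1

def kernel_size_for_target(input_length, target_length=114):
    """Kernel size that maps input_length to target_length (stride 1, no padding)."""
    return input_length - target_length + 1      # equals M - 113 for target 114

# Without FS: 750 input frequencies and kernel size 13.
n_without_fs = valid_conv_output_length(750, 13)

# With FS: hypothetical number of selected features (Table 1 gives the real M).
M_EXAMPLE = 150
k_with_fs = kernel_size_for_target(M_EXAMPLE)
n_with_fs = valid_conv_output_length(M_EXAMPLE, k_with_fs)
```

Choosing the kernel size as $M - 113$ therefore fixes the size seen by the later layers at 114 regardless of how many features each receiver's FS step selects.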
To train the framework, the learning rates for step-one and step-two are $1 \times 10^{-4}$ and $5 \times 10^{-5}$, respectively. The optimization scheme is Adam. The number of epochs and the batch size are 100 and 5, respectively, for each step.
All the networks mentioned in this paper are trained using one NVIDIA RTX 2080Ti GPU card.

5.2. Examining the Performance When Removing Some 2D-Convolutional Layers of the Framework after FS

The function of the encoder is to compress the original dataset and create its compressed expression, which is similar to our manual FS method. To find the best structure for the framework using FS, the number of 2D-convolutional layers of the encoder, and the corresponding number of transposed 2D-convolutional layers of the decoder are gradually decreased. After re-training the modified CAE, step-two is conducted as before. The structures of CAE after removing one and two 2D-convolutional layers are shown in Table 2 and Table 3, respectively. The performance will be discussed in Section 5.3.

5.3. Overall Analysis of the Localization Performance

To make a comprehensive comparison, 4 pairs of networks are trained separately on the data collected by receivers no. 1, no. 10, and no. 21, and tested on the data collected by all receivers. One pair is trained without FS; the other three are trained with the FS method proposed in this paper. Of these three pairs, one has the same number of layers as the networks trained without FS, and the other two have the structures shown in Table 2 and Table 3, respectively. Each pair consists of the framework trained by the semi-supervised learning scheme and the same step-two network trained by the purely supervised learning scheme.

5.3.1. Comparison between the Framework and the Purely Supervised Learning Scheme after FS

The performance of our framework and of the purely supervised learning scheme, after FS and tested on all receivers, is shown in Table 4. In the table, the first row indicates the percentage of the data used to build the training dataset. In the first column, R1 to R21 indicate receivers no. 1 to no. 21, respectively, and the mean indicates the average MSE over all receivers. The bold numbers indicate the lower MSE within each pair of our framework and the purely supervised learning scheme, i.e., the model with the better source localization performance.
From Table 4, several observations can be made:
  • Performance of the purely supervised learning scheme:
    The network trained by the supervised learning scheme attains the lower MSE only when the test data come from a receiver near the one used to build the training dataset. When the test data come from a receiver far from the one used to train, its performance degrades dramatically. This trend is more obvious when the percentage of data used to train decreases. This shows the limitation of the purely supervised learning scheme: when the labeled training dataset is limited, the generalization ability of the model is poor.
  • Performance of our framework:
    Compared to the purely supervised learning scheme, our framework is more robust and has much lower MSE on the data collected by those receivers which are far from the receiver used to build the training dataset, even though its performance on the data collected by the receivers near the receiver used to train is a bit poorer. This trend is more obvious when the percentage of data used to train decreases.
  • Comparison of the different percentages used to train:
    When the percentage of the data used to build the training dataset decreases, the performance of both schemes becomes worse. However, the degree of performance degradation of our framework is smaller than that of the purely supervised learning scheme.

5.3.2. Comparison of the Mean MSE and the Training Time between the Networks with and without FS

The performance between the networks with and without FS is shown in Table 5. In the table, the residual illustrates the difference of the mean MSE between networks, and the percentage of the residual illustrates the performance improvement (positive value) and degradation (negative value) by FS. They are calculated by
$$\text{Residual} = \text{MSE}_{\text{Without FS}} - \text{MSE}_{\text{With FS}}, \qquad \text{Percentage of Residual} = \frac{\text{Residual}}{\text{MSE}_{\text{Without FS}}}. \tag{13}$$
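The two quantities can be computed as in the short sketch below. The MSE values are hypothetical and serve only to show the sign convention (they are not taken from Table 5); the function name `residual_stats` is likewise an assumption.

```python
def residual_stats(mse_without_fs, mse_with_fs):
    """Return (residual, percentage of residual) for a pair of mean MSE values."""
    residual = mse_without_fs - mse_with_fs
    percentage = 100.0 * residual / mse_without_fs
    return residual, percentage

# Hypothetical values: FS lowers the MSE, so the percentage is positive.
res_gain, pct_gain = residual_stats(0.020, 0.015)

# Hypothetical values: FS raises the MSE, so the percentage is negative.
res_loss, pct_loss = residual_stats(0.020, 0.025)
```

A positive percentage means the network with FS has the lower mean MSE (an improvement), and a negative percentage means FS degraded the performance.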
From Table 5, the following observations can be made:
  • When the percentage of data used to train is 50% and 25%, respectively, the performance of the framework trained on R1 and R10 has some degradation (12.82% to 17.65%). However, when the percentage of data used to train is 12.5%, the performance of the framework trained on R1 and R10 has a slight improvement (7.69%) and degradation (4%), respectively.
  • Trained on R21, the framework’s performance has significant improvements, which are 17.24%, 31.82%, and 44.71% when the percentage of data used to train is 50%, 25%, and 12.5%, respectively.
  • Compared to the framework, the performance of the purely supervised learning scheme gains more improvement (14.04% to 50.47%) after FS. The performance degradation only happens when it is trained on 50% R1, 50% R10, and 25% R10.
The training time of the networks is shown in Table 6. The table shows that the training time is reduced significantly after FS for both the framework and the purely supervised learning scheme.

5.3.3. The Best Structure for the Framework after the FS

As mentioned in Section 5.2, the comparison of the mean MSE and the training time between different structures of networks is shown in Table 7 and Table 8, respectively.
From the tables, the following observations can be made:
  • For the framework, Structure 1 attains the lowest MSE except when trained on 50% R1 and 12.5% R1.
  • For the purely supervised learning scheme, Structure 2 attains the lowest MSE with a slight improvement compared to Structure 1 when the percentages of data used to train are 25% and 12.5%.
  • Structure 1 shows the best performance for training time reduction. Considering both MSE and training time, the best structure after the FS is Structure 1.

5.3.4. Conclusions of the Performance Analysis

In Figure 3, the conclusion of the performance analysis is illustrated. The legend used in all the sub-figures is the same. The blue bar is related to the network without FS. The orange, yellow, and gray bars are related to the ’Original structure’, ’Structure 1’, and ’Structure 2’ in Table 7 and Table 8, respectively.
From Figure 3, the following observations can be made:
  • As the amount of labeled data decreases, the advantage of the framework with the semi-supervised learning scheme becomes apparent.
  • The FS method is beneficial for both the framework and the purely supervised learning scheme: it significantly decreases the training time with only a slight loss of localization performance.
  • After FS, the difference in performance between receiver depths is not significant, which means that FS increases the robustness to the choice of receiver depth.
To have an intuitive view of the performance, Figure 4 shows the localization result of our framework trained on 50% R1 after FS.

6. Discussion of FS

In this section, the discussion of the selected features is given, which demonstrates that the most significant portion of the original features for source localization has been selected by performing the FS. The training dataset using 50% data collected by receiver no. 21 is used in this section for illustration.

6.1. Details of the Sources in SWellEx-96 Event S5

According to the details on the website of SWellEx-96 Event S5 [29], the deep source (J-15) transmitted 5 sets of 13 tones between 49 Hz and 400 Hz. The first set of tones was projected at maximum transmitted levels of 158 dB. The second set of tones was projected with levels of 132 dB. The subsequent sets (3rd, 4th, and 5th) were each projected 4 dB down from the previous set. The shallow source transmitted only one set containing 9 tones between 109 Hz and 385 Hz. According to Du et al. [33], 500–700 Hz is related to the noise radiated by the ship towing the sources in the experiment, which is also an important contribution for source localization.

6.2. Interpretation of the Selected Features

After the FS described in Section 2.3, the matrix $\bar{X}$ of size $N \times M$ containing the selected features is created. To interpret the selected features, another PCA is conducted on this matrix. To investigate the correlation structure between the features and the PCs, the correlation loadings are calculated based on the method proposed by Westad et al. [26].
In the correlation loading plot shown in Figure 5, the abscissa is PC 1 and the ordinate is PC 3. There are 2 circles in the plot; the inner and outer ones indicate 50% and 100% explained variance, respectively. The points between the two circles are the significant features, for which these two PCs explain at least 50% of the variance. The colors in the legend distinguish the different sets of tones and the ship noise.
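A hedged sketch of how such a plot can be computed is given below. It uses a common definition of correlation loadings, namely the Pearson correlation between each variable and each PC score vector, and assumes this matches the method of [26]. Under this definition, the squared loadings of a variable on two PCs sum to the fraction of its variance explained by those PCs, so the 50% circle has radius $\sqrt{0.5}$ and the 100% circle has radius 1. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def correlation_loadings(X, n_components):
    """Correlation between each column of X and each of the first PC scores (K x A)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]        # scores (zero-mean columns)
    x_norm = np.linalg.norm(Xc, axis=0)               # per-variable norm
    t_norm = np.linalg.norm(T, axis=0)                # per-component norm
    return (Xc.T @ T) / np.outer(x_norm, t_norm)

def between_circles(R, pc_a, pc_b):
    """True where the two chosen PCs explain at least 50% of a variable's variance."""
    explained = R[:, pc_a] ** 2 + R[:, pc_b] ** 2
    return explained >= 0.5

# Synthetic data, for illustration only.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 6))
R_full = correlation_loadings(X_demo, n_components=6)
significant = between_circles(R_full, pc_a=0, pc_b=2)   # PC 1 versus PC 3
```

When all components are kept, the squared correlation loadings of each variable sum to 1, which is why no point can fall outside the outer circle.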
From the correlation loading plot in Figure 5, phenomena with physical meanings can be found:
1. Along PC 1, the frequencies related to the high transmitted signal level are in the positive half-axis. The frequencies related to lower energy levels are in the negative half-axis. For more details:
  • 7 frequencies (127, 145, 198, 232, 280, 335, 385 Hz) of the shallow source and 3 frequencies (238, 338, 388 Hz) of the deep source are in the area between two circles, which means that they are significant features.
  • The remaining frequencies of the shallow source and those of the highest transmitted level of the deep source (as well as the 391 Hz tone of the second transmitted level) are also close to the boundary of the inner circle, which means that they still have some importance for the data.
  • Except for the frequencies of the highest transmitted level and the 391 Hz tone of the second transmitted level, the remaining frequencies are closer to the origin, which means that they are less significant from the statistical perspective.
2. Along PC 3, the frequencies related to the shallow source are in the negative half-axis (except for the tone with 391 Hz). Furthermore, the frequencies related to the ship noise are also in the negative half-axis, since the ship can be treated as a shallow noise source. The frequencies related to the deep source are in the positive half-axis.
More specifically, the numbers of selected features among the different subsets of tones and ship noise are:
  • Deep 1st: 11 (13 in total);
  • Deep 2nd: 7 (13 in total);
  • Deep 3rd: 3 (13 in total);
  • Deep 4th: 5 (13 in total);
  • Deep 5th: 3 (13 in total);
  • Shallow: 9 (9 in total);
  • Ship noise (500–700 Hz): 15.
According to the discussion above, the frequencies related to the shallow source, deep source, and the ship noise are selected by applying the FS, which are the most important features for source localization. The FS process does not need any prior information.

6.3. Different Roles of the FS and the Autoencoder

Autoencoders are often used as feature extractors; thus, the FS stage considered here might seem redundant. However, the PCR adapted for FS is a linear method that selects the subset of variables most important for the regression target (i.e., source localization in this paper). As a result, the non-linear processing of the autoencoder becomes easier to train (i.e., the training time is significantly reduced) while approximately the same performance is kept.
After the FS stage, the most important subset of the original features is obtained.
More specifically, the roles of the FS and the autoencoder in our framework are:
  • The FS: Selecting the most important subset of the original features, which reduces the training time of our framework and provides a good starting point for it.
  • The autoencoder: Conducting unsupervised learning to cover all the information in the dataset.
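To make this division of roles concrete, the sketch below shows one way a linear PCR step can rank the original variables before any network is trained. It projects the centred data onto the first few PCs, regresses the target (e.g., the source range) on the scores, and maps the coefficients back to the original variables. Ranking by absolute coefficient and keeping a fixed number of variables is a simple stand-in chosen for illustration; it is not the significance criterion used in this paper, and all function and parameter names are hypothetical. The variables are assumed to be on comparable scales (e.g., standardized) so that coefficient magnitudes can be compared.

```python
import numpy as np


def pcr_coefficients(X, y, n_components):
    """Principal component regression coefficients for the original variables.

    X : (n_samples, n_features) feature matrix (hypothetical input).
    y : (n_samples,) regression target, e.g., source range.
    n_components is assumed not to exceed the rank of the centred X.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    scores = U[:, :k] * s[:k]                 # PC scores T
    q = (scores.T @ yc) / (s[:k] ** 2)        # regress y on T (T'T is diagonal)
    return Vt[:k].T @ q                       # back to the original variables


def select_features(X, y, n_components, keep):
    """Indices of the `keep` variables with the largest |coefficient|.

    Assumption: a plain top-k rule replaces the paper's selection criterion.
    """
    beta = pcr_coefficients(X, y, n_components)
    order = np.argsort(-np.abs(beta))
    return np.sort(order[:keep])
```

The selected column indices would then be used to slice the input before it is passed to the autoencoder, so that the non-linear stage only sees the reduced feature set.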

7. Conclusions

In this paper, we utilize a two-step semi-supervised framework for source localization to cope with the limited amount of labeled data available in many real scenarios. To accelerate the training stage of the framework for real-time operation, an FS method based on PCR is proposed.
The performance of our FS method and of the two-step framework has been demonstrated on a public dataset, SWellEx-96 Event S5. The results show that the framework is more robust than purely supervised learning on unseen data, especially as the amount of labeled training data decreases. After FS, the training time is significantly reduced (by an average of 95%). The localization performance degrades slightly when 50% and 25% of the data are used for training. However, when only 12.5% of the data are used for training, a condition closer to real scenarios, the FS method improves the performance of both semi-supervised learning and purely supervised learning.
It should be noted that the network structure used in this paper serves only as a demonstration of the framework's performance. More complex and powerful networks can be used within this framework, and we expect the source localization performance to improve as long as the network is trained appropriately.

Author Contributions

Formal analysis, X.Z., H.D., P.S.R. and M.L.; Funding acquisition, H.D. and M.L.; Methodology, X.Z., H.D. and P.S.R.; Resources, X.Z., H.D. and P.S.R.; Software, X.Z. and H.D.; Writing—original draft, X.Z.; Writing—review & editing, H.D., P.S.R. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: http://swellex96.ucsd.edu/index.htm (accessed on 31 March 2021).

Acknowledgments

The authors would like to acknowledge the Norwegian Research Council and the industry partners of the GAMES consortium at NTNU for financial support (Grant No. 294404). Xiaoyu Zhu would like to acknowledge the China Scholarship Council (CSC) for the fellowship support (No. 201903170205).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PCA: Principal component analysis
PCR: Principal component regression
MLP: Multi-layer perceptron
MFP: Matched field processing
AI: Artificial intelligence
ML: Machine learning
SVD: Singular value decomposition
MLR: Multiple linear regression
PC: Principal component
CAE: Convolutional autoencoder
MSE: Mean squared error
VLA: Vertical linear array
FS: Feature selection

References

  1. Baggeroer, A.B.; Kuperman, W.; Schmidt, H. Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem. J. Acoust. Soc. Am. 1988, 83, 571–587. [Google Scholar] [CrossRef]
  2. Bogart, C.W.; Yang, T. Comparative performance of matched-mode and matched-field localization in a range-dependent environment. J. Acoust. Soc. Am. 1992, 92, 2051–2068. [Google Scholar] [CrossRef]
  3. Baggeroer, A.B.; Kuperman, W.A.; Mikhalevsky, P.N. An overview of matched field methods in ocean acoustics. IEEE J. Ocean. Eng. 1993, 18, 401–424. [Google Scholar] [CrossRef]
  4. Mantzel, W.; Romberg, J.; Sabra, K. Compressive matched-field processing. J. Acoust. Soc. Am. 2012, 132, 90–102. [Google Scholar] [CrossRef] [Green Version]
  5. Yang, T. Data-based matched-mode source localization for a moving source. J. Acoust. Soc. Am. 2014, 135, 1218–1230. [Google Scholar] [CrossRef]
  6. Goodfellow, I.; Bengio, Y.; Courville, A.; Bengio, Y. Deep Learning; MIT Press Cambridge: Cambridge, MA, USA, 2016; Volume 1. [Google Scholar]
  7. Chen, R.; Zhang, W.; Wang, X. Machine Learning in Tropical Cyclone Forecast Modeling: A Review. Atmosphere 2020, 11, 676. [Google Scholar] [CrossRef]
  8. Ghahramani, Z. Unsupervised learning. In Summer School on Machine Learning; Springer: New York, NY, USA, 2003; pp. 72–112. [Google Scholar]
  9. Lefort, R.; Real, G.; Drémeau, A. Direct regressions for underwater acoustic source localization in fluctuating oceans. Appl. Acoust. 2017, 116, 303–310. [Google Scholar] [CrossRef]
  10. Niu, H.; Reeves, E.; Gerstoft, P. Source localization in an ocean waveguide using supervised machine learning. J. Acoust. Soc. Am. 2017, 142, 1176–1188. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Niu, H.; Ozanich, E.; Gerstoft, P. Ship localization in Santa Barbara Channel using machine learning classifiers. J. Acoust. Soc. Am. 2017, 142, EL455–EL460. [Google Scholar] [CrossRef] [PubMed]
  12. Wang, Y.; Peng, H. Underwater acoustic source localization using generalized regression neural network. J. Acoust. Soc. Am. 2018, 143, 2321–2331. [Google Scholar] [CrossRef] [PubMed]
  13. Huang, Z.; Xu, J.; Gong, Z.; Wang, H.; Yan, Y. Source localization using deep neural networks in a shallow water environment. J. Acoust. Soc. Am. 2018, 143, 2922–2932. [Google Scholar] [CrossRef]
  14. Liu, Y.N.; Niu, H.Q.; Li, Z.L. Source ranging using ensemble convolutional networks in the direct zone of deep water. Chin. Phys. Lett. 2019, 36, 044302. [Google Scholar] [CrossRef]
  15. Niu, H.; Gong, Z.; Ozanich, E.; Gerstoft, P.; Wang, H.; Li, Z. Deep-learning source localization using multi-frequency magnitude-only data. J. Acoust. Soc. Am. 2019, 146, 211–222. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Wang, W.; Ni, H.; Su, L.; Hu, T.; Ren, Q.; Gerstoft, P.; Ma, L. Deep transfer learning for source ranging: Deep-sea experiment results. J. Acoust. Soc. Am. 2019, 146, EL317–EL322. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Lin, Y.; Zhu, M.; Wu, Y.; Zhang, W. Passive Source Ranging Using Residual Neural Network With One Hydrophone in Shallow Water. In Proceedings of the 2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 12–15 September 2020; pp. 122–125. [Google Scholar]
  18. Zhai, X.; Oliver, A.; Kolesnikov, A.; Beyer, L. S4l: Self-supervised semi-supervised learning. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 1476–1485. [Google Scholar]
  19. Bianco, M.J.; Gannot, S.; Gerstoft, P. Semi-supervised source localization with deep generative modeling. arXiv 2020, arXiv:2005.13163. [Google Scholar]
  20. Hu, Y.; Samarasinghe, P.N.; Abhayapala, T.D.; Gannot, S. Unsupervised Multiple Source Localization Using Relative Harmonic Coefficients. In Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 571–575. [Google Scholar]
  21. Zeng, X.; Wang, Q.; Zhang, C.; Cai, H. Feature selection based on ReliefF and PCA for underwater sound classification. In Proceedings of the 2013 3rd International Conference on Computer Science and Network Technology, Dalian, China, 12–13 October 2013; pp. 442–445. [Google Scholar]
  22. Ouelha, S.; Mesquida, J.R.; Chaillan, F.; Courmontagne, P. Extension of maximal marginal diversity based feature selection applied to underwater acoustic data. In Proceedings of the 2013 OCEANS-San Diego, San Diego, CA, USA, 21–25 October 2013; pp. 1–5. [Google Scholar]
  23. Yang, H.; Gan, A.; Chen, H.; Pan, Y.; Tang, J.; Li, J. Underwater acoustic target recognition using SVM ensemble via weighted sample and feature selection. In Proceedings of the 2016 13th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 12–16 January 2016; pp. 522–527. [Google Scholar]
  24. Erkmen, B.; Yıldırım, T. Improving classification performance of sonar targets by applying general regression neural network with PCA. Expert Syst. Appl. 2008, 35, 472–475. [Google Scholar] [CrossRef]
  25. Jackson, J.E. A User’s Guide to Principal Components; John Wiley & Sons: New York, NY, USA, 2005; Volume 587. [Google Scholar]
  26. Westad, F.; Hersletha, M.; Lea, P.; Martens, H. Variable selection in PCA in sensory descriptive and consumer data. Food Qual. Prefer. 2003, 14, 463–472. [Google Scholar] [CrossRef]
  27. CAMO ASA Norway. The Unscrambler User Manual; CAMO ASA Norway: Oslo, Norway, 1998. [Google Scholar]
  28. Esbensen, K.H.; Guyot, D.; Westad, F.; Houmoller, L.P. Multivariate Data Analysis: In Practice: An Introduction to Multivariate Data Analysis and Experimental Design; CAMO Process As: Oslo, Norway, 2002. [Google Scholar]
  29. Murray, J.; Ensberg, D. The Swellex-96 Experiment. 1996. Available online: http://http://swellex96.ucsd.edu/index.htm (accessed on 1 March 2021).
  30. Høy, M.; Westad, F.; Martens, H. Combining bilinear modelling and ridge regression. J. Chemom. A J. Chemom. Soc. 2002, 16, 313–318. [Google Scholar] [CrossRef]
  31. Hinton, G.E.; Zemel, R.S. Autoencoders, minimum description length and Helmholtz free energy. Adv. Neural Inf. Process. Syst. 1994, 6, 3–10. [Google Scholar]
  32. Chen, M.; Shi, X.; Zhang, Y.; Wu, D.; Guizani, M. Deep features learning for medical image analysis with convolutional autoencoder neural network. IEEE Trans. Big Data 2017. [Google Scholar] [CrossRef]
  33. Du, J.Y.; Liu, Z.W.; Lü, L.G. Range Localization of a Moving Source Based on Synthetic Aperture Beamforming Using a Single Hydrophone in Shallow Water. Appl. Sci. 2020, 10, 1005. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Workflow.
Figure 2. Design of our framework: (a) the convolutional autoencoder; (b) the Encoder-multi-layer perceptron (MLP) localizer.
Figure 3. Conclusion of the performance analysis: (a–c) are the comparison of the mean MSE between different configurations of the networks trained on 50%, 25%, and 12.5% labeled data, respectively; (d) the comparison of the training time in minutes.
Figure 4. Illustration of the localization result.
Figure 5. Correlation loading plot.
Table 1. Number of selected features among different receivers.

| Receiver | 50% | 25% | 12.5% |
|---|---|---|---|
| No. 1 | 121 | 122 | 125 |
| No. 10 | 129 | 137 | 134 |
| No. 21 | 126 | 126 | 127 |

Table 2. Structure 1: Removing one 2D-convolutional layer of the encoder.

| Block | Layer | Output Channel | Kernel Size | Stride |
|---|---|---|---|---|
| Encoder | Conv1D | 114 | M − 113 | 1 |
| Encoder | Conv2D_1 | 128 | 3 × 3 | 2 |
| Encoder | Conv2D_2 | 3 | 3 × 3 | 2 |
| Decoder | Transposed-Conv2D_1 | 128 | 4 × 4 | 2 |
| Decoder | Transposed-Conv2D_2 | 1 | 4 × 4 | 2 |
| Decoder | Transposed-Conv1D | 1 | M − 113 | 1 |

Table 3. Structure 2: Removing two 2D-convolutional layers of the encoder.

| Block | Layer | Output Channel | Kernel Size | Stride |
|---|---|---|---|---|
| Encoder | Conv1D | 114 | M − 113 | 1 |
| Encoder | Conv2D_1 | 3 | 3 × 3 | 2 |
| Decoder | Transposed-Conv2D_1 | 1 | 4 × 4 | 2 |
| Decoder | Transposed-Conv1D | 1 | M − 113 | 1 |

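Table 4. The mean squared error (MSE) of models with feature selection (FS) trained on receiver no. 1.

| Receiver | 50% Framework | 50% Supervised | 25% Framework | 25% Supervised | 12.5% Framework | 12.5% Supervised |
|---|---|---|---|---|---|---|
| R1 | 0.22 | 0.22 | 0.31 | 0.31 | 0.40 | 0.44 |
| R2 | 0.31 | 0.33 | 0.36 | 0.39 | 0.48 | 0.51 |
| R3 | 0.34 | 0.37 | 0.39 | 0.42 | 0.44 | 0.50 |
| R4 | 0.35 | 0.39 | 0.43 | 0.44 | 0.45 | 0.55 |
| R5 | 0.41 | 0.44 | 0.47 | 0.44 | 0.47 | 0.58 |
| R6 | 0.39 | 0.42 | 0.40 | 0.42 | 0.42 | 0.54 |
| R7 | 0.44 | 0.47 | 0.46 | 0.46 | 0.50 | 0.58 |
| R8 | 0.38 | 0.40 | 0.38 | 0.43 | 0.42 | 0.48 |
| R9 | 0.36 | 0.42 | 0.40 | 0.41 | 0.40 | 0.54 |
| R10 | 0.39 | 0.43 | 0.46 | 0.45 | 0.48 | 0.59 |
| R11 | 0.40 | 0.50 | 0.49 | 0.52 | 0.49 | 0.64 |
| R12 | 0.39 | 0.40 | 0.43 | 0.46 | 0.46 | 0.54 |
| R13 | 0.37 | 0.45 | 0.44 | 0.48 | 0.50 | 0.67 |
| R14 | 0.40 | 0.49 | 0.49 | 0.49 | 0.49 | 0.63 |
| R15 | 0.40 | 0.41 | 0.40 | 0.43 | 0.47 | 0.48 |
| R16 | 0.47 | 0.49 | 0.48 | 0.50 | 0.56 | 0.59 |
| R17 | 0.56 | 0.50 | 0.51 | 0.57 | 0.62 | 0.61 |
| R18 | 0.43 | 0.47 | 0.45 | 0.51 | 0.51 | 0.57 |
| R19 | 0.41 | 0.51 | 0.50 | 0.52 | 0.48 | 0.63 |
| R20 | 0.43 | 0.51 | 0.51 | 0.53 | 0.53 | 0.67 |
| R21 | 0.45 | 0.53 | 0.53 | 0.59 | 0.58 | 0.80 |
| Mean | 0.40 | 0.44 | 0.44 | 0.47 | 0.48 | 0.58 |
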
Table 5. Comparison of the mean MSE for networks with and without FS.

| Labeled data | Setting | R1 Framework | R1 Supervised | R10 Framework | R10 Supervised | R21 Framework | R21 Supervised |
|---|---|---|---|---|---|---|---|
| 50% | Without FS | 0.34 | 0.43 | 0.35 | 0.43 | 0.58 | 0.57 |
| 50% | With FS | 0.40 | 0.44 | 0.41 | 0.48 | 0.48 | 0.49 |
| 50% | Residual | −0.06 | −0.01 | −0.06 | −0.05 | 0.10 | 0.08 |
| 50% | Percentage of residual | −17.65% | −2.33% | −17.14% | −11.63% | 17.24% | 14.04% |
| 25% | Without FS | 0.39 | 0.57 | 0.40 | 0.52 | 0.66 | 0.79 |
| 25% | With FS | 0.44 | 0.47 | 0.46 | 0.55 | 0.45 | 0.53 |
| 25% | Residual | −0.05 | 0.10 | −0.06 | −0.03 | 0.21 | 0.26 |
| 25% | Percentage of residual | −12.82% | 17.54% | −15.00% | −5.77% | 31.82% | 32.91% |
| 12.5% | Without FS | 0.52 | 0.78 | 0.50 | 0.78 | 0.85 | 1.07 |
| 12.5% | With FS | 0.48 | 0.58 | 0.52 | 0.58 | 0.47 | 0.53 |
| 12.5% | Residual | 0.04 | 0.20 | −0.02 | 0.20 | 0.38 | 0.54 |
| 12.5% | Percentage of residual | 7.69% | 25.64% | −4.00% | 25.64% | 44.71% | 50.47% |

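The residual rows of Table 5 appear to follow a simple rule: the residual is the mean MSE without FS minus the mean MSE with FS, and the percentage is that residual relative to the value without FS, so a positive number means FS improved the result. The short sketch below reproduces this arithmetic from the rounded table entries; the function name is hypothetical and the rule is inferred from the tabulated values rather than stated by the authors.

```python
def residual_percentage(mse_without_fs, mse_with_fs):
    """Return (residual, percentage of residual) for one table cell.

    residual   = MSE without FS - MSE with FS (positive: FS helps)
    percentage = 100 * residual / MSE without FS
    """
    residual = mse_without_fs - mse_with_fs
    return residual, 100.0 * residual / mse_without_fs


# Framework trained on R1 with 50% labeled data: 0.34 without FS, 0.40 with FS.
residual, percent = residual_percentage(0.34, 0.40)
print(f"{residual:.2f}, {percent:.2f}%")
```

Since the inputs are the two-decimal values from the table, the last digit of a recomputed percentage can differ from a value computed on unrounded MSEs.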
Table 6. Comparison of the training time for networks with and without FS.

| Step | Setting | 50% Framework | 50% Supervised | 25% Framework | 25% Supervised | 12.5% Framework | 12.5% Supervised |
|---|---|---|---|---|---|---|---|
| Step-one | Without FS | 3 h 30 m 45 s | - | 3 h 30 m 45 s | - | 3 h 30 m 45 s | - |
| Step-one | With FS | 7 m 7 s | - | 6 m 59 s | - | 7 m 4 s | - |
| Step-one | Percentage of reduction | 96.62% | - | 96.68% | - | 96.65% | - |
| Step-two | Without FS | 1 h 3 m 46 s | 1 h 30 m 58 s | 59 m 33 s | 1 h 26 m 31 s | 54 m 12 s | 1 h 8 m 28 s |
| Step-two | With FS | 3 m 18 s | 4 m 31 s | 2 m 19 s | 2 m 57 s | 1 m 50 s | 2 m 9 s |
| Step-two | Percentage of reduction | 94.82% | 95.03% | 96.11% | 96.59% | 96.62% | 96.86% |

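The percentage of reduction in Table 6 can be recomputed from the listed durations as the relative decrease in training time. The sketch below parses durations written in the table's "h / m / s" notation and applies that formula; the helper names are hypothetical, and the formula is inferred from the tabulated values.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a string such as '3 h 30 m 45 s' to seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", duration)
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def reduction_percentage(time_without_fs, time_with_fs):
    """Relative decrease in training time, in percent."""
    before = to_seconds(time_without_fs)
    after = to_seconds(time_with_fs)
    return 100.0 * (before - after) / before


# Step one with 50% labeled data: 3 h 30 m 45 s without FS, 7 m 7 s with FS.
print(f"{reduction_percentage('3 h 30 m 45 s', '7 m 7 s'):.2f}%")
```

Recomputed values may differ from the table in the last digit, depending on whether the published figures were rounded or truncated.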
Table 7. Comparison of the mean MSE for different structures of the networks after FS.

| Labeled data | Structure | R1 Framework | R1 Supervised | R10 Framework | R10 Supervised | R21 Framework | R21 Supervised |
|---|---|---|---|---|---|---|---|
| 50% | Original structure | 0.40 | 0.44 | 0.41 | 0.48 | 0.48 | 0.49 |
| 50% | Structure 1 | 0.41 | 0.41 | 0.40 | 0.45 | 0.44 | 0.43 |
| 50% | Structure 2 | 0.37 | 0.42 | 0.41 | 0.42 | 0.44 | 0.45 |
| 25% | Original structure | 0.44 | 0.47 | 0.46 | 0.55 | 0.45 | 0.53 |
| 25% | Structure 1 | 0.42 | 0.47 | 0.43 | 0.54 | 0.44 | 0.48 |
| 25% | Structure 2 | 0.46 | 0.43 | 0.45 | 0.46 | 0.44 | 0.45 |
| 12.5% | Original structure | 0.48 | 0.58 | 0.52 | 0.58 | 0.47 | 0.53 |
| 12.5% | Structure 1 | 0.51 | 0.55 | 0.50 | 0.62 | 0.46 | 0.56 |
| 12.5% | Structure 2 | 0.52 | 0.49 | 0.51 | 0.53 | 0.46 | 0.48 |

Table 8. Comparison of the training time for different structures of the networks after FS.

| Step | Structure | 50% Framework | 50% Supervised | 25% Framework | 25% Supervised | 12.5% Framework | 12.5% Supervised |
|---|---|---|---|---|---|---|---|
| Step-one | Original structure | 7 m 7 s | - | 6 m 59 s | - | 7 m 4 s | - |
| Step-one | Structure 1 | 5 m 27 s | - | 5 m 27 s | - | 5 m 17 s | - |
| Step-one | Structure 2 | 3 m 42 s | - | 3 m 42 s | - | 3 m 44 s | - |
| Step-two | Original structure | 3 m 18 s | 4 m 31 s | 2 m 19 s | 2 m 57 s | 1 m 50 s | 2 m 9 s |
| Step-two | Structure 1 | 3 m 58 s | 4 m 38 s | 2 m 31 s | 2 m 51 s | 1 m 48 s | 1 m 59 s |
| Step-two | Structure 2 | 8 m 14 s | 8 m 44 s | 4 m 47 s | 5 m 1 s | 3 m 3 s | 3 m 12 s |
| Total | Original structure | 10 m 25 s | 4 m 31 s | 9 m 18 s | 2 m 57 s | 8 m 54 s | 2 m 9 s |
| Total | Structure 1 | 9 m 25 s | 4 m 38 s | 7 m 58 s | 2 m 51 s | 7 m 5 s | 1 m 59 s |
| Total | Structure 2 | 11 m 56 s | 8 m 44 s | 8 m 29 s | 5 m 1 s | 6 m 47 s | 3 m 12 s |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhu, X.; Dong, H.; Salvo Rossi, P.; Landrø, M. Feature Selection Based on Principal Component Regression for Underwater Source Localization by Deep Learning. Remote Sens. 2021, 13, 1486. https://doi.org/10.3390/rs13081486

AMA Style

Zhu X, Dong H, Salvo Rossi P, Landrø M. Feature Selection Based on Principal Component Regression for Underwater Source Localization by Deep Learning. Remote Sensing. 2021; 13(8):1486. https://doi.org/10.3390/rs13081486

Chicago/Turabian Style

Zhu, Xiaoyu, Hefeng Dong, Pierluigi Salvo Rossi, and Martin Landrø. 2021. "Feature Selection Based on Principal Component Regression for Underwater Source Localization by Deep Learning" Remote Sensing 13, no. 8: 1486. https://doi.org/10.3390/rs13081486
