Article

Modern Synergetic Neural Network for Synthetic Aperture Radar Target Recognition

Zihao Wang, Haifeng Li and Lin Ma
Faculty of Computing, Harbin Institute of Technology, No. 92, Xidazhi Street, Nangang District, Harbin 150001, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(5), 2820; https://doi.org/10.3390/s23052820
Submission received: 12 December 2022 / Revised: 2 March 2023 / Accepted: 3 March 2023 / Published: 4 March 2023

Abstract

Feature extraction is an important process for the automatic recognition of synthetic aperture radar targets, but the rising complexity of recognition networks means that the features are abstractly implied in the network parameters and the performance is difficult to attribute. We propose the modern synergetic neural network (MSNN), which transforms the feature extraction process into a prototype self-learning process through the deep fusion of an autoencoder (AE) and a synergetic neural network. We prove that nonlinear AEs (e.g., stacked and convolutional AEs) with ReLU activation functions reach the global minimum when their weights can be divided into tuples of M-P inverses. Therefore, MSNN can use the AE training process as a novel and effective nonlinear prototype self-learning module. In addition, MSNN improves learning efficiency and performance stability by making the codes spontaneously converge to one-hots with the dynamics of synergetics instead of loss function manipulation. Experiments on the MSTAR dataset show that MSNN achieves state-of-the-art recognition accuracy. The feature visualization results show that the excellent performance of MSNN stems from the prototype learning, which captures features that are not covered in the dataset. These representative prototypes ensure the accurate recognition of new samples.

1. Introduction

With the development of synthetic aperture radar (SAR) technology, the explosive growth of SAR images has presented new challenges for highly accurate target recognition. Compared with manual feature extraction methods [1,2,3,4], deep learning techniques represented by the autoencoder (AE) enable the automatic extraction of target features [5], achieve better performance with improved efficiency, and are therefore widely used in SAR image recognition tasks. With the development of deep learning, the base models evolved from stacked AEs [6,7,8] and sparse AEs [9,10,11] to convolutional AEs [12,13,14], and this evolution enhanced recognition accuracy. However, with the rising complexity of the network structure, the extracted features are increasingly and implicitly abstracted inside the network, making it difficult to determine whether the features are effectively extracted and to explain the excellent performance of the network.
During the application of AE to SAR, a fusion model of AE and the synergetic neural network (SNN) was proposed [15], in which AE-extracted features are transmitted to SNN to improve the robustness of classification. However, many outstanding SNN characteristics are not fully utilized. SNN handles non-binary data [16,17,18] with an excellent theoretical performance [19], and all the attractors of its dynamical system correspond to valid memories. These characteristics mean that the SNN's classification is actually an association between the features and the extracted prototypes [20,21,22]. SNN's simulation of associative memory achieves remarkable results in image retrieval [23], face recognition [24], and semantic role annotation [25,26]. Its synergetics-based dynamics [27] can improve the recognition performance of the network and optimize the pattern extraction process by constructing an attractor–prototype correspondence. Therefore, significant gains in recognition performance can be expected from SNN-optimized AE feature extraction.
In this paper, we investigate the fusion method of AE and SNN and propose the modern SNN (MSNN) model. More than a decade before the formal introduction of AE, researchers showed that the weight matrices of the encoder and the decoder of a single-layer linear AE are mutually M-P inverse if the loss function reaches a stationary point, which includes one global minimum and multiple saddle points [28]; this is precisely in line with the requirements of the SNN prototype and adjoint matrices. Inspired by this study, we extend this conclusion to multilayer nonlinear AEs and prove that their weight matrices can be regarded as multiple tuples of M-P inverses at convergence. We then propose MSNN: we treat the encoder and decoder as the generalized adjoint and prototype features of SNN and use the pattern extraction of the AE to realize nonlinear prototype self-learning. The synergetic dynamics allow the codes of MSNN to converge to one-hots in a more stable and controllable manner than sparse or adversarial AEs, and the binarized codes can be efficiently visualized through associative memory. Experimental results on the MSTAR dataset [29] show that MSNN outperforms sparse AEs, marginally outperforms convolutional AEs without coding restrictions, and effectively visualizes the extracted features. The visualization results show that the self-learned prototypes reconstruct sample features not contained in the dataset, and the reconstructed images are noise-free, which proves that MSNN can successfully learn unobserved representative features for target recognition.
The objective of this paper is to propose an SNN-AE fusion model MSNN to learn nonlinear prototypes as representative features of SAR images to improve recognition accuracy. The contributions of this paper are (1) proposing the fusion model of SNN and AE with complementary strengths, (2) proposing the prototype self-learning method to enhance feature representativeness and visualization, and (3) attributing the recognition performance of deep learning to the effective learning of unobserved features.
This paper is structured as follows. Section 2 provides the background of the study. Section 3 introduces the fusion model MSNN: its first two subsections provide the theoretical basis, and the final subsection describes MSNN's working process and learning method. Section 4 presents the experiment configurations, results, and analysis. Section 5 summarizes the whole paper and discusses the limitations, unexplored perspectives, and future directions of our study. This paper proposes a novel artificial neural network model to solve the feature recognition problem of SAR images, which may be of interest to radar and artificial intelligence scientists, researchers, and trainees and may assist them in their research.

2. Background

2.1. SNN Overview

Synergetics uses dynamical systems to study how multiple subsystems evolve from disordered states to ordered states [27]. SNN, as an application of synergetics to associative memory tasks in computer science, considers the dynamic query pattern x as a disordered state and the static memory v as an ordered state, and uses a dynamical system to form the variation from the query pattern to a stored memory, simulating the human associative memory process. For the query pattern x and the matrix of static prototypes V = [v_1, …, v_N] representing memories, the update formulas of SNN [19] with the default hyperparameter setting are as follows:
$\xi = V^{+} x$  (1)
$\xi^{\mathrm{new}} = \operatorname{Syn}(\xi) = \gamma\,\dfrac{\xi^{3} + \xi}{2\,\|\xi\|_{2}^{2}} + (1 - \gamma)\,\xi$  (2)
$x^{\mathrm{new}} = V \xi^{\mathrm{new}}$  (3)
where V⁺ is the Moore–Penrose inverse of V [30,31], Syn can be interpreted as a synergetics-based activation function (the cube in Equation (2) is applied element-wise), and γ is the learning rate. Equation (3) is derived from the enslaving principle of synergetics: the dynamics of the prototype components v are dominated by ξ, so the variation in x is obtained through the dynamic update of ξ. Equation (1) transforms x into ξ, and Equation (2) updates ξ. The network repeatedly takes x^new as the new input until it no longer changes, at which point the network has converged. As shown in Figure 1, SNN converges to three kinds of stationary points: the target stable point, the saddle point, and the local maximum point. The target stable point, which is the general case, is reached when ξ becomes a positive or negative one-hot encoding; the network then outputs a single prototype ±v, reflecting the association from x to v. The saddle point is reached when ξ converges to an encoding with multiple identical non-zero values, which stems from multiple equal extremes in the initial value of ξ. The local maximum point occurs when all elements of ξ are initialized to 0; the network cannot be updated because of the division by zero in the update formula. SNN requires all v to be linearly independent and their number to be less than their dimension, such that V⁺V is the identity matrix. Examining two consecutive iterations and substituting (3) into (1) gives ξ = V⁺x^new = ξ^new; thus, the update process simplifies to the repeated application of Syn to ξ.
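For illustration, a minimal NumPy sketch of this associative recall is given below. It assumes the reconstructed form of Equation (2) with the default γ = 1; all function and variable names are illustrative and not part of any released implementation.

```python
import numpy as np

def syn(xi, gamma=1.0):
    """One synergetic update of the order parameters (reconstructed Equation (2))."""
    norm_sq = np.sum(xi ** 2)
    if norm_sq == 0.0:
        raise ZeroDivisionError("all-zero order parameters cannot be updated")
    return gamma * (xi ** 3 + xi) / (2.0 * norm_sq) + (1.0 - gamma) * xi

def snn_recall(V, x, n_iter=50, gamma=1.0):
    """Associate a query x with one of the stored prototypes (the columns of V)."""
    V_pinv = np.linalg.pinv(V)       # Moore-Penrose inverse, Equation (1)
    xi = V_pinv @ x
    for _ in range(n_iter):
        xi = syn(xi, gamma)          # repeated application of Syn
    return V @ xi                    # Equation (3): back to pattern space

# toy example: two orthonormal prototypes, query closer to the first one
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
x = np.array([0.9, 0.3, 0.1])
print(snn_recall(V, x))              # converges towards the first prototype
```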

2.2. Relation of Linear AE Weights

For a single-layer, linear, unbiased, activation-free AE, the single global minimum of the loss function corresponds to encoder and decoder weights that are mutually M-P inverse. Suppose the covariance matrix of the training set is full rank; then the loss function contains one global minimum and multiple saddle points, and when the global minimum is reached, the network output is the orthogonal projection onto the span of the eigenvectors of the covariance matrix, i.e., the network is equivalent to an orthogonal projection matrix [28]. By Moore's definition of the M-P inverse, the weight matrices of the encoder and the decoder are then mutually M-P inverse [30], the decoder matrix is composed of eigenvectors, and the codes are the linear combination weights.
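This relationship can be checked numerically with the closed-form PCA solution of the linear AE; the sketch below is only an illustration of the property, not the training procedure used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                     # toy train set with a full-rank covariance
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, -4:]                                # top-4 eigenvectors of the covariance matrix

decoder = U                                        # global-minimum decoder weights [28]
encoder = np.linalg.pinv(decoder)                  # Moore-Penrose inverse of the decoder

P = decoder @ encoder                              # the whole AE as a single linear map
print(np.allclose(encoder, U.T))                   # True: mutual M-P inverses (orthonormal case)
print(np.allclose(P @ P, P), np.allclose(P, P.T))  # True True: an orthogonal projection
```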

3. MSNN

We propose the AE-SNN deep fusion model MSNN by proving that the weights of a stacked or convolutional AE converged to the global minimum can be divided into tuples of M-P inverses and are therefore suitable as the generalized prototype and adjoint matrices. MSNN replaces the adjoint and prototype matrices with the AE's multilayer structure, so the prototype learning method is upgraded to a nonlinear one.

3.1. Stacked AE for Prototype Learning

The weights of the global minimum-converged AE can be divided into tuples of M-P inverses. The working process of the nonlinear AE layer is
$z = W^{T} x + b$
$y = \sigma(z)$
Let x' = (x, 1)^T and construct
$W' = \begin{bmatrix} W & 0 \\ b^{T} & 1 \end{bmatrix}$
Thus,
$z' = (z, 1)^{T} = W'^{T} x'$
The activation function σ scales the elements of z to a specific interval, which is equivalent to applying a z-related scaling factor to the corresponding column of W'. Let the dimension of z be N; for 1 ≤ n ≤ N, we construct the scaling parameter
$i_{n} = \begin{cases} y_{n} / z_{n}, & z_{n} \neq 0 \\ 0, & z_{n} = 0 \end{cases}$
Thus, all kinds of σ except Sigmoid can be substituted by i (0 cannot be scaled to 0.5 by i). Construct i = (i_1, …, i_N, 1)^T; then
$y' = (y, 1)^{T} = i \circ \left( W'^{T} x' \right)$
where “∘” denotes the Hadamard (element-wise) product. A different x relates to a different i, which yields a different equivalent matrix. All possible constructions of the equivalent matrices substitute multiple linear AEs for the single nonlinear AE. For instance, when σ is ReLU, i_n = 1 for z_n > 0 and i_n = 0 for z_n ≤ 0; for different x, each i_n has two possible values, so i has a total of 2^N possible values. When σ is tanh, i_n takes a different value for each different z_n, so it is almost impossible to find different x corresponding to the same i, and the number of possible values of i then depends on the number of samples. These linear AEs have a loss function identical to that of the nonlinear AE, and the added dimension of x' and y' does not generate additional gradients during error backpropagation, so the linear AEs converge to their global minima when the nonlinear AE reaches its global minimum. According to Section 2.2, the encoder and decoder weights then form tuples of mutual M-P inverses.
The above conclusion also applies to stacked AE. A stacked AE can be regarded as applying multiple equivalent matrices diag(i) W'^T to x' in sequence; the product of these matrices can be regarded as a single equivalent matrix, so the weights of a stacked AE that has converged to the global minimum can also be divided into tuples of M-P inverses.
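As a sanity check of this construction, the following sketch builds the augmented matrix W' and the input-dependent scaling i for a single ReLU layer and verifies that the resulting linear map reproduces the nonlinear layer; all sizes are arbitrary toy values.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))        # layer weights (input dim 5, output dim 3)
b = rng.normal(size=3)             # bias
x = rng.normal(size=5)

# augmented forms: x' = (x, 1)^T, W' = [[W, 0], [b^T, 1]]
x_aug = np.append(x, 1.0)
W_aug = np.block([[W, np.zeros((5, 1))], [b.reshape(1, 3), np.ones((1, 1))]])

z_aug = W_aug.T @ x_aug                        # (z, 1)^T
y_aug = np.append(relu(z_aug[:-1]), 1.0)       # (y, 1)^T

# input-dependent scaling i_n = y_n / z_n (0 where z_n = 0); for ReLU this is 0 or 1
i = np.divide(y_aug, z_aug, out=np.zeros_like(z_aug), where=z_aug != 0)

# the nonlinear layer equals a linear layer with an x-dependent equivalent matrix
equiv = W_aug @ np.diag(i)                     # scale the columns of W'
print(np.allclose(equiv.T @ x_aug, y_aug))     # True
```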

3.2. Convolutional AE for Prototype Learning

The convolutional and transposed convolutional layer can be transformed into equivalent fully connected layers, so the above conclusion can be generalized to convolutional AE. By inputting a feature map X of size ( H , W ) into a convolutional layer with kernel size K and stride 1, the following output can be obtained:
$z_{ij} = b_{ij} + \sum_{k_{1}=1}^{K} \sum_{k_{2}=1}^{K} w_{k_{1}k_{2}}\, x_{i+k_{1}-1,\, j+k_{2}-1}, \quad 1 \le i \le H-K+1,\ 1 \le j \le W-K+1$
This process can be implemented by constructing an equivalent matrix. Construct w_k = (w_{k1}, …, w_{kK}, 0, …, 0)^T of dimension W for k = 1, 2, …, K − 1 and w_K = (w_{K1}, w_{K2}, …, w_{KK})^T of dimension K, and concatenate them as w = (w_1, …, w_K)^T. Stacking w along the principal diagonal gives
$A = \begin{bmatrix} w & & 0 \\ & \ddots & \\ 0 & & w \end{bmatrix}$
of size (KW, W − K + 1). By stacking A along the principal diagonal and filling the rest with zero matrices O of size (W, W − K + 1), we obtain
$W' = \begin{bmatrix} A & O & \cdots & O \\ O & A & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & A \end{bmatrix}$
of size (HW + 1, (W − K + 1)(H − K + 1)). Construct x' from the elements of X with a 1 appended as the last dimension; the convolution is then equivalent to
$z' = (z, 1)^{T} = W'^{T} x'$
Figure 2 visualizes the above process. The equivalent matrix of the transposed convolutional layer can be obtained by removing the transpose on W'.
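The equivalence can also be verified numerically. The sketch below builds a dense equivalent matrix for a small single-channel convolution by filling one column per output position (a simpler but equivalent construction to the block matrix above) and compares it with PyTorch's conv2d; all sizes and names are illustrative.

```python
import torch
import torch.nn.functional as F

H, W, K = 6, 6, 3
torch.manual_seed(0)
x = torch.randn(H, W)
kernel = torch.randn(K, K)
bias = torch.randn(1)

# build the equivalent matrix column by column: one column per output position
out_h, out_w = H - K + 1, W - K + 1
M = torch.zeros(H * W + 1, out_h * out_w)          # the last row carries the bias
for i in range(out_h):
    for j in range(out_w):
        col = torch.zeros(H, W)
        col[i:i + K, j:j + K] = kernel             # kernel placed where it covers the input
        M[:-1, i * out_w + j] = col.flatten()
        M[-1, i * out_w + j] = bias[0]

x_aug = torch.cat([x.flatten(), torch.ones(1)])    # x' = (x, 1)^T
z_equiv = M.T @ x_aug
z_conv = F.conv2d(x.view(1, 1, H, W), kernel.view(1, 1, K, K), bias=bias).flatten()
print(torch.allclose(z_equiv, z_conv, atol=1e-5))  # True
```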

3.3. MSNN Model

The structure of MSNN is shown in Figure 3. For the input x , decoder output x new , code h , and classifier output y , the working process of network modules is as follows:
$h = \mathrm{Encoder}(x)$
$h_{i}^{\mathrm{new}} = \mathrm{Syn}^{N}(h_{i}), \quad i = 1, 2, \ldots, M$
$x^{\mathrm{new}} = \mathrm{Decoder}(h^{\mathrm{new}})$
$y = \mathrm{Classifier}(h)$
We use convolutional layers as the main body of the coders and add a fully connected layer adjacent to h to integrate or recover the features of the feature maps. h is equally partitioned into M subcodes h_1 to h_M, which are input into M SNNs for N iterations.
We introduce multiple SNNs to ensure the reconstruction quality of the decoder by extending the possible values of h^new. The convergence target of SNN is a one-hot code, i.e., one element equals one and the others equal zero. Let the dimension of the code be H. When a single SNN receives h, there are only H different positions for the value one, so the total number of possible codes is also H. This limited code number means that many samples correspond to one code, yet one code can only be reconstructed into one output. Such an output is usually composed of the common features among those samples, which makes it difficult to cover the representative information, and features lacking representation damage both the visualization and the recognition performance. When multiple SNNs receive h, each SNN has H/M positions for the value one, so the total number of possible h^new obtained by M SNNs increases to (H/M)^M. The total number of codes is exponential in M, so a one-to-one correspondence between the input and h^new can be achieved by increasing M.
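The data flow described above can be summarized in the following PyTorch skeleton. It is only an illustrative sketch: the layer sizes, M, and N are placeholders, Equation (2) is used in its reconstructed form, and the gradient handling discussed later in this section is omitted.

```python
import torch
import torch.nn as nn

def syn_step(xi, gamma=1.0):
    # one synergetic update per subcode (reconstructed Equation (2)); xi: (batch, M, H/M)
    norm_sq = (xi ** 2).sum(dim=-1, keepdim=True)
    safe = torch.where(norm_sq > 0, norm_sq, torch.ones_like(norm_sq))
    out = gamma * (xi ** 3 + xi) / (2 * safe) + (1 - gamma) * xi
    return torch.where(norm_sq > 0, out, torch.zeros_like(xi))   # all-zero subcodes stay zero

class MSNNSketch(nn.Module):
    """Illustrative skeleton of the MSNN data flow; layer sizes, M, and N are placeholders."""
    def __init__(self, code_dim=512, m=16, n_iter=16, n_classes=10):
        super().__init__()
        self.m, self.n_iter = m, n_iter
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, 64 * 64)
        self.classifier = nn.Linear(code_dim, n_classes)

    def forward(self, x):
        h = self.encoder(x)                                       # code h
        sub = h.view(h.size(0), self.m, -1)                       # split into M subcodes
        for _ in range(self.n_iter):
            sub = sub / (sub.norm(dim=-1, keepdim=True) + 1e-12)  # normalize before each iteration
            sub = syn_step(sub)                                   # Syn applied N times per subcode
        h_new = sub.reshape(h.size(0), -1)
        return self.decoder(h_new), self.classifier(h)            # x_new and label prediction y

model = MSNNSketch()
x_new, y = model(torch.randn(2, 1, 64, 64))
print(x_new.shape, y.shape)                                       # (2, 4096) and (2, 10)
```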
The previous sections only prove that the stacked and convolutional AE weights form tuples of M-P inverses when the global minimum is reached, yet the actual convergence state requires additional verification. From Section 2.2, the weights of the decoder consist of the eigenvectors of the covariance matrix. Since the covariance matrix is symmetric and full rank, the eigenvectors are orthogonal to each other. From the properties of the M-P inverse, if a matrix V has linearly independent columns, V⁺V is the identity matrix. Extending this conclusion, the product of the encoder's equivalent matrix and the decoder's equivalent matrix is also the identity matrix when the stacked or convolutional AE reaches the global minimum. Therefore, the code remains unchanged after recoding:
$h^{\mathrm{new}} = \mathrm{Syn}^{N}\!\left(\mathrm{Encoder}\!\left(\mathrm{Decoder}\!\left(h^{\mathrm{new}}\right)\right)\right) = \mathrm{Syn}^{N}\!\left(\mathrm{Encoder}\!\left(x^{\mathrm{new}}\right)\right)$
We introduce the verification term from the equation above to quantify the convergence state.
$P = \mathrm{MSE}\!\left(h^{\mathrm{new}},\ \mathrm{Syn}^{N}\!\left(\mathrm{Encoder}\!\left(x^{\mathrm{new}}\right)\right)\right)$
where MSE is the mean squared error. The closer P is to zero, the better the parameters are trained. Note that the loss function of the coders is still MSE(x, x^new); P is only used for convergence verification and is not involved in any parameter tuning process.
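A small helper of the following kind could compute P during training, assuming `encoder` and `syn_iterate` are callables such as those in the sketch above; the names are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()                 # P only monitors training; it never contributes gradients
def verification_term(encoder, syn_iterate, x_new, h_new):
    """P = MSE(h_new, Syn^N(Encoder(x_new))); values near zero indicate well-trained coders."""
    return F.mse_loss(h_new, syn_iterate(encoder(x_new)))
```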
For error backpropagation, Syn repeatedly imposes a polynomial function onto the input, which may lead to exploding or vanishing gradients. The gradient problem is so severe that conventional means such as gradient clipping can barely prevent non-convergence. Therefore, we used the gradient bypass technique [32,33], originally introduced to handle components whose gradients cannot be computed (red arrow in Figure 3). This technique passes the gradient of a network layer's output directly to its input during backpropagation, circumventing activation functions that make the gradient explode or vanish, and even gradients made untransmittable by discontinuities.
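The following self-contained snippet illustrates the bypass, with a cubed tanh standing in for the Syn iterations; the forward pass uses the transformed code, while the backward pass copies the gradient of the block's output directly onto its input.

```python
import torch

h = torch.randn(4, 8, requires_grad=True)
h_new = torch.tanh(h) ** 3                 # stand-in for the Syn iterations

# gradient bypass: forward uses h_new, backward copies the gradient straight onto h
h_bypassed = h + (h_new - h).detach()
loss = h_bypassed.sum()
loss.backward()
print(torch.allclose(h.grad, torch.ones_like(h)))   # True: the gradient skipped the Syn block
```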
The visualization of extracted features can be easily achieved using the recall of MSNN. Although the MSNN prototypes can in principle be derived by calculating the decoder's equivalent matrices, the number of equivalent matrices is enormous and difficult to filter. From Section 3.1, the number of equivalent matrices depends on i, which has exponentially many possible values in the number of neurons for ReLU activations and even infinitely many for other activations, so the calculation of all equivalent matrices is time-consuming or even intractable. In addition, most of these matrices never occur in practice because of the limited number of training samples and the intrinsic connections among codes, so it is inefficient and difficult to discover the core prototypes by this approach. Alternatively, we derived the visualization results by inputting specific codes to the decoder; for example, one-hot codes were input to observe individual prototypes, and clustering centers of inner-class codes were input to observe their common features.
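For instance, a helper like the one below decodes a single one-hot code to inspect one prototype; the names are illustrative, and the decoder is assumed to output flattened 64 × 64 images as in the earlier sketch.

```python
import torch

@torch.no_grad()
def decode_one_hot(decoder, code_dim, position, image_size=64):
    """Decode a single one-hot code to view the corresponding prototype (illustrative helper)."""
    h = torch.zeros(1, code_dim)
    h[0, position] = 1.0                        # activate exactly one prototype
    return decoder(h).view(image_size, image_size)

# e.g., decode_one_hot(model.decoder, 512, 3) with the sketch model from Section 3.3
```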

4. Experiments

4.1. MSTAR Dataset Configuration

We designed experiments using SAR images of 10 different military vehicles from the MSTAR dataset [29]: 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, and ZSU234. These images were generated by an X-band SAR with a resolution of 0.3 m × 0.3 m. The images acquired at 17° and 15° depression angles were used as the training and test sets, respectively, containing 2747 and 2426 samples with a near-uniform class distribution; details are shown in Table 1. The images of different classes varied in size, but the pivotal features were located at the image center, so we normalized all images to 256 × 256 and used the central 64 × 64 pixels as the network input. To minimize the differences in data distribution between the training and test sets, we applied affine transformations to the training samples, including random rotation (−15°, 15°) and random shear (−15°, 15°) in the horizontal and vertical directions. The batch size was 256.
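A torchvision sketch of this preprocessing is shown below; the exact parameter values and the order of cropping and augmentation are assumptions rather than the authors' released pipeline.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),                                    # normalize all image sizes
    transforms.RandomAffine(degrees=15, shear=(-15, 15, -15, 15)),    # random rotation and shear
    transforms.CenterCrop(64),                                        # keep the central 64 x 64 pixels
    transforms.ToTensor(),
])
test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
])
```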

4.2. MSNN Configuration

The encoder of the MSNN contained 4 convolutional layers and 1 fully connected layer with the channel configuration 32–64–64–64. This channel configuration ensures that the code dimension is not larger than the input and avoids overparameterization of the AE. The kernel size was 4, the stride was 1, and the padding was 1. Owing to the limitations of the transposed convolutional layer, an identical configuration to the convolutional layer is not enough to obtain an output of the same size as the input, so we used this common structure to keep the data sizes identical. Each convolutional layer was followed by a dropout with rate D_conv. The neuron number of the fully connected layer equals its input dimension. The encoder's output h was divided into M parts, input into the SNNs, and iterated I times. The hyperparameter γ was set to 1; SNN works properly when γ is no greater than 1 [19], so this configuration allows the fastest convergence. To improve the efficiency of the SNN and avoid the error caused by all-zero input, we normalized the input before each iteration and directly output all-zero codes without iterating. The decoder and encoder were configured symmetrically. All network layers use the ReLU activation function except the activation-free output layer of the decoder. The classifier inputs h and outputs the category label prediction; to show that the feature extraction of the SNNs substantially reduces the recognition difficulty, the classifier contains only 1 fully connected layer with a pre-layer dropout of rate D_class. The weights of all network layers were initialized with the Xavier uniform distribution to match the ReLU activation. The MSE loss function was used for the feature reconstruction task, and the cross-entropy loss function was used for the feature recognition task. The network was trained for 120 epochs. Since the feature reconstruction task is simpler than recognition, the recognition task is ignored by MSNN when the training ratio of the two tasks is 1:1; we therefore set the training ratio of reconstruction to recognition to 1:10 after pre-experiments. The optimizer was AdamW [34] for faster training, and the learning rate was adjusted using the OneCycle [35] approach to escape local minima, with the default maximum of 1 × 10⁻³. Gradients were clipped to a default upper limit of 1 in Euclidean norm to avoid accidental network divergence. The optimal values of the letter-denoted hyperparameters are discussed in the next section.
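The optimizer, scheduler, gradient clipping, and the alternating task schedule can be sketched as follows, assuming a model and data loader such as those in the Section 3.3 sketch; the exact interleaving of the two tasks under the 1:10 ratio is an assumption.

```python
import torch

def train_one_epoch(model, loader, optimizer, scheduler):
    """One epoch of the alternating reconstruction/recognition schedule described above."""
    recon_loss = torch.nn.MSELoss()
    class_loss = torch.nn.CrossEntropyLoss()
    for step, (images, labels) in enumerate(loader):
        x_new, logits = model(images)
        # roughly one reconstruction update per ten recognition updates (1:10 ratio)
        if step % 11 == 0:
            loss = recon_loss(x_new, images.flatten(1))
        else:
            loss = class_loss(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Euclidean-norm clipping
        optimizer.step()
        scheduler.step()

# optimizer = torch.optim.AdamW(model.parameters())
# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3,
#                                                 epochs=120, steps_per_epoch=len(loader))
```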

4.3. Recognition Results Analysis

The feature recognition results of MSNN were optimized with a hyperparameter grid search and ablation tests, as shown in Table 2; the comparison with other methods is given in Table 3, and the confusion matrix is shown in Table 4. The validation term P decreased from the order of 1 × 10⁻⁷ to 1 × 10⁻⁸, which satisfies the application premise. The optimal result of MSNN was obtained at epoch 71 with the early stopping technique. From Table 2, SNN improves the AE recognition performance for all dropout rate configurations, and a larger dropout rate needs to be paired with a larger code segment number and fewer iterations to obtain better performance. Since deep learning is highly dependent on the amount of data and is sensitive to noise, data augmentation and denoising are widely used to reduce the application difficulty and obtain an accuracy close to 100%. To maintain a uniform task difficulty, we only compared our results with studies that did not use data manipulation. As a result, MSNN outperformed the other methods and achieved results close to those obtained with data augmentation. In addition, MSNN achieved 100% recognition accuracy in five classes, which is also a new state-of-the-art result.
For the MSNN training speed, the network contains 4 convolutional layers, 4 transposed convolutional layers, 1 fully connected layer, and a set of SNNs with 16 iterations. The SNNs work in parallel and can be approximated as 16 network layers, so the training speed of MSNN is roughly equivalent to that of a 25-layer deep network. We used an RTX 3070 graphics card with a CUDA 11.6 + PyTorch 1.12 environment to compare the time required for one epoch of MSNN with that of several other methods; the results are shown in Table 5. The training speed of MSNN is moderate: although the recurrent architecture prolongs the training time to some extent, it still takes less time than DeepMemory, which has more layers to train.

4.4. Prototype Analysis

We first evaluated the effectiveness of the prototypes by counting the average percentage of subcodes that are one-hot or zero vectors. A one-hot subcode indicates that a single prototype has learned the feature; the higher the percentage, the better the effectiveness. The total percentage of one-hot and zero-vector subcodes characterizes the convergence of each SNN; the higher the percentage, the more networks reach convergence. The statistics for the training and test sets are shown in Figure 4. Almost all subcodes are in one-hot or zero-vector form, which means that the MSNN prototypes learn effectively. The proportions of both code types in the train and test sets are similar, reflecting the consistent representation of the prototypes. The optimal recognition accuracy is achieved at a low one-hot ratio, implying that reducing the sparsity of the code can improve the recognition performance.
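These statistics can be gathered with a simple counting routine such as the sketch below; the tolerance used to decide whether an entry is zero is an assumption.

```python
import torch

def subcode_statistics(h_new, m, tol=1e-6):
    """Fraction of subcodes that converged to a one-hot vector or to an all-zero vector (sketch)."""
    sub = h_new.view(h_new.size(0), m, -1)
    nonzero = (sub.abs() > tol).sum(dim=-1)
    one_hot_ratio = (nonzero == 1).float().mean().item()
    zero_ratio = (nonzero == 0).float().mean().item()
    return one_hot_ratio, zero_ratio
```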
We then visualized the prototype patterns obtained by MSNN self-learning. Since the prototypes of each SNN capture partial sample features and their combination is nonlinear, the visualization of a single prototype is vague and indistinguishable. Therefore, we clustered similar codes of inner-class training samples by K-means with K = 8 and visualized the binarized clustering centers. The results are shown in Figure 5. We also list the most similar images from the train set to judge whether the visualization results are merely simple reconstructions of the input samples. The comparisons show that some visualizations differ significantly from the most similar samples, indicating that the prototypes can learn key angle features that are not covered in the dataset.
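A scikit-learn sketch of this clustering step is given below; the binarization threshold is an assumption, and the resulting centers would then be passed through the decoder to produce the visualizations.

```python
import numpy as np
from sklearn.cluster import KMeans

def class_prototype_centers(class_codes, k=8, threshold=0.5):
    """Cluster one class's codes, then binarize the centers before decoding (illustrative sketch)."""
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(class_codes).cluster_centers_
    return (centers > threshold).astype(np.float32)

# the binarized centers would then be decoded to obtain Figure 5-style prototype images
```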

5. Conclusions

In this paper, we proposed the AE-SNN fusion model MSNN for the feature recognition task of SAR images. AE compensates for SNN's lack of nonlinear prototype learning, and SNN improves the efficiency and stability of AE code regularization and offers a simple and effective feature visualization approach. Experiments on the unpreprocessed MSTAR dataset showed that MSNN obtained the optimal feature recognition performance, and the feature visualization results show that these excellent results originate from the network's effective prototype learning, which spontaneously captures representative features that are not covered in the training set. It is worth noting that the feature extraction and visualization of MSNN are based on the premise that the critical information is located at the center of the SAR image. For data with a more complex distribution, the features learned by MSNN will be more abstract, so more sophisticated visualization methods need to be developed to ensure performance. In addition, the recognition performance of MSNN on partially missing, mislabeled, and new-category samples also needs further study to achieve broader application. Target recognition by the fusion of SAR images and high-resolution range profiles (HRRP) is a research hotspot; we plan to apply MSNN to SAR-HRRP feature matching and feature fusion to further improve recognition accuracy.

Author Contributions

Conceptualization, H.L. and L.M.; methodology, Z.W.; software, Z.W.; validation, Z.W., H.L. and L.M.; formal analysis, Z.W.; investigation, Z.W.; resources, Z.W.; data curation, Z.W.; writing—original draft preparation, Z.W.; writing—review and editing, Z.W.; visualization, Z.W.; supervision, Z.W.; project administration, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.sdms.afrl.af.mil/index.php?collection=mstar, accessed on 1 March 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cong, Y.; Chen, B.; Liu, H.; Jiu, B. Nonparametric Bayesian Attributed Scattering Center Extraction for Synthetic Aperture Radar Targets. IEEE Trans. Signal Process. 2016, 64, 4723–4736.
2. Gao, F.; Mei, J.; Sun, J.; Wang, J.; Yang, E.; Hussain, A. Target detection and recognition in SAR imagery based on KFDA. J. Syst. Eng. Electron. 2015, 26, 720–731.
3. Pei, J.; Huang, Y.; Huo, W.; Wu, J.; Yang, J.; Yang, H. SAR Imagery Feature Extraction Using 2DPCA-Based Two-Dimensional Neighborhood Virtual Points Discriminant Embedding. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2206–2214.
4. Song, S.; Xu, B.; Yang, J. SAR target recognition via supervised discriminative dictionary learning and sparse representation of the SAR-HOG feature. Remote Sens. 2016, 8, 683.
5. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
6. Ni, J.C.; Xu, Y.L. SAR automatic target recognition based on a visual cortical system. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP 2013), Hangzhou, China, 16–18 December 2013; Volume 2, pp. 778–782.
7. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
8. Kang, M.; Ji, K.; Leng, X.; Xing, X.; Zou, H. Synthetic aperture radar target recognition with feature fusion based on a stacked autoencoder. Sensors 2017, 17, 192.
9. Shao, Z.; Zhang, L.; Wang, L. Stacked Sparse Autoencoder Modeling Using the Synergy of Airborne LiDAR and Satellite Optical and SAR Data to Map Forest Above-Ground Biomass. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5569–5582.
10. Chen, X.; Deng, J. A Robust Polarmetric SAR Terrain Classification Based on Sparse Deep Autoencoder Model Combined with Wavelet Kernel-Based Classifier. IEEE Access 2020, 8, 64810–64819.
11. Wang, J.; Qin, C.; Yang, K.; Ren, P. A SAR Target Recognition Algorithm Based on Guided Filter Reconstruction and Denoising Sparse Autoencoder. Binggong Xuebao/Acta Armamentarii 2020, 41, 1861–1870.
12. Geng, J.; Fan, J.; Wang, H.; Ma, X.; Li, B.; Chen, F. High-Resolution SAR Image Classification via Deep Convolutional Autoencoders. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2351–2355.
13. Guo, J.; Wang, L.; Zhu, D.; Hu, C. Compact convolutional autoencoder for SAR target recognition. IET Radar Sonar Navig. 2020, 14, 967–972.
14. Seyfioǧlu, M.S.; Özbayoǧlu, A.M.; Gürbüz, S.Z. Deep convolutional autoencoder for radar-based classification of similar aided and unaided human activities. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 1709–1723.
15. Sun, Z.; Xue, L.; Xu, Y. Recognition of SAR target based on multilayer auto-encoder and SNN. Int. J. Innov. Comput. Inf. Control 2013, 9, 4331–4341.
16. Amiri, M.; Davande, H.; Sadeghian, A.; Chartier, S. Feedback associative memory based on a new hybrid model of generalized regression and self-feedback neural networks. Neural Netw. 2010, 23, 892–904.
17. Chartier, S.; Proulx, R. NDRAM: Nonlinear dynamic recurrent associative memory for learning bipolar and nonbipolar correlated patterns. IEEE Trans. Neural Netw. 2005, 16, 1393–1400.
18. Liu, J.; Gong, M.; He, H. Deep associative neural network for associative memory based on unsupervised representation learning. Neural Netw. 2019, 113, 41–53.
19. Haken, H. Synergetic Computers and Cognition: A Top-Down Approach to Neural Nets; Springer: Berlin/Heidelberg, Germany, 1991.
20. Wang, H.; Yu, Y.; Wen, G.; Zhang, S.; Yu, J. Global stability analysis of fractional-order Hopfield neural networks with time delay. Neurocomputing 2015, 154, 15–23.
21. Wu, A.; Zeng, Z.; Song, X. Global Mittag-Leffler stabilization of fractional-order bidirectional associative memory neural networks. Neurocomputing 2016, 177, 489–496.
22. Yang, Z.; Zhang, J. Global stabilization of fractional-order bidirectional associative memory neural networks with mixed time delays via adaptive feedback control. Int. J. Comput. Math. 2020, 97, 2074–2090.
23. Zhao, T.; Tang, L.H.; Ip, H.H.; Qi, F. On relevance feedback and similarity measure for image retrieval with synergetic neural nets. Neurocomputing 2003, 51, 105–124.
24. Wong, W.M.; Loo, C.K.; Tan, A.W. Parameter controlled chaotic synergetic neural network for face recognition. In Proceedings of the 2010 IEEE Conference on Cybernetics and Intelligent Systems (CIS 2010), Singapore, 28–30 June 2010; pp. 58–63.
25. Huang, Z.; Chen, Y.; Shi, X. A parallel SRL algorithm based on synergetic neural network. J. Converg. Inf. Technol. 2012, 7, 1–8.
26. Huang, Z.; Chen, Y.; Shi, X. A synergetic semantic role labeling model with the introduction of fluctuating force accompanied with word sense information. Intell. Data Anal. 2017, 21, 5–18.
27. Haken, H.P. Synergetics. IEEE Circuits Devices Mag. 1988, 4, 3–7.
28. Baldi, P.; Hornik, K. Neural networks and principal component analysis: Learning from examples without local minima. Neural Netw. 1989, 2, 53–58.
29. Ross, T.D.; Mossing, J.C. MSTAR evaluation methodology. In Proceedings of the AeroSense’99, Orlando, FL, USA, 5–9 April 1999.
30. Moore, E.H. On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc. 1920, 26, 394–395.
31. Penrose, R. A generalized inverse for matrices. Math. Proc. Camb. Philos. Soc. 1955, 51, 406–413.
32. Van Den Oord, A.; Vinyals, O.; Kavukcuoglu, K. Neural discrete representation learning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6307–6316.
33. Razavi, A.; van den Oord, A.; Vinyals, O. Generating diverse high-fidelity images with VQ-VAE-2. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
34. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
35. Smith, L.N.; Topin, N. Super-convergence: Very fast training of neural networks using large learning rates. In Proceedings of the SPIE Defense + Commercial Sensing, Baltimore, MD, USA, 14–18 April 2019; p. 36.
36. Chen, S.; Wang, H.; Xu, F.; Jin, Y.Q. Target Classification Using the Deep Convolutional Networks for SAR Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4806–4817.
37. Pei, J.; Huang, Y.; Huo, W.; Zhang, Y.; Yang, J.; Yeo, T.S. SAR automatic target recognition based on multiview deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2196–2210.
38. Zhong, C.; Mu, X.; He, X.; Wang, J.; Zhu, M. SAR Target Image Classification Based on Transfer Learning and Model Compression. IEEE Geosci. Remote Sens. Lett. 2019, 16.
39. Shang, R.; Wang, J.; Jiao, L.; Stolkin, R.; Hou, B.; Li, Y. SAR Targets Classification Based on Deep Memory Convolution Neural Networks and Transfer Parameters. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2834–2846.
40. Zhu, H.; Wang, W.; Leung, R. SAR target classification based on radar image luminance analysis by deep learning. IEEE Sens. Lett. 2020, 4, 7000804.
Figure 1. SNN’s convergence to stationary points. Curves with different colors represent different order parameters. In the figure, (a,b) converge to the target stable point, the positive or negative one-hot, i.e., one order parameter converges to ±1 while the others converge to 0; (c) converges to the saddle point, which stems from multiple identical extreme values in ξ; (d) converges to the local maximum point, which stems from the zero-vector initialization of ξ. Note that this chart is only for illustration; in practice, the divide-by-zero error terminates the iteration of the update formula.
Figure 2. The equivalent matrix construction for the convolutional layer. The left side shows how the convolutional layer works: each output is the sum of the element-wise multiplication between the convolution kernel and the input region it covers. The right side shows the construction method of the equivalent matrix W'. The elements of z follow the same operation procedure as on the left side.
Figure 3. The network structure of MSNN. The encoder inputs x and outputs the code h. h is divided into M subcodes before being input into M SNNs, which produce the updated code h_new. The decoder inputs h_new and outputs x_new. Meanwhile, h is input into the classifier to obtain the label y. Parameters are trained using error backpropagation. The loss function of the coders is MSE(x, x_new) (MSE is the mean squared error), and the loss function of the classifier is CrossEntropy(y, label). The gradient of h_new from error backpropagation bypasses the SNNs and is transmitted directly to h. The training of the coders is monitored by the value of P; P approaching zero means good training.
Figure 4. The averaged percentage of train and test set subcodes reaching one-hot or zero vectors. The total percentage rises to nearly 1 at the early stage of the training.
Figure 5. The prototype visualization results (bottom) and their most similar training samples (top). Some visualizations differ significantly from the most similar samples.
Table 1. The number of MSTAR train and test set samples.

Class     Type        Train Num.   Test Num.
2S1       b01         299          274
BMP2      9563        233          195
BRDM2     E-71        298          274
BTR60     k10yt7532   256          195
BTR70     C71         233          196
D7        92v13015    299          274
T62       A51         299          273
T72       132         232          196
ZIL131    E12         299          274
ZSU234    d08         299          274
Table 2. The grid search results of the code segmentation number M, SNN iteration number I, and two dropout rates D_conv and D_class. SNN is ablated when I is zero, and the dropout layer is ablated when the corresponding D is zero. Results, ordered from largest to smallest, are labeled on a red–white–blue scale. Cells marked “–” have no result for that combination.

(a) D_conv = 0.0, D_class = 0.0
M \ I     0        8        16       32       64
32        –        96.99    97.66    97.63    98.26
64        –        96.80    97.02    98.29    96.50
128       97.51    96.51    96.99    96.67    96.94
256       –        98.11    98.13    96.70    98.21
512       –        97.53    97.39    97.41    97.87

(b) D_conv = 0.1, D_class = 0.0
M \ I     0        8        16       32       64
32        –        96.37    97.18    97.74    98.18
64        –        98.70    96.68    97.50    97.69
128       97.66    96.99    97.09    96.78    98.35
256       –        97.33    98.44    97.99    98.09
512       –        98.08    97.67    96.69    97.61

(c) D_conv = 0.0, D_class = 0.75
M \ I     0        8        16       32       64
32        –        97.63    98.45    97.20    98.27
64        –        97.94    97.68    98.57    97.15
128       97.80    98.20    98.33    97.52    98.31
256       –        98.49    97.94    97.10    98.02
512       –        97.34    98.06    97.81    98.11

(d) D_conv = 0.1, D_class = 0.75
M \ I     0        8        16       32       64
32        –        98.35    98.47    98.15    98.43
64        –        98.72    97.98    98.60    98.43
128       98.00    98.43    98.52    98.60    98.64
256       –        98.52    98.93    98.52    98.27
512       –        98.35    98.15    98.56    98.64
Table 3. Recognition results comparison. The results in the upper part of the table do not use data manipulation methods, and the lower methods are labeled with specific approaches.

Method                                      Acc. (%)
A-ConvNet [36]                              96.49
2-VDCNN [37]                                97.81
3-VDCNN [37]                                98.17
Pruned-70 [38]                              98.39
CCAE [13]                                   98.59
MSNN (proposed)                             98.93 ¹
A-ConvNet with data augmentation [36]       99.13
DeepMemory with data augmentation [39]      99.71
LADL with data denoising [40]               99.99 ¹
¹ Optimal result.
Table 4. The confusion matrix of the recognition results.

Class      2S1   BMP2  BRDM2  BTR60  BTR70  D7    T62   T72   ZIL131  ZSU234  Acc. (%)
2S1        263   0     5      0      0      0     5     0     1       0       95.99
BMP2       0     196   0      0      0      0     0     0     0       0       100
BRDM2      2     1     266    1      2      0     0     0     4       0       97.08
BTR60      0     0     4      191    0      0     0     0     0       0       97.95
BTR70      0     0     0      0      196    0     0     0     0       0       100
D7         0     1     1      0      0      272   0     0     0       0       99.27
T62        0     0     0      0      0      0     273   0     0       0       100
T72        0     0     0      0      1      0     0     195   0       0       99.49
ZIL131     0     0     0      0      0      0     0     0     274     0       100
ZSU234     0     0     0      0      0      0     0     0     0       274     100
Total (%)                                                                      98.93
Table 5. The training time of different networks in one epoch.

Method             Time (s)
LADL               2.7
A-ConvNet          3.1
CCAE               4.3
MSNN (proposed)    4.6
DeepMemory         9.1