Article

Mixed Structure with 3D Multi-Shortcut-Link Networks for Hyperspectral Image Classification

1 Ural Institute, North China University of Water Resource and Electric Power, Zhengzhou 450045, China
2 School of Earth and Space Sciences, Peking University, Beijing 100871, China
3 Department of Applied Chemistry, Chubu University, 1200 Matsumoto-cho, Kasugai 487-8501, Japan
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(5), 1230; https://doi.org/10.3390/rs14051230
Submission received: 28 January 2022 / Revised: 17 February 2022 / Accepted: 28 February 2022 / Published: 2 March 2022
(This article belongs to the Topic Artificial Intelligence in Sensors)

Abstract

A hyperspectral image classification method based on a mixed structure with a 3D multi-shortcut-link network (MSLN) was proposed to address the characteristics of hyperspectral images: few labeled samples, excess noise, and homogeneity among terrain objects with heterogeneous structures. First, the spatial–spectral joint features of hyperspectral cube data were extracted through a 3D convolution operation; then, a deep network was constructed and the 3D MSLN mixed structure was used to fuse shallow representational features and deep abstract features, while a hybrid activation function was utilized to preserve the integrity of the nonlinear data. Finally, global self-adaptive average pooling and an L-softmax classifier were introduced to implement the terrain classification of hyperspectral images. The mixed structure proposed in this study can extract multi-channel features with a vast receptive field and reduce the continuous decay of shallow features while improving the utilization of representational features and enhancing the expressiveness of the deep network. The use of the dropout mechanism and the L-softmax classifier endowed the learned features with better generalization, intraclass cohesion, and interclass separation. Experimental comparisons on six groups of datasets showed that this method, compared with existing deep-learning-based hyperspectral image classification methods, satisfactorily addresses the degeneration of deep networks and the issue of “the same object with distinct spectra, and distinct objects with the same spectrum.” It also effectively improves the terrain classification accuracy of hyperspectral images, as evinced by the overall classification accuracies across all terrain classes in the six groups of datasets: 97.698%, 98.851%, 99.54%, 97.961%, 97.698%, and 99.138%.

1. Introduction

Hyperspectral images (HSIs) contain rich spatial and spectral information [1,2] and are widely applied in precision agriculture [3], urban planning [4], national defense construction [5], and mineral exploitation [6], among other fields. The field has also been very successful in allowing active users to participate in collecting, updating, and sharing the massive amounts of data that reflect human activities and social attributes [7,8,9,10].
The terrain classification of hyperspectral images is a fundamental problem for various applications, where the goal is to assign a label with unique class attributes to each pixel in the image based on the sample features of the HSI. However, HSIs are high-dimensional with few labeled samples, adjacent wavebands are highly correlated, and terrain objects with heterogeneous structures may appear spectrally homogeneous, all of which make the terrain classification of HSIs highly challenging.
To this end, scholars have put forward many terrain classification algorithms for HSIs, such as support vector machine (SVM) [11,12,13,14], sparse representation [15], semi-supervised classification [16,17], extreme learning machine (ELM) [18], and artificial neural network (ANN) [19]. Unfortunately, these classification methods cannot fully utilize the spatial features of HSI data and have poor expressiveness and generalization ability, leading to relatively low classification accuracy [20,21,22].
In recent years, deep-learning-based image classification has achieved substantial development. Compared with traditional algorithms, it mainly utilizes a complex neural network structure to extract deep abstract features, thereby improving the accuracy of image classification. Hence, it is widely applied in target detection [23], image classification [24,25], pattern recognition [26,27,28], and other fields. In HSI classification, spatial and spectral information can be extracted and transformed via channel mapping into one-dimensional vectors for classification by using stack autoencoding (SAE) [29,30], a deep belief network (DBN) [31], a recurrent neural network (RNN) [32,33], etc. However, these methods require a large number of parameters and lose the 2D spatial features and the correlation between different wavebands, resulting in low classification accuracy. As the most representative network model in deep learning, a convolutional neural network (CNN) can extract spatial and spectral features simultaneously and exhibits good performance in HSI classification [34,35,36,37,38,39,40,41]. However, as the number of network layers increases, a CNN is prone to losing detailed information that is useful for data fitting during training, producing the phenomenon of gradient disappearance.
Although many current deep learning network structures have already achieved satisfactory classification results, it remains very difficult for them to achieve perfect classification results in the face of issues such as the curse of dimensionality, image noise, too few labeled samples, and “the same object with distinct spectra, and distinct objects with the same spectrum” [42,43,44].
To address the above-mentioned issues, this study fused shallow features and deep features by constructing a mixed structure with 3D multi-shortcut-link networks, using hybrid functions to activate neurons; utilized a global self-adaptive average pooling layer to suppress noise; and finally implemented the terrain classification of HSIs via the L-softmax loss function. The validity of the algorithm was verified using six groups of hyperspectral datasets.

2. Related Work

2.1. Shortcut Link

Theories and practices involving shortcut links have been studied for a long time [45,46,47]. In the early stages, training a multi-layer perceptron was mainly achieved by adding a linear layer connecting the input to the output of the network [46,47]. To solve the vanishing/exploding gradient issue, Szegedy [48] and Lee [49] connected some intermediate layers directly to auxiliary classifiers. Other researchers [50,51,52,53] introduced the centering of layer responses, gradients, and propagation errors into shortcut links.
In [54,55], Ioffe and Szegedy composed an inception structure with a shortcut branch and a few deeper branches. He et al. [56,57] conducted a series of studies on residuals and shortcuts and derived the mathematical formulation of residual networks (ResNets) in detail. Szegedy et al. [58] then introduced residual connections into the inception structure; through extensive parameter tuning, the training speed and performance were greatly improved.

2.2. ResNet in HSI Classification

To address the issue of model degeneration, Lu et al. [59] introduced the ResNet into HSI classification methods; the identity mapping of residual modules can be utilized to effectively address the issues of gradient disappearance and overfitting while increasing the network depth. Liu et al. [60] adopted the multilevel fusion structure to extract multiscale spatial–spectral features and introduced ResNet to map shallow features into the space of deep features, promoting the model’s classification accuracy. Meng et al. [61] made hybrid use of the dense network and ResNet to extract deeper features to enhance the expressiveness of a CNN. Zhong et al. [62] proposed an end-to-end spatial–spectral residual network to learn the continuous features of HSI data, thereby improving the classification accuracy. Cao et al. [63] used hybrid dilated convolution in a ResNet to extract deep features of a vast receptive field without increasing the computational load. Xu et al. [64] used the anti-noise loss function to improve the model’s robustness and used a single-layer dual-channel residual network to classify HSIs.
Gao et al. [65] combined multiple filters to obtain a multiscale residual network to extract multiscale features from HSIs with different receptive fields while reducing the computational load of the network. Dang et al. [66] combined depth-separable convolution with a ResNet structure to shorten the model training time while ensuring high classification accuracy. Paoletti et al. [67] proposed a deep residual network with a pyramid structure, which consisted of multiple residual modules of pyramid bottlenecks that can extract more abstract spatial–spectral features with an increase in the number of layers. Wang et al. [68] introduced the compression incentive mechanism into ResNet and utilized the attention mechanism to extract strongly discriminative features and inhibit nonessential features to enhance the classification accuracy. Zhang et al. [69] designed a fractal residual network to extract spatial–spectral information, enhancing the model’s classification ability. Xu et al. [70] introduced multiscale feature fusion and dilated convolution into a ResNet, which could fit in better with the cubic structure of HSI data and make effective use of the spatial–spectral joint information.
In this study, we constructed 3D multi-shortcut-link networks (MSLNs) on the basis of a 2D ResNet, analyzed their theoretical support in detail, and aimed to solve the object–spectrum confusion problem and the degradation of deep network models; the effectiveness of the MSLN model was demonstrated through experiments.

3. Methods

3.1. Three-Dimensional CNN

A CNN is a feed-forward neural network with a deep structure that includes convolution computation and is one of the most representative algorithms for deep learning. It usually comprises an input layer, a convolution layer, a pooling layer, a fully connected layer, and an output layer. Combining the properties of local connection and weight sharing, it can be trained end to end, and on 2D images it can utilize a deep network structure to solve complex classification problems with outstanding performance. However, for the 3D data of HSIs, a 2D CNN cannot fully utilize the spatial–spectral information and typically requires dimension-reduction preprocessing; it also introduces a large number of parameters, which dramatically decreases computational efficiency and training speed and makes overfitting more likely for HSIs with few labeled samples.
Modified on the basis of a 2D CNN, a 3D CNN [39,70] is mainly applied in video classification, action recognition, and other fields and can perform simultaneous convolution operations in three directions—height, width, and depth—for HSIs without undergoing dimension reduction processing (Figure 1), which can immediately extract spatial–spectral joint high-order features and make full use of spatial–spectral correlation information.
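As a minimal illustration (not the authors' code), the snippet below applies a single 3D convolution to a hyperspectral patch in PyTorch, the framework used in this study; the 3 × 3 pixel patch over 145 bands and the 16 kernels are illustrative assumptions taken from settings described later.

```python
import torch
import torch.nn as nn

# One 3 x 3 spatial patch across 145 bands, in the (N, C, D, H, W) layout nn.Conv3d expects;
# the singleton channel holds the raw reflectance cube.
x = torch.randn(64, 1, 145, 3, 3)

# A 3 x 3 x 3 kernel slides over depth (spectral), height, and width simultaneously,
# so spatial and spectral features are extracted in a single operation.
conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
y = conv3d(x)
print(y.shape)  # torch.Size([64, 16, 145, 3, 3])
```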

3.2. Residual Networks (ResNets)

With the deepening of network layers, a 3D CNN is prone to gradient dispersion or gradient explosion. Proper use of regularized initialization and intermediate normalization layers can deepen the network, but the training set accuracy will saturate or even decrease. A ResNet [56] alleviates the gradient disappearance problem that occurs in deep neural networks by adding skip connections across hidden layers.
A ResNet is built on the identity-mapping hypothesis: if a network n with K layers is currently optimal, then the extra layers of a deeper network built on top of it should at least be able to learn the identity mapping of the outputs of the Kth layer of n, so the deeper network should not underperform relative to the shallower one. If the input and output dimensions of the network’s nonlinear units are consistent, each unit can be expressed by the general formula:
y = F(x, \{W_i\}) + x    (1)
Here, x and y are the input and output vectors of the layers considered, respectively, and F(·) is the residual function. In Figure 2, there are two layers, i.e., F = W_2 σ(W_1 x), where σ denotes the ReLU and the biases are omitted to simplify the notation. The operation F + x is realized by a shortcut connection and element-wise addition.
The residual learning structure (Figure 2) functions by adding the output(s) from the previous layer(s) and the output computed at the current layer and inputting the result of the summation into the activation function as the output of the current layer, which addresses the degeneration of neural networks satisfactorily. A ResNet converges faster under the precondition of the same number of layers.
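The following minimal sketch expresses the residual unit of Equation (1) with 3D convolutions; it is an illustration under assumed layer widths, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResUnit3D(nn.Module):
    """y = F(x, {W_i}) + x with a two-layer residual function F."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        residual = self.conv2(self.act(self.conv1(x)))  # F = W2 * sigma(W1 * x)
        return self.act(residual + x)                   # shortcut connection + element-wise addition

# Example: the identity shortcut keeps the input and output shapes equal.
out = ResUnit3D(16)(torch.randn(2, 16, 145, 3, 3))
print(out.shape)  # torch.Size([2, 16, 145, 3, 3])
```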

3.3. Activation Function

Since the gradient of the activation function (Figure 3) of the rectified linear unit (ReLU) [71] is always 0 when the input is a negative value, the ReLU neurons will not be activated after parameters are updated, leading to the “death” of some neurons during the training process.
The parametric rectified linear unit (PReLU) [72] is used to address the issue of neuronal death brought about by the ReLU function, where the parameters in it are learned through backpropagation.
The activation function of the self-exponential linear unit (SELU) [73] demonstrates high robustness against noise and enables the mean value of activation of neurons to tend to 0 so that the inputs become fixedly distributed after a certain number of layers.
Therefore, the algorithm in this study used PReLU as the activation function after the shallow multi-shortcut-link convolution operations and SELU as the activation function in the deep residual structure of each block, which makes full use of the hyperspectral 3D cube data.
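A minimal sketch of this hybrid activation scheme is shown below: PReLU after the shallow shortcut-link convolutions and SELU inside the residual blocks. The layer widths are illustrative assumptions.

```python
import torch.nn as nn

# Shallow shortcut-link branch: PReLU's learnable negative slope avoids "dead" neurons.
shallow_branch = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.PReLU(),
)

# Inside a residual block: SELU pushes activations toward zero mean, adding robustness to noise.
block_layer = nn.Sequential(
    nn.Conv3d(16, 16, kernel_size=3, padding=1),
    nn.SELU(),
)
```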

3.4. Loss Function

In deep learning, the softmax function (Equation (2)) is usually used as a classifier, mapping the outputs of multiple neurons into the interval (0, 1). Let x_i denote the ith input feature with label y_i, let f_j denote the jth element (j ∈ [1, N], where N is the number of classes) of the vector of class scores f, and let M be the number of training samples. In Equation (2), f is the activation of a fully connected layer W, i.e., f_{y_i} = W_{y_i}^T x_i, where W_{y_i} is the y_i-th column of W. Taking binary classification as an example, if ‖W_1‖ ‖x‖ cos(θ_1) > ‖W_2‖ ‖x‖ cos(θ_2), then x is correctly assigned to class 1. However, the learning ability of softmax is relatively weak for strongly discriminative features, so this study adopted large-margin softmax (L-softmax) as the loss function to improve the classification accuracy on the HSI datasets.
L = \frac{1}{M} \sum_i L_i = \frac{1}{M} \sum_i -\log\!\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right)    (2)
Large-softmax [74] is a margin-based softmax loss function that requires ‖W_1‖ ‖x‖ cos(θ_1) ≥ ‖W_1‖ ‖x‖ cos(mθ_1) ≥ ‖W_2‖ ‖x‖ cos(θ_2). By adding a positive integer m that adjusts the required margin, a decision-margin constraint is imposed that endows the learned features with intraclass compactness and interclass separation and effectively helps avoid overfitting.
The L-softmax loss function can be defined by the following expression:
L_i = -\log\!\left( \frac{e^{\|W_{y_i}\| \, \|x_i\| \, \varphi(\theta_{y_i})}}{e^{\|W_{y_i}\| \, \|x_i\| \, \varphi(\theta_{y_i})} + \sum_{j \neq y_i} e^{\|W_j\| \, \|x_i\| \cos(\theta_j)}} \right)    (3)
where φ(θ) can be expressed as:
\varphi(\theta) = \begin{cases} \cos(m\theta), & 0 \le \theta \le \frac{\pi}{m} \\ D(\theta), & \frac{\pi}{m} < \theta \le \pi \end{cases}    (4)
Experiments demonstrated that the features acquired with L-softmax are more discriminative [74,75] and achieve better results than those obtained with softmax in both classification and verification tasks.
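As a small numeric sketch of the margin function in Equation (4): the original L-softmax paper [74] only requires D(θ) to decrease monotonically on [π/m, π]; the specific choice below, φ(θ) = (−1)^k cos(mθ) − 2k on [kπ/m, (k+1)π/m], follows [74] and is an assumption here rather than the authors' exact implementation.

```python
import math

def phi(theta: float, m: int = 4) -> float:
    """Piecewise margin function: cos(m*theta) on [0, pi/m], then a monotone continuation D(theta)."""
    k = min(int(theta * m / math.pi), m - 1)      # angular segment containing theta
    return ((-1) ** k) * math.cos(m * theta) - 2 * k

# phi(theta) <= cos(theta), so the target-class logit is shrunk and the network must
# learn a larger angular margin between classes to keep the loss low.
for t in (0.0, math.pi / 8, math.pi / 4, math.pi / 2):
    print(f"theta={t:.3f}  cos={math.cos(t):+.3f}  phi={phi(t):+.3f}")
```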

3.5. Multi-Shortcut-Link Networks (MSLNs)

3.5.1. Analysis of a Multi-Shortcut Link

In a ResNet, the authors develop an architecture that stacks building blocks with the same shortcut-connection pattern, called “residual units (ResUs).” The original ResU can be computed using the following formulas:
y_l = h(x_l) + F(x_l, W_l)    (5)
x_{l+1} = f(y_l)    (6)
Here, x_l is the input of the lth ResU, W_l = {W_{l,k} | 1 ≤ k ≤ K} is the set of weights and biases of the lth ResU, and K is the number of layers in the ResU. F denotes the residual function, e.g., a stack of two 3 × 3 convolutional layers in a ResNet; this study expanded the convolutional kernels to 3 × 3 × 3. Function f is the operation applied after the element-wise addition, i.e., the ReLU activation function, and function h is set as the identity mapping: h(x_l) = x_l.
This study mainly focused on creating a multi-shortcut-link path for propagating tensor information, not only within a ResU but also through the entire network model. As mentioned above, we denote s(x_0) as the shortcut link of the original ResU, which gives y_0 = h(x_0) + F(x_0, W_0). If f is also used as an identity mapping, then x_{l+1} ≡ y_l; substituting s(x_0) and Equation (6) into Equation (5) and adding a multi-shortcut link S gives
x_{l+1} = x_l + F(x_l, W_l) + x_0 + F(x_0, W_0)    (7)
After recursion, we obtained
x_{l+2} = x_{l+1} + F(x_{l+1}, W_{l+1}) = x_l + F(x_l, W_l) + F(x_{l+1}, W_{l+1}) + x_0 + F(x_0, W_0)    (8)
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i) + L\,(x_0 + F(x_0, W_0))    (9)
Equation (9) indicates that for any deeper unit L and shallower unit l, the feature x_L of unit L can be represented as the feature x_l of unit l plus residual terms, so that in an MSLN the mapping between any units L and l behaves as a residual function.
Denoting a loss function as ξ, according to the chain rule of backpropagation, we obtained
\frac{\partial \xi}{\partial x_l} = \frac{\partial \xi}{\partial x_L} \frac{\partial x_L}{\partial x_l} = \frac{\partial \xi}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)    (10)
Equation (10) shows that the gradient ∂ξ/∂x_l can be decomposed into two additive terms: ∂ξ/∂x_L, which propagates information directly without passing through any weight layers, and ∂ξ/∂x_L (∂/∂x_l Σ_{i=l}^{L−1} F), which propagates through the weight layers. The additive term ∂ξ/∂x_L guarantees that information is propagated back to any shallower unit l directly. Moreover, because the term ∂/∂x_l Σ_{i=l}^{L−1} F cannot always be −1, the gradient ∂ξ/∂x_l is unlikely to be canceled out for a mini-batch. This indicates that even when the weights are extremely small, the gradient of a layer does not vanish.
This derivation reveals that if we add a shortcut link before a residual block and both h(x_l) and f(y_l) are identity mappings, the feature map signal can be propagated both forward and backward. It indicates that fusing shallow and deep features via multi-shortcut-link networks can yield strongly discriminative features, which was also shown in the experiments in Section 4.2.
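The sketch below illustrates the extra shortcut described in this subsection: besides its own identity shortcut x_l, a unit also receives s(x_0) = x_0 + F(x_0, W_0) carried forward from a much shallower layer, as in Equation (7). Channel counts are assumptions for illustration; the splicing-based variant used in the final architecture is described in Section 3.5.2.

```python
import torch
import torch.nn as nn

class MultiShortcutUnit(nn.Module):
    """x_{l+1} = x_l + F(x_l, W_l) + s(x_0), with s(x_0) supplied by a shallow layer."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(                               # residual function F(x, W)
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.SELU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x_l, shortcut_0):
        return x_l + self.f(x_l) + shortcut_0                 # local shortcut plus multi-shortcut link

x_l = torch.randn(2, 16, 145, 3, 3)
s_0 = torch.randn(2, 16, 145, 3, 3)                           # x_0 + F(x_0, W_0) from a shallow unit
print(MultiShortcutUnit(16)(x_l, s_0).shape)                  # torch.Size([2, 16, 145, 3, 3])
```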

3.5.2. Structure of an MSLN

Based on the network structure of ResNet 18 (Table 1), this study added a convolution layer preceding each of the 2nd, 6th, 10th, and 14th layers; its output is spliced in depth with the output of the previous layer and serves as the input of the next convolution layer. Meanwhile, the 3D convolution operation on the original HSI cube block (H × W × C) involves the 1st, 2nd, 7th, 12th, and 17th layers of the MSLN, which implies that the input shape of each of these five layers is (batch size, input data, channels of HSI, kernel size, stride); the numbers of convolution kernels were set according to Table 1, namely 16, 16, 16, 32, and 64, respectively (Figure 4).
To make the MSLN more convenient and minimize conflicts when splicing channels, the numbers of convolution kernels in the blocks were set to 16, 32, 64, and 128, respectively. Compared with the numbers of channels in the ResNet (64, 128, 256, 512), the total parameter size was substantially decreased from 130.20 ± 0.65 MB in the ResNet to 10.87 ± 0.27 MB, and the convergence speed was greatly improved.
As shown in Figure 5, the outputs of conv i_1 (i = 1, 2, 3, 4) were shallow features, and these feature maps had a higher resolution, which could retain more feature information and better describe the overall characteristics of the data. As the depth of the network increased, the deep features became more and more abstract. Fusing shallow features and deep features via multi-shortcut-link networks could reduce the loss of shallow features and the correlation decay of gradients, boost the use ratio of features, and enhance the network’s expressiveness.
Therefore, splicing the shallow feature conv i_1 (i = 1, 2, 3, 4) in depth with the output of each residual block (conv j_1 (j = 1, 2, 3, 4)) to implement multi-shortcut-link fusion of features across different network layers could better alleviate gradient dispersion (or explosion) and even network degeneration.
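A minimal sketch of this depth splicing is shown below: the shallow feature map conv i_1 is concatenated along the channel dimension with the output of the corresponding residual block before being passed on. The shapes are illustrative assumptions.

```python
import torch

shallow = torch.randn(64, 16, 145, 3, 3)     # output of conv i_1 (shallow, vast receptive field)
deep = torch.randn(64, 16, 145, 3, 3)        # output of residual block i (deep, abstract)
fused = torch.cat([shallow, deep], dim=1)    # depth splice along the channel axis
print(fused.shape)                           # torch.Size([64, 32, 145, 3, 3])
```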
Figure 5 displays the overall process of the HSI classification framework of the MSLN. As shown in the figure, the multi-shortcut-link structure is bridged to the four residual blocks (Figure 6) a total of four times (Figure 7). The last output layer of the third residual block is spliced with conv 4_1 as the input tensor of the first layer of the fourth residual block; after the fourth residual block, global self-adaptive average pooling downsampling is applied and the output tensor is flattened into a one-dimensional vector by the fully connected layer, which maps the learned distributed features into the space of sample labels; finally, the large-softmax loss function is used for classification.
In this study, all the convolution kernels adopted a uniform size of 3 × 3 × 3, which could both reduce the computational load and enlarge the receptive field of the convolution operation [56].
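The classification head described above can be sketched as follows: global self-adaptive average pooling, dropout, and a fully connected layer feeding the (large-)softmax classifier. The 128-channel input and 16 terrain classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

features = torch.randn(64, 128, 145, 3, 3)       # output of the last residual block
pooled = nn.AdaptiveAvgPool3d(1)(features)       # global self-adaptive average pooling -> (64, 128, 1, 1, 1)
flat = torch.flatten(pooled, 1)                  # -> (64, 128)
flat = nn.Dropout(p=0.5)(flat)                   # discard 50% of neurons at random during training
logits = nn.Linear(128, 16)(flat)                # one score per terrain class, fed to the (L-)softmax loss
print(logits.shape)                              # torch.Size([64, 16])
```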
Figure 8 shows the visualization of the MSLN structure and training process using the Botswana dataset, which covers 145 wavebands. The gray elements in the graph indicate that the node is a backward operation, and the light blue elements indicate that the node is an input/output tensor.
At the top of Figure 8, there is a tensor whose shape is (64, 1, 145, 3, 3), which means the batch size processed in the model was 64 and the input 3D tensor of the MSLN model was (145, 3, 3). There are five sets of bias and weight matrices with a light blue background in the second row, corresponding to the first convolutional layer and the four shortcut-link convolutional layers; the rectangles are named, in order, conv1, add 1.0, add 2.0, add 3.0, and add 4.0. There are four residual blocks in the MSLN structure, referred to as layerx (x = 1, 2, 3, 4), and each contains four convolution layers named layerx.0.conv1, layerx.0.conv2, layerx.1.conv1, and layerx.1.conv2; the downsampling operation is mainly used to manage the inconsistent number of convolution-kernel channels. At the bottom of the figure, there is a green element, which is the final output of the HSI classification results.
Deep features are abstract and associated with a small receptive field. When the shallow features of a vast receptive field are mapped into the space of abstract features of a small receptive field, the number of parameters grows with the number of layers, which leads to an increased computational load as well as a large loss of shallow representational information. The multi-shortcut-link network structure proposed in this study, combined with the ResNet, can compensate very well for the information loss of the deep network’s shallow features and better address the difficulty of learning deep abstract features.

4. Datasets, Results, and Analysis

The MSLN proposed in this study was based on the Python language and PyTorch deep learning framework, with the test environment being a Windows 10 OS with 32 GB RAM, an Intel i7-8700 CPU, and an NVIDIA Quadro P1000 4 GB GPU.

4.1. Hyperspectral Test Datasets

To validate the robustness and generalization of the proposed algorithm, six groups of open-source datasets collected by M. Graña et al. [76] were used to learn all types of labeled terrain objects without any manual screening, and the ratio of training to validation to test sets was 0.09:0.01:0.9 (see Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 for details).
(1) Indian Pines (IP) Dataset
The IP dataset was acquired using the AVIRIS sensor over a test site in Indiana in 1996, with an image size of 145 × 145 pixels, a spectral range of 0.4–2.45 μm, and a spatial resolution of 20 m. After the wavebands affected by noise and severe water vapor absorption were eliminated, 200 effective wavebands remained for classification, and a total of 16 crop classes were labeled. This dataset was captured in June, when some crops, such as corn and soybean, were in the early growth stage; with coverage rates of less than 5%, these areas were prone to mixed pixels, significantly increasing the difficulty of vegetation classification.
(2) Salinas (S) Dataset
The S dataset was shot in Salinas Valley, California, using AVIRIS sensors with an image resolution of 512 × 217 pixels, a spectral range of 0.43–0.86 μm, and a spatial resolution of 3.7 m. There remained 204 effective wavebands for classification after the wavebands affected by noises and suffering severe water vapor absorption were eliminated, and a total of 16 crops were labeled, covering vegetables, bare soils, vineyards, etc.
(3) Pavia Centre (PC) and Pavia University (PU) Datasets
The PC and PU datasets stem from two scenes captured using the ROSIS sensor during a flight over Pavia in northern Italy, with 102 and 103 wavebands remaining, respectively, after the noise-affected wavebands and information-free regions were eliminated. The image sizes are 1096 × 715 pixels and 610 × 340 pixels, respectively, with a spatial resolution of 1.3 m. Both images have nine classes of labeled terrain objects, though the categories are not fully congruent.
(4) Kennedy Space Center (KSC) Dataset
The KSC dataset was acquired using the AVIRIS sensor over the Kennedy Space Center, Florida, on March 23, 1996, with an image size of 512 × 614 pixels, a spectral range of 0.4–2.5 μm, and a spatial resolution of 18 m. After the wavebands affected by water vapor absorption and noise were eliminated, 176 wavebands remained for analysis, and a total of 13 classes of terrain objects were labeled. The low spatial resolution, together with the similarity of the spectral signatures of some vegetation types, considerably increases the difficulty of terrain classification.
(5) Botswana (B) Dataset
The B dataset was acquired using the Hyperion sensor over the Okavango Delta in Botswana, with an image size of 1476 × 256 pixels, a spectral range of 0.4–2.5 μm, and a spatial resolution of 30 m, covering 242 wavebands in total. The UT Center for Space Research eliminated the uncalibrated and noise-affected wavebands covering water absorption features, leaving 145 wavebands for classification and 14 labeled classes, which include the seasonal and sporadic swamps, dry forests, and other land cover types in the delta.
The IP dataset and S dataset both contain 6 major categories and 16 sub-categories, so the discrimination between classes must be improved to raise the classification accuracy. However, the IP dataset has a lower resolution; if the training data were selected strictly according to the above ratio, four terrain object classes would have no training data (alfalfa, grass-pasture-mowed, oats, and stone-steel-towers). The same situation occurs in the KSC dataset (hardwood swamp) and the B dataset (hippo grass and exposed soils), mainly because of too few labeled samples. To reduce the validation and test errors, one sample was randomly selected from the training samples of each such class for validation.
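A minimal sketch (not the authors' code) of the per-class 0.09:0.01:0.9 split described above is given below, including the fallback of reusing one training sample for validation when a class is too small; the `labels` argument is an assumed 1D array of ground-truth class ids.

```python
import numpy as np

def split_indices(labels, train_ratio=0.09, val_ratio=0.01, seed=0):
    """Stratified per-class split into training, validation, and test index arrays."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        n_train = int(round(train_ratio * len(idx)))
        n_val = int(round(val_ratio * len(idx)))
        train_idx = idx[:n_train]
        val_idx = idx[n_train:n_train + n_val]
        if len(val_idx) == 0 and len(train_idx) > 0:
            val_idx = train_idx[:1]          # reuse one training sample for validation
        train.extend(train_idx)
        val.extend(val_idx)
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)

# Example usage: tr, va, te = split_indices(ground_truth.ravel()) for a labeled HSI scene.
```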
The PC dataset and PU dataset have fewer categories, with a higher resolution and richer labeled samples, but exhibit the issue of “distinct objects with the same spectrum.” There are 7 swamp types among the 13 classes of terrain objects in the KSC dataset, which leads to the issue of “the same object with distinct spectra” and considerably increases the difficulty of terrain classification.
In this study, the MSLN structure built on the ResNet, together with appropriate hyperparameter settings, could resolve the above problems and improve the classification accuracy on the six groups of datasets.

4.2. Results and Analysis

To validate the effectiveness and classification performance of the MSLN 22 network structure, an ordinary 3D CNN [37] was selected as the baseline network for comparison with an RNN [32], a multiscale 3D CNN (MC 3D CNN) [44], a 3D CNN residual (3D CNN Res) [41], and a 3D ResNet. The hyperparameters were set as follows: the batch size was set to 64, which not only mitigated gradient oscillation but also made better use of the GPU; the initial learning rate was set to 0.01 and was dropped to 0.001 after the loss function stabilized; the maximum number of training epochs was set to 600; cross-entropy was selected as the loss function for the comparison algorithms, whereas L-softmax was adopted as the loss function for the network proposed in this study; and dropout of 0.5 was applied after the global self-adaptive average pooling and before the fully connected layer, discarding 50% of the neurons at random to address overfitting and enhance the model’s generalization.
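A minimal training-loop sketch with these settings is shown below. The batch size, learning rates, epoch limit, and dropout follow the stated configuration; the SGD optimizer, the ReduceLROnPlateau schedule used to drop the learning rate once the loss stabilizes, and the stand-in linear model are assumptions for illustration (the comparison algorithms use cross-entropy, while the MSLN itself uses L-softmax).

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 16)                        # stand-in for the MSLN (illustrative only)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Drop the learning rate from 0.01 to 0.001 once the monitored loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=20)
criterion = nn.CrossEntropyLoss()                 # cross-entropy, as used by the comparison algorithms

for epoch in range(600):                          # maximum of 600 epochs
    x = torch.randn(64, 128)                      # one dummy batch of 64 samples
    y = torch.randint(0, 16, (64,))
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())                   # monitor the loss to trigger the lr drop
```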
Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13 present the classification results for the six datasets. The multi-shortcut-link structure of the MSLN proposed in this study could extract spatial–spectral joint features and fuse shallow and deep features, and its evaluation criteria (kappa coefficient, average accuracy (AA), and overall accuracy (OA)) were all the highest among the networks tested. Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 each compare the HSI classification results of all network structures.
Table 8, Table 9 and Table 10 and Figure 9, Figure 10 and Figure 11 indicate that the MSLN, which had 22 convolution layers, could extract rich deep features; its multi-shortcut-link structure also fused the shallow features of a vast receptive field and mitigated the effect of noise via the global self-adaptive average pooling layer, achieving a significant improvement and good robustness in the classification results, with the average classification accuracy across all terrain objects reaching 98.1% (IP dataset), 99.4% (S dataset), and 99.3% (B dataset), including the terrain objects that share similar attributes (untrained grapes, untrained vineyard, and vineyard vertical trellis) and those with relatively few samples in the IP dataset (alfalfa, grass-pasture-mowed, oats, and stone-steel-towers) and the B dataset (hippo grass and exposed soils).
The PC and PU datasets each have nine disjoint classes of terrain objects with excellent connectivity and simple attributes, while connectivity is slightly inferior for the terrain objects asphalt, shadows, gravel, and bare soil, which are exposed to more noise; this highlights the issue of “distinct objects with the same spectrum” and even results in classification accuracies as low as 67.8% for some algorithms. Through the double constraints of shallow and deep features, the algorithm produced in this study improved the classification results so that the classification accuracies for the above terrain objects were 97.9%, 99.9%, 98.3%, and 99.8%, respectively, and the object–spectrum confounding issue was addressed effectively.
The KSC dataset has 7 swamp types among its 13 classes of terrain objects, which considerably increases the difficulty of terrain classification; none of the comparison algorithms could effectively distinguish the discriminative features of the different swamp types during learning, mainly because the “same object with distinct spectra” circumstance is very pronounced for swamps, and thus the validation accuracies of all comparison algorithms shown in Figure 15e were highly unstable. According to the classification results, the fused features extracted via the multi-shortcut-link networks of the algorithm produced in this study could express the abstract attributes of “the same object with distinct spectra” very well: the classification accuracy was at least 96% for the swamp-type terrain objects, and the classification results for salt marsh were all correct. However, the classification accuracies remained relatively low for cabbage palm and oak hammock, as well as oak and broadleaf hammock, which have extremely similar spectral signatures; these two pairs of terrain objects have quite similar shallow features and inferior connectivity, in addition to more noise in the datasets, such that their classification accuracies were only 88.6% and 89.4%, respectively.
To better indicate the parameter size of each network structure, the upper limit of the vertical axis in Figure 16 was set to 25 MB, mainly because of the great number of channels in the 3D ResNet and its large parameter size (130.20 ± 0.65 MB) when the six datasets were learned, which also coincided with the longest training time. It can be seen from Figure 16 that the parameter size of the MSLN network structure was 10.87 ± 0.27 MB, while Figure 17 indicates that the learning time of the algorithm proposed in this study did not increase with the parameter size. This fully illustrates that the multi-shortcut-link network structure proposed in this study not only improved the overall classification accuracy but also shortened the model’s training and learning times while diminishing overfitting. Therefore, the algorithm proposed in this study required a shorter training time to achieve the highest accuracy compared with the other models.

5. Conclusions

In view of the characteristics of hyperspectral images, such as few labeled samples, excess noise, and homogeneity with heterostructures, this study built a multi-shortcut-link network structure to extract the 3D spatial–spectral information of HSIs based on the properties of a 3D CNN and the shortcut link characteristics in a ResNet and tested six groups of HSI datasets by making full use of the shallow representational features and deep abstract features of HSIs. The results showed the following: (i) The MSLN could directly input the cube data of the HSIs and could then effectively extract the spatial–spectral information. The hybrid use of activation functions ensured the integrity of the nonlinear features of input data, which not only improved the use ratio of neurons but also increased the model’s rate of convergence. (ii) The multi-shortcut-link network’s structure fused shallow features and deep features, which reduced the gradient loss of deep features, solved the degeneration of the deep network satisfactorily, and enhanced the network’s generalization ability. The L-softmax loss function endowed the learned features with stronger discriminatory power, effectively addressing the issue of “the same object with distinct spectra, and distinct objects with the same spectrum” and achieving more significant classification results. Therefore, the MSLN proposed in this study could effectively improve the overall classification result.
Although the multi-shortcut-link network structure proposed in this study demonstrated clear superiority in performance and classification accuracy, the issues of information interaction and weight allocation between different channels have not yet been addressed. In future work, the attention mechanism will be introduced as the network is deepened, and the relevance between space and channels will be utilized to enhance the discriminatory power of the features of terrain objects with inferior classification accuracy, thereby achieving higher classification accuracy.

Author Contributions

Conceptualization, M.S.; methodology, H.Z.; software, H.Z.; validation, Y.C., G.G. and X.G.; formal analysis, Y.J.; investigation, J.M.; resources, Y.C.; data curation, Y.C.; writing—original draft preparation, H.Z.; writing—review and editing, Y.J.; visualization, H.Z.; supervision, M.S.; project administration, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

List of Acronyms

HSI          Hyperspectral image
SVM          Support vector machine
ELM          Extreme learning machine
ANN          Artificial neural network
SAE          Stack autoencoding
DBN          Deep belief network
RNN          Recurrent neural network
CNN          Convolutional neural network
ResNet       Residual network
ResU         Residual unit
MSLN         Multi-shortcut-link network
ReLU         Rectified linear unit
SELU         Self-exponential linear unit
PReLU        Parametric rectified linear unit
MC 3D CNN    Multiscale 3D CNN
3D CNN Res   3D CNN residual
AA           Average accuracy
OA           Overall accuracy

References

  1. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep Learning for Hyperspectral Image Classification: An Overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709. [Google Scholar] [CrossRef] [Green Version]
  2. Sun, L.; Wu, F.; He, C.; Zhan, T.; Liu, W.; Zhang, D. Weighted Collaborative Sparse and L1/2 Low-Rank Regularizations with Superpixel Segmentation for Hyperspectral Unmixing. IEEE Geosci. Remote Sens. Lett. 2020, 19, 1–5. [Google Scholar]
  3. Maes, W.H.; Steppe, K. Perspectives for remote sensing with unmanned aerial vehicles in precision agriculture. Trends Plant Sci. 2019, 24, 152–164. [Google Scholar] [CrossRef]
  4. Sun, H.; Zheng, X.; Lu, X.; Wu, S. Spectral-Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3232–3245. [Google Scholar] [CrossRef]
  5. Shimoni, M.; Haelterman, R.; Perneel, C. Hyperspectral Imaging for Military and Security Applications: Combining Myriad Processing and Sensing Techniques. IEEE Geosci. Remote Sens. Mag. 2019, 7, 101–117. [Google Scholar] [CrossRef]
  6. Yokoya, N.; Chan, J.C.; Segl, K. Potential of Resolution-Enhanced Hyperspectral Data for Mineral Mapping Using Simulated EnMAP and Sentinel-2 Images. Remote Sens. 2016, 8, 172. [Google Scholar] [CrossRef] [Green Version]
  7. Wu, H.; Lin, A.; Clarke, K.C.; Shi, W.; Cardenas-Tristan, A.; Tu, Z. A comprehensive quality assessment framework for linear features from Volunteered Geographic Information. Int. J. Geogr. Inf. Sci. 2021, 35, 1826–1847. [Google Scholar] [CrossRef]
  8. Lin, A.; Sun, X.; Wu, H.; Luo, W.; Wang, D.; Zhong, D.; Wang, Z.; Zhao, L.; Zhu, J. Identifying urban building function by integrating remote sensing imagery and POI data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8864–8875. [Google Scholar] [CrossRef]
  9. Chehreghan, A.; Ali Abbaspour, R. An evaluation of data completeness of VGI through geometric similarity assessment. Int. J. Image Data Fusion 2018, 9, 319–337. [Google Scholar] [CrossRef]
  10. Barrington-Leigh, C.; Millard-Ball, A. The world’s user-generated road map is more than 80% complete. PLoS ONE 2017, 12, e0180698. [Google Scholar]
  11. Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362. [Google Scholar] [CrossRef]
  12. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [Google Scholar] [CrossRef] [Green Version]
  13. Huang, W.; Huang, Y.; Wang, H.; Liu, Y.; Shim, H.J. Local binary patterns and superpixel-based multiple kernels for hy-perspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4550–4563. [Google Scholar] [CrossRef]
  14. Ye, Q.; Zhao, H.; Li, Z.; Yang, X.; Gao, S.; Yin, T.; Ye, N. L1-Norm Distance Minimization-Based Fast Robust Twin Support Vector kk-Plane Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4494–4503. [Google Scholar] [CrossRef]
  15. Ghamisi, P.; Benediktsson, J.A.; Ulfarsson, M.O. Spectral-spatial hyperspectral image classification via multiscale adaptive sparse representation. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2565–2574. [Google Scholar] [CrossRef] [Green Version]
  16. Liu, B.; Yu, X.; Zhang, P.; Tan, X.; Yu, A.; Xue, Z. A semi-supervised convolutional neural network for hyperspectral image classification. Remote Sens. Lett. 2017, 8, 839–848. [Google Scholar] [CrossRef]
  17. Qin, A.; Shang, Z.; Tian, J.; Wang, Y.; Zhang, T.; Tang, Y.Y. Spectral-spatial Graph Convolutional Networks for Semisupervised Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 241–245. [Google Scholar] [CrossRef]
  18. Huang, G.B.; Ding, X.; Zhou, H. Optimization method based extreme learning machine for classification. Neurocomputing 2010, 74, 155–163. [Google Scholar] [CrossRef]
  19. Hernández-Espinosa, C.; Fernández-Redondo, M.; Torres-Sospedra, J. Some experiments with ensembles of neural networks for classification of hyperspectral images. In Proceedings of the International Symposium on Neural Networks, Dalian, China, 19–21 August 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 912–917. [Google Scholar]
  20. Yu, C.; Xue, B.; Song, M.; Wang, Y.; Li, S.; Chang, C.I. Iterative Target-Constrained Interference-Minimized Classifier for Hyperspectral Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1095–1117. [Google Scholar] [CrossRef]
  21. Luo, F.; Du, B.; Zhang, L.; Zhang, L.; Tao, D. Feature Learning Using Spatial-Spectral Hypergraph Discriminant Analysis for Hyperspectral Image. IEEE Trans. Cybern. 2019, 49, 2406–2419. [Google Scholar] [CrossRef]
  22. Yin, B.; Cui, B. Multi-feature extraction method based on Gaussian pyramid and weighted voting for hyperspectral image classification. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021; IEEE: New York, NY, USA, 2021; pp. 645–648. [Google Scholar]
  23. Huang, W.; Li, G.; Chen, Q.; Ju, M.; Qu, J. CF2PN: A Cross-Scale Feature Fusion Pyramid Network Based Remote Sensing Target Detection. Remote Sens. 2021, 13, 847. [Google Scholar] [CrossRef]
  24. Zhang, X.; Lu, W.; Li, F.; Peng, X.; Zhang, R. Deep Feature Fusion Model for Sentence Semantic Matching. Comput. Mater. Contin. 2019, 61, 601–616. [Google Scholar] [CrossRef]
  25. Wu, H.; Liu, Q.; Liu, X. A Review on Deep Learning Approaches to Image Classification and Object Segmentation. Comput. Mater. Contin. 2019, 60, 575–597. [Google Scholar] [CrossRef] [Green Version]
  26. Xu, Y.; Du, B.; Zhang, L. Beyond the Patchwise Classification: Spectral-spatial Fully Convolutional Networks for Hyperspectral Image Classification. IEEE Trans. Big Data 2020, 6, 492–506. [Google Scholar] [CrossRef]
  27. Jiang, Y.; Li, Y.; Zou, S.; Zhang, H.; Bai, Y. Hyperspectral Image Classification with Spatial Consistence Using Fully Convolutional Spatial Propagation Network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10425–10437. [Google Scholar] [CrossRef]
  28. Xu, Q.; Xiao, Y.; Wang, D.; Luo, B. CSA-MSO3DCNN: Multiscale Octave 3D CNN with Channel and Spatial Attention for Hyperspectral Image Classification. Remote Sens. 2020, 12, 188. [Google Scholar] [CrossRef] [Green Version]
  29. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep Learning-Based Classification of Hyperspectral Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 7, 2094–2107. [Google Scholar]
  30. Suk, H.-I.; Lee, S.-W.; Shen, D. Latent feature representation with stacked auto-encoder for AD/MCI diagnosis. Brain Struct. Funct. 2015, 220, 841–859. [Google Scholar] [CrossRef]
  31. Chen, Y.; Zhao, X.; Jia, X. Spectral-Spatial Classification of Hyperspectral Data Based on Deep Belief Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392. [Google Scholar] [CrossRef]
  32. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep Recurrent Neural Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655. [Google Scholar] [CrossRef] [Green Version]
  33. Seydgar, M.; Alizadeh Naeini, A.; Zhang, M.; Li, W.; Satari, M. 3-D convolution-recurrent networks for spectral-spatial clas-sification of hyperspectral images. Remote Sens. 2019, 11, 883. [Google Scholar] [CrossRef] [Green Version]
  34. He, C.; Sun, L.; Huang, W.; Zhang, J.; Zheng, Y.; Jeon, B. TSLRLN: Tensor subspace low-rank learning with non-local prior for hyperspectral image mixed denoising. Signal Process. 2021, 184, 108060. [Google Scholar] [CrossRef]
  35. Sharma, V.; Diba, A.; Tuytelaars, T.; Van Gool, L. Hyperspectral CNN for Image Classification & Band Selection, with Application to Face Recognition; Technical Report: KUL/ESAT/PSI/1604; KU Leuven, ESAT: Leuven, Belgium, 2016. [Google Scholar]
  36. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 258619. [Google Scholar] [CrossRef] [Green Version]
  37. Hamida, A.B.; Benoit, A.; Lambert, P.; Amar, C.B. 3-D Deep Learning Approach for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4420–4434. [Google Scholar] [CrossRef] [Green Version]
  38. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef] [Green Version]
  39. Ying, L.; Haokui, Z.; Qiang, S. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67. [Google Scholar]
  40. Luo, Y.; Zou, J.; Yao, C.; Zhao, X.; Li, T.; Bai, G. HSI-CNN: A Novel Convolution Neural Network for Hyperspectral Image. In Proceedings of the 2018 International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, China, 16–17 July 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
  41. Lee, H.; Kwon, H. Contextual Deep CNN Based Hyperspectral Classification. In Proceedings of the Geoscience & Remote Sensing Symposium, Beijing, China, 10–15 July 2016; IEEE: New York, NY, USA, 2016; pp. 3322–3325. [Google Scholar]
  42. He, M.; Li, B.; Chen, H. Multi-scale 3D deep convolutional neural network for hyperspectral image classification. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; IEEE: New York, NY, USA, 2017; pp. 3904–3908. [Google Scholar]
  43. Sun, L.; He, C.; Zheng, Y.; Tang, S. SLRL4D: Joint restoration of subspace low-rank learning and non-local 4-D transform filtering for hyperspectral image. Remote Sens. 2020, 12, 2979. [Google Scholar] [CrossRef]
  44. Gao, H.; Chen, Z.; Li, C. Sandwich Convolutional Neural Network for Hyperspectral Image Classification Using Spectral Feature Enhancement. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3006–3015. [Google Scholar] [CrossRef]
  45. Bishop, C.M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1995. [Google Scholar]
  46. Ripley, B.D. Pattern Recognition and Neural Networks; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
  47. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S-PLUS; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  48. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, R.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 17–20 June 2015; pp. 1–9. [Google Scholar]
  49. Lee, C.Y.; Xie, S.; Gallagher, P.; Zhang, Z.; Tu, Z. Deeply-supervised nets. In Proceedings of the Artificial Intelligence and Statistics, San Diego, California, USA, 9–12 May 2015; PMLR: New York, NY, USA, 2015; pp. 562–570. [Google Scholar]
  50. Raiko, T.; Valpola, H.; LeCun, Y. Deep learning made easier by linear transformations in perceptrons. In Proceedings of the Artificial intelligence and statistics, La Palma, Canary Islands, 21–23 April 2012; PMLR: New York, NY, USA, 2012; pp. 924–932. [Google Scholar]
  51. Schraudolph, N. Accelerated Gradient Descent by Factor-Centering Decomposition; Technical report/IDSIA; IDSIA: Lugano, Switzerland, 1998; p. 98. [Google Scholar]
  52. Schraudolph, N.N. Centering neural network gradient factors. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 207–226. [Google Scholar]
  53. Vatanen, T.; Raiko, T.; Valpola, H.; LeCun, Y. Pushing stochastic gradient towards second-order methods–backpropagation learning with transformations in nonlinearities. In Proceedings of the International Conference on Neural Information Processing, Daegu, Korea, 3–7 November 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 442–449. [Google Scholar]
  54. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: New York, NY, USA, 2015; pp. 448–456. [Google Scholar]
  55. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  56. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  57. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
  58. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the thirty-first AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4278–4284. [Google Scholar]
  59. Lu, Y.S.; Li, Y.X.; Liu, B. Hyperspectral Data Haze Monitoring Based on Deep Residual Network. Acta Opt. Sin. 2017, 37, 1128001. [Google Scholar]
  60. Liu, D.; Han, G.; Liu, P.; Yang, H.; Sun, X.; Li, Q.; Wu, J. A Novel 2D-3D CNN with Spectral-Spatial Multi-Scale Feature Fusion for Hyperspectral Image Classification. Remote Sens. 2021, 13, 4621. [Google Scholar] [CrossRef]
  61. Meng, Z.; Jiao, L.; Liang, M.; Zhao, F. Hyperspectral Image Classification with Mixed Link Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2494–2507. [Google Scholar] [CrossRef]
  62. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
  63. Cao, F.; Guo, W. Deep hybrid dilated residual networks for hyperspectral image classification. Neurocomputing 2020, 384, 170–181. [Google Scholar] [CrossRef]
  64. Xu, Y.; Li, Z.; Li, W.; Du, Q.; Liu, C.; Fang, Z.; Zhai, L. Dual-channel residual network for hyperspectral image classification with noisy labels. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–11. [Google Scholar] [CrossRef]
  65. Gao, H.; Yang, Y.; Li, C.; Gao, L.; Zhang, B. Multiscale residual network with mixed depthwise convolution for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3396–3408. [Google Scholar] [CrossRef]
  66. Dang, L.; Pang, P.; Lee, J. Depth-Wise Separable Convolution Neural Network with Residual Connection for Hyperspectral Image Classification. Remote Sens. 2020, 12, 3408. [Google Scholar] [CrossRef]
  67. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.J.; Pla, F. Deep pyramidal residual networks for spectral–spatial hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 740–754. [Google Scholar] [CrossRef]
  68. Wang, L.; Peng, J.; Sun, W. Spatial–spectral squeeze-and-excitation residual network for hyperspectral image classification. Remote Sens. 2019, 11, 884. [Google Scholar] [CrossRef] [Green Version]
  69. Zhang, X.; Wang, Y.; Zhang, N.; Xu, D.; Luo, H.; Chen, B.; Ben, G. Spectral-spatial Fractal Residual Convolutional Neural Network with Data Balance Augmentation for Hyperspectral Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10473–10487. [Google Scholar] [CrossRef]
  70. Xu, H.; Yao, W.; Cheng, L.; Li, B. Multiple Spectral Resolution 3D Convolutional Neural Network for Hyperspectral Image Classification. Remote Sens. 2021, 13, 1248. [Google Scholar] [CrossRef]
  71. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
  72. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
  73. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-normalizing neural networks. In Proceedings of the 31st international conference on neural information processing systems, Long Beach, CA, USA, 4–9 December 2017; pp. 972–981. [Google Scholar]
  74. Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-Margin Softmax Loss for Convolutional Neural Networks. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 507–516. [Google Scholar]
  75. Zhang, Y.Z.; Xu, M.M.; Wang, X.H.; Wang, K.Q. Hyperspectral image classification based on hierarchical fusion of residual networks. Spectrosc. Spectr. Anal. 2019, 39, 3501–3507. [Google Scholar]
  76. Graña, M.; Veganzons, M.A.; Ayerdi, B. Hyperspectral Remote Sensing Scenes. Available online: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 6 October 2021).
Figure 1. The theory of a 3D CNN.
Figure 2. The structure of residual learning.
Figure 3. Activation function.
Figure 4. Parameter settings of the multi-shortcut-link convolution kernel.
Figure 5. The flowchart of the MSLN 22 HSI classification framework.
Figure 6. The structure of Blocki (i = 1, 2, 3, 4).
Figure 7. The structure of the MSLN.
Figure 8. Network structure/training process visualization.
Figure 9. Classification maps and overall accuracy from using different methods on the Indian Pines dataset.
Figure 10. Classification maps and overall accuracy from using different methods on the Salinas dataset.
Figure 11. Classification maps and overall accuracy from using different methods on the Botswana dataset.
Figure 11. Classification maps and overall accuracy from using different methods on the Botswana dataset.
Remotesensing 14 01230 g011
Figure 12. Classification maps and overall accuracy from using different methods on the Pavia Centre dataset.
Figure 12. Classification maps and overall accuracy from using different methods on the Pavia Centre dataset.
Remotesensing 14 01230 g012
Figure 13. Classification maps and overall accuracy from using different methods on the Pavia University dataset.
Figure 13. Classification maps and overall accuracy from using different methods on the Pavia University dataset.
Remotesensing 14 01230 g013aRemotesensing 14 01230 g013b
Figure 14. Classification maps and overall accuracy from using different methods on the KSC dataset.
Figure 14. Classification maps and overall accuracy from using different methods on the KSC dataset.
Remotesensing 14 01230 g014
Figure 15. (af) Comparison of epoch validation accuracy and overall accuracy for different datasets.
Figure 15. (af) Comparison of epoch validation accuracy and overall accuracy for different datasets.
Remotesensing 14 01230 g015aRemotesensing 14 01230 g015bRemotesensing 14 01230 g015c
Figure 16. Memory usage of different datasets by different methods.
Figure 16. Memory usage of different datasets by different methods.
Remotesensing 14 01230 g016
Figure 17. Time used on different datasets by different methods.
Figure 17. Time used on different datasets by different methods.
Remotesensing 14 01230 g017
Table 1. Convolutional layers of the 22-layer MSLN and the 18-layer ResNet (kernel size, number of kernels, stride).

18-layer ResNet:
conv1 | 7 × 7, 64, stride 2
Block1 | [3 × 3, 64, stride 1; 3 × 3, 64, stride 1] × 2
Block2 | [3 × 3, 128, stride 1; 3 × 3, 128, stride 1] × 2
Block3 | [3 × 3, 256, stride 1; 3 × 3, 256, stride 1] × 2
Block4 | [3 × 3, 512, stride 1; 3 × 3, 512, stride 1] × 2
Output | average pool, 1000-d fc, L-softmax

22-layer MSLN:
conv1 | 3 × 3 × 3, 16, stride 1
conv1_1 | 3 × 3 × 3, 16, stride 1
Block1 | [3 × 3 × 3, 16, stride 1; 3 × 3 × 3, 16, stride 1] × 2
conv2_1 | 3 × 3 × 3, 16, stride 1
Block2 | [3 × 3 × 3, 32, stride 1; 3 × 3 × 3, 32, stride 1] × 2
conv3_1 | 3 × 3 × 3, 32, stride 1
Block3 | [3 × 3 × 3, 64, stride 1; 3 × 3 × 3, 64, stride 1] × 2
conv4_1 | 3 × 3 × 3, 64, stride 1
Block4 | [3 × 3 × 3, 128, stride 1; 3 × 3 × 3, 128, stride 1] × 2
Output | average pool, 128-d fc, L-softmax
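For concreteness, the following is a minimal PyTorch-style sketch of the 22-layer MSLN column of Table 1. Only the layer names, kernel sizes, kernel counts, and strides are taken from the table; the shortcut wiring inside each block, the 1 × 1 × 1 channel projections, the batch normalization, and the plain ReLU standing in for the paper's hybrid activation are illustrative assumptions, not the authors' implementation.

# Sketch of the 22-layer MSLN column in Table 1 (assumed details marked below).
import torch
import torch.nn as nn


def conv3d_bn(in_ch, out_ch):
    """3 x 3 x 3 convolution, stride 1, as listed in Table 1 (BN + ReLU assumed)."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),  # stand-in for the paper's hybrid activation
    )


class Block(nn.Module):
    """Two units of two 3 x 3 x 3 convolutions each, with assumed identity shortcuts."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unit1 = nn.Sequential(conv3d_bn(in_ch, out_ch), conv3d_bn(out_ch, out_ch))
        self.unit2 = nn.Sequential(conv3d_bn(out_ch, out_ch), conv3d_bn(out_ch, out_ch))
        self.proj = nn.Conv3d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        x = self.unit1(x) + self.proj(x)   # shortcut link 1
        return self.unit2(x) + x           # shortcut link 2


class MSLN22(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.stem = nn.Sequential(conv3d_bn(1, 16), conv3d_bn(16, 16))  # conv1, conv1_1
        self.block1 = Block(16, 16)
        self.conv2_1 = conv3d_bn(16, 16)
        self.block2 = Block(16, 32)
        self.conv3_1 = conv3d_bn(32, 32)
        self.block3 = Block(32, 64)
        self.conv4_1 = conv3d_bn(64, 64)
        self.block4 = Block(64, 128)
        self.pool = nn.AdaptiveAvgPool3d(1)   # global self-adaptive average pooling
        self.fc = nn.Linear(128, n_classes)   # 128-d fc; L-softmax loss applied outside

    def forward(self, x):                     # x: (N, 1, bands, H, W) spectral cube
        x = self.stem(x)
        x = self.block1(x)
        x = self.block2(self.conv2_1(x))
        x = self.block3(self.conv3_1(x))
        x = self.block4(self.conv4_1(x))
        return self.fc(self.pool(x).flatten(1))

Counting the two stem convolutions, the four blocks of four convolutions each, the three transition convolutions, and the final fully connected layer gives the 22 weighted layers implied by the table.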
Table 2. The information of samples in the Indian Pines dataset.

Sample No. | Class | Train | Validation | Test
1 | Alfalfa | 4 | 1 | 41
2 | Corn-notill | 129 | 14 | 1285
3 | Corn-mintill | 75 | 8 | 747
4 | Corn | 22 | 2 | 213
5 | Grass-pasture | 43 | 5 | 435
6 | Grass-trees | 43 | 7 | 680
7 | Grass-pasture-mowed | 3 | 1 | 24
8 | Hay-windrowed | 43 | 5 | 430
9 | Oats | 3 | 1 | 16
10 | Soybean-notill | 87 | 10 | 875
11 | Soybean-mintill | 220 | 25 | 2210
12 | Soybean-clean | 53 | 6 | 534
13 | Wheat | 18 | 2 | 185
14 | Woods | 113 | 13 | 1139
15 | Buildings-grass-trees-drives | 35 | 4 | 347
16 | Stone-steel-towers | 8 | 1 | 84
Total | | 899 | 105 | 9245
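The counts in Tables 2–7 correspond to roughly 9% of the labeled pixels per class for training, 1% for validation, and the remainder for testing. The sketch below shows one hypothetical way to draw such a per-class random split; the ratios are read from the tables, while the split_indices routine and its NumPy-based sampling are illustrative assumptions rather than the authors' sampling procedure.

# Illustrative per-class train/validation/test split (assumed ~9%/1%/90%).
import numpy as np

def split_indices(labels, train_frac=0.09, val_frac=0.01, seed=0):
    """labels: 1-D array of class ids for the labeled pixels."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = max(1, round(train_frac * idx.size))
        n_val = max(1, round(val_frac * idx.size))
        train.extend(idx[:n_train])                  # ~9% per class
        val.extend(idx[n_train:n_train + n_val])     # ~1% per class
        test.extend(idx[n_train + n_val:])           # remaining pixels
    return np.array(train), np.array(val), np.array(test)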
Table 3. The information of samples in the Salinas dataset.

Sample No. | Class | Train | Validation | Test
1 | Brocoli_green_weeds_1 | 181 | 20 | 1808
2 | Brocoli_green_weeds_2 | 355 | 37 | 3334
3 | Fallow | 177 | 20 | 1779
4 | Fallow_rough_plow | 125 | 14 | 1255
5 | Fallow_smooth | 241 | 27 | 2410
6 | Stubble | 357 | 39 | 3563
7 | Celery | 322 | 36 | 3221
8 | Grapes_untrained | 1014 | 113 | 10,144
9 | Soil_vinyard_develop | 558 | 62 | 5583
10 | Corn_senesced_green_weeds | 292 | 33 | 2953
11 | Lettuce_romaine_4wk | 96 | 11 | 961
12 | Lettuce_romaine_5wk | 171 | 19 | 1737
13 | Lettuce_romaine_6wk | 81 | 9 | 826
14 | Lettuce_romaine_7wk | 96 | 11 | 963
15 | Vineyard_untrained | 646 | 71 | 6551
16 | Vineyard_vertical_trellis | 157 | 17 | 1633
Total | | 4869 | 539 | 48,721
Table 4. The information of samples in the Pavia Centre dataset.

Sample No. | Class | Train | Validation | Test
1 | Water | 5925 | 657 | 59,251
2 | Trees | 684 | 76 | 6838
3 | Asphalt | 277 | 31 | 2736
4 | Self-blocking bricks | 241 | 27 | 2417
5 | Bitumen | 592 | 66 | 5924
6 | Tiles | 833 | 92 | 8317
7 | Shadows | 656 | 73 | 6558
8 | Meadows | 3842 | 425 | 38,415
9 | Bare soil | 257 | 29 | 2577
Total | | 13,307 | 1476 | 133,033
Table 5. The information of samples in the Pavia University dataset.

Sample No. | Class | Train | Validation | Test
1 | Asphalt | 593 | 65 | 5973
2 | Meadows | 1674 | 186 | 16,789
3 | Gravel | 188 | 21 | 1890
4 | Trees | 275 | 31 | 2758
5 | Painted metal sheets | 121 | 13 | 1211
6 | Bare soil | 453 | 50 | 4526
7 | Bitumen | 120 | 13 | 1197
8 | Self-blocking bricks | 331 | 37 | 3314
9 | Shadows | 85 | 10 | 852
Total | | 3840 | 426 | 38,510
Table 6. The information of samples in the Kennedy Space Center dataset.

Sample No. | Class | Train | Validation | Test
1 | Scrub | 68 | 8 | 685
2 | Willow swamp | 22 | 2 | 219
3 | Cabbage palm hammock | 23 | 3 | 230
4 | Cabbage palm/oak hammock | 22 | 3 | 227
5 | Slash pine | 14 | 2 | 145
6 | Oak/broadleaf hammock | 21 | 2 | 206
7 | Hardwood swamp | 10 | 1 | 94
8 | Graminoid marsh | 39 | 4 | 388
9 | Spartina marsh | 47 | 5 | 468
10 | Cattail marsh | 36 | 4 | 364
11 | Salt marsh | 38 | 4 | 377
12 | Mud flats | 45 | 5 | 453
13 | Water | 83 | 10 | 834
Total | | 468 | 53 | 4690
Table 7. The information of samples in the Botswana dataset.

Sample No. | Class | Train | Validation | Test
1 | Water | 24 | 3 | 243
2 | Hippo grass | 9 | 1 | 91
3 | Floodplain grasses 1 | 23 | 2 | 226
4 | Floodplain grasses 2 | 19 | 2 | 194
5 | Reeds | 24 | 3 | 242
6 | Riparian | 24 | 3 | 242
7 | Firescar | 23 | 3 | 233
8 | Island interior | 18 | 2 | 183
9 | Acacia woodlands | 28 | 3 | 283
10 | Acacia shrublands | 23 | 2 | 233
11 | Acacia grasslands | 27 | 3 | 275
12 | Short mopane | 16 | 2 | 163
13 | Mixed mopane | 24 | 3 | 241
14 | Exposed soils | 9 | 1 | 85
Total | | 291 | 33 | 2934
Table 8. The classification results from using different methods on the Indian Pines dataset.

Class | 3D CNN | RNN | MC 3D CNN | 3D CNN Res | 3D ResNet | MSLN
1 | 0.656 | 0.436 | 0.543 | 0.000 | 0.086 | 0.978
2 | 0.563 | 0.646 | 0.815 | 0.454 | 0.491 | 0.982
3 | 0.626 | 0.424 | 0.591 | 0.555 | 0.313 | 0.961
4 | 0.543 | 0.362 | 0.825 | 0.416 | 0.269 | 0.947
5 | 0.710 | 0.821 | 0.868 | 0.361 | 0.664 | 0.996
6 | 0.916 | 0.894 | 0.969 | 0.857 | 0.878 | 0.995
7 | 0.864 | 0.323 | 0.700 | 0.343 | 0.077 | 1.000
8 | 0.907 | 0.939 | 0.968 | 0.939 | 0.912 | 0.996
9 | 0.000 | 0.538 | 0.944 | 0.000 | 0.329 | 1.000
10 | 0.586 | 0.566 | 0.762 | 0.640 | 0.068 | 0.979
11 | 0.433 | 0.670 | 0.808 | 0.709 | 0.175 | 0.986
12 | 0.535 | 0.599 | 0.731 | 0.339 | 0.500 | 0.981
13 | 0.952 | 0.956 | 1.000 | 0.921 | 0.930 | 1.000
14 | 0.924 | 0.928 | 0.952 | 0.843 | 0.847 | 0.987
15 | 0.582 | 0.575 | 0.665 | 0.532 | 0.429 | 0.912
16 | 0.794 | 0.852 | 0.903 | 0.839 | 0.880 | 0.989
Kappa | 0.601 | 0.655 | 0.789 | 0.610 | 0.449 | 0.974
AA | 0.662 | 0.658 | 0.815 | 0.547 | 0.491 | 0.981
OA (%) | 64.466 | 69.908 | 81.615 | 66.016 | 50.331 | 97.698
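In Tables 8–13, the class rows report per-class accuracy, OA is the fraction of all test pixels classified correctly (given as a percentage), AA is the mean of the per-class accuracies, and Kappa measures agreement beyond chance. The sketch below shows one standard way to derive these summary values from a confusion matrix; it is a generic computation, not the authors' evaluation script.

# Generic OA/AA/Kappa computation from a confusion matrix (assumed convention).
import numpy as np

def summarize(conf):
    """conf[i, j]: number of class-i test samples predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    per_class = np.diag(conf) / conf.sum(axis=1)   # per-class accuracy (class rows)
    oa = np.trace(conf) / n                        # overall accuracy, in [0, 1]
    aa = per_class.mean()                          # average accuracy (AA row)
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n ** 2
    kappa = (oa - pe) / (1 - pe)                   # Cohen's kappa (Kappa row)
    return per_class, oa * 100, aa, kappa          # OA returned as a percentage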
Table 9. The classification results from using different methods on the Salinas dataset.

Class | 3D CNN | RNN | MC 3D CNN | 3D CNN Res | 3D ResNet | MSLN
1 | 0.983 | 0.997 | 0.976 | 0.995 | 0.991 | 0.993
2 | 1.000 | 0.998 | 0.999 | 0.995 | 1.000 | 1.000
3 | 0.997 | 0.983 | 0.988 | 0.985 | 0.995 | 1.000
4 | 0.997 | 0.988 | 0.992 | 0.996 | 0.996 | 0.999
5 | 0.996 | 0.981 | 0.996 | 0.989 | 0.994 | 1.000
6 | 0.996 | 0.999 | 0.992 | 0.999 | 1.000 | 1.000
7 | 0.993 | 0.998 | 0.993 | 0.996 | 0.998 | 1.000
8 | 0.920 | 0.778 | 0.905 | 0.878 | 0.911 | 0.985
9 | 0.997 | 0.987 | 0.996 | 0.997 | 0.998 | 1.000
10 | 0.977 | 0.963 | 0.955 | 0.955 | 0.973 | 0.995
11 | 0.972 | 0.975 | 0.957 | 0.946 | 0.978 | 0.998
12 | 0.985 | 0.985 | 0.978 | 0.991 | 0.987 | 0.997
13 | 0.987 | 0.976 | 0.988 | 0.977 | 0.992 | 0.998
14 | 0.971 | 0.948 | 0.977 | 0.965 | 0.987 | 0.994
15 | 0.875 | 0.239 | 0.826 | 0.798 | 0.869 | 0.975
16 | 0.940 | 0.994 | 0.913 | 0.986 | 0.964 | 0.972
Kappa | 0.947 | 0.858 | 0.929 | 0.933 | 0.950 | 0.987
AA | 0.974 | 0.924 | 0.964 | 0.966 | 0.977 | 0.994
OA (%) | 95.225 | 87.401 | 93.643 | 94.012 | 95.503 | 98.851
Table 10. The classification results from using different methods on the Botswana dataset.

Class | 3D CNN | RNN | MC 3D CNN | 3D CNN Res | 3D ResNet | MSLN
1 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
2 | 0.989 | 0.978 | 0.984 | 0.906 | 0.984 | 1.000
3 | 0.998 | 0.971 | 0.980 | 0.915 | 0.736 | 1.000
4 | 0.956 | 0.904 | 0.950 | 0.754 | 0.831 | 1.000
5 | 0.852 | 0.836 | 0.876 | 0.719 | 0.786 | 0.985
6 | 0.808 | 0.760 | 0.795 | 0.610 | 0.844 | 0.974
7 | 0.994 | 0.991 | 0.996 | 0.987 | 0.969 | 1.000
8 | 0.986 | 0.958 | 0.984 | 0.893 | 0.894 | 1.000
9 | 0.896 | 0.760 | 0.898 | 0.765 | 0.906 | 0.987
10 | 0.796 | 0.942 | 0.984 | 0.866 | 0.332 | 0.972
11 | 0.862 | 0.955 | 0.995 | 0.965 | 0.975 | 0.980
12 | 0.905 | 1.000 | 0.991 | 0.981 | 0.832 | 1.000
13 | 0.897 | 0.998 | 0.994 | 0.919 | 0.673 | 1.000
14 | 0.988 | 0.951 | 0.944 | 0.982 | 0.944 | 1.000
Kappa | 0.907 | 0.914 | 0.948 | 0.855 | 0.822 | 0.991
AA | 0.923 | 0.929 | 0.955 | 0.876 | 0.836 | 0.993
OA (%) | 91.382 | 92.100 | 95.212 | 86.662 | 83.550 | 99.138
Table 11. The classification results from using different methods on the Pavia Centre dataset.

Class | 3D CNN | RNN | MC 3D CNN | 3D CNN Res | 3D ResNet | MSLN
1 | 0.999 | 0.996 | 0.994 | 0.999 | 0.997 | 0.998
2 | 0.961 | 0.985 | 0.972 | 0.980 | 0.990 | 0.996
3 | 0.893 | 0.950 | 0.911 | 0.941 | 0.961 | 0.979
4 | 0.834 | 0.980 | 0.965 | 0.950 | 0.977 | 0.999
5 | 0.949 | 0.993 | 0.991 | 0.985 | 0.990 | 0.999
6 | 0.961 | 0.988 | 0.985 | 0.982 | 0.988 | 0.997
7 | 0.945 | 0.988 | 0.979 | 0.978 | 0.988 | 0.999
8 | 0.996 | 0.996 | 0.994 | 0.999 | 0.998 | 0.998
9 | 0.990 | 0.998 | 0.998 | 1.000 | 1.000 | 1.000
Kappa | 0.976 | 0.985 | 0.977 | 0.989 | 0.990 | 0.993
AA | 0.948 | 0.986 | 0.977 | 0.979 | 0.988 | 0.996
OA (%) | 98.285 | 98.944 | 98.376 | 99.250 | 99.259 | 99.540
Table 12. The classification results from using different methods on the Pavia University dataset.

Class | 3D CNN | RNN | MC 3D CNN | 3D CNN Res | 3D ResNet | MSLN
1 | 0.900 | 0.974 | 0.972 | 0.969 | 0.983 | 0.992
2 | 0.952 | 0.968 | 0.956 | 0.975 | 0.976 | 0.982
3 | 0.678 | 0.940 | 0.948 | 0.938 | 0.943 | 0.983
4 | 0.938 | 0.983 | 0.981 | 0.936 | 0.981 | 0.994
5 | 0.999 | 0.995 | 0.999 | 1.000 | 0.999 | 0.999
6 | 0.880 | 0.991 | 0.987 | 0.962 | 0.985 | 0.998
7 | 0.728 | 0.961 | 0.964 | 0.911 | 0.964 | 0.992
8 | 0.771 | 0.961 | 0.978 | 0.948 | 0.964 | 0.991
9 | 0.994 | 0.995 | 0.995 | 0.995 | 0.999 | 1.000
Kappa | 0.865 | 0.945 | 0.933 | 0.949 | 0.958 | 0.973
AA | 0.871 | 0.974 | 0.976 | 0.959 | 0.977 | 0.992
OA (%) | 89.839 | 95.784 | 94.852 | 96.151 | 96.803 | 97.961
Table 13. The classification results from using different methods on the KSC dataset.

Class | 3D CNN | RNN | MC 3D CNN | 3D CNN Res | 3D ResNet | MSLN
1 | 0.065 | 0.536 | 0.935 | 0.917 | 0.894 | 0.988
2 | 0.000 | 0.000 | 0.885 | 0.636 | 0.822 | 0.980
3 | 0.526 | 0.000 | 0.779 | 0.539 | 0.947 | 0.976
4 | 0.000 | 0.009 | 0.504 | 0.277 | 0.562 | 0.886
5 | 0.000 | 0.000 | 0.671 | 0.400 | 0.460 | 0.928
6 | 0.000 | 0.000 | 0.694 | 0.567 | 0.503 | 0.894
7 | 0.000 | 0.000 | 0.842 | 0.000 | 0.662 | 0.960
8 | 0.000 | 0.224 | 0.868 | 0.833 | 0.889 | 0.979
9 | 0.000 | 0.000 | 0.935 | 0.922 | 0.966 | 0.998
10 | 0.275 | 0.187 | 0.853 | 0.785 | 0.848 | 0.988
11 | 0.000 | 0.582 | 0.971 | 0.901 | 0.924 | 1.000
12 | 0.588 | 0.361 | 0.829 | 0.740 | 0.770 | 0.978
13 | 0.409 | 0.743 | 0.965 | 0.931 | 0.954 | 0.999
Kappa | 0.200 | 0.360 | 0.852 | 0.759 | 0.827 | 0.974
AA | 0.143 | 0.203 | 0.825 | 0.650 | 0.785 | 0.966
OA (%) | 31.748 | 44.542 | 86.652 | 78.401 | 84.478 | 97.698
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
