Article

Residual Group Channel and Space Attention Network for Hyperspectral Image Classification

College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(12), 2035; https://doi.org/10.3390/rs12122035
Submission received: 24 May 2020 / Revised: 13 June 2020 / Accepted: 23 June 2020 / Published: 24 June 2020
(This article belongs to the Special Issue Feature-Based Methods for Remote Sensing Image Classification)

Abstract

Recently, deep learning methods based on three-dimensional (3-D) convolution have been widely used in hyperspectral image (HSI) classification tasks and have shown good classification performance. However, affected by the irregular distribution of the various classes in HSI datasets, most previous 3-D convolutional neural network (CNN)-based models require many training samples to obtain good classification accuracies. In addition, as the network deepens, the spatial resolution of the feature maps gradually decreases, so much useful information may be lost during the training process. Therefore, ensuring efficient network training is key to HSI classification tasks. To address these issues, in this paper we propose a 3-D-CNN-based residual group channel and space attention network (RGCSA) for HSI classification. Firstly, the proposed bottom-up top-down attention structure with the residual connection improves training efficiency by optimizing channel-wise and spatial-wise features throughout the whole training process. Secondly, the proposed residual group channel-wise attention module reduces the possibility of losing useful information, and the novel spatial-wise attention module extracts context information to strengthen the spatial features. Furthermore, our proposed RGCSA network needs only a few training samples to achieve higher classification accuracies than previous 3-D-CNN-based networks. The experimental results on three commonly used HSI datasets demonstrate the superiority of our network based on the attention mechanism and the effectiveness of the proposed channel-wise and spatial-wise attention modules for HSI classification. The code and configurations are released at Github.com.

Graphical Abstract

1. Introduction

With the rapid development of remote sensing hyperspectral imaging technology, hyperspectral images have been studied and applied in a growing number of practical applications, including ocean research [1], vegetation analysis [2], road detection [3], geological disaster detection [4], and environmental analysis [5]. A hyperspectral image (HSI) contains abundant spectral and spatial information, which makes supervised HSI classification a hot research topic in the remote sensing analysis field. However, owing to the diversity of ground materials and the Hughes phenomenon caused by the increasing number of spectral bands [6], how to make full use of and extract the most discriminative features from the spectral and spatial dimensions is a crucial issue in HSI classification.
In the past, traditional machine learning (ML)-based HSI classification methods mainly contained two steps, i.e., feature engineering and classifier training [7]. These methods usually focus on feature selection and classifier design, which require substantial manual design tailored to specific HSI data. For example, [8] divided the spectral bands into several sets with a clustering method and selected useful bands for the classification task. Manifold ranking was introduced to eliminate the drawbacks of traditional salient band selection methods [9]. In [10], the Markov random field (MRF) was used in combination with band selection. However, inappropriate dimensionality reduction in the spectral domain caused by manual design may lead to the loss of much useful spectral information. Therefore, while adopting SVM as the final classifier as in [11], many papers also suggested exploring input data with more spatial information during feature engineering to improve the classification performance [12,13]. For instance, [14] developed a region kernel to measure region-to-region distance similarity and extract combined spectral-spatial features. A common problem is that traditional ML-based methods usually cannot fully express the features of ground materials because suitable feature extraction methods are difficult to design by hand; therefore, they usually cannot achieve a high classification performance.
In recent years, deep learning (DL) has shown a powerful ability to extract hierarchical and nonlinear features, and DL methods based on the convolutional neural network (CNN) have been widely used in HSI classification tasks. So far, many CNN-based works have demonstrated that the end-to-end approach can reduce the possibility of information loss during data preprocessing and improve the classification accuracy by learning deep features. For example, a unified framework combining CNN with a stacked autoencoder (SAE) was proposed to adaptively learn weight features for each pixel by one-dimensional (1-D) convolutional layers [15,16]. In [17], SAE was also used to capture representative stacked features. However, the input data of these 1-D-CNN-based methods must be flattened into 1-D vectors, which means that they cannot make full use of the spatial contextual relationship between pixels in the raw HSI data.
To solve the above problems, two-dimensional convolutional neural networks (2-D-CNNs) were introduced in many papers to extract spectral and spatial features at the same time. For example, [18] proposed a multiscale covariance maps (MCMs)-based feature extraction method and combined it with a 2-D-CNN model to integrate the spectral and spatial information. However, the method required specific hand-crafted feature extraction for different HSI datasets, and it is difficult to design precise and complete artificial features. Then, a contextual deep CNN was introduced in [19] to explore local contextual interactions; it used 2-D-CNN to extract spectral and spatial information separately. However, when the network is deep, the rapid increase of network parameters makes these 2-D-CNN-based models difficult to train, which may cause a degradation problem and finally lead to low classification accuracies. Therefore, residual connections were used to alleviate this phenomenon. The authors of [20] adopted residual learning to optimize several convolutional layers as the identity mapping, and constructed a very deep network to extract spectral and spatial features. Since HSIs have both spectral and spatial information, [21] proposed an end-to-end spectral-spatial residual network (SSRN), which consists of consecutive spectral and spatial residual blocks, to learn spectral and spatial features, respectively. In addition, to obtain a lower training cost and parameter scale, inspired by the densely connected convolutional network [22], an end-to-end spectral-spatial dual-channel dense network (SSDC-DenseNet) was proposed to reduce the model scale and explore high-level features [23]. Owing to the densely connected structure, each layer accepts the feature maps from all previous layers as its additional input. Though these 2-D-CNN-based HSI classification methods can utilize spatial context information, they separate the spectral-spatial joint features into two independent learning parts. Since an HSI is a 3-D data cube, these methods neglect the close correlation between the spectral and spatial information.
Therefore, some three-dimensional convolutional neural network (3-D-CNN) models were proposed to learn spectral-spatial joint features directly from raw HSI data. With the help of residual connections, [24] constructed a three-dimensional residual network (3-D-ResNet) to improve the classification performance. The experimental results demonstrated that 3-D-ResNet could mitigate the declining-accuracy effect and achieve promising classification performance with few training samples. The authors of [25] further studied 3-D CNNs to extract combined spectral-spatial features by using input cubes of HSIs with a smaller spatial size. Based on 3-D CNN and the densely connected convolutional network [23], [26] proposed the three-dimensional densely connected convolutional network (3-D-DenseNet) for HSI classification; the network can become very deep and extract more representative spectral-spatial features. To further reduce the training time, [27] proposed an end-to-end fast dense spectral-spatial convolution (FDSSC) framework by using a dynamic learning rate and parametric rectified linear units. To reduce the number of parameters and solve the imbalance of classes, [28] used 3-D-ResNeXt and the label smoothing strategy to simultaneously extract spectral and spatial features, and achieved an obvious improvement in classification performance. To extract spectral-spatial features at different scales, [29] designed a multiscale octave 3-D CNN with channel and spatial attention (CSA-MSO3-DCNN), in which 3-D convolution kernels of different sizes capture diverse features of HSI data. These 3-D-CNN-based methods can indeed make full use of the original characteristics of raw HSI data and the correlation between spectral and spatial information. Furthermore, the graph convolutional network [30] was applied in [31] to alleviate the shortage of labeled samples. Spatial information was added into the approximate convolutional operation on the graph signal, so the features obtained by the graph convolutional network made full use of both the adjacent nodes in the graph and the neighboring pixels in the spatial domain. However, the features produced by convolutional layers may contain much useless or disturbing information. If these useless features are sent directly to the next layer without any processing, the learning efficiency of the network will decrease as the network goes deeper, which finally affects the classification performance. Therefore, how to process the feature maps after convolutional layers and pay more attention to the features carrying a large amount of useful information is another key issue for HSI classification tasks.
Recently, many classical and effective computer vision techniques have been embedded in CNNs to improve the performance of DL models. Among them, CNN models fused with the attention mechanism deliver promising improvements in HSI classification performance. The goal of the attention mechanism is to focus on salient features or regions with a large amount of information. Through a series of weight coefficients, a CNN model with the attention mechanism can improve the quality of the feature maps produced by convolutional layers. For example, to extract more discriminative spectral and spatial features, [32] combined FDSSC [27] and the convolutional block attention module (CBAM) [33], and proposed a double-branch multi-attention mechanism network (DBMA) for HSI classification, which consists of two parallel branches using channel-wise and spatial-wise attention separately. The experimental results demonstrated the effectiveness of channel-wise and spatial-wise attention. However, the parallel branching method did not take the correlation between spectral and spatial information into consideration, so DBMA did not obtain a satisfactory classification accuracy. To introduce global spatial information and overcome the locality of the convolution operation, the self-attention mechanism [34] was introduced to construct the non-local neural network. In [35], this attention module was attached to the spectral-spatial attention network (SSAN); however, focusing only on spatial information, it cannot globally optimize the feature maps processed by convolutional layers. The authors of [36] adopted the squeeze-and-excitation network (SENet) [37] to adaptively recalibrate channel feature responses by explicitly modelling interdependencies between channels. In addition, [38] constructed a spatial-spectral squeeze-and-excitation (SSSE) module based on SENet; however, the channel features processed by the SSSE module contain much redundant information, which may affect the classification performance. To generate high-quality samples containing a complex spatial-spectral distribution, [39] proposed a symmetric convolutional GAN based on collaborative learning and the attention mechanism (CA-GAN), whose joint spatial-spectral attention module emphasizes more discriminative features and suppresses less useful ones. However, although these attention-based methods can optimize features, they all have limitations: for one thing, the network can only be optimized in a specific spectral or spatial dimension; for another, as the network deepens, the channel attention module may lose much useful information because the optimization is applied to the channels as a whole.
Our motivation was to construct a 3-D-CNN-based network with an efficient attention module to solve the problems mentioned above. Inspired by the principle of SENet [37] and the bottom-up top-down structure that has been applied to image segmentation [40], we proposed a 3-D-CNN-based residual group channel and space attention network (RGCSA) for HSI classification (the code and configurations are released at https://github.com/Lemon362/RGCSA-master). The framework consists of several building blocks with the same topology. Each block contains a convolutional layer for learning features, a residual group channel-wise attention module, and a residual spatial-wise attention module. Following the bottom-up top-down principle, we unified the channel attention module and the space attention module into the same structure, but their up-sampling methods differ. To optimize features in the channel dimension to the greatest extent and reduce the loss of useful information, we introduced the principle of grouping into the channel-wise attention module to realize the group channel attention mechanism. Compared with 3-D-ResNeXt [28], we need fewer training samples to achieve a better classification accuracy by using the attention mechanism. Compared with DBMA [32] and SSAN [35], our network fully optimizes the feature maps processed by each convolutional layer in both the channel and spatial dimensions, and shows an effective improvement for HSI classification.
In short, the three major contributions of this paper are listed as follows:
  • Combining the bottom-up top-down attention structure with the residual connection, we constructed residual channel and space attention modules without any additional manual design, and proposed a 3-D-CNN-based residual group channel and space attention network (RGCSA) for HSI classification. On the one hand, the residual connection accelerates the flow of information and makes the network easier to train. On the other hand, the structure of channel-wise attention first and spatial-wise attention second strengthens important information and weakens unimportant information during the training process; compared with previous methods, RGCSA can achieve a higher HSI classification accuracy with fewer training samples.
  • We applied the principle of group convolution to the channel attention structure to construct a residual group channel attention module, which aims to emphasize each piece of useful channel information. Compared with the previous channel attention methods, it can reduce the possibility of losing useful channel information during attention optimization.
  • We proposed a novel spatial-wise attention module, which utilizes transposed convolution as the up-sampling method. It preserves the mapping relationship of spatial pixels during the attention optimization process, and makes full use of context information to optimize the features in the spatial dimension and focus on the most informative areas.
The remaining sections of this paper are organized as follows. Section 2 illustrates the related work about our proposed network for HSI classifications. Section 3 presents a detailed network configuration of the overall framework and individual modules. Then, Section 4 illustrates experimental datasets and the parameter setting, and then shows the experimental results and analyses. Finally, in Section 5, we summarize some conclusions and introduce future work.

2. Related Work

In this section, we first introduce the proposed end-to-end pixel-level HSI classification framework and some background knowledge, including ResNeXt and SENet, and then introduce the architecture of our proposed residual group channel and space attention mechanism in detail.

2.1. Pixel-Level HSI Classification Framework

The proposed end-to-end pixel-level HSI classification architecture explores spectral-spatial combined information by 3-D-CNN, which makes full use of the close correlation between these two dimensions. To fully utilize the whole spatial information, we adopted zero padding in the four directions of the spatial dimension of the raw HSI data. Therefore, the raw HSI data cube is converted into X ∈ R^(w × w × b), where w represents the spatial width and height, and b represents the number of spectral bands. In this framework, all available labeled data are divided into three groups for each HSI dataset: a training dataset, a validation dataset, and a testing dataset. Firstly, the network is fed small batches of data to train the model for hundreds of epochs. During the training process, the label smoothing strategy is used in the cross-entropy loss function to alleviate the problem of class imbalance. At the same time, the validation dataset monitors the whole training process by computing the classification accuracy every few epochs; in this way, the network can keep the best model with the highest accuracy. Finally, the testing dataset is adopted to evaluate the classification performance of the proposed network.
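As a rough illustration of the data preparation described above, the following sketch (not taken from the released code) extracts zero-padded patches around every labeled pixel and randomly splits the sample indices; the helper names, the 16 × 16 patch size, and the 3:1:6 ratio are our own illustrative assumptions.

import numpy as np

def extract_patches(hsi_cube, labels, patch_size=16):
    # hsi_cube: (H, W, B) array, labels: (H, W) array with 0 = unlabeled.
    # Pad the cube spatially with zeros and cut one patch around every labeled pixel.
    half = patch_size // 2
    padded = np.pad(hsi_cube, ((half, half), (half, half), (0, 0)), mode='constant')
    rows, cols = np.nonzero(labels)
    patches = np.stack([padded[r:r + patch_size, c:c + patch_size, :]
                        for r, c in zip(rows, cols)])
    return patches, labels[rows, cols] - 1  # class indices re-numbered from 0

def split_indices(n_samples, ratios=(0.3, 0.1, 0.6), seed=0):
    # Randomly split sample indices into training, validation, and test sets.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]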

2.2. Three-Dimensional ResNeXt Network

In the HSI classification task, in order to mitigate the decreasing-accuracy phenomenon and reduce the huge number of parameters caused by 3-D-CNN, 3-D-ResNeXt was first proposed in [28] and achieved high classification accuracy.
As shown in Figure 1, with the growing number of hyperparameters (width, filter sizes, strides, etc.), 3-D-CNN leads to a dramatic increase in computational cost, especially when there are multiple layers. Therefore, with the split-transform-merge strategy, [41] split the CNN layer of ResNet into several groups, so that each feature learning process is performed in a low-dimensional embedding, and the output feature maps are aggregated by summation in a high-dimensional embedding. From Figure 1, we can find that ResNeXt with cardinality = 8 has roughly the same complexity as ResNet. This operation makes it possible to greatly reduce the number of parameters when building a deep network from 3-D convolutional layers. Therefore, 3-D-ResNeXt is a suitable choice for 3-D-CNN-based HSI classification networks.
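To make the split-transform-merge idea concrete, here is a minimal 3-D sketch in Keras; the cardinality, group width, and filter counts are illustrative assumptions, and the grouping is written as separate branches rather than a single grouped convolution.

import tensorflow as tf
from tensorflow.keras import layers

def resnext_3d_unit(x, cardinality=8, group_width=4, out_filters=64):
    # Split-transform-merge: each branch learns in a low-dimensional embedding,
    # branch outputs are merged and added back to the identity path.
    branches = []
    for _ in range(cardinality):
        b = layers.Conv3D(group_width, 1, padding='same', activation='relu')(x)   # split
        b = layers.Conv3D(group_width, 3, padding='same', activation='relu')(b)   # transform
        branches.append(b)
    merged = layers.Concatenate()(branches)                                       # merge
    merged = layers.Conv3D(out_filters, 1, padding='same')(merged)
    if x.shape[-1] != out_filters:                                                # match the identity path
        x = layers.Conv3D(out_filters, 1, padding='same')(x)
    return layers.ReLU()(layers.Add()([merged, x]))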
However, the 3-D-ResNeXt network proposed in [28] has a problem: it cannot optimize the output feature maps of each convolutional layer during the training process. This may cause a lot of useless information to be sent to the next layer, which seriously affects the efficiency of network training. Therefore, in this paper, on the basis of using 3-D-ResNeXt to extract spectral-spatial features, we pay more attention to how to optimize the feature maps extracted by 3-D-ResNeXt in the channel and spatial dimensions with the attention mechanism.

2.3. Squeeze-and-Excitation Network

The convolution operation is the core of convolutional neural networks (CNNs), which enables networks to extract informative features from different dimensions. Many researchers now focus on how to strengthen the representational power of a CNN, and [37] proposed a novel architectural unit focusing on the channel relationship, named the squeeze-and-excitation (SE) block.
The structure of the SE block is shown in Figure 2. The SE block consists of a global pooling layer, two fully connected (FC) layers, and two activation function layers (one is ReLU, and the other is Sigmoid). The principle of the SE block is to enhance the important features and weaken the unimportant features by controlling the weight coefficient of each channel. First, the global average pooling (GAP) layer implements the squeeze process. To take advantage of the correlation of channels, GAP averages the spatial dimension of feature maps with a size of H × W × C ( H and W represent the two dimensions in space, and C represents the number of channels) to form 1 × 1 × C feature maps, which can shield the spatial distribution information and integrate global spatial information to obtain the importance of the feature channel. Then, the excitation process contains two FC layers. The first FC layer is used to compress C channels into C / r channels and the second one restores the compressed feature map to the original size of 1 × 1 × C . Finally, by multiplying the weight coefficients limited by Sigmoid to the [0, 1] range with the original output feature maps, it can be ensured that the input features of the next layer are optimal in the channel-wise dimension.
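For reference, the SE block described above can be sketched in Keras as follows (squeeze by global average pooling, excitation by two FC layers, sigmoid gating, channel rescaling); the reduction ratio r = 4 is only an example.

import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=4):
    # Squeeze-and-excitation over the channel axis of a 3-D feature map.
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling3D()(x)                            # squeeze: H x W x C spatial/spectral dims -> 1
    s = layers.Dense(channels // reduction, activation='relu')(s)     # excitation, FC 1: C -> C/r
    s = layers.Dense(channels, activation='sigmoid')(s)               # excitation, FC 2: C/r -> C, weights in [0, 1]
    s = layers.Reshape((1, 1, 1, channels))(s)
    return layers.Multiply()([x, s])                                  # rescale each channel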

2.4. Proposed Attention Mechanism

In general, the purpose of the attention mechanism is to strengthen important information and weaken unimportant information. Inspired by the principle of SENet [37], we used the bottom-up top-down structure to construct the attention mechanism based on CNN. Our proposed channel-wise and spatial-wise attention modules have the same architecture, but their up-sampling methods differ. In addition, owing to the superposition of attention modules, the network becomes deeper and some feature maps may be weakened, which may destroy the original features of the input and result in a declining-accuracy effect. Therefore, to alleviate this problem, we also introduced the residual connection into the attention module to form the residual attention module.
The basic structure of the proposed residual attention module is shown in Figure 3. The whole residual attention module consists of two parts: one is the attention module and the other is the residual connection. In the attention module, the down-sampling layer compresses the features extracted by 3-D-CNN in the corresponding dimension to learn compact features. These feature maps are then sent to the up-sampling layer to restore the original size, which ensures that the subsequent feature calibration process can accurately assign the weight coefficient to the corresponding position. Finally, the Sigmoid function limits the optimized feature attention maps to [0, 1] to obtain the weight coefficient corresponding to each position. Coefficients tending to 0 indicate that the amount of information at that position is small, while coefficients tending to 1 indicate that the position carries more important information. Therefore, we only need to multiply the output with the feature maps from 3-D-CNN to rescale the final output of the attention module. In this way, these weights are assigned to each feature map to adaptively recalibrate the features. With the superposition of attention modules, more important features are strengthened and become clearer, while unimportant features are gradually weakened toward 0 so as not to affect the network training process. Furthermore, to allow this attention module to be inserted into deep networks, we add a residual connection and an extra convolutional layer to prevent the original features from being destroyed and to accelerate the flow of information.
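The shared bottom-up top-down template can be sketched as follows; the pooling/up-sampling pair used here is only a placeholder, which the RGCA and RSA modules below replace with 1 × 1 × 1 convolutions and strided/transposed convolutions, respectively, and the filter count is illustrative.

import tensorflow as tf
from tensorflow.keras import layers

def residual_attention(x, filters=64):
    # Bottom-up top-down residual attention: down-sample to learn compact features,
    # up-sample back to the original size, squash to [0, 1] with a sigmoid,
    # rescale the trunk, and keep a residual connection to the input.
    trunk = layers.Conv3D(filters, 3, padding='same', activation='relu')(x)
    m = layers.MaxPooling3D(pool_size=(2, 2, 1))(trunk)                  # bottom-up (down-sampling)
    m = layers.Conv3D(filters, 3, padding='same', activation='relu')(m)
    m = layers.UpSampling3D(size=(2, 2, 1))(m)                           # top-down (up-sampling)
    mask = layers.Activation('sigmoid')(m)                               # weight coefficients in [0, 1]
    out = layers.Multiply()([trunk, mask])                               # recalibrate the trunk features
    skip = layers.Conv3D(filters, 1, padding='same')(x)                  # extra convolution on the skip path
    return layers.Add()([out, skip])                                     # residual connection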
In our proposed RGCSA, the network has two attention modules based on the above residual attention structure: the residual group channel-wise attention (RGCA) module and the residual spatial-wise attention (RSA) module. Both attention modules are based on CNN operations and the bottom-up top-down attention structure. The specific implementation details of each attention module are presented in the following paragraphs.

2.4.1. Residual Group Channel-Wise Attention Module

The channel-wise attention module mainly refines the channel weights of the feature maps. Since each channel of the feature maps is considered a feature detector, channel attention focuses on the meaningful channels and suppresses the meaningless ones. Figure 4 shows the structure of the proposed residual group channel-wise attention (RGCA) module, which consists of G residual channel-wise attention (RCA) blocks, where G is the number of groups. Therefore, we give a detailed introduction to the structure of the RCA block.
In our proposed residual channel-wise attention (RCA) module, which is shown in Figure 5, we first use global average pooling (GAP) and a reshape operation to shield the information of the spectral and spatial dimensions and obtain feature maps with a size of 1 × 1 × 1, C, where C represents the number of channels. Then, the feature tensors are sent to a 1 × 1 × 1, C/r 3-D convolutional layer instead of a fully connected layer to reduce the channel dimensionality and extract abstract features with more important information. After this convolution, the number of channels becomes C/r, where r is a reduction ratio. A ReLU layer is then used to strengthen the nonlinear relationship of the channel responses. Next is the up-sampling process: here, we again use a 1 × 1 × 1, C 3-D convolutional layer to increase the channel dimension and finally generate C feature maps. As mentioned above, a Sigmoid function is applied to limit the range of the features, and the output is multiplied with the feature maps from 3-D-CNN to obtain the channel-refined features.
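A minimal Keras sketch of one RCA block, assuming r = 4 and a preceding 3 × 3 × 3 convolution as described in Table 2; the function name and filter counts are illustrative rather than the released implementation.

import tensorflow as tf
from tensorflow.keras import layers

def rca_block(x, reduction=4):
    # Residual channel-wise attention: 1x1x1 convolutions replace the FC layers
    # of SENet, and a residual connection preserves the block input.
    channels = x.shape[-1]
    feat = layers.Conv3D(channels, 3, padding='same', activation='relu')(x)     # further feature extraction
    w = layers.GlobalAveragePooling3D()(feat)
    w = layers.Reshape((1, 1, 1, channels))(w)                                   # 1 x 1 x 1, C
    w = layers.Conv3D(max(channels // reduction, 1), 1, activation='relu')(w)    # squeeze to C/r
    w = layers.Conv3D(channels, 1, activation='sigmoid')(w)                      # restore to C, weights in [0, 1]
    refined = layers.Multiply()([feat, w])                                       # channel recalibration
    return layers.Add()([refined, x])                                            # residual connection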
In addition, if we directly optimize the whole channel, as the network deepens, the weight coefficients of some originally useful channel features may decrease, so that this part of important information is lost. Therefore, we introduce the principle of group convolution of 3-D-ResNeXt to the RCA module to construct the final residual group channel-wise attention (RGCA) module. From Figure 4, we can see that the feature maps are divided into G groups, and then sent to each RCA module respectively. In this way, we can ensure that each channel of the feature maps in the deep network is rescaled to the optimal value, and reduce the possibility that useful features may be lost during the optimization process.
The residual group channel-wise attention module is added after the 3-D-CNN layer of 3-D-ResNeXt but before the summation operation of the residual connection, which is shown on the left side of Figure 6. In this paper, to simplify the network module design, we set the group number G of RGCA to the same value as the cardinality parameter of 3-D-ResNeXt. In this way, we can conveniently and effectively combine the grouping operation of RGCA with 3-D-ResNeXt to form the structure on the right side of Figure 6, which greatly simplifies the network.

2.4.2. Residual Spatial-Wise Attention Module

Compared with the channel-wise attention module, the spatial-wise attention pays attention to the informative region of the spatial dimension. To optimize the spatial features, we proposed a novel spatial attention module, and the structure of the proposed residual spatial-wise attention (RSA) module is shown in Figure 7.
In the residual spatial-wise attention (RSA) module, we also use 3-D-CNN with stride = (2, 2, 1) as the down-sampling layer to reduce the spatial dimension of the features while keeping the spectral dimension unchanged. After two down-sampling layers, we obtain feature maps containing the important spatial information. Then, we introduce a novel way (transposed convolution) to restore the original size. Transposed convolution is a special kind of forward convolution: first, the input feature map is expanded by a zero-filling operation according to a certain ratio; then, the convolution kernel is rotated by 180°, which is equivalent to transposing the convolution kernel matrix; finally, a normal convolution is performed with this new kernel. Transposed convolution maintains the mapping relationship of spatial positions before and after the operation when restoring the size, which is important for the subsequent weight optimization process.
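The RSA module can be sketched as follows; the 3 × 3 × 1 kernels, (2, 2, 1) strides, and the two down-/up-sampling stages follow the description above, while the 64/128 filter counts and the 1 × 1 × 1 projection on the skip path are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def rsa_block(x, filters=64):
    # Residual spatial-wise attention: strided 3-D convolutions shrink only the
    # spatial dimensions, transposed convolutions restore them, and the sigmoid
    # mask reweights the trunk features spatially.
    feat = layers.Conv3D(filters, (3, 3, 1), padding='same', activation='relu')(x)
    d = layers.Conv3D(filters, (3, 3, 1), strides=(2, 2, 1), padding='same', activation='relu')(feat)
    d = layers.Conv3D(filters * 2, (3, 3, 1), strides=(2, 2, 1), padding='same', activation='relu')(d)
    u = layers.Conv3DTranspose(filters, (3, 3, 1), strides=(2, 2, 1), padding='same', activation='relu')(d)
    u = layers.Conv3DTranspose(filters, (3, 3, 1), strides=(2, 2, 1), padding='same')(u)
    mask = layers.Activation('sigmoid')(u)                 # spatial weight coefficients in [0, 1]
    refined = layers.Multiply()([feat, mask])
    if x.shape[-1] != filters:                             # match channels on the residual path
        x = layers.Conv3D(filters, 1, padding='same')(x)
    return layers.Add()([refined, x])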
Inspired by the structure of CBAM [33], the residual spatial-wise attention module is added after the residual group channel-wise attention module to form an optimized structure of the channel first and space second. Additionally, the experiment proves that this sequential attention mechanism can obtain a higher classification accuracy with fewer training samples than 3-D-ResNeXt.

3. Network Configuration and Experimental Setup

In this section, taking the Indian Pines (IN) dataset as an example, we give an introduction to the overall framework and present the network configuration of each module and the experimental setup in detail.

3.1. Overall Framework of the Proposed Network

Figure 8 shows the network structure of the proposed residual group channel and space attention (RGCSA) network. In Figure 8, we take the Indian Pines (IN) dataset with a 16 × 16 patch as the input to illustrate the sizes of the feature maps used in our network. The network consists of three major modules: the first is the initialization module, which initially reduces the spectral dimension with a 3-D convolutional layer; the second is the residual group channel and space attention module; and the third is the classification module.
The residual group channel and space attention module contains four building blocks of the same structure, whose filter numbers are 64, 128, 256, and 512, respectively. We use 3-D-ResNeXt to extract the spectral-spatial features, and insert the proposed RGCSA into 3-D-ResNeXt to optimize the features. In block 4, we did not use the residual spatial-wise attention module because the spatial dimension in block 4 is too small to perform a good dimensionality reduction operation. In short, the features of the first three blocks are optimized by RGCSA, and the last block only contains RGCA. In the classification module, we use a global average pooling (GAP) layer to replace the fully connected (FC) layer to greatly reduce the number of parameters and improve the training efficiency. Furthermore, we also introduced the label smoothing strategy proposed in [28] to address the imbalance of classes.
Next, we will introduce the specific network configuration of each module.

3.2. Network Configuration of the Proposed Network

First, we introduce the network configuration of the overall RGCSA, and the network configurations and parameter settings of the RGCA and RCA module are introduced later. Taking the IN dataset as an example, the detailed network parameter setting of the proposed RGCSA network for three HSI datasets is shown in Table 1.
Here, taking block 1 as an example, Figure 9 shows the structure of the building block. First, the 1 × 1 × 1 convolutional layer changes the dimensionality of the feature channels. Then, the input tensor is transformed into G groups through the splitting operation. The feature maps of each group are sent to a 3-D-CNN layer with a 3 × 3 × 3 convolutional kernel to extract spectral-spatial features. Then, RGCA is used to focus on the important channel features of each group, which are aggregated into a high-dimensional feature vector again through a concatenate layer. The feature maps optimized by RGCA are fed into RSA to attend to the spatial dimension, strengthening the important regions and weakening the unimportant ones. Finally, the residual connection allows the original input features to be fused with the processed features at the same size.
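Assuming the rca_block and rsa_block sketches given in Section 2, one building block could be assembled as follows; the filter number, group count, and the use of G parallel branches instead of a single grouped convolution are illustrative choices rather than the released implementation.

import tensorflow as tf
from tensorflow.keras import layers

def rgcsa_block(x, filters=64, groups=8, use_rsa=True):
    # One RGCSA building block: 1x1x1 channel projection, G grouped 3x3x3 branches
    # each refined by channel attention (together forming RGCA), concatenation,
    # optional spatial attention (RSA), and a residual connection to the input.
    group_width = filters // groups
    proj = layers.Conv3D(filters, 1, padding='same', activation='relu')(x)
    branches = []
    for _ in range(groups):
        b = layers.Conv3D(group_width, 3, padding='same', activation='relu')(proj)
        branches.append(rca_block(b))                       # per-group channel attention
    merged = layers.Concatenate()(branches)                 # back to a high-dimensional embedding
    if use_rsa:
        merged = rsa_block(merged, filters=filters)         # spatial attention (omitted in block 4)
    skip = layers.Conv3D(merged.shape[-1], 1, padding='same')(x)
    return layers.Add()([merged, skip])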

3.2.1. Network Configuration of the Residual Group Channel-Wise Attention Module

Since RGCA consists of G RCA groups, according to the structure of RCA shown in Figure 5, we select only one group to introduce the network configuration of the residual channel-wise attention module, which is shown in Table 2. First, following the 3-D-CNN layer of 3-D-ResNeXt, a convolutional layer with a size of 3 × 3 × 3, 8 and 'same' padding is used to further extract abstract features. After the first ReLU activation layer, feature maps with the shape of 16 × 16 × 100, 8 are obtained. Then, the global average pooling layer and a reshape operation are used to flatten the spectral and spatial dimensions to obtain 1 × 1 × 1, 8 feature maps. With the reduction ratio r = 4, a 1 × 1 × 1, 2 3-D-CNN is used to reduce the number of channels. To perform the up-sampling operation, the feature maps are processed by a 1 × 1 × 1, 8 3-D-CNN again. Finally, the optimized feature vectors ranging from 0 to 1 are multiplied by the features processed by the 3 × 3 × 3, 8 3-D-CNN, and the residual connection adds the original input from 3-D-ResNeXt to these optimized features. After the channel attention module, important channels are highlighted while unimportant channels are suppressed.

3.2.2. Network Configuration of the Residual Spatial-Wise Attention Module

The network configuration of the residual spatial-wise attention module in block 1 is described in Table 3. First, as in the residual channel-wise attention module, we use a 3 × 3 × 1, 64 3-D-CNN with 'same' padding to learn spatial features while keeping the spectral dimension unchanged. Then, two convolutional layers with a size of 3 × 3 × 1, stride = (2, 2, 1), and 64 and 128 filters, respectively, are used to focus on the important spatial information and reduce the spatial dimension. Next, two 3 × 3 × 1, 64 transposed convolutional layers with stride = (2, 2, 1) realize the up-sampling operation. Finally, the multiply and add layers implement the optimization and fusion operations, respectively. In addition, in block 3 of the RGCSA network, since the feature dimension becomes 4 × 4 × 25, we set the kernel size of the 3-D-CNN used to focus on the important spatial information to 1 × 1 × 1. In block 4, we do not add the residual spatial-wise attention module to optimize the spatial information.

3.3. Experimental Setup

We tested the factors that affect the HSI classification performance of the proposed network (i.e., the ratios of the training, validation, and test datasets for the different HSI datasets, and the number of groups G), and the experimental results compared with several widely used methods are illustrated in Section 4. Finally, the most suitable ratios were 3:1:6 for the Indian Pines (IN) dataset, and 2:1:7 for the Pavia University (UP) and Kennedy Space Center (KSC) datasets. The number of groups G in RGCA was 8, and the reduction ratio r of RCA was 4. RMSProp was adopted as the optimizer to minimize the cross-entropy loss function, and the initial learning rate was set to 0.0003. All the training and testing results were obtained on the same computer, configured with 32 GB of memory, an NVIDIA GeForce GTX 1070 with 8 GB, and an Intel i7-7820HK CPU.
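The training configuration described above might look as follows in Keras; the label smoothing factor, batch size, and checkpoint file name are assumptions, while the RMSProp optimizer and the 0.0003 initial learning rate follow the text.

import tensorflow as tf

def compile_and_train(model, x_train, y_train, x_val, y_val, epochs=100):
    # Compile the network with RMSProp and a label-smoothed cross-entropy loss,
    # and keep the model with the best validation accuracy.
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=3e-4),          # initial learning rate 0.0003
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),  # smoothing factor is illustrative
        metrics=['accuracy'])
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        'rgcsa_best.h5', monitor='val_accuracy', save_best_only=True)       # keep the best validation model
    return model.fit(x_train, y_train, batch_size=16, epochs=epochs,        # batch size is an assumption
                     validation_data=(x_val, y_val), callbacks=[checkpoint])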

4. Experiments and Results

In this section, we first introduce the three HSI datasets used in this paper, i.e., the Indian Pines (IN), Pavia University (UP), and Kennedy Space Center (KSC) datasets. Then, we discuss the two main factors affecting the classification performance. Finally, we compare the proposed RGCSA network with several representative HSI classification models introduced in Section 1, i.e., SVM [11], SSRN [21], 3-D-ResNeXt [28], DBMA [32], and SSAN [35]. The overall accuracy (OA), average accuracy (AA), and kappa coefficient (Kappa) are used as the indicators to measure HSI classification performance. OA refers to the ratio of the number of correctly classified pixels to the total number of HSI pixels in the test datasets. AA refers to the average accuracy over all classes. The kappa coefficient is an indicator of the consistency between the classification results and the ground truth, and it can also be used to measure the classification accuracy. In short, higher values of these three indicators represent better classification results. Let M ∈ R^(N × N) represent the confusion matrix of the classification results, where N is the number of land-cover categories. According to [35], the values of OA, AA, and Kappa can be calculated as follows:
OA = sum(diag(M)) / sum(M),
AA = mean(diag(M) ./ sum(M, 2)),
Kappa = [OA − sum(M, 1) × sum(M, 2) / sum(M)^2] / [1 − sum(M, 1) × sum(M, 2) / sum(M)^2],
where diag(M) ∈ R^(N × 1) is the vector of diagonal elements of M, sum(·) ∈ R is the sum of all elements of a matrix, sum(·, 1) ∈ R^(1 × N) is the sum of the elements in each column, sum(·, 2) ∈ R^(N × 1) is the sum of the elements in each row, mean(·) ∈ R is the mean of all elements, and ./ denotes elementwise division.
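Assuming a confusion matrix whose rows are true classes and columns are predicted classes, the three indicators can be computed as in the following sketch.

import numpy as np

def classification_scores(confusion):
    # OA, AA, and kappa from an N x N confusion matrix
    # (rows = true classes, columns = predicted classes).
    M = np.asarray(confusion, dtype=float)
    total = M.sum()
    oa = np.trace(M) / total                              # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))              # per-class accuracy, averaged
    pe = np.sum(M.sum(axis=0) * M.sum(axis=1)) / total ** 2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa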
To obtain a statistical evaluation, we repeated each experiment 5 times, and calculated the mean value as the final result.

4.1. Experimental Datasets

We used three available commonly used HSI datasets [42] in our experiment to evaluate the classification performance of the proposed RGCSA model.
The Indian Pines (IN) dataset [20] was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 over Northwest Indiana. It contains 16 classes, has a size of 145 × 145 pixels, and a spatial resolution of 20 m per pixel. There are 220 bands in the wavelength range of 0.4 to 2.5 um. Since 20 bands are corrupted by water absorption effects, the remaining 200 bands were adopted for the HSI experiments.
The Pavia University (UP) dataset [20], gathered by the Reflective Optics System Imaging Spectrometer (ROSIS) in 2001 in the Pavia region of northern Italy, has 610 × 340 pixels with a resolution of 1.3 m per pixel and contains 9 land-cover classes. After 12 bands with strong noise and water vapor absorption were removed, 103 bands ranging from 0.43 to 0.86 um were adopted for analysis.
The Kennedy Space Center (KSC) dataset [43] was acquired by AVIRIS in 1996 over the Kennedy Space Center, and contains 224 bands with center wavelengths in the range of 0.4 to 2.5 um. The image has 512 × 614 pixels with a spatial resolution of 18 m and 13 types of geographic objects. After water absorption and low signal-to-noise ratio (SNR) bands were removed, 176 bands were adopted for analysis.
Table 4, Table 5 and Table 6 list the total number of samples of each class in each dataset and the number of training, validation, and test samples of three datasets under the optimal ratios, which obtained the best classification performance, i.e., 3 : 1 : 6 for the IN dataset, and 2 : 1 : 7 for the UP and KSC datasets.

4.2. Experimental Parameter Discussion

We focused on two main factors that affect the classification performance of our proposed network, i.e., the ratio of the training dataset, and the number of groups G of the residual group channel-wise attention (RGCA) module. Finally, according to the results of experiments, the ratios of the training, validation, and test datasets for the IN, UP, and KSC datasets are 3 : 1 : 6 ,   2 : 1 : 7 ,   2 : 1 : 7 , respectively. Additionally, we set the number of groups of RGCA for three datasets to 8. Furthermore, the spatial input size of the network was constantly set to 16 × 16 for all experiments.

4.2.1. Effect of Different Ratios of the Training, Validation, and Test Datasets

Following the experiment on the effect of different ratios of training samples in [28], we also divided the HSI datasets into four different ratios, 2:1:7, 3:1:6, 4:1:5, and 5:1:4, and tested the impact of different numbers of training samples on our proposed model. To obtain accurate results with different training samples, we set the epochs of the different ratios to 100, 100, 60, and 60, respectively. At the same time, the number of groups G was 8. Finally, the training time, test time, and results of the three indicators (i.e., OA, AA, and Kappa) with different ratios of the proposed model for the three HSI datasets are listed in Table 7, Table 8 and Table 9.
From Table 7, Table 8 and Table 9, we can find that, in general, our proposed network obtains high OA indicators on all three HSI datasets with only a few training samples. Specifically, for the IN dataset, when the ratio changed from 2:1:7 to 3:1:6, the OA indicator showed a clear and large increase from 99.52% to 99.87%, while the training time increased only slightly. When the number of training samples further increased, as the epochs of the training process were reduced from 100 to 60, the OA decreased slightly. Therefore, we chose the ratio of 3:1:6 for the IN dataset. For the UP and KSC datasets, with the increasing number of training samples, the training time rose rapidly, whereas the accuracy decreased a little because of the reduced epochs. Especially for the UP dataset, when the ratio was 2:1:7, it already took a long time to train the model, and when the ratio changed to 3:1:6, the training time showed a dramatic jump from 25,837.94 s to 37,310.21 s, which nearly doubled. With a further increase of the ratio, the corresponding training times were all longer than that of 2:1:7. For the KSC dataset, since it has fewer samples than the other two datasets, the training times of the different ratios were all lower, and when the ratio was 2:1:7, the proposed model had already classified the KSC categories with an OA close to 100%. Therefore, for the UP and KSC datasets, we chose 2:1:7 as the most suitable ratio. Furthermore, compared with the previous methods, our proposed network reached the highest classification accuracy; the detailed results are shown later.
In addition, the training time and the number of training samples did not show a linear relationship. The reason is that the epochs of the four ratios were different, as mentioned above. Since the number of training samples is small when the ratio is 2:1:7 or 3:1:6, we increased the epochs of these two ratios to 100 to obtain the best results, while for the ratios of 4:1:5 and 5:1:4, the larger number of training samples in each epoch allowed the network to achieve a high classification performance with fewer epochs.

4.2.2. Effect of the Number of Groups

In the proposed RGCSA network, in order to better optimize the channels, we divided the channels into G groups, applied the RCA module to each group, and finally merged the optimized channels of all groups. Therefore, the number of groups is the other key factor for our proposed network. At the same time, since we utilized 3-D-ResNeXt to extract spectral-spatial features, for the convenience of the experiment, we set the cardinality (i.e., the size of the set of transformations) and G to the same value. We evaluated the classification performance of the proposed RGCSA network with different numbers of groups G, and the results are shown in Table 10. In this experiment, we set the spatial input size to 16 × 16, and the ratio to 3:1:6 for the IN dataset and 2:1:7 for the UP and KSC datasets.
From the table, we can find that as the number of groups G increases, the number of parameters and the training time gradually increase, while the OA indicators for the three HSI datasets fluctuate only slightly. This means that a reasonable division of the channel groups is key to the classification performance of the network. If the number of groups G is small, some channels may not be optimized, and some useful channel information may even be gradually discarded as the network deepens. If G is too large, each group has fewer channels to optimize, and the network cannot accurately extract useful channel information. Finally, we chose the number of groups G = 8 for all three HSI datasets.

4.3. Classification Results Comparison with State-of-the-Art

To verify the effectiveness of our proposed RGCSA network, we compared RGCSA with several classic methods, i.e., SVM [11], SSRN [21], 3-D-ResNeXt [28], DBMA [32], and SSAN [35]. To obtain fair comparison results, our proposed RGCSA network and the compared methods adopted the same spatial input size of 16 × 16 × b (b represents the number of spectral bands), with the ratio of 3:1:6 for the IN dataset and 2:1:7 for the UP and KSC datasets for all methods.
Table 11, Table 12 and Table 13 report the OAs, AAs, kappa coefficients, and the classification accuracy for each class on the three HSI datasets. From the tables, we can see that the proposed RGCSA achieved the highest classification accuracy among all methods on all three HSI datasets. First, SVM achieved the lowest classification accuracy on the three HSI datasets among all the methods. Secondly, since classes 1, 7, and 9 (alfalfa, grass-pasture-mowed, and oats, respectively) in the IN dataset each have fewer than 50 training samples and SSRN divides the network into spectral and spatial feature learning parts, the 2-D-CNN-based SSRN showed a lower classification accuracy on these classes, especially class 9, than the 3-D-CNN-based methods, such as 3-D-ResNeXt, DBMA, SSAN, and RGCSA. Although DBMA and SSAN use 3-D-CNN to extract features, these two models separate the spectral and spatial dimensions, which means that they cannot make use of the close correlation between these two dimensions; the results of DBMA and SSAN in Table 11, Table 12 and Table 13 confirm this. Thirdly, we find that the models combined with attention modules achieve high classification accuracy, especially for the UP and KSC datasets, which means that channel-wise and spatial-wise attention can indeed optimize the features extracted by CNN and improve the classification performance. Furthermore, compared with these methods, our proposed RGCSA network classifies all classes of the three datasets more accurately, with classification accuracies higher than 99.80%. This means that our proposed network needs fewer training samples to obtain a higher classification performance through the proposed group-channel and space joint attention mechanism.
Figure 10, Figure 11 and Figure 12 show the visualization maps of all classes of all methods based on CNN (i.e., SSRN, 3-D-ResNeXt, DBMA, SSAN, and our proposed RGCSA), along with the false color images of the HSI datasets and their corresponding ground-truth maps. In the IN dataset, the visualization maps of SSRN, DBMA, and SSAN did not show class 9 (oats, labeled in dark red). However, our proposed RGCSA network could classify class 9, which is completely displayed in Figure 10g. In the UP and KSC datasets, we find that the edge contours of each class of our proposed network are clearer and smoother than others. In addition, the prediction effect of our proposed RGCSA network on unlabeled parts is also significantly better than other methods. For example, on the right of class 1 (labeled by bright red) in the IN dataset, it can be seen from the false color map in Figure 10a that this unlabeled part should belong to class 6. None of the comparison methods can accurately predict this part. In contrast, it can be clearly seen from Figure 10g that our proposed RGCSA network can predict this part and fully visualize it. Similarly, in the lower middle of the UP dataset, the part marked in dark blue belongs to class 3 (gravel), and our proposed network can visualize it more completely than other methods. In summary, our proposed RGCSA network can clearly visualize all the labeled classes, and can predict and visualize the unlabeled parts more accurately than other methods.
Table 14 and Figure 13 show the comparison results of 3-D-ResNeXt (without the channel-wise and spatial-wise attention modules), RGCA (only with the group channel-wise attention module), RSA (only with the spatial-wise attention module), and RGCSA (with both the RGCA and RSA modules). Except for the different attention modules, these four models have the same structure. The ratios for the three HSI datasets are 3:1:6, 2:1:7, and 2:1:7, respectively, and the epochs were all set to 100. Table 14 shows the corresponding network parameters, training time, and test time, and Figure 13 shows the OAs of the four different attention configurations on the three HSI datasets. We find that the models with attention modules need more time to train, and the proposed RGCSA has the longest training time. Although RGCA, RSA, and RGCSA all incur more computational cost, the figure shows that 3-D-ResNeXt, without any attention modules, achieved the lowest accuracies on the three HSI datasets, which proves the effectiveness of the attention mechanism. The OAs of RGCA, with only the group channel-wise attention module, are higher than those of RSA, with only the spatial-wise attention module, on all three datasets, but the gap between RGCA and RSA is not large. This means that the proposed attention mechanisms in both dimensions optimize the channel and spatial features. Furthermore, when these two attention modules are combined, the proposed RGCSA obtains the highest classification accuracies. This fully demonstrates that the proposed channel-space joint attention mechanism plays an important role in HSI classification and is suitable for HSI classification tasks.
To test the robustness and generalizability of the proposed RGCSA under different ratios of training datasets, three models, i.e., SSRN based on 2-D-CNN, 3-D-ResNeXt based on 3-D-CNN, and the proposed RGCSA based on 3-D-CNN and the attention mechanism, were selected for this experiment. Figure 14, Figure 15 and Figure 16 illustrate the overall accuracies (OAs) of these models using different ratios of training datasets. When the number of training samples is small, such as the ratios of 2:1:7 and 3:1:6 for the three HSI datasets, the proposed RGCSA network obtained the highest OA indicators among the three methods. Especially for the IN and KSC datasets, the OAs of our proposed network always remain at a high level under different ratios. This means that we can achieve better classification results with the proposed RGCSA network using fewer training samples. Importantly, when the total number of samples is small, or when some classes have few samples, such as classes 1, 7, and 9 in the IN dataset, the proposed RGCSA can still deliver a superior classification performance. As the training samples increase, the OAs of the proposed network on the three HSI datasets show a slight fluctuation but still remain above 99%. Since SSRN divides the network into a spectral learning part and a spatial learning part, it cannot make full use of the relationship between the spectral and spatial dimensions. Therefore, on the three HSI datasets, the OAs of SSRN are the lowest among the three methods, which means that 3-D-CNN can extract spectral-spatial features with more useful information than 2-D-CNN. Compared with 3-D-ResNeXt, which needs more training samples to achieve high classification accuracies, our proposed network benefits from the residual attention mechanism to obtain equally high OA indicators with fewer training samples, and its classification accuracies are higher than those of 3-D-ResNeXt under all ratios. In summary, the proposed channel-wise and spatial-wise attention modules, which attend to the informative features, strengthen their representation, and suppress the interference of useless information, are well suited for HSI classification tasks. This demonstrates that our proposed RGCSA network has strong robustness and stability under different ratios of training datasets.
At the same time, Table 15, Table 16 and Table 17 show the training time and test time of the above three models under different ratios. The epochs of the corresponding ratios are { 100 ,   100 ,   60 ,   60 }, respectively. From the tables, we can see that our proposed network needs more training time and test time in the three HSI datasets. The reason is that the proposed RGCA and RSA completely use 3-D-CNNs instead of FC layers to extract channel information and spatial context information. The proposed network is much deeper than the other networks, which results in more time to train the model. However, from the perspective of the classification results, it is feasible to exchange more computational costs for higher classification accuracies.

5. Conclusions

In this paper, we proposed a supervised 3-D deep learning framework for HSI classification, using the bottom-up top-down attention structure with the residual connection. Compared with the previous traditional ML-based methods, the end-to-end deep learning methods can make full use of the GPU performance to accelerate network training and avoid complex artificial design of feature extraction. Compared with the deep learning methods based only on CNN, the proposed attention mechanism could strengthen important information and weaken unimportant information.
The above experiments verify the effectiveness of the proposed residual group channel and space attention module in HSI classification tasks. In summary, the three major differences between our proposed RGCSA classification model and other deep learning-based models are as follows. First, the designed residual group channel-wise attention module and spatial-wise attention module have the same basic structure, which can easily be inserted into any network; moreover, the residual connection accelerates the flow of information for better training. Second, the group channel-wise attention module reduces the possibility of losing useful information during the attention optimization, and the novel spatial-wise attention module learns the context information and maintains the mapping relationship of spatial pixels before and after the optimization process. Third, and most importantly, in the face of poorly distributed HSI datasets, we can use the proposed RGCSA to optimize the learning process and obtain a higher classification performance with fewer training samples. These advantages enable our RGCSA network to obtain higher classification results than previous models, whether CNN-based or attention-based.
From the perspective of network complexity, the future work will focus on how to effectively reduce the number of network parameters and computational costs while maintaining high classification accuracy through the attention mechanism.

Author Contributions

P.W. and Z.C. conceived and designed the experiments; P.W. and Z.G. presented tools and carried out the data analysis; P.W. and Z.C. wrote the paper. Z.C. and F.L. guided and revised the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Natural Science Foundation of China (NSFC) (61501260) and “1311 Talent Program” of NJUPT.


Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. A block of ResNet (left) and ResNeXt with cardinality = 8 (right). A layer is shown as (# in channels, filter size, # out channels).
Figure 2. The structure of the SE block inserted into the ResNet.
Figure 3. The basic structure of the proposed residual attention module.
Figure 4. The structure of the residual group channel-wise attention (RGCA) module.
Figure 5. The structure of the residual channel-wise attention (RCA) module.
Figure 6. The structure of the residual group channel-wise attention (RGCA) module before simplification (left) and after simplification (right).
Figure 7. The structure of the residual spatial-wise attention (RSA) module.
Figure 8. Overall HSI classification structure of the proposed residual group channel and space attention (RGCSA) network.
Figure 9. General structure of the building block in RGCSA (taking block 1 as an example).
Figure 10. Classification results of the compared models on the IN dataset. (a) False color image. (b) Ground-truth labels. (c–g) Classification results of SSRN, 3D-ResNeXt, DBMA, SSAN, and RGCSA.
Figure 11. Classification results of the compared models on the UP dataset. (a) False color image. (b) Ground-truth labels. (c–g) Classification results of SSRN, 3D-ResNeXt, DBMA, SSAN, and RGCSA.
Figure 12. Classification results of the compared models on the KSC dataset. (a) False color image. (b) Ground-truth labels. (c–g) Classification results of SSRN, 3D-ResNeXt, DBMA, SSAN, and RGCSA.
Figure 13. OAs of four different attention mechanisms on the three HSI datasets.
Figure 14. OAs of SSRN, 3D-ResNeXt, and RGCSA with different ratios of training samples for the IN dataset.
Figure 15. OAs of SSRN, 3D-ResNeXt, and RGCSA with different ratios of training samples for the UP dataset.
Figure 16. OAs of SSRN, 3D-ResNeXt, and RGCSA with different ratios of training samples for the KSC dataset.
Table 1. The network configuration of the proposed RGCSA network.

| Layer | Output Size | RGCSA | Connected to |
| --- | --- | --- | --- |
| Input | 16 × 16 × 200 | | |
| CONVBN | 16 × 16 × 100, 32 | 3 × 3 × 7, 32 conv, s = (1, 1, 2) | Input |
| Block1 | 16 × 16 × 100, 64 | 3 × 3 × 3, 64 conv, same | CONVBN |
| Block2 | 8 × 8 × 50, 128 | 3 × 3 × 3, 128 conv, s = (2, 2, 2) | Block1 |
| Block3 | 4 × 4 × 25, 256 | 3 × 3 × 3, 256 conv, s = (2, 2, 2) | Block2 |
| Block4 | 2 × 2 × 13, 512 | 3 × 3 × 3, 512 conv, s = (2, 2, 2) | Block3 |
| GAP | 512 | | Block4 |
| Dense (SoftMax) | 16 | 16 | GAP |
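As a reading aid for Table 1, the following minimal sketch (assuming TensorFlow/Keras with channels-last tensors and a singleton input channel axis; not the authors' released code) stacks the CONVBN stem, the four blocks, global average pooling, and the 16-way softmax classifier. Each block is reduced here to a single Conv3D–BN–ReLU stage; in the full model every block additionally contains the RGCA and RSA modules configured in Tables 2 and 3.

```python
from tensorflow.keras import layers, models

def conv_bn_relu(x, filters, kernel, strides):
    # Conv3D + BatchNorm + ReLU with 'same' padding, as in Table 1.
    x = layers.Conv3D(filters, kernel, strides=strides, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation('relu')(x)

def build_rgcsa_backbone(input_shape=(16, 16, 200, 1), n_classes=16):
    inp = layers.Input(shape=input_shape)
    x = conv_bn_relu(inp, 32, (3, 3, 7), (1, 1, 2))   # CONVBN -> 16x16x100, 32
    x = conv_bn_relu(x, 64, (3, 3, 3), (1, 1, 1))     # Block1 -> 16x16x100, 64
    x = conv_bn_relu(x, 128, (3, 3, 3), (2, 2, 2))    # Block2 -> 8x8x50, 128
    x = conv_bn_relu(x, 256, (3, 3, 3), (2, 2, 2))    # Block3 -> 4x4x25, 256
    x = conv_bn_relu(x, 512, (3, 3, 3), (2, 2, 2))    # Block4 -> 2x2x13, 512
    x = layers.GlobalAveragePooling3D()(x)             # GAP    -> 512
    out = layers.Dense(n_classes, activation='softmax')(x)
    return models.Model(inp, out)
```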
Table 2. The network configuration of the residual group channel-wise attention (RGCA) module.

| Layer | Output Size | RGCA | Connected to |
| --- | --- | --- | --- |
| CONV3D | 16 × 16 × 100, 8 | | Group1 |
| CONVBN | 16 × 16 × 100, 8 | 3 × 3 × 3, 8 conv, same | CONV3D |
| ReLU1 | 16 × 16 × 100, 8 | | CONVBN |
| GAP (Reshape) | 1 × 1 × 1, 8 | | ReLU1 |
| CONV3D | 1 × 1 × 1, 2 | 1 × 1 × 1, 2 | GAP |
| ReLU2 | 1 × 1 × 1, 2 | | CONV3D |
| CONV3D | 1 × 1 × 1, 8 | 1 × 1 × 1, 8 | ReLU2 |
| Sigmoid | 1 × 1 × 1, 8 | | CONV3D |
| Multiply | 16 × 16 × 100, 8 | | Sigmoid, ReLU1 |
| Add | 16 × 16 × 100, 8 | | Multiply, CONV3D |
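A minimal sketch of one group branch of the RGCA module, following the layer order and sizes in Table 2 (TensorFlow/Keras assumed; the kernel size of the first CONV3D is not given in the table, so 1 × 1 × 1 is assumed here, and this is not the authors' released code). The outputs of the G group branches are then concatenated, which is the "Concat" input referenced in Table 3.

```python
from tensorflow.keras import layers

def rgca_group_branch(group_in, channels=8):
    # First CONV3D of Table 2 (kernel size assumed to be 1x1x1 here).
    x0 = layers.Conv3D(channels, 1)(group_in)
    # CONVBN (3x3x3, 'same') followed by ReLU1.
    f = layers.Conv3D(channels, (3, 3, 3), padding='same')(x0)
    f = layers.BatchNormalization()(f)
    f = layers.Activation('relu')(f)
    # GAP (Reshape): collapse to a 1x1x1 channel descriptor.
    s = layers.GlobalAveragePooling3D()(f)
    s = layers.Reshape((1, 1, 1, channels))(s)
    # Two 1x1x1 convolutions: squeeze (2 channels when channels = 8, as in
    # Table 2), restore, then Sigmoid to obtain channel-wise weights.
    s = layers.Conv3D(channels // 4, 1, activation='relu')(s)
    s = layers.Conv3D(channels, 1, activation='sigmoid')(s)
    # Multiply(Sigmoid, ReLU1) and Add(Multiply, CONV3D): residual attention.
    y = layers.Multiply()([f, s])
    return layers.Add()([y, x0])
```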
Table 3. The network configuration of the residual spatial-wise attention (RSA) module.

| Layer | Output Size | RSA | Connected to |
| --- | --- | --- | --- |
| BN | 16 × 16 × 100, 64 | | Concat |
| CONVBN | 16 × 16 × 100, 64 | 3 × 3 × 1, 64 conv, same | BN |
| ReLU1 | 16 × 16 × 100, 64 | | CONVBN |
| CONV3D | 8 × 8 × 100, 64 | 3 × 3 × 1, 64 conv, s = (2, 2, 1) | ReLU1 |
| ReLU2 | 8 × 8 × 100, 64 | | CONV3D |
| CONV3D | 4 × 4 × 100, 128 | 3 × 3 × 1, 128 conv, s = (2, 2, 1) | ReLU2 |
| ReLU3 | 4 × 4 × 100, 128 | | CONV3D |
| Transposed Conv | 8 × 8 × 100, 64 | 3 × 3 × 1, 64 conv, s = (2, 2, 1) | ReLU3 |
| ReLU4 | 8 × 8 × 100, 64 | | Transposed Conv |
| Transposed Conv | 16 × 16 × 100, 64 | 3 × 3 × 1, 64 conv, s = (2, 2, 1) | ReLU4 |
| Sigmoid | 16 × 16 × 100, 64 | | Transposed Conv |
| Multiply | 16 × 16 × 100, 64 | | Sigmoid, ReLU1 |
| Add | 16 × 16 × 100, 64 | | Multiply, BN |
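A minimal sketch of the RSA module following Table 3 (TensorFlow/Keras assumed; not the authors' released code): two strided 3 × 3 × 1 convolutions shrink the spatial dimensions (bottom-up), two transposed convolutions restore them (top-down), a sigmoid yields the spatial mask, and the mask is applied through the residual connection. The 3 × 3 × 1 kernels and (2, 2, 1) strides leave the spectral dimension untouched.

```python
from tensorflow.keras import layers

def rsa_module(x):
    # BN on the concatenated input, then CONVBN (3x3x1, 'same') and ReLU1.
    x0 = layers.BatchNormalization()(x)
    f = layers.Conv3D(64, (3, 3, 1), padding='same')(x0)
    f = layers.BatchNormalization()(f)
    f = layers.Activation('relu')(f)
    # Bottom-up: two strided convolutions, 16x16 -> 8x8 -> 4x4 spatially.
    m = layers.Conv3D(64, (3, 3, 1), strides=(2, 2, 1), padding='same',
                      activation='relu')(f)
    m = layers.Conv3D(128, (3, 3, 1), strides=(2, 2, 1), padding='same',
                      activation='relu')(m)
    # Top-down: two transposed convolutions, 4x4 -> 8x8 -> 16x16 spatially;
    # the last one ends in a sigmoid that produces the spatial attention mask.
    m = layers.Conv3DTranspose(64, (3, 3, 1), strides=(2, 2, 1), padding='same',
                               activation='relu')(m)
    m = layers.Conv3DTranspose(64, (3, 3, 1), strides=(2, 2, 1), padding='same',
                               activation='sigmoid')(m)
    # Multiply(Sigmoid, ReLU1) and Add(Multiply, BN): residual attention.
    y = layers.Multiply()([f, m])
    return layers.Add()([y, x0])
```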
Table 4. The number of training, validation, test, and total samples in the IN dataset.

| No. | Class | Train | Val | Test | Total Samples |
| --- | --- | --- | --- | --- | --- |
| 1 | Alfalfa | 14 | 1 | 31 | 46 |
| 2 | Corn-notill | 429 | 131 | 868 | 1428 |
| 3 | Corn-mintill | 249 | 83 | 498 | 830 |
| 4 | Corn | 72 | 22 | 143 | 237 |
| 5 | Grass-pasture | 145 | 42 | 296 | 483 |
| 6 | Grass-trees | 220 | 69 | 441 | 730 |
| 7 | Grass-pasture-mowed | 9 | 3 | 16 | 28 |
| 8 | Hay-windrowed | 144 | 55 | 279 | 478 |
| 9 | Oats | 6 | 4 | 10 | 20 |
| 10 | Soybean-notill | 292 | 94 | 586 | 972 |
| 11 | Soybean-mintill | 737 | 264 | 1454 | 2455 |
| 12 | Soybean-clean | 178 | 56 | 359 | 593 |
| 13 | Wheat | 62 | 26 | 117 | 205 |
| 14 | Woods | 380 | 136 | 749 | 1265 |
| 15 | Buildings-Grass-Trees-Drives | 116 | 34 | 236 | 386 |
| 16 | Stone-Steel-Towers | 28 | 5 | 60 | 93 |
| | Total | 3081 | 1025 | 6143 | 10,249 |
Table 5. The number of training, validation, test, and total samples in the UP dataset.

| No. | Class | Train | Val | Test | Total Samples |
| --- | --- | --- | --- | --- | --- |
| 1 | Asphalt | 1327 | 670 | 4634 | 6631 |
| 2 | Meadows | 3730 | 1810 | 13,109 | 18,649 |
| 3 | Gravel | 420 | 241 | 1438 | 2099 |
| 4 | Trees | 613 | 333 | 2118 | 3064 |
| 5 | Painted metal sheets | 269 | 134 | 942 | 1345 |
| 6 | Bare Soil | 1006 | 500 | 3523 | 5029 |
| 7 | Bitumen | 266 | 133 | 931 | 1330 |
| 8 | Self-Blocking Bricks | 737 | 363 | 2582 | 3682 |
| 9 | Shadows | 190 | 97 | 660 | 947 |
| | Total | 8558 | 4281 | 29,937 | 42,776 |
Table 6. The number of training, validation, test, and total samples in the KSC dataset.

| No. | Class | Train | Val | Test | Total Samples |
| --- | --- | --- | --- | --- | --- |
| 1 | Scrub | 153 | 78 | 530 | 761 |
| 2 | Willow swamp | 49 | 29 | 165 | 243 |
| 3 | CP hammock | 52 | 28 | 176 | 256 |
| 4 | Slash pine | 51 | 31 | 170 | 252 |
| 5 | Oak/Broadleaf | 33 | 18 | 110 | 161 |
| 6 | Hardwood | 46 | 22 | 161 | 229 |
| 7 | Swamp | 21 | 4 | 80 | 105 |
| 8 | Graminoid marsh | 87 | 45 | 299 | 431 |
| 9 | Spartina marsh | 104 | 39 | 377 | 520 |
| 10 | Cattail marsh | 81 | 40 | 283 | 404 |
| 11 | Salt marsh | 84 | 39 | 296 | 419 |
| 12 | Mud flats | 101 | 61 | 341 | 503 |
| 13 | Water | 186 | 87 | 654 | 927 |
| | Total | 1048 | 521 | 3642 | 5211 |
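The splits in Tables 4–6 keep an approximately fixed per-class ratio (about 3:1:6 for IN and 2:1:7 for UP and KSC). Below is a minimal, hedged sketch of such a per-class random split, assuming NumPy; the rounding used for the published tables may differ slightly from this illustration, so the exact counts above remain authoritative.

```python
import numpy as np

def per_class_split(labels, ratios=(0.3, 0.1, 0.6), seed=0):
    """Split labelled sample indices into train/val/test sets class by class."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        n_train = int(round(ratios[0] * idx.size))
        n_val = int(round(ratios[1] * idx.size))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)
```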
Table 7. Training time, test time, and OA under different ratios on the IN dataset by the proposed method.

| Ratios | Training Time (s) | Test Time (s) | OA (%) | AA (%) | Kappa × 100 |
| --- | --- | --- | --- | --- | --- |
| 2:1:7 | 10,861.78 | 99.90 | 99.52 | 99.22 | 99.53 |
| 3:1:6 | 15,769.93 | 85.99 | 99.87 | 99.88 | 99.85 |
| 4:1:5 | 12,320.78 | 72.52 | 99.86 | 99.77 | 99.84 |
| 5:1:4 | 15,138.30 | 59.03 | 99.86 | 99.74 | 99.82 |
Table 8. Training time, test time, and OA under different ratios on the UP dataset by the proposed method.

| Ratios | Training Time (s) | Test Time (s) | OA (%) | AA (%) | Kappa × 100 |
| --- | --- | --- | --- | --- | --- |
| 2:1:7 | 25,837.94 | 235.55 | 100.0 | 99.99 | 99.99 |
| 3:1:6 | 37,310.21 | 205.73 | 99.97 | 99.98 | 99.96 |
| 4:1:5 | 29,345.04 | 171.84 | 99.98 | 99.97 | 99.97 |
| 5:1:4 | 36,296.38 | 135.85 | 99.98 | 99.98 | 99.98 |
Table 9. Training time, test time, and OA under different ratios on the KSC dataset by the proposed method.

| Ratios | Training Time (s) | Test Time (s) | OA (%) | AA (%) | Kappa × 100 |
| --- | --- | --- | --- | --- | --- |
| 2:1:7 | 5142.95 | 47.34 | 100.0 | 100.0 | 100.0 |
| 3:1:6 | 7292.39 | 39.70 | 100.0 | 99.99 | 99.98 |
| 4:1:5 | 5779.27 | 33.82 | 99.98 | 99.98 | 99.99 |
| 5:1:4 | 7094.23 | 28.22 | 99.99 | 99.98 | 99.98 |
Table 10. Params, training time, test time, and OA for different numbers of groups G on the IN, UP, and KSC datasets.

| Datasets | G | Params | Training Time (s) | Test Time (s) | OA (%) |
| --- | --- | --- | --- | --- | --- |
| IN | 6 | 2,974,912 | 11,923.85 | 64.96 | 99.54 |
| IN | 8 | 4,489,120 | 15,769.93 | 85.99 | 99.87 |
| IN | 10 | 6,264,960 | 21,401.03 | 113.47 | 99.87 |
| UP | 6 | 2,972,224 | 20,087.09 | 183.93 | 99.97 |
| UP | 8 | 4,485,536 | 25,837.94 | 235.55 | 100.0 |
| UP | 10 | 6,260,480 | 34,160.93 | 299.13 | 99.94 |
| KSC | 6 | 2,973,760 | 3887.32 | 36.16 | 99.97 |
| KSC | 8 | 4,487,584 | 5142.95 | 47.34 | 100.0 |
| KSC | 10 | 6,263,040 | 6734.43 | 58.56 | 99.97 |
Table 11. Classification results of different methods for the IN dataset.

| | SVM | SSRN | 3D-ResNeXt | DBMA | SSAN | RGCSA |
| --- | --- | --- | --- | --- | --- | --- |
| OA (%) | 81.67 | 99.46 | 99.79 | 98.19 | 98.64 | 99.87 |
| AA (%) | 79.84 | 93.05 | 99.71 | 96.31 | 97.45 | 99.92 |
| Kappa × 100 | 78.76 | 99.39 | 99.70 | 97.94 | 97.50 | 99.85 |
| 1 | 96.78 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 2 | 78.74 | 100.0 | 100.0 | 97.10 | 97.65 | 100.0 |
| 3 | 82.26 | 99.00 | 99.80 | 99.03 | 98.69 | 99.80 |
| 4 | 99.03 | 98.59 | 98.59 | 92.20 | 96.95 | 99.29 |
| 5 | 93.75 | 99.65 | 99.30 | 99.26 | 99.15 | 100.0 |
| 6 | 85.96 | 100.0 | 100.0 | 98.20 | 98.95 | 100.0 |
| 7 | 40.00 | 100.0 | 100.0 | 81.25 | 97.65 | 100.0 |
| 8 | 91.80 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 9 | 0 | 0 | 100.0 | 85.71 | 89.94 | 100.0 |
| 10 | 96.00 | 97.44 | 100.0 | 98.00 | 99.14 | 100.0 |
| 11 | 70.94 | 99.73 | 99.59 | 98.46 | 99.12 | 99.79 |
| 12 | 74.73 | 99.72 | 99.72 | 98.15 | 98.95 | 99.87 |
| 13 | 99.04 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 14 | 94.29 | 99.74 | 100.0 | 99.74 | 99.96 | 100.0 |
| 15 | 85.11 | 100.0 | 100.0 | 96.12 | 98.14 | 100.0 |
| 16 | 96.78 | 95.00 | 98.31 | 97.67 | 97.33 | 100.0 |
Table 12. Classification results of different methods for the UP dataset.

| | SVM | SSRN | 3D-ResNeXt | DBMA | SSAN | RGCSA |
| --- | --- | --- | --- | --- | --- | --- |
| OA (%) | 90.58 | 99.97 | 99.93 | 98.88 | 99.05 | 100.0 |
| AA (%) | 92.99 | 99.96 | 99.91 | 98.71 | 98.91 | 99.99 |
| Kappa × 100 | 87.21 | 99.96 | 99.91 | 98.50 | 98.64 | 100.0 |
| 1 | 87.24 | 99.85 | 99.85 | 99.37 | 99.45 | 99.98 |
| 2 | 89.93 | 100.0 | 99.99 | 99.73 | 99.84 | 100.0 |
| 3 | 86.48 | 100.0 | 99.59 | 99.16 | 98.68 | 100.0 |
| 4 | 99.95 | 99.95 | 100.0 | 98.21 | 99.21 | 100.0 |
| 5 | 95.78 | 99.89 | 100.0 | 100.0 | 98.16 | 100.0 |
| 6 | 97.69 | 99.97 | 100.0 | 97.45 | 98.36 | 100.0 |
| 7 | 95.44 | 100.0 | 100.0 | 100.0 | 99.11 | 100.0 |
| 8 | 84.40 | 100.0 | 99.77 | 95.12 | 98.26 | 100.0 |
| 9 | 100.0 | 100.0 | 100.0 | 99.36 | 99.12 | 100.0 |
Table 13. Classification results of different methods for the KSC dataset.

| | SVM | SSRN | 3D-ResNeXt | DBMA | SSAN | RGCSA |
| --- | --- | --- | --- | --- | --- | --- |
| OA (%) | 80.29 | 99.97 | 99.67 | 99.72 | 99.62 | 100.0 |
| AA (%) | 65.64 | 99.95 | 99.30 | 99.42 | 99.53 | 99.99 |
| Kappa × 100 | 77.98 | 99.97 | 99.63 | 99.50 | 99.58 | 99.99 |
| 1 | 92.16 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 2 | 86.16 | 99.40 | 99.38 | 97.16 | 98.41 | 100.0 |
| 3 | 42.55 | 100.0 | 97.40 | 98.45 | 97.65 | 100.0 |
| 4 | 67.69 | 100.0 | 99.40 | 100.0 | 99.45 | 99.99 |
| 5 | 0 | 100.0 | 95.76 | 100.0 | 100.0 | 100.0 |
| 6 | 54.71 | 100.0 | 100.0 | 99.58 | 99.69 | 100.0 |
| 7 | 0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 8 | 65.12 | 100.0 | 100.0 | 99.56 | 100.0 | 100.0 |
| 9 | 67.82 | 100.0 | 100.0 | 100.0 | 100.0 | 99.99 |
| 10 | 93.4 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 11 | 100.0 | 100.0 | 100.0 | 99.89 | 99.42 | 100.0 |
| 12 | 83.75 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 13 | 100.0 | 100.0 | 100.0 | 97.75 | 99.29 | 100.0 |
Table 14. Params, training time, and test time for different attention mechanisms on the IN, UP, and KSC datasets.

| Datasets | Methods | Params | Training Time (s) | Test Time (s) |
| --- | --- | --- | --- | --- |
| IN | 3D-ResNeXt | 1,554,288 | 3054.29 | 20.68 |
| IN | RGCA | 2,736,992 | 10,629.34 | 55.43 |
| IN | RSA | 3,290,400 | 12,593.92 | 71.47 |
| IN | RGCSA | 4,489,120 | 15,769.93 | 85.99 |
| UP | 3D-ResNeXt | 1,550,704 | 6077.30 | 54.85 |
| UP | RGCA | 2,733,408 | 18,297.08 | 155.33 |
| UP | RSA | 3,286,816 | 18,801.38 | 184.43 |
| UP | RGCSA | 4,485,536 | 25,837.94 | 235.55 |
| KSC | 3D-ResNeXt | 1,552,752 | 1253.42 | 10.33 |
| KSC | RGCA | 2,735,456 | 3584.96 | 30.23 |
| KSC | RSA | 3,288,864 | 4044.99 | 38.97 |
| KSC | RGCSA | 4,487,584 | 5142.95 | 47.34 |
Table 15. Training time and test time for different networks in the IN dataset.

| | Method | 2:1:7 | 3:1:6 | 4:1:5 | 5:1:4 |
| --- | --- | --- | --- | --- | --- |
| Training Time (s) | SSRN | 942.89 | 1059.21 | 1110.62 | 1262.90 |
| | 3D-ResNeXt | 2966.77 | 3054.29 | 2974.87 | 4035.33 |
| | RGCSA | 10,861.78 | 15,769.93 | 12,320.78 | 15,138.30 |
| Test Time (s) | SSRN | 9.63 | 8.36 | 7.25 | 5.51 |
| | 3D-ResNeXt | 25.87 | 20.68 | 16.65 | 14.75 |
| | RGCSA | 99.90 | 85.99 | 72.52 | 59.03 |
Table 16. Training time and test time for different networks in the UP dataset.

| | Method | 2:1:7 | 3:1:6 | 4:1:5 | 5:1:4 |
| --- | --- | --- | --- | --- | --- |
| Training Time (s) | SSRN | 2469.97 | 2821.26 | 2750.48 | 3282.24 |
| | 3D-ResNeXt | 6077.30 | 7095.93 | 6857.42 | 8410.72 |
| | RGCSA | 25,837.94 | 37,310.21 | 29,345.04 | 36,296.38 |
| Test Time (s) | SSRN | 25.01 | 23.24 | 17.83 | 13.41 |
| | 3D-ResNeXt | 54.85 | 52.18 | 39.32 | 36.74 |
| | RGCSA | 235.55 | 205.73 | 171.84 | 135.85 |
Table 17. Training time and test time for different networks in the KSC dataset.

| | Method | 2:1:7 | 3:1:6 | 4:1:5 | 5:1:4 |
| --- | --- | --- | --- | --- | --- |
| Training Time (s) | SSRN | 447.09 | 643.90 | 498.93 | 610.44 |
| | 3D-ResNeXt | 1253.42 | 1384.68 | 1345.30 | 1632.73 |
| | RGCSA | 5142.95 | 7292.39 | 5779.27 | 7094.23 |
| Test Time (s) | SSRN | 4.62 | 4.06 | 3.31 | 2.55 |
| | 3D-ResNeXt | 10.33 | 8.66 | 7.22 | 5.75 |
| | RGCSA | 47.34 | 39.70 | 33.82 | 28.22 |
