Article

Lightweight 3D Dense Autoencoder Network for Hyperspectral Remote Sensing Image Classification

Yang Bai, Xiyan Sun, Yuanfa Ji, Wentao Fu and Xiaoyu Duan
1
Information and Communication School, Guilin University of Electronic Technology, Guilin 541004, China
2
Guangxi Key Laboratory of Precision Navigation Technology and Application, Guilin University of Electronic Technology, Guilin 541004, China
3
National & Local Joint Engineering Research Center of Satellite Navigation Positioning and Location Service, Guilin 541004, China
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(20), 8635; https://doi.org/10.3390/s23208635
Submission received: 27 September 2023 / Revised: 16 October 2023 / Accepted: 20 October 2023 / Published: 22 October 2023
(This article belongs to the Section Remote Sensors)

Abstract

The lack of labeled training samples restricts the improvement of Hyperspectral Remote Sensing Image (HRSI) classification accuracy based on deep learning methods. To improve HRSI classification accuracy when only a few training samples are available, a Lightweight 3D Dense Autoencoder Network (L3DDAN) is proposed. Structurally, the L3DDAN is designed as a stacked autoencoder consisting of an encoder and a decoder. The encoder is a hybrid combination of 3D convolution operations and a 3D dense block for extracting deep features from raw data. The decoder, composed of 3D deconvolution operations, is designed to reconstruct the data. The L3DDAN is trained first by unsupervised learning without labeled samples and then by supervised learning with a small number of labeled samples. The network composed of the fine-tuned encoder and the trained classifier is used for classification tasks. Extensive comparative experiments on three benchmark HRSI datasets demonstrate that the proposed framework, with fewer trainable parameters, maintains performance superior to eight other state-of-the-art algorithms when only a few training samples are available. The proposed L3DDAN can be applied to HRSI classification tasks, such as vegetation classification. Future work will mainly focus on reducing training time and on applications to more real-world datasets.

1. Introduction

Hyperspectral Remote Sensing Images (HRSIs) contain abundant spectral and spatial information about ground objects, so they are widely used in land use and land cover [1,2], forestry [3,4], precision agriculture [5,6], environmental monitoring [7,8], and military surveillance [9,10]. In these applications, HRSI classification, whose purpose is to identify the class of ground object for every pixel, is a universal and significant step, and the classification accuracy largely determines the effectiveness of the application. Unfortunately, HRSI not only provides rich spectral and spatial features of ground objects, but also contains a large amount of redundant information, which increases the difficulty of feature extraction and reduces the classification accuracy. This is the so-called curse of dimensionality. In addition, it is expensive to specify the class of ground object for each pixel manually, so the shortage of labeled pixels further exacerbates the difficulty of feature extraction. The difficulty in improving HRSI classification accuracy with insufficient labeled samples restricts these applications. In order to extract effective features and improve classification accuracy, a large number of algorithms have been proposed.
Early methods mainly reduced the spectral dimensionality with hand-designed features. Band selection aims to select a subset of bands to replace the original data for feature extraction according to certain criteria, such as the semantic information of bands [11], the kernel similarity of discriminative information [12], and an affinity propagation algorithm based on unsupervised learning [13]. Although band selection methods can effectively reduce the dimensionality of the bands, the discarded bands may still contain information that is important for classification, resulting in reduced classification accuracy.
More dimensionality reduction methods are based on Feature Extraction (FE), which maps the original spectral bands into a new feature domain. Based on Principal Component Analysis (PCA), a linear unsupervised statistical transformation, the Tensor PCA (TPCA) [14], Joint Group Sparse PCA (JGSPCA) [15], and Superpixelwise Kernel PCA (SuperKPCA) [16] have been proposed for HRSI classification in the spectral domain. Morphological Attribute Profiles (MAP) [17] are another feature extraction method widely used in HRSI classification. Ye et al. [18] employed PCA and extended multiple attribute profiles (EMAP) to extract features. Liu et al. [19] combined MAP and a deep random forest for small-sample HRSI classification. Yan et al. [20] improved 2D singular spectral analysis (2DSSA) for extracting global and local spectral features by fusing PCA and folded PCA (FPCA). Traditional FE methods can only extract shallow features, which makes it difficult to further improve the classification accuracy.
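As a concrete illustration of the FE methods discussed above, the following minimal sketch applies plain PCA to reduce the spectral dimensionality of an HSI cube; the array shapes, variable names, and component count are illustrative assumptions rather than settings taken from the cited works.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(cube: np.ndarray, n_components: int = 30) -> np.ndarray:
    """Project the spectral bands of an (H, W, B) cube onto n_components PCA axes."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)            # one spectrum per pixel
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(flat)     # (H*W, n_components)
    return reduced.reshape(h, w, n_components)

# Example with synthetic data standing in for a real HRSI cube.
cube = np.random.rand(145, 145, 200).astype(np.float32)
print(reduce_bands(cube, 30).shape)       # (145, 145, 30)
```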
In recent years, Deep Learning (DL) has achieved significant success in image processing due to its powerful capability of deep feature extraction. Researchers have been inspired to introduce DL methods into HRSI classification tasks and have achieved better classification results than traditional methods. Chen et al. [21] proposed a 1D autoencoder network to extract spatial and spectral features. Jijón-Palma et al. [22] employed 1D Stacked Autoencoders (SAE) with three encoder and three decoder layers for pixel-based classification. Bai et al. [23] proposed a two-stage multi-dimensional convolutional SAE for HRSI classification, which was composed of the SAE-1 sub-model based on a 1D Convolutional Neural Network (CNN) and the SAE-2 sub-model based on 2D and 3D convolution operations. Zhao et al. [24] proposed a deep learning architecture combining an SAE and a 3D Deep Residual Network (3DDRN), where the SAE is designed for dimensionality reduction and the 3DDRN is used for extracting spatial–spectral joint features. Cheng et al. [25] proposed a Deep Two-stage Convolutional Sparse Coding Network (DTCSCNet) for HRSI classification without back propagation or a fine-tuning process. Although models based on SAE can be trained without any labeled samples, the existence of the decoder restricts the number of layers in the encoder, as overfitting easily occurs when the SAE is too deep. This constrains the improvement of feature extraction ability.
CNN is the most widely used model in HRSI classification [26]. Jacopo et al. [27] made a shallow 1D-CNN with only one hidden layer achieve state-of-the-art performance through label-based data augmentation. Li et al. [28] proposed DCNR, composed of a deep cube 1D-CNN and a random forest classifier, for extracting spectral and spatial information. From the perspective of kernel structure, 1D-CNNs are only suitable for extracting spectral features. If they are used to extract spatial features, the 2D spatial tensors must be converted into 1D vectors, and this transformation results in the loss of spatial information. In order to fully utilize the spatial information, 2D-CNNs were introduced for HRSI classification. Haque et al. [29] proposed multi-scale CNNs with three different sizes of 2D convolutional kernels to extract spectral and spatial features. Jia et al. [30] proposed an end-to-end deep 2D-CNN based on U-net which took the entire HSI as input instead of pixel patches. In 2D-CNNs, the spatial and spectral information are extracted separately, which ignores the fact that spatial and spectral features carry related information. Gao et al. [31] proposed a lightweight spatial–spectral network composed of a 3D Multi-group Feature Extraction Module (MGFM) based on 3D-CNN and a 2D MGFM based on Depthwise Separable Convolution. Yu et al. [32] proposed a lightweight 2D-3D-CNN combining 2D and 3D convolutional layers, which can extract fused spatial and spectral features. Firat et al. [33] proposed a 3D-CNN based on the LeNet-5 model to extract spatial and spectral features from data processed by the PCA method. Due to the structural consistency between 3D convolution kernels and 3D-cube HRSIs, 3D-CNNs can effectively extract spatial–spectral joint features. Unfortunately, the excellent feature extraction ability of 3D-CNNs requires sufficient labeled samples for training, but the high cost of labeling samples means that labeled samples for training are often insufficient.
Recently, how to improve the feature extraction ability of 3D-CNNs with a small training sample size has become a research hotspot. Li et al. [34] proposed MMFN based on 3D-CNN, in which multi-scale architecture and residual blocks are introduced to fuse related information among different scale features and extract more discriminative features. Zhou et al. [35] proposed a Shallow-to-Deep Feature Enhancement (SDFE) model, which was composed of PCA, a shallow 3D-CNN (SSSFE), a channel attention residual 2D-CNN, and a Vision-Transformer network. Ma et al. [36] proposed a Multi-level Feature extraction Block (MFB) for spatial–spectral feature extraction and a spatial multi-scale interactive attention (SMIA) module for spatial feature enhancement. Paoletti et al. [37] proposed a pyramidal residual module architecture, which is used to build deep pyramidal residual networks for HRSI classification. In addition to residual structure, attention mechanism is also widely used to improve the feature extraction ability of CNNs. Zhu et al. [38] proposed a Residual Spectral–Spatial Attention Network (RSSAN) for HRSI classification. In this model, the raw data are sequentially processed through a spectral attention module, a spatial attention module, and a residual block. In the Cooperative Spectral–Spatial Attention Network (CS2ADN) [39], the spectral and spatial features are extracted by the independent spectral and spatial attention branches, respectively. Then, the fused features are further extracted by dense connection structure. Generally, there are two residual branches in a model, which are named the spatial residual branch and the spectral residual branch. Similarly, there are two attention branches named the spatial attention branch and the spectral attention branch, respectively, in a model based on attention mechanism. This structure composed of two independent branches cannot extract spatial–spectral joint features and increases the number of trainable parameters in the model. When the number of labeled samples is small, the trainable parameters of models cannot be fully trained and the classification accuracy will decrease.
To improve the HRSI classification accuracy of deep-learning-based models with a small number of labeled samples, a model with a small number of trainable parameters and a robust deep feature extraction capability is necessary. Inspired by these studies, a Lightweight 3D Dense Autoencoder Network (L3DDAN) is proposed in this paper for HRSI classification. At the top level, the network is an SAE composed of an encoder for extracting features and a decoder for reconstructing data. First, the SAE is trained without any labeled samples through unsupervised learning. Then, all labeled samples are randomly divided into a training group, a validation group, and a testing group. The encoder is fine-tuned and the classifier is trained with the small training and validation groups by supervised learning. Finally, the classification ability of the trained classifier is evaluated with the testing group. The experimental results indicate that the L3DDAN can extract deep spatial–spectral features with only a small number of labeled samples, so high classification accuracy can still be achieved when labeled samples are insufficient. The major contributions of this paper include the following:
(1)
A Lightweight 3D Dense Autoencoder Network (L3DDAN) is proposed for HRSI classification. The architecture of L3DDAN is an SAE based on 3D convolution operations and the Spectral–Spatial Joint Dense Block (S2DB) is introduced into the encoder to enhance the deep feature extraction ability. The high classification accuracy can be maintained when the number of training samples is small.
(2)
An SAE architecture is proposed to train the encoder by unsupervised learning. The encoder of the SAE is composed of 3D convolution operations and the S2DB to extract deep features from HRSI. The decoder is implemented by 3D deconvolution operations. The SAE enables the L3DDAN to extract deep features from the original data without labeled samples.
(3)
The Spectral–Spatial Joint Dense Block (S2DB) is proposed to replace the traditional separated spatial residual branch and spectral residual branch. The S2DB not only avoids the loss of spectral–spatial joint features, but also reduces the number of trainable parameters in L3DDAN.
The rest of this paper is organized as follows. In Section 2, the detailed framework and related principle of L3DDAN are illustrated. In Section 3, the details and results of extensive experiments are provided. Finally, the conclusion is presented in Section 4.

2. Methodology

2.1. 3D Convolution Operation

The structure of HRSI data is a 3D cube that contains both spatial and spectral features. Figure 1 illustrates the principle of HRSI feature extraction by 1D, 2D, and 3D convolution operations, respectively. The 1D kernels in a 1D-CNN can only extract spectral features and cannot extract spatial features. Similarly, only the spatial information in HRSIs can be extracted by 2D convolution kernels. Only 3D convolution kernels can extract both spatial and spectral features simultaneously, as they are structurally consistent with the 3D-cube HRSI data. In order to improve the feature extraction ability of the proposed model, all required convolution operations in this paper are implemented by 3D convolution kernels. This design enables the traditional separated spatial and spectral feature extraction branches to be replaced by just one block.
The 3D convolution operation can be formulated as follows:
$$v_{i,j}^{x,y,z} = f\!\left(\sum_{m}\sum_{h=0}^{H_i-1}\sum_{w=0}^{W_i-1}\sum_{s=0}^{S_i-1} k_{i,j,m}^{h,w,s}\, v_{(i-1),m}^{(x+h)(y+w)(z+s)} + b_{i,j}\right)$$
where $v_{i,j}^{x,y,z}$ denotes the activation value at position $(x, y, z)$ in the $j$th feature map of the $i$th layer, $m$ indexes the feature maps in the $(i-1)$th layer, $k_{i,j,m}^{h,w,s}$ is the kernel weight at position $(h, w, s)$ connected to the $m$th feature map in the preceding layer, $b_{i,j}$ is the bias, $f(\cdot)$ is the activation function, and $H_i$, $W_i$, and $S_i$ are the height, width, and spectral dimension of the kernel, respectively.
In the L3DDAN, all convolution operations are implemented by 3D convolution kernels to avoid the loss of spatial–spectral joint features as much as possible. In addition, 3D convolution operations are used for spectral dimensionality reduction. This design allows the hyperspectral data to be used directly as the input of the L3DDAN without any preprocessing.
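The following short sketch illustrates, in PyTorch, how a 3D convolution with a (1, 1, 7) kernel and a spectral stride of 3 (the configuration of the first encoder layer in Table 1) reduces the band dimension while leaving the spatial size unchanged; the patch shape used here is only an example.

```python
import torch
import torch.nn as nn

# A (1, 1, 7) 3D kernel with spectral stride 3 slides only along the band axis,
# so it reduces the spectral dimension while keeping the spatial size unchanged.
conv = nn.Conv3d(in_channels=1, out_channels=20,
                 kernel_size=(1, 1, 7), stride=(1, 1, 3))

# One 9x9 spatial patch with 103 bands, laid out as (batch, channel, height, width, bands).
x = torch.randn(1, 1, 9, 9, 103)
y = conv(x)
print(y.shape)   # torch.Size([1, 20, 9, 9, 33]) because (103 - 7) // 3 + 1 = 33
```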

2.2. Spectral–Spatial Joint Dense Block

He et al. [40] proposed the residual structure to solve the degradation problem of deep neural networks and to make feature extraction easier without adding any learnable parameters. Many models [41] for HRSI classification have been inspired by the residual block. Generally, there are two independent residual blocks in these models to extract spectral and spatial features, respectively. This two-branch structure not only fails to extract spectral–spatial joint features, but also increases the number of trainable parameters. In this paper, a Spectral–Spatial Joint Residual Block (S2RB) based on 3D convolution operations is proposed. In the S2RB shown in Figure 2, features are extracted from the input hyperspectral data cube $x \in \mathbb{R}^{N \times W \times H}$ by 3D convolution operations to form the output feature map $F(x) \in \mathbb{R}^{N \times W \times H'}$. The spatial dimensions of $x$ and $F(x)$ are equal, and the spectral dimension is reduced from $H$ to $H'$. The introduction of a skip connection changes the mapping from $F(x)$ to $H(x) = F(x) + x$.
To further improve the feature extraction ability of the model, a Spectral–Spatial Joint Dense Block (S2DB) is proposed, inspired by the Dense Convolutional Network [42]. Different from the S2RB, there are skip connections between each layer and all preceding layers in the S2DB, as shown in Figure 3. Consequently, the output feature map of the first layer is formulated as $H_1 = F_1 + x$, which is the same as that of the S2RB. For the second layer, however, the output feature map of the S2DB is formulated as $H_2 = F_2 + F_1 + x$. Compared with the $H_2$ of the S2RB, there is an additional term $x$ in the $H_2$ of the S2DB.
Previous models are typically composed of two separate branches for extracting spectral and spatial features, respectively. Because both the S2RB and the S2DB can extract spatial–spectral joint features, a single-branch structure can be adopted in the L3DDAN.
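A minimal sketch of the additive dense connection described above is given below, assuming 3D convolutions whose padding keeps the feature-map shape so that the skip terms can be summed; the channel count, layer count, padding, and use of batch normalization are illustrative assumptions rather than the exact S2DB configuration. Note that, following the formulation above, the skips are additive sums rather than the channel concatenations used in the original DenseNet.

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    """A sketch of a spectral-spatial joint dense block: every layer is a 3D
    convolution, and its output is added to the outputs of all preceding layers
    and the block input (H_l = F_l + ... + F_1 + x)."""
    def __init__(self, channels: int = 10, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=(1, 1, 7),
                          stride=1, padding=(0, 0, 3)),   # keeps spatial and spectral size
                nn.BatchNorm3d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_layers)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The running sum equals x plus all F_i computed so far, which is the
        # additive form of connecting each layer to every preceding one.
        accumulated = x
        for layer in self.layers:
            accumulated = accumulated + layer(accumulated)
        return accumulated

x = torch.randn(2, 10, 9, 9, 33)        # (batch, channels, height, width, bands)
print(DenseBlock3D()(x).shape)          # same shape as the input
```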

2.3. Proposed Framework

The framework of the proposed L3DDAN is shown in Figure 4. Structurally, the model consists of an encoder, a decoder, and a classifier. The training and testing process is as follows: (1) The SAE composed of the encoder and decoder is first trained by unsupervised learning without any labeled samples. The purposes of the encoder and decoder are to extract deep features from unpreprocessed data and to reconstruct the input data from the feature maps, respectively. (2) After the training of the SAE, the output feature maps of the encoder are fed into the classifier. The fine-tuning of the trained encoder and the training of the classifier are completed simultaneously through supervised learning with a small number of labeled samples. The decoder, whose only function is to support the training of the encoder, is not involved in this step. (3) The model composed of the encoder and classifier performs classification tasks on the test dataset.
The encoder of the SAE is composed of 3D convolution operations and the S2DB. The input data of the L3DDAN are patches of raw data without any preprocessing. The spectral dimensionality of the input data is reduced by the first 3D convolution layer. The S2DB is used to extract spatial–spectral joint features from the output of the previous layer. The deep features of the raw data are obtained from the output feature maps of the S2DB by the final 3D convolution layer of the encoder. The output feature maps of the encoder are reconstructed by the decoder of the SAE, which is composed of two 3D deconvolution layers. In the classifier, the feature maps from the encoder are pooled and then classified by a fully connected layer. Table 1 lists the detailed structure of the L3DDAN.
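The following simplified sketch illustrates the encoder–decoder–classifier arrangement and the two training stages in PyTorch; the layer sizes, class names, and losses are chosen only to make the shapes consistent and deliberately do not reproduce the exact configuration in Table 1.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, feat: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, feat, kernel_size=(1, 1, 7), stride=(1, 1, 3)),  # spectral reduction
            nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, kernel_size=(3, 3, 3), padding=1),      # joint spatial-spectral features
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, feat: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(feat, feat, kernel_size=(3, 3, 3), padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(feat, 1, kernel_size=(1, 1, 7), stride=(1, 1, 3)),  # back to raw band count
        )
    def forward(self, z):
        return self.net(z)

class Classifier(nn.Module):
    def __init__(self, feat: int = 20, num_classes: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(feat, num_classes)
    def forward(self, z):
        return self.fc(self.pool(z).flatten(1))

encoder, decoder, classifier = Encoder(), Decoder(), Classifier()
patch = torch.randn(8, 1, 9, 9, 103)                      # unlabeled raw patches

# Stage 1: unsupervised pretraining of the SAE with a reconstruction loss.
recon = decoder(encoder(patch))
stage1_loss = nn.MSELoss()(recon, patch)

# Stage 2: supervised fine-tuning of encoder + classifier on the few labeled patches;
# the decoder is no longer involved. An optimizer would minimize these losses in practice.
labels = torch.randint(0, 16, (8,))
logits = classifier(encoder(patch))
stage2_loss = nn.CrossEntropyLoss()(logits, labels)
```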

3. Results and Discussion

3.1. Datasets Description

Three benchmark datasets were selected to validate the performance of the proposed L3DDAN for HRSI classification. The first dataset, Indian Pines (IP), was acquired by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) instrument over a mixed vegetation site in northwestern Indiana, USA. It contains 16 cover types and 145 × 145 pixels with 220 bands covering 0.4 μm to 2.5 μm. The 20 noise-affected bands (104–108, 150–163, 220) were removed and the remaining 200 bands were utilized for the experiments. The labeled pixels of IP are randomly divided into training (3%), validation (3%), and testing (94%) groups. The second dataset, University of Pavia (UP), was gathered by the Reflective Optics System Imaging Spectrometer (ROSIS) over the urban area of Pavia, northern Italy. It contains 610 × 340 pixels with 1.3 m spatial resolution and 103 spectral bands covering 0.4 μm to 1 μm. All labeled pixels are divided into nine cover types. The third dataset, Salinas Valley (SV), was captured over the Salinas Valley, CA, USA. It contains 512 × 217 pixels with 3.7 m/pixel spatial resolution and 204 bands in the wavelength range from 0.4 μm to 2.5 μm. The labeled pixels of UP and SV are randomly divided into training (1%), validation (1%), and testing (98%) groups. The minimum number of samples per class in each group is three for all three datasets. The pseudocolor images and ground-truth maps of the three datasets are shown in Figure 5. The categories and pixel counts for each dataset are listed in Table 2, Table 3 and Table 4.
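A minimal sketch of the per-class random splitting protocol described above (equal training and validation ratios, a minimum of three samples per class, and the remainder assigned to testing) is given below; the function and variable names are illustrative.

```python
import numpy as np

def stratified_split(labels: np.ndarray, ratio: float = 0.03, minimum: int = 3, seed: int = 0):
    """Randomly split labeled-pixel indices per class into train/val/test groups.
    `labels` holds the class id of every labeled pixel; the same ratio is used for
    training and validation, with at least `minimum` samples each per class."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n = max(minimum, int(round(ratio * idx.size)))
        train.extend(idx[:n])
        val.extend(idx[n:2 * n])
        test.extend(idx[2 * n:])
    return np.array(train), np.array(val), np.array(test)

# Example: a 3% / 3% / 94% split of Indian Pines-style labels.
labels = np.random.randint(0, 16, size=10249)
tr, va, te = stratified_split(labels, ratio=0.03)
print(len(tr), len(va), len(te))
```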

3.2. Parameter Analysis

In the SAE training stage, all samples were used for unsupervised learning without any labels. Then, all labeled samples were randomly divided into a training group, a validation group, and a testing group. The training group was used for fine-tuning the encoder and training the classifier. The validation and testing groups were used to monitor the training process and to evaluate the classification performance of the L3DDAN, respectively. In the experiments, the overall accuracy (OA), average accuracy (AA), and Kappa coefficient [43] are used to quantitatively evaluate the classification performance. The proposed model was implemented in the open-source PyTorch 1.10 framework. All experiments were conducted on a PC (Lenovo, Shanghai, China) with an Intel(R) Core i7 CPU, an Nvidia GeForce RTX 3090 GPU, and 64 GB of RAM.
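For reference, the three metrics can be computed from the confusion matrix as in the following sketch; this is the standard formulation of OA, AA, and the Kappa coefficient, not code from the paper.

```python
import numpy as np

def oa_aa_kappa(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))                   # per-class recall, averaged
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2    # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(oa_aa_kappa(y_true, y_pred, 3))   # (0.666..., 0.666..., 0.5)
```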

3.2.1. Effect Analysis of the S2DB

To evaluate the effectiveness of the S2DB in L3DDAN, two comparative networks named Model-1 and Model-2 were constructed. The Model-1 and Model-2 networks were constructed by replacing the S2DB in L3DDAN with normal 3D convolution operations and S2RB, respectively. The three networks are identical except for the above difference. Comparative experiments were conducted under identical parameters and the experimental results are shown in Figure 6. It can be seen that the highest classification accuracies were achieved on all three datasets by L3DDAN, followed by the Model-2. This indicates that the dense connection structure further improves the feature extraction ability of the network compared to residual skip connection.

3.2.2. Effect of the Patch Size

In the experiments, small 3D neighboring patches $P \in \mathbb{R}^{S \times S \times L}$ extracted from the HRSI data cube were used as the input of the L3DDAN, where S is the number of pixels in each spatial dimension and L is the number of spectral bands. In this section, a series of comparative experiments was conducted to determine the value of S. The experimental results are shown in Figure 7.
As the spatial size increases from 3 to 17, the classification accuracies first increase and then decrease. The OA reaches its maximum value when the spatial size of the patches is 7, 9, and 11 for the IP, UP, and SV datasets, respectively. This indicates that appropriately increasing the patch size brings more spatial information, but oversized patches introduce noise and reduce the classification accuracy. Therefore, patch sizes of 7 × 7 × L, 9 × 9 × L, and 11 × 11 × L were the optimal selections for the three datasets in the proposed network.
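A minimal sketch of the patch construction is given below; the mirror padding used at the image borders is an illustrative assumption, since the paper does not specify how border pixels are handled.

```python
import numpy as np

def extract_patch(cube: np.ndarray, row: int, col: int, s: int = 9) -> np.ndarray:
    """Cut the S x S x L neighborhood centred on (row, col) from an (H, W, L) cube,
    mirror-padding the spatial borders so that edge pixels also get full patches."""
    half = s // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + s, col:col + s, :]

cube = np.random.rand(145, 145, 200).astype(np.float32)
patch = extract_patch(cube, row=0, col=144, s=9)
print(patch.shape)   # (9, 9, 200)
```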

3.2.3. Impact of the Number of Training Epochs

To determine the number of training epochs, the loss and classification accuracy curves of the training and validation groups on all three datasets are plotted against the number of training epochs in Figure 8. It can be seen that all curves converge after 500 epochs on both the training and validation groups. This indicates that the deep features extracted from the training group are effective for the classification of the validation group. The number of training epochs was therefore set to 500 for all experiments.

3.3. Performance Evaluation

3.3.1. Comparison of Classification Results

In this section, the proposed L3DDAN was compared with eight state-of-the-art algorithms including 1D-CNN [44], 2D-CNN [45], 3D-CNN [46], 3D-CAE [47], DBDA [48], DBMA [49], DSGSF [50], and AMGCFN [51]. These methods cover unsupervised learning and supervised learning algorithms with different architectures, such as 1D-CNN, 2D-CNN, 3D-CNN, autoencoder, and attention mechanism network. The architectures and hyperparameters of the above-mentioned comparative models are described in the corresponding published papers.
The classification results of the IP dataset are presented in Table 5. From Table 5, it can be seen that the 1D-CNN achieves the worst results with only 75.67% OA. The main reason is that spatial information is not utilized for classification. Compared with the 1D-CNN, the 2D-CNN, 3D-CNN, and 3D-CAE improve the OA to 83.48%, 84.08%, and 83.20%, respectively, because they consider both spectral and spatial information simultaneously. The accuracies of DBDA and DBMA are further improved by introducing the attention mechanism and dense structure. The L3DDAN achieves the highest classification accuracy thanks to the autoencoder structure and the spectral–spatial joint dense block.
The classification maps of the different methods for the IP dataset are shown in Figure 9. It can be observed that there are a large number of misclassified pixels in Figure 9b–d. In contrast, the classification map of the L3DDAN is the most similar to the ground-truth.
The classification results of the UP dataset are reported in Table 6. From Table 6, it can be seen that the L3DDAN achieved the best performance on OA (99.31%), AA (98.67%), Kappa (0.9908), and 6 of 9 specific classes. The worst result is achieved by 1D-CNN and the results of DBDA and DBMA are similar.
Figure 10 shows classification maps of different methods for the UP dataset. More misclassified pixels appear in some classes with few labeled pixels such as Gravel and Bare Soil.
The classification results for the SV dataset are presented in Table 7. From Table 7, it can be seen that the L3DDAN obtains the best results with 99.64% OA, 99.69% AA, and 0.9960 Kappa.
The classification maps for the SV dataset are shown in Figure 11. There are obvious mislabeled pixels in the Vineyard—untrained and Grapes—untrained areas of the classification maps of the 1D-CNN, 2D-CNN, and DBMA. The classification map of the proposed method shows fewer mislabeled pixels than those of the other methods.

3.3.2. Impact of Training Sample Size

Due to the limited number of labeled HRSI samples, the classification performance with a small number of training samples becomes particularly important. A series of experiments was conducted to explore how the OA varies with the proportion of training samples for all methods. For the IP dataset, the training sample proportion is set to 1%, 3%, 5%, 10%, and 15%; for the UP and SV datasets, it is set to 0.5%, 1%, 3%, 5%, and 7%. The experimental results are shown in Figure 12.
From Figure 12, the classification accuracy of the 1D-CNN is the lowest in most cases, and increasing the number of training samples yields only a limited improvement in OA. The reason is that spatial features are discarded and the feature extraction capability of the 1D-CNN is insufficient. The classification accuracies of the other methods are close when the proportion of training samples is greater than 10% for the IP dataset and 3% for the UP and SV datasets. As the training sample proportion decreases, the classification accuracies of all methods decline to different degrees. The proposed L3DDAN maintains the highest classification accuracy on all datasets, which indicates that the L3DDAN can extract more discriminative features from limited training samples for classification.

3.3.3. Comparison of Parameter Quantity

The number of trainable parameters is an important indicator for evaluating the model. Table 8 lists the numbers of trainable parameters for all methods.
For all datasets, the number of trainable parameters of the proposed L3DDAN is the second smallest. For the IP dataset, the number of trainable parameters in the L3DDAN is reduced to 4.2%, 0.6%, 30.3%, 44.5%, 27.9%, 38.97%, and 46.23% of those in the 2D-CNN, 3D-CNN, 3D-CAE, DBDA, DBMA, DSGSF, and AMGCFN, respectively. For the UP and SV datasets, the numbers of trainable parameters of the L3DDAN are also significantly reduced.
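Trainable-parameter counts of the kind reported in Table 8 can be obtained for any PyTorch model with a one-line helper such as the sketch below (applied here to a toy module, not to the actual networks).

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Number of trainable parameters; for Table 8 the decoder would be excluded,
    since it is not used for classification."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example on a toy module: a single 3D convolution followed by a linear layer.
toy = nn.Sequential(nn.Conv3d(1, 20, (1, 1, 7)), nn.Linear(20, 16))
print(count_trainable(toy))
```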

3.4. Discussion

Based on the results in Table 5, Table 6 and Table 7, it is evident that the classification results of the 1D-CNN (75.67% for IP, 86.25% for UP, and 89.19% for SV) are the lowest on all datasets. This indicates that the feature extraction ability of 1D convolution kernels is insufficient for HRSI, because they can only extract spectral features. By introducing spatial features, the classification accuracies of the 2D-CNN (83.48% for IP, 95.31% for UP, and 97.17% for SV) and the 3D-CNN (84.08% for IP, 92.26% for UP, and 94.42% for SV) are significantly improved. It is notable that the classification results of the 3D-CNN are lower than those of the 2D-CNN on the UP and SV datasets. This indicates that the feature extraction ability of convolution-based models for HRSI classification depends not only on the kernel type, but also on the model architecture. The classification accuracies of the 3D-CAE (95.20% for UP and 96.12% for SV) are further improved by introducing the SAE into the 3D-CNN, which indicates that deep features can be learned from unlabeled data. The other compared methods significantly improve their feature extraction capabilities by introducing multi-branch structures. In DBDA and DBMA, both the spatial and spectral branches are based on the attention mechanism; the difference between the two models is that the two branches are parallel in DBDA and serial in DBMA. In the proposed L3DDAN, the two separate branches are replaced by a spatial–spectral joint branch to extract deep spatial–spectral joint features from HRSI. The classification accuracies of the L3DDAN (97.65% for IP, 99.31% for UP, and 99.64% for SV) are higher than those of DBDA (94.97% for IP, 98.65% for UP, and 98.35% for SV) and DBMA (94.76% for IP, 98.62% for UP, and 97.57% for SV).
The experimental results in Figure 12 indicate that all methods except the 1D-CNN can achieve satisfactory classification accuracies when there are sufficient training samples. As the training sample proportion decreases, the classification accuracies of the 2D-CNN, 3D-CNN, and 3D-CAE decrease more significantly, even falling below that of the 1D-CNN for UP and SV. Based on Table 8, the main reason for these results is that there are far more trainable parameters in the 2D-CNN (4,013,386) and 3D-CNN (30,536,176) than in the 1D-CNN (72,216). The numbers of trainable parameters of DBDA (382,326), DBMA (609,791), DSGSF (436,791), and AMGCFN (368,217) are far fewer than those of the 2D-CNN and 3D-CNN, and these models achieve better classification accuracies. The number of trainable parameters in the L3DDAN is further reduced by the introduction of the spatial–spectral joint branch, and it maintains the highest classification accuracies with the minimum number of training samples.

4. Conclusions

In this paper, a Lightweight 3D Dense Autoencoder Network is proposed for HRSI classification. The framework of the L3DDAN is designed as an SAE to utilize unlabeled samples by unsupervised learning. In addition, the Spectral–Spatial Joint Dense Block is introduced to replace the traditional separated spatial and spectral feature extraction blocks. This architecture not only improves the spatial–spectral joint feature extraction ability of the L3DDAN, but also reduces the number of trainable parameters. Extensive experimental results demonstrate that the L3DDAN surpasses eight state-of-the-art methods with different frameworks in classification accuracy when only a small number of training samples is available. In addition, the L3DDAN can still maintain excellent deep feature extraction ability when the number of training samples is decreased. This is extremely important for HRSI classification with limited labeled samples.
However, the introduction of the SAE leads to a significant increase in training time, because all samples are used for the training of the encoder and decoder. The training time of the proposed L3DDAN is greater than that of the other methods except the 3D-CAE, and this process consumes significant computational resources. In addition, the generalizability of the L3DDAN on more HRSI datasets still needs to be verified. The future directions of our work mainly focus on these two limitations: (1) the architecture of the SAE will be optimized to reduce the time consumption; (2) the parameters will be optimized so that the L3DDAN exhibits state-of-the-art performance on more datasets in applications.

Author Contributions

Y.B.: Conceptualization, methodology, investigation, validation, formal analysis, visualization, writing—original draft. X.S.: Conceptualization, investigation, funding acquisition. Y.J.: Conceptualization, investigation, funding acquisition. W.F.: Resources, software, visualization, writing—review and editing. X.D.: Investigation, validation, formal analysis, visualization, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Key Laboratory of Precision Navigation Technology and Application, Guilin University of Electronic Technology (No. DH202208, No. DH202215), the Project for Enhancing Young and Middle-aged Teacher’s Research Basic Ability in Colleges of Guangxi (2023KY0198) and the Science and Technology Major Project of Guangxi (No. AD22080061).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in this research are openly accessible online (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes, accessed on 15 October 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HRSI	Hyperspectral Remote Sensing Image
PCA	Principal Component Analysis
MAP	Morphological Attribute Profiles
DL	Deep Learning
SAE	Stacked Autoencoder
3DDRN	3D Deep Residual Network
CNN	Convolutional Neural Network
CS2ADN	Cooperative Spectral–Spatial Attention Network
L3DDAN	Lightweight 3D Dense Autoencoder Network
S2DB	Spectral–Spatial Joint Dense Block
S2RB	Spectral–Spatial Joint Residual Block
IP	Indian Pines
AVIRIS	Airborne Visible Infrared Imaging Spectrometer
UP	University of Pavia
ROSIS	Reflective Optics System Imaging Spectrometer
SV	Salinas Valley
OA	Overall Accuracy
AA	Average Accuracy
FE	Feature Extraction

References

  1. Tan, X.; Xue, Z. Spectral-spatial multi-layer perceptron network for hyperspectral image land cover classification. Eur. J. Remote Sens. 2022, 55, 409–419. [Google Scholar] [CrossRef]
  2. Moharram, M.A.; Sundaram, D.M. Land use and land cover classification with hyperspectral data: A comprehensive review of methods, challenges and future directions. Neurocomputing 2023, 536, 90–113. [Google Scholar] [CrossRef]
  3. Zhang, B.; Zhao, L.; Zhang, X. Three-dimensional convolutional neural network model for tree species classification using airborne hyperspectral images. Remote Sens. Environ. 2020, 247, 111938. [Google Scholar] [CrossRef]
  4. Tong, F.; Zhang, Y. Spectral-Spatial and Cascaded Multilayer Random Forests for Tree Species Classification in Airborne Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 21764773. [Google Scholar] [CrossRef]
  5. Sethy, P.K.; Pandey, C.; Sahu, Y.K.; Behera, S.K. Hyperspectral imagery applications for precision agriculture—A systemic survey. Multimed. Tools Appl. 2022, 81, 3005–3038. [Google Scholar] [CrossRef]
  6. Lu, B.; Dao, P.D.; Liu, J.; He, Y.; Shang, J. Recent Advances of Hyperspectral Imaging Technology and Applications in Agriculture. Remote Sens. 2020, 12, 2569. [Google Scholar] [CrossRef]
  7. Stuart, M.B.; McGonigle, A.J.S.; Willmott, J.R. Hyperspectral Imaging in Environmental Monitoring: A Review of Recent Developments and Technological Advances in Compact Field Deployable Systems. Sensors 2019, 19, 3071. [Google Scholar] [CrossRef]
  8. Stuart, M.B.; Davies, M.; Hobbs, M.J.; Pering, T.D.; McGonigle, A.J.S.; Willmott, J.R. High-Resolution Hyperspectral Imaging Using Low-Cost Components: Application within Environmental Monitoring Scenarios. Sensors 2022, 22, 4652. [Google Scholar] [CrossRef] [PubMed]
  9. Shimoni, M.; Haelterman, R.; Perneel, C. Hyperspectral Imaging for Military and Security Applications Combining myriad processing and sensing techniques. IEEE Geosci. Remote Sens. Mag. 2019, 7, 101–117. [Google Scholar] [CrossRef]
  10. Gross, W.; Queck, F.; Voegtli, M.; Schreiner, S.; Kuester, J.; Boehler, J.; Mispelhorn, J.; Kneubuehler, M.; Middelmann, W. A Multi-Temporal Hyperspectral Target Detection Experiment—Evaluation of Military Setups. In Target and Background Signatures VII; Electr Network; SPIE: Bellingham, WA, USA, 2021. [Google Scholar]
  11. Sellami, A.; Farah, M.; Farah, I.R.; Solaiman, B. Hyperspectral Imagery Semantic Interpretation Based on Adaptive Constrained Band Selection and Knowledge Extraction Techniques. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2018, 11, 1337–1347. [Google Scholar] [CrossRef]
  12. Feng, J.; Jiao, L.; Sun, T.; Liu, H.; Zhang, X. Multiple Kernel Learning Based on Discriminative Kernel Clustering for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6516–6530. [Google Scholar] [CrossRef]
  13. Jia, S.; Ji, Z.; Qian, Y.; Shen, L. Unsupervised Band Selection for Hyperspectral Imagery Classification Without Manual Band Removal. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2012, 5, 531–543. [Google Scholar] [CrossRef]
  14. Ren, Y.; Liao, L.; Maybank, S.J.; Zhang, Y.; Liu, X. Hyperspectral Image Spectral-Spatial Feature Extraction via Tensor Principal Component Analysis. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1431–1435. [Google Scholar] [CrossRef]
  15. Khan, Z.; Shafait, F.; Mian, A. Joint Group Sparse PCA for Compressed Hyperspectral Imaging. IEEE Trans. Image Process. 2015, 24, 4934–4942. [Google Scholar] [CrossRef] [PubMed]
  16. Zhang, L.; Su, H.; Shen, J. Hyperspectral Dimensionality Reduction Based on Multiscale Superpixelwise Kernel Principal Component Analysis. Remote Sens. 2019, 11, 1219. [Google Scholar] [CrossRef]
  17. Ghamisi, P.; Dalla Mura, M.; Benediktsson, J.A. A Survey on Spectral-Spatial Classification Techniques Based on Attribute Profiles. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2335–2353. [Google Scholar] [CrossRef]
  18. Ye, Z.; Yan, Y.; Bai, L.; Hui, M. Feature Extraction Based on Morphological Attribute Profiles for Classification of Hyperspectral Image. In Proceedings of the Tenth International Conference on Digital Image Processing (ICDIP 2018), Shanghai, China, 11–14 May 2018; SPIE: Bellingham, WA, USA, 2018; Volume 10806. [Google Scholar]
  19. Liu, B.; Guo, W.; Chen, X.; Gao, K.; Zuo, X.; Wang, R.; Yu, A. Morphological Attribute Profile Cube and Deep Random Forest for Small Sample Classification of Hyperspectral Image. IEEE Access. 2020, 8, 117096–117108. [Google Scholar] [CrossRef]
  20. Yan, Y.; Ren, J.; Liu, Q.; Zhao, H.; Sun, H.; Zabalza, J. PCA-Domain Fused Singular Spectral Analysis for Fast and Noise-Robust Spectral-Spatial Feature Mining in Hyperspectral Classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5505405. [Google Scholar] [CrossRef]
  21. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep Learning-Based Classification of Hyperspectral Data. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
  22. Jijón-Palma, M.E.; Kern, J.; Amisse, C.; Centeno, J.A.S. Improving stacked-autoencoders with 1D convolutional-nets for hyperspectral image land-cover classification. J. Appl. Remote Sens. 2021, 15, 26506. [Google Scholar] [CrossRef]
  23. Bai, Y.; Sun, X.; Ji, Y.; Fu, W.; Zhang, J. Two-stage multi-dimensional convolutional stacked autoencoder network model for hyperspectral images classification. Multimed. Tools Appl. 2023. [Google Scholar] [CrossRef]
  24. Zhao, J.; Hu, L.; Dong, Y.; Huang, L.; Weng, S.; Zhang, D. A combination method of stacked autoencoder and 3D deep residual network for hyperspectral image classification. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102459. [Google Scholar] [CrossRef]
  25. Cheng, C.; Peng, J.; Cui, W. A Two-Stage Convolutional Sparse Coding Network for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5501905. [Google Scholar] [CrossRef]
  26. Bai, Y.; Sun, X.; Ji, Y.; Huang, J.; Fu, W.; Shi, H. Bibliometric and visualized analysis of deep learning in remote sensing. Int. J. Remote Sens. 2022, 43, 5534–5571. [Google Scholar] [CrossRef]
  27. Jacopo, A.; Elena, M.; Lutgarde, M.C.B.; Thanh, T.; Twan, V.L. Spectral-Spatial Classification of Hyperspectral Images: Three Tricks and a New Learning Setting. Remote Sens. 2018, 10, 1156. [Google Scholar]
  28. Li, T.; Leng, J.; Kong, L.; Guo, S.; Bai, G.; Wang, K. DCNR: Deep cube CNN with random forest for hyperspectral image classification. Multimed. Tools Appl. 2019, 78, 3411–3433. [Google Scholar] [CrossRef]
  29. Haque, M.R.; Mishu, S.Z. Spectral-Spatial Feature Extraction Using PCA and Multi-Scale Deep Convolutional Neural Network for Hyperspectral Image Classification. In Proceedings of the 2019 22nd International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 18–20 December 2019; pp. 1–6. [Google Scholar]
  30. Jia, Z.; Lu, W. An End-to-End Hyperspectral Image Classification Method Using Deep Convolutional Neural Network With Spatial Constraint. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1786–1790. [Google Scholar] [CrossRef]
  31. Gao, H.; Zhu, M.; Wang, X.; Li, C.; Xu, S. Lightweight Spatial-Spectral Network Based on 3D-2D Multi-Group Feature Extraction Module for Hyperspectral Image Classification. Int. J. Remote Sens. 2023, 44, 3607–3634. [Google Scholar] [CrossRef]
  32. Yu, C.Y.; Han, R.; Song, M.P.; Liu, C.Y.; Chang, C.I. A Simplified 2D-3D CNN Architecture for Hyperspectral Image Classification Based on Spatial-Spectral Fusion. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2020, 13, 2485–2501. [Google Scholar] [CrossRef]
  33. Firat, H.; Asker, M.E.; Bayindir, M.I.; Hanbay, D. Spatial-spectral classification of hyperspectral remote sensing images using 3D CNN based LeNet-5 architecture. Infrared Phys. Technol. 2022, 127, 104470. [Google Scholar] [CrossRef]
  34. Li, Z.; Huang, L.; He, J. A Multiscale Deep Middle-level Feature Fusion Network for Hyperspectral Classification. Remote Sens. 2019, 11, 695. [Google Scholar] [CrossRef]
  35. Zhou, L.; Ma, X.; Wang, X.; Hao, S.; Ye, Y.; Zhao, K. Shallow-to-Deep Spatial-Spectral Feature Enhancement for Hyperspectral Image Classification. Remote Sens. 2023, 15, 261. [Google Scholar] [CrossRef]
  36. Ma, Y.; Wang, S.; Du, W.; Cheng, X. An Improved 3D-2D Convolutional Neural Network Based on Feature Optimization for Hyperspectral Image Classification. IEEE Access. 2023, 11, 28263–28279. [Google Scholar] [CrossRef]
  37. Paoletti, M.E.; Mario Haut, J.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.J.; Pla, F. Deep Pyramidal Residual Networks for Spectral-Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 740–754. [Google Scholar] [CrossRef]
  38. Zhu, M.; Jiao, L.; Liu, F.; Yang, S.; Wang, J. Residual Spectral-Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 449–462. [Google Scholar] [CrossRef]
  39. Dong, Z.; Cai, Y.; Cai, Z.; Liu, X.; Yang, Z.; Zhuge, M. Cooperative Spectral-Spatial Attention Dense Network for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 866–870. [Google Scholar] [CrossRef]
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  41. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
  42. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  43. Sun, H.; Zheng, X.; Lu, X.; Wu, S. Spectral-Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3232–3245. [Google Scholar] [CrossRef]
  44. Wei, H.; Yangyu, H.; Li, W.; Fan, Z.; Hengchao, L.; Tianfu, W. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 258619. [Google Scholar]
  45. Tun, N.L.; Gavrilov, A.; Tun, N.M.; Trieu, D.M.; Aung, H. Hyperspectral Remote Sensing Images Classification Using Fully Convolutional Neural Network. In Proceedings of the 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), St. Petersburg, Russia, 26–29 January 2021; pp. 2166–2170. [Google Scholar]
  46. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef]
  47. Sun, Q.; Liu, X.; Bourennane, S. Unsupervised Multi-Level Feature Extraction for Improvement of Hyperspectral Classification. Remote Sens. 2021, 13, 1602. [Google Scholar] [CrossRef]
  48. Li, R.; Zheng, S.; Duan, C.; Yang, Y.; Wang, X. Classification of Hyperspectral Image Based on Double-Branch Dual-Attention Mechanism Network. Remote Sens. 2020, 12, 582. [Google Scholar] [CrossRef]
  49. Ma, W.; Yang, Q.; Wu, Y.; Zhao, W.; Zhang, X. Double-Branch Multi-Attention Mechanism Network for Hyperspectral Image Classification. Remote Sens. 2019, 11, 1307. [Google Scholar] [CrossRef]
  50. Guo, T.; Wang, R.; Luo, F.; Gong, X.; Zhang, L.; Gao, X. Dual-View Spectral and Global Spatial Feature Fusion Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5512913. [Google Scholar] [CrossRef]
  51. Zhou, H.; Luo, F.; Zhuang, H.; Weng, Z.; Gong, X.; Lin, Z. Attention Multihop Graph and Multiscale Convolutional Fusion Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5508614. [Google Scholar] [CrossRef]
Figure 1. Convolution operations: (a) 1D; (b) 2D; (c) 3D.
Figure 2. Spectral–spatial joint residual block.
Figure 3. Spectral–spatial joint dense block.
Figure 4. Framework of L3DDAN.
Figure 5. The pseudocolor images and ground-truth: (a) IP; (b) UP; (c) SV.
Figure 6. Effectiveness experiment of the S2DB.
Figure 7. Classification accuracies with different patch size.
Figure 8. Loss and accuracy convergence versus epochs: (a) loss of training group; (b) accuracies of training group; (c) loss of validation group; (d) accuracies of validation group.
Figure 9. Classification maps of different methods for the IP: (a) ground-truth; (b) 1D-CNN; (c) 2D-CNN; (d) 3D-CNN; (e) 3D-CAE; (f) DBDA; (g) DBMA; (h) DSGSF; (i) AMGCFN; (j) L3DDAN.
Figure 10. Classification maps of different methods for the UP: (a) ground-truth; (b) 1D-CNN; (c) 2D-CNN; (d) 3D-CNN; (e) 3D-CAE; (f) DBDA; (g) DBMA; (h) DSGSF; (i) AMGCFN; (j) L3DDAN.
Figure 11. Classification maps of different methods for the SV: (a) ground-truth; (b) 1D-CNN; (c) 2D-CNN; (d) 3D-CNN; (e) 3D-CAE; (f) DBDA; (g) DBMA; (h) DSGSF; (i) AMGCFN; (j) L3DDAN.
Figure 12. The impact of training samples proportions on classification accuracy for all datasets: (a) IP; (b) UP; (c) SV.
Table 1. Network structure of L3DDAN.

Part | Layer | Kernel Size | Strides
Encoder | 3D Convolution | 20@(1,1,7) ¹ | (1,1,3)
Encoder | 3D Convolution | 10@(1,1,7) | (1,1,1)
Encoder | 3D Convolution | 10@(1,1,7) | (1,1,1)
Encoder | 3D Convolution | 10@(1,1,7) | (1,1,1)
Encoder | 3D Convolution | 50@(1,1,65) | (1,1,1)
Decoder | 3D Convolution | 100@(3,3,1) | (1,1,1)
Decoder | 3D Convolution | 200@(3,3,1) | (1,1,1)
Classifier | 3D pooling | - | -
Classifier | Full connection | 60@ | -
¹ The values 20 and (1,1,7) represent the number and size of the kernels, respectively.
Table 2. The number of training, validation, and test samples in IP dataset.

Order | Class | Total Number | Train | Validation | Test
1 | Alfalfa | 46 | 3 | 3 | 40
2 | Corn no-till | 1428 | 42 | 42 | 1344
3 | Corn min-till | 830 | 24 | 24 | 782
4 | Corn | 237 | 7 | 7 | 223
5 | Grass–pasture | 483 | 14 | 14 | 455
6 | Grass–trees | 730 | 21 | 21 | 688
7 | Grass–pasture–mowed | 28 | 3 | 3 | 22
8 | Hay—windrowed | 478 | 14 | 14 | 450
9 | Oats | 20 | 3 | 3 | 14
10 | Soybean no-till | 972 | 29 | 29 | 914
11 | Soybean min-till | 2455 | 73 | 73 | 2309
12 | Soybean—clean | 593 | 17 | 17 | 559
13 | Wheat | 205 | 6 | 6 | 193
14 | Woods | 1265 | 37 | 37 | 1191
15 | Buildings–Grass–Trees–Drives | 386 | 11 | 11 | 364
16 | Stone–Steel–Towers | 93 | 3 | 3 | 87
Total | | 10,249 | 307 | 307 | 9635
Table 3. The number of training, validation, and test samples in UP dataset.

Order | Class | Total Number | Train | Validation | Test
1 | Asphalt | 6631 | 66 | 66 | 6499
2 | Meadows | 18,649 | 186 | 186 | 18,277
3 | Gravel | 2099 | 20 | 20 | 2059
4 | Trees | 3064 | 30 | 30 | 3004
5 | Painted metal sheets | 1345 | 13 | 13 | 1319
6 | Bare Soil | 5029 | 50 | 50 | 4929
7 | Bitumen | 1330 | 13 | 13 | 1304
8 | Self-Blocking Bricks | 3682 | 36 | 36 | 3610
9 | Shadows | 947 | 9 | 9 | 929
Total | | 42,776 | 423 | 423 | 41,930
Table 4. The number of training, validation, and test samples in SV dataset.

Order | Class | Total Number | Train | Validation | Test
1 | Broccoli–green–weeds_1 | 2009 | 20 | 20 | 1969
2 | Broccoli–green–weeds_2 | 3726 | 37 | 37 | 3652
3 | Fallow | 1976 | 19 | 19 | 1938
4 | Fallow–rough–plow | 1394 | 13 | 13 | 1368
5 | Fallow—smooth | 2678 | 26 | 26 | 2626
6 | Stubble | 3959 | 39 | 39 | 3881
7 | Celery | 3579 | 35 | 35 | 3509
8 | Grapes—untrained | 11,271 | 112 | 112 | 11,047
9 | Soil-vineyard—develop | 6203 | 62 | 62 | 6079
10 | Corn-senesced–green–weeds | 3278 | 32 | 32 | 3214
11 | Lettuce—romaine—4 wk | 1068 | 10 | 10 | 1048
12 | Lettuce—romaine—5 wk | 1927 | 19 | 19 | 1889
13 | Lettuce—romaine—6 wk | 916 | 9 | 9 | 898
14 | Lettuce—romaine—7 wk | 1070 | 10 | 10 | 1050
15 | Vineyard—untrained | 7268 | 72 | 72 | 7124
16 | Vineyard—vertical trellis | 1807 | 18 | 18 | 1771
Total | | 54,129 | 533 | 533 | 53,063
Table 5. Classification results (%) for IP dataset with 3% training samples.

Class | 1D-CNN | 2D-CNN | 3D-CNN | 3D-CAE | DBDA | DBMA | DSGSF | AMGCFN | L3DDAN
1 | 100 | 100 | 87.50 | 100 | 100 | 73.47 | 100 | 90.43 | 100
2 | 65.56 | 71.92 | 75.51 | 78.64 | 94.47 | 94.78 | 96.08 | 94.20 | 97.76
3 | 72.26 | 86.97 | 84.54 | 74.63 | 96.55 | 97.13 | 87.08 | 95.72 | 94.30
4 | 48.60 | 72.32 | 90.15 | 64.10 | 88.14 | 97.27 | 96.31 | 96.16 | 93.39
5 | 83.41 | 90.56 | 82.80 | 82.48 | 93.42 | 100 | 99.11 | 90.22 | 99.76
6 | 84.87 | 95.38 | 88.99 | 89.10 | 95.25 | 99.27 | 99.42 | 97.23 | 99.42
7 | 51.52 | 100 | 36.00 | 75.00 | 70.97 | 53.66 | 76.19 | 91.70 | 62.86
8 | 88.17 | 88.05 | 99.78 | 92.80 | 100 | 98.47 | 100 | 99.43 | 100
9 | 100 | 100 | 44.00 | 100 | 65.00 | 38.24 | 50.00 | 99.44 | 66.67
10 | 72.34 | 71.69 | 69.15 | 82.40 | 98.50 | 98.71 | 93.43 | 93.25 | 97.80
11 | 76.75 | 86.42 | 87.79 | 88.74 | 94.65 | 92.74 | 98.10 | 97.13 | 99.04
12 | 57.36 | 72.25 | 77.90 | 59.41 | 92.04 | 87.69 | 92.23 | 90.64 | 95.77
13 | 92.75 | 100 | 98.32 | 92.49 | 99.48 | 100 | 95.98 | 97.53 | 100
14 | 90.29 | 94.05 | 91.08 | 87.95 | 97.56 | 98.12 | 98.05 | 98.93 | 97.78
15 | 63.81 | 75.41 | 90.83 | 85.03 | 84.72 | 86.02 | 98.39 | 91.37 | 96.73
16 | 89.01 | 91.75 | 87.65 | 85.37 | 95.56 | 100 | 88.04 | 91.17 | 93.55
OA | 75.67 | 83.48 | 84.08 | 83.20 | 94.97 | 94.76 | 96.02 | 95.55 | 97.65
AA | 68.31 | 69.88 | 77.96 | 76.17 | 95.29 | 93.64 | 91.77 | 94.92 | 97.62
Kappa | 0.7222 | 0.8113 | 0.8176 | 0.8080 | 0.9426 | 0.9401 | 0.9545 | 0.9492 | 0.9733
Table 6. Classification results (%) for UP dataset with 1% training samples.

Class | 1D-CNN | 2D-CNN | 3D-CNN | 3D-CAE | DBDA | DBMA | DSGSF | AMGCFN | L3DDAN
1 | 91.38 | 96.72 | 89.96 | 92.20 | 99.29 | 99.26 | 97.83 | 99.23 | 99.49
2 | 88.53 | 96.22 | 96.32 | 97.48 | 99.73 | 99.44 | 99.71 | 99.97 | 99.86
3 | 73.04 | 94.77 | 78.66 | 88.62 | 95.93 | 93.99 | 99.60 | 98.79 | 98.30
4 | 90.87 | 99.19 | 94.12 | 99.27 | 99.65 | 98.59 | 99.66 | 95.49 | 97.44
5 | 99.33 | 100 | 97.33 | 99.18 | 99.77 | 98.73 | 99.92 | 99.83 | 99.92
6 | 81.45 | 98.72 | 88.23 | 96.15 | 99.60 | 99.88 | 99.82 | 99.99 | 99.98
7 | 66.36 | 99.77 | 77.43 | 98.81 | 98.78 | 100 | 96.95 | 99.79 | 99.62
8 | 76.26 | 79.58 | 91.21 | 84.89 | 91.67 | 94.15 | 92.74 | 98.94 | 97.00
9 | 98.94 | 99.45 | 88.09 | 99.40 | 97.47 | 96.75 | 97.76 | 91.66 | 99.34
OA | 86.25 | 95.31 | 92.26 | 95.20 | 98.65 | 98.62 | 98.68 | 99.19 | 99.31
AA | 84.76 | 92.88 | 88.07 | 91.39 | 97.57 | 97.45 | 98.22 | 98.19 | 98.67
Kappa | 0.8156 | 0.9374 | 0.8972 | 0.9360 | 0.9821 | 0.9817 | 0.9825 | 0.9893 | 0.9908
Table 7. Classification results (%) for SV dataset with 1% training samples.

Class | 1D-CNN | 2D-CNN | 3D-CNN | 3D-CAE | DBDA | DBMA | DSGSF | AMGCFN | L3DDAN
1 | 99.79 | 100 | 97.24 | 99.00 | 100 | 100 | 100 | 99.99 | 100
2 | 98.03 | 99.97 | 90.74 | 99.89 | 100 | 99.86 | 100 | 99.99 | 100
3 | 90.17 | 98.69 | 98.40 | 93.44 | 99.64 | 99.64 | 100 | 100 | 99.95
4 | 98.75 | 98.98 | 96.15 | 96.19 | 96.33 | 92.74 | 98.75 | 98.72 | 98.98
5 | 96.44 | 97.98 | 91.33 | 92.60 | 99.92 | 96.81 | 99.22 | 98.39 | 99.32
6 | 99.92 | 99.97 | 99.65 | 99.95 | 100 | 100 | 99.95 | 99.91 | 99.97
7 | 99.38 | 99.41 | 98.33 | 100 | 100 | 100 | 100 | 99.97 | 100
8 | 81.08 | 93.58 | 92.29 | 96.28 | 96.43 | 93.55 | 99.17 | 98.39 | 99.98
9 | 98.97 | 99.63 | 99.50 | 98.81 | 100 | 100 | 99.80 | 100 | 99.85
10 | 90.19 | 98.38 | 96.75 | 98.19 | 98.80 | 96.74 | 98.66 | 99.11 | 99.78
11 | 92.49 | 96.67 | 99.32 | 91.62 | 100 | 99.90 | 98.68 | 99.49 | 99.71
12 | 96.79 | 98.65 | 98.81 | 99.68 | 100 | 97.93 | 99.79 | 99.93 | 100
13 | 95.14 | 99.89 | 94.79 | 96.18 | 99.65 | 98.77 | 98.67 | 99.22 | 99.56
14 | 96.34 | 98.68 | 99.81 | 98.12 | 95.32 | 97.54 | 97.63 | 98.66 | 96.85
15 | 65.46 | 92.56 | 85.21 | 87.31 | 95.35 | 97.75 | 99.75 | 98.79 | 98.62
16 | 98.37 | 100 | 100 | 99.83 | 100 | 99.43 | 100 | 99.76 | 100
OA | 89.19 | 97.17 | 94.42 | 96.12 | 98.35 | 97.57 | 99.51 | 99.26 | 99.64
AA | 93.76 | 98.42 | 95.68 | 97.53 | 98.84 | 98.12 | 99.38 | 99.39 | 99.69
Kappa | 0.8799 | 0.9685 | 0.9379 | 0.9568 | 0.9816 | 0.9729 | 0.9946 | 0.9918 | 0.9960
Table 8. The number of trainable parameters for all methods.

Dataset | 1D-CNN | 2D-CNN | 3D-CNN | 3D-CAE | DBDA | DBMA | DSGSF | AMGCFN | L3DDAN
IP | 72,216 | 4,013,386 | 30,536,176 | 561,472 ¹ | 382,326 | 609,791 | 436,791 | 368,217 | 170,236 ¹
UP | 63,249 | 4,012,119 | 5,860,841 | 561,017 ¹ | 202,751 | 324,376 | 423,383 | 343,614 | 89,879 ¹
SV | 74,216 | 4,013,386 | 6,076,656 | 561,472 ¹ | 389,622 | 621,407 | 437,099 | 369,009 | 170,236 ¹
¹ The parameters in the decoders are not counted, because the decoders are not used for classification.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
