Article

Local and Global Spectral Features for Hyperspectral Image Classification

1 School of Earth Science, Zhejiang University, Hangzhou 310030, China
2 Key Laboratory of Geoscience Big Data and Deep Resource of Zhejiang Province, School of Earth Sciences, Zhejiang University, Hangzhou 310030, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(7), 1803; https://doi.org/10.3390/rs15071803
Submission received: 11 February 2023 / Revised: 25 March 2023 / Accepted: 27 March 2023 / Published: 28 March 2023
(This article belongs to the Special Issue Deep Learning for Hyperspectral Image Classification)

Abstract

Hyperspectral images (HSI) offer powerful spectral characterization capabilities and are widely used, especially for classification applications. However, the rich spectrum contained in HSI also increases the difficulty of extracting useful information, which makes the feature extraction method significant, as it enables effective expression and utilization of the spectrum. Traditional HSI feature extraction methods design spectral features manually, which is likely to be limited by the complex spectral information within HSI. Recently, data-driven methods, especially convolutional neural networks (CNNs), have shown great improvements in performance when processing image data owing to their powerful automatic feature learning and extraction abilities, and they are also widely used for HSI feature extraction and classification. The CNN extracts features based on the convolution operation. Nevertheless, the local perception of the convolution operation makes the CNN focus on local spectral features (LSF) and weakens its description of features between long-distance spectral ranges, which will be referred to as global spectral features (GSF) in this study. LSF and GSF describe the spectral features from two different perspectives and are both essential for determining the spectrum. Thus, in this study, a local-global spectral feature (LGSF) extraction and optimization method is proposed to jointly consider the LSF and GSF for HSI classification. To strengthen the relationships between spectra and increase the possibility of obtaining features with more forms, we first transformed the 1D spectral vector into a 2D spectral image. Based on the spectral image, a local spectral feature extraction module (LSFEM) and a global spectral feature extraction module (GSFEM) are proposed to automatically extract the LGSF. A loss function for spectral feature optimization, inspired by contrastive learning, is proposed to optimize the LGSF and obtain improved class separability. We further enhanced the LGSF by introducing spatial relations and designed a CNN constructed using dilated convolution for classification. The proposed method was evaluated on four widely used HSI datasets, and the results highlight its comprehensive utilization of spectral information as well as its effectiveness in HSI classification.

Graphical Abstract

1. Introduction

Hyperspectral images (HSI) generally contain hundreds of subdivided spectral bands captured at continuous wavelengths [1]. Compared with multispectral images, HSI significantly increases the spectral dimension while retaining the spatial dimension. With rich spectral information, HSI enables the detection, identification, and discrimination of target materials at a more detailed level [2] and is widely used in geology [3,4], agriculture [5,6], environmental studies [7,8], quantitative inversion [9,10] and other fields [11]. Due to its powerful spectral characterization ability, HSI has also served as one of the most significant data sources in the remote sensing community, especially for classification and identification applications such as Land Use and Land Cover (LULC) mapping [12,13]. This rich spectral information brings unique advantages to HSI, but it also presents challenges for image processing. On the one hand, the band number and data dimension of HSI increase geometrically compared with those of traditional multispectral images, which leads to a significant increase in computation as well as the curse of dimensionality [14]. The increased data volume also requires more training samples, and the acquisition of refined labels is difficult [15]. On the other hand, bands of remote sensing images that are closer in wavelength are more strongly correlated. The interval between bands in HSI is significantly reduced, resulting in strong redundancy and band correlation [16], which increases the difficulty of mining hidden information from HSI. Therefore, for HSI classification, it is vital to identify a method that can effectively use the rich spectral information.
Spectral information is typically represented by a spectral curve in an HSI. Thus, a large number of single-curve analysis techniques have been proposed to describe the spectral characteristics, and most of them contain the idea of dimensionality reduction. (1) Some methods use band selection to pick out characteristic bands of the original spectral curve according to a series of criteria (e.g., spectral distance [17], spectral divergence [18] and spectral variance [19]) for subsequent processing. Demir et al. [20] used feature-weighting algorithms to obtain weights for all bands and selected the bands with higher weights for classification. (2) Feature extraction methods describe and calculate curve features, such as shape and variation, by designing a series of indices for classification (e.g., the spectral angle mapper (SAM)). Chen et al. [21] combined the SAM feature and maximum likelihood classification (MLC) with the magnitude and shape features for classification. They discovered that adding SAM could improve accuracy. He et al. [22] proposed a handcrafted feature extraction method based on multiscale covariance maps. The spectral features were obtained by computing the covariance matrices of the central pixels at various scales, where the values were the covariances of the spectral band pairs. They found that using the multiscale covariance maps as input features could greatly improve the classification accuracy. (3) In order to suppress redundant information and highlight useful information for a better description of the spectral curve, spectral transformation methods convert the original spectral space to another feature space via mathematical transformations. The most typical spectral transformation method is principal component analysis (PCA). Jiang et al. [23] proposed a superpixel-wise PCA (SuperPCA) approach that considers the diversity in different homogeneous regions and is able to incorporate spatial context information. Although SuperPCA is an unsupervised method, its performance is comparable to supervised approaches. Fu et al. [24] proposed a PCA and segmented PCA (SPCA)-based multiscale 2-D-singular spectrum analysis (2-D-SSA) fusion method for joint spectral–spatial HSI feature extraction and classification. The method can extract multiscale spectral–spatial features and outperforms other state-of-the-art feature-extraction methods. However, the above methods describe the spectral characteristics based on manual design, so the resulting features are generally limited in type and quantity, and it is also difficult to extract deeper and more representative information [25]. Thus, these methods have limitations in the face of the complex spectral information in HSI, and they struggle to fully use the spectral information to describe the characteristics of the targets. In recent years, breakthroughs have been made in artificial intelligence. Data-driven deep learning methods can automatically learn features at different levels of the data for classification and have achieved remarkable success in the field of computer vision. As a typical deep learning model, the convolutional neural network (CNN) has also been widely used to analyze the spectral curve [26] and has been shown to have the potential to surpass traditional methods that utilize manually designed features [27,28]. The 1D CNN uses 1D convolution kernels to extract features from the single spectral curve and combines them through a deep network structure. Hu et al. [29] built a 1D CNN consisting of five layers with weights for HSI classification. The experimental results demonstrated the effectiveness of their method compared with traditional methods such as the SVM.
However, the linearly arranged single spectral curve limits the expression of spectral relationships because the spectral information cannot be effectively aggregated in this structure. In order to utilize more information, some methods treat the spectral information as non-curve data for feature extraction and classification. Some methods consider the spectral information as a cube. The 2D CNN methods use 2D convolution kernels on a 3D HSI cube for feature extraction, mainly for better use of spatial information. Kussul et al. [30] compared the performances of 1D CNNs and 2D CNNs for land cover and crop classification. They showed that 2D CNNs outperformed 1D CNNs, although some small objects in the classification maps provided by 2D CNNs were smoothed and misclassified. Song et al. [31] proposed a deep 2D CNN based on residual learning and fused the outputs of different hierarchical layers. Their network can extract deeper features and achieve state-of-the-art performance. The 3D CNN methods use 3D convolution kernels on a 3D HSI cube for feature extraction to fuse both spatial and spectral information effectively. Hamida et al. [32] introduced a 3D CNN for jointly processing spectral and spatial features, as well as establishing low-cost imaging. They also proposed a set of 3D CNN schemes and evaluated their feasibility. Li et al. [33] proposed a novel 3D CNN that takes full advantage of both spectral and spatial features. In addition, some 3D CNNs are mixed with other convolutional kernels to complement their advantages [34,35]. These CNN methods use the convolution operation to extract spectral features. However, the convolution operation focuses on extracting the features of adjacent data owing to its local perception characteristic. As a result, these methods mainly extract the local spectral features (LSF) for classification. The LSF reflect the local statistical information of the spectrum and describe the relationship between adjacent bands in a local neighborhood (i.e., the local change rate of the original curve). The advantage of LSF is that they can effectively express the characteristics of spectral changes as the wavelength gradually increases, while reducing noise interference. However, the spectral curve is expressed sequentially in a one-dimensional manner, and the distance between bands increases linearly with the increase in wavelength, which means that the distance between effective bands (features) may be large. Thus, the disadvantage of LSF is that they are limited in processing long-distance spectral relationships, resulting in insufficient expression of the complete spectral information. To address this issue, some methods consider the spectral information as an image. Yuan et al. [36] reshaped the 1D spectral vector into a 2D spectral image. In the 2D spectral image, long-distance spectra from the original 1D spectral vector can be aligned closely or even directly connected, which significantly expands the spectral coverage of the feature extraction window and is conducive to obtaining more spectral feature patterns. This approach is conducive to constructing another kind of feature, complementary to the LSF, that describes the relationship between non-adjacent bands over a long-distance span (i.e., the global shape of the original curve), named global spectral features (GSF).
The advantage of GSF is that they can selectively construct spectral relationships between arbitrary bands to reflect the characteristics of targets (e.g., the Normalized Difference Water Index uses the long-distance spectral relationship of the green and near-infrared bands to describe the characteristics of water). However, it is challenging to identify a suitable band combination for GSF extraction among a large number of bands in an HSI.
In general, the LSF and GSF describe the spectral information at local and global levels, respectively, and are the embodiment of the characteristics of the targets in different aspects. Thus, for the feature extraction method, the LSF and GSF should be fully considered to effectively express and utilize spectral information. Considering the importance of LSF and GSF when utilizing spectral information, a local-global spectral feature (LGSF) extraction and optimization method is proposed in this study to effectively combine both LSF and GSF for HSI classification. First, we analyzed the limitations of the traditional spectral feature extraction strategy that uses the 1D spectral vector as input, and to obtain more diverse spectral features, the 1D spectral vector was transformed into a 2D spectral image for feature extraction. Second, to extract the LSF, convolution was used as a local feature descriptor to aggregate adjacent spectral statistical information. Third, to extract the GSF, all spectral bands were combined in pairs and modeled automatically by the fully connected layers to introduce the distance-independent GSF upon the LSF, and further fused to form the LGSF. Fourth, to increase the effectiveness of LGSF in classification, LGSF was optimized by maximizing the difference between classes and increasing feature separability. Fifth, based on the LGSF of each pixel, a dilated convolution-based network with multiple receptive fields was designed for classification, and the pixel class was obtained. Moreover, we noticed the importance of spatial information for HSI classification and enhanced the LGSF with spatial relation to comprehensively utilize spectral and spatial information. To demonstrate the efficiency of the proposed method, we evaluated it on four widely used HSI datasets with several comparison methods. The experimental results demonstrated that the proposed method significantly enhanced the classification accuracy with a more comprehensive use of spectral features.
The contributions of this study are as follows:
  • A hyperspectral image classification method combining the local and global spectral features is proposed in this paper.
  • We transformed the original spectrum from a 1D spectral curve into a 2D spectral image for feature extraction. The spectral reorganization enhances the spectral connection and is beneficial to obtain more diverse spectral features.
  • The image processing and feature extraction methods for the 2D image were used to analyze the spectral information, which could extract more sufficient and stable spectral features with higher class separability.
The rest of this article is organized as follows: Section 2 introduces the proposed method in detail. Section 3 introduces the experimental details, including the datasets, modeling parameters, and comparison methods. Section 4 analyzes and discusses the experimental results. Section 5 concludes the article.

2. Methods

2.1. Overview of the Proposed Method

The HSI contains rich spectral information and has a great potential for classifying targets. However, the hidden complex spectral features within HSI increase the difficulty of effectively using them. Although CNN can learn to extract features automatically, its local perception characteristic limits the extracted features to cover only the local spectral range (i.e., LSF) and ignores the long-distance global spectral range (i.e., GSF). Thus, in this study, both the LSF and GSF were extracted, combined, and optimized for HSI classification.
Figure 1 shows an overview of the proposed method. The spectral information of each pixel in the HSI corresponds to a 1D spectral vector or spectral curve. In this study, to increase the correlation between spectra, the original 1D spectral vector of each pixel was transformed into a more compact 2D spectral image for subsequent feature extraction and classification. Next, the local spectral feature extraction module (LSFEM) was designed to extract the LSF from the 2D spectral image, and the global spectral feature extraction module (GSFEM) was designed to extract the GSF from the extracted LSF and to join them to obtain the LGSF. Moreover, the loss function for spectral feature optimization (SFOL) was designed to further optimize the effectiveness of the extracted LGSF automatically, based on the idea of maximizing the separability between classes. Finally, the LGSF of each pixel was input into a classification network built using dilated convolutions to determine the category of the corresponding pixel. To improve the robustness of classification, we also introduced a spatial relation to enhance the LGSF of the pixels to be classified.
The proposed method consists of six parts, and their corresponding sections are as follows: (1) transformation from a 1D spectral vector into a 2D spectral image (Section 2.2); (2) extraction of LSF (Section 2.3); (3) extraction of GSF and combination of LGSF (Section 2.4); (4) optimization of LGSF (Section 2.5); (5) structure of classification network (Section 2.6); (6) spatial enhancement of the LGSF (Section 2.7).

2.2. Transformation of Spectrum from 1D to 2D

To enhance the spectral connection and obtain more complex and diverse spectral features, in this study, the spectral information represented by 1D form in HSI was first converted into 2D. The details of the transformation and feature comparison between 1D and 2D are as follows.
Traditional spectral feature extraction is generally based on a 1D spectral vector. However, in such a sequential 1D structure, the distance between bands increases linearly with the increase in wavelength, which leads to a long distance between the small and large wavelength bands. Because convolution is a local feature extractor, the long-distance relationships are difficult to capture based on the 1D structure. To solve this problem, we transformed the 1D spectral vector of the center pixel in the HSI patch into a 2D spectral image (shown by the black dotted lines in Figure 2) and used it as the basic spectral input data for the subsequent feature extraction. In the 2D spectral image, each band can be adjacent to more bands, which can be seen as the addition of a large number of shortcuts between long-distance bands in the 1D spectral vector. The green lines in Figure 2 show the feature extraction of the 1D spectral vector and 2D spectral image using the corresponding 1D and 2D convolution kernels. Theoretically, compared with the 1D combination, there are more spectral bands and larger spectral coverage in the 2D combination, which makes it more convenient to obtain complex and diverse spectral features.
Notably, the conversion of the 1D spectral vector into the 2D spectral image requires the number of bands to be a perfect square. If the size of the 1D spectral vector is 1 × b, we first interpolate the original 1D spectral vector to modify the band number to a specific square number, denoted as l² (l ∈ ℤ), and then transform it into a 2D spectral image of size l × l.
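As an illustration, the following minimal PyTorch sketch performs this transformation. The choice of the nearest perfect square l² ≥ b and the use of linear interpolation are our assumptions; the paper specifies only that the band number is interpolated to a square number before reshaping.

```python
import math
import torch
import torch.nn.functional as F

def spectrum_to_2d(spectrum: torch.Tensor) -> torch.Tensor:
    """Interpolate a 1D spectral vector of length b to l*l bands and reshape to l x l."""
    b = spectrum.numel()
    l = math.ceil(math.sqrt(b))            # smallest l with l * l >= b (assumption)
    resampled = F.interpolate(
        spectrum.view(1, 1, b),            # (batch, channel, length) for 1D interpolation
        size=l * l, mode="linear", align_corners=True,
    )
    return resampled.view(l, l)            # the 2D spectral image

# Example: a 103-band Pavia University pixel becomes an 11 x 11 spectral image.
print(spectrum_to_2d(torch.rand(103)).shape)  # torch.Size([11, 11])
```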

2.3. Local Spectral Feature Extraction Module

LSF is an effective spectral feature that aggregates adjacent spectra and describes the local variation of the spectral curve. To extract the LSF, the LSFEM was proposed and its characteristics are presented below.
Considering that the convolution operation is a widely used local feature extractor that meets the requirements of LSF extraction, in this study, the LSFEM utilizes convolutions to extract LSF, as shown in Figure 3.
The input of the LSFEM is a 2D spectral image, denoted as I2D. Two groups of 2D convolution kernels were used to aggregate the local spectra upon I2D and obtain the LSF. The LSFEM calculation is expressed in Equation (1):
$$LSF_1 = \mathrm{Conv}_2\big[B\big(\mathrm{Conv}_1(I_{2D})\big)\big], \qquad LSF_2 = \mathrm{Conv}_4\big[B\big(\mathrm{Conv}_3(I_{2D})\big)\big] \tag{1}$$
where Conv1, Conv2, Conv3, and Conv4 are four convolution groups with N kernels each; B denotes the batch normalization and ReLU operations; and LSF1 and LSF2 are two groups of LSF, both of size N × l × l. N is a hyperparameter and was set to 4 in this study.
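A minimal PyTorch sketch of the LSFEM in Equation (1) follows. The 3 × 3 kernel size with padding of 1 (which preserves the l × l spatial size implied by the N × l × l output) is our assumption; the paper specifies only the kernel number N = 4.

```python
import torch
import torch.nn as nn

class LSFEM(nn.Module):
    def __init__(self, n_kernels: int = 4):
        super().__init__()
        def branch():
            # Conv -> BN + ReLU (the operation B) -> Conv, as in Equation (1)
            return nn.Sequential(
                nn.Conv2d(1, n_kernels, kernel_size=3, padding=1),
                nn.BatchNorm2d(n_kernels),
                nn.ReLU(inplace=True),
                nn.Conv2d(n_kernels, n_kernels, kernel_size=3, padding=1),
            )
        self.branch1 = branch()  # Conv1, B, Conv2
        self.branch2 = branch()  # Conv3, B, Conv4

    def forward(self, i2d: torch.Tensor):
        # i2d: (batch, 1, l, l) 2D spectral image
        lsf1 = self.branch1(i2d)  # (batch, N, l, l)
        lsf2 = self.branch2(i2d)  # (batch, N, l, l)
        return lsf1, lsf2

lsf1, lsf2 = LSFEM()(torch.rand(2, 1, 11, 11))
```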

2.4. Global Spectral Feature Extraction Module

GSF is another effective spectral feature that connects bands with spans and describes the relative shape of the spectral curve. To extract the GSF, the GSFEM was proposed and its characteristics are presented below.
Considering the characteristics of the GSF, which depend only on the spectral values of the bands and are independent of their distance, in the GSFEM the spectral bands were scattered and combined in pairs. Due to the complexity of the relationships between bands, the GSFEM uses a data-driven method to directly and automatically learn the construction of inter-band features, rather than a manually designed method, as shown in Figure 4.
The inputs of the GSFEM are LSF1 and LSF2 obtained using the LSFEM, where each pixel represents an N-D feature of one band. Pixels in LSF1 and LSF2 were combined to model the GSF. If the pixel of row i and column j in LSF1 is P_ij, and the pixel of row i′ and column j′ in LSF2 is P′_i′j′, the combined spectral feature vector of these two pixels (bands) is PP′_iji′j′, whose size is 2 × N. PP′_iji′j′ is input into a network with two fully connected (FC) layers to compute the GSF of the two target bands. By traversing all combinations of bands in LSF1 and LSF2 and modeling them using the network, a feature map with a size of l² × l² can be obtained. The generation of this map includes the extraction of LSF and GSF and their fusion, which gives the map the potential to express local as well as global spectral features. Thus, we refer to it as the LGSF. Equation (2) shows the modeling of the pixel at row a and column b of the LGSF from P_ij and P′_i′j′ in LSF1 and LSF2, where ⊕ is the concatenation operation, FC1 and FC2 are the two FC layers, and R is the ReLU layer. Equation (3) shows the correspondence between the subscript parameters.
$$PP'_{iji'j'} = P_{ij} \oplus P'_{i'j'}, \qquad LGSF_{ab} = \mathrm{FC}_2\big[R\big(\mathrm{FC}_1(PP'_{iji'j'})\big)\big] \tag{2}$$
$$a = i \times l + j, \qquad b = i' \times l + j' \tag{3}$$
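The pairwise modeling in Equations (2) and (3) can be sketched in PyTorch as below. The hidden width of FC1 and the scalar output of FC2 are our assumptions; forming all l² × l² band pairs by broadcasting is an implementation choice equivalent to traversing every combination.

```python
import torch
import torch.nn as nn

class GSFEM(nn.Module):
    def __init__(self, n_kernels: int = 4, hidden: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(2 * n_kernels, hidden)  # FC1 (hidden width assumed)
        self.fc2 = nn.Linear(hidden, 1)              # FC2 (scalar per band pair assumed)

    def forward(self, lsf1: torch.Tensor, lsf2: torch.Tensor) -> torch.Tensor:
        # lsf1, lsf2: (batch, N, l, l); each spatial position is an N-D band feature
        b, n, l, _ = lsf1.shape
        p = lsf1.flatten(2).transpose(1, 2)            # (batch, l*l, N), pixels P_ij
        q = lsf2.flatten(2).transpose(1, 2)            # (batch, l*l, N), pixels P'_i'j'
        # Concatenate every pair into a 2N-D vector PP' via broadcasting
        pairs = torch.cat(
            [p.unsqueeze(2).expand(b, l * l, l * l, n),
             q.unsqueeze(1).expand(b, l * l, l * l, n)],
            dim=-1,
        )                                              # (batch, l*l, l*l, 2N)
        lgsf = self.fc2(torch.relu(self.fc1(pairs)))   # Equation (2) for all pairs
        return lgsf.squeeze(-1)                        # LGSF map, (batch, l*l, l*l)

lgsf = GSFEM()(torch.rand(2, 4, 11, 11), torch.rand(2, 4, 11, 11))  # (2, 121, 121)
```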

2.5. The Loss Function for Spectral Feature Optimization

In the LSFEM, the LSF is extracted through the convolution operation, whereas in the GSFEM, the GSF is further superimposed on the LSF to obtain the final LGSF that contains both local and global feature information. Obviously, the quality of the LGSF depends on the feature extraction of the LSFEM and GSFEM and is controlled by their internal learnable parameters. For classification tasks, an effective feature is supposed to have high inner-class similarity but low inter-class similarity. To move towards this goal, in this study, inspired by the information noise-contrastive estimation (InfoNCE) loss from the contrastive learning domain, we designed the SFOL to constrain the update direction of the parameters in the LSFEM and GSFEM.
LGSF was considered as the input of a CNN (introduced in Section 2.6) for classification with a batch size of B. After a series of convolutions, pooling, and global average pooling layers, a group of feature vectors (FVs) with a size of B × K was obtained, where K is the channel number of the last convolution layer. The FVs are high-level abstract representations of LGSF and are directly input into a classifier (e.g., the FC layer) to determine the final classification result. Thus, it is necessary to maximize the differences between the classes of the FVs.
Next, for any two training samples m and n in one batch, the category consistency and the similarity of their FVs were computed. By traversing all sample pairs, a matrix of category consistency (Mcc) and a matrix of FV similarity (Msim) can be obtained, as described by Equations (4) and (5); both have size B × B.
$$M_{cc}^{mn} = \begin{cases} 1, & C(m) = C(n) \\ 0, & C(m) \neq C(n) \end{cases} \tag{4}$$
$$M_{sim}^{mn} = \mathrm{Norm}(FV_m) \otimes \mathrm{Norm}\big(FV_n^{T}\big) \tag{5}$$
where C, Norm, and ⊗ represent the class of a sample, the normalization operation, and matrix multiplication, respectively.
Sample pairs with the same category were defined as positive samples, while those with different categories were defined as negative samples. Based on Mcc and Msim, the similarity between positive samples (SSpos) and negative samples (SSneg) can be computed using Equations (6) and (7):
$$SS_{pos} = M_{sim} \times (M_{cc} - I) \tag{6}$$
$$SS_{neg} = M_{sim} \times (1 - M_{cc}) \tag{7}$$
where I is the identity matrix used to eliminate the influence of sample pairs composed of the same sample (diagonal elements).
The SSpos was averaged and concatenated with the SSneg to form a new similarity vector (SV). In the SV, the first element is the average similarity of all positive samples, and the other elements are the similarities of the negative samples. To increase the similarity between positive samples and reduce the similarity between negative samples, the value of the first element of the SV should approach 1, whereas the other elements should approach 0. This is similar to using a one-hot code to represent the first class in a classification task. Therefore, we created a pseudo-classification task, whose input is the SV divided by a normalization parameter (i.e., the temperature parameter in the InfoNCE loss, set to 0.07, a value widely used in relevant studies [37]) and whose label is the one-hot code of the first class, to assist in optimizing the parameters with the cross-entropy loss. In addition, the length of the SV may change significantly because the number of negative samples is not fixed in a batch. To fix the SV length, the 20 negative samples with the highest similarity were selected, because negative samples with higher similarity are more likely to confuse the classifier and need more attention.
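A minimal sketch of the SFOL, under our reading of Equations (4)–(7) and the text above, is given below. The temperature of 0.07 and the top-20 negative selection follow the text; the handling of batches with fewer than 20 negatives or no positive pairs is an assumption.

```python
import torch
import torch.nn.functional as F

def sfol(fv: torch.Tensor, labels: torch.Tensor,
         tau: float = 0.07, top_k: int = 20) -> torch.Tensor:
    # fv: (B, K) feature vectors; labels: (B,) class indices
    b = fv.size(0)
    fv = F.normalize(fv, dim=1)
    m_sim = fv @ fv.t()                                          # Equation (5), (B, B)
    m_cc = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # Equation (4)
    eye = torch.eye(b, device=fv.device)
    pos_mask = m_cc - eye                                        # Equation (6) mask
    neg_mask = 1.0 - m_cc                                        # Equation (7) mask
    # Average similarity over all positive pairs (self-pairs excluded)
    pos = (m_sim * pos_mask).sum() / pos_mask.sum().clamp(min=1)
    # Keep the top-k most confusing (most similar) negative pairs
    neg = m_sim[neg_mask.bool()]
    neg = neg.topk(min(top_k, neg.numel())).values
    sv = torch.cat([pos.view(1), neg]) / tau                     # similarity vector
    # Pseudo-classification: the positive entry (index 0) should dominate
    target = torch.zeros(1, dtype=torch.long, device=fv.device)
    return F.cross_entropy(sv.unsqueeze(0), target)
```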

2.6. Dilated Convolution-Based Network

After the processing steps in Section 2.2, Section 2.3, Section 2.4 and Section 2.5, the spectral vector (spectral curve) of a single pixel is represented by a 2D LGSF image. This LGSF image is further classified to obtain the pixel category. To better utilize the information hidden in the LGSF, a CNN based on dilated convolution was designed and used for classification, as dilated convolution can significantly enlarge the receptive field and extract multiscale image features [38].
The overall architecture of the CNN is shown in Figure 5, which takes the LGSF as input and consists of five convolution layers (‘ConvLayer’ in Figure 5), one global average pooling layer (‘GAP’ in Figure 5), one fully connected layer (‘FC’ in Figure 5), and one softmax layer (‘Softmax’ in Figure 5). Each ConvLayer contains four dilated convolutions (‘Dconv’ in Figure 5, the numbers in parentheses represent the dilation rate, kernel size and output channel, respectively) with various dilation rates and kernel sizes but the same number of output channels to extract multiscale features. Following batch normalization and ReLU (‘BN + ReLU’ in Figure 5), these features were concatenated and fused by another dilated convolution. Max pool (‘Maxpool’ in Figure 5) was used to decrease the feature size, whose results are the output of the current ConvLayer as well as the input of the next ConvLayer. The features after the five ConvLayers as well as the GAP layer were fed into a FC layer for classification, which contains one linear layer with an input size of 16 and an output size that is the same as the class number. The cross-entropy loss function was used for optimization.
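The following sketch shows one such ConvLayer; the specific dilation rates, kernel sizes, and channel widths are illustrative assumptions standing in for the values listed in Figure 5.

```python
import torch
import torch.nn as nn

class ConvLayer(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int, out_ch: int):
        super().__init__()
        # Four parallel dilated convolutions with different receptive fields
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=k, dilation=d,
                          padding=d * (k - 1) // 2),       # keeps the spatial size
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d, k in [(1, 3), (2, 3), (3, 3), (1, 5)]   # (dilation, kernel) assumed
        ])
        self.fuse = nn.Conv2d(4 * branch_ch, out_ch, kernel_size=3,
                              dilation=1, padding=1)        # fusing dilated convolution
        self.pool = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multiscale = torch.cat([b(x) for b in self.branches], dim=1)
        return self.pool(self.fuse(multiscale))

out = ConvLayer(1, 8, 16)(torch.rand(2, 1, 121, 121))       # (2, 16, 60, 60)
```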

2.7. Enhancement of LGSF with Spatial Relation

In the above sections, we introduced the process of extracting the LGSF and using a CNN to obtain the classification results. However, this pattern is applicable only when the input is the spectral vector of a single pixel. Considering the noise interference in HSI, introducing spatial constraints on the LGSF has a significant positive effect on improving robustness.
To enhance the LGSF at the spatial level, we defined a spatial window in the HSI, and the LGSF of each pixel in the spatial window was combined. Suppose that the size of the HSI patch in the spatial window is H × W × N, where H and W are the height and width of the spatial window, respectively, and N is the band number of the HSI. The LGSF for each pixel in the HSI patch was extracted to a size of N × N × HW. We used a simple and effective fusion method, namely averaging, to directly combine and fuse the LGSF and obtain the spatially enhanced LGSF with a size of N × N × 1, as shown in Figure 6. The spatially enhanced LGSF was used as an input feature of the network introduced in Section 2.6 for classification, the result of which represents the class of the center pixel in the HSI patch.
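A minimal sketch of this spatial enhancement is given below; extract_lgsf is a placeholder for the LSFEM + GSFEM pipeline sketched earlier.

```python
import torch

def spatially_enhanced_lgsf(patch: torch.Tensor, extract_lgsf) -> torch.Tensor:
    """Average the LGSF maps of all pixels in an H x W window (Figure 6)."""
    h, w, n = patch.shape                      # patch: (H, W, N) HSI patch
    spectra = patch.reshape(h * w, n)          # one spectral vector per pixel
    maps = torch.stack([extract_lgsf(s) for s in spectra])  # (H*W, l*l, l*l)
    return maps.mean(dim=0)                    # fused LGSF for the center pixel

# Hypothetical composition with the earlier sketches:
# f = lambda s: gsfem(*lsfem(spectrum_to_2d(s).view(1, 1, 11, 11)))[0]
# enhanced = spatially_enhanced_lgsf(torch.rand(11, 11, 103), f)
```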

3. Experiments

To verify the effectiveness of our proposed method and ensure its universality, we compared it with a series of classical and advanced HSI classification methods using four standard and widely used HSI datasets. Section 3.1 introduces the details of the four HSI datasets, as well as the preparation of the training and test samples. Section 3.2 introduces the training and modeling details as well as the experimental environment. Section 3.3 presents the comparison methods and evaluation metrics.

3.1. Datasets

1.
Houston 2013
The Houston 2013 dataset was provided by the IEEE Geoscience and Remote Sensing Society (GRSS). The HSI was acquired by the NSF-funded Center for Airborne Laser Mapping (NCALM) over the University of Houston campus and the neighboring urban area, which consists of 144 spectral bands from 380 to 1050 nm with a spatial resolution of 2.5 m and an image size of 349 × 1905. It also contains a total of 15,029 ground truth samples with 15 classes and a pre-defined training and test sample division strategy (details are listed in Table 1), which was also used in this study.
2.
Houston 2018
The Houston 2018 dataset was provided by the 2018 IEEE GRSS Data Fusion Contest and acquired by the National Center for Airborne Laser Mapping over the University of Houston campus and its neighborhood. The HSI covers a 380–1050 nm spectral wavelength range with 48 bands at a 1 m ground sampling distance and has an image size of 1202 × 4172. The ground truth samples contain 20 classes and have a higher spatial resolution but a smaller spatial extent, with an image size of 1202 × 4768. Thus, the spatial extent and resolution of the HSI and the ground truth sample image were unified using clipping and resampling, resulting in an image size of 601 × 2384 and a total sample number of 504,712. In this study, we randomly selected 100 samples from each class for training, and the rest were used for testing (details are listed in Table 2).
3.
Pavia University
The Pavia University dataset was acquired using the Reflective Optics System Imaging Spectrometer sensor over Pavia University, Northern Italy. The HSI has a size of 610 × 340, with 103 bands, a spectral wavelength range of 430–860 nm, and a spatial resolution of 1.3 m. The dataset contains 9 classes, with a total of 42,776 ground truth samples. In this study, we randomly selected 100 samples from each class for training, and the rest were used for testing (details are listed in Table 3).
4.
Salinas Valley
The Salinas Valley dataset was collected using the AVIRIS sensor over Salinas Valley, CA, USA. The image has a size of 512 × 217 pixels, with 204 bands, a spectral wavelength range of 360–2500 nm, and a spatial resolution of 3.7 m. The dataset contained 16 classes with a total number of 54,129 ground truth samples. In this study, we randomly selected 100 samples from each class for training and the rest were used for testing (details are listed in Table 4).
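For the three datasets with random sampling, a per-class split along the lines below could be used; this is a sketch under the common assumption that label value 0 marks unlabeled pixels.

```python
import numpy as np

def split_per_class(labels: np.ndarray, n_train: int = 100, seed: int = 0):
    """Randomly pick n_train samples per class for training; the rest are for testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels[labels > 0]):      # 0 assumed to be the unlabeled value
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```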

3.2. Method Modeling

All deep learning methods were trained with the following hyperparameters: a batch size of 32; a learning rate of 0.001 with a decay rate of 0.8 every 20 epochs; a stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and a weight decay of 0.0001; and a total of 500 training epochs. All models were trained using HSI patches with a spatial size of 11 × 11.
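A minimal sketch of this training configuration follows; the model and training loop are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 15)  # placeholder for the classification network in Section 2.6
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0001)
# Decay the learning rate by a factor of 0.8 every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.8)

for epoch in range(500):
    # ... one pass over the training set with batch size 32 goes here ...
    scheduler.step()
```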
All experiments in this study were implemented using Pytorch [39] on a single computer, and the environment was as follows: Windows operating system, Intel (R) Core (TM) i9-10900 K, 64 GB RAM, and GPU of NVIDIA GeForce RTX 3090 with 24 GB GPU memory.

3.3. Method Comparison

To verify the effectiveness of our proposed method, we compared it with classical and state-of-the-art HSI classification methods based on machine learning and deep learning. These methods focus on using different features in the HSI, the details of which are described as follows.
SVM [40]: The Support Vector Machine (SVM) is one of the most important machine learning methods and is widely used for classification. The SVM was implemented using the Python library scikit-learn, with the ‘linear’ kernel function and default values for the other parameters. This is a typical method that directly uses the original spectral curve for classification without feature extraction.
1D CNN [29]: Hu et al. performed pioneering work on HSI classification based on CNN. Their 1D CNN contains a 1D convolution layer, a max pooling layer, and a two-layer FC. The kernel number of the convolution layer was 20. This is a typical method that extracts LSF for classification.
R2D [36]: Yuan et al. reshaped each pixel of the HSI from 1D into 2D to increase the spectral coverage of convolution and obtain more diverse features. The reshaped 2D images were used as input for classification using the same classification network proposed in this study. This method can be regarded as extracting and using an optimized LSF for classification.
2D CNN [41]: This 2D CNN contains three convolution groups and a two-layer FC. Each convolution group contains two convolution layers and one max-pooling layer. The kernel numbers of the six convolution layers were 32, 32, 64, 64, 128, and 128, and the node numbers of the two FC layers were 1024 and the class number, respectively. The input HSI was masked by a spectral attention module obtained using global convolution with a nonlinear activation function. This is a typical method that focuses on spatial rather than spectral features for classification.
3D CNN [32]: Hamida et al. proposed and evaluated a set of 3D CNNs, which have been widely used in subsequent studies. The 3D CNN contains four 3D convolution layers, two 3D max pooling layers, and a one-layer FC. The kernel numbers of the 3D convolution layers were 20, 35, 35, and 35, respectively. This is a typical method that uses both spatial features and LSF for classification.
MCM [22]: He et al. constructed multiscale HSI cubes by increasing the size of the spatial window around the center pixel. A covariance map was generated for each scale to represent the information of the central pixel, and the covariance maps obtained at various scales were used to generate multiscale covariance maps (MCM). The MCMs were further used for classification using the same classification network proposed in this study. This is a typical method that uses spatial features and manually designed GSF for classification.
LGSF: The local-global spectral feature proposed in this study, which fully considers the LSF, GSF, and spatial features in the HSI. Moreover, the extraction and generation of LSF and GSF are both automatic and data-driven.
The following common metrics were used to evaluate the performance of the methods: producer’s accuracy (for each class, PA), overall accuracy (OA), average accuracy (AA), and Kappa coefficient (KC). Moreover, considering that the small number of training samples and the random initialization of parameters may introduce uncertain effects and cause the classification accuracy to fluctuate, each method was evaluated three times, and the mean accuracy was considered the final accuracy score.
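For reference, the four metrics can be computed from a confusion matrix as sketched below; taking rows as reference classes is our assumption about the matrix orientation.

```python
import numpy as np

def metrics(cm: np.ndarray):
    """PA, OA, AA, and KC from a confusion matrix with reference classes as rows."""
    total = cm.sum()
    pa = np.diag(cm) / cm.sum(axis=1)            # per-class producer's accuracy
    oa = np.diag(cm).sum() / total               # overall accuracy
    aa = pa.mean()                               # average accuracy (mean PA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance agreement
    kc = (oa - pe) / (1 - pe)                    # Kappa coefficient
    return pa, oa, aa, kc
```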

4. Results and Discussion

Each method listed above was trained until convergence, and the PA, OA, AA, and KC were used to measure accuracy. The PA describes the accuracy of each class; a higher PA indicates a more accurate result for the corresponding class. The OA describes the accuracy over all pixels; a higher OA indicates that more pixels are classified correctly. However, the OA can be dominated by classes with large sample numbers. The AA is the average PA over all classes; a higher AA indicates better accuracy across all classes. The KC is a comprehensive evaluation index; a higher KC indicates not only higher accuracy but also less misclassification in each class.
Table 5, Table 6, Table 7 and Table 8 report the PA, OA, AA, and KC of all methods on the Houston 2013, Houston 2018, Pavia University, and Salinas Valley datasets, respectively. The highest accuracies are highlighted in bold. Overall, our proposed LGSF outperforms all comparison methods and has the highest OA, AA, and KC on the four datasets. The SVM has the lowest accuracy among the comparison methods. By combining the results of these methods with their characteristics, we observed that sufficiently describing the spectral features leads to higher accuracy. Moreover, the comparison results show that introducing spatial information can greatly improve the classification accuracy. However, we also discovered that the spatial and spectral features are suitable for different classes, which means that spatial and spectral features should be selectively used according to the class characteristics.

4.1. Comparison of Different Datasets

For the Houston 2013 dataset, the OA, AA, and KC of our LGSF were 85.30%, 87.17%, and 84.09%, respectively (Table 5). Compared with all baselines, LGSF showed great advantages in terms of the overall statistical metrics. There were seven classes in which LGSF surpassed the other methods. Most classes in this dataset achieved 80% accuracy using LGSF, except for synthetic grass and water. The highest accuracies for these two classes were obtained using R2D and SVM, which are both methods based only on spectral information. Houston 2013 is a city dataset whose classes cover a wide range, and the spectral properties of the classes vary greatly. Thus, this dataset was suitable for testing the applicability of these methods. Classification methods have poor applicability when they show high accuracy on a few classes but low accuracy on the others, as is the case for SVM. Our LGSF surpassed the other methods on seven classes and has the highest AA, which reveals its effectiveness.
For the Houston 2018 dataset, the OA, AA, and KC of our LGSF were 76.09%, 65.24%, and 70.41%, respectively (Table 6), which were the highest among the evaluated methods. There were 12 classes in which LGSF surpassed the other methods, the largest number among all methods. On the Houston 2018 dataset, the accuracy fluctuated greatly among the different methods, with a 50% difference between the highest and lowest accuracy, and there was one class on which every method had an accuracy of less than 10%. The reason may be that this dataset is relatively complex owing to its numerous classes and small band number, which provides insufficient spectral information for distinguishing targets. The superiority of our method on such datasets verifies the effectiveness of the proposed method for spectral utilization.
For the Pavia University dataset, the OA, AA, and KC of our LGSF were 98.94%, 97.95%, and 98.59%, respectively (Table 7), which were the highest among the evaluated methods. There were six classes in which LGSF surpassed the other methods, and the accuracies of the remaining classes were very close to the highest accuracy. The LGSF accuracy on each class was greater than 95%, which shows great advantages compared with the other methods. The accuracy on this dataset was much higher than that on the Houston 2013 and Houston 2018 datasets, which are urban scenes. The reason may be that the spatial coverage and the number of categories of this dataset are both smaller than those of the above two datasets.
For the Salinas Valley dataset, the OA, AA, and KC of our LGSF were 98.23%, 99.13%, and 98.03%, respectively (Table 8). On this dataset, a few classes achieved 100% accuracy with various methods. LGSF had the best accuracy for nine classes, and the accuracies of most classes were close to 99%. This may be because this dataset is a vegetation dataset, which is relatively simple compared with the above three datasets from complex city scenarios. Moreover, the sufficient spectral information in this dataset (which contains the most bands among the four datasets) is also helpful for classification.

4.2. Comparison of Different Methods

Comparing the performances of methods that only use spectral information (i.e., SVM, 1D CNN, and R2D), we saw that the accuracy increases significantly with the refinement of the design of spectral features (SVM with no LSF, 1D CNN with LSF, and R2D with optimized LSF) on all datasets, which emphasizes the importance of LSF as well as the effectiveness of transforming the 1D spectral vector into a 2D spectral image.
Comparing the performances of methods that contain spatial information (i.e., 2D CNN, 3D CNN, MCM, and LGSF), it is surprising that 2D CNN has a better accuracy than 3D CNN on all datasets, which implies that simply combining spatial and spectral information using 3D convolution may not be optimal. We noticed that the difference in network structure (e.g., layer number and feature number) can also lead to this situation; however, it also means that 3D CNN requires a more refined design than 2D CNN.
Comparing the performance of methods with and without spatial information, we observed that there is considerable salt and pepper noise in the classification results of methods with no spatial information (Figure 7c–e, Figure 8c–e, Figure 9c–e and Figure 10c–e), whereas the classification results of methods with spatial information (Figure 7f–i, Figure 8f–i, Figure 9f–i and Figure 10f–i) are smoother. This demonstrates that introducing spatial information can improve robustness.
Comparing the performance of the LGSF with the other methods, we observed that MCM and LGSF exhibited the top two accuracies on almost all datasets. The common point between them is the use of GSF, highlighting its importance. LGSF surpasses MCM on all datasets, and MCM has an abnormal drop on the Houston 2013 dataset, which may be due to the lack of LSF or the limitation of the manually designed GSF in MCM. It can be concluded that, with the combination of LSF, GSF, and spatial information, our LGSF can obtain better accuracy in HSI classification. Moreover, in the Houston 2013 dataset, there was a large dark area caused by clouds (marked with a red box, where most classes were related to urban scenes). The spectral information of the ground objects is strongly changed in this area, and most methods misclassified this area into classes that appear dark in the image, such as water (Figure 7d,e) or synthetic grass (Figure 7c,f,g). However, LGSF (Figure 7i) classified this area into residential and parking lots, which are close to the original classes (all belong to the urban scene). This may be because the GSF in the LGSF describes the relative shape of the spectral curve, which shows a certain similarity between the urban classes. This also means that LGSF has a better spectrum understanding ability. A similar situation also appears in the stadium seat class (marked with a red box) in the Houston 2018 dataset. Most comparison methods misclassified these pixels into classes with similar artificial materials, such as cars and trains (Figure 8c,e–g), or even water (Figure 8d,h), because of the existence of shadows. Our LGSF (Figure 8i) retained a good classification ability and classified this area with high accuracy.

4.3. Comparison of Different Land Use Classes

The above datasets can be roughly divided into four common land use classes: artificial targets, vegetation, water, and bare soil.
For the artificial targets, most road classes (e.g., road and highway in Houston 2013, and roads, sidewalks, crosswalks, and highways in Houston 2018) are associated with higher accuracy when using methods that contain spatial information (e.g., 2D CNN) than those methods that rely only on spectral information (e.g., SVM, 1D CNN, and R2D). This may be due to the strong spectral similarities of these artificial roads, while their spatial texture features are more distinguishable than the spectral features. Thus, for road classes, the use of spatial features is more beneficial than spectral features. The same principles apply to residential classes whose spectral information can be messy and whose spatial information is more representative. However, for other artificial targets made of unique materials (e.g., the railway in Houston 2013 and Houston 2018), spectral-based methods have better classification capabilities than spatial-based methods.
For vegetation, mainly in the Salinas Valley and Pavia University datasets, we discovered that the difference in accuracy between methods is much smaller than that for the other classes, which indicates that the classification accuracy of vegetation will not be too poor with any of the methods. However, under the demands of refined vegetation classification, it is still important to design appropriate spectral features to reflect the characteristics of different vegetation types. For example, the Lettuce_romaine_wk series in the Salinas Valley dataset has an accuracy of over 90% for most methods. However, the methods that use LSF or GSF have more stable and accurate results.
For water, SVM obtained the highest accuracy on the water class in both Houston 2013 and Houston 2018. However, the OA, AA, and KC of SVM were the lowest on both datasets. This is an interesting phenomenon, and it indicates that the water class has completely different characteristics from those of the other classes. MCM and LGSF obtained the second highest accuracy for the water class in Houston 2013 and Houston 2018, respectively, and both contain GSF. Considering that SVM uses the original spectral curve and the GSF is a description of the overall shape of the spectral curve, we believe that more global spectral information (the complete spectral information or the GSF) should be used to classify water, rather than local spectral information.
For bare soil (soil in Houston 2013, bare earth in Houston 2018, and bare soil in Pavia University), MCM and LGSF obtained the best results. Notably, although the overall accuracy of MCM was the lowest on the Houston 2013 dataset, its accuracy for soil was over 99%. Both MCM and LGSF contain GSF, which suggests that GSF is a necessary feature for identifying bare land. Moreover, we also noticed that introducing spatial information is helpful for improving the accuracy of bare soil, as the accuracies of 2D CNN and 3D CNN were higher than those of SVM, 1D CNN, and R2D. This may be because the heterogeneity of bare soil is relatively high, and there are numerous pixels belonging to other classes within the bare soil, such as grass, which confuse the classifiers. The above analysis shows that the classification of bare soil should rely on spatial information as well as GSF.

4.4. Comparison of Ablation Experiments

To further demonstrate the effectiveness and contribution of each component of this paper, a series of ablation experiments were conducted; the details are shown in Table 9. AE_LGSF contains the complete set of components. Based on AE_LGSF, AE_2D removed the 2D transformation and used the original 1D spectral vector; AE_LSF removed the LSFEM; AE_GSF removed the GSFEM; AE_SFOL removed the SFOL and used only the cross-entropy loss for classification; AE_DCBN used a traditional convolution-based network instead of the dilated convolution-based network (DCBN) for classification, with the same number of feature maps in each layer; and AE_SPAT removed the spatial enhancement. By comparing each ablation experiment that removed a specific component with AE_LGSF, the contribution of the removed component to the classification results can be verified. The results of these ablation experiments on the four datasets are shown in Table 10, Table 11, Table 12 and Table 13. It can be seen from the tables that when any component is removed, the classification accuracy decreases to a certain extent. In the following, we discuss the results from three perspectives (data input, feature extraction and feature classification), which cover the entire process of HSI classification.
For the data input part, the directly related experiment is AE_SPAT. There is a large decrease in the accuracy of AE_SPAT, which indicates the importance of spatial enhancement. The reason is that the extraction of spectral features is a statistical induction over stable spectral patterns. However, the ubiquitous noise in HSI makes the spectral value of a single pixel fluctuate within a certain range, resulting in a lack of stable characterization. Therefore, by introducing spatial constraints and establishing a relationship between the spectra of the surrounding pixels and the central pixel, the effect of noise can be reduced and the statistical nature of the spectrum can be enhanced, thereby extracting more stable features and improving classification accuracy. Thus, it is important to use spatial information to enhance and stabilize the spectral information before feature extraction in HSI classification.
For the feature extraction part, the directly related experiments are AE_2D, AE_LSF, AE_GSF and AE_SFOL. (1) The accuracy decrease of AE_2D reveals the positive effect of the 2D transformation on spectral feature extraction, which can be explained from two perspectives. From a row perspective, in the 2D spectral image, the LSF that would be extracted by 1D convolution are still preserved in the 2D receptive field. Moreover, the relationships between the LSF in different rows can be further mined and combined, which means that the 2D pattern can obtain more diverse spectral features without losing the LSF obtained by the 1D pattern. From a column perspective, in the 2D spectral image, the band combinations of columns are regular and express another kind of LSF (i.e., the LSF of uniformly spaced K bands, where K is the width of the 2D image). This is similar to the idea of dilated convolution, which enlarges the receptive field according to the dilation rate. In this way, the 2D pattern is conducive to capturing a wider range of LSF and produces more diverse spectral features. (2) Comparing AE_LSF and AE_GSF (which use the GSF and LSF alone for classification, respectively), AE_LSF outperforms AE_GSF on three datasets, which indicates that the GSF may have a greater impact on classification than the LSF. Therefore, when using a CNN for HSI classification, it is necessary to jump out of the local receptive field of the convolutional kernels and effectively design long-distance or distance-independent global features. (3) The accuracy decrease of AE_SFOL demonstrates the contribution of the SFOL, which guides the network to automatically optimize the LGSF for better category separability.
For the feature classification part, the directly related experiment is AE_DCBN. The accuracy decrease of AE_DCBN indicates that, even with good features, it is also important to use a good classifier for analysis and classification.

4.5. The Analysis and Discussion of Local and Global Characteristics of LGSF

Figure 11 takes the Pavia University dataset as an example and shows the extracted LGSF together with the original spectral curve of each class. In the LGSF maps, the pixel in each row and column represents the relationship between the spectra of the corresponding bands. For example, the pixel in row 1 and column 2 represents the relationship between bands 1 and 2. The redder areas indicate high values and can be roughly considered activated areas, which contain more important features.
Analyzing the LGSF maps from a local perspective, we found that a larger local variation in the spectral curve causes a higher activation value. In Figure 11, the asphalt and gravel classes (Figure 11a,c) show activation in the LGSF maps where the corresponding reflectivity in the spectral curves fluctuates, while the shadow class (Figure 11i) activates the LGSF map on the left side of the spectral curve, where the reflectivity tends to decrease. This indicates that the locally rapidly changing spectra in the spectral curve are more capable of representing the characteristics of the targets. This is consistent with experience, because a rapid change in a certain local part is generally caused by a unique property of the target. For example, for vegetation, the most typical feature is the rapid change in reflectivity between the red and near-infrared bands, which is also the basis of the normalized difference vegetation index (NDVI). Bitumen is an interesting class (Figure 11g), whose reflectivity is generally flat, with only a slight jitter at the front and end of the curve. Even so, our method can effectively extract spectral features to represent the characteristics of the spectral curve.
Analyzing the LGSF maps from a global perspective, we found that a larger global variation in the spectral curve produces a higher activation value, which is most obvious in the painted metal sheet class (Figure 11e). The spectral range of the pixels with large activation values (on the left of the LGSF maps) corresponds to the reflection peak on the spectral curve. The spectra of the reflection peaks are strongly activated when combined with all other spectra, except for their adjacent spectra, because their values are similar to the surrounding spectra and differ greatly from the others. The same phenomenon can also be observed in the meadows, trees, and bare soil classes (Figure 11b,d,f). The self-blocking bricks (Figure 11h) are a special class, whose reflectivity gradually increases with wavelength, which means that the farther apart two wavelengths are, the greater the difference in reflectivity. Therefore, in the LGSF maps, the activation value of the spectra with the farthest distance (the first and last bands) is the largest (the activation area on the top right), and the value gradually decreases as the spectral distance decreases (from top to bottom and from right to left).

5. Conclusions

HSI contains rich spectral information and is widely used in a series of classification applications. However, the rich spectrum contained in HSI increases the difficulty of extracting useful hidden information for classification. Spectral features are considered to represent the useful information in the spectrum and form the basis of HSI classification. In this study, we summarize spectral features into two categories: LSF and GSF. The LSF describe the statistical information of the local and adjacent areas of the spectral curve, whereas the GSF describe the relative relationships between the long-distance and non-adjacent areas of the spectral curve. We demonstrated the importance of the LSF and GSF when dealing with HSI and proposed an LGSF extraction and optimization method to extract and combine both. We first transformed the 1D spectral vector into a 2D spectral image to increase the adjacency opportunities between spectra as well as the possibility of obtaining features with more forms. Next, the LSF were extracted using the LSFEM, and the GSF were extracted using the GSFEM upon the LSF to form the LGSF. The LGSF was optimized using the SFOL to maximize class separability and was further enhanced with spatial relations. A dilated convolution-based network was designed to obtain multiscale image features of the LGSF and was used for HSI classification. We evaluated our method on four HSI datasets and compared it with several other methods that focus on various features for HSI classification. The experimental results showed that the proposed method achieved the highest accuracy compared with other methods that use single or incomplete LSF and GSF, which demonstrates that spectral information can be more effectively described after the extraction, combination, and optimization processes of local and global spectral features proposed in this article. Moreover, it also reveals that effective, full, and comprehensive use of spectral information can improve the classification accuracy of HSI and is of great significance to HSI application.

Author Contributions

Conceptualization, Z.X. and C.S.; methodology, Z.X. and S.W.; software, Z.X.; writing—original draft preparation, Z.X. and C.S.; supervision, C.S. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42050103, and the National Key R&D Program of China, grant number 2018YFB0505002.

Data Availability Statement

The datasets used in this article are all publicly available. The Houston 2013 dataset was downloaded from https://hyperspectral.ee.uh.edu/?page_id=459, accessed on 11 June 2022. The Houston 2018 dataset was downloaded from https://hyperspectral.ee.uh.edu/?page_id=1075, accessed on 11 June 2022. The Pavia University and Salinas Valley datasets were downloaded from https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes, accessed on 11 June 2022.

Acknowledgments

We thank the institutions that collected the hyperspectral image datasets. We also thank the anonymous reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yu, C.; Han, R.; Song, M.; Liu, C.; Chang, C. Feedback Attention-Based Dense CNN for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501916.
2. Hong, D.; Wu, X.; Ghamisi, P.; Chanussot, J.; Yokoya, N.; Zhu, X.X. Invariant Attribute Profiles: A Spatial-Frequency Joint Feature Extractor for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3791–3808.
3. van Ruitenbeek, F.J.A.; van der Werff, H.M.A.; Hein, K.A.A.; van der Meer, F.D. Detection of pre-defined boundaries between hydrothermal alteration zones using rotation-variant template matching. Comput. Geosci. 2008, 34, 1815–1826.
4. Buckley, S.J.; Kurz, T.H.; Howell, J.A.; Schneider, D. Terrestrial lidar and hyperspectral data fusion products for geological outcrop analysis. Comput. Geosci. 2013, 54, 249–258.
5. Goel, P.K.; Prasher, S.O.; Patel, R.M.; Landry, J.A.; Bonnell, R.B.; Viau, A.A. Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn. Comput. Electron. Agric. 2003, 39, 67–93.
6. Zhang, X.; Sun, Y.; Shang, K.; Zhang, L.; Wang, S. Crop Classification Based on Feature Band Set Construction and Object-Oriented Approach Using Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2016, 9, 4117–4128.
7. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral Remote Sensing Data Analysis and Future Challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
8. Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 2005, 43, 480–491.
9. Liu, H.; Zhu, H.; Wang, P. Quantitative modelling for leaf nitrogen content of winter wheat using UAV-based hyperspectral data. Int. J. Remote Sens. 2017, 38, 2117–2134.
10. Xie, J.; Wang, Q.; Liu, P.; Li, Z. A hyperspectral method of inverting copper signals in mineral deposits based on an improved gradient-boosting regression tree. Int. J. Remote Sens. 2021, 42, 5474–5492.
11. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.J.; Pla, F. Deep Pyramidal Residual Networks for Spectral–Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 740–754.
12. Zhang, X.; Gao, Z.; Jiao, L.; Zhou, H. Multifeature Hyperspectral Image Classification with Local and Nonlocal Spatial Information via Markov Random Field in Semantic Space. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1409–1424.
13. Ye, Z.; Chen, J.; Li, H.; Wei, Y.; Xiao, G.; Benediktsson, J.A. Supervised Functional Data Discriminant Analysis for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 841–851.
14. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63.
15. Bhatti, U.A.; Yu, Z.; Chanussot, J.; Zeeshan, Z.; Yuan, L.; Luo, W.; Nawaz, S.A.; Bhatti, M.A.; Ain, Q.U.; Mehmood, A. Local Similarity-Based Spatial–Spectral Fusion Hyperspectral Image Classification with Deep CNN and Gabor Filtering. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
16. Jia, X.; Richards, J.A. Efficient maximum likelihood classification for imaging spectrometer data sets. IEEE Trans. Geosci. Remote Sens. 1994, 32, 274–281.
17. Ifarraguerri, A.; Prairie, M.W. Visual Method for Spectral Band Selection. IEEE Geosci. Remote Sens. Lett. 2004, 1, 101–106.
18. Wang, J.; Chang, C. Independent component analysis-based dimensionality reduction with applications in hyperspectral image analysis. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1586–1600.
19. Chang, C.I.; Du, Q.; Sun, T.L.; Althouse, M. A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 1999, 37, 2631–2641.
20. Demir, B.; Ertürk, S. Phase correlation based redundancy removal in feature weighting band selection for hyperspectral images. Int. J. Remote Sens. 2008, 29, 1801–1807.
21. Chen, J.; Wang, R.; Wang, C. Combining magnitude and shape features for hyperspectral classification. Int. J. Remote Sens. 2009, 30, 3625–3636.
22. He, N.; Paoletti, M.E.; Haut, J.M.; Fang, L.; Li, S.; Plaza, A.; Plaza, J.O. Feature Extraction with Multiscale Covariance Maps for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 755–769.
23. Jiang, J.; Ma, J.; Chen, C.; Wang, Z.; Cai, Z.; Wang, L. SuperPCA: A Superpixelwise PCA Approach for Unsupervised Feature Extraction of Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4581–4593.
24. Fu, H.; Sun, G.; Ren, J.; Zhang, A.; Jia, X. Fusion of PCA and Segmented-PCA Domain Multiscale 2-D-SSA for Effective Spectral-Spatial Feature Extraction and Data Classification in Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14.
25. Liu, Q.; Xiao, L.; Yang, J.; Wei, Z. CNN-Enhanced Graph Convolutional Network with Pixel- and Superpixel-Level Feature Fusion for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8657–8671.
26. Zheng, Z.; Zhong, Y.; Ma, A.; Zhang, L. FPGA: Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5612–5626.
27. Safari, K.; Prasad, S.; Labate, D. A Multiscale Deep Learning Approach for High-Resolution Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 167–171.
28. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281.
29. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H.; Wu, T. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 258619.
30. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep Learning Classification of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782.
31. Song, W.; Li, S.; Fang, L.; Lu, T. Hyperspectral Image Classification with Deep Feature Fusion Network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3173–3184.
32. Hamida, A.B.; Benoit, A.; Lambert, P.; Amar, C.B. 3-D Deep Learning Approach for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4420–4434.
33. Li, Y.; Zhang, H.; Shen, Q. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67.
34. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858.
35. Zhang, J.; Wei, F.; Feng, F.; Wang, C. Spatial–Spectral Feature Refinement for Hyperspectral Image Classification Based on Attention-Dense 3D-2D-CNN. Sensors 2020, 20, 5191.
36. Yuan, S.; Song, G.; Huang, G.; Wang, Q. Reshaping Hyperspectral Data into a Two-Dimensional Image for a CNN Model to Classify Plant Species from Reflectance. Remote Sens. 2022, 14, 3972.
37. Wang, F.; Liu, H.P. Understanding the Behaviour of Contrastive Loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
38. Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
39. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
40. Chang, C.; Lin, C. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
41. Mou, L.; Zhu, X.X. Learning to Pay Attention on Spectral Domain: A Spectral Attention Module-Based Convolutional Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 110–122.
Figure 1. The overview of the proposed method.
Figure 2. The process of transforming the 1D spectral vector into a 2D spectral image and the comparison of features obtained using 1D and 2D convolution kernels.
Figure 3. The architecture of the LSFEM. The input of the LSFEM is the 2D spectral image, and the output is two LSF groups.
Figure 4. The architecture of the GSFEM. The input of the GSFEM is the two LSF groups, and the output is the LGSF.
Figure 5. The architecture of the classification network. The input of the network is the LGSF, and the output is the category of the pixel that generated the LGSF.
Figure 6. The spatial enhancement process of the LGSF.
Figure 7. The classification results obtained using different methods on the Houston 2013 dataset. (a) Three-band composite image of the Houston 2013 HSI. (b) The ground truth of the Houston 2013 HSI. The classification results of (c) SVM, (d) 1D CNN, (e) R2D, (f) 2D CNN, (g) 3D CNN, (h) MCM, and (i) LGSF.
Figure 8. The classification results obtained using different methods on the Houston 2018 dataset. (a) Three-band composite image of the Houston 2018 HSI. (b) The ground truth of the Houston 2018 HSI. The classification results of (c) SVM, (d) 1D CNN, (e) R2D, (f) 2D CNN, (g) 3D CNN, (h) MCM, and (i) LGSF.
Figure 9. The classification results obtained using different methods on the Pavia University dataset. (a) Three-band composite image of the Pavia University HSI. (b) The ground truth of the Pavia University HSI. The classification results of (c) SVM, (d) 1D CNN, (e) R2D, (f) 2D CNN, (g) 3D CNN, (h) MCM, and (i) LGSF.
Figure 10. The classification results obtained using different methods on the Salinas Valley dataset. (a) Three-band composite image of the Salinas Valley HSI. (b) The ground truth of the Salinas Valley HSI. The classification results of (c) SVM, (d) 1D CNN, (e) R2D, (f) 2D CNN, (g) 3D CNN, (h) MCM, and (i) LGSF.
Figure 11. The LGSF maps and spectral curves of different classes from the Pavia University dataset.
Table 1. The number of training and test samples used in the Houston 2013 dataset.

Class No. | Class Name | Training Samples | Test Samples
1 | Healthy grass | 198 | 1053
2 | Stressed grass | 190 | 1064
3 | Synthetic grass | 192 | 505
4 | Trees | 188 | 1056
5 | Soil | 186 | 1056
6 | Water | 182 | 143
7 | Residential | 196 | 1072
8 | Commercial | 191 | 1053
9 | Road | 193 | 1059
10 | Highway | 191 | 1036
11 | Railway | 181 | 1054
12 | Parking Lot 1 | 192 | 1041
13 | Parking Lot 2 | 184 | 285
14 | Tennis Court | 181 | 247
15 | Running Track | 187 | 473
Total | | 2832 | 12,197
Table 2. The number of training and test samples used in the Houston 2018 dataset.

Class No. | Class Name | Training Samples | Test Samples
1 | Healthy grass | 100 | 9699
2 | Stressed grass | 100 | 32,402
3 | Artificial turf | 100 | 584
4 | Evergreen trees | 100 | 13,488
5 | Deciduous trees | 100 | 4948
6 | Bare earth | 100 | 4416
7 | Water | 100 | 166
8 | Residential buildings | 100 | 39,662
9 | Non-res. buildings | 100 | 223,584
10 | Roads | 100 | 45,710
11 | Sidewalks | 100 | 33,902
12 | Crosswalks | 100 | 1416
13 | Major thoroughfares | 100 | 46,258
14 | Highways | 100 | 9749
15 | Railways | 100 | 6837
16 | Paved parking lots | 100 | 11,375
17 | Unpaved parking lots | 100 | 49
18 | Cars | 100 | 6478
19 | Trains | 100 | 5265
20 | Stadium seats | 100 | 6724
Total | | 2000 | 502,712
Table 3. The number of training and test samples used in the Pavia University dataset.

Class No. | Class Name | Training Samples | Test Samples
1 | Asphalt | 100 | 6531
2 | Meadows | 100 | 18,549
3 | Gravel | 100 | 1999
4 | Trees | 100 | 2964
5 | Painted metal sheets | 100 | 1245
6 | Bare Soil | 100 | 4929
7 | Bitumen | 100 | 1230
8 | Self-Blocking Bricks | 100 | 3582
9 | Shadows | 100 | 847
Total | | 900 | 41,876
Table 4. The number of training and test samples used in the Salinas Valley dataset.

Class No. | Class Name | Training Samples | Test Samples
1 | Brocoli_green_weeds_1 | 100 | 1909
2 | Brocoli_green_weeds_2 | 100 | 3626
3 | Fallow | 100 | 1876
4 | Fallow_rough_plow | 100 | 1294
5 | Fallow_smooth | 100 | 2578
6 | Stubble | 100 | 3859
7 | Celery | 100 | 3479
8 | Grapes_untrained | 100 | 11,171
9 | Soil_vinyard_develop | 100 | 6103
10 | Corn_senesced_green_weeds | 100 | 3178
11 | Lettuce_romaine_4wk | 100 | 968
12 | Lettuce_romaine_5wk | 100 | 1827
13 | Lettuce_romaine_6wk | 100 | 816
14 | Lettuce_romaine_7wk | 100 | 970
15 | Vinyard_untrained | 100 | 7168
16 | Vinyard_vertical_trellis | 100 | 1707
Total | | 1600 | 52,529
Table 5. Classification results of different methods on the Houston 2013 dataset (shown in %).

Class Name | SVM | 1D CNN | R2D | 2D CNN | 3D CNN | MCM | LGSF
Healthy grass | 93.20 | 95.78 | 98.50 | 98.19 | 98.06 | 96.46 | 99.46
Stressed grass | 99.09 | 98.08 | 98.53 | 98.31 | 94.01 | 48.58 | 98.03
Synthetic grass | 21.78 | 68.08 | 97.53 | 28.13 | 45.35 | 94.57 | 58.57
Trees | 98.13 | 80.04 | 98.76 | 98.63 | 98.47 | 67.88 | 87.45
Soil | 85.73 | 92.91 | 91.88 | 97.56 | 95.42 | 99.03 | 99.74
Water | 100.00 | 14.28 | 22.98 | 46.50 | 45.96 | 86.10 | 65.58
Residential | 52.40 | 62.99 | 65.37 | 89.09 | 73.85 | 40.35 | 80.11
Commercial | 94.19 | 69.21 | 79.69 | 79.62 | 64.27 | 72.19 | 94.25
Road | 44.98 | 70.25 | 83.66 | 93.51 | 79.72 | 88.92 | 85.86
Highway | 58.95 | 66.28 | 87.56 | 74.04 | 70.17 | 73.22 | 85.81
Railway | 45.31 | 59.11 | 81.43 | 75.00 | 60.97 | 26.42 | 86.15
Parking Lot 1 | 8.53 | 71.25 | 90.06 | 77.71 | 68.59 | 77.95 | 92.15
Parking Lot 2 | 19.27 | 18.76 | 41.62 | 77.86 | 44.89 | 67.47 | 82.15
Tennis Court | 50.10 | 56.42 | 88.43 | 90.71 | 85.76 | 87.40 | 98.68
Running Track | 99.36 | 98.17 | 99.93 | 92.86 | 97.74 | 88.12 | 93.52
OA | 59.21 | 69.65 | 80.89 | 80.26 | 72.49 | 53.72 | 85.30
AA | 64.73 | 68.11 | 81.73 | 81.18 | 74.88 | 74.31 | 87.17
KC | 56.06 | 67.31 | 79.38 | 78.66 | 70.27 | 49.59 | 84.09
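For reference, the three summary metrics reported in Tables 5 through 8 and 10 through 13 (OA, AA, and KC) follow the standard confusion-matrix definitions. The snippet below is a conventional implementation of these metrics with a made-up three-class example, not code or data from the paper:

```python
import numpy as np

def oa_aa_kappa(C: np.ndarray):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa
    from a confusion matrix C (rows: true class, columns: predicted class)."""
    total = C.sum()
    oa = np.trace(C) / total                                 # overall accuracy (OA)
    aa = np.mean(np.diag(C) / C.sum(axis=1))                 # average accuracy (AA)
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / total ** 2  # expected chance agreement
    kc = (oa - pe) / (1 - pe)                                # kappa coefficient (KC)
    return oa, aa, kc

# Small illustrative confusion matrix.
C = np.array([[50, 2, 0], [3, 45, 2], [1, 0, 47]])
print(oa_aa_kappa(C))  # approximately (0.947, 0.947, 0.920)
```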
Table 6. Classification results of different methods on the Houston 2018 dataset (shown in %).

Class Name | SVM | 1D CNN | R2D | 2D CNN | 3D CNN | MCM | LGSF
Healthy grass | 51.92 | 62.87 | 79.18 | 69.86 | 68.03 | 71.81 | 65.40
Stressed grass | 60.44 | 69.80 | 89.56 | 91.68 | 86.17 | 92.37 | 91.58
Artificial turf | 2.31 | 2.12 | 84.44 | 77.50 | 63.99 | 96.77 | 96.05
Evergreen trees | 75.44 | 73.89 | 77.24 | 82.35 | 86.34 | 81.07 | 75.17
Deciduous trees | 10.98 | 11.52 | 32.56 | 26.73 | 20.34 | 28.74 | 32.90
Bare earth | 7.91 | 8.70 | 59.95 | 43.68 | 39.62 | 85.79 | 89.81
Water | 100.00 | 7.42 | 25.90 | 20.92 | 12.54 | 39.97 | 46.41
Residential buildings | 44.21 | 46.63 | 65.13 | 65.52 | 63.79 | 64.59 | 75.19
Non-res. buildings | 98.31 | 98.75 | 96.55 | 96.90 | 95.96 | 97.29 | 98.04
Roads | 27.29 | 18.64 | 52.58 | 59.05 | 44.62 | 56.71 | 59.36
Sidewalks | 36.47 | 30.00 | 47.61 | 51.01 | 47.64 | 50.45 | 43.56
Crosswalks | 1.58 | 1.89 | 4.78 | 5.14 | 4.15 | 6.47 | 6.67
Major thoroughfares | 38.94 | 28.01 | 61.35 | 65.96 | 58.51 | 70.09 | 78.40
Highways | 13.08 | 15.50 | 38.55 | 54.25 | 39.80 | 50.56 | 61.75
Railways | 13.97 | 14.28 | 84.14 | 65.40 | 68.21 | 88.67 | 95.80
Paved parking lots | 9.92 | 10.16 | 69.43 | 55.51 | 42.51 | 72.08 | 82.19
Unpaved parking lots | 0.09 | 0.29 | 8.99 | 2.59 | 3.35 | 6.54 | 16.65
Cars | 11.45 | 9.95 | 27.09 | 34.11 | 32.85 | 36.39 | 64.46
Trains | 17.64 | 10.17 | 24.24 | 59.69 | 48.05 | 60.27 | 76.21
Stadium seats | 35.89 | 24.42 | 50.28 | 57.32 | 55.88 | 55.51 | 49.23
OA | 24.05 | 31.94 | 66.98 | 69.89 | 63.79 | 72.75 | 76.09
AA | 32.89 | 27.25 | 53.98 | 54.26 | 49.12 | 60.61 | 65.24
KC | 21.22 | 26.69 | 60.30 | 63.60 | 56.79 | 66.55 | 70.41
Table 7. Classification results of different methods on the Pavia University dataset (shown in %).

Class Name | SVM | 1D CNN | R2D | 2D CNN | 3D CNN | MCM | LGSF
Asphalt | 96.40 | 93.38 | 96.92 | 98.78 | 97.60 | 99.25 | 99.67
Meadows | 89.94 | 86.68 | 96.88 | 99.20 | 95.66 | 99.78 | 99.81
Gravel | 39.36 | 36.98 | 69.44 | 82.97 | 81.25 | 94.75 | 98.44
Trees | 53.33 | 50.82 | 90.28 | 98.54 | 98.83 | 97.63 | 96.69
Painted metal sheets | 98.41 | 98.33 | 99.52 | 99.28 | 100.00 | 99.92 | 99.36
Bare Soil | 37.03 | 38.28 | 79.46 | 97.00 | 72.41 | 99.37 | 99.59
Bitumen | 34.88 | 39.72 | 70.72 | 76.57 | 77.48 | 87.24 | 95.71
Self-Blocking Bricks | 66.54 | 71.42 | 81.83 | 91.35 | 87.67 | 90.39 | 96.18
Shadows | 99.88 | 98.32 | 99.65 | 99.49 | 99.88 | 99.72 | 96.10
OA | 64.86 | 65.96 | 90.38 | 96.43 | 91.02 | 97.97 | 98.94
AA | 68.42 | 68.21 | 87.19 | 93.69 | 90.09 | 96.45 | 97.95
KC | 57.01 | 57.81 | 87.29 | 95.25 | 88.13 | 97.29 | 98.59
Table 8. Classification results of different methods on the Salinas Valley dataset (shown in %).

Class Name | SVM | 1D CNN | R2D | 2D CNN | 3D CNN | MCM | LGSF
Brocoli_green_weeds_1 | 91.99 | 97.71 | 100.00 | 100.00 | 100.00 | 99.96 | 100.00
Brocoli_green_weeds_2 | 99.10 | 99.51 | 99.88 | 100.00 | 99.78 | 99.80 | 99.93
Fallow | 84.64 | 93.60 | 99.24 | 99.52 | 97.48 | 99.12 | 99.95
Fallow_rough_plow | 97.70 | 98.51 | 98.37 | 99.64 | 100.00 | 98.24 | 98.91
Fallow_smooth | 97.69 | 98.63 | 98.98 | 99.94 | 99.15 | 99.38 | 99.74
Stubble | 100.00 | 100.00 | 99.92 | 99.97 | 99.17 | 99.97 | 99.95
Celery | 95.76 | 96.94 | 99.86 | 99.87 | 99.92 | 99.26 | 99.88
Grapes_untrained | 71.33 | 78.15 | 85.54 | 90.89 | 90.75 | 95.57 | 97.49
Soil_vinyard_develop | 96.66 | 96.58 | 98.85 | 99.95 | 99.85 | 99.81 | 99.95
Corn_senesced_green_weeds | 84.56 | 85.12 | 94.54 | 98.19 | 91.94 | 95.76 | 98.35
Lettuce_romaine_4wk | 83.41 | 76.77 | 98.16 | 98.26 | 97.61 | 99.42 | 99.76
Lettuce_romaine_5wk | 97.15 | 96.17 | 97.19 | 99.98 | 99.78 | 99.29 | 100.00
Lettuce_romaine_6wk | 92.69 | 93.55 | 98.40 | 100.00 | 99.84 | 98.55 | 99.88
Lettuce_romaine_7wk | 94.43 | 93.45 | 91.98 | 99.86 | 97.44 | 98.90 | 99.79
Vinyard_untrained | 60.86 | 66.22 | 71.12 | 79.61 | 82.77 | 88.47 | 92.44
Vinyard_vertical_trellis | 98.21 | 93.40 | 95.79 | 99.71 | 99.49 | 99.47 | 100.00
OA | 85.44 | 87.88 | 91.77 | 94.95 | 94.81 | 96.88 | 98.23
AA | 90.39 | 91.52 | 95.49 | 97.84 | 97.18 | 98.19 | 99.13
KC | 83.74 | 86.48 | 90.83 | 94.37 | 94.21 | 96.52 | 98.03
Table 9. The design of ablation experiments (✓: component retained; ×: component removed).

Variant | Trans_2D | LSFEM | GSFEM | SFOL | DCBN | Spatial Enhance
AE_LGSF | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
AE_2D | × | ✓ | ✓ | ✓ | ✓ | ✓
AE_LSF | ✓ | × | ✓ | ✓ | ✓ | ✓
AE_GSF | ✓ | ✓ | × | ✓ | ✓ | ✓
AE_SFOL | ✓ | ✓ | ✓ | × | ✓ | ✓
AE_DCBN | ✓ | ✓ | ✓ | ✓ | × | ✓
AE_SPAT | ✓ | ✓ | ✓ | ✓ | ✓ | ×
Table 10. Classification results of ablation experiments on the Houston 2013 dataset (shown in %).

Metric | AE_2D | AE_LSF | AE_GSF | AE_SFOL | AE_DCBN | AE_SPAT | AE_LGSF
OA | 85.06 | 83.41 | 82.81 | 84.67 | 82.82 | 80.95 | 85.30
AA | 83.37 | 87.13 | 84.72 | 87.00 | 83.21 | 82.88 | 87.17
KC | 83.87 | 81.99 | 81.34 | 83.41 | 81.40 | 79.36 | 84.09
Table 11. Classification results of ablation experiments on the Houston 2018 dataset (shown in %).

Metric | AE_2D | AE_LSF | AE_GSF | AE_SFOL | AE_DCBN | AE_SPAT | AE_LGSF
OA | 75.82 | 75.55 | 74.53 | 74.83 | 75.11 | 67.80 | 76.09
AA | 61.62 | 62.02 | 61.86 | 65.09 | 62.55 | 55.71 | 65.24
KC | 70.20 | 69.87 | 68.65 | 69.03 | 69.07 | 61.01 | 70.41
Table 12. Classification results of ablation experiments on the Pavia University dataset (shown in %).

Metric | AE_2D | AE_LSF | AE_GSF | AE_SFOL | AE_DCBN | AE_SPAT | AE_LGSF
OA | 98.58 | 98.16 | 98.44 | 98.77 | 98.59 | 89.91 | 98.94
AA | 97.61 | 96.84 | 97.38 | 97.93 | 97.98 | 87.06 | 97.95
KC | 98.11 | 97.54 | 97.92 | 98.36 | 98.12 | 86.64 | 98.59
Table 13. Classification results of ablation experiments on the Salinas Valley dataset (shown in %).

Metric | AE_2D | AE_LSF | AE_GSF | AE_SFOL | AE_DCBN | AE_SPAT | AE_LGSF
OA | 98.05 | 97.97 | 97.24 | 97.79 | 97.24 | 91.83 | 98.23
AA | 98.97 | 98.76 | 98.76 | 98.86 | 98.76 | 95.26 | 99.13
KC | 97.83 | 97.74 | 96.92 | 97.53 | 96.92 | 90.90 | 98.03