Article

Supervised Contrastive Learning-Based Classification for Hyperspectral Image

1 School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China
2 The Institute of Advanced Research in Artificial Intelligence (IARAI), 1030 Vienna, Austria
3 Helmholtz-Zentrum Dresden-Rossendorf, Machine Learning Group, Helmholtz Institute Freiberg for Resource Technology, Chemnitzer Str. 40, 09599 Freiberg, Germany
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(21), 5530; https://doi.org/10.3390/rs14215530
Submission received: 22 September 2022 / Revised: 17 October 2022 / Accepted: 25 October 2022 / Published: 2 November 2022
(This article belongs to the Special Issue Deep Learning for the Analysis of Multi-/Hyperspectral Images)

Abstract
Recently, deep learning methods, especially convolutional neural networks (CNNs), have achieved good performance for hyperspectral image (HSI) classification. However, due to the limited training samples of HSIs and the high volume of trainable parameters in deep models, training deep CNN-based models is still a challenge. To address this issue, this study investigates contrastive learning (CL) as a pre-training strategy for HSI classification. Specifically, a supervised contrastive learning (SCL) framework, which pre-trains a feature encoder using an arbitrary number of positive and negative samples from a pair-wise optimization perspective, is proposed. Additionally, three techniques for better generalization in the case of limited training samples are explored in the proposed SCL framework. First, a spatial–spectral HSI data augmentation method, which is composed of multiscale and 3D random occlusion, is designed to generate diverse views for each HSI sample. Second, the features of the augmented views are stored in a queue during training, which enriches the positives and negatives in a mini-batch and thus leads to better convergence. Third, a multi-level similarity regularization method (MSR) combined with SCL (SCL–MSR) is proposed to regularize the similarities of the data pairs. After pre-training, a fully connected layer is combined with the pre-trained encoder to form a new network, which is then fine-tuned for final classification. The proposed methods (SCL and SCL–MSR) are evaluated on four widely used hyperspectral datasets: Indian Pines, Pavia University, Houston, and Chikusei. The experimental results show that the proposed SCL-based methods provide competitive classification accuracy compared to state-of-the-art methods.

1. Introduction

Hyperspectral sensors obtain the spectral and spatial information of observed targets simultaneously. The abundant information obtained from the observed target makes the hyperspectral image (HSI) useful for various applications [1]. Many techniques have been proposed for the proper processing of HSI data [2].
Classification is one of the fundamental tasks in HSI processing. Given a training set consisting of images and their corresponding labels, the purpose of classification is to train a model that can assign the proper label to an unseen image. As a pixel-based classification task [3,4], HSI classification aims to categorize the content of each pixel in the scene [5].
HSI classification is the basis of many applications, such as land usage, agriculture recognition, mineralogy, surveillance, healthcare, and environmental sciences [6,7]. Due to its importance, HSI classification has been widely investigated, and a wide diversity of HSI classification methods have been proposed in the past three decades [8,9,10], mainly focused on spectral or spatial–spectral information. For spectral information-based classification, the study in [11] designed a vegetation index for HSI classification based on the short-wave infrared spectrum. Spectral unmixing techniques have also been explored for HSI classification [12]. For example, an enhanced bilinear mixing model was proposed in [13] for subpixel classification and achieved competitive performance. Besides spectral information, HSIs now also provide abundant spatial information thanks to the development of imaging technology. In order to fully exploit the spatial features of the HSI, many approaches have been proposed, such as extended morphological profiles (EMPs) [14], the extended multi-attribute profile (EMAP) [15], and extinction profiles (EPs) [16].
Due to their powerful capability to extract discriminative and robust features, deep learning models have been widely investigated in many fields, including classification and regression tasks that involve images [17,18], language [19], and speech [20]. Recently, deep learning-based methods have been used for HSI feature extraction and classification. Many deep models, such as the stacked auto-encoder [21], the deep belief network [22], and the convolutional neural network (CNN), have been used for HSI classification [23,24]. Among the many deep learning models, the CNN-based methods have achieved state-of-the-art performance. In recent years, research on CNN-based HSI classification has focused mainly on modifying CNNs [25,26] and combining CNNs with existing machine learning frameworks [27,28].
Recently, many works have improved the architecture of the CNN so that it can adapt to the characteristics of the HSI and perform better for HSI classification. In [29], the typical residual unit architecture was combined with a pyramidal structure to enhance the performance of the CNN. The authors in [30] proposed an end-to-end spectral–spatial 3D residual CNN to explore the spatial–spectral features of the HSI. The attention mechanism is another popular way to improve the performance of CNNs. In [31], a spatial–spectral dense CNN framework with a feedback attention mechanism was presented. In [32], a double-branch dual-attention mechanism was used to refine and optimize the extracted feature maps, benefiting from both channel and spatial attention.
Other machine learning or image processing techniques can be combined with a CNN for better HSI classification performance, such as transfer learning [33], ensemble learning [34], and other spatial feature extraction methods [35]. In [36], an additional RGB dataset was utilized to pre-train 3D lightweight CNNs based on transfer learning. In [37,38], morphological profiles followed by a CNN were used to fully extract the spatial features of the HSI. In addition, a transformer was explored together with a CNN to extract spectral–spatial features for HSI classification [39].
The above approaches obtain superior performance when a sufficient number of labeled samples is available. However, it is costly and time-consuming to obtain high-quality labeled samples for HSIs. The limited number of training samples usually results in overfitting for deep learning-based classification methods and thus hinders further improvement of classification accuracy [40]. To alleviate this problem, many deep learning-based methods for HSI classification have been explored [41], most of which can be summarized under the following three strategies: (1) Methods aided by the use of unlabeled samples. This strategy extracts useful information from the abundant unlabeled samples of HSIs to benefit the deep model’s training. Semi-supervised learning is a typical and promising framework for this strategy. For example, self-training [42] and co-training [43] have been explored for semi-supervised classification of HSIs. In addition, the generative adversarial network (GAN) has also been widely used for HSI classification in a semi-supervised manner [44]. (2) Methods with data combination. This strategy enlarges the number of inputs through the combination of training samples. For instance, pairs of pixels were built in [45] to enlarge the number of training inputs. In [46], the Siamese neural network, whose input is a pair of training samples, was explored to extract more discriminative features. (3) Methods with data augmentation. This strategy increases the amount of data by slightly modifying the existing data or creating synthetic data based on some rules. For example, Haut et al. [47] randomly occluded the pixels of different rectangular spatial regions in the HSI to generate training images, reducing the risk of overfitting for deep models.
Based on the above analysis, supervised contrastive learning (SCL), which extends self-supervised contrastive learning [48,49,50] to the supervised setting, is explored in this paper for HSI classification with limited training samples. The proposed SCL-based methods measure the similarities between different sample pairs and then use the designed supervised contrastive loss to increase the similarities of positive pairs belonging to the same class in the latent space, while decreasing the similarities of pairs from distinct classes. Through pairwise comparison, the learned features become more discriminative. Notably, both data augmentation and data combination are adopted in SCL, which provides a strong ability to deal with the overfitting problem in HSI classification with limited training samples.
Specifically, on one hand, the proposed SCL designs a data augmentation method composed of multiscale and 3D random occlusion to increase the number of training samples. Traditional augmentation methods are usually unilateral, focusing only on the scale, the spectral, or the spatial information of the HSIs. The data augmentation method designed in this paper not only learns more complex structural information via different input scales, but also encourages the CNN to utilize spatial–spectral information from the entire HSI, rather than relying on a small subset of spatial or spectral features. Therefore, more diverse training inputs can be generated compared with other augmentation methods.
On the other hand, the SCL pairs training samples to enlarge the number of training inputs and pre-trains a feature encoder by optimizing the intra- and inter-class similarities using the supervised contrastive loss. The low diversity of input pairs in a training mini-batch usually limits the classification performance of traditional data combination-based methods. Therefore, a queue is designed in the proposed SCL to store the features of the augmented samples, which further enriches the positives and negatives in a mini-batch and thus improves the classification accuracy. In addition, a momentum-based moving average strategy is utilized to prevent the training procedure from fitting the training samples too quickly, which can easily lead to overfitting. Furthermore, a regularization method is designed for SCL to prevent the positive/negative pairs from being pulled too close to/pushed too far away from each other on the training set, which may also cause overfitting.
The contributions of this paper are listed as follows:
(1)
A supervised contrastive learning method for HSI classification is designed. In SCL, the labeled data are paired to pre-train a CNN-based feature encoder by the proposed supervised contrastive loss. To increase the diversity of the data pairs in a mini-batch and thus benefit the training procedure, SCL maintains a label queue and a feature queue, which are updated by a momentum-based moving average encoder.
(2)
A data augmentation method composed of multiscale and 3D random occlusion is proposed for HSI supervised contrastive learning. Multiscale augmentation randomly generates input samples with different window sizes, resulting in more complex spatial and structural information. Three-dimensional random occlusion disturbs the spatial–spectral content of the input, leading to better generalization by the model. The combination of these two data augmentation methods helps to improve the classification accuracy.
(3)
A regularization method, MSR, is combined with SCL (SCL–MSR) to further improve the generalization performance and classification accuracy of the supervised contrastive learning network.
The rest of the paper is organized as follows. Section 2 presents the proposed supervised contrastive learning framework for HSI classification. Section 3 describes the data classification results and the analysis of the comprehensive experiments. In Section 4, the paper’s conclusion is briefly summarized.

2. Methodology

Motivated by recent works on contrastive learning algorithms in computer vision and hyperspectral image classification, this paper designs a CL-based supervised pre-training framework for the HSI that learns representations by maximizing agreement between augmented views of samples belonging to the same class (positive pairs) in the latent space, while maximizing the difference between two distinct classes (negative pairs). This can be achieved via the proposed supervised contrastive loss. As illustrated in Figure 1, the pre-training framework comprises the following five major components.
(1)
A stochastic data augmentation module that generates two correlated views of any given HSI training sample $(x, y)$:
$$x_q, x_k = \mathrm{Aug}(x), \tag{1}$$
where $\mathrm{Aug}(\cdot)$ represents the augmentation mapping function.
(2)
A CNN-based encoder $f_q$ that maps the augmented training input $x_q$ to a representation feature vector $q$:
$$q = f_q(x_q). \tag{2}$$
(3)
A momentum-based moving average encoder $f_k$ that shares the same architecture as $f_q$. Its weights are progressively updated from $f_q$, and its representation feature is obtained by:
$$k = f_k(x_k). \tag{3}$$
(4)
A feature queue $K = \{k_i\}_{i=1}^{H}$ and its corresponding label queue $Y = \{y_i\}_{i=1}^{H}$, where $H$ denotes the queue length. The feature queue stores the HSI features generated by $f_k$ over the past few epochs, aiming to increase the number of positive and negative pairs available during training.
(5)
A contrastive learning-based loss function defined for the supervised pre-training task.
The proposed methods will be described in detail in the following subsections.

2.1. Data Augmentation

Data augmentation is mainly used to create more training samples and improve the generalization performance of the deep model for HSI classification. From the perspectives of scale and spatial–spectral content of the HSI samples, two data augmentation methods, multiscale augmentation and 3D random occlusion, are introduced.
(1)
Multiscale augmentation (MA): When using a CNN for HSI classification, it is often necessary to construct input samples by using neighborhood windows. Multiscale augmentation is used here so that the CNN can learn more complex spatial structural information from different window sizes. Specifically, for a given HSI pixel, the MA randomly selects a spatial size (e.g., 23 × 23) from several candidate sizes (e.g., 27 × 27, 25 × 25, 23 × 23) and forms the corresponding cube. Then, the HSI cube is resized to the desired spatial size (e.g., 27 × 27) for the CNN’s input, using bilinear interpolation. Figure 2 illustrates the procedure of MA.
(2)
Three-dimensional random occlusion augmentation (ROA): In remote sensing, data occlusion usually occurs when some areas of Earth’s surface are not visible from the remote sensor due to an external factor (e.g., clouds). Motivated by the previous work in [47], the 3D ROA is designed for data augmentation that also takes the spectral bands of the HSI into account.
The 3D ROA randomly selects a cuboid region $x_e \in \mathbb{R}^{x \times y \times z}$ from a given input HSI cube $x \in \mathbb{R}^{W \times W \times B}$ and covers its pixels with a fixed value. To obtain $x_e$, its volume $V_e$, which lies between a minimum and a maximum threshold, is first drawn as $V_e = \mathrm{rand}(v_{min} \cdot V, v_{max} \cdot V)$, where $V$ represents the volume of $x$. The next step is to obtain $x$, $y$, and $z$ as follows:
$$x = \sqrt[3]{\frac{V_e \, l_e}{r_e^2}}, \quad y = \sqrt[3]{V_e \, l_e \, r_e}, \quad z = \sqrt[3]{\frac{V_e \, r_e}{l_e^2}}, \tag{4}$$
where $r_e$ and $l_e$ are also randomly selected between minimum and maximum threshold values, $l_e = \mathrm{rand}(l_{min}, l_{max})$ and $r_e = \mathrm{rand}(r_{min}, r_{max})$, and control the shape of the cuboid $x_e$. Finally, the location of $x_e$ is randomly selected, and the cuboid is then filled with a predetermined value, e.g., 0.5 for simplicity. Figure 3 illustrates some examples of 3D ROA.
Algorithm 1 demonstrates the procedure of the data augmentation methods in this paper.
Algorithm 1. Pseudocode of data augmentation for HSI.
Input: HSI cube $x \in \mathbb{R}^{W \times W \times B}$, minimum scale $S$, occlusion probability $p$, occlusion shape parameters $v_{min}$, $v_{max}$, $l_{min}$, $l_{max}$, $r_{min}$, $r_{max}$
Output: augmented image $x_a$
 $s = \mathrm{RandSelect}(W, W-2, \ldots, S)$
 $x_c = \mathrm{CenterCrop}(x, [s, s, B])$
 $x_a = \mathrm{Resize}(x_c, [W, W, B])$
 $p_1 = \mathrm{Rand}(0, 1)$
 if $p_1 > p$ then
  return $x_a$
 else
  $V = W \times W \times B$
  $V_e = \mathrm{Rand}(v_{min} \cdot V, v_{max} \cdot V)$
  $l_e = \mathrm{Rand}(l_{min}, l_{max})$
  $r_e = \mathrm{Rand}(r_{min}, r_{max})$
  Get $x$, $y$, and $z$ using Equation (4)
  $x_1 = \mathrm{Randint}(0, W - x)$
  $y_1 = \mathrm{Randint}(0, W - y)$
  $z_1 = \mathrm{Randint}(0, B - z)$
  $x_a[x_1 : x_1 + x, \; y_1 : y_1 + y, \; z_1 : z_1 + z] = 0.5$
  return $x_a$
RandSelect: select one element randomly; CenterCrop: crop the given image at the center; Resize: resize the given image to the desired size; Rand: draw a random float from the given range (uniform distribution); Randint: draw a random integer from the given range (discrete uniform distribution).
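For illustration, Algorithm 1 can be realized in a few lines of NumPy/SciPy. The sketch below is our own minimal interpretation, not the authors' code: the function name, the use of scipy.ndimage.zoom for bilinear resizing, and the clipping of the cuboid dimensions are assumptions, and Equation (4) is used as reconstructed above.

```python
import numpy as np
from scipy.ndimage import zoom

def augment_hsi(x, S=19, p=0.6, v_min=0.0, v_max=0.00625,
                l_min=0.2, l_max=5.0, r_min=0.2, r_max=5.0, fill=0.5):
    """Multiscale + 3D random occlusion augmentation for one (W, W, B) HSI cube in [0, 1]."""
    W, _, B = x.shape
    # Multiscale augmentation: center-crop a random window size, then resize back to W x W.
    s = int(np.random.choice(np.arange(S, W + 1, 2)))
    off = (W - s) // 2
    xa = zoom(x[off:off + s, off:off + s, :], (W / s, W / s, 1.0), order=1)

    if np.random.rand() > p:            # apply occlusion only with probability p
        return xa
    # 3D random occlusion: draw a cuboid whose volume lies in [v_min*V, v_max*V].
    V = W * W * B
    Ve = np.random.uniform(v_min * V, v_max * V)
    le = np.random.uniform(l_min, l_max)
    re = np.random.uniform(r_min, r_max)
    dx = int(np.cbrt(Ve * le / re ** 2))   # Equation (4), as reconstructed above
    dy = int(np.cbrt(Ve * le * re))
    dz = int(np.cbrt(Ve * re / le ** 2))
    dx, dy, dz = max(dx, 1), max(dy, 1), max(dz, 1)
    dx, dy, dz = min(dx, W), min(dy, W), min(dz, B)
    x1 = np.random.randint(0, W - dx + 1)
    y1 = np.random.randint(0, W - dy + 1)
    z1 = np.random.randint(0, B - dz + 1)
    xa[x1:x1 + dx, y1:y1 + dy, z1:z1 + dz] = fill   # occlude with the fixed value 0.5
    return xa
```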

2.2. Supervised Contrastive Learning for HSI Classification

This section introduces the proposed algorithm in detail. As illustrated by Figure 2, the training process for HSI classification entails two stages: pre-training and fine-tuning.
In the first stage, the CNN-based encoder is pre-trained using labeled training samples in a contrastive manner. The pre-training process comprises the following steps.
First, data augmentation is performed on the input HSI cube $x$ to obtain two correlated views, $x_q$ and $x_k$. Second, the feature vectors $q$ and $k$ are computed from the two augmented views via the CNN-based encoder $f_q$ and the momentum encoder $f_k$, respectively. The positive pairs are then obtained from $q$, $k$, and the features in the queue that share the same label as $q$. The negative pairs are obtained from $q$ and the features in the queue whose labels differ from that of $q$. Meanwhile, $k$ and its corresponding label $y$ are enqueued. Next, the positive and negative pairs are fed to the supervised contrastive loss for updating the encoder’s weights. Finally, the weights of the momentum encoder $f_k$ are progressively updated from $f_q$.
In the second stage, a fully connected layer is added following the pre-trained encoder to form a new network for HSI classification. This new network will then be fine-tuned by using cross-entropy loss to accomplish the final classification task.
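As an illustration of this second stage, the sketch below attaches a fully connected layer to a pre-trained encoder and fine-tunes it with cross-entropy loss, using the schedule reported later in Section 3.2 (Adam, initial learning rate 0.001, decayed by 10 at epochs 80 and 160, 180 epochs in total). The class and function names are hypothetical; this is a minimal PyTorch sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FineTuneNet(nn.Module):
    """Pre-trained encoder f_q followed by a fully connected classification layer."""
    def __init__(self, encoder, feat_dim=256, num_classes=16):
        super().__init__()
        self.encoder = encoder              # pre-trained f_q
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.fc(self.encoder(x))

def fine_tune(encoder, loader, num_classes, epochs=180, lr=1e-3, device="cpu"):
    model = FineTuneNet(encoder, num_classes=num_classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 160], gamma=0.1)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)   # cross-entropy loss for final classification
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```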
Three important components of SCL, queue, momentum update, and supervised contrastive loss, are described in detail as follows.
Queue details: Due to the limited number of training samples, a queue scheme composed of a feature queue and a label queue is adopted for HSI classification. As mentioned above, the feature queue is used to store the features generated in the past few epochs and can decouple the number of positive and negative pairs from the mini-batch size. Therefore, the queue length $H$ can be treated as a hyperparameter and set to be much larger than the mini-batch size to form more feature pairs in the current training epoch. In addition, a label queue is maintained and updated along with the feature queue; it contains the corresponding labels of the features in the queue.
The queue adopts the strategy of “first-in, first-out”. Specifically, the current mini-batch is enqueued to the queue, and the oldest mini-batch, which is the least consistent with the newest HSI samples, is removed from the queue. In this way, the queue always represents a sampled subset of all training samples.
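As a concrete illustration, the first-in, first-out behavior of the feature and label queues can be sketched as follows. This is a minimal PyTorch sketch under our own assumptions (the class name, the circular write pointer, and the use of −1 to mark unfilled label slots are not from the paper).

```python
import torch

class FeatureQueue:
    """Fixed-length FIFO queue holding past features (K) and their labels (Y)."""
    def __init__(self, length, feat_dim):
        self.feats = torch.zeros(length, feat_dim)                  # feature queue K
        self.labels = torch.full((length,), -1, dtype=torch.long)   # label queue Y (-1 = empty slot)
        self.ptr = 0
        self.length = length

    @torch.no_grad()
    def enqueue(self, k, y):
        """Insert the current mini-batch; the oldest entries are overwritten (first-in, first-out)."""
        n = k.shape[0]
        idx = torch.arange(self.ptr, self.ptr + n) % self.length
        self.feats[idx] = k.detach().cpu()
        self.labels[idx] = y.cpu()
        self.ptr = (self.ptr + n) % self.length
```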
A positive/negative pair is a pair of features whose samples belong to the same class/two different classes. For a given HSI $x$ and its corresponding representation feature $q$, we compare its label with the label queue: a positive pair labeled 1 is obtained if $q$ has the same class label as a feature in the queue; otherwise, a negative pair labeled 0 is obtained. This procedure can be formulated as:
$$l(q, k_i) = \begin{cases} 1, & y = y_i \\ 0, & y \neq y_i \end{cases}, \tag{5}$$
where $l(\cdot)$ denotes the label function for the pairs.
As shown in Figure 4, the label of the current input image is metal sheets, and the feature queue contains features of samples corresponding to different classes; these features and labels were stored in previous epochs. We then compare the label metal sheets with the label queue one by one. If they differ (e.g., Meadows lies first in the queue), a negative pair of the corresponding features is generated. Using the queue module, a large number of positive and negative pairs can be generated for training in the current epoch.
Momentum update details: The feature queue contains both the current mini-batch features and older features, and thus, the gradient cannot propagate to all the features in the queue. Therefore, how to update the weights of the encoder related to the queue needs to be considered. An intuitive idea is for the queue encoder $f_k$ to share the same weights with the other encoder $f_q$, ignoring the gradient. However, by doing this, the oldest mini-batch of features may be very different from the newest ones. Even features belonging to the same image may be dissimilar due to the rapidly changing encoder, which may lead to poor generalization. Therefore, a momentum-based moving average strategy is utilized to address this issue.
Let $\theta_k$ denote the weights of $f_k$ and $\theta_q$ denote the weights of $f_q$. $\theta_k$ is updated by:
$$\theta_k = m\,\theta_k + (1 - m)\,\theta_q, \tag{6}$$
where $m \in [0, 1]$ is a momentum coefficient used to control the update speed of the queue encoder. During training, only the parameters $\theta_q$ are updated by back-propagation. From Equation (6), it can be seen that $\theta_k$ is a moving average of $\theta_q$: the larger the value of $m$, the more slowly the weights update. As a result, although the features in the queue are generated by $f_k$ in different training epochs, the differences among these features remain small.
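A minimal PyTorch sketch of the momentum update of Equation (6) is given below; the function name momentum_update is our own, and m = 0.99 is the value used later in the experiments.

```python
import torch

@torch.no_grad()
def momentum_update(f_q, f_k, m=0.99):
    """theta_k = m * theta_k + (1 - m) * theta_q  (Equation (6))."""
    for p_q, p_k in zip(f_q.parameters(), f_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```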
Supervised contrastive loss details: Let $\mathrm{sim}(a, b) = a^{T} b / (\|a\| \|b\|)$ represent the inner product between the $\ell_2$-normalized $a$ and $b$. Then, the supervised contrastive loss function is defined as
$$\mathcal{L}_{SCL} = -\log \frac{\sum_{i=1}^{H} l_i \, e^{s_i / T}}{\sum_{i=1}^{H} l_i \, e^{s_i / T} + \sum_{i=1}^{H} (1 - l_i) \, e^{s_i / T}}, \tag{7}$$
where $l_i = l(q, k_i)$ and $s_i = \mathrm{sim}(q, k_i)$. $T$ is a temperature parameter, playing a role in controlling the strength of the penalties on the hard negative samples. The term in the numerator represents the sum of the positive pairs’ similarity scores, while the second term in the denominator represents the sum of the negative pairs’ similarity scores. Equation (7) can be simplified as
$$\mathcal{L}_{SCL} = \log\left(1 + \sum_{i=1}^{H} (1 - l_i) \, e^{s_i / T} \cdot \left(\sum_{i=1}^{H} l_i \, e^{s_i / T}\right)^{-1}\right). \tag{8}$$
From Equation (8), it can be seen that the supervised contrastive loss seeks to reduce the negative scores and increase the positive scores.
Considering the feature $k$ generated by the queue encoder in the current epoch, Equation (8) can be modified as
$$\mathcal{L}_{SCL} = \log\left(1 + \sum_{i=0}^{H} (1 - l_i) \, e^{s_i / T} \cdot \left(\sum_{i=0}^{H} l_i \, e^{s_i / T}\right)^{-1}\right), \tag{9}$$
where we define $k_0 = k$ and $l_0 = l(q, k_0) = 1$. In this way, positive pairs are obtained in two ways: (1) as features of the augmented views of the same image, $(q, k)$, and (2) as features in the queue $\{k_i\}_{i=1}^{H}$ with the same label as the current feature $q$.
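The following sketch computes the loss of Equation (9) for a single query feature, assuming the reconstruction of Equations (7)–(9) given above; in practice the computation would be batched. It is a minimal PyTorch sketch rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def scl_loss(q, k, queue_feats, queue_labels, y, T=0.5):
    """Supervised contrastive loss of Equation (9) for one query feature q.

    q: (D,) feature from f_q; k: (D,) feature from f_k for the same sample (l_0 = 1).
    queue_feats: (H, D); queue_labels: (H,); y: class label of the current sample.
    """
    q = F.normalize(q, dim=0)
    feats = F.normalize(torch.cat([k.unsqueeze(0), queue_feats], dim=0), dim=1)  # k_0 ... k_H
    s = feats @ q / T                                   # cosine similarities s_i / T
    l = torch.cat([torch.ones(1, dtype=torch.bool),
                   (queue_labels == y)])                # pair labels l_i from Equation (5)
    pos = torch.exp(s[l]).sum()                         # sum over positive pairs
    neg = torch.exp(s[~l]).sum()                        # sum over negative pairs
    return torch.log1p(neg / pos)                       # log(1 + neg / pos), Equation (9)
```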

2.3. Multi-Level Similarity Regularization for SCL

Although data augmentation and data combination are performed for HSI, there is still the risk of overfitting in SCL; that is, positive pairs can be pulled too closely together, while the negative pairs can be pushed too far apart in the feature space. In order to further alleviate the overfitting problem caused by limited training samples for HSI classification, multi-level similarity regularization (MSR) is introduced here.
The MSR predefines a set of levels and forces the similarities of the sample pairs to move towards the levels. As illustrated by Figure 5, in the SCL’s training procedure, the supervised pre-training contrastive loss pulls the features belonging to the same class towards each other and pushes different classes’ features away in the embedding space. Apart from the push/pull effect caused by SCL, the MSR also forces the similarities to align with a set of predefined levels (denoted by dashed lines in Figure 5), preventing the positive/negative pairs from being too close to/far away from each other.
Let $L = \{L_n\}_{n=1}^{A}$ denote a set of pre-defined similarity levels. The function $r(s, L_i, L)$ is an assignment function indicating whether the given similarity $s$ is closest to the given level $L_i$; it is defined as
$$r(s, L_i, L) = \begin{cases} 1, & \text{if } \arg\min_{L_n \in L} |s - L_n| \text{ is } L_i, \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$
MSR minimizes the difference between a given pairwise similarity and the corresponding closest level, which can be achieved by minimizing the following loss:
$$\mathcal{L}_{MSR} = \frac{1}{H + 1} \sum_{i=0}^{H} \sum_{m=1}^{A} r(s_i, L_m, L) \cdot |s_i - L_m|. \tag{11}$$
Note that the levels are initialized with given values, while they can be updated to optimally regularize the pairwise similarity during the training process.
When using MSR, the total loss function $\mathcal{L}_{SCL\text{-}MSR}$ is defined as the sum of $\mathcal{L}_{SCL}$ and $\mathcal{L}_{MSR}$:
$$\mathcal{L}_{SCL\text{-}MSR} = \mathcal{L}_{SCL} + \mathcal{L}_{MSR}. \tag{12}$$
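A minimal sketch of the MSR loss of Equation (11) is shown below. Because $r(\cdot)$ selects only the level closest to each similarity, the double sum reduces to the mean distance to the nearest level. The initial levels {−0.2, 0, 0.2} are those reported in Section 3.4; in practice they could be stored as learnable parameters so that they are updated during training, as the paper describes.

```python
import torch

def msr_loss(s, levels=(-0.2, 0.0, 0.2)):
    """Multi-level similarity regularization (Equation (11)).

    s: (H + 1,) tensor of pairwise similarities s_0 ... s_H.
    """
    L = torch.as_tensor(levels, dtype=s.dtype)
    dist = (s.unsqueeze(1) - L.unsqueeze(0)).abs()   # |s_i - L_m| for every similarity/level pair
    closest = dist.min(dim=1).values                 # r(.) keeps only the closest level
    return closest.mean()                            # average over the H + 1 similarities
```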
Algorithm 2 demonstrates the pre-training procedure for SCL or SCL–MSR.
Algorithm 2. Pseudocode of SCL-based methods for HSI.
Input: training sample loader, temperature $T$, momentum $m$
Initialization: encoder networks $f_q$ and $f_k$ with $\theta_q = \theta_k$;
       feature queue $K$ of $H$ elements;
       label queue $Y$ of $H$ elements
Output: pre-trained encoder network $f_q$
for $x, y$ in loader:
  $x_q = \mathrm{Aug}(x)$ using Algorithm 1
  $x_k = \mathrm{Aug}(x)$ using Algorithm 1
  $q = f_q(x_q)$
  $k = f_k(x_k)$
  $\mathrm{Detach}(k)$
  get positive and negative pairs using Equation (5)
  compute the SCL-based loss using Equation (9) or Equation (12)
  back-propagate and update the encoder network $f_q$
  update the momentum encoder network $f_k$ using Equation (6)
  update the feature queue $K$ and the label queue $Y$
return $f_q$
Detach: block the gradient of the given tensor.

3. Results

3.1. Datasets Description

In the experiments, four widely used hyperspectral datasets, Indian Pines, Pavia University, Houston, and Chikusei, are employed to evaluate the performances of the proposed methods.
(1)
Indian Pines: This dataset mainly describes a scene of multiple agricultural fields in Northwestern Indiana, USA, acquired by the Airborne Visible/Infrared Imaging Spectrometer sensor in June 1992. The dataset contains 145 × 145 pixels with a spatial resolution of 20 m × 20 m. There are 220 spectral bands with wavelengths ranging from 400 nm to 2500 nm recorded in this dataset. In the experiments, 20 low signal-to-noise ratio (SNR) bands were removed due to water absorption, and the remaining 200 bands were used for evaluation. A total of 10,249 samples, belonging to 16 different land cover types, are labeled in this dataset. Figure 6 illustrates the false color composite images and the corresponding ground truth map of the Indian Pines dataset. The numbers of training and test samples per class are listed in Table 1.
(2)
Pavia University: This dataset mainly covers an urban area with some manmade buildings and plants, acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over the University of Pavia, Italy. The dataset contains 610 × 340 pixels with a resolution of around 1.3 m × 1.3 m. After removing the noisy and water absorption bands, 103 bands were retained with wavelengths ranging from 430 nm to 860 nm. The dataset contains 42,776 labeled pixels from nine different land cover types. Figure 7 shows the false color composite images and corresponding ground truth map of the Pavia University dataset. The numbers of training and test samples per class are listed in Table 2.
(3)
Houston: This dataset was acquired over an urban area surrounding the University of Houston campus by the National Center for Airborne Laser Mapping. It was released in the 2013 IEEE GRSS Data Fusion Contest [51]. The dataset contains 349 × 1905 pixels with a spatial resolution of 2.5 m × 2.5 m. It consists of 144 bands with wavelengths ranging from 380 nm to 1050 nm. A total of 15,029 labeled pixels corresponding to 15 different land cover types are collected in the dataset. Figure 8 illustrates the false color composite images and corresponding ground truth maps. Table 3 lists the numbers of training and test samples for each class.
(4)
Chikusei: The Chikusei dataset is an aerial hyperspectral dataset captured by the Headwall Hyperspec-VNIR-C sensor over Chikusei, Japan on 29 July 2014 [52]. This dataset contains 128 spectral bands with wavelengths ranging from 343 nm to 1018 nm. The spatial size is 2517 × 2335 pixels, and the spatial resolution is 2.5 m. There are a total of 19 land cover types, including urban and rural areas. Figure 9 illustrates the false color composite images and the corresponding ground truth map of the Chikusei dataset. Table 4 lists the numbers of training and test samples for each class.

3.2. Experimental Setup

For the four datasets, the samples were divided into two subsets, which contained the training and testing samples, respectively.
The proposed SCL-based methods are evaluated on the four datasets and compared with several existing methods, including SVM with extended morphological profiles (EMP–SVM), CNN, Siamese CNN pre-trained by supervised contrastive loss (SiamSCL), SSRN [30], DBMA [53], DBDA [32], and FDSSC [54].
Specifically, for EMP–SVM, a grid search strategy is utilized together with five-fold cross-validation to find the proper $C$ and $\gamma$ ($C = 10^{-4}, 10^{-3}, \ldots, 10^{3}$; $\gamma = 10^{-4}, 10^{-3}, \ldots, 10^{3}$). The first four principal components (PCs) are used when calculating the EMP. For each PC, three openings and closings by reconstruction are conducted with a circular structuring element whose initial size is four and whose step size increment is two.
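For reference, the grid search over C and γ with five-fold cross-validation can be sketched with scikit-learn as follows. The RBF kernel and the random placeholder feature matrix are assumptions made only for illustration; they are not specified in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
emp_features = rng.standard_normal((200, 72))   # placeholder EMP feature vectors
labels = rng.integers(0, 9, 200)                # placeholder class labels

param_grid = {"C": 10.0 ** np.arange(-4, 4),    # 1e-4 ... 1e3
              "gamma": 10.0 ** np.arange(-4, 4)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # five-fold cross-validation
search.fit(emp_features, labels)
print(search.best_params_, search.best_score_)
```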
For SSRN, DBMA, DBDA, and FDSSC, the experimental settings are the same as those in [32].
For CNN, SiamSCL, SCL, and SCL–MSR, we use the same backbone for feature extraction, whose detailed architecture is shown in Table 5. It contains five blocks. Each of the first four blocks consists of a convolutional layer, a BN layer, and a ReLU operation. The 1 × 1 convolution in the first block is mainly used for dimension reduction and to reduce overfitting. Each of the second, third, and fourth blocks also includes a 2 × 2 max-pooling layer. The fifth block contains only a linear layer, outputting a 256-dimensional feature for each HSI cube. For the final classification task, a fully connected layer is added to the backbone to form the overall classification model. In order to use spatial information, input images with a spatial size of 27 × 27 ($W$ = 27) are fed to the 2D CNN.
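A PyTorch sketch of such a backbone is given below. It follows the textual description of Table 5 (1 × 1 convolution in the first block, 2 × 2 max-pooling in blocks two to four, and a 256-dimensional linear output), but the channel widths are illustrative assumptions, since the exact values are listed in Table 5.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """2D CNN encoder sketched from the description of Table 5 (channel widths assumed)."""
    def __init__(self, in_bands=200, feat_dim=256):
        super().__init__()
        def block(cin, cout, k, pool=False):
            layers = [nn.Conv2d(cin, cout, k, padding=k // 2),
                      nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
            if pool:
                layers.append(nn.MaxPool2d(2))
            return nn.Sequential(*layers)
        self.block1 = block(in_bands, 64, 1)          # 1x1 conv for spectral dimension reduction
        self.block2 = block(64, 128, 3, pool=True)    # 27x27 -> 13x13
        self.block3 = block(128, 128, 3, pool=True)   # 13x13 -> 6x6
        self.block4 = block(128, 128, 3, pool=True)   # 6x6 -> 3x3
        self.block5 = nn.Sequential(nn.Flatten(), nn.Linear(128 * 3 * 3, feat_dim))

    def forward(self, x):                             # x: (N, B, 27, 27)
        x = self.block4(self.block3(self.block2(self.block1(x))))
        return self.block5(x)                         # (N, 256) feature vector
```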
We used the Adam optimizer to train the SCL-based methods (SiamSCL, SCL, and SCL–MSR) for a total of 300 epochs. The cosine learning rate scheduler with an initial learning rate of 0.001 was adopted in the experiments.
When training the CNN or fine-tuning the SCL-based methods, the multi-step learning rate scheduler is utilized. The initial learning rate is set to be 0.001, and it is divided by 10 after 80 and 160 epochs. In the training procedure, the total number of epochs is set to 180 for all four datasets. In addition, the mini-batch method is adopted, where the batch size is set to 512.
For the data augmentation used in SCL and SCL–MSR, the minimum scale $S$ and the occlusion probability $p$ are treated as hyperparameters to be analyzed. The occlusion shape parameters are $v_{min} = 0$ and $v_{max} = 0.00625$, which derives from 1/4 of the height, 1/4 of the width, and 1/10 of the number of bands (1/4 × 1/4 × 1/10 = 0.00625). Furthermore, $l_{min} = 0.2$, $l_{max} = 1/l_{min}$, $r_{min} = 0.2$, and $r_{max} = 1/r_{min}$. The occlusion value is set to 0.5, as suggested by [47].
In the experiments, we set the queue length $H$ to be $h$ times the total number of training samples $N$, namely $H = hN$. The temperature $T$, queue length ratio $h$, and momentum coefficient $m$ are analyzed in the experiments.
In this portion of the experiments, the classification performance is mainly evaluated using overall accuracy (OA), average accuracy (AA), and the kappa coefficient (K). The experiments are repeated 10 times, and the training samples are randomly chosen from all the labeled samples each time.
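The three metrics can be computed from the predictions as in the following sketch, using the standard confusion-matrix definitions of OA, AA, and the kappa coefficient.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    """Overall accuracy (OA), average accuracy (AA), and kappa coefficient (K)."""
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                  # fraction of correctly classified samples
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # per-class accuracy, averaged over classes
    kappa = cohen_kappa_score(y_true, y_pred)
    return oa, aa, kappa
```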

3.3. Classification Results and Analysis

The classification results of the different methods over four datasets are reported in Table 6, Table 7, Table 8 and Table 9. All the reported results are the average values of 10 runs with respect to different random initializations.
(1) Classification Results: Table 6 demonstrates the classification accuracies for the Indian Pines dataset. It can be observed that the proposed SCL and SCL–MSR are superior to the other methods with 20 training samples per class. In particular, SCL outperforms DBDA by 0.69 percentage points, 11.39 percentage points, and 0.00074 in terms of OA, AA, and K. Compared to the original CNN, SCL improves the classification accuracy by 1.33 percentage points, 0.54 percentage points, and 0.0149 in terms of OA, AA, and K. In addition, SiamSCL works better than the original CNN due to the use of contrastive loss for supervised pre-training. SCL–MSR also achieves better results than SCL, which demonstrates the effectiveness of the multi-level similarity regularization. Table 7 demonstrates the classification accuracies for the Pavia University dataset. It is apparent that the proposed SCL-based methods still achieve better classification results for the Pavia University dataset when compared with the other methods. Specifically, SCL outperforms FDSSC by 1.06 percentage points, 2.53 percentage points, and 0.00138 in terms of OA, AA, and K, respectively. Both SCL and SCL–MSR also work better with respect to the classification accuracy of each class. From Table 8, it can be seen that SCL and SCL–MSR obtain higher classification results for the Houston dataset, similar to those of the Pavia University and Indian Pines datasets. Table 9 demonstrates the classification accuracies for the Chikusei dataset. It can be seen that the proposed SCL and SCL–MSR achieve higher classification accuracies than the comparison methods.
(2) Analysis of why the algorithms perform well for some classes but not for others: In our view, there are three major factors that could affect the classification accuracies.
(i) The inherent cluster properties of different classes. In the feature space, samples from the same class should be close to each other, while samples from different classes should be far away from each other. In fact, the samples from some specific classes are in good agreement with the above requirements; that is, they have better clustering properties and are prone to being classified well compared to other classes. For example, Figure 10 shows the average spectral reflectance curves of different classes on the Pavia University dataset. The average spectral reflectance curves of the metal sheets and meadows classes are significantly different from those of the other classes, which makes it possible to achieve high classification accuracies.
(ii) The distribution difference between the training set and the test set. If the training set and the test set are very similar, a classifier that is well trained on the training set will often perform well on the test set. Otherwise, it will perform poorly due to overfitting. In some cases, the samples of some classes are concentrated in a small area, and therefore, there is little difference among the samples. The randomly divided training set and test set then share great similarity, which leads to good test performance. For example, the Oats, Grass-pasture-mowed, and Alfalfa classes in the Indian Pines dataset have few labeled samples and cover small regions, and the CNN and SCL can achieve high accuracies on them. However, DBDA, DBMA, and FDSSC obtain lower accuracies for the Oats class in the Indian Pines dataset compared with the CNN. The reason is that DBDA, DBMA, and FDSSC all adopt an attention mechanism, which makes them focus on pixels from other classes in some runs. This is related to the following third factor.
(iii) The properties of the given classifiers. The classification ability is also vital, and different classifiers have different classification performances. Generally speaking, the CNN-based methods usually achieve higher classification accuracies than traditional methods such as EMP–SVM due to their powerful feature extraction ability. They can learn implicit and complex patterns and have the potential to achieve higher accuracies.
(3) Analysis of why the proposed SCL and SCL–MSR obtain lower accuracies for some classes compared to other methods: Generally speaking, DBDA, DBMA, and FDSSC all adopt an attention mechanism, which makes the model pay more attention to the information that contributes to the classification according to the training set. The CNN, SiamSCL, SCL, and SCL–MSR share the same backbone, which is a vanilla convolutional neural network.
To use spatial information, we use a neighborhood region surrounding the given pixel as the input for the deep models. The attention mechanism makes DBDA and FDSSC focus on the most useful areas, especially when the input images contain complex land cover, such as boundaries. For example, some samples of the Grass-synthetic class in the Houston dataset are very close to the Running-track class, and some images may be similar to each other (e.g., samples in the red square in Figure 11). In this case, the attention mechanism provides better discriminability than SCL and thus leads to better classification accuracies. This phenomenon can also be found in the other datasets.
(4) Analysis of why SCL–MSR does not obtain a better performance than SCL for some classes: SCL–MSR acts to alleviate the overfitting problem. It disturbs the training process to prevent the positive/negative pairs from being too close to/too far away from each other. However, deep neural networks (e.g., CNNs) can be class-biased: some classes (“easy” classes) are easy to learn and converge faster than other classes (“hard” classes) [55]. After adding MSR, for the classes that are easy to learn and even prone to overfitting, MSR plays a positive role in preventing the model from overfitting these classes, and it thus achieves high accuracies. For the classes on which the model is not trained well, however, MSR has a negative effect that causes the model to underfit these classes, ultimately yielding worse accuracies.

3.4. Ablation Experiments

(1) Experiments of temperature $T$ and queue length ratio $h$: As mentioned before, $T$ is a temperature parameter that controls the concentration level of the distribution. From Equation (9), one can see that a smaller value of $T$ makes the distribution of the features more concentrated, which means that convergence will be faster but carries a higher risk of overfitting. The parameter $h$ controls the length of the queue in SCL. A larger value of $h$ means a longer queue and a greater variety of features in the queue. A proper value of $h$ helps SCL generalize better.
Figure 12, Figure 13, Figure 14 and Figure 15 show the distribution of the similarity scores $\{q \cdot k_i\}_{i=1}^{hN}$ in a batch on the four datasets. It can be seen that the distributions shown in (a) and (c) are more concentrated than those in (b) and (d), due to the smaller $T$. Furthermore, the value of $h$ has little influence on the similarity: the similarity distributions corresponding to different values of $h$ are similar when the value of $T$ is fixed.
Figure 16 illustrates the overall accuracies obtained by using different values of the temperature $T$ and queue length ratio $h$. In this experiment, a grid search strategy is adopted to find the proper $T$ and $h$ ($T = 2^{-3}, 2^{-2}, 2^{-1}, 2^{0}$; $h = 10, 15, 20, 25$). It can be seen that different datasets require different values of $T$ and $h$ for proper training. For example, the Indian Pines dataset usually prefers a larger value of $T$, whereas a smaller value of $T$ is more appropriate for the Houston dataset. This indicates that the choice of the $T$ value is related to the complexity of the HSI dataset used.
As illustrated by Figure 16, $T$ and $h$ are set to the optimal values for the different datasets. Specifically, $T$ and $h$ are 1.0 and 15 for the Indian Pines dataset, respectively, while $T$ is 0.5 and $h$ is 25 for the Pavia University dataset, and $T$ is set to 0.125 and $h$ to 10 for the Houston dataset. For the Chikusei dataset, the best values of $T$ and $h$ are 0.125 and 25.
(2) Experiments of Momentum Coefficient $m$: Figure 17, Figure 18, Figure 19 and Figure 20 show four types of similarity statistics obtained when using various values of $m$: the mean of the negative similarity, the variance of the negative similarity, the mean of the positive similarity, and the variance of the positive similarity. It is worth noting that, in the early training process, the values of the similarity statistics are not as expected due to the queue initialization.
From Figure 17, Figure 18, Figure 19 and Figure 20, one can see that as training progresses, the mean of the negative similarity, the variance of the negative similarity, and the variance of the positive similarity decrease, while the mean of the positive similarity increases. Generally speaking, a smaller value of $m$ means a faster update of the momentum encoder. For $m$ = 1.0, the means of both the negative and positive similarity are nearly constant, and the variances are large. In this case, the momentum encoder does not update its parameters, which makes the SCL model collapse. For $m$ = 0.999, the SCL model is also hard to converge. As the value of $m$ decreases, the mean of the negative/positive similarity changes more quickly. Specifically, for the mean of the negative similarity scores, the end value for $m$ = 0.6 (pink) and $m$ = 1.0 (grey) is much lower than for $m$ = 0.99 (green). However, a smaller value of $m$ makes the model more prone to the risk of overfitting, which harms generalization.
Figure 21 shows the overall accuracy of SCL with different values of the momentum coefficient $m$. The results indicate that neither a very large nor a very small value of $m$ is suitable for SCL. Through experiments, we find that $m$ = 0.99 is appropriate in most cases. For simplicity and universality, we choose 0.99 as the value of $m$ and control the training process by changing the values of the temperature $T$ and queue length ratio $h$.
(3) Experiments of Augmentation Techniques: Figure 22 shows the classification results of SCL using different data augmentation techniques over the four datasets. From Figure 22, it can be seen that the use of both multiscale and random occlusion augmentations makes the SCL perform better than when only one or no augmentation technique is used, which demonstrates the effectiveness of the introduced data augmentation methods.
Figure 23 shows the SCL classification overall accuracies on the four datasets over different values of p and S . From (a), one can see that a smaller value of S is more likely to gain a better OA, and the best values of p and S for the Indian Pines dataset are 0.6 and 19, respectively. For the Pavia University dataset, the best classification performance is obtained when p = 0.2 and S = 19. A smaller value of S (e.g., 19) seems to yield higher classification accuracy when the value of p is small (e.g., 0.2, 0.4, and 0.6), whereas a smaller S is more suitable if the value of p is set to be higher, e.g., 0.8. It can be seen that the best values of p and S for the Houston dataset are 0.6 and 19, respectively, and the best values of p and S for the Chikusei dataset are 0.8 and 23.
(4) Experiments of MSR Loss: As mentioned above, the classification accuracy gains obtained by using multi-level similarity regularization can be seen in Table 6, Table 7, Table 8 and Table 9.
By observing the distribution of the similarity scores for SCL, we find that most of the similarity scores are concentrated between [−0.3, 0.3]; so, we set the initial regularization levels as {−0.2, 0, 0.2}. Figure 24 illustrates the difference between SCL and SCL–MSR on the four datasets, with respect to the distribution of the similarity scores after training. The distribution of the similarity scores is reshaped by the regulation levels. It is worth noting that the levels can be learned by SCL–MSR to find proper values, just as (b) shows.
(5) Experiments of spectral unmixing and resolution: As this study is based on pixel-based classification, the effects of spectral unmixing and spatial resolution are analyzed here. As shown in [11,12,13], pixel-based classification relies on the representation ability of the pixels, so it is necessary to take spectral mixing and resolution into account. This is also demonstrated by the following designed experiments, which include classification after spectral unmixing and classification when the spatial resolution is poor.
(i) Classification after spectral unmixing. We treat the Indian Pines, Pavia University, Houston, and Chikusei datasets as spectrally mixed data and use the spectral unmixing technique to process these datasets, following [56]. Then, the processed datasets are classified using different methods for evaluation.
Spectral unmixing decomposes each HSI spectrum into a mixture of endmembers with their proportions. The unmixing method in [56] considers a generalized spectral unmixing model that is a combination of a linear mixing model and a nonlinear model, given by:
$$x_i = \alpha M a_i + (1 - \alpha)\,\Phi(M, a_i) + n, \tag{13}$$
where $x_i$ is the $i$-th pixel sample containing $B$ bands, $M$ denotes the endmember matrix, and $a_i$ is the abundance vector associated with $x_i$. $\Phi$ is a nonlinear function that characterizes the interactions between the endmembers, and $\alpha$ is a hyper-parameter balancing the weights of the linear part and the nonlinear part. An encoder–decoder architecture is designed based on Equation (13) to simulate the mixing procedure, and it is trained to estimate the abundance representations from the HSI.
Table 10 shows the classification results after spectral unmixing. From Table 10, it can be seen that the unmixing is helpful for classification. The overall accuracies of the proposed method and the other state-of-the-art methods are all higher than those using the original datasets. For example, SCL gains 0.49 percentage points, 0.57 percentage points, and 0.0058 in terms of OA, AA, and K on the Indian Pines dataset after using spectral unmixing. In addition, the proposed methods still achieve better classification performances when compared with the other methods.
(ii) Classification when spatial resolution is poor. To obtain datasets whose spatial resolutions are poor, we downsample the original image (i.e., remove all odd rows and columns) and then resize them to the original sizes using bilinear interpolation. The obtained resolution of the hyperspectral images will be half of the original. Figure 25 shows the false color maps of the different resolutions for the Indian Pines and the Pavia University datasets.
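The resolution-degradation step can be sketched as follows; this is a minimal PyTorch sketch and the function name is our own.

```python
import torch
import torch.nn.functional as F

def degrade_resolution(hsi):
    """Halve the spatial resolution: drop odd rows/columns, then bilinear resize back.

    hsi: (H, W, B) float tensor.
    """
    low = hsi[::2, ::2, :]                                   # remove odd rows and columns
    low = low.permute(2, 0, 1).unsqueeze(0)                  # (1, B, H/2, W/2) for interpolate
    up = F.interpolate(low, size=hsi.shape[:2],
                       mode="bilinear", align_corners=False)  # resize back to original size
    return up.squeeze(0).permute(1, 2, 0)                    # back to (H, W, B)
```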
Table 11 shows the classification results on the four HSI datasets when the spatial resolution is poor. From Table 11, it can be seen that for the Pavia University, Houston, and Chikusei datasets, the accuracies decrease when we reduce the resolution. For example, the OA of SCL at the reduced spatial resolution is 1.82 percentage points lower than at the original resolution, and the Houston dataset suffers the most among these datasets. However, better classification performance is obtained on the Indian Pines dataset. We infer that this is because the Indian Pines dataset contains large homogeneous areas and its spatial structure is not complex. The downsampling process plays a role in image smoothing; thus, the Indian Pines dataset does not lose much spatial information, while some spatial noise is removed. In contrast, the other three datasets contain rich spatial information, and the spatial resolution is vital for their classification.
It can be seen that the proposed methods still achieve better classification performance when compared with the other methods.
To sum up, the unmixing experiment shows that the unmixing technique is helpful for pixel-based hyperspectral image classification, and the resolution experiment shows that resolution has an important influence on HSI classification performance. This indicates that developing unmixing and super-resolution techniques for pixel-based HSI classification could yield better performance.

3.5. Algorithm Complexity

Table 12 shows the algorithm complexity of the different classification methods. FLOPs is the abbreviation for floating point operations, i.e., the number of floating point operations needed by a given model, and can be understood as the amount of computation. Param. denotes the number of trainable parameters in a given model. FLOPs and Param. can be used to measure the complexity of a model.
The proposed SCL and SCL–MSR are pre-training frameworks that use a vanilla CNN as the backbone; their numbers of FLOPs and parameters are therefore related to those of the CNN. Specifically, SiamSCL, SCL, and SCL–MSR all adopt a Siamese architecture and need fine-tuning, so their FLOPs are three times those of the CNN. In addition, SCL and SCL–MSR have twice as many parameters as the CNN due to the momentum update module. It can be seen that the proposed SCL-based methods have fewer FLOPs when compared with SSRN, DBMA, DBDA, and FDSSC, and SCL and SCL–MSR have fewer parameters than FDSSC. Taking both algorithm complexity and accuracy into consideration, the proposed methods are competitive.

3.6. Classification Maps of Different Classification Methods

Figure 26, Figure 27, Figure 28 and Figure 29 show the classification maps of the different methods for the four datasets. The classification performance obtained by the proposed methods is better than with other methods, which can be clearly seen from the classification maps.

4. Conclusions

In this study, a contrastive learning-based supervised pre-training framework is proposed for hyperspectral image classification with limited training samples; it includes data augmentation methods for the HSI, a queue, and a momentum update scheme for supervised pre-training. Additionally, the multi-level similarity regularization method is combined with SCL for better performance. Verification experiments were conducted on four widely used datasets (Indian Pines, Pavia University, Houston, and Chikusei), and the following conclusions can be drawn from the results:
(1)
According to the comparative analysis of the classification results, the proposed methods outperform some existing state-of-the-art HSI classification methods in terms of OA, AA, and K.
(2)
The combination of the two data augmentation methods, MA and ROA, can improve the classification performance of SCL for HSI classification. The experimental results show the effect of each method.
(3)
The experimental results demonstrate that the queue and the momentum update scheme for SCL are effective for improving the classification accuracy.
(4)
The use of MSR regularizes the training procedure of SCL and improves the generalization performance for HSI classification.
This research suggests areas of further exploration in the field of HSI classification. Future work will extend the supervised contrastive learning-based HSI classification to unsupervised and semi-supervised settings.

Author Contributions

Conceptualization: Y.C.; methodology: L.H. and Y.C.; writing—original draft preparation: L.H., Y.C., X.H., and P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China under the Grant 61971164 and the Grant U20B2041.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Houston dataset is available at: https://hyperspectral.ee.uh.edu/ (accessed on 1 September 2020). The Indian Pines and Pavia University datasets are available at: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 1 September 2020). The Chikusei dataset is available at: https://naotoyokoya.com/Download.html (accessed on 1 September 2020).

Acknowledgments

The authors would like to thank the Hyperspectral Image Analysis group and the NSF Funded Center for Airborne Laser Mapping (NCALM) at the University of Houston for providing the datasets used in this study and the IEEE GRSS Data Fusion Technical Committee for organizing the 2013 Data Fusion Contest. The authors gratefully acknowledge the Space Application Laboratory, Department of Advanced Interdisciplinary Studies, the University of Tokyo, for providing the Chikusei data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Plaza, A.; Benediktsson, J.A.; Boardman, J.W.; Brazile, J.; Bruzzone, L.; Camps-Valls, G.; Chanussot, J.; Fauvel, M.; Gamba, P.; Gualtieri, A. Recent advances in techniques for hyperspectral image processing. Remote Sens. Environ. 2009, 113, S110–S122.
  2. Ghamisi, P.; Yokoya, N.; Li, J.; Liao, W.; Liu, S.; Plaza, J.; Rasti, B.; Plaza, A. Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art. IEEE Geosci. Remote Sens. Mag. 2017, 5, 37–78.
  3. Li, H.; Li, Z.; Dong, W.; Cao, X.; Wen, Z.; Xiao, R.; Wei, Y.; Zeng, H.; Ma, X. An automatic approach for detecting seedlings per hill of machine-transplanted hybrid rice utilizing machine vision. Comput. Electron. Agric. 2021, 185, 106178.
  4. Lee, M.-K.; Golzarian, M.R.; Kim, I. A new color index for vegetation segmentation and classification. Precis. Agric. 2021, 22, 179–204.
  5. Benediktsson, J.A.; Ghamisi, P. Spectral-Spatial Classification of Hyperspectral Remote Sensing Images; Artech House: London, UK, 2015.
  6. Chang, C.-I. Hyperspectral Data Exploitation: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2007.
  7. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
  8. Ghamisi, P.; Plaza, J.; Chen, Y.; Li, J.; Plaza, A.J. Advanced spectral classifiers for hyperspectral images: A review. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–32.
  9. Kuching, S. The performance of maximum likelihood, spectral angle mapper, neural network and decision tree classifiers in hyperspectral image analysis. J. Comput. Sci. 2007, 3, 419–423.
  10. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
  11. Cimtay, Y.; Özbay, B.; Yilmaz, G.; Bozdemir, E. A new vegetation index in short-wave infrared region of electromagnetic spectrum. IEEE Access 2021, 9, 148535–148545.
  12. Heylen, R.; Parente, M.; Gader, P. A review of nonlinear hyperspectral unmixing methods. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1844–1868.
  13. Çimtay, Y.; İlk, H.G. A novel bilinear unmixing approach for reconsideration of subpixel classification of land cover. Comput. Electron. Agric. 2018, 152, 126–140.
  14. Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 2005, 43, 480–491.
  15. Xia, J.; Dalla Mura, M.; Chanussot, J.; Du, P.; He, X. Random subspace ensembles for hyperspectral image classification with extended morphological attribute profiles. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4768–4786.
  16. Fang, L.; He, N.; Li, S.; Ghamisi, P.; Benediktsson, J.A. Extinction profiles fusion for hyperspectral images classification. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1803–1815.
  17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  18. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
  19. Zhang, B.; Xiong, D.; Su, J. Neural machine translation with deep attention. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 154–163.
  20. Ma, X.; Wang, H.; Geng, J. Spectral–spatial classification of hyperspectral image based on deep auto-encoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 4073–4085.
  21. Tu, Y.-H.; Du, J.; Lee, C.-H. Speech enhancement based on teacher–student deep learning using improved speech presence probability for noise-robust speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 2080–2091.
  22. Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on deep belief network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392.
  23. He, N.; Paoletti, M.E.; Haut, J.M.; Fang, L.; Li, S.; Plaza, A.; Plaza, J. Feature extraction with multiscale covariance maps for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 755–769.
  24. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709.
  25. Yu, C.; Han, R.; Song, M.; Liu, C.; Chang, C.-I. A simplified 2D-3D CNN architecture for hyperspectral image classification based on spatial–spectral fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2485–2501.
  26. Li, X.; Ding, M.; Pižurica, A. Deep feature fusion via two-stream convolutional neural network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2615–2629.
  27. Alam, F.I.; Zhou, J.; Liew, A.W.-C.; Jia, X.; Chanussot, J.; Gao, Y. Conditional random field and deep feature learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1612–1628.
  28. Bhatti, U.A.; Yu, Z.; Chanussot, J.; Zeeshan, Z.; Yuan, L.; Luo, W.; Nawaz, S.A.; Bhatti, M.A.; Ain, Q.U.; Mehmood, A. Local similarity-based spatial–spectral fusion hyperspectral image classification with deep CNN and Gabor filtering. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5514215.
  29. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.J.; Pla, F. Deep pyramidal residual networks for spectral–spatial hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 740–754.
  30. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2017, 56, 847–858.
  31. Li, R.; Zheng, S.; Duan, C.; Yang, Y.; Wang, X. Classification of hyperspectral image based on double-branch dual-attention mechanism network. Remote Sens. 2020, 12, 582.
  32. Yu, C.; Han, R.; Song, M.; Liu, C.; Chang, C.-I. Feedback attention-based dense CNN for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501916.
  33. Jiang, Y.; Li, Y.; Zhang, H. Hyperspectral image classification based on 3-D separable ResNet and transfer learning. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1949–1953.
  34. Lv, Q.; Feng, W.; Quan, Y.; Dauphin, G.; Gao, L.; Xing, M. Enhanced-random-feature-subspace-based ensemble CNN for the imbalanced hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3988–3999.
  35. Chen, Y.; Zhu, L.; Ghamisi, P.; Jia, X.; Li, G.; Tang, L. Hyperspectral images classification with Gabor filtering and convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2355–2359.
  36. Zhang, H.; Li, Y.; Jiang, Y.; Wang, P.; Shen, Q.; Shen, C. Hyperspectral classification based on lightweight 3-D-CNN with transfer learning. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5813–5828.
  37. Aptoula, E.; Ozdemir, M.C.; Yanikoglu, B. Deep learning with attribute profiles for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1970–1974.
  38. Roy, S.K.; Mondal, R.; Paoletti, M.E.; Haut, J.M.; Plaza, A. Morphological convolutional neural networks for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8689–8702.
  39. He, X.; Chen, Y.; Lin, Z. Spatial-spectral transformer for hyperspectral image classification. Remote Sens. 2021, 13, 498.
  40. Rao, M.; Tang, P.; Zhang, Z. Spatial–spectral relation network for hyperspectral image classification with limited training samples. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 5086–5100.
  41. Yue, J.; Zhu, D.; Fang, L.; Ghamisi, P.; Wang, Y. Adaptive spatial pyramid constraint for hyperspectral image classification with limited training samples. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5512914.
  42. Fang, L.; Zhao, W.; He, N.; Zhu, J. Multiscale CNNs ensemble based self-learning for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1593–1597.
  43. Zhou, S.; Xue, Z.; Du, P. Semisupervised stacked autoencoder with cotraining for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3813–3826.
  44. Zhan, Y.; Hu, D.; Wang, Y.; Yu, X. Semisupervised hyperspectral image classification based on generative adversarial networks. IEEE Geosci. Remote Sens. Lett. 2017, 15, 212–216.
  45. Li, W.; Wu, G.; Zhang, F.; Du, Q. Hyperspectral image classification using deep pixel-pair features. IEEE Trans. Geosci. Remote Sens. 2016, 55, 844–853.
  46. Liu, B.; Yu, X.; Zhang, P.; Yu, A.; Fu, Q.; Wei, X. Supervised deep feature extraction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1909–1921.
  47. Haut, J.M.; Paoletti, M.E.; Plaza, J.; Plaza, A.; Li, J. Hyperspectral image classification using random occlusion data augmentation. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1751–1755.
  48. Zhao, L.; Luo, W.; Liao, Q.; Chen, S.; Wu, J. Hyperspectral image classification with contrastive self-supervised learning under limited labeled samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6008205.
  49. Hou, S.; Shi, H.; Cao, X.; Zhang, X.; Jiao, L. Hyperspectral imagery classification based on contrastive learning. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5521213.
  50. Zhu, M.; Fan, J.; Yang, Q.; Chen, T. SC-EADNet: A self-supervised contrastive efficient asymmetric dilated network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5519517.
  51. Debes, C.; Merentitis, A.; Heremans, R.; Hahn, J.; Frangiadakis, N.; van Kasteren, T.; Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S. Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2405–2418.
  52. Yokoya, N.; Iwasaki, A. Airborne Hyperspectral Data over Chikusei; SAL-2016-05-27; Space Application Laboratory, University of Tokyo: Tokyo, Japan, 2016; p. 5.
  53. Ma, W.; Yang, Q.; Wu, Y.; Zhao, W.; Zhang, X. Double-branch multi-attention mechanism network for hyperspectral image classification. Remote Sens. 2019, 11, 1307.
  54. Wang, W.; Dou, S.; Jiang, Z.; Sun, L. A fast dense spectral–spatial convolution network framework for hyperspectral images classification. Remote Sens. 2018, 10, 1068.
  55. Wang, Y.; Ma, X.; Chen, Z.; Luo, Y.; Yi, J.; Bailey, J. Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 322–330.
  56. Guo, A.J.; Zhu, F. Improving deep hyperspectral image classification performance with spectral unmixing. Signal Process. 2021, 183, 107949.
Figure 1. The proposed HSI classification method overview—training flow. For illustrative purposes, a single image flow instead of a batch is shown here. Stage 1: pre-training based on supervised contrastive learning. Stage 2: fine-tuning for the final classification task.
Figure 2. Multiscale data augmentation. Red box represents the region of an HSI sample.
Figure 3. Examples of 3D random occlusion augmentation: (a) original (not occluded) inputs, (b–d) occluded inputs with occluded zones shown in grey.
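Figure 3 shows only the inputs and outputs of the 3D random occlusion step. As a rough illustration (not the authors' released implementation), the sketch below zeroes out a single randomly placed cuboid of a spatial–spectral patch; the maximum occlusion fraction, the single-cuboid choice, the zero fill value, and the function name are all assumptions made for this example.

import numpy as np

def random_occlusion_3d(patch: np.ndarray, max_frac: float = 0.5, rng=None) -> np.ndarray:
    """Zero out one random cuboid (rows x cols x bands) of an HSI patch of shape (H, W, B)."""
    rng = np.random.default_rng() if rng is None else rng
    out = patch.copy()
    h, w, b = patch.shape
    # Sample the extent of the occluded cuboid along each axis (at least 1, at most about max_frac of the axis).
    dh, dw, db = (int(rng.integers(1, max(2, int(s * max_frac) + 1))) for s in (h, w, b))
    # Sample the corner so that the cuboid stays inside the patch.
    y = rng.integers(0, h - dh + 1)
    x = rng.integers(0, w - dw + 1)
    z = rng.integers(0, b - db + 1)
    out[y:y + dh, x:x + dw, z:z + db] = 0.0
    return out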
Figure 4. The generation process of pairs in SCL.
Figure 5. Multi-level similarity regularization for the SCL model.
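Figures 4 and 5 describe how positive and negative pairs are formed and how their similarities are regularized in SCL. For orientation only, the following is a minimal sketch of a generic supervised contrastive (InfoNCE-style) loss over one mini-batch of embeddings and labels; it omits the feature queue and the MSR term used in the paper, and the temperature default is simply one of the T values explored in Figures 12–16.

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z: torch.Tensor, labels: torch.Tensor, temperature: float = 0.25) -> torch.Tensor:
    """Generic supervised contrastive loss for (N, D) embeddings z and (N,) integer labels."""
    z = F.normalize(z, dim=1)                                   # cosine similarity via dot products
    sim = z @ z.t() / temperature                               # (N, N) pair-wise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))             # exclude each sample from its own denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over all other samples
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)         # keep only positive pairs
    pos_count = pos_mask.sum(dim=1)
    valid = pos_count > 0                                       # anchors with at least one positive
    loss = -pos_log_prob.sum(dim=1)[valid] / pos_count[valid]
    return loss.mean()

In a pre-training loop this loss would be computed on the projected embeddings of the augmented views; it is shown here only to make the pair-wise optimization explicit.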
Figure 6. Indian Pines dataset: (a) false color map, (b) ground truth.
Figure 7. Pavia University dataset: (a) false color map, (b) ground truth.
Figure 8. Houston dataset: (a) false color map, (b) ground truth.
Figure 9. Chikusei dataset: (a) false color map, (b) ground truth.
Figure 10. The average spectral reflectance curves of different classes on the Pavia University dataset.
Figure 11. The ground truth of the Grass-synthetic class and the Running-track class in the Houston dataset. The red box represents the region where samples are likely to be ambiguous.
Figure 12. Distribution of similarity over the Indian Pines dataset, when epoch = 300: (a) T = 0.25, h = 10; (b) T = 1.0, h = 10; (c) T = 0.25, h = 20; (d) T = 1.0, h = 20.
Figure 13. Distribution of similarity over the Pavia University dataset, when epoch = 300: (a) T = 0.25, h = 10; (b) T = 1.0, h = 10; (c) T = 0.25, h = 20; (d) T = 1.0, h = 20.
Figure 14. Distribution of similarity over the Houston dataset, when epoch = 300: (a) T = 0.25, h = 10; (b) T = 1.0, h = 10; (c) T = 0.25, h = 20; (d) T = 1.0, h = 20.
Figure 15. Distribution of similarity over the Chikusei dataset, when epoch = 300: (a) T = 0.25, h = 10; (b) T = 1.0, h = 10; (c) T = 0.25, h = 20; (d) T = 1.0, h = 20.
Figure 16. Classification accuracies over different values of T and h: (a) the Indian Pines dataset; (b) the Pavia University dataset; (c) the Houston dataset; (d) the Chikusei dataset.
Figure 17. Similarity statistics of various m in SCL training on the Indian Pines dataset: (a) mean of negative similarity; (b) variance of negative similarity; (c) mean of positive similarity; (d) variance of positive similarity. Different colors correspond to different values of m.
Figure 18. Similarity statistics of various m in SCL training on the Pavia University dataset: (a) mean of negative similarity; (b) variance of negative similarity; (c) mean of positive similarity; (d) variance of positive similarity. Different colors correspond to different values of m.
Figure 19. Similarity statistics of various m in SCL training on the Houston dataset: (a) mean of negative similarity; (b) variance of negative similarity; (c) mean of positive similarity; (d) variance of positive similarity. Different colors correspond to different values of m.
Figure 20. Similarity statistics of various m in SCL training on the Chikusei dataset: (a) mean of negative similarity; (b) variance of negative similarity; (c) mean of positive similarity; (d) variance of positive similarity. Different colors correspond to different values of m.
Figure 21. The overall accuracy of SCL with different values of momentum coefficient m.
Figure 22. The overall accuracy of SCL using different data augmentation techniques.
Figure 23. Classification accuracies over different values of p and S: (a) Indian Pines dataset; (b) Pavia University dataset; (c) Houston dataset; (d) Chikusei dataset.
Figure 24. Distribution of similarity when T = 1.0, h = 25: (a–d) SCL over the Indian Pines, Pavia University, Houston, and Chikusei datasets; (e–h) SCL–MSR over the Indian Pines, Pavia University, Houston, and Chikusei datasets.
Figure 25. False color maps of the Indian Pines and Pavia University datasets: (a,c) original datasets; (b,d) downsampled datasets.
Figure 26. Classification maps using different methods on the Indian Pines dataset: (a) SCL–MSR; (b) SCL; (c) FDSSC; (d) DBDA; (e) DBMA; (f) SSRN; (g) SiamSCL; (h) EMP–SVM.
Figure 27. Classification maps using different methods on the Pavia University dataset: (a) SCL–MSR; (b) SCL; (c) FDSSC; (d) DBDA; (e) DBMA; (f) SSRN; (g) SiamSCL; (h) EMP–SVM.
Figure 28. Classification maps using different methods on the Houston dataset: (a) SCL–MSR; (b) SCL; (c) FDSSC; (d) DBDA; (e) DBMA; (f) SSRN; (g) SiamSCL; (h) EMP–SVM.
Figure 29. Classification maps using different methods on the Chikusei dataset: (a) SCL–MSR; (b) SCL; (c) FDSSC; (d) DBDA; (e) DBMA; (f) SSRN; (g) SiamSCL; (h) EMP–SVM.
Table 1. Land cover classes and numbers of samples in the Indian Pines dataset.
No. | Class Name | Training Samples | Test Samples | Total Samples
1 | Alfalfa | 20 | 26 | 46
2 | Corn-notill | 20 | 1408 | 1428
3 | Corn-mintill | 20 | 810 | 830
4 | Corn | 20 | 217 | 237
5 | Grass-pasture | 20 | 463 | 483
6 | Grass-trees | 20 | 710 | 730
7 | Grass-pasture-mowed | 20 | 8 | 28
8 | Hay-windrowed | 20 | 458 | 478
9 | Oats | 15 | 5 | 20
10 | Soybean-notill | 20 | 952 | 972
11 | Soybean-mintill | 20 | 2435 | 2455
12 | Soybean-clean | 20 | 573 | 593
13 | Wheat | 20 | 185 | 205
14 | Woods | 20 | 1245 | 1265
15 | Buildings-Grass-Trees | 20 | 366 | 386
16 | Stone-Steel-Towers | 20 | 73 | 93
Total | 315 | 9934 | 10,249
Table 2. Land cover classes and numbers of samples in the Pavia University dataset.
No. | Class Name | Training Samples | Test Samples | Total Samples
1 | Asphalt | 20 | 6611 | 6631
2 | Meadows | 20 | 18,629 | 18,649
3 | Gravel | 20 | 2079 | 2099
4 | Trees | 20 | 3044 | 3064
5 | Metal sheets | 20 | 1325 | 1345
6 | Bare soil | 20 | 5009 | 5029
7 | Bitumen | 20 | 1310 | 1330
8 | Bricks | 20 | 3662 | 3682
9 | Shadow | 20 | 927 | 947
Total | 180 | 42,596 | 42,776
Table 3. Land cover classes and numbers of samples in the Houston dataset.
No. | Class Name | Training Samples | Test Samples | Total Samples
1 | Grass-healthy | 20 | 1231 | 1251
2 | Grass-stressed | 20 | 1234 | 1254
3 | Grass-synthetic | 20 | 677 | 697
4 | Tree | 20 | 1224 | 1244
5 | Soil | 20 | 1222 | 1242
6 | Water | 20 | 305 | 325
7 | Residential | 20 | 1248 | 1268
8 | Commercial | 20 | 1224 | 1244
9 | Road | 20 | 1232 | 1252
10 | Highway | 20 | 1207 | 1227
11 | Railway | 20 | 1215 | 1235
12 | Parking-lot-1 | 20 | 1213 | 1233
13 | Parking-lot-2 | 20 | 449 | 469
14 | Tennis-court | 20 | 408 | 428
15 | Running-track | 20 | 640 | 660
Total | 300 | 14,729 | 15,029
Table 4. Land cover classes and numbers of samples in the Chikusei dataset.
No. | Class Name | Training Samples | Test Samples | Total Samples
1 | Water | 5 | 2840 | 2845
2 | Bare soil (school) | 5 | 2854 | 2859
3 | Bare soil (park) | 5 | 281 | 286
4 | Bare soil (farmland) | 5 | 4847 | 4852
5 | Natural plants | 5 | 4292 | 4297
6 | Weeds | 5 | 1103 | 1108
7 | Forest | 5 | 20,511 | 20,516
8 | Grass | 5 | 6510 | 6515
9 | Rice field (grown) | 5 | 13,364 | 13,369
10 | Rice field (first stage) | 5 | 1263 | 1268
11 | Row crops | 5 | 5956 | 5961
12 | Plastic house | 5 | 2188 | 2193
13 | Manmade-1 | 5 | 1215 | 1220
14 | Manmade-2 | 5 | 7659 | 7664
15 | Manmade-3 | 5 | 426 | 431
16 | Manmade-4 | 5 | 217 | 222
17 | Manmade grass | 5 | 1035 | 1040
18 | Asphalt | 5 | 796 | 801
19 | Paved ground | 5 | 140 | 145
Total | 95 | 77,497 | 77,592
Table 5. Architecture of CNN.
No. | Convolution | ReLU | Pooling | Padding | Stride | BN | Linear
1 | 1 × 1 × 32 | YES | NO | NO | 1 | YES | -
2 | 4 × 4 × 32 | YES | 2 × 2 | NO | 1 | YES | -
3 | 3 × 3 × 64 | YES | 2 × 2 | NO | 1 | YES | -
4 | 4 × 4 × 128 | YES | 2 × 2 | NO | 1 | YES | -
5 | - | NO | NO | NO | - | NO | 128 × 256
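Table 5 lists the encoder layer by layer. A minimal PyTorch sketch matching that layout is given below; the number of input channels, the Conv–BN–ReLU ordering within a layer, max pooling as the 2 × 2 pooling operator, and the global average pooling before the 128 × 256 linear layer are assumptions made for illustration and are not specified by the table.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_channels: int = 30):                    # in_channels is an assumed value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=1, stride=1),  # layer 1: 1 x 1 x 32
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=4, stride=1),           # layer 2: 4 x 4 x 32
            nn.BatchNorm2d(32), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1),           # layer 3: 3 x 3 x 64
            nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=4, stride=1),          # layer 4: 4 x 4 x 128
            nn.BatchNorm2d(128), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.project = nn.Linear(128, 256)                        # layer 5: 128 x 256 linear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)        # (B, 128, h', w')
        f = f.mean(dim=(2, 3))      # global average pooling to a 128-dimensional vector
        return self.project(f)      # (B, 256) embedding

With an assumed 27 × 27 input patch, the spatial size shrinks to 1 × 1 after the last pooling layer, so the average pooling simply removes the singleton spatial dimensions.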
Table 6. Testing data classification results (mean ± standard deviation) on the Indian Pines dataset (classes numbered as in Table 1).
Class | EMP–SVM | CNN | SiamSCL | SSRN | DBMA | DBDA | FDSSC | SCL | SCL–MSR
1 | 96.54 ± 2.07 | 100.0 ± 0.00 | 100.0 ± 0.00 | 86.04 ± 9.73 | 66.41 ± 13.17 | 79.35 ± 12.80 | 89.04 ± 10.60 | 100.0 ± 0.00 | 100.0 ± 0.00
2 | 63.74 ± 7.45 | 78.96 ± 6.22 | 79.51 ± 4.00 | 86.85 ± 6.00 | 84.14 ± 7.00 | 86.89 ± 10.74 | 92.35 ± 6.46 | 81.46 ± 2.96 | 83.32 ± 3.12
3 | 76.56 ± 4.40 | 87.75 ± 5.96 | 88.96 ± 6.73 | 86.54 ± 5.65 | 84.42 ± 8.07 | 86.88 ± 8.79 | 87.91 ± 11.64 | 89.00 ± 4.76 | 90.06 ± 3.03
4 | 81.94 ± 5.14 | 98.53 ± 2.56 | 98.29 ± 2.05 | 73.29 ± 13.10 | 82.27 ± 10.20 | 81.64 ± 11.47 | 80.76 ± 14.64 | 98.39 ± 2.19 | 98.34 ± 2.13
5 | 86.57 ± 3.85 | 92.18 ± 3.21 | 91.71 ± 2.57 | 98.00 ± 1.96 | 93.89 ± 5.25 | 97.02 ± 3.74 | 98.69 ± 1.36 | 93.00 ± 2.49 | 93.13 ± 1.82
6 | 92.85 ± 4.63 | 95.41 ± 2.53 | 94.93 ± 3.73 | 97.79 ± 1.40 | 98.14 ± 1.32 | 96.54 ± 1.25 | 97.95 ± 1.56 | 94.62 ± 3.97 | 91.87 ± 5.09
7 | 92.50 ± 6.12 | 100.0 ± 0.00 | 100.0 ± 0.00 | 73.78 ± 18.70 | 39.00 ± 26.53 | 58.08 ± 28.22 | 66.78 ± 30.24 | 100.0 ± 0.00 | 100.0 ± 0.00
8 | 94.10 ± 3.25 | 99.80 ± 0.59 | 99.85 ± 0.39 | 99.46 ± 0.83 | 99.36 ± 1.21 | 99.62 ± 0.58 | 99.93 ± 0.10 | 99.83 ± 0.52 | 99.87 ± 0.46
9 | 98.00 ± 6.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 44.13 ± 20.56 | 15.44 ± 8.84 | 32.65 ± 11.22 | 32.03 ± 21.44 | 100.0 ± 0.00 | 100.0 ± 0.00
10 | 68.28 ± 7.20 | 88.93 ± 3.10 | 89.09 ± 4.48 | 79.53 ± 6.45 | 78.91 ± 5.13 | 81.08 ± 8.08 | 79.27 ± 12.02 | 89.29 ± 4.41 | 90.42 ± 4.30
11 | 59.22 ± 4.69 | 87.41 ± 5.47 | 88.21 ± 3.66 | 92.94 ± 3.50 | 92.90 ± 3.57 | 95.43 ± 4.19 | 92.20 ± 6.64 | 89.66 ± 2.93 | 89.25 ± 3.98
12 | 66.61 ± 7.68 | 83.84 ± 6.23 | 84.56 ± 6.21 | 82.78 ± 10.73 | 81.50 ± 11.12 | 90.57 ± 13.45 | 87.94 ± 14.32 | 85.46 ± 6.03 | 88.38 ± 3.25
13 | 97.08 ± 1.81 | 99.57 ± 0.89 | 99.41 ± 0.89 | 95.49 ± 3.82 | 97.35 ± 2.96 | 91.42 ± 5.35 | 92.61 ± 4.98 | 99.57 ± 0.76 | 99.46 ± 0.76
14 | 86.39 ± 5.68 | 97.39 ± 1.71 | 97.45 ± 1.80 | 97.87 ± 1.14 | 97.70 ± 1.42 | 98.43 ± 1.05 | 98.57 ± 8.41 | 97.82 ± 1.52 | 97.27 ± 1.44
15 | 71.48 ± 8.55 | 97.19 ± 2.60 | 97.62 ± 2.22 | 87.30 ± 7.75 | 80.60 ± 5.53 | 88.99 ± 5.46 | 83.25 ± 12.18 | 97.57 ± 2.64 | 98.17 ± 1.65
16 | 95.48 ± 4.71 | 99.18 ± 0.91 | 99.32 ± 0.92 | 79.51 ± 4.48 | 78.45 ± 8.91 | 68.09 ± 12.75 | 80.42 ± 6.41 | 99.18 ± 0.91 | 99.17 ± 0.67
OA (%) | 73.32 ± 2.25 | 89.62 ± 1.72 | 90.15 ± 1.27 | 89.44 ± 1.38 | 87.96 ± 1.24 | 90.26 ± 3.06 | 89.71 ± 2.72 | 90.95 ± 1.35 | 91.23 ± 1.40
AA (%) | 82.96 ± 1.31 | 94.14 ± 0.72 | 94.31 ± 0.72 | 85.08 ± 2.08 | 79.41 ± 1.59 | 83.29 ± 2.47 | 84.98 ± 3.97 | 94.68 ± 0.80 | 94.92 ± 0.67
K × 100 | 69.93 ± 2.50 | 88.20 ± 1.93 | 88.80 ± 1.42 | 88.00 ± 1.55 | 86.32 ± 1.34 | 88.95 ± 3.43 | 88.30 ± 3.08 | 89.69 ± 1.52 | 90.02 ± 1.57
Table 7. Testing data classification results (mean ± standard deviation) on the Pavia University dataset (classes numbered as in Table 2).
Class | EMP–SVM | CNN | SiamSCL | SSRN | DBMA | DBDA | FDSSC | SCL | SCL–MSR
1 | 81.27 ± 6.60 | 90.85 ± 4.53 | 92.02 ± 5.09 | 97.84 ± 1.71 | 98.47 ± 0.70 | 98.74 ± 1.20 | 98.59 ± 1.61 | 94.58 ± 3.78 | 93.42 ± 3.45
2 | 83.13 ± 3.26 | 93.83 ± 3.49 | 95.32 ± 4.20 | 97.72 ± 0.81 | 98.08 ± 1.35 | 99.51 ± 0.36 | 99.19 ± 0.38 | 96.62 ± 3.96 | 97.62 ± 2.78
3 | 81.60 ± 4.51 | 98.12 ± 1.27 | 98.88 ± 0.86 | 83.71 ± 8.17 | 78.83 ± 10.58 | 90.81 ± 12.10 | 92.84 ± 5.72 | 98.86 ± 0.96 | 98.56 ± 1.90
4 | 95.29 ± 2.44 | 96.29 ± 1.39 | 96.18 ± 2.05 | 97.70 ± 2.01 | 88.43 ± 4.01 | 92.74 ± 7.92 | 94.75 ± 5.75 | 96.89 ± 0.99 | 96.73 ± 1.17
5 | 99.26 ± 0.26 | 99.33 ± 0.52 | 99.49 ± 0.45 | 99.86 ± 0.27 | 96.67 ± 4.57 | 99.53 ± 0.63 | 99.88 ± 0.12 | 99.58 ± 0.34 | 99.40 ± 0.54
6 | 80.27 ± 6.31 | 99.47 ± 0.63 | 99.86 ± 0.27 | 91.98 ± 3.69 | 86.60 ± 8.54 | 90.96 ± 5.45 | 95.65 ± 2.21 | 99.27 ± 1.36 | 99.86 ± 0.32
7 | 93.11 ± 1.56 | 99.39 ± 0.68 | 99.52 ± 0.43 | 88.49 ± 12.03 | 95.13 ± 8.12 | 93.80 ± 8.83 | 96.36 ± 2.67 | 99.71 ± 0.36 | 99.63 ± 0.44
8 | 83.86 ± 3.96 | 98.94 ± 0.80 | 99.02 ± 0.78 | 84.79 ± 7.36 | 85.56 ± 8.62 | 89.83 ± 6.47 | 83.63 ± 11.84 | 98.99 ± 0.85 | 99.08 ± 0.89
9 | 99.85 ± 0.12 | 96.66 ± 1.33 | 96.76 ± 1.55 | 99.41 ± 0.94 | 92.34 ± 3.50 | 96.82 ± 1.69 | 97.35 ± 3.44 | 96.54 ± 1.40 | 96.43 ± 2.02
OA (%) | 84.53 ± 2.22 | 95.26 ± 1.74 | 96.18 ± 2.15 | 94.72 ± 1.17 | 92.93 ± 1.75 | 95.87 ± 1.85 | 96.07 ± 1.89 | 97.13 ± 1.82 | 97.43 ± 1.49
AA (%) | 88.63 ± 1.57 | 96.99 ± 0.74 | 97.45 ± 0.82 | 93.50 ± 2.07 | 91.13 ± 1.54 | 94.75 ± 2.36 | 95.36 ± 1.64 | 97.89 ± 0.70 | 97.86 ± 0.69
K × 100 | 80.00 ± 2.76 | 93.80 ± 2.23 | 95.00 ± 2.78 | 93.02 ± 1.53 | 90.75 ± 2.23 | 94.58 ± 2.41 | 94.84 ± 2.42 | 96.22 ± 2.35 | 96.62 ± 1.94
Table 8. Testing data classification results (mean ± standard deviation) on the Houston dataset (classes numbered as in Table 3).
Class | EMP–SVM | CNN | SiamSCL | SSRN | DBMA | DBDA | FDSSC | SCL | SCL–MSR
1 | 92.99 ± 4.30 | 92.59 ± 4.56 | 93.23 ± 4.93 | 96.25 ± 2.94 | 94.79 ± 3.34 | 93.48 ± 5.59 | 96.41 ± 2.30 | 94.25 ± 5.17 | 93.11 ± 4.73
2 | 93.06 ± 5.72 | 97.00 ± 2.21 | 96.54 ± 2.35 | 97.65 ± 2.48 | 92.77 ± 4.62 | 95.10 ± 3.78 | 97.64 ± 2.11 | 97.46 ± 1.94 | 97.29 ± 1.79
3 | 98.97 ± 1.10 | 98.66 ± 1.41 | 99.03 ± 1.33 | 99.93 ± 0.22 | 99.76 ± 0.51 | 100.0 ± 0.00 | 100.0 ± 0.00 | 98.98 ± 1.20 | 98.41 ± 1.85
4 | 94.75 ± 2.94 | 97.66 ± 1.81 | 98.04 ± 1.51 | 95.98 ± 4.13 | 94.93 ± 3.23 | 97.13 ± 2.17 | 95.63 ± 3.79 | 98.46 ± 1.45 | 97.31 ± 2.02
5 | 96.51 ± 4.52 | 97.47 ± 5.11 | 98.05 ± 5.12 | 95.41 ± 2.36 | 96.65 ± 2.67 | 97.66 ± 2.42 | 97.60 ± 2.48 | 98.75 ± 3.03 | 98.58 ± 3.34
6 | 94.72 ± 3.42 | 95.38 ± 3.52 | 95.01 ± 3.91 | 97.31 ± 7.83 | 96.88 ± 3.61 | 97.37 ± 2.23 | 99.80 ± 0.34 | 94.89 ± 3.70 | 95.51 ± 3.79
7 | 85.54 ± 4.67 | 90.54 ± 2.41 | 91.44 ± 1.99 | 92.10 ± 2.47 | 86.21 ± 4.33 | 91.93 ± 3.62 | 92.56 ± 4.64 | 93.04 ± 2.74 | 91.79 ± 2.42
8 | 69.36 ± 4.90 | 78.48 ± 6.64 | 79.56 ± 3.12 | 93.23 ± 3.49 | 90.28 ± 4.81 | 94.88 ± 3.19 | 92.68 ± 3.66 | 80.74 ± 4.68 | 89.27 ± 3.89
9 | 75.81 ± 6.81 | 90.60 ± 3.94 | 92.00 ± 2.80 | 89.87 ± 3.83 | 86.17 ± 4.01 | 88.57 ± 2.27 | 90.82 ± 3.19 | 91.83 ± 5.15 | 91.49 ± 3.93
10 | 87.63 ± 4.01 | 96.06 ± 4.12 | 97.12 ± 3.88 | 86.49 ± 6.78 | 91.35 ± 2.74 | 89.76 ± 3.83 | 89.12 ± 4.70 | 97.75 ± 2.87 | 99.34 ± 1.31
11 | 85.58 ± 7.72 | 91.52 ± 4.71 | 94.24 ± 3.87 | 90.45 ± 1.94 | 91.57 ± 4.51 | 95.44 ± 2.10 | 92.70 ± 3.56 | 94.79 ± 2.62 | 95.81 ± 1.68
12 | 76.18 ± 6.14 | 91.81 ± 5.48 | 92.94 ± 5.34 | 89.91 ± 4.73 | 90.33 ± 5.84 | 93.15 ± 3.27 | 93.40 ± 3.86 | 92.48 ± 6.75 | 95.03 ± 3.85
13 | 56.44 ± 5.90 | 96.08 ± 2.62 | 95.10 ± 3.78 | 93.52 ± 5.65 | 77.27 ± 7.71 | 82.75 ± 6.06 | 83.22 ± 9.14 | 95.03 ± 3.63 | 95.06 ± 2.66
14 | 97.94 ± 2.53 | 99.93 ± 0.22 | 100.0 ± 0.00 | 97.67 ± 3.29 | 95.37 ± 6.72 | 98.13 ± 2.59 | 98.07 ± 2.72 | 100.0 ± 0.00 | 100.0 ± 0.00
15 | 99.08 ± 0.46 | 99.59 ± 1.06 | 99.72 ± 0.70 | 96.93 ± 1.96 | 96.17 ± 2.20 | 95.05 ± 2.47 | 96.22 ± 3.18 | 99.58 ± 0.97 | 99.92 ± 0.19
OA (%) | 86.56 ± 1.36 | 93.36 ± 0.92 | 94.03 ± 0.91 | 93.32 ± 1.05 | 91.58 ± 0.69 | 93.67 ± 0.92 | 93.97 ± 0.99 | 94.65 ± 0.73 | 94.84 ± 0.72
AA (%) | 86.97 ± 1.26 | 94.23 ± 0.67 | 94.80 ± 0.73 | 94.18 ± 1.12 | 92.03 ± 0.71 | 94.03 ± 0.87 | 94.39 ± 1.00 | 95.20 ± 0.60 | 95.39 ± 0.60
K × 100 | 85.47 ± 1.47 | 92.82 ± 0.99 | 93.64 ± 0.98 | 92.78 ± 1.13 | 90.90 ± 0.75 | 93.16 ± 1.00 | 93.48 ± 1.07 | 94.21 ± 0.79 | 94.42 ± 0.77
Table 9. Testing data classification results (mean ± standard deviation) on the Chikusei dataset (classes numbered as in Table 4).
Class | EMP–SVM | CNN | SiamSCL | SSRN | DBMA | DBDA | FDSSC | SCL | SCL–MSR
1 | 83.55 ± 10.60 | 92.99 ± 4.40 | 91.74 ± 4.18 | 83.51 ± 12.94 | 84.50 ± 11.82 | 83.44 ± 13.8 | 86.47 ± 12.42 | 91.17 ± 4.62 | 93.42 ± 3.92
2 | 93.83 ± 3.84 | 99.54 ± 0.53 | 99.59 ± 0.49 | 98.07 ± 2.02 | 99.82 ± 0.23 | 99.65 ± 0.51 | 98.55 ± 3.30 | 99.60 ± 0.53 | 99.45 ± 0.52
3 | 98.01 ± 2.62 | 99.57 ± 0.98 | 99.78 ± 0.43 | 28.93 ± 10.75 | 23.02 ± 5.54 | 31.63 ± 15.77 | 29.06 ± 14.94 | 97.30 ± 5.46 | 97.08 ± 6.14
4 | 50.19 ± 20.7 | 82.66 ± 16.10 | 82.82 ± 15.22 | 90.14 ± 11.38 | 89.34 ± 9.32 | 87.22 ± 10.73 | 84.33 ± 10.55 | 86.55 ± 1.78 | 86.07 ± 16.79
5 | 96.70 ± 2.76 | 99.95 ± 0.02 | 99.99 ± 0.02 | 95.10 ± 3.32 | 97.64 ± 2.67 | 96.53 ± 3.24 | 94.59 ± 3.69 | 99.97 ± 5.36 | 99.98 ± 0.03
6 | 87.28 ± 12.13 | 95.62 ± 3.64 | 95.26 ± 3.61 | 73.53 ± 22.89 | 71.42 ± 22.95 | 85.41 ± 24.14 | 81.26 ± 18.48 | 95.27 ± 3.86 | 95.27 ± 3.86
7 | 82.13 ± 7.49 | 99.97 ± 0.05 | 99.97 ± 0.07 | 95.66 ± 3.70 | 94.69 ± 4.92 | 99.37 ± 0.87 | 98.10 ± 1.66 | 99.99 ± 0.02 | 99.98 ± 0.07
8 | 91.93 ± 2.72 | 93.05 ± 2.99 | 94.42 ± 3.42 | 96.71 ± 4.96 | 99.06 ± 0.97 | 99.90 ± 0.27 | 98.81 ± 2.01 | 93.91 ± 3.02 | 95.23 ± 1.95
9 | 79.34 ± 20.97 | 94.59 ± 10.58 | 98.22 ± 2.42 | 96.57 ± 3.77 | 95.30 ± 5.11 | 99.43 ± 0.46 | 96.95 ± 4.78 | 97.74 ± 3.03 | 98.69 ± 1.75
10 | 99.26 ± 0.55 | 99.94 ± 0.17 | 99.92 ± 0.17 | 81.93 ± 9.86 | 80.55 ± 15.50 | 89.73 ± 5.23 | 82.11 ± 12.83 | 99.64 ± 0.96 | 99.98 ± 0.07
11 | 66.40 ± 14.47 | 82.22 ± 10.90 | 79.41 ± 12.74 | 94.58 ± 11.37 | 93.09 ± 3.32 | 97.42 ± 3.29 | 94.36 ± 10.23 | 85.51 ± 8.7 | 85.19 ± 8.01
12 | 69.20 ± 11.50 | 84.48 ± 9.13 | 85.50 ± 8.78 | 91.50 ± 6.20 | 92.15 ± 4.92 | 96.78 ± 4.48 | 89.21 ± 11.74 | 85.74 ± 8.34 | 85.53 ± 9.46
13 | 95.09 ± 1.97 | 95.97 ± 1.48 | 96.15 ± 1.77 | 96.16 ± 7.62 | 92.84 ± 7.39 | 98.75 ± 2.27 | 92.87 ± 9.70 | 96.12 ± 1.72 | 96.18 ± 1.56
14 | 86.85 ± 11.24 | 89.49 ± 10.80 | 90.76 ± 10.58 | 99.80 ± 0.33 | 98.09 ± 2.53 | 99.60 ± 7.82 | 99.75 ± 0.50 | 91.70 ± 8.5 | 92.60 ± 10.94
15 | 91.01 ± 17.23 | 91.78 ± 8.43 | 91.19 ± 16.23 | 93.87 ± 9.69 | 92.99 ± 9.33 | 98.12 ± 5.20 | 96.65 ± 7.27 | 91.78 ± 10.4 | 91.78 ± 6.43
16 | 93.73 ± 7.85 | 95.67 ± 6.04 | 95.66 ± 6.04 | 93.60 ± 7.32 | 94.38 ± 5.15 | 98.24 ± 3.48 | 92.51 ± 2.45 | 94.29 ± 6.89 | 96.04 ± 7.87
17 | 93.39 ± 6.38 | 96.06 ± 8.39 | 94.97 ± 8.70 | 98.35 ± 1.65 | 96.53 ± 2.88 | 96.62 ± 2.32 | 96.83 ± 4.24 | 98.51 ± 1.61 | 98.98 ± 1.78
18 | 88.52 ± 12.17 | 83.98 ± 11.2 | 85.30 ± 11.83 | 69.53 ± 13.85 | 64.40 ± 14.60 | 72.33 ± 13.82 | 59.54 ± 14.88 | 85.10 ± 12.3 | 83.79 ± 11.08
19 | 88.07 ± 7.69 | 98.86 ± 3.43 | 98.85 ± 3.43 | 24.50 ± 16.27 | 14.22 ± 9.28 | 35.81 ± 35.81 | 61.44 ± 25.40 | 99.79 ± 0.64 | 100.0 ± 0.00
OA (%) | 81.58 ± 4.64 | 93.87 ± 2.28 | 94.51 ± 1.66 | 91.46 ± 3.62 | 90.12 ± 4.35 | 94.39 ± 2.39 | 92.79 ± 3.20 | 95.20 ± 1.91 | 95.58 ± 2.02
AA (%) | 86.02 ± 3.06 | 93.50 ± 1.40 | 93.66 ± 1.26 | 84.32 ± 2.98 | 82.84 ± 2.27 | 88.39 ± 2.24 | 85.97 ± 3.09 | 94.19 ± 1.29 | 94.49 ± 1.55
K × 100 | 78.97 ± 5.31 | 92.95 ± 2.60 | 93.68 ± 1.90 | 90.18 ± 4.13 | 88.65 ± 4.96 | 93.55 ± 2.73 | 91.72 ± 3.65 | 94.47 ± 2.18 | 94.91 ± 2.31
Table 10. Classification results after spectral unmixing.
Dataset | Metric | CNN | SiamSCL | SSRN | DBDA | FDSSC | SCL | SCL–MSR
Indian Pines | OA (%) | 90.52 ± 1.25 | 90.92 ± 1.39 | 90.24 ± 1.56 | 90.92 ± 2.16 | 90.68 ± 1.89 | 91.44 ± 1.66 | 91.93 ± 1.24
Indian Pines | AA (%) | 94.47 ± 0.72 | 94.78 ± 1.02 | 85.64 ± 1.28 | 84.08 ± 2.12 | 85.69 ± 2.54 | 95.25 ± 0.79 | 95.51 ± 0.65
Indian Pines | K × 100 | 89.22 ± 1.41 | 89.66 ± 1.53 | 88.74 ± 1.72 | 89.62 ± 2.68 | 89.44 ± 2.46 | 90.27 ± 1.84 | 90.82 ± 1.41
Pavia University | OA (%) | 96.04 ± 1.97 | 96.72 ± 2.21 | 95.41 ± 1.26 | 96.61 ± 1.96 | 96.82 ± 1.94 | 97.51 ± 2.00 | 97.88 ± 1.61
Pavia University | AA (%) | 96.98 ± 1.27 | 97.96 ± 1.25 | 93.96 ± 2.26 | 95.39 ± 2.16 | 96.24 ± 1.56 | 97.67 ± 1.21 | 97.86 ± 0.91
Pavia University | K × 100 | 94.80 ± 2.57 | 95.61 ± 2.41 | 93.42 ± 1.94 | 95.21 ± 2.55 | 95.63 ± 2.43 | 96.72 ± 2.62 | 97.20 ± 2.11
Houston | OA (%) | 94.00 ± 1.30 | 94.56 ± 1.12 | 93.89 ± 1.05 | 94.31 ± 1.23 | 94.56 ± 1.15 | 94.90 ± 0.85 | 95.19 ± 0.83
Houston | AA (%) | 94.67 ± 1.11 | 95.15 ± 0.98 | 94.62 ± 1.25 | 94.83 ± 0.95 | 95.03 ± 0.98 | 95.42 ± 0.78 | 95.63 ± 0.64
Houston | K × 100 | 93.52 ± 1.41 | 94.01 ± 1.26 | 93.28 ± 1.45 | 93.75 ± 1.12 | 94.01 ± 1.01 | 94.52 ± 0.92 | 94.75 ± 0.82
Chikusei | OA (%) | 94.37 ± 2.02 | 94.86 ± 1.78 | 92.30 ± 3.26 | 94.89 ± 2.12 | 93.82 ± 2.86 | 95.54 ± 1.85 | 95.95 ± 1.78
Chikusei | AA (%) | 94.02 ± 1.60 | 94.02 ± 1.32 | 85.07 ± 2.73 | 88.92 ± 1.83 | 86.68 ± 2.52 | 94.58 ± 1.46 | 94.98 ± 1.68
Chikusei | K × 100 | 93.50 ± 2.81 | 94.12 ± 1.98 | 90.86 ± 3.65 | 94.05 ± 2.49 | 92.43 ± 3.15 | 94.86 ± 2.14 | 95.16 ± 1.93
Table 11. Classification results when the spatial resolution is poor.
Dataset | Metric | CNN | SiamSCL | SSRN | DBDA | FDSSC | SCL | SCL–MSR
Indian Pines | OA (%) | 90.00 ± 1.66 | 91.28 ± 1.62 | 90.12 ± 1.26 | 91.35 ± 1.52 | 91.24 ± 1.45 | 91.86 ± 1.53 | 92.19 ± 1.49
Indian Pines | AA (%) | 94.23 ± 0.81 | 95.43 ± 0.83 | 86.26 ± 1.72 | 81.89 ± 3.35 | 81.96 ± 3.21 | 95.69 ± 0.89 | 95.89 ± 0.95
Indian Pines | K × 100 | 88.62 ± 1.86 | 89.65 ± 1.78 | 88.85 ± 1.43 | 90.51 ± 1.78 | 90.42 ± 1.65 | 91.25 ± 1.68 | 91.46 ± 1.54
Pavia University | OA (%) | 94.13 ± 2.48 | 94.54 ± 2.15 | 90.00 ± 1.79 | 95.06 ± 1.58 | 93.18 ± 1.54 | 95.69 ± 1.86 | 96.01 ± 1.56
Pavia University | AA (%) | 95.83 ± 1.26 | 96.01 ± 1.12 | 86.97 ± 1.74 | 93.49 ± 1.58 | 90.37 ± 1.74 | 97.26 ± 0.95 | 97.54 ± 0.89
Pavia University | K × 100 | 92.35 ± 3.16 | 92.65 ± 2.56 | 86.91 ± 0.02 | 93.52 ± 2.03 | 91.07 ± 1.96 | 95.19 ± 2.14 | 95.44 ± 1.94
Houston | OA (%) | 86.58 ± 1.06 | 87.13 ± 1.27 | 80.62 ± 1.38 | 85.58 ± 0.96 | 85.33 ± 1.22 | 87.69 ± 1.12 | 87.95 ± 1.08
Houston | AA (%) | 88.42 ± 0.90 | 88.96 ± 1.15 | 81.97 ± 1.39 | 86.45 ± 0.98 | 86.26 ± 1.36 | 89.56 ± 1.02 | 89.84 ± 1.13
Houston | K × 100 | 85.49 ± 1.14 | 85.89 ± 1.29 | 79.05 ± 1.50 | 84.42 ± 1.03 | 84.16 ± 1.31 | 86.72 ± 1.25 | 87.02 ± 1.18
Chikusei | OA (%) | 93.06 ± 3.34 | 93.52 ± 2.58 | 89.43 ± 3.65 | 93.59 ± 2.91 | 91.26 ± 3.42 | 94.22 ± 3.25 | 94.66 ± 3.45
Chikusei | AA (%) | 93.61 ± 1.60 | 94.03 ± 1.23 | 82.15 ± 2.31 | 86.30 ± 3.51 | 84.29 ± 3.25 | 94.86 ± 1.78 | 95.02 ± 1.91
Chikusei | K × 100 | 92.03 ± 3.83 | 92.60 ± 3.05 | 88.56 ± 4.02 | 92.64 ± 3.31 | 90.83 ± 3.89 | 93.34 ± 3.77 | 93.68 ± 3.89
Table 12. The number of FLOPs and parameters.
Metric | CNN | SiamSCL | SSRN | DBMA | DBDA | FDSSC | SCL | SCL–MSR
FLOPs | 32.32M | 96.96M | 158.38M | 245.59M | 161.30M | 265.0M | 96.96M | 96.96M
Param. | 0.44M | 0.44M | 0.36M | 0.61M | 0.38M | 1.227M | 0.88M | 0.88M
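Parameter counts of the kind reported in Table 12 can be reproduced for any PyTorch model by summing the sizes of its trainable tensors, as in the short sketch below; FLOP counts are normally obtained with a separate profiling tool. The Encoder name refers to the illustrative sketch given after Table 5, not to the authors' released code.

import torch

def count_parameters(model: torch.nn.Module) -> int:
    # Sum the sizes of all trainable tensors (the "Param." row of Table 12).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example with the illustrative encoder sketched after Table 5 (assumed class name and input size):
# print(f"{count_parameters(Encoder(in_channels=103)) / 1e6:.2f}M")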
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
