Article

Weakly Supervised Classification of Hyperspectral Image Based on Complementary Learning

School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(24), 5009; https://doi.org/10.3390/rs13245009
Submission received: 3 November 2021 / Revised: 4 December 2021 / Accepted: 7 December 2021 / Published: 9 December 2021

Abstract

In recent years, supervised learning-based methods have achieved excellent performance for hyperspectral image (HSI) classification. However, the collection of training samples with labels is not only costly but also time-consuming. This fact usually leads to weak supervision, including inaccurate supervision, where mislabeled samples exist, and incomplete supervision, where unlabeled samples exist. Focusing on inaccurate supervision and incomplete supervision, the weakly supervised classification of HSI is investigated in this paper. For inaccurate supervision, complementary learning (CL) is first introduced for HSI classification. Then, a new method based on selective CL and a convolutional neural network (SeCL-CNN) is proposed for classification with noisy labels. For incomplete supervision, a data augmentation-based method that combines mixup and Pseudo-Label (Mix-PL) is proposed. Then, a classification method that combines Mix-PL and CL (Mix-PL-CL) is designed, aiming at a better semi-supervised classification capability for HSI. The proposed weakly supervised methods are evaluated on three widely used hyperspectral datasets (i.e., the Indian Pines, Houston, and Salinas datasets). The obtained results reveal that the proposed methods provide competitive results compared with state-of-the-art methods. For inaccurate supervision, the proposed SeCL-CNN outperforms the state-of-the-art method (i.e., SSDP-CNN) by 0.92%, 1.84%, and 1.75% in terms of OA on the three datasets when the noise ratio is 30%. For incomplete supervision, the proposed Mix-PL-CL outperforms the state-of-the-art method (i.e., AROC-DP) by 1.03%, 0.70%, and 0.82% in terms of OA on the three datasets with 25 training samples per class.

1. Introduction

Hyperspectral remote sensing obtains the spatial and spectral information from objects with hundreds of narrow spectral bands. The obtained hyperspectral image (HSI) contains abundant spectral and spatial information, therefore, HSI has a wide variety of applications such as agriculture [1], mineralogy [2], surveillance [3], physics [4], astronomy [5], chemical imaging [6], and environmental sciences [7].
In order to fully explore the usage of HSI, many data processing techniques have been proposed, and classification is one of them [8]. The classification of HSI tries to assign a label to each pixel in the scene, and it is the basis of many applications [9]. Most existing HSI classification methods belong to supervised classification, where each training sample has a corresponding label indicating its ground truth. It is a very active research topic, and a great many methods have been proposed [10,11,12].
In the early stage of HSI supervised classification, most classifiers did not classify HSI in a deep manner. Typical HSI feature extraction and classification techniques include support vector machines (SVMs), morphological operations, and sparse representation [13]. For example, the SVM exhibits low sensitivity to high dimensionality; therefore, SVM-based methods have obtained good performance for HSI classification [14]. In order to extract the spatial features of HSI, many morphological operations, including morphological profiles (MPs) [15] and extended multi-attribute profiles (EMAPs) [16], have been proposed for HSI classification. Another important technique is sparse representation, which generates a dictionary from the inputs, and many sparse representation-based methods have been successfully explored for HSI classification [17,18]. In recent years, deep learning-based methods, especially convolutional neural networks (CNNs), have shown their power in many research fields, including HSI classification [19,20,21]. Deep CNNs (DCNNs) hierarchically extract discriminative features of HSI and therefore obtain better classification performance than shallow models [22].
Although DCNN-based methods have achieved great progress in HSI classification, accurate classification is still challenging in real practice. For example, to properly train the large number of parameters in DCNNs, sufficient labeled samples are usually needed. However, the collection of labeled training samples is expensive, daunting, and time-consuming. Therefore, the problem of learning with limited labeled samples should be solved in CNN-based methods [23]. Furthermore, incorrectly labeled samples often occur when labeling HSI training samples, which does great harm to classification performance [24]. However, traditional methods did not pay much attention to noisy labels in classification.
It is desirable to develop a new kind of classification mechanism that depends on less supervision, and weakly supervised classification is a suitable approach. Weakly supervised learning covers a wide range of studies, including incomplete supervision (i.e., only a subset of the training samples is labeled), inexact supervision (i.e., only coarse-grained labels are given), and inaccurate supervision (i.e., the given labels are not always correct, which usually means noisy labels) [25]. For the classification of HSI, there are usually two types of weakly supervised classification: semi-supervised HSI classification and HSI classification with noisy labels.
Most existing weakly supervised methods in HSI classification require discriminative features [26,27,28]. However, handcrafted features limit the classification performance under weak supervision. Therefore, we consider using a deep CNN in the presence of weak supervision. Meanwhile, the complementary learning (CL) strategy is well suited to preventing a CNN from overfitting to weak supervision [29,30]. In CL, each training example is supplied with a complementary label. It is an indirect learning method that trains the CNN with the information that "the input image does not belong to this complementary label." In this manner, the noisy-labeled samples can contribute to training the CNN by providing the "right" information.
Due to the advantages of deep CNN-based methods, the properties of CL, and the necessity of handling weak supervision in real practice, weakly supervised deep learning based on CL is investigated in this study. Two kinds of weakly supervised classification, i.e., inaccurate supervision and incomplete supervision, are addressed. The main contributions of this study are summarized as follows.
(1) Complementary learning is introduced for HSI classification for the first time. Compared to traditional supervised learning, complementary learning has the advantage of using less supervised information, which makes it suitable for weakly supervised classification.
(2) An improved complementary learning strategy, based on selective CL (SeCL), is proposed for HSI classification with noisy labels. SeCL uses CL to filter out noisy-labeled samples and uses selective CL to accelerate the training process.
(3) A method combining Pseudo-Label and mixup (Mix-PL) is proposed for semi-supervised HSI classification. The usage of Mix-PL makes the training process more stable and achieves better classification performance.
(4) SeCL is combined with Mix-PL (Mix-PL-CL) to further improve the performance of HSI semi-supervised classification, owing to SeCL's capacity for filtering noisy-labeled samples.
The rest of this paper is organized as follows. Section 2 presents the related works of this study. Section 3 and Section 4 introduce the proposed inaccurate and incomplete supervision-based HSI classification methods, respectively. Section 5 presents comprehensive experiments including data description, results, and analysis. Finally, Section 6 summarizes the main conclusion of this study.

2. Related Works

2.1. DCNN-Based HSI Classification

In recent years, DCNN-based methods have achieved significant breakthroughs in HSI classification [31]. Compared with traditional methods, DCNNs automatically learn high-level features from HSI in a hierarchical manner and have achieved state-of-the-art performance. CNN-based methods for HSI classification can be roughly divided into two branches: modified CNNs [32,33] and CNNs combined with other machine learning techniques [34,35].
For the modified CNN methods for HSI classification, most works aim to modify the architecture of CNN for HSI classification. For example, the authors in [36] proposed a deep contextual CNN with residual learning and multi-scale convolution to explore the spatial-spectral features of HSI. In [37], CNN was used to extract the pixel-pair features for following HSI classification. In addition, due to the fact that the input of HSI should be a 3D cube, 3D convolution is used for HSI classification [38].
Many works have combined CNN with other machine learning techniques for HSI classification, such as transfer learning [39], ensemble learning [40], and few-shot learning [41]. In addition, to fully extract the spatial features of HSI, morphological profiles were computed on principal components and then followed by a CNN to finish the HSI classification task [42,43]. Very recently, the Transformer has been investigated for HSI classification together with a CNN to extract spectral-spatial features [44]. However, the superior performance of the above approaches depends heavily on sufficient and correctly labeled samples.

2.2. Weakly Supervised Learning-Based Classification

In weakly supervised learning, two types of weak supervision are often discussed, including inaccurate and incomplete supervision.
For inaccurate supervision, there are noisy-labeled training samples whose given labels do not indicate their ground truth. Three major strategies for dealing with label noise have been widely explored: robust model architectures, robust losses, and sample selection. A noise adaptation layer is often used in robust model design to estimate the noise transition matrix [45]. Designing robust losses is also a hot topic for learning with noisy labels; Ref. [46] combined the mean absolute error and cross-entropy losses to design a noise-robust loss, which achieved good classification performance. Besides, sample selection is a promising way to cope with label noise. For example, Co-teaching [47] utilized two DNNs, each of which selected a certain number of small-loss examples as clean samples and fed them to the other DNN for further training, and many works based on co-teaching have been proposed for learning with noisy labels [48,49].
For incomplete supervision, there are not enough labeled training samples to train a good classifier. Semi-supervised learning is a major technique for solving this problem; it attempts to exploit unlabeled training samples to improve performance without human intervention [50]. Specifically, graph-based methods mainly focus on the construction of graphs with different properties [51]. Ref. [52] introduced a new sparse graph construction method that integrates manifold constraints on the unknown sparse codes as a graph regularizer. Apart from graph-based methods, self-training is also a popular strategy. Ref. [53] proposed Pseudo-Label for semi-supervised learning, which used the labels predicted for unlabeled samples in the last epoch to train the model. The authors in [54] utilized the features extracted by a CNN to conduct a label propagation algorithm and obtained pseudo labels for the unlabeled samples.

2.3. Weakly Supervised Learning-Based HSI Classification

There are usually two types of weakly supervised HSI classification: HSI classification with noisy labels and semi-supervised HSI classification.
For HSI classification with noisy labels, most research has mainly focused on the cleaning of mislabeled samples [55,56]. For example, the authors in [57] used a spatial-spectral information extraction method to improve the separability of features, and then a target detection method was utilized to find noisy-labeled samples and correct their labels. Ref. [58] designed a noisy-label detection algorithm based on the density peak algorithm; training samples whose computed local densities were below a threshold were removed from the training set, and after cleaning, an SVM was trained on the less noisy training set. The above works used handcrafted features, which limit the classification performance, and it remains an open question how to construct a deep model robust to noisy labels.
A great many methods have been proposed for HSI semi-supervised classification, including graph-based methods [59,60], Self-Organizing Maps [61], and self-training methods [62,63]. Several studies based on self-training are related to our work. For example, the authors in [64] utilized the simple linear iterative clustering segmentation method to extract spatial information, and multiple classifiers were assembled to find the most confident pseudo-labeled samples. Of particular interest, [65] used clustering results based on deep features together with classification results based on the output of the deep model to determine whether to select confident samples. The semi-supervised methods form a promising research direction in HSI classification under the application-realistic assumption of limited availability of labeled samples.

3. CL-Based HSI Classification with Noisy Labels

CNN-based methods are quite powerful for classifying HSI if the labels are all correct. Unfortunately, labeling training samples without error is not only time-consuming but sometimes impossible. If inaccurate labels are used in the training stage, the classification performance will be severely degraded. In this section, a complementary learning-based method is investigated for HSI classification with noisy labels.

3.1. CL-Based Deep CNN for HSI Classification

In supervised learning, each training sample contains an example (i.e., an image) and its corresponding label. For example, if a classification model receives a 3D hyperspectral cube of a tree and the label "tree", the supervised classifier will be trained to acknowledge that the input cube is a tree.
For complementary learning, every training sample contains an image and a complementary label, i.e., a class that the image does not belong to. For example, the model may receive a 3D hyperspectral cube of a tree and the label "not soil". A complementary label is relatively easy to obtain, and it can be used for weakly supervised learning.
In a $c$-class classification task $f: X \rightarrow Y$, $x \in X$ and $y \in Y = \{1, \ldots, c\}$ are the input image and the corresponding label of a training sample, respectively. The complementary label $\bar{y}$ of the sample can be obtained by:
$$\bar{y} = \text{Random selection from } \{1, \ldots, c\} \setminus \{y\}. \quad (1)$$
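For illustration, a minimal NumPy sketch of this random complementary-label assignment could look as follows; the function name and array shapes are illustrative and not taken from the original implementation.

```python
import numpy as np

def random_complementary_labels(labels, num_classes, rng=None):
    """Draw, for each sample, one label uniformly from the classes
    other than its given label, as in Equation (1)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    # Adding a random offset in {1, ..., c-1} guarantees the result differs from the given label.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return (labels + offsets) % num_classes

# Example: 5 samples, 4 classes
y = np.array([0, 1, 2, 3, 0])
print(random_complementary_labels(y, num_classes=4))
```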
In practice, $y, \bar{y} \in \{0, 1\}^c$ are one-hot vectors of the training sample. For traditional supervised deep learning-based classification, cross entropy is a widely used loss function:
$$\ell(f, x, y) = -\sum_{k=1}^{c} y_k \log p_k, \quad (2)$$
where $p$ is the $c$-dimensional vector output by the CNN and $p_k$ is the $k$-th element of $p$, representing the probability that $x$ belongs to class $k$. The cross-entropy loss forces the output of the model to match the true distribution. It works well if the labels are all correct.
For CL-based learning, the cross-entropy loss is calculated as follows:
$$\ell(f, x, \bar{y}) = -\sum_{k=1}^{c} \bar{y}_k \log (1 - p_k). \quad (3)$$
Equation (3) drives the probability value of the given complementary label (i.e., $\bar{y}$) towards zero, resulting in an increase in the probability values of the other classes.
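A possible PyTorch rendering of the complementary cross-entropy in Equation (3) is sketched below; the clamping constant is an implementation detail added here for numerical stability and is not specified in the paper.

```python
import torch
import torch.nn.functional as F

def complementary_loss(logits, comp_labels):
    """Cross-entropy against complementary labels (Equation (3)):
    push the probability of the complementary class towards zero."""
    probs = F.softmax(logits, dim=1)                     # p, shape (batch, c)
    p_comp = probs.gather(1, comp_labels.view(-1, 1))    # probability of the complementary label
    return -torch.log(torch.clamp(1.0 - p_comp, min=1e-7)).mean()

# Example usage with random logits for a 4-class problem
logits = torch.randn(8, 4)
comp_labels = torch.randint(0, 4, (8,))
print(complementary_loss(logits, comp_labels))
```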
The framework of CL-based deep CNN for HSI classification is shown in Figure 1. In the figure, a neighboring region of the pixel to be classified is obtained as input. Then, a well-designed CNN is used for feature extraction and softmax is used to finish the HSI classification task. In the training procedure, the complementary labels for training samples are firstly obtained by Equation (1), and then CL-based loss, i.e., the loss in Equation (3), is used to train the parameters in CNN based on back-propagation.

3.2. CL-Based HSI Classification with Noisy Labels

Complementary learning can reduce the predicted probability of wrongly labeled training samples, and therefore it can prevent deep learning methods from overfitting to noisy data, which is useful for supervised classification with noisy labels.
Figure 2 demonstrates the proposed CL-based HSI classification method with noisy labels (SeCL-CNN). Owing to its power and good performance, a CNN is used as the basis of the classification system.
In order to reduce the computational complexity of HSI classification, extended morphological profile (EMP) [15] is used as a pre-processing step of CL-CNN-based classification.
In general, there are two stages in the whole method: a detection stage using SeCL and a classification stage using a CNN. In the detection stage, the proposed SeCL first uses the CL strategy to train a CNN by minimizing Equation (3). Then, the CNN is trained using the selective CL strategy, which only selects the samples whose $p_y$ (the predicted probability of the given label) is larger than $1/c$, for faster and better convergence. In the classification stage, the training samples whose $p_y$ is larger than 0.5 are selected, treated as clean samples, and used to train a classifier (i.e., a CNN) using Equation (2).
In a nutshell, the overall flowchart of the proposed SeCL-CNN is shown in Algorithm 1. Steps 4 and 5 correspond to complementary learning and selective complementary learning, respectively. Step 6 then selects clean-labeled samples from the training set, and Steps 7-9 use the selected samples to train the CNN model for final classification.
Algorithm 1 SeCL-CNN for HSI classification with noisy labels
1. Begin
2.  Input: noisy training samples $(x, y) \in (X, Y)$, where $x$ is a 3D cube from the EMPs of the HSI and $y$ is the corresponding label
3.  Initialize network $f$
4.  For $t = 1$ to $T_1$ do:
     Batch $(X_B, Y_B)$ = sample $(x, y)$ from $(X, Y)$
     For each $x \in X_B$ do:
       Get complementary label $\bar{y}$ using Equation (1)
       Calculate $\ell(f, x, \bar{y})$ by Equation (3)
     Update $f$ by minimizing $\sum_{x \in X_B} \ell(f, x, \bar{y})$
5.  For $t = 1$ to $T_2$ do:
     Batch $(X_B, Y_B)$ = sample $(x, y)$ from $(X, Y)$, if $p_y > 1/c$
     For each $x \in X_B$ do:
       Get complementary label $\bar{y}$ using Equation (1)
       Calculate $\ell(f, x, \bar{y})$ by Equation (3)
     Update $f$ by minimizing $\sum_{x \in X_B} \ell(f, x, \bar{y})$
6.  $(X_{clean}, Y_{clean})$ = sample $(x, y)$ from $(X, Y)$, if $p_y > 0.5$
7.  Initialize network $f$
8.  For $t = 1$ to $T_3$ do:
     Batch $(X_B, Y_B)$ = sample $(x, y)$ from $(X_{clean}, Y_{clean})$
     For each $x \in X_B$ do:
       Calculate $\ell(f, x, y)$ by Equation (2)
     Update $f$ by minimizing $\sum_{x \in X_B} \ell(f, x, y)$
9.  Output: network $f$ and filtered dataset $(X_{clean}, Y_{clean})$
10. End
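A condensed PyTorch-style sketch of Algorithm 1 follows. It is not the authors' implementation: full-batch updates are used instead of mini-batches, the model and optimizer factories are placeholders, and the default epoch counts simply mirror the values reported in Section 5.2; only the $1/c$ and $0.5$ thresholds follow the description above.

```python
import torch
import torch.nn.functional as F

def comp_labels(y, c):
    """Random complementary labels, as in Equation (1)."""
    return (y + torch.randint(1, c, y.shape, device=y.device)) % c

def comp_loss(logits, y_bar):
    """Complementary cross-entropy, Equation (3)."""
    p = F.softmax(logits, dim=1).gather(1, y_bar.view(-1, 1))
    return -torch.log((1.0 - p).clamp_min(1e-7)).mean()

def p_y(model, x, y):
    """Model probability assigned to each sample's given (possibly noisy) label."""
    with torch.no_grad():
        return F.softmax(model(x), dim=1).gather(1, y.view(-1, 1)).squeeze(1)

def secl_cnn(make_model, make_opt, x, y, c, T1=800, T2=1000, T3=200):
    model = make_model(); opt = make_opt(model)
    for _ in range(T1):                                  # Step 4: CL on all samples
        loss = comp_loss(model(x), comp_labels(y, c))
        opt.zero_grad(); loss.backward(); opt.step()
    for _ in range(T2):                                  # Step 5: selective CL, keep p_y > 1/c
        keep = p_y(model, x, y) > 1.0 / c
        loss = comp_loss(model(x[keep]), comp_labels(y[keep], c))
        opt.zero_grad(); loss.backward(); opt.step()
    clean = p_y(model, x, y) > 0.5                       # Step 6: treat p_y > 0.5 as clean
    model = make_model(); opt = make_opt(model)          # Step 7: re-initialize the network
    for _ in range(T3):                                  # Steps 8-9: ordinary CE training
        loss = F.cross_entropy(model(x[clean]), y[clean])
        opt.zero_grad(); loss.backward(); opt.step()
    return model, (x[clean], y[clean])
```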

4. CL-Based Semi-Supervised HSI Classification

The collection of labeled training samples is not only costly but also time-consuming, while tremendous numbers of unlabeled samples are available. How to effectively utilize both the labeled and unlabeled samples is an urgent task in HSI classification. In this section, a semi-supervised HSI classification method, which combines complementary learning, Pseudo-Label, and mixup, is proposed for this task.
Incomplete supervised HSI classification concerns the situation with a small amount of labeled data, which is insufficient to train a classifier well, and a large amount of unlabeled data. For incomplete supervision, the task is to learn $f: X \rightarrow Y$ from the labeled and unlabeled training sets. The labeled training dataset $D_l$ and the unlabeled dataset $D_u$ can be denoted as:
$$D_l = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l), \ldots, (x_m, y_m)\}, \quad (4)$$
$$D_u = \{x_{m+1}, x_{m+2}, \ldots, x_u, \ldots, x_{m+n}\}. \quad (5)$$
There are $m$ samples with cubes $x_l$ and corresponding labels $y_l$ $(l = 1, \ldots, m)$ in the labeled training dataset, and $D_u$ has $n$ unlabeled training samples $x_u$ $(u = m+1, \ldots, m+n)$.

4.1. Pseudo-Label for HSI Semi-Supervised Classification

Pseudo-Label (PL) is a simple but efficient method which can exploit both labeled and unlabeled samples. It simply picks the class with the maximum predicted probability for each unlabeled sample and uses it as if it were the true label.
In PL, a CNN $g$ is trained in a supervised fashion with labeled and unlabeled data simultaneously. For an unlabeled sample $x_u$ in the current training epoch, its pseudo label $\hat{y}_u$ has been obtained in the last epoch by
$$\hat{y}_u = g(x_u), \quad (6)$$
and then $\hat{y}_u$ is used to calculate the cross-entropy loss for the unlabeled samples.
The overall loss function is
$$\mathcal{L}_{total} = \mathcal{L}_s + \rho(t)\,\mathcal{L}_u, \quad (7)$$
$$\mathcal{L}_s = \frac{1}{B_1} \sum_{l \in B_1} \ell(g, x_l, y_l), \quad (8)$$
$$\mathcal{L}_u = \frac{1}{B_2} \sum_{u \in B_2} \ell(g, x_u, \hat{y}_u), \quad (9)$$
where $\mathcal{L}_s$ and $\mathcal{L}_u$ are the supervised loss generated by labeled samples and the unsupervised loss generated by unlabeled samples, respectively, $\rho(t)$ is a balancing coefficient, varying with the epoch $t$, that weights the importance of the unsupervised loss, $B_1$ and $B_2$ are the batch sizes for each kind of loss, and $\ell(\cdot)$ is the cross-entropy loss defined by Equation (2).
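Assuming a standard PyTorch training step, Equations (7)-(9) could be combined as in the following sketch; the pseudo-labels are assumed to come from the previous epoch's model, and the weight rho_t follows a ramp-up schedule such as Equation (19) later in the paper.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, x_l, y_l, x_u, y_u_hat, rho_t):
    """One Pseudo-Label update: supervised CE on labeled samples plus
    a rho(t)-weighted CE on unlabeled samples with their pseudo-labels."""
    loss_sup = F.cross_entropy(model(x_l), y_l)          # Equation (8)
    loss_unsup = F.cross_entropy(model(x_u), y_u_hat)    # Equation (9)
    loss = loss_sup + rho_t * loss_unsup                 # Equation (7)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```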

4.2. Combining Mixup and Pseudo-Label for HSI Semi-Supervised Classification

As introduced in Section 4.1, PL trains a CNN by using pseudo labels as if they were true labels. In order to alleviate the negative impact caused by incorrect pseudo labels and regularize the model for better convergence, PL combined with mixup [66], abbreviated as Mix-PL, is proposed for HSI semi-supervised classification.
Given a mixup operation:
$$\tilde{x} = \lambda x + (1 - \lambda) x', \qquad \tilde{y} = \lambda y + (1 - \lambda) y', \quad (10)$$
where $(x, y)$ and $(x', y')$ are randomly selected from the training set, with the labels represented as one-hot vectors. The decision boundary is pushed by enforcing the prediction model to behave linearly between training examples. The parameter $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$, with $\alpha \in (0, \infty)$, where $\mathrm{Beta}(\alpha, \alpha)$ denotes the Beta distribution and the hyperparameter $\alpha$ controls the strength of the interpolation in mixup.
From Equation (10), it can be seen that labels are needed for mixup. Here we extend mixup to the semi-supervised learning setting by using pseudo labels for the unlabeled samples:
$$\tilde{x}_u = \lambda x_{u1} + (1 - \lambda) x_{u2}, \qquad \tilde{y}_u = \lambda \hat{y}_{u1} + (1 - \lambda) \hat{y}_{u2}, \quad (11)$$
where $x_{u1}$, $x_{u2}$ are sampled from the unlabeled dataset, and $\hat{y}_{u1}$, $\hat{y}_{u2}$ are the corresponding one-hot pseudo labels generated by Equation (6).
The unsupervised loss can then be calculated by:
$$\mathcal{L}_{mu} = \frac{1}{B_2} \sum_{u \in B_2} \ell(g, \tilde{x}_u, \tilde{y}_u), \quad (12)$$
where $\ell(\cdot)$ is the cross-entropy loss and $\tilde{x}_u$, $\tilde{y}_u$ are generated by Equation (11). $\mathcal{L}_{total}$ is revised as:
$$\mathcal{L}_{total} = \mathcal{L}_s + \rho(t)\,\mathcal{L}_{mu}. \quad (13)$$
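A hedged sketch of the mixup step on pseudo-labeled samples (Equations (10)-(12)) is given below; alpha = 1 matches the setting reported in Section 5.2, and the mixed soft targets are handled with a manually computed soft cross-entropy rather than hard class indices.

```python
import torch
import torch.nn.functional as F

def mix_pl_unsup_loss(model, x_u, pseudo_labels, num_classes, alpha=1.0):
    """Mixup on unlabeled samples and their pseudo-labels, Equations (11)-(12)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x_u.size(0))
    y_soft = F.one_hot(pseudo_labels, num_classes).float()
    x_mix = lam * x_u + (1.0 - lam) * x_u[perm]          # mixed inputs, Equation (11)
    y_mix = lam * y_soft + (1.0 - lam) * y_soft[perm]    # mixed soft targets
    log_p = F.log_softmax(model(x_mix), dim=1)
    return -(y_mix * log_p).sum(dim=1).mean()            # soft cross-entropy, Equation (12)
```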

4.3. Combining CL and Mix-PL for HSI Semi-Supervised Classification

Considering its excellent performance in the presence of label noise, we further combine Mix-PL with SeCL to filter out some incorrect labels and propose the Mix-PL-CL method based on self-training.
Figure 3 illustrates the proposed Mix-PL-CL for semi-supervised HSI classification. Specifically, we first train a CNN, denoted by $g$, using Mix-PL. Then the classifier is used to make predictions on the abundant unlabeled samples. This process can be described by:
$$\hat{y}_u = g(x_u), \quad u = m+1, \ldots, m+n, \quad (14)$$
$$D_{noisy} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{m+n}, \hat{y}_{m+n})\}. \quad (15)$$
The predicted pseudo-labels are not absolutely correct, so $D_{noisy}$ is used here to denote the union of labeled and pseudo-labeled samples. We would like to select the pseudo-labeled samples that are most likely to be correct, treat them as truly labeled, and add them to the labeled training set. This can be accomplished by SeCL-CNN, which was introduced in the previous section:
$$D_{clean} = \mathrm{select}(D_{noisy}), \quad (16)$$
where $\mathrm{select}(\cdot)$ means using SeCL to choose less noisy samples.
Iterating this procedure gradually improves the quality of the pseudo-labels and finally yields better classification performance.
Algorithm 2 shows the overall process of the proposed semi-supervised classification method.
Algorithm 2 Mix-PL-CL for HSI semi-supervised classification
1. Begin
2.  Input: labeled training set $D_l$, unlabeled training set $D_u$
3.  Initialize network $g$
4.  For $i = 1$ to $T_4$ do:
5.   For $t = 1$ to $T_5$ do:
      For each $x_u \in D_u$ do:
        $\hat{y}_u = g(x_u)$
      $\hat{D}_u = \{(x_u, \hat{y}_u)\}_{u=m+1}^{m+n}$
      Sample $\{(x_l, y_l)\}_{l=1}^{B_1}$ from $D_l$
      Calculate supervised loss $\mathcal{L}_s$ by Equation (8)
      Sample $\{(x_{u1}, \hat{y}_{u1})\}_{u1=1}^{B_2}$ from $\hat{D}_u$
      $\{(x_{u2}, \hat{y}_{u2})\}_{u2=1}^{B_2}$ = permutation($\{(x_{u1}, \hat{y}_{u1})\}_{u1=1}^{B_2}$)
      Get $\{(\tilde{x}_u, \tilde{y}_u)\}_{u=1}^{B_2}$ by Equation (11)
      Calculate unsupervised loss $\mathcal{L}_{mu}$ by Equation (12)
      Update $g$ by minimizing Equation (13)
6.   For each $x_u \in D_u$ do:
      $\hat{y}_u = g(x_u)$
7.    $D_{noisy} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{m+n}, \hat{y}_{m+n})\}$
8.    $D_l = D_{clean} = \mathrm{select}(D_{noisy})$, $D_u = D_{noisy} \setminus D_l$
9.  Output: network $g$
10. End
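A compact sketch of Algorithm 2's outer self-training loop is given below, reusing mix_pl_unsup_loss and secl_cnn from the earlier sketches. It is only an illustration under simplifying assumptions: full-batch updates, a caller-supplied rho_fn for the ramp-up weight, and the bookkeeping that removes newly "clean" samples from the unlabeled pool (Step 8) is omitted.

```python
import torch
import torch.nn.functional as F

def mix_pl_cl(make_model, make_opt, x_l, y_l, x_u, num_classes, rho_fn, T4=2, T5=450):
    """Outer self-training loop of Mix-PL-CL (Algorithm 2), sketched with
    helpers defined earlier in this paper's examples."""
    model = make_model()
    for _ in range(T4):
        model = make_model(); opt = make_opt(model)
        for t in range(T5):                              # Step 5: Mix-PL training
            with torch.no_grad():                        # pseudo-labels from the current model
                y_u_hat = model(x_u).argmax(dim=1)
            loss = F.cross_entropy(model(x_l), y_l) \
                 + rho_fn(t) * mix_pl_unsup_loss(model, x_u, y_u_hat, num_classes)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                            # Step 6: relabel with the trained model
            y_u_hat = model(x_u).argmax(dim=1)
        # Steps 7-8: pool labeled and pseudo-labeled samples, keep the part SeCL judges clean,
        # and use it as the new labeled set (shrinking the unlabeled pool is omitted here).
        x_all, y_all = torch.cat([x_l, x_u]), torch.cat([y_l, y_u_hat])
        _, (x_l, y_l) = secl_cnn(make_model, make_opt, x_all, y_all, num_classes)
    return model
```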

5. Results

5.1. Datasets Description

To evaluate the performance of the proposed methods, three widely used hyperspectral datasets, i.e., Indian Pines, Houston, and Salinas, were employed in the experiments. They are described as follows.
(1) Indian Pines: This dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in June 1992, covering agricultural fields in Indiana, USA. The scene consists of 145 × 145 pixels with a spatial resolution of 20 m × 20 m and has 220 bands covering the range from 400 nm to 2500 nm. In this paper, 20 low signal-to-noise ratio (SNR) bands were removed and a total of 200 bands were retained for classification. Figure 4 illustrates the false-color composite image and the corresponding ground-truth map of the Indian Pines dataset. The numbers of samples for each class are listed in Table 1.
(2) Houston: The Houston dataset was acquired over the University of Houston campus and its neighboring area by an ITRES-CASI 1500 sensor. It was used in the 2013 GRSS Data Fusion Contest. The dataset contains 144 spectral bands ranging from 380 nm to 1050 nm and 349 × 1905 pixels with a spatial resolution of 2.5 m. It is an urban dataset in which most of the land covers are man-made objects, and it contains fifteen classes. Figure 5 illustrates the false-color composite image and the corresponding ground-truth map. The numbers of samples for each class are listed in Table 2.
(3) Salinas: This dataset was acquired by the 224-band AVIRIS sensor, capturing an area over Salinas Valley, CA, USA. The dataset consists of 204 spectral channels after the removal of 20 water absorption bands (108-112, 154-167, and 224), ranging from 400 to 2500 nm. It contains 512 × 217 pixels with a spatial resolution of 3.7 m. There are 54,129 labeled pixels from 16 classes sampled from the ground-truth map. Figure 6 illustrates the false-color composite image and the corresponding ground-truth map. The numbers of samples for each class are listed in Table 3.

5.2. Experimental Setup

For the three datasets, the samples were divided into two subsets which contained the training and testing samples, respectively.
(1) Experimental Setup for Classification with Noisy Labels: In the training process with noisy labels, 30 samples were chosen randomly for each class, and only 15 labeled samples were chosen if the corresponding class had fewer than 30 samples.
For each training sample $x_i$, the potential noisy label $\tilde{y}_i$ could be generated as follows:
$$p(\tilde{y}_i = k \mid y_i = j, x_i) = p(\tilde{y}_i = k \mid y_i = j) = \eta_{jk}, \quad (17)$$
$$\eta_{jk} = \begin{cases} 1 - \eta, & j = k \\ \dfrac{\eta}{C - 1}, & j \neq k \end{cases}, \quad (18)$$
where $y_i$ represented the correct label, whose value was $j$, and $\eta_{jk}$ was the probability of it becoming the noisy label $k$. From Equation (17), one could see that the noise added to the labels was independent of the individual samples, and Equation (18) showed that the probability of a label transition from one class to another was constant. This type of label noise is called symmetric noise. Following most related works, we used symmetric label noise in the experiments.
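A small NumPy sketch of this symmetric label-noise injection (Equations (17) and (18)) might look as follows; the function name is ours and not part of the original code.

```python
import numpy as np

def add_symmetric_noise(labels, num_classes, eta, rng=None):
    """Flip each label with probability eta to a uniformly chosen wrong class,
    which realizes the symmetric noise model of Equations (17)-(18)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape) < eta
    # For flipped samples, draw a label guaranteed to differ from the original one.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    labels[flip] = (labels[flip] + offsets[flip]) % num_classes
    return labels
```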
In the experiments, the general noise ratio $\eta$ was set to 0.1, 0.2, and 0.3 to explore the performance of the different learning algorithms.
As a noisy-label detection method, the proposed SeCL-CNN was compared with density peak-based methods, including DP, KSDP [58], and SSDP [56]. A method with a noise-robust loss function, denoted as CNN-Lq, was also used for comparison [46]. Besides, traditional classification methods were also evaluated, such as SVM, EMP-SVM, CNN, and MCNN-CP [67]. Among these methods, SVM, EMP-SVM, CNN, MCNN-CP, and CNN-Lq were end-to-end classification methods, while DP, KSDP, SSDP, and SeCL-CNN first filtered out noisy samples and then used the remaining samples to train CNNs for classification.
In SVM-based methods, we adopted grid search together with five-fold cross validation to find the proper $C$ and $\gamma$ ($C \in \{10^{-4}, 10^{-3}, \ldots, 10^{3}\}$, $\gamma \in \{10^{-4}, 10^{-3}, \ldots, 10^{3}\}$). When using EMP, the first four principal components (PCs) were used. For each PC, three openings and closings by reconstruction were conducted with a circular structuring element whose initial size was four and whose step-size increment was two.
The architecture of the CNN used in the experiments was shown in Table 4. It contained three convolutional layers with rectified linear units (ReLU), three batch normalization layers, and two pooling layers. In order to use spatial information, the 27 × 27 image region centered at each pixel to be classified was fed to the 2D CNN.
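Since the exact layer widths are given in Table 4 (not reproduced here), the following PyTorch sketch only mirrors the described structure: three convolution + batch-normalization + ReLU stages, two pooling layers, and 27 × 27 spatial input. The channel counts and the pooling/classifier head are placeholders of our own.

```python
import torch
import torch.nn as nn

class HSICNN(nn.Module):
    """2D CNN over 27x27 EMP patches; channel counts are illustrative only."""
    def __init__(self, in_channels, num_classes, widths=(32, 64, 128)):
        super().__init__()
        c1, c2, c3 = widths
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, c1, 3, padding=1), nn.BatchNorm2d(c1), nn.ReLU(),
            nn.MaxPool2d(2),                                 # 27x27 -> 13x13
            nn.Conv2d(c1, c2, 3, padding=1), nn.BatchNorm2d(c2), nn.ReLU(),
            nn.MaxPool2d(2),                                 # 13x13 -> 6x6
            nn.Conv2d(c2, c3, 3, padding=1), nn.BatchNorm2d(c3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(c3, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Example: a batch of 27x27 patches with 16 EMP channels and 16 classes
out = HSICNN(in_channels=16, num_classes=16)(torch.randn(4, 16, 27, 27))
print(out.shape)  # torch.Size([4, 16])
```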
For CNN and CNN-Lq, the initial learning rate was set to 0.01, and it was divided by 10 every 50 epochs. The number of epochs for training was set to 150.
The initial learning rate for SeCL-CNN was set to 0.01, and it was divided by 10 at the 400th and 800th epochs. The complementary learning was conducted in the first 800 epochs, followed by selective complementary learning in the next 1000 epochs. The last 200 epochs were used for conducting traditional learning. The batch-size of the deep learning-based methods was set to 128.
In the experiments, the classification performance was mainly evaluated using overall accuracy (OA), average accuracy (AA), and Kappa coefficient (K). Besides, the area under ROC curve (AUC) was also adopted to evaluate the detection ability of different methods. Experiments were repeated ten times.
(2) Experimental Setup for Semi-Supervised Classification: In semi-supervised classification, 8000 samples were chosen randomly as the unlabeled samples, and they also served as the testing samples. For each class, 20, 25, and 30 training samples (denoted by N) were selected as the labeled training set to explore the classification performance of the different methods, but only 15 labeled examples were chosen if the corresponding class had fewer than 30 samples.
The proposed Mix-PL-CL method was compared with popular semi-supervised classification methods, such as label propagation (LP), the Laplacian support vector machine (LapSVM), EMP-LapSVM, Pseudo-Label (PL), AROC-DP [65], and the proposed Mix-PL. Besides, supervised methods such as EMP-CNN and MCNN-CP were also considered.
In LapSVM-based methods, we adopted grid search with five-fold cross validation to find the proper $\gamma_A$ and $\gamma_I$ ($\gamma_A \in \{10^{-5}, 10^{-4}, \ldots, 10^{1}\}$, $\gamma_I \in \{10^{-5}, 10^{-4}, \ldots, 10^{1}\}$). Besides, a one-against-one multiclass strategy, which involved a parallel architecture consisting of $c(c-1)/2$ different SVMs, was adopted, where $c$ is the number of classes. In the graph-based method, i.e., Label Propagation, we used an RBF kernel to construct the graph, and the clamping factor $\alpha$ was set to 0.2, which meant that 80 percent of the original label distribution was always retained while the confidence of the distribution could change within 20 percent. The parameter of the kernel was chosen from $\{10^{-3}, \ldots, 10^{3}\}$. LP iterated on a modified version of the original graph and normalized the edge weights by computing the normalized graph Laplacian matrix; besides, it minimized a loss function with regularization properties to make the classification performance robust against noise.
When training $g$, the initial learning rate was set to 0.001, and it was divided by ten after 60 epochs. The number of epochs, denoted by $T_5$, was set to 450. The hyperparameter $\alpha$ used in mixup was fixed to one, and the balancing coefficient $\rho(t)$ was obtained by Equation (19). In the experiments, $t_1$ and $t_2$ were set to 120 and 300, respectively, and $\rho_{end}$ was set to two. Figure 7 illustrates the $\rho(t)$ schedule used in the experiments. The influence of $\rho_{end}$ and $\alpha$ would be analyzed later. The number of iterations, denoted by $T_4$, was set to two, which meant that Mix-PL was used twice and CL-CNN was used once in the iteration. $T_4$ has a great impact on classification performance, and it would be analyzed in the experiments.
$$\rho(t) = \begin{cases} 0, & t < t_1 \\ \dfrac{t - t_1}{t_2 - t_1}\,\rho_{end}, & t_1 \le t \le t_2 \\ \rho_{end}, & t > t_2 \end{cases} \quad (19)$$
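Written as code, the schedule is a direct transcription of Equation (19); the defaults below are the values used in the experiments, and this function could also serve as the rho_fn in the Algorithm 2 sketch above.

```python
def rho_schedule(t, t1=120, t2=300, rho_end=2.0):
    """Piecewise-linear ramp-up of the unsupervised loss weight, Equation (19)."""
    if t < t1:
        return 0.0
    if t <= t2:
        return (t - t1) / (t2 - t1) * rho_end
    return rho_end
```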

5.3. Results of HSI Classification with Noisy Labels

(1) Training Process of CNN Using CE Loss in the Presence of Label Noise: Generally, CNNs were capable of memorizing completely random labels and exhibited poor generalization capability in the presence of noisy labels.
Figure 8, Figure 9 and Figure 10 showed the distribution of the training data in different learning stages with 30% label noise, according to the probability $p_y$. From Figure 8a, Figure 9a and Figure 10a, one could see that a large number of clean samples together with a few noisy samples lay on the right of the graphs. This meant that they were learned first by the deep model in the early training stage. As training went on, noisy samples moved toward the right, indicating that the model was beginning to overfit the noisy samples, as Figure 8b, Figure 9b and Figure 10b showed. When the training was completed, most of the training samples had large values of $p_y$, meaning that the model had memorized most of the noisy training set, as Figure 8c, Figure 9c and Figure 10c showed.
(2) Training Process of CNN Using CL in the Presence of Label Noise: Figure 11, Figure 12 and Figure 13 showed the distribution of the training data using different learning methods with 30% label noise, according to the probability $p_y$. Figure 11a, Figure 12a, Figure 13a and Figure 11b, Figure 12b, Figure 13b respectively showed the histograms of the training data after traditional learning (using CE) and after CL. In contrast to traditional learning, where the probability $p_y$ of both clean and noisy samples was large, the probability of noisy samples under CL was much lower than that of clean samples, indicating CL's capability to prevent the CNN from overfitting the noisy samples. After CL, noisy samples and clean samples could be separated, which could be seen clearly in Figure 11b, Figure 12b and Figure 13b.
However, there was still an overlap between the distributions of clean and noisy samples, which could be seen in Figure 11b, Figure 12b and Figure 13b, and most of the noisy samples had an output $p_y$ less than $1/c$, which was consistent with expectation. Figure 11c, Figure 12c and Figure 13c showed that there was a smaller overlap after training the CNN only with the data having $p_y$ over $1/c$. With this threshold, the samples involved in training tended to be less noisy than before, which improved the convergence of the CNN.
Figure 11c, Figure 12c and Figure 13c also showed that noisy samples could be detected by simply judging whether the values of $p_y$ were smaller than 0.5, which meant that samples having $p_y$ less than 0.5 were likely to be noisy samples. After training the CNN only with samples having probability $p_y$ larger than 0.5, almost all clean samples exhibited high $p_y$, which could be seen from Figure 11d, Figure 12d and Figure 13d.
(3) Detection Performance Compared with Other Methods: Table 5 showed the AUC of different noisy-label detection methods. From Table 5, one could see that the proposed CL performed best on the three datasets, compared with DP, KSDP, and SSDP, and it worked well on noisy datasets with different noise ratios. The results showed that the proposed method had better detection ability.
(4) Classification Performance Compared with Other Methods: Table 6, Table 7 and Table 8 showed the classification results of different methods on three datasets. And the detailed classification results with 30% label noise could be seen in Table A1, Table A2 and Table A3, Appendix A.
From these results, one could see that, though CNN-based models performed well in traditional HSI classification tasks, they exhibited poor generalization capability when noisy labels existed. For example, CNN and MCNN-CP achieved excellent classification results with non-noisy or slightly noisy labels, compared with EMP-SVM. When the noise ratio was 10%, MCNN-CP maintained the highest classification accuracy, but the OA, AA, and K of the CNN-based models decreased drastically as the noise ratio increased, and they could not perform as well as EMP-SVM at higher noise ratios. The same behavior was observed on the other two datasets. One could also see that the accuracies of EMP-SVM did not decrease as drastically as those of CNN when the noise ratio increased, but EMP-SVM's classification performance was limited by its handcrafted features.
The proposed SeCL-CNN outperformed the other methods in terms of OA, AA, and K. For example, in Table 6, one could see that the OA of SeCL-CNN was 73.90% when the noise ratio was 30%. This accuracy was higher than the ones obtained by the other methods. SeCL-CNN outperformed SSDP-CNN by 1.02%, 0.82%, and 0.0065 in terms of OA, AA, and K, respectively. When the noise ratio was 10% and 20%, SeCL-CNN gained better classification results than the compared methods in terms of OA, AA, and K, except that the accuracies were slightly lower than the ones obtained by MCNN-CP. However, our proposed method mainly focuses on the cleaning of noisy-labeled samples and can be combined with any classifier, including MCNN-CP, to complete the final classification.
(5) Ablation Studies: Table 9 showed the results obtained by the ablation studies when the noise ratio was 30%. From Table 9, one could see that without EMP, the OA decreased by 1.67%, 1.38%, and 1.29% on the three datasets, respectively, and the AUC decreased by 0.0146, 0.0058, and 0.0028. This showed that the use of EMP enhanced the capacity of noisy-label detection and finally improved the classification performance. A similar observation held for selective CL: without it, both the AUC and the OA decreased, which demonstrated the importance of selective CL.

5.4. Results of HSI Semi-Supervised Classification

(1) Classification Performance Compared with Other Methods: Table 10, Table 11 and Table 12 showed the classification results of different supervised and semi-supervised classification methods on the three datasets. And the detailed semi-supervised classification results with 25 labeled training samples per class are reported in Table A4, Table A5 and Table A6, Appendix A.
From Table 10, one could see that the proposed Mix-PL-CL achieved the best performance compared with the other methods. Mix-PL-CL outperformed AROC-DP by 1.03%, 0.19%, and 0.00115 in terms of OA, AA, and K when the number of samples per class was 25, and the per-class accuracies obtained by Mix-PL-CL were also good compared with the other methods, including supervised methods such as MCNN-CP. Besides, the accuracies gained by the different classification methods increased as the number of labeled training samples per class grew, and the proposed methods, i.e., Mix-PL and Mix-PL-CL, still achieved higher classification accuracies, which showed their superior classification ability.
Table 11 showed the classification results of the different methods on the Houston dataset. The usage of mixup helped PL improve classification accuracy, and Mix-PL-CL achieved better classification results than Mix-PL, which showed the importance of CL. On the Houston dataset, Mix-PL-CL outperformed AROC-DP by 0.70%, 0.55%, and 0.0086 in terms of OA, AA, and K when the number of samples per class was 25, and the highest classification accuracies were obtained by the proposed method for the different numbers of labeled training samples.
From Table 12, one could see that the proposed Mix-PL-CL still achieved superior performance on Salinas dataset with different numbers of labeled training samples.
(2) Ablation Studies: Table 13 showed the results obtained by the ablation studies when the number of training samples per class was 25. From Table 13, one could see that every module contributed to the final classification results. (1) EMP was used in Mix-PL-CL to reduce the computational complexity of HSI classification, which made the model less prone to overfitting; without EMP, the OA on the three datasets decreased. (2) Without PL, the model (CL-CNN) only used an ordinary CNN to generate pseudo labels for the unlabeled samples, which were less accurate than the ones generated by Mix-PL-CL; the OA on the three datasets decreased by 0.97%, 1.37%, and 1.86% compared with Mix-PL-CL. (3) CL was used in Mix-PL-CL to filter the noisy pseudo-labels generated by Mix-PL; one could see that the OA of Mix-PL was lower than that of Mix-PL-CL on the three datasets, which showed the importance of CL. (4) Without mixup, the model (CL-PL) only used the Pseudo-Label method to generate pseudo labels for the unlabeled samples; the results showed that the use of mixup led to gains in OA on all three datasets.
Figure 14 showed the influence of $\rho_{end}$, $T_4$, and $\alpha$, respectively. From Figure 14a, one could see that it was better to set the value of $\rho_{end}$ to two. A higher $\rho_{end}$ would make the model quickly overfit the noisy pseudo-labeled samples and degrade the accuracy; on the contrary, a lower $\rho_{end}$ would make the model learn less from the unlabeled samples and obtain lower classification results. Figure 14b showed that $\alpha$ was best set to one. Figure 14c showed that $T_4$ was best set to two. When $T_4$ was one, the model was actually Mix-PL; as $T_4$ increased further, the model would gradually overfit the pseudo-labeled samples and the accuracy would degrade.

5.5. Classification Maps of Different Classification Methods

Figure 15, Figure 16 and Figure 17 showed the classification maps of different methods, including classification methods in the presence of noisy labels and semi-supervised methods, on the three datasets.
From these maps, one could clearly see the differences. For example, pixels near some noisy samples were misclassified by the methods that ignored label noise, while they received correct labels under the anti-label-noise methods such as SeCL-CNN and KSDP-CNN, and SeCL-CNN performed better than the compared methods. For semi-supervised classification, the proposed Mix-PL-CL achieved better classification results.

6. Conclusions

In this study, the strategy of complementary learning was explored for hyperspectral weakly supervised classification. For inaccurate supervision, a complementary learning method was first introduced for HSI classification. Then SeCL, which uses selective CL, was proposed for classification in the presence of noisy labels. For incomplete supervision, Mix-PL, which combines mixup and the Pseudo-Label method, was proposed. Then Mix-PL-CL was designed, aiming at a better semi-supervised HSI classification capability.
Experimental conclusions can be drawn from the three widely used datasets (i.e., the Indian Pines, Houston, and Salinas datasets): (1) The CL strategy can prevent DCNNs from overfitting to noisy labels and can be used to detect noisy-labeled training samples; the proposed SeCL can further improve the ability to deal with label noise. (2) According to the experimental results, the proposed Mix-PL can achieve good semi-supervised classification results, and the use of CL (Mix-PL-CL) further improves the classification performance. (3) The classification results on the three datasets demonstrate that the proposed methods for inaccurate and incomplete supervised classification outperformed the other studied state-of-the-art methods as well as the conventional techniques. This research provides guidance for further studies to explore complementary learning and weakly supervised learning in the field of HSI classification.

Author Contributions

Conceptualization: Y.C.; methodology: L.H. and Y.C.; writing—original draft preparation: L.H., Y.C. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China under the Grant 61971164 and the Grant U20B2041.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Houston dataset is available at: https://hyperspectral.ee.uh.edu/. The Indian Pines and Salinas datasets are available at: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 1 September 2020).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Detailed Classification Results

Table A1. Detailed classification results (mean ± standard deviation) with 30% label noise on the Indian Pines dataset.
Metric/Class | RBF-SVM | EMP-SVM | CNN | MCNN-CP | CNN-Lq | DP-CNN | KSDP-CNN | SSDP-CNN | SeCL-CNN
OA (%) | 55.38 ± 4.53 | 67.11 ± 3.54 | 57.34 ± 2.87 | 68.16 ± 3.27 | 66.36 ± 5.14 | 70.43 ± 2.82 | 72.22 ± 2.64 | 72.88 ± 2.47 | 73.90 ± 2.94
AA (%) | 66.90 ± 2.77 | 76.60 ± 2.19 | 63.48 ± 1.97 | 72.21 ± 2.06 | 75.32 ± 3.01 | 78.27 ± 1.68 | 81.00 ± 1.73 | 82.62 ± 1.24 | 83.44 ± 2.07
K × 100 | 49.94 ± 4.53 | 62.92 ± 3.70 | 52.56 ± 3.00 | 64.30 ± 3.48 | 62.30 ± 5.51 | 66.72 ± 3.01 | 68.66 ± 2.84 | 69.86 ± 2.67 | 70.51 ± 3.21
Class 1 | 83.75 ± 11.92 | 93.13 ± 9.04 | 74.38 ± 10.25 | 73.21 ± 11.23 | 93.75 ± 9.27 | 88.13 ± 9.46 | 97.50 ± 5.00 | 95.62 ± 5.63 | 99.38 ± 1.88
Class 2 | 32.80 ± 8.13 | 49.13 ± 7.34 | 49.24 ± 5.36 | 61.56 ± 4.78 | 54.43 ± 7.74 | 58.35 ± 5.87 | 56.36 ± 7.61 | 58.36 ± 5.73 | 58.50 ± 6.82
Class 3 | 43.24 ± 10.50 | 64.74 ± 9.97 | 51.08 ± 5.48 | 57.28 ± 10.34 | 55.53 ± 9.00 | 58.86 ± 8.87 | 61.74 ± 10.58 | 63.62 ± 8.71 | 64.35 ± 15.91
Class 4 | 62.61 ± 9.38 | 69.61 ± 10.14 | 69.37 ± 6.21 | 77.94 ± 6.91 | 79.28 ± 6.68 | 83.48 ± 9.03 | 83.24 ± 8.50 | 85.51 ± 4.41 | 89.66 ± 8.20
Class 5 | 76.05 ± 4.99 | 82.32 ± 2.86 | 59.45 ± 7.01 | 75.77 ± 6.69 | 71.59 ± 8.30 | 78.94 ± 11.77 | 75.96 ± 10.95 | 80.04 ± 10.93 | 77.68 ± 9.93
Class 6 | 80.21 ± 8.03 | 87.07 ± 7.45 | 52.74 ± 5.83 | 69.90 ± 6.54 | 65.66 ± 6.43 | 75.26 ± 9.35 | 79.70 ± 12.68 | 79.06 ± 10.69 | 74.74 ± 19.23
Class 7 | 89.23 ± 7.05 | 94.62 ± 3.53 | 63.85 ± 22.31 | 67.56 ± 18.22 | 86.15 ± 18.46 | 81.54 ± 20.12 | 86.15 ± 16.06 | 89.24 ± 15.07 | 99.23 ± 2.31
Class 8 | 82.25 ± 4.80 | 94.40 ± 4.29 | 75.27 ± 8.68 | 81.64 ± 5.52 | 89.04 ± 5.94 | 88.26 ± 8.17 | 95.20 ± 3.32 | 95.31 ± 5.07 | 93.82 ± 4.75
Class 9 | 80.00 ± 20.00 | 84.00 ± 14.97 | 76.00 ± 24.98 | 88.17 ± 19.84 | 84.00 ± 23.32 | 88.00 ± 13.27 | 94.00 ± 9.17 | 96.00 ± 8.00 | 98.00 ± 6.00
Class 10 | 59.47 ± 13.28 | 63.95 ± 10.70 | 59.00 ± 10.12 | 69.47 ± 8.12 | 67.19 ± 10.39 | 68.75 ± 8.95 | 68.96 ± 6.75 | 72.65 ± 7.27 | 72.40 ± 7.08
Class 11 | 44.27 ± 18.99 | 56.66 ± 15.36 | 51.51 ± 9.11 | 62.66 ± 11.56 | 60.40 ± 15.31 | 64.11 ± 8.87 | 66.89 ± 9.60 | 69.47 ± 9.33 | 69.62 ± 8.31
Class 12 | 40.25 ± 9.41 | 63.18 ± 11.00 | 58.37 ± 5.50 | 67.21 ± 4.62 | 67.55 ± 7.59 | 67.94 ± 6.94 | 71.97 ± 8.46 | 70.94 ± 6.03 | 67.67 ± 6.76
Class 13 | 95.83 ± 2.14 | 97.43 ± 1.72 | 68.97 ± 10.11 | 75.16 ± 8.37 | 88.80 ± 10.19 | 88.69 ± 4.93 | 92.00 ± 5.01 | 91.60 ± 6.86 | 95.49 ± 5.61
Class 14 | 79.03 ± 8.75 | 84.08 ± 5.56 | 67.22 ± 9.75 | 79.68 ± 8.67 | 77.06 ± 11.75 | 84.55 ± 7.13 | 86.04 ± 6.02 | 90.56 ± 5.63 | 88.93 ± 5.16
Class 15 | 36.94 ± 10.52 | 54.75 ± 10.89 | 70.51 ± 8.88 | 72.77 ± 9.95 | 78.99 ± 9.70 | 80.76 ± 7.93 | 84.92 ± 7.88 | 85.45 ± 7.05 | 88.76 ± 9.43
Class 16 | 84.44 ± 9.14 | 86.51 ± 8.93 | 68.73 ± 9.94 | 75.34 ± 13.95 | 85.71 ± 5.72 | 96.67 ± 4.23 | 95.40 ± 4.51 | 93.65 ± 4.32 | 96.83 ± 4.65
Table A2. Detailed classification results (mean ± standard deviation) with 30% label noise on the Houston dataset.
Metric/Class | RBF-SVM | EMP-SVM | CNN | MCNN-CP | CNN-Lq | DP-CNN | KSDP-CNN | SSDP-CNN | SeCL-CNN
OA (%) | 77.05 ± 1.91 | 78.88 ± 1.62 | 62.05 ± 1.96 | 75.58 ± 2.63 | 74.44 ± 2.12 | 75.25 ± 2.36 | 76.65 ± 2.27 | 78.16 ± 2.00 | 80.00 ± 2.51
AA (%) | 77.87 ± 1.33 | 79.96 ± 1.54 | 62.33 ± 1.92 | 76.02 ± 2.33 | 75.21 ± 2.28 | 76.93 ± 2.22 | 78.36 ± 2.26 | 79.49 ± 1.49 | 81.41 ± 2.45
K × 100 | 75.18 ± 2.06 | 77.16 ± 1.75 | 59.09 ± 2.11 | 73.63 ± 2.84 | 72.41 ± 2.29 | 73.29 ± 2.53 | 74.77 ± 2.93 | 76.40 ± 2.13 | 78.39 ± 2.72
Class 1 | 89.70 ± 7.10 | 89.74 ± 7.70 | 70.77 ± 8.39 | 72.72 ± 14.54 | 85.18 ± 7.02 | 83.16 ± 7.17 | 82.70 ± 7.49 | 78.76 ± 7.19 | 86.47 ± 7.49
Class 2 | 87.92 ± 6.67 | 82.71 ± 8.59 | 60.17 ± 8.57 | 76.07 ± 11.70 | 68.24 ± 13.05 | 74.78 ± 9.09 | 74.71 ± 8.93 | 71.43 ± 5.64 | 76.09 ± 8.26
Class 3 | 99.10 ± 0.67 | 99.24 ± 1.05 | 72.59 ± 10.69 | 85.27 ± 5.64 | 87.45 ± 9.46 | 84.38 ± 7.56 | 88.32 ± 7.03 | 88.85 ± 7.78 | 94.86 ± 5.75
Class 4 | 91.92 ± 3.12 | 91.28 ± 3.06 | 66.96 ± 8.32 | 83.55 ± 6.19 | 82.00 ± 5.55 | 88.66 ± 5.86 | 88.57 ± 5.24 | 84.76 ± 2.86 | 87.20 ± 2.70
Class 5 | 92.95 ± 4.59 | 94.56 ± 3.98 | 70.14 ± 5.00 | 85.28 ± 5.68 | 85.72 ± 8.44 | 88.68 ± 6.37 | 89.84 ± 5.52 | 93.01 ± 5.57 | 93.20 ± 4.57
Class 6 | 85.32 ± 10.14 | 85.39 ± 8.33 | 56.58 ± 5.96 | 70.15 ± 7.72 | 67.73 ± 8.18 | 74.82 ± 6.80 | 74.98 ± 5.17 | 72.95 ± 6.82 | 74.14 ± 13.12
Class 7 | 76.45 ± 6.74 | 83.65 ± 9.29 | 54.29 ± 5.81 | 68.15 ± 8.11 | 67.81 ± 5.72 | 71.75 ± 5.64 | 73.55 ± 7.71 | 75.54 ± 4.45 | 75.98 ± 7.18
Class 8 | 50.44 ± 7.20 | 51.46 ± 6.74 | 49.89 ± 7.38 | 63.07 ± 5.14 | 57.35 ± 10.62 | 57.30 ± 5.83 | 60.72 ± 7.89 | 67.91 ± 8.33 | 55.03 ± 8.90
Class 9 | 74.17 ± 5.44 | 76.35 ± 8.98 | 54.18 ± 8.36 | 67.24 ± 8.96 | 64.61 ± 7.05 | 65.00 ± 8.57 | 65.96 ± 7.81 | 68.35 ± 11.48 | 75.63 ± 8.14
Class 10 | 65.33 ± 17.97 | 64.70 ± 11.38 | 65.36 ± 12.98 | 73.32 ± 9.21 | 76.33 ± 9.34 | 65.90 ± 13.12 | 69.86 ± 13.90 | 72.90 ± 12.55 | 75.74 ± 16.92
Class 11 | 72.37 ± 6.40 | 84.25 ± 5.59 | 62.76 ± 5.35 | 84.00 ± 5.82 | 73.70 ± 7.49 | 67.47 ± 6.30 | 67.16 ± 7.31 | 74.06 ± 7.25 | 77.82 ± 11.42
Class 12 | 52.81 ± 10.50 | 51.79 ± 13.84 | 59.80 ± 8.06 | 73.71 ± 10.65 | 71.78 ± 8.75 | 70.12 ± 9.86 | 72.19 ± 7.80 | 77.22 ± 6.83 | 78.26 ± 8.47
Class 13 | 37.15 ± 5.51 | 49.25 ± 5.46 | 64.31 ± 11.14 | 77.16 ± 9.11 | 82.07 ± 7.64 | 76.63 ± 10.97 | 77.27 ± 6.31 | 79.68 ± 8.32 | 91.50 ± 3.00
Class 14 | 96.41 ± 1.03 | 97.66 ± 2.09 | 64.48 ± 9.56 | 78.30 ± 10.50 | 80.53 ± 11.25 | 95.95 ± 7.96 | 97.29 ± 4.45 | 95.58 ± 5.17 | 91.83 ± 7.64
Class 15 | 96.08 ± 2.09 | 97.33 ± 2.90 | 62.78 ± 5.81 | 82.31 ± 4.69 | 77.62 ± 8.58 | 89.33 ± 8.18 | 92.35 ± 4.38 | 91.33 ± 5.98 | 87.38 ± 8.52
Table A3. Detailed classification results (mean ± standard deviation) with 30% label noise on the Salinas dataset.
Metric/Class | RBF-SVM | EMP-SVM | CNN | MCNN-CP | CNN-Lq | DP-CNN | KSDP-CNN | SSDP-CNN | SeCL-CNN
OA (%) | 85.59 ± 2.05 | 86.85 ± 2.02 | 72.36 ± 2.30 | 84.53 ± 2.79 | 89.99 ± 1.92 | 87.10 ± 2.38 | 88.35 ± 3.32 | 89.76 ± 1.67 | 91.51 ± 2.31
AA (%) | 91.24 ± 1.09 | 92.23 ± 1.35 | 75.44 ± 1.24 | 85.27 ± 2.95 | 93.77 ± 1.57 | 90.79 ± 2.01 | 92.25 ± 1.17 | 92.86 ± 1.40 | 95.07 ± 1.48
K × 100 | 83.98 ± 2.22 | 85.38 ± 2.21 | 69.54 ± 2.48 | 82.84 ± 3.07 | 88.89 ± 2.13 | 85.70 ± 2.63 | 87.09 ± 2.53 | 88.62 ± 1.85 | 90.57 ± 2.55
Class 1 | 98.18 ± 0.93 | 99.02 ± 0.46 | 75.16 ± 9.77 | 88.88 ± 7.36 | 97.64 ± 2.95 | 96.38 ± 3.05 | 97.18 ± 2.74 | 97.29 ± 3.31 | 99.95 ± 0.08
Class 2 | 97.18 ± 3.13 | 98.28 ± 1.68 | 73.40 ± 6.30 | 89.40 ± 4.16 | 94.50 ± 5.84 | 92.77 ± 5.83 | 93.14 ± 6.44 | 93.43 ± 4.84 | 96.68 ± 4.05
Class 3 | 85.99 ± 12.47 | 89.90 ± 13.79 | 69.30 ± 8.63 | 84.47 ± 9.66 | 91.51 ± 8.41 | 88.24 ± 9.30 | 89.90 ± 9.23 | 89.95 ± 8.86 | 92.97 ± 8.47
Class 4 | 98.91 ± 0.52 | 98.56 ± 1.95 | 82.03 ± 7.27 | 85.03 ± 8.80 | 97.69 ± 2.00 | 96.06 ± 3.27 | 95.84 ± 3.28 | 94.40 ± 5.75 | 99.24 ± 1.10
Class 5 | 95.32 ± 3.70 | 95.07 ± 4.70 | 81.02 ± 8.93 | 88.11 ± 3.09 | 97.26 ± 4.77 | 93.15 ± 6.80 | 94.59 ± 6.22 | 91.81 ± 5.91 | 96.15 ± 6.76
Class 6 | 96.75 ± 3.02 | 96.13 ± 3.73 | 75.74 ± 9.07 | 87.78 ± 9.04 | 96.70 ± 3.33 | 93.63 ± 8.22 | 94.35 ± 7.82 | 97.01 ± 6.42 | 97.89 ± 3.50
Class 7 | 98.94 ± 0.49 | 98.96 ± 0.62 | 71.68 ± 9.23 | 84.18 ± 10.03 | 93.18 ± 7.71 | 89.96 ± 8.66 | 92.33 ± 6.36 | 92.30 ± 5.07 | 94.69 ± 3.30
Class 8 | 72.64 ± 12.11 | 72.40 ± 11.75 | 61.23 ± 7.26 | 81.05 ± 4.70 | 76.94 ± 6.45 | 73.06 ± 6.14 | 74.22 ± 10.33 | 77.68 ± 4.84 | 78.37 ± 7.55
Class 9 | 98.41 ± 1.32 | 99.24 ± 0.96 | 84.76 ± 5.54 | 91.37 ± 8.36 | 98.49 ± 1.91 | 93.96 ± 3.53 | 95.78 ± 2.99 | 96.88 ± 3.44 | 97.89 ± 2.00
Class 10 | 87.29 ± 4.01 | 89.37 ± 3.57 | 68.77 ± 11.85 | 80.59 ± 10.50 | 91.24 ± 10.35 | 88.63 ± 5.63 | 90.77 ± 6.38 | 92.30 ± 5.07 | 93.99 ± 8.42
Class 11 | 90.52 ± 3.91 | 93.70 ± 1.09 | 74.33 ± 4.90 | 81.09 ± 8.77 | 94.19 ± 7.21 | 90.28 ± 6.15 | 92.36 ± 5.72 | 93.63 ± 6.53 | 96.85 ± 5.56
Class 12 | 99.42 ± 0.52 | 99.92 ± 0.12 | 82.38 ± 11.29 | 87.65 ± 4.66 | 98.38 ± 2.19 | 95.59 ± 7.88 | 97.54 ± 4.08 | 96.85 ± 6.34 | 98.96 ± 1.85
Class 13 | 98.51 ± 0.67 | 98.12 ± 0.75 | 84.60 ± 8.03 | 87.10 ± 7.72 | 97.88 ± 2.17 | 94.45 ± 7.71 | 97.90 ± 3.16 | 97.42 ± 3.86 | 98.83 ± 1.55
Class 14 | 92.26 ± 2.51 | 90.62 ± 7.17 | 81.74 ± 7.58 | 84.48 ± 8.40 | 96.88 ± 7.94 | 92.97 ± 6.87 | 94.69 ± 6.02 | 94.28 ± 5.18 | 96.67 ± 7.26
Class 15 | 57.38 ± 10.00 | 63.62 ± 9.53 | 67.82 ± 5.42 | 78.56 ± 6.13 | 82.27 ± 10.23 | 82.31 ± 4.92 | 82.46 ± 4.99 | 85.07 ± 8.89 | 86.53 ± 6.53
Class 16 | 92.19 ± 5.09 | 92.69 ± 4.83 | 73.06 ± 10.22 | 84.53 ± 9.32 | 95.53 ± 4.36 | 91.22 ± 4.91 | 92.98 ± 3.49 | 95.48 ± 2.51 | 95.43 ± 3.50
Table A4. Detailed semi-supervised classification results (mean ± standard deviation) on the Indian Pines dataset (N = 25).
Metric/Class | EMP-CNN | MCNN-CP | LP | LapSVM | EMP-LapSVM | PL | AROC-DP | Mix-PL | Mix-PL-CL
OA (%) | 91.78 ± 2.22 | 92.74 ± 1.49 | 58.12 ± 1.33 | 61.27 ± 1.27 | 85.09 ± 2.34 | 92.87 ± 2.30 | 92.30 ± 1.72 | 93.12 ± 3.28 | 93.33 ± 2.29
AA (%) | 94.95 ± 1.20 | 96.19 ± 0.74 | 67.86 ± 1.27 | 71.60 ± 1.64 | 90.57 ± 1.43 | 95.35 ± 1.26 | 95.55 ± 0.77 | 95.37 ± 1.27 | 95.74 ± 1.11
K × 100 | 90.60 ± 2.52 | 91.71 ± 1.69 | 52.73 ± 1.40 | 56.26 ± 1.46 | 83.07 ± 2.61 | 91.83 ± 2.62 | 91.20 ± 1.95 | 92.12 ± 2.71 | 92.35 ± 2.61
Class 1 | 100.0 ± 0.00 | 100.0 ± 0.00 | 86.30 ± 10.34 | 88.31 ± 11.63 | 98.18 ± 2.80 | 99.00 ± 3.00 | 100.0 ± 0.00 | 98.50 ± 3.20 | 100.0 ± 0.00
Class 2 | 80.86 ± 7.85 | 88.62 ± 3.78 | 31.90 ± 5.17 | 40.10 ± 4.65 | 79.24 ± 3.76 | 83.79 ± 6.37 | 85.36 ± 3.30 | 84.28 ± 6.57 | 86.62 ± 5.90
Class 3 | 91.79 ± 4.89 | 90.33 ± 4.92 | 42.32 ± 6.27 | 50.78 ± 7.29 | 84.04 ± 4.62 | 90.59 ± 7.37 | 89.28 ± 5.67 | 90.96 ± 7.20 | 90.23 ± 7.45
Class 4 | 98.42 ± 2.03 | 99.25 ± 0.92 | 63.26 ± 6.25 | 68.26 ± 10.47 | 92.61 ± 7.07 | 98.45 ± 1.62 | 99.25 ± 1.26 | 98.25 ± 1.54 | 99.47 ± 0.88
Class 5 | 90.89 ± 6.28 | 95.67 ± 2.89 | 79.28 ± 4.95 | 78.79 ± 3.89 | 86.59 ± 2.98 | 89.86 ± 5.47 | 91.98 ± 3.07 | 89.93 ± 5.47 | 91.51 ± 4.71
Class 6 | 98.13 ± 1.94 | 97.85 ± 1.61 | 85.64 ± 4.07 | 84.50 ± 5.19 | 90.28 ± 5.56 | 98.60 ± 2.12 | 95.59 ± 14.26 | 98.63 ± 2.13 | 96.04 ± 3.62
Class 7 | 100.0 ± 0.00 | 100.0 ± 0.00 | 92.76 ± 6.54 | 92.92 ± 4.71 | 94.87 ± 5.16 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00
Class 8 | 100.0 ± 0.00 | 99.95 ± 0.14 | 80.95 ± 3.24 | 85.99 ± 3.15 | 99.78 ± 0.38 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00
Class 9 | 100.0 ± 0.00 | 100.0 ± 0.00 | 69.50 ± 20.91 | 87.00 ± 14.0 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00
Class 10 | 87.75 ± 4.33 | 92.98 ± 5.08 | 61.58 ± 7.58 | 60.81 ± 5.43 | 86.82 ± 3.81 | 85.67 ± 6.03 | 88.69 ± 5.52 | 85.16 ± 5.71 | 86.96 ± 5.66
Class 11 | 92.51 ± 5.45 | 87.79 ± 3.82 | 55.30 ± 5.84 | 55.53 ± 4.01 | 78.82 ± 6.77 | 95.12 ± 5.21 | 91.03 ± 4.21 | 95.81 ± 5.08 | 95.73 ± 5.17
Class 12 | 86.31 ± 6.53 | 90.56 ± 2.33 | 39.49 ± 4.26 | 43.06 ± 7.25 | 83.10 ± 5.44 | 92.87 ± 4.82 | 90.57 ± 2.54 | 93.39 ± 4.86 | 90.71 ± 4.19
Class 13 | 99.93 ± 0.21 | 99.41 ± 1.42 | 93.22 ± 2.07 | 93.42 ± 3.11 | 97.46 ± 0.94 | 99.83 ± 0.37 | 99.88 ± 0.36 | 99.89 ± 0.34 | 100.0 ± 0.00
Class 14 | 97.95 ± 3.78 | 98.82 ± 1.83 | 76.00 ± 8.64 | 79.62 ± 8.09 | 89.93 ± 8.17 | 97.59 ± 2.11 | 98.81 ± 0.78 | 97.60 ± 2.12 | 96.94 ± 2.53
Class 15 | 95.59 ± 4.25 | 98.10 ± 2.92 | 33.66 ± 3.73 | 44.90 ± 7.45 | 89.82 ± 3.19 | 96.54 ± 3.66 | 99.10 ± 1.26 | 96.46 ± 3.59 | 98.74 ± 2.00
Class 16 | 99.13 ± 1.16 | 99.67 ± 0.66 | 96.30 ± 3.41 | 91.60 ± 5.44 | 97.58 ± 3.51 | 97.73 ± 2.27 | 99.18 ± 1.10 | 97.12 ± 2.39 | 98.94 ± 0.97
Table A5. Detailed semi-supervised classification results (mean ± standard deviation) on the Houston dataset (N = 25).
N | Class | EMP-CNN | MCNN-CP | LP | LapSVM | EMP-LapSVM | PL | AROC-DP | Mix-PL | Mix-PL-CL
25 | OA (%) | 92.05 ± 0.82 | 93.44 ± 0.99 | 79.86 ± 0.88 | 82.30 ± 1.04 | 86.52 ± 1.24 | 93.39 ± 1.33 | 93.48 ± 1.15 | 93.77 ± 0.95 | 94.18 ± 0.82
25 | AA (%) | 92.86 ± 0.76 | 94.53 ± 0.98 | 80.37 ± 0.85 | 82.55 ± 1.18 | 87.54 ± 1.24 | 94.23 ± 1.28 | 94.43 ± 1.16 | 94.75 ± 0.89 | 94.98 ± 0.86
25 | K × 100 | 91.42 ± 0.89 | 92.91 ± 1.07 | 78.22 ± 0.94 | 80.86 ± 1.13 | 85.43 ± 1.34 | 92.86 ± 1.44 | 92.95 ± 1.24 | 93.27 ± 1.02 | 93.71 ± 0.89
25 | Class 1 | 90.86 ± 5.39 | 91.90 ± 5.62 | 94.15 ± 4.56 | 94.42 ± 4.26 | 92.58 ± 4.72 | 93.05 ± 3.99 | 91.96 ± 5.11 | 91.43 ± 5.30 | 91.85 ± 4.73
25 | Class 2 | 87.25 ± 8.23 | 97.15 ± 2.41 | 95.70 ± 1.64 | 94.38 ± 2.90 | 95.13 ± 2.34 | 87.23 ± 8.91 | 94.66 ± 5.64 | 85.33 ± 8.43 | 88.51 ± 7.70
25 | Class 3 | 98.74 ± 0.96 | 99.33 ± 0.45 | 98.14 ± 1.30 | 97.27 ± 2.02 | 97.86 ± 2.19 | 99.53 ± 0.66 | 98.95 ± 0.66 | 99.71 ± 0.34 | 99.82 ± 0.19
25 | Class 4 | 94.14 ± 2.28 | 98.65 ± 1.71 | 97.10 ± 2.69 | 95.96 ± 3.26 | 92.03 ± 3.28 | 95.36 ± 5.07 | 96.59 ± 2.91 | 97.46 ± 3.02 | 97.72 ± 2.33
25 | Class 5 | 98.75 ± 1.13 | 99.94 ± 0.18 | 96.65 ± 1.24 | 96.62 ± 1.09 | 97.59 ± 1.75 | 99.61 ± 0.69 | 99.72 ± 0.75 | 99.82 ± 0.55 | 99.80 ± 0.55
25 | Class 6 | 93.13 ± 5.02 | 98.07 ± 3.88 | 95.33 ± 2.76 | 93.12 ± 3.02 | 96.95 ± 3.07 | 95.54 ± 4.21 | 97.28 ± 3.87 | 96.94 ± 3.93 | 96.87 ± 4.00
25 | Class 7 | 85.42 ± 3.02 | 85.81 ± 4.21 | 71.25 ± 5.58 | 77.62 ± 6.82 | 84.94 ± 3.26 | 91.86 ± 4.41 | 89.32 ± 1.67 | 91.64 ± 3.08 | 91.95 ± 3.21
25 | Class 8 | 82.42 ± 3.06 | 79.82 ± 5.79 | 65.64 ± 4.19 | 65.37 ± 8.14 | 75.27 ± 4.68 | 79.65 ± 6.82 | 80.04 ± 6.81 | 83.18 ± 5.60 | 81.97 ± 5.40
25 | Class 9 | 90.48 ± 3.76 | 87.50 ± 5.33 | 66.94 ± 3.89 | 74.87 ± 6.84 | 80.51 ± 3.80 | 92.66 ± 5.88 | 91.96 ± 4.00 | 94.98 ± 2.78 | 95.56 ± 3.08
25 | Class 10 | 97.69 ± 2.59 | 96.90 ± 2.21 | 74.52 ± 3.93 | 80.49 ± 5.65 | 86.56 ± 6.10 | 99.07 ± 1.05 | 96.49 ± 7.75 | 98.09 ± 2.30 | 97.50 ± 3.95
25 | Class 11 | 93.43 ± 3.20 | 96.39 ± 2.60 | 67.87 ± 3.59 | 72.00 ± 3.75 | 79.13 ± 3.20 | 96.05 ± 2.63 | 93.93 ± 3.98 | 96.15 ± 2.29 | 97.03 ± 1.99
25 | Class 12 | 90.25 ± 4.98 | 89.27 ± 4.15 | 57.47 ± 5.50 | 62.59 ± 6.42 | 67.76 ± 7.25 | 89.80 ± 5.89 | 89.99 ± 4.89 | 89.10 ± 6.43 | 91.05 ± 5.10
25 | Class 13 | 92.30 ± 3.86 | 97.23 ± 2.82 | 28.41 ± 5.02 | 39.92 ± 8.41 | 70.69 ± 4.74 | 94.21 ± 5.23 | 95.61 ± 3.49 | 97.43 ± 1.61 | 95.10 ± 4.54
25 | Class 14 | 99.76 ± 0.50 | 100.0 ± 0.00 | 97.07 ± 2.29 | 95.32 ± 4.30 | 96.55 ± 2.41 | 99.92 ± 0.16 | 100.0 ± 0.00 | 100.0 ± 0.00 | 99.95 ± 0.15
25 | Class 15 | 98.19 ± 2.70 | 100.0 ± 0.00 | 99.33 ± 0.72 | 98.33 ± 1.00 | 99.58 ± 0.59 | 100.0 ± 0.00 | 99.92 ± 0.19 | 99.98 ± 0.05 | 99.98 ± 0.05
Table A6. Detailed semi-supervised classification results (mean ± standard deviation) on the Salinas dataset (N = 25).
N | Class | EMP-CNN | MCNN-CP | LP | LapSVM | EMP-LapSVM | PL | AROC-DP | Mix-PL | Mix-PL-CL
25 | OA (%) | 94.95 ± 2.46 | 96.17 ± 0.98 | 84.13 ± 1.19 | 86.12 ± 1.96 | 91.93 ± 1.71 | 95.97 ± 2.25 | 96.18 ± 1.72 | 96.69 ± 0.71 | 97.00 ± 0.85
25 | AA (%) | 98.24 ± 0.80 | 98.37 ± 0.37 | 91.91 ± 0.44 | 92.01 ± 1.01 | 95.18 ± 1.10 | 98.49 ± 0.82 | 98.63 ± 0.44 | 98.82 ± 0.22 | 98.91 ± 0.30
25 | K × 100 | 94.40 ± 2.70 | 95.64 ± 0.85 | 82.40 ± 1.30 | 84.59 ± 2.15 | 91.02 ± 1.91 | 95.53 ± 2.48 | 95.76 ± 1.34 | 96.33 ± 0.78 | 96.67 ± 0.94
25 | Class 1 | 99.99 ± 0.03 | 100.0 ± 0.00 | 98.07 ± 1.04 | 97.04 ± 1.87 | 99.24 ± 0.60 | 99.89 ± 0.22 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00
25 | Class 2 | 98.79 ± 2.60 | 99.92 ± 0.19 | 99.56 ± 0.37 | 96.99 ± 1.59 | 97.95 ± 2.51 | 97.00 ± 5.46 | 99.52 ± 1.78 | 99.02 ± 1.39 | 98.67 ± 2.67
25 | Class 3 | 99.97 ± 0.08 | 99.99 ± 0.02 | 95.57 ± 3.00 | 94.29 ± 3.21 | 99.59 ± 0.45 | 99.83 ± 0.27 | 99.37 ± 1.04 | 99.94 ± 0.17 | 99.92 ± 0.17
25 | Class 4 | 99.89 ± 0.31 | 99.43 ± 0.42 | 98.96 ± 1.41 | 98.86 ± 0.86 | 99.02 ± 1.00 | 99.96 ± 0.13 | 99.94 ± 0.14 | 99.97 ± 0.10 | 99.96 ± 0.07
25 | Class 5 | 99.13 ± 0.73 | 97.38 ± 2.04 | 95.41 ± 2.37 | 95.03 ± 2.01 | 96.25 ± 2.71 | 99.12 ± 1.04 | 99.10 ± 1.00 | 99.07 ± 1.02 | 99.50 ± 0.56
25 | Class 6 | 100.0 ± 0.00 | 99.98 ± 0.12 | 99.36 ± 0.32 | 98.27 ± 0.97 | 98.60 ± 1.10 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00
25 | Class 7 | 99.80 ± 0.39 | 99.94 ± 0.12 | 99.38 ± 0.38 | 96.88 ± 3.71 | 97.74 ± 2.45 | 100.0 ± 0.00 | 99.55 ± 0.96 | 99.95 ± 0.05 | 100.0 ± 0.01
25 | Class 8 | 79.71 ± 11.1 | 88.55 ± 3.07 | 58.83 ± 7.24 | 70.19 ± 9.31 | 82.59 ± 5.53 | 84.16 ± 9.52 | 84.89 ± 4.90 | 87.19 ± 2.88 | 89.03 ± 3.28
25 | Class 9 | 99.93 ± 0.20 | 100.0 ± 0.00 | 95.61 ± 1.38 | 95.85 ± 2.14 | 97.98 ± 1.28 | 99.93 ± 0.20 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00
25 | Class 10 | 99.98 ± 0.06 | 99.09 ± 0.98 | 86.46 ± 2.36 | 85.37 ± 4.36 | 95.64 ± 2.34 | 99.95 ± 0.09 | 99.85 ± 0.17 | 99.88 ± 0.17 | 99.99 ± 0.02
25 | Class 11 | 99.85 ± 0.16 | 99.97 ± 0.06 | 93.51 ± 2.25 | 93.10 ± 2.59 | 96.05 ± 2.69 | 99.86 ± 0.14 | 99.88 ± 0.10 | 99.91 ± 0.13 | 99.88 ± 0.14
25 | Class 12 | 99.95 ± 0.12 | 99.78 ± 0.52 | 99.56 ± 0.53 | 99.47 ± 1.02 | 100.0 ± 0.00 | 100.0 ± 0.00 | 100.0 ± 0.00 | 99.99 ± 0.02 | 100.0 ± 0.00
25 | Class 13 | 99.73 ± 0.60 | 100.0 ± 0.00 | 96.89 ± 1.95 | 97.71 ± 1.65 | 97.77 ± 1.52 | 99.99 ± 0.03 | 99.89 ± 0.23 | 99.93 ± 0.12 | 99.99 ± 0.03
25 | Class 14 | 99.87 ± 0.21 | 99.70 ± 0.39 | 92.95 ± 2.80 | 92.96 ± 3.66 | 92.91 ± 3.26 | 99.45 ± 1.06 | 99.88 ± 0.22 | 99.86 ± 0.27 | 99.89 ± 0.26
25 | Class 15 | 95.31 ± 2.43 | 91.35 ± 5.08 | 62.89 ± 5.20 | 64.97 ± 7.42 | 79.42 ± 4.63 | 96.80 ± 3.55 | 96.26 ± 2.58 | 99.37 ± 2.05 | 95.76 ± 3.13
25 | Class 16 | 99.95 ± 0.13 | 98.86 ± 1.01 | 97.52 ± 1.41 | 95.14 ± 3.05 | 92.04 ± 5.66 | 99.92 ± 0.24 | 99.94 ± 0.14 | 99.99 ± 0.02 | 99.95 ± 0.09

References

  1. Gevaert, C.M.; Suomalainen, J.; Tang, J.; Kooistra, L. Generation of spectral–temporal response surfaces by combining multispectral satellite and hyperspectral UAV imagery for precision agriculture applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 3140–3146. [Google Scholar] [CrossRef]
  2. Murphy, R.J.; Schneider, S.; Monteiro, S.T. Consistency of Measurements of Wavelength Position from Hyperspectral Imagery: Use of the Ferric Iron Crystal Field Absorption at ~900 nm as an Indicator of Mineralogy. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2843–2857. [Google Scholar] [CrossRef]
  3. Koz, A. Ground-Based Hyperspectral Image Surveillance Systems for Explosive Detection: Part I—State of the Art and Challenges. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4746–4753. [Google Scholar] [CrossRef]
  4. Qu, D.-X.; Berry, J.; Calta, N.P.; Crumb, M.F.; Guss, G.; Matthews, M.J. Temperature Measurement of Laser-Irradiated Metals Using Hyperspectral Imaging. Phys. Rev. Appl. 2020, 14, 014031. [Google Scholar] [CrossRef]
  5. Berné, O.; Helens, A.; Pilleri, P.; Joblin, C. Non-negative matrix factorization pansharpening of hyperspectral data: An application to mid-infrared astronomy. In Proceedings of the 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, Reykjavík, Iceland, 14–16 June 2010; pp. 1–4. [Google Scholar]
  6. Cheng, J.-H.; Sun, D.-W.; Pu, H.; Zhu, Z. Development of hyperspectral imaging coupled with chemometric analysis to monitor K value for evaluation of chemical spoilage in fish fillets. Food Chem. 2015, 185, 245–253. [Google Scholar] [CrossRef]
  7. Chang, C.-I. Hyperspectral Data Exploitation: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
  8. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36. [Google Scholar] [CrossRef] [Green Version]
  9. Ghamisi, P.; Yokoya, N.; Li, J.; Liao, W.; Liu, S.; Plaza, J.; Rasti, B.; Plaza, A. Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art. IEEE Geosci. Remote Sens. Mag. 2017, 5, 37–78. [Google Scholar] [CrossRef] [Green Version]
  10. Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE 2012, 101, 652–675. [Google Scholar] [CrossRef] [Green Version]
  11. Ghamisi, P.; Plaza, J.; Chen, Y.; Li, J.; Plaza, A.J. Advanced spectral classifiers for hyperspectral images: A review. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–32. [Google Scholar] [CrossRef] [Green Version]
  12. He, L.; Li, J.; Liu, C.; Li, S. Recent advances on spectral–spatial hyperspectral image classification: An overview and new guidelines. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1579–1597. [Google Scholar] [CrossRef]
  13. Sowmya, V.; Soman, K.; Hassaballah, M. Hyperspectral image: Fundamentals and advances. In Recent Advances in Computer Vision; Springer: Berlin/Heidelberg, Germany, 2019; pp. 401–424. [Google Scholar]
  14. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [Google Scholar] [CrossRef] [Green Version]
  15. Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 2005, 43, 480–491. [Google Scholar] [CrossRef]
  16. Dalla Mura, M.; Villa, A.; Benediktsson, J.A.; Chanussot, J.; Bruzzone, L. Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis. IEEE Geosci. Remote Sens. Lett. 2010, 8, 542–546. [Google Scholar] [CrossRef] [Green Version]
  17. Chen, Y.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3973–3985. [Google Scholar] [CrossRef]
  18. Huang, S.; Zhang, H.; Pižurica, A. A robust sparse representation model for hyperspectral image classification. Sensors 2017, 17, 2087. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709. [Google Scholar] [CrossRef] [Green Version]
  20. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655. [Google Scholar] [CrossRef] [Green Version]
  21. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
  22. Zhao, W.; Du, S. Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4544–4554. [Google Scholar] [CrossRef]
  23. Haut, J.M.; Paoletti, M.E.; Plaza, J.; Li, J.; Plaza, A. Active learning with convolutional neural networks for hyperspectral image classification using a new bayesian approach. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6440–6461. [Google Scholar] [CrossRef]
  24. Jiang, J.; Ma, J.; Wang, Z.; Chen, C.; Liu, X. Hyperspectral image classification in the presence of noisy labels. IEEE Trans. Geosci. Remote Sens. 2018, 57, 851–865. [Google Scholar] [CrossRef] [Green Version]
  25. Zhou, Z.-H. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 2018, 5, 44–53. [Google Scholar] [CrossRef] [Green Version]
  26. Leng, Q.; Yang, H.; Jiang, J. Label noise cleansing with sparse graph for hyperspectral image classification. Remote Sens. 2019, 11, 1116. [Google Scholar] [CrossRef] [Green Version]
  27. Zhang, X.; Song, Q.; Liu, R.; Wang, W.; Jiao, L. Modified co-training with spectral and spatial views for semisupervised hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2044–2055. [Google Scholar] [CrossRef]
  28. Yang, L.; Yang, S.; Jin, P.; Zhang, R. Semi-supervised hyperspectral image classification using spatio-spectral Laplacian support vector machine. IEEE Geosci. Remote Sens. Lett. 2013, 11, 651–655. [Google Scholar] [CrossRef]
  29. Yu, X.; Liu, T.; Gong, M.; Tao, D. Learning with biased complementary labels. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 68–83. [Google Scholar]
  30. Feng, L.; Kaneko, T.; Han, B.; Niu, G.; An, B.; Sugiyama, M. Learning with multiple complementary labels. In Proceedings of the International Conference on Machine Learning, Online, 13–18 July 2020; pp. 3072–3081. [Google Scholar]
  31. Yuan, Y.; Wang, C.; Jiang, Z. Proxy-Based Deep Learning Framework for Spectral-Spatial Hyperspectral Image Classification: Efficient and Robust. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501115. [Google Scholar] [CrossRef]
  32. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.; Li, J.; Pla, F. Capsule networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2145–2160. [Google Scholar] [CrossRef]
  33. Yu, C.; Han, R.; Song, M.; Liu, C.; Chang, C.-I. A simplified 2D-3D CNN architecture for hyperspectral image classification based on spatial–spectral fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2485–2501. [Google Scholar] [CrossRef]
  34. Alam, F.I.; Zhou, J.; Liew, A.W.-C.; Jia, X.; Chanussot, J.; Gao, Y. Conditional random field and deep feature learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1612–1628. [Google Scholar] [CrossRef] [Green Version]
  35. Yu, C.; Zhao, M.; Song, M.; Wang, Y.; Li, F.; Han, R.; Chang, C.-I. Hyperspectral image classification method based on CNN architecture embedding with hashing semantic feature. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1866–1881. [Google Scholar] [CrossRef]
  36. Lee, H.; Kwon, H. Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans. Image Process. 2017, 26, 4843–4855. [Google Scholar] [CrossRef] [Green Version]
  37. Li, W.; Wu, G.; Zhang, F.; Du, Q. Hyperspectral image classification using deep pixel-pair features. IEEE Trans. Geosci. Remote Sens. 2016, 55, 844–853. [Google Scholar] [CrossRef]
  38. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2017, 56, 847–858. [Google Scholar] [CrossRef]
  39. He, X.; Chen, Y.; Ghamisi, P. Heterogeneous transfer learning for hyperspectral image classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3246–3263. [Google Scholar] [CrossRef]
  40. Fang, L.; Zhao, W.; He, N.; Zhu, J. Multiscale CNNs Ensemble Based Self-Learning for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1593–1597. [Google Scholar] [CrossRef]
  41. Gao, K.; Liu, B.; Yu, X.; Qin, J.; Zhang, P.; Tan, X. Deep relation network for hyperspectral image few-shot classification. Remote Sens. 2020, 12, 923. [Google Scholar] [CrossRef] [Green Version]
  42. Roy, S.K.; Mondal, R.; Paoletti, M.E.; Haut, J.M.; Plaza, A. Morphological Convolutional Neural Networks for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8689–8702. [Google Scholar] [CrossRef]
  43. Aptoula, E.; Ozdemir, M.C.; Yanikoglu, B. Deep learning with attribute profiles for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1970–1974. [Google Scholar] [CrossRef]
  44. He, X.; Chen, Y.; Lin, Z. Spatial-Spectral Transformer for Hyperspectral Image Classification. Remote Sens. 2021, 13, 498. [Google Scholar] [CrossRef]
  45. Cheng, L.; Zhou, X.; Zhao, L.; Li, D.; Shang, H.; Zheng, Y.; Pan, P.; Xu, Y. Weakly supervised learning with side information for noisy labeled images. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 306–321. [Google Scholar]
  46. Zhang, Z.; Sabuncu, M.R. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
  47. Han, B.; Yao, Q.; Yu, X.; Niu, G.; Xu, M.; Hu, W.; Tsang, I.; Sugiyama, M. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 3–8 December 2018; pp. 8527–8537. [Google Scholar]
  48. Yu, X.; Han, B.; Yao, J.; Niu, G.; Tsang, I.; Sugiyama, M. How does disagreement help generalization against label corruption? In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7164–7173. [Google Scholar]
  49. Wei, H.; Feng, L.; Chen, X.; An, B. Combating noisy labels by agreement: A joint training method with co-regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13726–13735. [Google Scholar]
  50. Zhu, X.; Goldberg, A.B. Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 2009, 3, 1–130. [Google Scholar] [CrossRef] [Green Version]
  51. Wang, C.-P.; Zhang, J.-S.; Du, F.; Shi, G. Symmetric low-rank representation with adaptive distance penalty for semi-supervised learning. Neurocomputing 2018, 316, 376–385. [Google Scholar] [CrossRef]
  52. Dornaika, F.; Weng, L. Sparse graphs with smoothness constraints: Application to dimensionality reduction and semi-supervised classification. Pattern Recognit. 2019, 95, 285–295. [Google Scholar] [CrossRef]
  53. Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the Workshop on Challenges in Representation Learning, ICML, Atlanta, GA, USA, 16–21 June 2013; p. 896. [Google Scholar]
  54. Iscen, A.; Tolias, G.; Avrithis, Y.; Chum, O. Label propagation for deep semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5070–5079. [Google Scholar]
  55. Bahraini, T.; Azimpour, P.; Yazdi, H.S. Modified-mean-shift-based noisy label detection for hyperspectral image classification. Comput. Geosci. 2021, 155, 104843. [Google Scholar] [CrossRef]
  56. Tu, B.; Zhou, C.; He, D.; Huang, S.; Plaza, A. Hyperspectral classification with noisy label detection via superpixel-to-pixel weighting distance. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4116–4131. [Google Scholar] [CrossRef]
  57. Kang, X.; Duan, P.; Xiang, X.; Li, S.; Benediktsson, J.A. Detection and correction of mislabeled training samples for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5673–5686. [Google Scholar] [CrossRef]
  58. Tu, B.; Zhang, X.; Kang, X.; Wang, J.; Benediktsson, J.A. Spatial density peak clustering for hyperspectral image classification with noisy labels. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5085–5097. [Google Scholar] [CrossRef]
  59. Camps-Valls, G.; Marsheva, T.V.B.; Zhou, D. Semi-supervised graph-based hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3044–3054. [Google Scholar] [CrossRef]
  60. Wang, L.; Hao, S.; Wang, Q.; Wang, Y. Semi-supervised classification for hyperspectral imagery based on spatial-spectral label propagation. ISPRS J. Photogramm. Remote Sens. 2014, 97, 123–137. [Google Scholar] [CrossRef]
  61. Riese, F.M.; Keller, S.; Hinz, S. Supervised and semi-supervised self-organizing maps for regression and classification focusing on hyperspectral data. Remote Sens. 2020, 12, 7. [Google Scholar] [CrossRef] [Green Version]
  62. Tan, K.; Zhu, J.; Du, Q.; Wu, L.; Du, P. A novel tri-training technique for semi-supervised classification of hyperspectral images based on diversity measurement. Remote Sens. 2016, 8, 749. [Google Scholar] [CrossRef] [Green Version]
  63. Zhang, Y.; Liu, K.; Dong, Y.; Wu, K.; Hu, X. Semisupervised classification based on SLIC segmentation for hyperspectral image. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1440–1444. [Google Scholar] [CrossRef]
  64. Ji, X.; Cui, Y.; Wang, H.; Teng, L.; Wang, L.; Wang, L. Semisupervised hyperspectral image classification using spatial-spectral information and landscape features. IEEE Access 2019, 7, 146675–146692. [Google Scholar] [CrossRef]
  65. Fang, B.; Li, Y.; Zhang, H.; Chan, J.C.-W. Collaborative learning of lightweight convolutional neural network and deep clustering for hyperspectral image semi-supervised classification with limited training samples. ISPRS J. Photogramm. Remote Sens. 2020, 161, 164–178. [Google Scholar] [CrossRef]
  66. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar]
  67. Zheng, J.; Feng, Y.; Bai, C.; Zhang, J. Hyperspectral image classification using mixed convolutions and covariance pooling. IEEE Trans. Geosci. Remote Sens. 2020, 59, 522–534. [Google Scholar] [CrossRef]
Figure 1. Complementary learning-based CNN for HSI classification.
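Figure 1 builds the classifier on complementary learning, where each training sample carries a complementary label, i.e., a class it does not belong to. For readers unfamiliar with the idea, the following is a minimal PyTorch sketch of one common complementary-label loss, which drives the probability assigned to the complementary class toward zero; it is an illustrative formulation and not necessarily the exact loss used in this work.

```python
import torch
import torch.nn.functional as F

def complementary_loss(logits, comp_labels, eps=1e-8):
    """Complementary-label loss: comp_labels[i] is a class that sample i does NOT belong to.

    One common choice is to minimize -log(1 - p_ybar), pushing the predicted probability
    of the complementary class toward zero. (Illustrative; the paper's loss may differ.)
    """
    probs = F.softmax(logits, dim=1)                              # (B, K) class probabilities
    p_ybar = probs.gather(1, comp_labels.view(-1, 1)).squeeze(1)  # probability of the complementary class
    return -torch.log(1.0 - p_ybar + eps).mean()

# Toy usage: 4 samples, 16 classes (e.g., Indian Pines), random complementary labels.
logits = torch.randn(4, 16, requires_grad=True)
comp_labels = torch.randint(0, 16, (4,))
complementary_loss(logits, comp_labels).backward()
```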
Figure 2. The framework of the complementary learning-based HSI classification with noisy labels.
Figure 3. The framework of the complementary learning-based HSI semi-supervised classification.
Figure 4. Indian Pines dataset: (a) false color map; (b) ground-truth map.
Figure 5. Houston dataset: (a) false color map; (b) ground-truth map.
Figure 6. Salinas dataset: (a) false color map; (b) ground-truth map.
Figure 7. Balancing coefficient ρ(t).
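Figure 7 plots the balancing coefficient ρ(t) that weights the unlabeled (pseudo-label) loss against the supervised loss as training proceeds. The sketch below illustrates the general Pseudo-Label mechanism [53] with a simple linear ramp-up; the ramp-up shape, the hyperparameter names (t_start, t_end, rho_end), and the use of hard argmax targets are illustrative assumptions and may not match the schedule shown in Figure 7.

```python
import torch
import torch.nn.functional as F

def rho(t, t_start=10, t_end=50, rho_end=2.0):
    """Assumed linear ramp-up of the unlabeled-loss weight between epochs t_start and t_end."""
    if t <= t_start:
        return 0.0
    if t >= t_end:
        return rho_end
    return rho_end * (t - t_start) / (t_end - t_start)

def semi_supervised_loss(model, x_labeled, y_labeled, x_unlabeled, epoch):
    """Supervised cross-entropy plus a rho(t)-weighted pseudo-label term on unlabeled patches."""
    supervised = F.cross_entropy(model(x_labeled), y_labeled)
    with torch.no_grad():
        pseudo = model(x_unlabeled).argmax(dim=1)   # current predictions used as hard targets
    unsupervised = F.cross_entropy(model(x_unlabeled), pseudo)
    return supervised + rho(epoch) * unsupervised
```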
Figure 8. The distribution of Indian Pines training data in different learning stages with 30% label noise, according to probability p_y. (a) early stage of learning; (b) middle stage of learning; (c) late stage of learning.
Figure 9. The distribution of Houston training data in different learning stages with 30% label noise, according to probability p_y. (a) early stage of learning; (b) middle stage of learning; (c) late stage of learning.
Figure 10. The distribution of Salinas training data in different learning stages with 30% label noise, according to probability p_y. (a) early stage of learning; (b) middle stage of learning; (c) late stage of learning.
Figure 11. The distribution of Indian Pines training data in different learning strategies with 30% label noise, according to probability p_y. (a) traditional learning; (b) complementary learning; (c) selective complementary learning following CL; (d) traditional learning using samples whose p_y > 0.5.
Figure 12. The distribution of Houston training data in different learning strategies with 30% label noise, according to probability p_y. (a) traditional learning; (b) complementary learning; (c) selective complementary learning following CL; (d) traditional learning using samples whose p_y > 0.5.
Figure 13. The distribution of Salinas training data in different learning strategies with 30% label noise, according to probability p_y. (a) traditional learning; (b) complementary learning; (c) selective complementary learning following CL; (d) traditional learning using samples whose p_y > 0.5.
Figure 14. The influence of ρ_end, α, and T_4 on OA with N = 25. (a) OA with different values of ρ_end, while α = 1.0 and T_4 = 2; (b) OA with different values of α, while ρ_end = 2 and T_4 = 2; (c) OA with different values of T_4, while α = 1.0 and ρ_end = 2.
Figure 15. Indian Pines. (a) Ground-truth map with noisy training samples; classification maps obtained by (b) SeCL-CNN; (c) KSDP-CNN; (d) CNN-Lq; (e) CNN; (f) EMP-SVM; (g) Mix-PL-CL; (h) LP.
Figure 16. Houston. (a) Ground-truth map with noisy training samples; classification maps obtained by (b) SeCL-CNN; (c) KSDP-CNN; (d) CNN-Lq; (e) CNN; (f) EMP-SVM; (g) Mix-PL-CL; (h) LP.
Figure 17. Salinas. (a) Ground-truth map with noisy training samples; classification maps obtained by (b) SeCL-CNN; (c) KSDP-CNN; (d) CNN-Lq; (e) CNN; (f) EMP-SVM; (g) Mix-PL-CL; (h) LP.
Table 1. Land cover classes and numbers of samples in the Indian Pines dataset.
No. | Class Name | Number of Samples
1 | Alfalfa | 46
2 | Corn-notill | 1428
3 | Corn-mintill | 830
4 | Corn | 237
5 | Grass-pasture | 483
6 | Grass-trees | 730
7 | Grass-pasture-mowed | 28
8 | Hay-windrowed | 478
9 | Oats | 20
10 | Soybean-notill | 972
11 | Soybean-mintill | 2455
12 | Soybean-clean | 593
13 | Wheat | 205
14 | Woods | 1265
15 | Buildings-Grass-Trees | 386
16 | Stone-Steel-Towers | 93
Total | 10,249
Table 2. Land cover classes and numbers of samples in the Houston dataset.
No. | Class Name | Number of Samples
1 | Grass-healthy | 1251
2 | Grass-stressed | 1254
3 | Grass-synthetic | 697
4 | Tree | 1244
5 | Soil | 1242
6 | Water | 325
7 | Residential | 1268
8 | Commercial | 1244
9 | Road | 1252
10 | Highway | 1227
11 | Railway | 1235
12 | Parking-lot-1 | 1233
13 | Parking-lot-2 | 469
14 | Tennis-court | 428
15 | Running-track | 660
Total | 15,029
Table 3. Land cover classes and numbers of samples in the Salinas dataset.
No. | Class Name | Number of Samples
1 | Brocoli-green-weeds-1 | 2009
2 | Brocoli-green-weeds-2 | 3726
3 | Fallow | 1976
4 | Fallow-rough-plow | 1394
5 | Fallow-smooth | 2678
6 | Stubble | 3959
7 | Celery | 3579
8 | Grapes-untrained | 11,271
9 | Soil-vineyard-develop | 6203
10 | Corn-senesced-green-weeds | 3278
11 | Lettuce-romaine-4wk | 1068
12 | Lettuce-romaine-5wk | 1927
13 | Lettuce-romaine-6wk | 916
14 | Lettuce-romaine-7wk | 1070
15 | Vineyard-untrained | 7268
16 | Vineyard-vertical-trellis | 1807
Total | 54,129
Table 4. Architecture of CNN.
No. | Convolution | ReLU | Pooling | Padding | Stride | BN
1 | 4 × 4 × 32 | YES | 2 × 2 | NO | 1 | YES
2 | 5 × 5 × 32 | YES | 2 × 2 | NO | 1 | YES
3 | 4 × 4 × 64 | YES | NO | NO | 1 | YES
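As a concrete reading of Table 4, the following is a minimal PyTorch sketch of a network with the listed layers: three convolutions of 4 × 4 × 32, 5 × 5 × 32, and 4 × 4 × 64, each followed by batch normalization and ReLU, with 2 × 2 pooling after the first two, no padding, and stride 1. The input patch size, the BN/ReLU ordering, and the global-pooling classifier head are assumptions that the table does not specify.

```python
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    """Sketch of the three-layer CNN in Table 4 (input: spatial patch of EMP/spectral channels)."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=1),  # 4 x 4 x 32, no padding
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                      # 2 x 2 pooling
            nn.Conv2d(32, 32, kernel_size=5, stride=1),           # 5 x 5 x 32
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=4, stride=1),           # 4 x 4 x 64, no pooling
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)        # classifier head is an assumption
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.fc(self.pool(x).flatten(1))

# Example: a batch of 27 x 27 patches with 10 input channels and 16 classes.
net = PatchCNN(in_channels=10, num_classes=16)
print(net(torch.randn(8, 10, 27, 27)).shape)  # torch.Size([8, 16])
```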
Table 5. AUC of detection results on the three datasets.
Dataset | Noise Ratio | DP | KSDP | SSDP | SeCL
Indian Pines | 10% | 0.9027 | 0.9281 | 0.9411 | 0.9756
Indian Pines | 20% | 0.8994 | 0.9277 | 0.9391 | 0.9778
Indian Pines | 30% | 0.8988 | 0.9248 | 0.9386 | 0.9672
Houston | 10% | 0.9130 | 0.9262 | 0.9353 | 0.9503
Houston | 20% | 0.9007 | 0.9123 | 0.9285 | 0.9449
Houston | 30% | 0.8875 | 0.8932 | 0.9124 | 0.9404
Salinas | 10% | 0.9679 | 0.9786 | 0.9861 | 0.9951
Salinas | 20% | 0.9681 | 0.9776 | 0.9844 | 0.9956
Salinas | 30% | 0.9678 | 0.9751 | 0.9806 | 0.9955
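Table 5 summarizes the detection of mislabeled training samples by the area under the ROC curve (AUC). A small sketch of how such an AUC can be computed with scikit-learn is given below; it assumes the detector assigns each training sample a score p_y (the probability of the given label, as in Figures 8-13), which is not necessarily the score each method in the table actually uses.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def detection_auc(p_y, is_mislabeled):
    """AUC of mislabeled-sample detection.

    p_y           : per-sample score, higher = more likely the given label is correct
                    (assumed detector output, e.g., the network's probability of the label).
    is_mislabeled : 1 if the training label was actually corrupted, 0 otherwise.
    """
    # roc_auc_score expects scores that increase with the positive class ("mislabeled"),
    # so the clean-label score is negated.
    return roc_auc_score(np.asarray(is_mislabeled), -np.asarray(p_y))

# Toy example: corrupted samples tend to receive a low p_y.
print(detection_auc([0.95, 0.90, 0.15, 0.80, 0.20], [0, 0, 1, 0, 1]))  # 1.0
```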
Table 6. Testing data classification results (mean ± standard deviation) on the Indian Pines dataset.
Noise Ratio | Metric | RBF-SVM | EMP-SVM | CNN | MCNN-CP | CNN-Lq | DP-CNN | KSDP-CNN | SSDP-CNN | SeCL-CNN
10% | OA (%) | 62.25 ± 2.65 | 74.52 ± 2.52 | 76.84 ± 2.04 | 83.94 ± 1.76 | 82.51 ± 1.85 | 79.55 ± 1.72 | 80.01 ± 1.74 | 81.86 ± 1.68 | 82.70 ± 1.96
10% | AA (%) | 73.50 ± 1.51 | 83.17 ± 1.13 | 83.32 ± 0.96 | 87.77 ± 1.65 | 89.59 ± 1.63 | 86.20 ± 1.57 | 86.60 ± 1.03 | 88.25 ± 1.67 | 89.36 ± 1.73
10% | K × 100 | 57.57 ± 2.85 | 71.23 ± 2.73 | 73.88 ± 2.23 | 81.81 ± 1.95 | 80.18 ± 2.06 | 76.88 ± 1.88 | 77.40 ± 1.87 | 79.45 ± 1.92 | 80.35 ± 2.20
20% | OA (%) | 59.55 ± 1.99 | 71.16 ± 2.70 | 67.45 ± 2.52 | 76.91 ± 2.16 | 78.19 ± 2.61 | 72.81 ± 4.56 | 76.79 ± 2.59 | 78.81 ± 1.94 | 79.98 ± 2.40
20% | AA (%) | 70.98 ± 1.71 | 80.29 ± 1.91 | 73.19 ± 2.01 | 79.98 ± 1.07 | 85.26 ± 1.44 | 82.66 ± 1.86 | 84.79 ± 0.78 | 86.11 ± 1.44 | 88.04 ± 1.54
20% | K × 100 | 84.58 ± 1.95 | 67.48 ± 2.82 | 63.48 ± 2.70 | 73.97 ± 2.30 | 75.33 ± 2.83 | 69.43 ± 4.99 | 73.79 ± 2.79 | 76.35 ± 2.12 | 77.28 ± 2.63
30% | OA (%) | 55.38 ± 4.53 | 67.11 ± 3.54 | 57.34 ± 2.87 | 68.16 ± 3.27 | 66.36 ± 5.14 | 70.43 ± 2.82 | 72.22 ± 2.64 | 72.88 ± 2.47 | 73.90 ± 2.94
30% | AA (%) | 66.90 ± 2.77 | 76.60 ± 2.19 | 63.48 ± 1.97 | 72.21 ± 2.06 | 75.32 ± 3.01 | 78.27 ± 1.68 | 81.00 ± 1.73 | 82.62 ± 1.24 | 83.44 ± 2.07
30% | K × 100 | 49.94 ± 4.53 | 62.92 ± 3.70 | 52.56 ± 3.00 | 64.30 ± 3.48 | 62.30 ± 5.51 | 66.72 ± 3.01 | 68.66 ± 2.84 | 69.86 ± 2.67 | 70.51 ± 3.21
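The metrics reported in Tables 6-12 are overall accuracy (OA), average accuracy (AA, the mean of the per-class accuracies), and the kappa coefficient multiplied by 100. For reference, a short sketch of how they are typically computed from test-set predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def oa_aa_kappa(y_true, y_pred):
    """Overall accuracy, average (per-class) accuracy, and kappa coefficient, scaled by 100."""
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                # fraction of correctly classified test pixels
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean of per-class accuracies
    kappa = cohen_kappa_score(y_true, y_pred)
    return 100 * oa, 100 * aa, 100 * kappa

# Toy example with three classes.
print(oa_aa_kappa([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 2]))
```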
Table 7. Testing data classification results (mean ± standard deviation) on the Houston dataset.
Noise Ratio | Metric | RBF-SVM | EMP-SVM | CNN | MCNN-CP | CNN-Lq | DP-CNN | KSDP-CNN | SSDP-CNN | SeCL-CNN
10% | OA (%) | 82.81 ± 2.14 | 85.65 ± 1.91 | 82.03 ± 1.42 | 88.01 ± 1.59 | 86.47 ± 1.62 | 84.96 ± 1.42 | 85.76 ± 1.04 | 86.29 ± 1.37 | 86.95 ± 2.18
10% | AA (%) | 82.96 ± 1.81 | 86.11 ± 1.71 | 82.94 ± 1.43 | 89.09 ± 1.35 | 87.82 ± 1.45 | 86.26 ± 1.38 | 87.02 ± 0.96 | 88.25 ± 1.45 | 88.42 ± 2.04
10% | K × 100 | 81.40 ± 2.31 | 84.48 ± 2.06 | 80.60 ± 1.54 | 87.05 ± 1.72 | 85.38 ± 1.75 | 83.76 ± 1.53 | 84.62 ± 1.12 | 79.45 ± 1.89 | 85.89 ± 2.36
20% | OA (%) | 79.92 ± 0.85 | 82.26 ± 0.85 | 71.29 ± 0.92 | 82.13 ± 2.46 | 81.97 ± 1.50 | 80.00 ± 2.64 | 81.25 ± 1.60 | 82.43 ± 1.96 | 83.68 ± 2.57
20% | AA (%) | 80.65 ± 0.70 | 83.09 ± 0.80 | 72.17 ± 0.95 | 83.18 ± 2.49 | 83.02 ± 1.60 | 81.55 ± 2.31 | 82.66 ± 1.11 | 83.91 ± 1.88 | 85.01 ± 2.58
20% | K × 100 | 78.29 ± 0.92 | 80.81 ± 0.92 | 69.03 ± 0.99 | 80.70 ± 2.66 | 80.52 ± 1.63 | 78.41 ± 2.85 | 79.74 ± 1.71 | 81.15 ± 2.03 | 82.37 ± 2.77
30% | OA (%) | 77.05 ± 1.91 | 78.88 ± 1.62 | 62.05 ± 1.96 | 75.58 ± 2.63 | 74.44 ± 2.12 | 75.25 ± 2.36 | 76.65 ± 2.27 | 78.16 ± 2.00 | 80.00 ± 2.51
30% | AA (%) | 77.87 ± 1.33 | 79.96 ± 1.54 | 62.33 ± 1.92 | 76.02 ± 2.33 | 75.21 ± 2.28 | 76.93 ± 2.22 | 78.36 ± 2.26 | 79.49 ± 1.49 | 81.41 ± 2.45
30% | K × 100 | 75.18 ± 2.06 | 77.16 ± 1.75 | 59.09 ± 2.11 | 73.63 ± 2.84 | 72.41 ± 2.29 | 73.29 ± 2.53 | 74.77 ± 2.93 | 76.40 ± 2.13 | 78.39 ± 2.72
Table 8. Testing data classification results (mean ± standard deviation) on the Salinas dataset.
Noise Ratio | Metric | RBF-SVM | EMP-SVM | CNN | MCNN-CP | CNN-Lq | DP-CNN | KSDP-CNN | SSDP-CNN | SeCL-CNN
10% | OA (%) | 87.01 ± 1.92 | 90.09 ± 0.89 | 88.06 ± 2.03 | 92.68 ± 1.28 | 92.14 ± 2.29 | 90.90 ± 1.86 | 91.80 ± 2.64 | 92.24 ± 2.56 | 92.57 ± 2.45
10% | AA (%) | 92.61 ± 0.93 | 94.45 ± 0.46 | 92.06 ± 1.14 | 94.34 ± 0.96 | 96.10 ± 0.89 | 95.17 ± 0.54 | 95.44 ± 1.25 | 95.86 ± 1.67 | 96.39 ± 1.26
10% | K × 100 | 85.57 ± 2.09 | 88.97 ± 0.98 | 86.78 ± 2.24 | 91.62 ± 1.42 | 91.28 ± 2.53 | 89.91 ± 2.04 | 90.91 ± 2.91 | 91.32 ± 2.82 | 91.74 ± 2.70
20% | OA (%) | 85.80 ± 2.17 | 88.22 ± 2.01 | 81.36 ± 3.04 | 88.43 ± 2.12 | 91.73 ± 2.10 | 89.92 ± 2.04 | 90.62 ± 1.78 | 91.31 ± 1.80 | 92.13 ± 1.62
20% | AA (%) | 91.79 ± 0.86 | 93.33 ± 0.98 | 85.01 ± 1.80 | 89.85 ± 2.03 | 95.69 ± 1.26 | 94.42 ± 0.98 | 94.56 ± 1.31 | 95.01 ± 1.21 | 95.88 ± 0.75
20% | K × 100 | 84.23 ± 2.34 | 86.91 ± 2.21 | 79.39 ± 3.32 | 87.15 ± 2.35 | 90.81 ± 2.34 | 88.81 ± 2.24 | 89.57 ± 1.99 | 90.35 ± 2.03 | 91.27 ± 1.79
30% | OA (%) | 85.59 ± 2.05 | 86.85 ± 2.02 | 72.36 ± 2.30 | 84.53 ± 2.79 | 89.99 ± 1.92 | 87.10 ± 2.38 | 88.35 ± 3.32 | 89.76 ± 1.67 | 91.51 ± 2.31
30% | AA (%) | 91.24 ± 1.09 | 92.23 ± 1.35 | 75.44 ± 1.24 | 85.27 ± 2.95 | 93.77 ± 1.57 | 90.79 ± 2.01 | 92.25 ± 1.17 | 92.86 ± 1.40 | 95.07 ± 1.48
30% | K × 100 | 83.98 ± 2.22 | 85.38 ± 2.21 | 69.54 ± 2.48 | 82.84 ± 3.07 | 88.89 ± 2.13 | 85.70 ± 2.63 | 87.09 ± 2.53 | 88.62 ± 1.85 | 90.57 ± 2.55
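The noise ratio in Tables 6-8 is the fraction of training samples whose labels are corrupted before training. A minimal sketch of injecting such random (symmetric) label noise is shown below; the exact corruption protocol used in the experiments may differ.

```python
import numpy as np

def add_symmetric_label_noise(labels, noise_ratio, num_classes, seed=0):
    """Flip a fraction `noise_ratio` of labels to a different, randomly chosen class."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n_noisy = int(round(noise_ratio * len(noisy)))
    idx = rng.choice(len(noisy), size=n_noisy, replace=False)
    for i in idx:
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy

# Example: corrupt 30% of 10 training labels drawn from 16 classes.
print(add_symmetric_label_noise(np.arange(10) % 16, noise_ratio=0.3, num_classes=16))
```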
Table 9. Ablation studies for inaccurate supervision on the three datasets (30% label noise).
Dataset | Metric | SeCL-CNN | Without EMP | Without Selective CL
Indian | OA (%) | 73.90 | 72.23 | 72.98
Indian | AUC | 0.9672 | 0.9526 | 0.9559
Houston | OA (%) | 80.00 | 78.62 | 79.07
Houston | AUC | 0.9404 | 0.9346 | 0.9373
Salinas | OA (%) | 91.51 | 90.20 | 90.62
Salinas | AUC | 0.9955 | 0.9927 | 0.9915
Table 10. Testing data classification results (mean ± standard deviation) on the Indian Pines dataset.
N | Metric | EMP-CNN | MCNN-CP | LP | LapSVM | EMP-LapSVM | PL | AROC-DP | Mix-PL | Mix-PL-CL
20 | OA (%) | 88.67 ± 1.99 | 89.97 ± 1.43 | 55.96 ± 2.15 | 59.02 ± 1.89 | 84.10 ± 2.58 | 89.78 ± 2.00 | 90.73 ± 1.68 | 91.65 ± 2.11 | 92.54 ± 1.93
20 | AA (%) | 93.00 ± 1.01 | 94.76 ± 0.76 | 66.93 ± 1.56 | 70.27 ± 1.44 | 89.86 ± 1.29 | 93.99 ± 1.08 | 94.47 ± 0.87 | 94.48 ± 1.21 | 94.67 ± 1.21
20 | K × 100 | 87.08 ± 2.24 | 88.11 ± 1.60 | 50.45 ± 2.28 | 53.89 ± 2.05 | 81.96 ± 2.92 | 88.36 ± 2.26 | 89.45 ± 1.88 | 90.48 ± 2.39 | 91.45 ± 2.20
30 | OA (%) | 92.83 ± 1.45 | 93.88 ± 1.46 | 59.49 ± 1.25 | 63.52 ± 1.07 | 86.55 ± 2.39 | 93.56 ± 1.57 | 93.69 ± 1.80 | 94.41 ± 1.39 | 94.82 ± 1.32
30 | AA (%) | 95.84 ± 0.75 | 96.35 ± 0.56 | 68.47 ± 0.81 | 73.38 ± 2.11 | 91.37 ± 1.37 | 96.06 ± 0.74 | 96.11 ± 0.85 | 96.33 ± 0.58 | 96.72 ± 0.64
30 | K × 100 | 91.82 ± 1.64 | 92.99 ± 1.66 | 54.28 ± 1.30 | 58.89 ± 1.19 | 84.70 ± 2.68 | 92.63 ± 1.77 | 92.46 ± 2.04 | 93.57 ± 1.56 | 94.06 ± 1.50
25 | OA (%) | 91.78 ± 2.22 | 92.74 ± 1.49 | 58.12 ± 1.33 | 61.27 ± 1.27 | 85.09 ± 2.34 | 92.87 ± 2.30 | 92.30 ± 1.72 | 93.12 ± 3.28 | 93.33 ± 2.29
25 | AA (%) | 94.95 ± 1.20 | 96.19 ± 0.74 | 67.86 ± 1.27 | 71.60 ± 1.64 | 90.57 ± 1.43 | 95.35 ± 1.26 | 95.55 ± 0.77 | 95.37 ± 1.27 | 95.74 ± 1.11
25 | K × 100 | 90.60 ± 2.52 | 91.71 ± 1.69 | 52.73 ± 1.40 | 56.26 ± 1.46 | 83.07 ± 2.61 | 91.83 ± 2.62 | 91.20 ± 1.95 | 92.12 ± 2.71 | 92.35 ± 2.61
Table 11. Testing data classification results (mean ± standard deviation) on the Houston dataset.
N | Metric | EMP-CNN | MCNN-CP | LP | LapSVM | EMP-LapSVM | PL | AROC-DP | Mix-PL | Mix-PL-CL
20 | OA (%) | 90.48 ± 0.97 | 92.53 ± 1.27 | 78.21 ± 0.99 | 80.63 ± 1.06 | 85.63 ± 1.53 | 91.52 ± 0.98 | 92.90 ± 0.86 | 92.89 ± 1.12 | 93.39 ± 1.06
20 | AA (%) | 91.38 ± 0.81 | 93.66 ± 1.10 | 78.91 ± 0.83 | 81.12 ± 1.21 | 86.75 ± 1.37 | 92.15 ± 0.84 | 93.82 ± 0.73 | 93.44 ± 0.92 | 94.29 ± 0.87
20 | K × 100 | 89.71 ± 1.05 | 91.93 ± 1.38 | 76.45 ± 1.08 | 79.06 ± 1.16 | 84.47 ± 1.66 | 90.86 ± 0.92 | 92.33 ± 0.93 | 92.01 ± 1.25 | 92.86 ± 1.14
30 | OA (%) | 93.34 ± 0.86 | 94.34 ± 0.80 | 81.04 ± 0.89 | 83.49 ± 1.08 | 88.13 ± 1.26 | 94.12 ± 1.05 | 94.59 ± 0.68 | 94.82 ± 1.28 | 95.62 ± 0.98
30 | AA (%) | 94.13 ± 0.67 | 95.32 ± 0.71 | 81.47 ± 0.79 | 83.69 ± 1.02 | 88.89 ± 1.13 | 94.86 ± 0.89 | 95.56 ± 0.69 | 95.49 ± 1.12 | 96.36 ± 0.81
30 | K × 100 | 92.80 ± 0.93 | 93.88 ± 0.87 | 79.50 ± 0.96 | 82.14 ± 1.16 | 87.17 ± 1.36 | 93.52 ± 1.12 | 94.12 ± 1.12 | 94.41 ± 1.39 | 95.26 ± 1.06
25 | OA (%) | 92.05 ± 0.82 | 93.44 ± 0.99 | 79.86 ± 0.88 | 82.30 ± 1.04 | 86.52 ± 1.24 | 93.39 ± 1.33 | 93.48 ± 1.15 | 93.77 ± 0.95 | 94.18 ± 0.82
25 | AA (%) | 92.86 ± 0.76 | 94.53 ± 0.98 | 80.37 ± 0.85 | 82.55 ± 1.18 | 87.54 ± 1.24 | 94.23 ± 1.28 | 94.43 ± 1.16 | 94.75 ± 0.89 | 94.98 ± 0.86
25 | K × 100 | 91.42 ± 0.89 | 92.91 ± 1.07 | 78.22 ± 0.94 | 80.86 ± 1.13 | 85.43 ± 1.34 | 92.86 ± 1.44 | 92.95 ± 1.24 | 93.27 ± 1.02 | 93.71 ± 0.89
Table 12. Testing data classification results (mean ± standard deviation) on the Salinas dataset.
N | Metric | EMP-CNN | MCNN-CP | LP | LapSVM | EMP-LapSVM | PL | AROC-DP | Mix-PL | Mix-PL-CL
20 | OA (%) | 94.60 ± 3.32 | 95.77 ± 1.50 | 83.77 ± 0.88 | 85.37 ± 1.92 | 91.38 ± 1.62 | 95.41 ± 1.37 | 95.47 ± 1.86 | 95.94 ± 1.68 | 96.20 ± 1.13
20 | AA (%) | 97.88 ± 1.80 | 98.12 ± 0.38 | 91.25 ± 0.46 | 91.43 ± 1.25 | 94.81 ± 1.03 | 97.99 ± 0.48 | 98.26 ± 0.58 | 98.02 ± 0.75 | 98.29 ± 0.51
20 | K × 100 | 94.00 ± 3.81 | 95.30 ± 1.55 | 82.00 ± 0.98 | 83.76 ± 2.12 | 90.41 ± 1.81 | 94.90 ± 1.51 | 94.97 ± 2.05 | 95.45 ± 2.02 | 95.77 ± 1.25
30 | OA (%) | 95.72 ± 1.37 | 96.44 ± 0.67 | 84.30 ± 0.76 | 86.14 ± 1.41 | 92.90 ± 0.94 | 96.36 ± 2.59 | 96.95 ± 1.13 | 96.85 ± 1.60 | 97.18 ± 0.84
30 | AA (%) | 98.41 ± 0.48 | 98.43 ± 0.45 | 91.83 ± 0.31 | 92.43 ± 0.91 | 95.85 ± 0.63 | 98.73 ± 0.78 | 98.91 ± 0.40 | 98.80 ± 0.78 | 98.83 ± 0.44
30 | K × 100 | 95.25 ± 1.51 | 96.04 ± 0.95 | 82.60 ± 0.81 | 84.61 ± 1.54 | 92.10 ± 1.04 | 95.97 ± 2.85 | 96.67 ± 1.26 | 96.50 ± 1.80 | 96.87 ± 0.93
25 | OA (%) | 94.95 ± 2.46 | 96.17 ± 0.98 | 84.13 ± 1.19 | 86.12 ± 1.96 | 91.93 ± 1.71 | 95.97 ± 2.25 | 96.18 ± 1.72 | 96.69 ± 0.71 | 97.00 ± 0.85
25 | AA (%) | 98.24 ± 0.80 | 98.37 ± 0.37 | 91.91 ± 0.44 | 92.01 ± 1.01 | 95.18 ± 1.10 | 98.49 ± 0.82 | 98.63 ± 0.44 | 98.82 ± 0.22 | 98.91 ± 0.30
25 | K × 100 | 94.40 ± 2.70 | 95.64 ± 0.85 | 82.40 ± 1.30 | 84.59 ± 2.15 | 91.02 ± 1.91 | 95.53 ± 2.48 | 95.76 ± 1.34 | 96.33 ± 0.78 | 96.67 ± 0.94
Table 13. The OA (%) of ablation studies for semi-supervised classification on the three datasets (N = 25).
Dataset | Mix-PL-CL | Without EMP | Without PL | Without CL | Without Mixup
Indian | 93.33 | 92.15 | 92.36 | 93.12 | 92.98
Houston | 94.18 | 92.76 | 92.81 | 93.77 | 93.75
Salinas | 97.00 | 96.05 | 95.14 | 96.69 | 96.63
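Table 13 ablates the mixup and pseudo-label (PL) components of Mix-PL-CL. For readers unfamiliar with mixup [66], the sketch below shows the basic operation on a batch of patches and one-hot (or pseudo-) label vectors; how Mix-PL pairs labeled and pseudo-labeled samples is not reproduced here.

```python
import numpy as np
import torch

def mixup_batch(x, y_onehot, alpha=1.0):
    """mixup: train on convex combinations of pairs of samples and of their labels.

    x        : (B, C, H, W) batch of patches.
    y_onehot : (B, K) one-hot labels (or soft pseudo-labels for unlabeled samples).
    """
    lam = np.random.beta(alpha, alpha)   # mixing coefficient drawn from Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))     # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

# Example: mix a batch of 8 patches with 16-class one-hot labels.
x = torch.randn(8, 10, 27, 27)
y = torch.nn.functional.one_hot(torch.randint(0, 16, (8,)), num_classes=16).float()
x_mix, y_mix = mixup_batch(x, y, alpha=1.0)
```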
