1. Introduction
In the past few decades, machine learning has demonstrated remarkable success across various visual tasks [1,2,3,4,5,6,7,8]. This success can be attributed to advancements in learning algorithms and the availability of extensive labeled datasets. However, in real-world scenarios, constructing large labeled datasets is costly and often impractical. Learning effectively from a limited number of labeled samples has therefore become a major concern. Semi-supervised learning (SSL) [9,10,11], an important branch of machine learning, has emerged as a promising solution to this challenge by leveraging the abundance of unlabeled data.
The goal of SSL is to enhance generalization performance by exploiting the potential of unlabeled data. One widely accepted assumption, known as the Low-density Separation Assumption [12], posits that the decision boundary should reside in low-density regions in order to improve generalization. Building upon this assumption, two prominent paradigms have emerged: pseudo-labeling [11] and consistency regularization [13]. Consistency-regularization based methods, which have been widely adopted in SSL, aim to keep network outputs stable under noisy inputs [13,14]. However, these methods rely heavily on extensive data augmentations, which may restrict their effectiveness in domains such as videos and medical images. Pseudo-labeling based methods are an alternative approach: they select high-confidence predictions on unlabeled samples as training targets (pseudo-labels) [11]. A notable advantage of pseudo-labeling is its simplicity, as it does not require multiple data augmentations and can be easily applied to various domains.
In recent trends, combining pseudo-labeling and consistency regularization has shown promising results [15,16,17,18]. The underlying idea of these methods is to train a classifier on labeled samples and use its predicted distribution as pseudo-labels for unlabeled samples. These pseudo-labels are typically generated from weakly augmented views [16,19], or by averaging predictions over multiple strongly augmented views [9]. The objective is then constructed by applying the cross-entropy loss between the pseudo-labels and the predictions obtained from different strongly augmented views. The pseudo-labels are often sharpened or processed with argmax, assigning each instance to a specific category to further refine the learning process.
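The sharpening step mentioned above can be sketched as follows (a minimal illustration; the temperature parameter T and its default value follow the convention of MixMatch-style methods):

```python
def sharpen(p, T=0.5):
    """Temperature sharpening of a predicted class distribution.

    Raising each probability to the power 1/T and renormalizing pushes
    the distribution toward its argmax; as T -> 0 this approaches the
    hard (one-hot) argmax assignment mentioned above.
    """
    powered = [pi ** (1.0 / T) for pi in p]
    z = sum(powered)
    return [pi / z for pi in powered]
```

For example, `sharpen([0.6, 0.4])` raises the dominant probability above 0.6 while still returning a valid distribution.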
Building upon this concept, several methods have been proposed. MixMatch [9] adopts a sharpened averaged prediction from multiple strongly augmented views as the pseudo-label and incorporates the Mix-up trick [15] to enhance the quality of pseudo-labels. ReMixMatch [16] improves upon this idea by generating pseudo-labels from weakly augmented views; it additionally introduces a distribution alignment strategy that aligns the pseudo-label distribution with the distribution of ground-truth class labels. FixMatch [19] simplifies these ideas by employing a confidence-based threshold to select high-confidence pseudo-labels, achieving state-of-the-art performance among augmentation-anchoring-based methods. SimPLE [20] enhances previous approaches with a novel unsupervised objective called Pair Loss, which minimizes the statistical distance between high-confidence pseudo-labels that surpass a similarity threshold. SimMatch [21] simultaneously matches similarity relationships in both the semantic and instance spaces for different augmentations, and lets the semantic and instance pseudo-labels interact through a memory buffer. These methods represent significant advancements in SSL, effectively utilizing pseudo-labels and incorporating various strategies to enhance model performance.
The previously mentioned methods have one thing in common: they estimate the pseudo-label from the high-confidence predicted distribution of a single unlabeled sample. One major issue with these methods is their heavy reliance on the quality of pseudo-labeling, which limits their effectiveness when pseudo-labeling is unreliable. This problem stems from two facts: (1) pseudo-labels estimated from a single sample may be inaccurate, especially when the model is poorly calibrated; erroneous high-confidence predictions on single samples can produce many incorrect pseudo-labels, leading to noisy training. (2) Propagating pseudo-labels from unlabeled samples is unstable, since there is a margin between the pseudo-label and the ground-truth label, which is likely to cause error accumulation in the pseudo-labeling process.
In response to the aforementioned challenges, we aim to leverage the characteristics of prototypes (i.e., centers of representations) of labeled samples to generate more reliable pseudo-labels. To this end, we introduce a novel loss, the Prototype Consistency (PC) Loss, which stabilizes label propagation and enhances the accuracy of pseudo-labels. By incorporating the PC Loss into the techniques developed by the MixMatch family [9,16,19], we propose the ProMatch algorithm, an effective training framework for semi-supervised learning (SSL). The framework of ProMatch is depicted in Figure 1. Initially, we construct two sets of prototypes, from the semantic space and the prediction space, using weakly augmented labeled data. Next, for a given unlabeled sample, we generate the pseudo-label by selecting the most similar prediction-prototype based on the PC criterion, which requires that the categories of the most similar semantic-prototype and prediction-prototype align with the predicted category of the most similar prediction-prototype. Finally, we formulate the PC Loss on strongly augmented views of the unlabeled data, and integrate the supervised and unsupervised losses of the MixMatch family methods to establish the ProMatch training framework. This approach improves the reliability of the pseudo-labeling process and the accuracy of the pseudo-labels by leveraging the stability and consistency of prototypes. It is worth noting that, compared to state-of-the-art SSL methods, the principal characteristic of our proposal is that we use prototypes of labeled samples to derive and propagate the pseudo-labels.
Our contributions can be summarized as follows:
We introduce a novel loss component called PC Loss, which addresses the limitations of traditional pseudo-labeling methods. By deriving the pseudo-label from the prediction-prototype of labeled data, the PC Loss ensures more precise and stable label propagation in semi-supervised learning.
By integrating the PC Loss with the techniques employed in the MixMatch family methods, we establish the ProMatch training framework. This framework combines the benefits of PC Loss and the existing approaches, resulting in an improved performance in semi-supervised learning tasks.
Extensive experimental results demonstrate that ProMatch achieves significant performance gains over previous algorithms on popular benchmark datasets such as CIFAR-10, CIFAR-100, SVHN, and Mini-ImageNet. These results validate the effectiveness and superiority of our proposed approach in the field of SSL.
3. Methods
In this section, we give a detailed explanation of our newly proposed ProMatch framework. In Section 3.1, we briefly define the semi-supervised learning (SSL) task. Then, we introduce the generation of prototypes in Section 3.2. Next, we present the Prototype Consistency Loss (PC Loss) in Section 3.3. Finally, we combine the PC Loss with the MixMatch family methods and introduce the total objective in Section 3.4.
3.1. Preliminaries
We consider the K-class semi-supervised classification setting, which defines both labeled data $\mathcal{X} = \{(x_i, y_i)\}_{i=1}^{N_l}$ and unlabeled data $\mathcal{U} = \{u_j\}_{j=1}^{N_u}$ to train the model f. The model f is defined as $f = c_{\theta_c} \circ g_{\theta_g}$, which is composed of a semantic encoder $g$ followed by a prediction classifier $c$, where $\theta_g$ and $\theta_c$ are the sets of parameters of $g$ and $c$, respectively. The weak augmentation (i.e., random crop and flip) and the strong augmentation (i.e., RandAugment [32]) are denoted by $\alpha(\cdot)$ and $\mathcal{A}(\cdot)$, respectively.
3.2. Prototype Generation
We create a set of semantic-prototypes $\{p_k^{s}\}_{k=1}^{K}$ and a set of prediction-prototypes $\{p_k^{p}\}_{k=1}^{K}$ from the labeled data. The process can be seen in Figure 2. More specifically, we construct two buffers of memory queues, $Q^{s} = \{Q_k^{s}\}_{k=1}^{K}$ and $Q^{p} = \{Q_k^{p}\}_{k=1}^{K}$, where each key corresponds to a class. Both $Q_k^{s}$ and $Q_k^{p}$ denote a memory queue for class k with a fixed size L, storing the feature points produced by the semantic encoder $g$ and the classifier $c$, respectively. Each prototype $p_k^{s}$ and $p_k^{p}$ is calculated by averaging the feature points stored in $Q_k^{s}$ and $Q_k^{p}$. Meanwhile, we update $Q_k^{s}$ and $Q_k^{p}$ for all k at every step by pushing new features from the labeled data in the batch and discarding the oldest ones when the queue is full. The prototype represents the clustering center of each category of labeled data, which propagates labels more stably than a single sample.
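A minimal sketch of this queue-and-average scheme (the queue size and feature values are illustrative; plain Python lists stand in for the tensor implementation):

```python
from collections import defaultdict, deque

QUEUE_SIZE = 128  # fixed queue length; the actual size is a hyperparameter

# One memory queue per class, for semantic and prediction features.
sem_queues = defaultdict(lambda: deque(maxlen=QUEUE_SIZE))
pre_queues = defaultdict(lambda: deque(maxlen=QUEUE_SIZE))

def update_queues(labels, sem_feats, pre_feats):
    """Push features of the current labeled batch; deque(maxlen=...)
    automatically discards the oldest entries when a queue is full."""
    for y, fs, fp in zip(labels, sem_feats, pre_feats):
        sem_queues[y].append(fs)
        pre_queues[y].append(fp)

def prototype(queue):
    """Prototype of one class: element-wise mean of its stored features."""
    dim = len(queue[0])
    return [sum(feat[d] for feat in queue) / len(queue) for d in range(dim)]
```

Using a bounded deque per class keeps the prototypes current as the encoder evolves, which is exactly the push-new/discard-oldest update described above.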
3.3. Prototype Consistency Loss
In this section, we propose the PC Loss to help select the most dependable pseudo-label. First, for a given unlabeled sample $u$, we obtain the semantic feature $z = g(\alpha(u))$ and the prediction feature $q = c(g(\alpha(u)))$ from its weakly augmented version.
Second, we construct two similarity vectors $s^{sem}$ and $s^{pre}$. Each element $s_i^{sem}$ / $s_i^{pre}$ represents the similarity between the semantic/prediction feature and the semantic/prediction prototype of each category, which can be represented as

$s_i^{sem} = \mathrm{sim}(z, p_i^{s})$, (2)

$s_i^{pre} = \mathrm{BC}(q, p_i^{p})$. (3)

In Equation (2), $z$ is the semantic feature of the weakly augmented unlabeled sample, and $p_i^{s}$ is the semantic-prototype of the i-th category; $s_i^{sem}$ is the i-th element of the similarity vector $s^{sem}$, which represents the similarity between $z$ and $p_i^{s}$. In Equation (3), $q$ is the prediction feature of the weakly augmented unlabeled sample, and $p_i^{p}$ is the prediction-prototype of the i-th category; $s_i^{pre}$ is the i-th element of the similarity vector $s^{pre}$, which represents the similarity between $q$ and $p_i^{p}$. Note that we employ the cosine measure $\mathrm{sim}(\cdot,\cdot)$ and the Bhattacharyya coefficient [33] $\mathrm{BC}(\cdot,\cdot)$ to calculate the semantic similarity and the prediction similarity, respectively.
Third, based on the similarity vectors, we select the most similar semantic-prototype $p_i^{s}$ and the most similar prediction-prototype $p_j^{p}$, i.e., the prototypes closest to the semantic and prediction features, respectively, where $i = \arg\max(s^{sem})$ and $j = \arg\max(s^{pre})$. Here, i and j denote the categories corresponding to the most similar semantic-prototype and prediction-prototype, and $s^{sem}$ and $s^{pre}$ are the similarity vectors over the semantic and prediction spaces.
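The two similarity measures and the prototype selection can be sketched as follows (the function names are ours; tensor operations are simplified to plain lists):

```python
import math

def cosine_sim(u, v):
    """Semantic similarity: cosine between a feature and a prototype."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bhattacharyya(p, q):
    """Prediction similarity: Bhattacharyya coefficient of two distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def most_similar(feature, prototypes, sim_fn):
    """Return the index of the closest prototype and the similarity vector."""
    sims = [sim_fn(feature, proto) for proto in prototypes]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims
```

`most_similar` is applied twice per unlabeled sample: once with `cosine_sim` over the semantic-prototypes and once with `bhattacharyya` over the prediction-prototypes.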
Finally, we formulate the PC loss as

$\mathcal{L}_{pc} = \mathbb{1}\left[i = j = \arg\max(p_j^{p})\right] \, H(p_j^{p}, \hat{q})$, (4)

where $H(\cdot,\cdot)$ is the cross-entropy, $\mathbb{1}[\cdot]$ is a filter function, $p_j^{p}$ is the most similar prediction-prototype, $\arg\max(\cdot)$ returns the category with the maximum prediction probability, and $\hat{q} = c(g(\mathcal{A}(u)))$ is the prediction of the unlabeled sample under strong augmentation. Equation (4) is the core of our approach: it expresses the prototype consistency criterion and the construction of the loss function based on this criterion. Equation (4) states that the most similar prediction-prototype can be used as a pseudo-label only if the following condition holds: the category of the most similar semantic-prototype (i.e., i), the category of the most similar prediction-prototype (i.e., j), and the predicted category of the most similar prediction-prototype (i.e., $\arg\max(p_j^{p})$) are all equal. This condition comprises two constraints: (1) $i = j$ indicates that the features of the unlabeled sample and the prototypes of the labeled samples are consistent in both the semantic and prediction spaces, ensuring the reliability of the pseudo-labeling process; (2) $j = \arg\max(p_j^{p})$ indicates that the ground-truth category and the predicted category of the prediction-prototype are consistent, guaranteeing the accuracy of the pseudo-label. The details are shown in Figure 3.
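The PC criterion and loss for one unlabeled sample can be sketched as follows (the function signature and variable names are our own; the filter simply zeroes out samples that fail the criterion):

```python
import math

def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred), treating the prototype as a soft label."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def pc_loss(i_sem, j_pre, pre_prototype, strong_pred):
    """PC Loss for one unlabeled sample.

    i_sem / j_pre: categories of the most similar semantic-/prediction-prototype.
    pre_prototype: the most similar prediction-prototype (a distribution).
    strong_pred:   model prediction for the strongly augmented view.
    The prototype serves as pseudo-label only when its own argmax agrees
    with both prototype categories; otherwise the sample is filtered out.
    """
    predicted = max(range(len(pre_prototype)), key=pre_prototype.__getitem__)
    if i_sem == j_pre == predicted:  # prototype consistency criterion
        return cross_entropy(pre_prototype, strong_pred)
    return 0.0  # filtered: contributes nothing to the loss
```

Note how both constraints from the text appear in the single chained comparison: `i_sem == j_pre` covers the cross-space agreement, and `== predicted` covers the agreement between the prototype's category and its own predicted class.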
3.4. Total Objective
The proposed PC loss is generic and can easily be coupled with other SSL algorithms. In this paper, we combine the PC loss with the MixMatch family methods [9,16,19] and present the ProMatch framework, whose final objective can be represented as

$\mathcal{L} = \mathcal{L}_{s} + \lambda_{u}\mathcal{L}_{u} + \lambda_{pc}\mathcal{L}_{pc}$. (5)

In Equation (5), $\mathcal{L}_{s}$ is the supervised loss, computed as the cross-entropy between the ground-truth labels and the predictions on weakly augmented labeled samples. $\mathcal{L}_{u}$ is the unsupervised loss, formulated as the cross-entropy between the pseudo-labels and the predictions on strongly augmented unlabeled samples, where a filter function with confidence threshold $\tau$ retains only high-confidence pseudo-labels. $\mathcal{L}_{pc}$ is the PC Loss, and $\lambda_{u}$ and $\lambda_{pc}$ are the hyperparameters that control the weights of the two losses.
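A sketch of how the three terms combine, with a FixMatch-style confidence mask on the unsupervised term (the per-sample data structure and names are illustrative, not the paper's exact formulation):

```python
def confidence_mask(pseudo_label, tau):
    """Keep a pseudo-label only when its max probability exceeds tau."""
    return 1.0 if max(pseudo_label) >= tau else 0.0

def total_loss(l_sup, unlabeled_terms, pc_terms, tau, lambda_u, lambda_pc):
    """ProMatch-style objective: L = L_s + lambda_u * L_u + lambda_pc * L_pc.

    unlabeled_terms: (pseudo_label, per-sample cross-entropy) pairs;
    low-confidence pseudo-labels are masked out before averaging.
    """
    l_unsup = sum(confidence_mask(q, tau) * ce for q, ce in unlabeled_terms)
    l_unsup /= max(len(unlabeled_terms), 1)
    l_pc = sum(pc_terms) / max(len(pc_terms), 1)
    return l_sup + lambda_u * l_unsup + lambda_pc * l_pc
```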
The concrete algorithm of our method is shown in Algorithm 1.
4. Experiments
This section validates the effectiveness of ProMatch for semi-supervised learning tasks on four popular benchmarks: CIFAR-10 [34], CIFAR-100 [34], SVHN [35], and Mini-ImageNet. First, we introduce the standard datasets in Section 4.1. Second, the baselines related to our task are introduced in Section 4.2. Third, we list the implementation details of all experiments in Section 4.3. Fourth, we conduct experiments on the standard datasets with different amounts of labeled data in Section 4.4. Finally, we perform ablation experiments on each component. All experiments are based on the PyTorch deep learning framework and implemented on a Linux server equipped with an Intel(R) Core(TM) i7-10700F CPU @ 2.90 GHz and a GeForce RTX 3090 GPU, with CUDA 11.2 and CuDNN 8.
4.1. Datasets
CIFAR-10 [34]: CIFAR-10 is a widely used benchmark dataset in the field of computer vision and machine learning. The dataset consists of 60,000 color images and each image size is 32 × 32 pixels. It covers a total of 10 different categories and each category contains 6000 images. CIFAR-10 is split into two subsets: a training set and a test set. The training set contains 50,000 images and the test set contains 10,000 images. We randomly select 1000 and 4000 samples from the training set as labeled data, while the rest of the training set is used as unlabeled data.
CIFAR-100 [34]: CIFAR-100 is an extension of the CIFAR-10 dataset. It consists of 60,000 color images and each image size is 32 × 32 pixels. These images cover 100 diverse categories with 600 images per class. Similar to CIFAR-10, CIFAR-100 is split into a training set and a test set. The training set contains 50,000 images and each category contains 500 images. The rest of the 10,000 images form the test set and each category contains 100 images. We randomly select 10,000 samples from the training set as labeled data, while the rest of the training set is used as unlabeled data.
SVHN [35]: SVHN consists of images of house numbers captured from Google Street View. It contains over 600,000 color images, each a cropped patch containing a single digit from 0 to 9, so SVHN has 10 classes. The training set contains 73,257 images and the test set contains 26,032 images; the image size is 32 × 32. We randomly select 1000 and 4000 samples from the training set as labeled data.
Mini-ImageNet: Mini-ImageNet is a sub-dataset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), originally introduced for few-shot learning. It consists of 100 randomly selected categories from ImageNet-1K, where each category of the training set contains 600 labeled images of size 84 × 84. For the SSL evaluation, we select 500 images from each class to form the training set, and the remaining 100 images per class are used for the test set.
4.2. Baselines
We compare ProMatch with the following typical baseline methods.
VAT [24]: maintains the consistency of the model's output under adversarial perturbations.
MeanTeacher [10]: enforces consistency between the model and a moving-average teacher model.
MixMatch [9]: guesses low-entropy labels for augmented unlabeled examples and mixes labeled and unlabeled data using Mix-up.
PLCB [36]: proposes to learn from unlabeled data by generating soft pseudo-labels from the network predictions.
ReMixMatch [16]: introduces distribution alignment and augmentation anchoring to upgrade MixMatch.
FixMatch [19]: simplifies its predecessors by introducing a confidence threshold into the unsupervised objective; for the same unlabeled sample, FixMatch encourages consistency between weakly and strongly augmented images.
SimPLE [20]: introduces a similarity threshold and focuses on the similarity among unlabeled samples.
FlexMatch [30]: proposes Curriculum Pseudo Labeling, a curriculum learning approach that utilizes unlabeled samples according to the model's learning status.
DoubleMatch [37]: combines pseudo-labeling with a self-supervised loss, enabling the model to utilize all unlabeled data during training.
NP-Match [38]: adapts neural processes (NPs) to semi-supervised learning and proposes an uncertainty-guided skew-geometric JS divergence to replace the original KL divergence in NPs.
Bad GAN [39]: a generative-model-based SSL method built upon the assumption that good semi-supervised learning requires a bad generator.
Triple-GAN [40]: also a generative-model-based SSL method, formulated as a three-player minimax game among a generator, a classifier, and a discriminator.
4.3. Implementation Details
For most experiments, ProMatch adopts Wide ResNet [41] as the backbone (WRN-28-2 for CIFAR-10 and SVHN, WRN-28-8 for CIFAR-100), following [19]. For Mini-ImageNet, we use ResNet-18 as the backbone. We train the model using SGD [42] with a momentum of 0.9 or AdamW [43]. The initial learning rate is 0.03 with a cosine decay schedule $\eta = \eta_0 \cos\left(\frac{7\pi s}{16 S}\right)$, where $\eta_0$ is the initial learning rate, s is the current training step, and S is the total number of training steps. For CIFAR-100 on WRN-28-2 and Mini-ImageNet on ResNet-18, we use AdamW without learning rate scheduling. In addition, we use an exponential moving average (EMA) of the network parameters for evaluation and label guessing. Note that we use an identical set of hyperparameters for all datasets; more specific implementation details are shown in Table 1.
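The cosine decay schedule can be sketched as follows (we assume the 7π/16 form used by FixMatch, which this work follows; that constant is an assumption on our part):

```python
import math

def cosine_lr(eta0, step, total_steps):
    """Cosine learning-rate decay: eta0 * cos(7*pi*s / (16*S)).

    eta0: initial learning rate; step: current step s; total_steps: S.
    The 7/16 factor (from FixMatch) keeps the rate positive at s = S.
    """
    return eta0 * math.cos(7.0 * math.pi * step / (16.0 * total_steps))
```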
4.4. Results
For all datasets, we construct the labeled and unlabeled sets by randomly sampling an equal number of images from each class without replacement.
CIFAR-100: We evaluate the performance of ProMatch on CIFAR-100 with 10,000 labels under two different settings. First, we utilize Wide ResNet 28-8 [41] as the backbone and set the number of weak augmentations K to 4. We use the SGD [42] optimizer and set the weight decay to 0.001. As shown in Table 2, compared to FixMatch [19], ProMatch improves the accuracy rate from 77.40% to 78.85%. Next, we evaluate the performance of ProMatch on CIFAR-100 using the Wide ResNet 28-2 [41] network, changing the optimizer from SGD [42] to AdamW [43] and setting the EMA decay to 0.04. The experimental results show that ProMatch outperforms the baseline methods by a large margin. Compared to SimPLE [20], our method improves the accuracy rate from 70.82% to 72.71%. Moreover, ProMatch reaches nearly its best accuracy rate at about 500 epochs, indicating that our approach is more stable and converges faster.
CIFAR-10: We utilize the Wide ResNet 28-2 [41] network to evaluate the accuracy of ProMatch on CIFAR-10 with 1000 and 4000 labels. We use SGD [42] with a momentum of 0.9 as the optimizer. As shown in Table 3, the accuracy rate of ProMatch reaches 95.01% and 95.83% with 1000 and 4000 labeled samples, respectively. In contrast, Bad GAN attains 79.37% and 85.59%, and Triple-GAN-V2 attains 85.00% and 89.99% with 1000 and 4000 labeled samples, which indicates that our method significantly outperforms the classical generative-model-based SSL methods. Moreover, ProMatch is superior to recent state-of-the-art methods such as SimPLE, FlexMatch, and DoubleMatch. Note that, compared to the accuracy gains on CIFAR-100, ProMatch improves the accuracy on CIFAR-10 by a relatively small margin. This is because CIFAR-10 is a comparatively simple dataset: mainstream semi-supervised algorithms such as FixMatch and SimPLE have already achieved desirable results on it, and their prediction accuracy with partially labeled samples (i.e., semi-supervised learning) is very close to that obtained with the whole labeled set (i.e., supervised learning). The room for improvement on CIFAR-10 is thus limited, since the fully supervised accuracy can be considered a theoretical upper bound for semi-supervised performance. All the same, our proposal reaches 95.83% with 4000 labeled samples on CIFAR-10, which is not only the highest among the compared methods but also very near this upper bound.
SVHN: We use the Wide ResNet 28-2 [41] network with the same hyperparameters as for CIFAR-10 to evaluate the performance of ProMatch on SVHN with 1000 and 4000 labels. As shown in Table 3, ProMatch achieves 97.79% and 97.88% Top-1 accuracy for 1000 and 4000 labeled samples, respectively. Compared to Bad GAN and Triple-GAN-V2, our method achieves a distinct accuracy improvement, and despite the strong performance of the baseline methods on SVHN, ProMatch still achieves a higher accuracy rate.
Mini-ImageNet: We use the ResNet-18 network to assess the performance of ProMatch on the more complex Mini-ImageNet dataset, with 4000 labeled samples. We set the learning rate and weight decay to 0.02. Moreover, to elucidate the effect of erroneous high-confidence pseudo-labels, we record the error rates of pseudo-labels whose confidence exceeds the threshold. The results are shown in Table 4. From the table, we observe that the error rates of the pseudo-labels are inversely correlated with the test accuracy of the semi-supervised model. Among the compared methods, ours has the lowest pseudo-label error rate and the highest test accuracy, indicating that the proposed ProMatch effectively increases the accuracy of pseudo-labels and boosts the performance of semi-supervised learning. Note that the accuracy improvement of our method over SimPLE demonstrates the effectiveness of our proposal on this complex dataset.
4.5. Ablation Study
We present extensive ablation studies to verify the effect of each component. We conduct these ablation studies on CIFAR-100 with 10,000 labels and use Wide ResNet 28-2 as the backbone, for the following reasons: (1) CIFAR-100 covers a considerable number of classes while its image size is small; (2) the Wide ResNet 28-2 network is much faster to train than Wide ResNet 28-8.
The Effect of the Weak Augmentation Number K. We report the predicted accuracy under different numbers of weak augmentations. The results are shown in Table 5. Comparing the first and second rows, we vary K in ProMatch: ProMatch with K = 7 improves the accuracy rate from 72.71% (K = 2) to 73.38%. Comparing the third and fourth rows, we vary K in ProMatch without the PC Loss: with K = 7 it reaches an accuracy rate of 69.94%, while with K = 2 it reaches only 69.07%. These results demonstrate that increasing K from 2 to 7 improves the accuracy rate.
The Effect of PC Loss. To demonstrate the effect of the PC Loss, we conduct three experiments; the results are shown in Table 5. First, comparing the first and third rows, we evaluate ProMatch with and without the PC Loss: without the PC Loss, ProMatch reaches an accuracy of only 69.07%, a decrease of 3.64%. Second, comparing the second and fourth rows, we increase K from 2 to 7: the accuracy rate of ProMatch without the PC Loss is 69.94%, which is 3.44% lower than ProMatch. Finally, comparing the fifth and sixth rows, we change the augmentation type from RandAugment [32] to a fixed augmentation: ProMatch reaches an accuracy rate of 67.66%, while ProMatch without the PC Loss reaches 67.41%; in this setting, the improvement brought by the PC Loss is relatively small.
To further demonstrate the effect of the PC Loss, we evaluate the recall rate, Top-1 accuracy rate, and F1 score on CIFAR-100. Due to the large number of categories in CIFAR-100, we randomly select seven categories to present their results. As shown in Figure 4a,b, ProMatch improves the precision of these classes while achieving an enhanced recall rate. As shown in Figure 4c, ProMatch reaches a higher F1 score, which further demonstrates that introducing the PC Loss improves the algorithm's overall performance. Besides, we visualize the t-SNE embeddings of the features for ProMatch with and without the PC Loss. As shown in Figure 5, ProMatch exhibits better clustering and enhances classification accuracy. Finally, we visualize the Top-1 and Top-5 accuracy rates on the test and validation datasets. As shown in Figure 6, ProMatch converges rapidly on both, and compared to ProMatch without the PC Loss, it obtains a high accuracy rate in a shorter period of time.
Algorithm 1 ProMatch algorithm.
Input: a batch of labeled data and a batch of unlabeled data; semantic encoder g and prediction classifier c; memory queue buffers for semantic and prediction features; weak augmentation α and strong augmentation A; cosine similarity function; Bhattacharyya coefficient; number of weak augmentations K; cross-entropy loss function H.
1: Prototype Generation:
2:   Generate the semantic-prototypes and store them by label k.
3:   Generate the prediction-prototypes and store them by label k.
4: Prototype Consistency:
5: for each unlabeled sample do
6:   Apply weak data augmentation.
7:   Apply strong data augmentation.
8:   Compute the semantic feature of the weakly augmented views using the EMA model.
9:   Compute the average prediction across the weakly augmented views using the EMA model.
10:  Apply temperature sharpening to the average prediction.
11:  Compute the prediction of the strongly augmented view using the EMA model.
12:  Compute the similarity between the semantic feature and each semantic-prototype.
13:  Compute the similarity between the prediction feature and each prediction-prototype.
14: end for
15: Obtain the most similar semantic-prototype.
16: Obtain the most similar prediction-prototype.
17: Loss:
18:   Compute the supervised loss, the unsupervised loss, and the PC Loss.
19: return the total loss.
The Effect of Different Augmentation Strategies. To verify the effect of different augmentation strategies, we replace RandAugment [32] with a fixed augmentation. The results are shown in Table 5. First, comparing the first and fifth rows, ProMatch with fixed augmentation reaches an accuracy rate of only 67.66%, much lower than ProMatch with RandAugment. Second, comparing the third and sixth rows, we evaluate the effect of the augmentation strategy on ProMatch without the PC Loss: with RandAugment it reaches 69.07%, while with fixed augmentation it achieves only 67.41%. These results demonstrate that RandAugment [32] improves the model's robustness to different samples and alleviates overfitting.
The Effect of Hyperparameters. In this section, we evaluate the effect of several hyperparameters, namely the confidence threshold $\tau$, the PC loss coefficient $\lambda_{pc}$, and the memory buffer size L, under different values. The results are shown in Table 5 and Figure 7.
The first row and the 7th–9th rows of Table 5 show the accuracy rates of ProMatch under different values of the confidence threshold $\tau$; these results are also plotted in Figure 7a. We observe that the accuracy rate is positively correlated with the confidence threshold, and ProMatch achieves its optimal accuracy rate of 72.71%.
The first row and the 11th–13th rows of Table 5 show the accuracy rates of ProMatch under different PC loss coefficients $\lambda_{pc}$; these results are also plotted in Figure 7b, which identifies the coefficient value at which ProMatch achieves its optimal accuracy rate.
The first row and the 14th–17th rows of Table 5 show the accuracy rates of ProMatch under different buffer sizes L; these results are also plotted in Figure 7c, which identifies the buffer size at which ProMatch achieves its optimal accuracy rate.
5. Conclusions
This paper proposed a new framework ProMatch to improve the performance of SSL. ProMatch enhances the reliability and accuracy of pseudo-labeling by considering the PC, which maintains the consistency among the ground-truth category of the semantic-prototype, the ground-truth category of the prediction-prototype, and the predicted category of the prediction-prototype. According to the PC criterion, we formulated a new PC loss by selecting the most-similar prediction-prototype that is closest to the predicted distribution for the unlabeled sample as the pseudo-label. Extensive experiments are conducted on CIFAR-10, CIFAR-100, SVHN, and Mini-ImageNet datasets. The results demonstrate the effectiveness of our proposal. In addition, it should be noted that the performance of ProMatch is mainly dependent upon the reliability of the prototypes regarding the labeled samples. Thus, our proposal may not be adept at dealing with some specific tasks such as the imbalanced category task and fine-grained recognition task, where the prototypes of different categories are extremely indistinct. In our future work, we will devote to making our method more robust against various specific circumstances.