Article

Semi-Supervised Medical Image Segmentation Guided by Bi-Directional Constrained Dual-Task Consistency

1 School of Mechanical Engineering, Guangxi University, Nanning 530004, China
2 School of Electronic and Information Engineering, Tongji University, Shanghai 200092, China
3 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Bioengineering 2023, 10(2), 225; https://doi.org/10.3390/bioengineering10020225
Submission received: 13 January 2023 / Revised: 28 January 2023 / Accepted: 31 January 2023 / Published: 7 February 2023
(This article belongs to the Section Biosignal Processing)

Abstract

Background: Medical image processing tasks represented by multi-object segmentation are of great significance for surgical planning, robot-assisted surgery, and surgical safety. However, the exceptionally low contrast among tissues and the limited available annotated data make developing an automatic segmentation algorithm for pelvic CT challenging. Methods: A bi-directional constrained dual-task consistency model named PICT is proposed to improve segmentation quality by leveraging free unlabeled data. First, to learn more unlabeled data features, it encourages the model prediction of the interpolated image to be consistent with the interpolation of the model predictions at the pixel, model, and data levels. Moreover, to constrain erroneous predictions caused by interpolation interference, PICT designs an auxiliary pseudo-supervision task that focuses on the underlying information of the non-interpolated data. Finally, an effective loss algorithm for both consistency tasks is designed to ensure that they complement each other and produce more reliable predictions. Results: Quantitative experiments show that the proposed PICT achieves 87.18%, 96.42%, and 79.41% mean DSC scores on the ACDC, CTPelvic1k, and individual Multi-tissue Pelvic datasets, with gains of around 0.8%, 0.5%, and 1% over the state-of-the-art semi-supervised methods. Compared to the supervised baseline, PICT brings improvements of over 3–9%. Conclusions: The developed PICT model can effectively leverage unlabeled data to improve the segmentation quality of low-contrast medical images. The segmentation results could improve the precision of surgical path planning and provide input for robot-assisted surgery.

1. Introduction

Preoperative pelvic Computed Tomography (CT) segmentation is a key technology in computer-assisted surgery and minimally invasive surgical robot navigation [1,2,3,4,5]. The semantic segmentation results of soft and hard tissues can provide supplementary information on pathology and anatomy, help accurate diagnosis, and provide surgical image guidance [3]. Furthermore, the segmentation results can support surgical path planning and postoperative evaluation [4,5].
However, developing an automatic segmentation algorithm for soft and hard tissue of pelvic CT faces many challenges. The major challenge is the similarity of gray-scale features and local texture between tissues. As shown in Figure 1a, the contrast between soft and hard tissues is relatively low. The CT intensity of the fracture edge is similar to that of soft tissue, and the fracture edge is fuzzy. In addition, the shape of fracture blocks cannot be predicted, which makes it hard to segment according to prior knowledge. As shown in Figure 1b, the contrast between different soft tissues is relatively low. The local features such as color (threshold intensity) and texture are highly similar for muscles and other pelvic cavity organs. In computer-assisted pelvic surgery, inaccurate recognition and segmentation of soft tissues may damage organs, resulting in a severe iatrogenic injury.
In the past years, the convolutional neural network (CNN), with its strong non-linear modeling capability [6] and its capability to capture non-explicit features [7], has demonstrated remarkable results in medical image segmentation [8]. However, training a well-performing pelvic CT segmentation model usually comes at the expense of a large-scale, high-quality, per-pixel annotated dataset [9,10]. Unlike natural images, obtaining a large labeled dataset in the medical field is extremely difficult. The number of samples in the dataset limits the complexity of the networks [11]. Increasing network complexity, such as network depth or additional encoder structures, may lead to redundant use of information and give rise to over-fitting on the source dataset [12,13]. Therefore, increasing the complexity of a CNN may not be the most effective choice for pelvic anatomical segmentation with a small dataset, or even without one.
An alternative solution to reduce the labeling burden is semi-supervised learning (SSL) [14]. In recent studies, interpolation consistency training (ICT) has outperformed other state-of-the-art methods on both natural images [15] and medical images [16], making it an appealing approach to SSL. The ICT method augments the input samples through pixel-level interpolation perturbations. It increases the ability to capture detailed features by resisting the subtle differences [17,18] between the interpolation-augmented data. However, this manner ignores the helpful underlying information inside the data itself. Furthermore, consistency regularization usually suffers losses caused by unpredictable perturbations [19]. The perturbations may strengthen noise interference during network learning, resulting in wrongly classified pixels. Some studies [20,21] showed that integrating network predictions from different training processes can improve the quality of semi-supervised prediction. Therefore, this work designed a pseudo-label supervision module as an auxiliary supervision task, focusing on the undisturbed feature structure of the original unlabeled data and integrating the prediction consistency of the pseudo-label supervision module and the interpolation module to produce more reliable predictions.
Specifically, the proposed interpolation-based pseudo-supervision model (PICT) consists of two consistency tasks: the interpolation consistency task and the pseudo-supervision task. The former encourages the teacher model predictions of pairwise unlabeled interpolation data to be consistent with the student model's interpolation prediction of unlabeled data, which enforces pixel-level and model-level consistency. The latter utilizes the unlabeled predictions of the teacher model as pseudo-labels to supervise the student's unlabeled outputs and further encourages consistency between the two network predictions, which enforces both model-level and task-level consistency. The two joint tasks act in a complementary manner and successfully tackle both the low-contrast tissue and highly similar feature problems. The contributions of this paper are summarized as follows:
  • A bi-directional constrained dual-task consistency method is proposed; PICT enhances the ability to learn data features by resisting subtle differences at the pixel, model, and task levels, and can effectively capture and infer tissue semantic features in the low-contrast areas of pelvic CT.
  • A pseudo-supervision module is designed as an auxiliary supervision task to learn the underlying information of the original unlabeled data without perturbations, so as to constrain false predictions on interpolation data.
  • A multi-object pelvic dataset annotated by experts is developed, which contains 100 CT slices and subdivides the muscles, tissues, and bones with extremely low contrast into seven categories. PICT achieves state-of-the-art performance on three challenging medical datasets: ACDC, CTPelvic1k, and the individual Multi-tissue Pelvic dataset.

2. Related Work

The SSL method can be roughly grouped into four categories: adversarial learning method, self-training method, co-training method, and consistency regularization method [22].

2.1. Adversarial Learning Method

The adversarial method is a process in which two networks compete against each other [23]: one is the generation network, and the other is the discrimination network. The generation network confuses the discrimination network by generating fake data. The task of the discrimination network is to distinguish whether the data come from the generator or from the ground truth. However, most studies focus on small-resolution, small-scale data. Medical data such as pelvic CT images are usually characterized by low contrast, complexity, and large scale, making it more challenging for the generator to produce fine details [24].

2.2. Self-Training Method

The self-training method usually uses a pre-trained model to generate pseudo-labels for unlabeled data to expand the limited labeled dataset, and then trains the model until the performance improvement becomes negligible [25]. However, in the field of medical imaging, large public datasets are scarce, so it is difficult to find a suitable pre-trained model. Due to differences in distribution and the mismatch of samples, it is not simple to extend semi-supervised learning to cross-domain data [26]. The second disadvantage is that the quality of pseudo-labels is generally less reliable in early training. Incorrect predictions may be reinforced, resulting in worse model performance [27].

2.3. Co-Training Method

The co-training method usually trains two models with different initializations simultaneously and encourages them to take each other's predictions as pseudo-supervision signals [28]. Ref. [29] put forward the view that training models with both strong labels and pseudo-labels may lead to disordered back-propagation. Moreover, this method shares the disadvantage of self-training, namely, the low confidence of early pseudo-labels. In addition, the co-training method usually requires high computational cost and time.

2.4. Consistency Regularization Method

The consistency regularization method follows the assumption that predictions for the same input should be consistent; it expands the dataset and avoids overfitting through technical means such as noise perturbation [30], data augmentation [31], and mixing up [32]. Tarvainen [33] designed the mean teacher model, which takes the exponential moving average (EMA) of the student model parameters as the teacher model parameters. However, the random perturbations of consistency models are inefficient in high-dimensional space, because only a tiny proportion of input perturbations can push the decision boundary of the unlabeled data distribution into the low-density region, which may result in a loss of generality [34]. Verma et al. [15] further introduced pixel-level consistency and confirmed the effectiveness of this method, namely interpolation consistency training. This method enforces low-density separation to achieve aggregation of same-category samples and separation of different-category samples, improving the accuracy of model recognition.
However, existing consistency methods basically enforce consistency on perturbed unlabeled data and do not make use of the structural characteristics of the original unlabeled data. The main disadvantage is that some perturbations may not lie in the adversarial direction, in which case the network is liable to misclassify pixels, leading to a loss of generalization ability [16,35]. Thus, this work designs a pseudo-supervision module as an auxiliary task, taking the original unlabeled data as input to both the student and teacher models, to achieve consistency of the underlying data information between the two models, which can limit erroneous predictions on interpolation data. Next, the method proposed in this paper is described in detail.

3. Materials and Methods

3.1. Overview of Network Architecture

The overall framework is illustrated in Figure 2. It consists of two feature extractors, a supervised learning branch, and a semi-supervised learning branch. The two feature extractors share the same U-Net backbone architecture and follow the spirit of the mean teacher. The semi-supervised branch contains two mutually constrained tasks: the interpolation consistency regularization task and the pseudo-supervision task. Assume that the training set $D$ consists of $N$ labeled data and $M$ unlabeled data, denoted as $D_L = \{(x_i, y_i)\}_{i=1}^{N}$ and $D_U = \{x_i\}_{i=N+1}^{N+M}$, respectively. For the 2D dataset, $x_i \in \mathbb{R}^{H \times W}$ represents the input image, and $y_i \in \{0, 1\}^{H \times W}$ represents the corresponding ground-truth annotation. The goal of the proposed PICT is to minimize the following combined objective:
$$\min_{\theta} \; \underbrace{\sum_{i=1}^{N} \mathcal{L}_{sup}}_{\text{supervised loss}} \; + \; \lambda_{dt} \underbrace{\sum_{i=N+1}^{N+M} \left( \mathcal{L}_{con} + \mathcal{L}_{ps} \right)}_{\text{semi-supervised loss}},$$
where $\mathcal{L}_{sup}$ is the supervised loss used only for the labeled data $D_L$, $\mathcal{L}_{con}$ is the consistency loss designed for interpolation data, and $\mathcal{L}_{ps}$ is the pseudo-supervision loss. Consequently, PICT optimizes the network in a semi-supervised manner by jointly using labeled data, unlabeled data, and interpolation data. Here, we introduce the time-dependent Gaussian warm-up function [33] as a balance factor to control the trade-off between the supervised loss and the semi-supervised loss,
$$\lambda_{dt} = \exp\left( -5 \left( 1 - t/t_{max} \right)^2 \right),$$
where $t$ denotes the current training step, and $t_{max}$ is the maximum training step.
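For illustration, this warm-up factor can be computed with a small helper function (a minimal sketch; the function and argument names are ours, not from the paper):

```python
import numpy as np

def consistency_weight(t: int, t_max: int) -> float:
    """Time-dependent Gaussian warm-up [33]: ramps the semi-supervised
    loss weight from exp(-5) ~ 0.0067 at t = 0 up to 1.0 at t = t_max."""
    t = min(max(t, 0), t_max)  # clamp the step into [0, t_max]
    return float(np.exp(-5.0 * (1.0 - t / t_max) ** 2))
```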

3.2. Supervision Task Design

The supervision part combines the pixel-wise Dice loss $\mathcal{L}_{Dice}$ and Cross-Entropy (CE) loss $\mathcal{L}_{CE}$ to evaluate the quality of the student network output against the ground-truth label, and minimizes the following loss function to update the weights:
$$\mathcal{L}_{sup} = \sum_{i=1}^{N} \mathcal{L}_{Dice}\left( f(x_i; \theta), y_i \right) + \sum_{i=1}^{N} \mathcal{L}_{CE}\left( f(x_i; \theta), y_i \right),$$
where $f(x_i; \theta)$ and $\theta$ represent the segmentation confidence maps and the weights of the student model, respectively.
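A sketch of this combined loss in PyTorch is given below; the soft-Dice formulation and the smoothing constant eps are common conventions and our own assumptions, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-5):
    """Soft multi-class Dice loss. logits: (B, C, H, W); target: (B, H, W) long."""
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])
    one_hot = one_hot.permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial dims, keep classes
    inter = torch.sum(probs * one_hot, dims)
    union = torch.sum(probs, dims) + torch.sum(one_hot, dims)
    return 1.0 - torch.mean((2.0 * inter + eps) / (union + eps))

def supervised_loss(logits, target):
    """L_sup = Dice + CE over the labeled batch, as in Eq. (3)."""
    return dice_loss(logits, target) + F.cross_entropy(logits, target)
```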

3.3. Interpolation Consistency Regularization Task Design

The interpolation task considers two unlabeled data points $x_i$ and $x_j$ and interpolates them as
$$M_\alpha(x_i, x_j) = \mu x_i + (1 - \mu) x_j, \quad i, j \in \{N+1, \ldots, N+M\},$$
where $M_\alpha(x_i, x_j)$ represents the unlabeled interpolation data, and $\mu$ is the interpolation factor following the beta distribution $\mu \sim \mathrm{Beta}(\alpha, \beta)$, with $\alpha, \beta \in (0, \infty)$ and $\mu \in [0, 1]$. $\alpha$ and $\beta$ are the hyper-parameters of the interpolation factor, which control the strength of interpolation between data pairs; $\alpha$ is set equal to $\beta$ [36], following the hyper-parameter setting of [16], so that each update randomly draws $\mu$ from $\mathrm{Beta}(0.2, 0.2)$. When $\mu$ tends to zero, the interpolated image is more similar to $x_j$; when $\mu$ tends to one, it is more similar to $x_i$. This pixel-level interpolation generates augmented data, which can effectively avoid overfitting, as shown in Figure 3a. Next, consistency regularization is applied between the student model's prediction on the unlabeled interpolation data and the interpolation of the teacher model's unlabeled predictions, as shown in Figure 3b. The network enhances its ability to learn detailed data features by resisting the subtle pixel-level differences of the augmented data. In a nutshell, we first interpolate the input of the student model, then interpolate the output of the teacher model, and finally force the outputs of the two parallel models to be consistent via an $L_2$ loss:
$$\mathcal{L}_{con} = \sum_{i,j=N+1}^{N+M} \left\| f\left( M_\alpha(x_i, x_j); \theta \right) - M_\alpha\left( f(x_i; \theta'), f(x_j; \theta') \right) \right\|^2,$$
where $f(\cdot; \theta')$ and $\theta'$ represent the segmentation confidence maps and weights of the teacher model, respectively. The teacher weights $\theta'$ are updated as an exponential moving average (EMA) of the student weights. The term $M_\alpha\left( f(\cdot; \theta'), f(\cdot; \theta') \right)$ on the right side of the equation represents the interpolation of the teacher model outputs:
$$M_\alpha\left( f(x_i; \theta'), f(x_j; \theta') \right) = \mu f(x_i; \theta') + (1 - \mu) f(x_j; \theta').$$
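The sketch below shows one semi-supervised step of the interpolation consistency task, including the Beta-distributed mixing factor and the EMA teacher update; the EMA decay value and the variable names are illustrative assumptions:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """theta' <- decay * theta' + (1 - decay) * theta (EMA teacher update)."""
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(decay).add_(sp, alpha=1.0 - decay)

def interpolation_consistency_loss(student, teacher, u_i, u_j, alpha=0.2):
    """L_con of Eq. (5): student prediction on the mixed input vs. the
    mix of teacher predictions on the original unlabeled inputs."""
    mu = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = mu * u_i + (1.0 - mu) * u_j               # M_alpha(x_i, x_j)
    student_pred = torch.softmax(student(mixed), dim=1)
    with torch.no_grad():                             # no gradient via teacher
        t_i = torch.softmax(teacher(u_i), dim=1)
        t_j = torch.softmax(teacher(u_j), dim=1)
    target = mu * t_i + (1.0 - mu) * t_j              # M_alpha of teacher outputs
    return torch.mean((student_pred - target) ** 2)   # L2 consistency
```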

3.4. Pseudo-Label Supervision Task Design

To resist the feature interference caused by some interpolation perturbations, an auxiliary pseudo-label supervision task is introduced that focuses on the feature structure of the original, unperturbed data:
$$\mathcal{L}_{ps} = \sum_{i=N+1}^{N+M} \mathcal{L}_{Dice}\left( f(x_i; \theta), P\left( f(x_i; \theta') \right) \right),$$
where $P(f(x_i; \theta'))$ is the arg-max label map of the teacher model's segmentation confidence map, i.e., the pseudo-segmentation map. The pseudo-segmentation map of the teacher model acts as the pseudo-label guiding the student model. This approach uses the prediction diversity between the two parallel networks as an auxiliary supervision signal to strengthen their training.
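A minimal sketch of this pseudo-supervision term, reusing the dice_loss helper from Section 3.2 (the hard arg-max pseudo-label follows Eq. (7); the function name is ours):

```python
import torch

def pseudo_supervision_loss(student, teacher, u):
    """L_ps of Eq. (7): Dice loss between the student prediction and the
    teacher's arg-max pseudo-label on the same unperturbed unlabeled batch."""
    with torch.no_grad():
        pseudo_label = teacher(u).argmax(dim=1)   # P(f(x; theta')), shape (B, H, W)
    return dice_loss(student(u), pseudo_label)    # dice_loss as defined in Sec. 3.2
```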

3.5. Data and Label Preprocessing

This section introduces information about the test datasets: Multi-tissue Pelvic dataset, CTPelvic1k dataset [37], and Automated Cardiac Diagnosis Challenge (ACDC) dataset [38]. More detailed statistics of these three datasets are listed in Table 1.

3.5.1. Multi-Tissue Pelvic Dataset

To evaluate the anatomical segmentation performance of the proposed PICT method on pelvic CT, 100 slices of 512 × 512 pixels were randomly selected from a total of 6251 slices to form the Multi-tissue Pelvic dataset. The window width of all slices was set to 60 HU and the window level to 500 HU, and a median filtering algorithm was then used to reduce noise. Using the medical annotation software Pair, seven categories in each slice were manually delineated by three experienced radiologists. The seven categories were background, miscellaneous intra-pelvic content (MIPC), bone, muscle, subcutaneous adipose tissue (SAT), inter-muscular adipose tissue (IMAT), and intra-pelvic gas (IPG). An authoritative radiologist finally checked the labeled data to minimize individual errors.
In the preprocessing phase, the image edges were randomly cropped according to the pelvic region of the ground truth. The Multi-tissue Pelvic dataset was then randomly split into 80 images for training, 15 for testing, and 5 for validation.

3.5.2. CTPelvic1k Dataset

The CTPelvic1k dataset comes from a public pelvic 3D CT dataset. The first 70 pelvic fracture CT volumes from subset-6, with a mean size of 512 × 512 × 345 voxels, were selected to form the dataset. The pelvic regions were annotated as five categories in the ground truth: background, sacrum, left hip (LH), right hip (RH), and lumbar spine (LS). A random 50 CT volumes were assigned to the training set, 10 to the validation set, and 10 to the test set. Finally, a centering crop was used to reduce the size of the 3D CT volumes.

3.5.3. ACDC Dataset

The ACDC dataset was hosted in the MICCAI 2017 open challenge and contains 200 3D short-axis MRI scans from 100 patients, along with expert annotations for three cardiac regions: myocardium (Myo), left ventricle (LV), and right ventricle (RV). Following existing work [39], this experiment used a 140:20:40 split for training, validation, and testing. All 3D volumes were resized to 256 × 256 pixels per slice, and intensities were rescaled per slice to [0, 1].

3.6. Implementation Details

3.6.1. Network Architectures

For all experiments on the same dataset, the U-Net [40] model was fixed as the baseline for comparison. For the training experiments on the Multi-tissue Pelvic, ACDC, and CTPelvic1k datasets, this work used 2D, 2D, and 3D U-Net models, respectively. Each U-Net block consisted of a convolution layer, followed by an InstanceNorm normalization layer and a ReLU activation layer.
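For illustration, one such block could be written as follows (a sketch; the kernel size and padding are our assumptions):

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Basic 2D U-Net building block: convolution -> InstanceNorm -> ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```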

3.6.2. Training Procedure

All experiments were implemented in PyTorch on an NVIDIA GeForce RTX 3090 and trained with the Stochastic Gradient Descent (SGD) optimizer with a momentum of 0.9 and a weight decay of 10^-4, using a poly learning-rate strategy with an initial learning rate of 0.01. The hyper-parameter α of the interpolation factor was set to 0.2 [16]. For semi-supervised learning, this work set up a two-stream batch input strategy. The batch sizes for the ACDC, CTPelvic1k, and Multi-tissue Pelvic datasets were set to 8, 4, and 4, respectively.
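A sketch of this optimizer and poly learning-rate setup is shown below; the poly exponent of 0.9 is a common default and an assumption on our part, and the stand-in model is purely illustrative:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # stand-in for the U-Net
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Poly decay schedule: lr = base_lr * (1 - step/max_steps)^power."""
    return base_lr * (1.0 - step / max_steps) ** power

for step in range(1000):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(0.01, step, 1000)
    # ... forward pass, loss computation, and optimizer.step() go here ...
```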
To alleviate network over-fitting, we employed random flipping, rotation, and random cropping augmentation techniques [41,42]. The ACDC, CTPelvic1k, and Multi-tissue Pelvic datasets were fed into the network with patch sizes of 256 × 256, 112 × 112 × 112, and 256 × 256, respectively. In the inference phase, a sliding window with corresponding strides of 64 × 64 × 64 and 64 × 64 was used to obtain the final results for CTPelvic1k and Multi-tissue Pelvic. For ACDC, predictions were generated slice-by-slice in 2D and stacked into a 3D volume.

3.7. Evaluation Criteria

To quantitatively evaluate the segmentation performance, the experiments used the two commonly adopted, complementary evaluation metrics for the ACDC dataset: $DSC$ and the Hausdorff Distance ($HD$). $DSC$ measures the degree of overlap between the segmentation region and the ground-truth region, i.e., the region mismatch, and is expressed as
$$DSC(G, S) = \frac{2 |S \cap G|}{|S| + |G|},$$
where $DSC \in [0, 1]$; the higher the value of $DSC$, the better the segmentation performance. $S$ is the set of predicted labels, $G$ is the corresponding set of ground-truth labels, and $S \cap G$ is the intersection of $S$ and $G$.
$HD$ is a boundary-based metric that measures boundary errors and can be defined as follows:
$$HD(G, S) = \max\left\{ d_{SG}, d_{GS} \right\} = \max\left\{ \max_{s \in S} \min_{g \in G} d(s, g),\; \max_{g \in G} \min_{s \in S} d(g, s) \right\},$$
where $d_{SG}$ is the largest nearest-neighbor distance from points in $S$ to $G$. Here, the 95th percentile of $HD$ is taken to eliminate the influence of a small subset of outliers, denoted $HD_{95}$ [16,21,38].
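As an illustration, DSC and a brute-force HD95 on binary masks and boundary point sets can be computed as follows (a sketch only; practical evaluations use optimized implementations):

```python
import numpy as np

def dsc(seg: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks, Eq. (9)."""
    inter = np.logical_and(seg, gt).sum()
    return 2.0 * inter / (seg.sum() + gt.sum())

def hd95(seg_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """95th-percentile Hausdorff distance between point sets of shape (N, 2).
    Brute-force O(N*M) pairwise distances, for illustration only."""
    d = np.linalg.norm(seg_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    d_sg = d.min(axis=1)   # nearest-neighbor distance from each point in S to G
    d_gs = d.min(axis=0)   # nearest-neighbor distance from each point in G to S
    return max(np.percentile(d_sg, 95), np.percentile(d_gs, 95))
```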
For the CTPelvic1k dataset, performance is evaluated using the region-based metric $DSC$ described above.
For the Multi-tissue Pelvic dataset experiments, three metrics are used: $DSC$, mean Intersection over Union ($mIoU$), and mean Accuracy ($mAcc$). The $mIoU$ is defined as
$$mIoU(G, S) = \frac{|S \cap G|}{|S \cup G|},$$
where $S \cup G$ is the union of $S$ and $G$.
$mAcc$ is a pixel-level metric indicating the percentage of correctly predicted pixels among all pixels:
$$mAcc = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ is true positive, $TN$ is true negative, $FN$ is false negative, and $FP$ is false positive.
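A sketch of computing per-class IoU and pixel accuracy from a confusion matrix is given below; averaging IoU over classes to obtain mIoU is our reading of the metric, not a detail stated in the paper:

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray, n_cls: int) -> np.ndarray:
    """n_cls x n_cls matrix; rows index ground truth, columns index prediction."""
    idx = gt.astype(int) * n_cls + pred.astype(int)
    return np.bincount(idx.ravel(), minlength=n_cls ** 2).reshape(n_cls, n_cls)

def miou_and_acc(pred: np.ndarray, gt: np.ndarray, n_cls: int):
    cm = confusion_matrix(pred, gt, n_cls)
    inter = np.diag(cm)                           # |S ∩ G| per class
    union = cm.sum(0) + cm.sum(1) - inter         # |S ∪ G| per class
    miou = float(np.mean(inter / np.maximum(union, 1)))
    acc = float(inter.sum() / cm.sum())           # (TP + TN) / all pixels
    return miou, acc
```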

4. Results

4.1. Ablation Analysis of ACDC

4.1.1. Ablation Experiments of Related Works

Under the experimental setup with a batch size of eight and three labeled samples, two evaluation metrics are shown in Table 2. The first section lists the comparison of related works, and the second section gives the choice of pseudo-label supervision loss.
From the experimental results, PICT achieves the best DSC score and outperforms all the related works. A diagram of the related works is shown in Figure 4. MT can be seen as the basic framework for the other semi-supervised methods. CPICT is based on CPS and adds an auxiliary interpolation task; experimental results show that this improves the DSC significantly, by around 4%. However, CPICT needs to train two networks with different initializations, which is time-consuming. Considering the strict time requirements of clinical applications, PICT is instead based on the simple and efficient ICT and adds a pseudo-supervision module. The time cost of the auxiliary pseudo-supervision task is nearly negligible at small batch sizes, so PICT further improves segmentation performance at a low time cost. The experimental results of CPICT and PICT show that jointly using the interpolation consistency module and the pseudo-supervision module is effective. The two modules focus on the original feature structure of the unlabeled data and on its interpolation-augmented counterpart, respectively, and complement each other to improve the ability to capture detailed features.
The second section discusses the influence of the pseudo-supervision loss function. The experimental results show that when using the same CE loss as CPS, the optimization effect is not apparent, as shown by PICT (CE). When PICT (CE + Dice) uses the combination of CE loss and Dice loss, as in the supervised loss, as the pseudo-supervision loss, the DSC further improves by around 2%. Thus, it is reasonable to speculate that Dice may be more effective as a pseudo-supervision loss for medical images. We then tried using only the Dice loss as the pseudo-supervision loss, namely PICT (Dice); since the result improved greatly, we adopted it as the final model.
Figure 5 shows the loss curves during training with three labeled cases. The loss of the proposed model converges well.

4.1.2. Ablation Experiments of Backbone Models

Taking seven labeled cases as an example, three common models (E-Net [43], P-Net [44], and U-Net [40]) were compared to discuss generalization ability. As listed in Table 3, training on only the labeled data can be regarded as the lower limit and is denoted LS. Compared with the supervised backbone models, the proposed semi-supervised framework improves performance by 3–9% by utilizing additional free unlabeled data, showing its potential for exploiting unlabeled data. Compared with ICT and CPS, PICT achieves the best DSC score on all three models, showing its ability to generalize across different models. For a fair comparison, the subsequent experiments use the same U-Net backbone model as [16,21,39,45] to verify the effectiveness of the proposed semi-supervised module with unlabeled data.

4.1.3. Comparison with Baseline and Existing Methods

We compared the proposed PICT against the baselines, six recent methods, and the fully supervised method. The baseline and fully supervised settings refer to the lower-bound and upper-bound performance of the U-Net model on labeled data and are denoted LS and FS, respectively. The recent methods include Interpolation Consistency Training (ICT) [16], Mean Teacher (MT) [33], Cross-Pseudo Supervision (CPS) [35], Uncertainty-Aware Mean Teacher (UAMT) [21], Cross Teaching between CNN and Transformer (CNN-Trans) [39], and Uncertainty Rectified Pyramid Consistency (URPC) [45]. Among the semi-supervised methods, MT can be regarded as the prototype and the semi-supervised baseline.
Table 4 lists the comparison results using three labeled cases (six volumes) and seven labeled cases (14 volumes). The reported values show that all the semi-supervised methods yield improvements over the supervised baseline LS and the semi-supervised baseline MT. For three labeled cases, PICT largely improves the DSC score from 61% (LS) and 67% (MT) to 81.9%, and improves by 10–15% in comparison to the recent methods. For seven labeled cases, ICT and CPS improve by 0.6% and 0.3% over MT, confirming that these two kinds of models are effective. The proposed PICT, which builds on ICT and CPS, further outperforms MT by 1.2% and achieves the best mean DSC score and HD95 of 87.18% and 5.46 mm, respectively. The minimum HD95 indicates that our method can regularize segmentation boundaries and avoid large deviations from the real area of the ground truth.
As shown in Figure 6, the visualization results can more intuitively reflect the model performance. It can be observed that the segmentation prediction of our method is closer to the ground-truth mask compared with other methods. These experiment results on ACDC show the potential of the proposed semi-supervised method in alleviating the label cost.

4.2. Performance Comparison on CTPelvic1k

The blurry edges caused by fractures are the major challenge for the accurate segmentation of pelvic CT. To investigate the performance of the proposed technique, we set up three comparisons with different numbers of labeled cases on the fracture dataset CTPelvic1k, as seen in Figure 7. From the comparison results, our method retains certain advantages in most cases. The mean DSC scores of CPS, ICT, and PICT are 95.73%, 95.91%, and 96.12%, respectively. When the number of labeled cases increases to 15, the corresponding mean DSC scores further improve to 95.90%, 96.12%, and 96.42%, respectively. Taking 10 labeled cases as an example, the DSC score comparison for the four categories is listed in Table 5. The results show that PICT achieves the highest performance in all categories except the sacrum. This work did not use any preprocessing as in other pelvic segmentation studies, such as oversampling operations to deal with the severe imbalance between the sacroiliac joint and the lumbosacral joint [36]. All the contributions come from the model itself.
A 3D visualization of the segmentation results makes it convenient for doctors to recognize fracture types at a glance. Figure 8 shows several typical comparisons of each method. It can be observed that all the models perform well on fractured pelvic CT; looking at the 3D pelvic model itself, there is little difference from the ground truth. However, with few labeled cases, such as 5, the three methods exhibit different degrees of noisy segmentation; the noise is most obvious in ICT, followed by PICT. The reason may be that both ICT and PICT take perturbed data as input, which may introduce some unpredictable noise. As the number of labeled images increases, PICT shows a higher DSC score and less noisy segmentation. This may be due to the advantages of the pseudo-supervision part and a loss function choice that is more suitable for medical images than that of CPS. The experiment shows that the proposed PICT holds for 3D pelvic fracture CT analysis.

4.3. Performance Comparison on Multi-Tissue Pelvic

This paper further studied the segmentation performance of the proposed method on the Multi-tissue Pelvic dataset. To demonstrate its superiority, this section quantitatively studies performance under different labeled ratios; the mean DSC score comparison of the six categories is shown in Figure 9. From the histogram, the proposed method improves consistently over almost every labeled–unlabeled ratio. Compared with ICT and CPS, the combined PICT model achieves about a 1% gain on average across the eight labeled settings. The comparative analysis of the other two metrics, mIoU and mAcc, is listed in Table 6. Similarly, PICT has an advantage of around 1% over the baseline models in most cases. With 40 labeled images, the three metrics of PICT (DSC, mIoU, and mAcc) reach 79.4%, 71.1%, and 81.8%, respectively. We compare this work with [45], which also studied pelvic tissue and muscle segmentation; they discussed a fully supervised model with 540 CT slices as input and reported corresponding metrics of 74.9%, 63.6%, and 76.6%. The model proposed in this paper thus shows clear advantages on every metric while using less training data. This work is expected to be extended with a three-dimensional visualization model for simultaneous segmentation of multiple pelvic tissues.
To further investigate the feasibility of the proposed method, the per-category performance comparison, taking 40 labeled images as an example, is listed in Table 7. The proposed PICT achieves advantages on the three metrics in most cases. Figure 10 visually shows the comparison. We randomly selected several test images as examples; the visualization results show that the predictions of the proposed PICT are closer to the ground truth than those of ICT and CPS. Similarly, this experiment did not use any post-processing, such as dilation or fixed thresholding; therefore, some mispredictions remain inside the bone.
The proposed algorithm improves consistently over almost every category under every labeled–unlabeled ratio. The segmentation results demonstrate the ability of PICT in multi-tissue 2D pelvic CT analysis.

4.4. Training Time Costs

Finally, we list the time spent on the three datasets during training in Table 8. The table shows that ICT has the lowest time cost, followed by PICT and then CPS. The proposed PICT achieves state-of-the-art performance at a more acceptable time cost than CPS.

5. Discussion

Based on the above quantitative and qualitative results, it can be seen that PICT achieves state-of-the-art performance and efficiently reduces the annotation cost of medical image research. The ablation experiments on the open ACDC dataset showed that the combination of the interpolation module and the pseudo-supervision module is effective. The pseudo-supervision module can constrain ineffective interpolation perturbations where pixels are prone to being misclassified, while the pixel-interpolation data augmentation can make up for the unstable quality of the pseudo-labels. This idea is also confirmed on the two pelvic datasets: CTPelvic1k and the Multi-tissue Pelvic dataset.
Facing medical image segmentation of different types and tasks, the PICT proposed in this paper retains certain advantages. These results show that the proposed method is progressive and generalizable, which has clinical significance for exploring pelvic data analysis and other medical data analysis.
However, the proposed PICT also has limitations. Manually annotating a multi-tissue 3D pelvic CT dataset remains difficult, even when training in a semi-supervised manner. In future work, we plan to develop a 3D pelvic dataset containing multiple tissues and organs. For the semi-supervised model, we plan to automatically balance the weights of the interpolation consistency part and the pseudo-supervision part.

6. Conclusions

This paper proposed a bi-directional constrained dual-task consistency semi-supervised method named PICT for medical images with few labels, consisting of an interpolation consistency regularization task and a pseudo-supervision task. It can leverage free unlabeled data to capture more tissue semantic features in low-contrast areas and is sensitive to fracture margins, at a low time cost. The experiments on the ACDC, CTPelvic1k, and Multi-tissue Pelvic datasets proved that the proposed PICT achieves state-of-the-art performance in comparison with recent semi-supervised methods. This model can be used to visualize the anatomical morphology of soft and hard tissues in computer-assisted surgery and can promote automatic operations such as automatic path planning and postoperative evaluation, which is of great significance for promoting the application of machine learning in the clinical treatment of pelvic fractures.

Author Contributions

Conceptualization, Y.-W.D. and Y.C.; methodology, X.-L.L.; validation, Z.L.; formal analysis, X.-L.L.; resources, M.-Z.P.; data curation, Y.-W.D. and Y.C.; writing—original draft preparation, X.-L.L.; writing—review and editing, Z.L., M.-Z.P. and G.-B.B.; project administration, Z.L.; funding acquisition, G.-B.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2020YFB1313800, the National Natural Science Foundation of China, grant number 62027813, U20A20196, 62176266, the CAS Interdisciplinary Innovation Team, grant number JCTD-2019-07, and the Beijing Science Fund for Distinguished Young Scholars, grant number JQ21016.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The CTPelvic1k dataset [37] and ACDC dataset [38] are publicly available. The individual Multi-tissue Pelvic dataset in this study is available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, R.; Uneri, A.; Vijayan, R.; Wu, P.; Vagdargi, P.; Sheth, N.; Vogt, S.; Kleinszig, G.; Osgood, G.; Siewerdsen, J. Fracture reduction planning and guidance in orthopaedic trauma surgery via multi-body image registration. Med. Image Anal. 2020, 68, 101917.
  2. Li, Y.; Wu, Y.; Huang, M.; Zhang, Y.; Bai, Z. Automatic prostate and peri-prostatic fat segmentation based on pyramid mechanism fusion network for T2-weighted MRI. Comput. Methods Programs Biomed. 2022, 223, 106918.
  3. Zheng, S.; Yang, X.; Wang, Y.; Ding, M.; Hou, W. Unsupervised Cross-Modality Domain Adaptation Network for X-Ray to CT Registration. IEEE J. Biomed. Health Inform. 2021, 26, 2637–2647.
  4. Ni, Z.; Bian, G.B.; Li, Z.; Zhou, X.; Li, R.; Hou, Z. Space Squeeze Reasoning and Low-Rank Bilinear Feature Fusion for Surgical Image Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 3209–3217.
  5. Ni, Z.; Bian, G.B.; Zhou, X.; Wang, G.; Yue, W.; Li, Z.; Hou, Z. SurgiNet: Pyramid Attention Aggregation and Class-wise Self-Distillation for Surgical Instrument Segmentation. Med. Image Anal. 2021, 76, 102310.
  6. He, K.; Cao, X.; Shi, Y.; Nie, D.; Gao, Y.; Shen, D. Pelvic Organ Segmentation Using Distinctive Curve Guided Fully Convolutional Networks. IEEE Trans. Med. Imaging 2019, 38, 585–595.
  7. Kalantar, R.; Lin, G.; Winfield, J.; Messiou, C.; Lalondrelle, S.; Blackledge, M.; Koh, D. Automatic segmentation of pelvic cancers using deep learning: State-of-the-art approaches and challenges. Diagnostics 2021, 11, 1964.
  8. Altini, N.; Brunetti, A.; Puro, E.; Taccogna, M.; Saponaro, C.; Zito, F.; De Summa, S.; Bevilacqua, V. NDG-CAM: Nuclei Detection in Histopathology Images with Semantic Segmentation Networks and Grad-CAM. Bioengineering 2022, 9, 475.
  9. Hemke, R.; Buckless, C.; Tsao, A.; Wang, B.; Torriani, M. Deep learning for automated segmentation of pelvic muscles, fat, and bone from CT studies for body composition assessment. Skelet. Radiol. 2020, 49, 387–395.
  10. Zhang, Z.; Zhao, T.; Gay, H.; Zhang, W.; Sun, B. Semi-supervised semantic segmentation of prostate and organs-at-risk on 3D pelvic CT images. Biomed. Phys. Eng. Express 2021, 7, 065023.
  11. Srivastava, A.; Jha, D.; Chanda, S.; Pal, U.; Johansen, H.; Johansen, D.; Riegler, M.; Ali, S.; Halvorsen, P. MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation. IEEE J. Biomed. Health Inform. 2021, 26, 2252–2263.
  12. Han, L.; Huang, Y.; Dou, H.; Wang, S.; Ahamad, S.; Luo, H.; Liu, Q.; Fan, J.; Zhang, J. Semi-supervised segmentation of lesion from breast ultrasound images with attentional generative adversarial network. Comput. Methods Programs Biomed. 2020, 189, 105275.
  13. Zhou, H.-Y.; Wang, C.; Li, H.; Wang, G.; Zhang, S.; Li, W.; Yu, Y. SSMD: Semi-Supervised medical image detection with adaptive consistency and heterogeneous perturbation. Med. Image Anal. 2021, 72, 102117.
  14. Huang, W.; Gao, W.; Hou, C.; Zhang, X.; Wang, X.; Zhang, J. Simultaneous vessel segmentation and unenhanced prediction using self-supervised dual-task learning in 3D CTA (SVSUP). Comput. Methods Programs Biomed. 2022, 224, 107001.
  15. Verma, V.; Lamb, A.; Kannala, J.; Bengio, Y.; Lopez-Paz, D. Interpolation consistency training for semi-supervised learning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019.
  16. Basak, H.; Bhattacharya, R.; Hussain, R.; Chatterjee, A. An Embarrassingly Simple Consistency Regularization Method for Semi-Supervised Medical Image Segmentation. arXiv 2022, arXiv:2202.00677.
  17. Yang, X.; Hu, X.; Zhou, S.; Liu, X.; Zhu, E. Interpolation-Based Contrastive Learning for Few-Label Semi-Supervised Learning. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–12.
  18. Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C.A. Mixmatch: A holistic approach to semi-supervised learning. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://arxiv.org/abs/1905.02249 (accessed on 30 January 2023).
  19. Wang, Y.; Huang, G.; Song, S.; Pan, X.; Xia, Y.; Wu, C. Regularizing Deep Networks with Semantic Data Augmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1.
  20. Li, S.; Zhang, C.; He, X. Shape-aware semi-supervised 3d semantic segmentation for medical images. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2020.
  21. Yu, L.; Wang, S.; Li, X.; Fu, C.; Heng, P. Uncertainty-Aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2019.
  22. Xiao, Z.; Su, Y.; Deng, Z.; Zhang, W. Efficient Combination of CNN and Transformer for Dual-Teacher Uncertainty-guided Semi-supervised Medical Image Segmentation. Comput. Methods Programs Biomed. 2022, 226, 107099.
  23. Ruan, Y.; Li, D.; Marshall, H.; Miao, T.; Cossetto, T.; Chan, I.; Daher, O.; Accorsi, F.; Goela, A.; Li, S. MB-FSGAN: Joint segmentation and quantification of kidney tumor on CT by the multi-branch feature sharing generative adversarial network. Med. Image Anal. 2020, 64, 101721.
  24. Li, W.; Li, J.; Polson, J.; Wang, Z.; Speier, W.; Arnold, C. High resolution histopathology image generation and segmentation through adversarial training. Med. Image Anal. 2022, 75, 102251.
  25. Li, C.; Dong, L.; Dou, Q.; Lin, F.; Zhang, K.; Feng, Z.; Si, W.; Deng, X.; Deng, Z.; Heng, P. Self-Ensembling Co-Training Framework for Semi-Supervised COVID-19 CT Segmentation. IEEE J. Biomed. Health Inform. 2021, 25, 4140–4151.
  26. Chen, J.; Zhang, H.; Mohiaddin, R.; Wong, T.; Firmin, D.; Keegan, J.; Yang, G. Adaptive Hierarchical Dual Consistency for Semi-Supervised Left Atrium Segmentation on Cross-Domain Data. IEEE Trans. Med. Imaging 2021, 41, 420–433.
  27. Xu, Z.; Wang, Y.; Lu, D.; Yu, L.; Yan, J.; Luo, J.; Ma, K.; Zheng, Y.; Tong, R. All-Around Real Label Supervision: Cyclic Prototype Consistency Learning for Semi-supervised Medical Image Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 3174–3184.
  28. Wang, P.; Peng, J.; Pedersoli, M.; Zhou, Y.; Zhang, C.; Desrosiers, C. Self-paced and self-consistent co-training for semi-supervised image segmentation. Med. Image Anal. 2021, 73, 102146.
  29. Wang, K.; Zhan, B.; Zu, C.; Wu, X.; Zhou, J.; Zhou, L.; Wang, Y. Semi-supervised medical image segmentation via a tripled-uncertainty guided mean teacher model with contrastive learning. Med. Image Anal. 2022, 79, 102447.
  30. Laine, S.; Aila, T. Temporal ensembling for semi-supervised learning. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
  31. Chaitanya, K.; Karani, N.; Baumgartner, C.; Erdil, E.; Becker, A.; Donati, O.; Konukoglu, E. Semi-supervised task-driven data augmentation for medical image segmentation. Med. Image Anal. 2021, 68, 101934.
  32. Qu, Y.; Li, X.; Yan, Z.; Zhao, L.; Zhang, L.; Liu, C.; Xie, S.; Li, K.; Metaxas, D.; Wu, W.; et al. Surgical planning of pelvic tumor using multi-view CNN with relation-context representation learning. Med. Image Anal. 2021, 69, 101954.
  33. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv. Neural Inf. Process. Syst. 2017, 30.
  34. Miyato, T.; Maeda, S.; Koyama, M.; Ishii, S. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1979–1993.
  35. Chen, X.; Yuan, Y.; Zeng, G.; Wang, J. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021.
  36. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. Mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. Available online: https://openreview.net/forum?id=r1Ddp1-Rb (accessed on 25 October 2017).
  37. Liu, P.; Han, H.; Du, Y.; Zhu, H.; Li, Y.; Gu, F.; Xiao, H.; Li, J.; Zhao, C.; Xiao, L.; et al. Deep learning to segment pelvic bones: Large-scale CT datasets and baseline models. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 749–756.
  38. Bernard, O.; Lalande, A.; Zotti, C.; Cervenansky, F.; Yang, X. Deep Learning Techniques for Automatic MRI Cardiac Multi-Structures Segmentation and Diagnosis: Is the Problem Solved? IEEE Trans. Med. Imaging 2018, 37, 2514–2525.
  39. Luo, X.; Hu, M.; Song, T.; Wang, G.; Zhang, S. Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer. arXiv 2021, arXiv:2112.04894.
  40. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015.
  41. Perez, F.; Vasconcelos, C.; Avila, S.; Valle, E. Data augmentation for skin lesion analysis. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018.
  42. Simard, P.; Steinkraus, D.; Platt, J. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the 7th International Conference on Document Analysis and Recognition, Edinburgh, UK, 3–6 August 2003.
  43. Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv 2016, arXiv:1606.02147.
  44. Xiang, J.; Zhu, G. Joint face detection and facial expression recognition with MTCNN. In Proceedings of the 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 21–23 July 2017.
  45. Yan, C.; Lu, J.J.; Chen, K.; Wang, L.; Lu, H.; Yu, L.; Sun, M.; Xu, J. Scale- and Slice-aware Net (S2aNet) for 3D segmentation of organs and musculoskeletal structures in pelvic MRI. Magn. Reson. Med. 2022, 87, 431–445.
Figure 1. Some difficult samples of pelvic CT slices. (a) Low contrast between fracture edges and tissue. (b) Low contrast and local similarity between tissues. The boxes mark the regions of interest, colored with their corresponding ground-truth marking colors. The difficulties of segmentation are listed with asterisks.
Figure 2. Schematic view of the proposed bi-directional constrained dual-task consistency method on the CTPelvic1k dataset. The framework consists of two branches: supervised learning and semi-supervised learning. The lower, semi-supervised part combines the interpolation consistency regularization task and the pseudo-label supervision task. The backbone models share the same architecture, and the weights of the teacher model are the exponential moving average (EMA) of the student model. The "PDF" in the interpolation part represents the probability density function of the beta distribution, "α" and "β" are the parameters of the beta distribution, and "μ" is the interpolation factor.
Figure 3. Illustration of interpolation consistency regularization, taking the Multi-tissue Pelvic dataset as an example. (a) describes the data augmentation process by pixel level interpolation, “μ” is the interpolation factor and follows the Beta distribution. The “PDF” represents the probability-density function of Beta distribution, and “α” and “β” are the parameters of beta distribution. (b) is the interpolation consistency regularization process.
Figure 4. Illustration of the diagrammatic architecture of related works; only the semi-supervised part is given here. (a) mean teacher, (b) cross-pseudo supervision, (c) interpolation consistency training, (d) cross-pseudo interpolation consistency learning, (e) pseudo-interpolation consistency training. (a–c) are the related advanced works; (d,e) are the proposed schemes. U denotes the unlabeled data, P is the prediction map, Y represents the pseudo-label, and f(θ1) and f(θ2) represent networks with different initializations. MT can be considered the baseline for all works; MT' is used to represent two networks with different initializations. Therein, (b) can be represented as MT' + CP, (c) as I + MT, (d) as MT' + CP + I, and (e) as I + MT + P.
Figure 5. The loss function of the proposed PICT on ACDC dataset with three labeled cases.
Figure 6. Visual comparison of the state-of-the-art methods on ACDC test images. The top two rows show the results using 3 labeled cases; the bottom two rows show the results using 7 labeled cases. The yellow arrows indicate misclassified target regions; the yellow boxes indicate mispredictions in other areas.
Figure 7. Mean DSC score curves of different methods on the CTPelvic1k dataset with different ratios of labeled cases.
Figure 8. Representative cases of three methods on CTPelvic1k dataset. The green, blue, yellow, and red parts of the visual model represent the left hip bone, right hip bone, lumbar spine, and sacrum, respectively.
Figure 9. Mean DSC score curves of different methods on the Multi-tissue Pelvic dataset with different ratios of labeled cases.
Figure 10. Visual comparison of the state-of-the-art methods on Multi-tissue Pelvic test images. The top two rows show the results using 35 labeled cases; the bottom two rows show the results using 40 labeled cases. The white arrows and boxes indicate the misclassified pixels.
Table 1. Dataset statistics of CTPelvic1k, Multi-tissue Pelvic, and ACDC.

Parameters | Multi-Tissue Pelvic | CTPelvic1k | ACDC
Mean size | 512 × 512 | 512 × 512 × 345 | 256 × 256 × 10
Data number | 100 | 70 | 200
Tra/Val/Ts | 80/5/15 | 50/10/10 | 140/20/40
Category | 7 | 5 | 4
Type | Individual | Public | Public
Pre-process | Center crop | Center crop | None
Patch size | 256 × 256 | 112 × 112 × 112 | 256 × 256
Table 2. Ablation analysis results on ACDC dataset using 3 labeled cases. The bold fonts indicate the best results of the comparing variants.

Method | RV DSC (%) | RV HD95 (mm) | Myo DSC (%) | Myo HD95 (mm) | LV DSC (%) | LV HD95 (mm) | Mean DSC (%) | Mean HD95 (mm)
MT [33] | 56.77 | 37.83 | 67.76 | 16.61 | 76.46 | 24.37 | 67.00 | 26.27
CPS (MT' + CP) [35] | 61.46 | 18.59 | 70.90 | 12.54 | 78.67 | 18.73 | 70.34 | 16.62
ICT (I + MT) [16] | 57.49 | 20.28 | 71.17 | 14.93 | 79.69 | 25.09 | 69.45 | 20.10
CPICT (CP + I + MT) | 58.22 | 16.91 | 74.44 | 11.00 | 81.34 | 24.67 | 71.33 | 17.53
PICT (P + I + MT) | 63.27 | 21.40 | 77.90 | 10.63 | 77.09 | 34.97 | 72.76 | 22.33
PICT (CE) | 63.27 | 21.40 | 77.90 | 10.63 | 77.09 | 34.97 | 72.76 | 22.33
PICT (CE + Dice) | 69.08 | 21.99 | 76.77 | 21.54 | 76.44 | 40.59 | 74.10 | 28.04
PICT (Dice, Ours) | 80.06 | 15.25 | 82.42 | 7.07 | 83.39 | 25.96 | 81.95 | 16.09
Table 3. The DSC score (%) comparison of CPS, ICT, and PICT on ACDC dataset under different basic models using 7 labeled cases. The bold fonts indicate the best results of the comparing variants.

Type of Basic Models | LS | CPS | ICT | PICT (Ours)
E-Net [43] | 67.71 | 75.54 | 74.83 | 76.66
P-Net [44] | 80.71 | 83.47 | 82.83 | 83.76
U-Net [40] | 81.92 | 86.35 | 86.64 | 87.18
Table 4. Quantitative comparison of mean DSC score and HD95 of the state-of-the-art methods on ACDC under 3 labeled cases and 7 labeled cases. The bold fonts indicate the best results of the comparing variants.

Labeled Number | Method | RV DSC (%) | RV HD95 (mm) | Myo DSC (%) | Myo HD95 (mm) | LV DSC (%) | LV HD95 (mm) | Mean DSC (%) | Mean HD95 (mm)
3 cases | LS (baseline) | 48.11 | 45.76 | 62.76 | 23.27 | 72.06 | 24.87 | 60.98 | 31.30
3 cases | FS | 91.15 | 1.23 | 88.62 | 5.95 | 93.58 | 5.62 | 91.12 | 4.26
3 cases | MT [33] | 56.77 | 37.83 | 67.78 | 16.61 | 76.46 | 24.37 | 67.00 | 26.27
3 cases | UAMT [21] | 57.86 | 32.10 | 67.32 | 14.57 | 76.00 | 20.81 | 67.06 | 22.49
3 cases | URPC [45] | 63.73 | 33.13 | 69.59 | 15.95 | 79.19 | 18.61 | 70.89 | 22.56
3 cases | CPS [35] | 61.46 | 18.59 | 70.90 | 12.54 | 78.67 | 18.73 | 70.34 | 16.62
3 cases | ICT [16] | 57.48 | 20.28 | 71.17 | 14.83 | 79.69 | 25.09 | 69.45 | 20.06
3 cases | CNN-Trans [39] | 57.70 | 21.70 | 62.80 | 11.50 | 76.30 | 15.70 | 65.60 | 16.20
3 cases | PICT (Ours) | 80.06 | 15.25 | 82.41 | 7.06 | 83.39 | 25.96 | 81.95 | 16.09
7 cases | LS (baseline) | 79.42 | 8.39 | 79.61 | 3.06 | 86.75 | 18.97 | 81.93 | 13.39
7 cases | FS | 91.15 | 1.23 | 88.62 | 5.95 | 93.58 | 5.62 | 91.12 | 4.26
7 cases | MT [33] | 86.31 | 4.73 | 83.39 | 8.81 | 88.32 | 17.11 | 86.01 | 10.22
7 cases | UAMT [21] | 84.96 | 4.98 | 83.46 | 9.16 | 89.20 | 14.89 | 85.87 | 9.68
7 cases | URPC [45] | 85.77 | 4.65 | 83.79 | 7.44 | 89.08 | 7.44 | 86.21 | 6.85
7 cases | CPS [35] | 86.09 | 3.64 | 84.31 | 9.66 | 88.63 | 13.16 | 86.35 | 8.82
7 cases | ICT [16] | 86.49 | 4.48 | 84.12 | 9.27 | 89.33 | 11.38 | 86.64 | 8.37
7 cases | CNN-Trans [39] | 84.80 | 7.80 | 84.40 | 6.90 | 90.10 | 11.20 | 86.40 | 8.6
7 cases | PICT (Ours) | 86.74 | 3.22 | 85.16 | 3.24 | 89.66 | 9.93 | 87.18 | 5.46
Table 5. Comparison of DSC score (%) of CPS, ICT, and PICT on CTPelvic1k with 10 labeled images as examples. The bold fonts indicate the best results of the comparing variants.

Method | Sacrum | LH | RH | LS
CPS [35] | 95.59 | 95.75 | 96.19 | 95.39
ICT [16] | 95.98 | 96.05 | 95.82 | 95.81
PICT (Ours) | 95.92 | 96.09 | 96.44 | 96.03
Table 6. The mIoU (%) and mAcc (%) for CPS, ICT, and PICT on Multi-tissue Pelvic with different numbers of labeled cases (columns). The bold fonts indicate the best results of the comparing variants.

Metric | Method | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40
mIoU (%) | CPS [35] | 61.25 | 62.38 | 63.43 | 64.49 | 66.86 | 67.67 | 68.36 | 70.18
mIoU (%) | ICT [16] | 59.36 | 60.80 | 63.51 | 64.62 | 67.63 | 68.32 | 69.29 | 69.05
mIoU (%) | PICT (Ours) | 61.83 | 61.79 | 64.22 | 65.56 | 67.46 | 68.49 | 70.60 | 71.07
mAcc (%) | CPS [35] | 73.85 | 74.73 | 75.15 | 77.15 | 78.56 | 79.62 | 79.11 | 80.97
mAcc (%) | ICT [16] | 72.77 | 74.13 | 75.65 | 76.65 | 79.33 | 79.71 | 80.31 | 80.53
mAcc (%) | PICT (Ours) | 73.84 | 74.30 | 76.60 | 77.92 | 78.91 | 79.97 | 80.60 | 81.75
Table 7. Comparison of DSC score (%), mIoU (%), and mAcc (%) for the 6 categories with 40 labeled images as an example on Multi-tissue Pelvic. The bold fonts indicate the best results of the comparing variants.

Metric | Method | MIPC | Bone | Muscle | SAT | IMAT | IPG
DSC (%) | CPS [35] | 80.76 | 93.44 | 86.03 | 92.16 | 52.36 | 67.44
DSC (%) | ICT [16] | 78.99 | 93.58 | 86.25 | 90.97 | 52.55 | 63.67
DSC (%) | PICT (Ours) | 83.22 | 93.84 | 86.92 | 92.71 | 53.88 | 65.89
mIoU (%) | CPS [35] | 69.21 | 87.93 | 75.78 | 85.83 | 36.09 | 66.24
mIoU (%) | ICT [16] | 67.22 | 88.18 | 76.12 | 84.18 | 36.53 | 62.11
mIoU (%) | PICT (Ours) | 72.54 | 88.62 | 77.17 | 86.78 | 37.51 | 64.46
mAcc (%) | CPS [35] | 73.78 | 93.33 | 91.39 | 92.60 | 51.87 | 64.15
mAcc (%) | ICT [16] | 70.35 | 93.40 | 92.80 | 91.93 | 55.20 | 60.36
mAcc (%) | PICT (Ours) | 77.29 | 94.25 | 91.63 | 93.12 | 50.89 | 64.06
Table 8. The training time cost on the ACDC, CTPelvic1k, and Multi-tissue Pelvic datasets; the unit is minutes.

Dataset | CPS | ICT | PICT (Ours)
ACDC | 132 | 100 | 105
CTPelvic1k | 753 | 500 | 586
Multi-tissue Pelvic | 31 | 24 | 26
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
