Article

Loss Weightings for Improving Imbalanced Brain Structure Segmentation Using Fully Convolutional Networks

1 Department of Biomedical Information, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University, Tokyo 101-0062, Japan
2 Department of Neurosurgery, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan
* Authors to whom correspondence should be addressed.
Healthcare 2021, 9(8), 938; https://doi.org/10.3390/healthcare9080938
Submission received: 29 May 2021 / Revised: 12 July 2021 / Accepted: 22 July 2021 / Published: 26 July 2021

Abstract

Brain structure segmentation on magnetic resonance (MR) images is important for various clinical applications. It can be performed automatically by using fully convolutional networks, but these networks suffer from the class imbalance problem. To address this problem, we investigated how loss weighting strategies work for brain structure segmentation tasks with different class imbalance situations on MR images. In this study, we adopted the segmentation of the cerebrum, cerebellum, brainstem, and blood vessels from MR cisternography and angiography images as the target segmentation tasks. We used a U-net architecture with cross-entropy and Dice loss functions as a baseline and evaluated the effect of the following loss weighting strategies: inverse frequency weighting, median inverse frequency weighting, focal weighting, distance map-based weighting, and distance penalty term-based weighting. In the experiments, the Dice loss function with focal weighting showed the best performance, with a high average Dice score of 92.8% in the binary-class segmentation tasks, while the cross-entropy loss function with distance map-based weighting achieved a Dice score of up to 93.1% in the multi-class segmentation tasks. The results suggested that the distance map-based and the focal weightings could boost the performance of the cross-entropy and Dice loss functions, respectively, in class-imbalanced segmentation tasks.

1. Introduction

Brain structure segmentation on magnetic resonance (MR) images is an essential technique for measuring, visualizing, and evaluating brain morphology. It is used for diagnosis support of psychiatric and neurodegenerative diseases, brain development analysis, and surgical planning and navigation [1,2]. In practice it is performed manually, but manual segmentation is very laborious and is subject to intra- and inter-operator variability [1]. Thus, automatic and accurate segmentation of brain structures is desirable. The most successful state-of-the-art approach for automated segmentation is the fully convolutional network (FCN) [3], which enables pixel-wise segmentation in an end-to-end manner. Since it was proposed by Long et al. [3] in 2015, it has been improved for medical image segmentation [4,5] and applied to brain structure segmentation tasks [6]. However, it is often biased towards the majority (large-size) classes and suffers from low segmentation performance on the minority (small-size) classes due to a high imbalance between background and foreground classes in medical images. There are two types of approaches to this problem, which is commonly known as class imbalance: data-level approaches and algorithm-level approaches [7,8].
Data-level approaches mainly alleviate the class imbalance by undersampling the majority classes [9] and oversampling the minority classes [10]. However, undersampling the majority classes limits the information available for training, and oversampling the minority classes can lead to overfitting. Algorithm-level approaches, on the other hand, address the class imbalance by improving the training algorithm, most commonly the loss function. A loss function can be improved either by basing it on a new evaluation metric or by weighting it to enhance the importance of minority classes in the training process. Thus far, various types of loss functions [11,12,13,14,15,16,17] and loss weighting strategies [4,18,19,20,21,22,23,24,25] have been proposed to alleviate the class imbalance problem. They can be applied to any medical image segmentation task in a plug-and-play fashion [26]. However, it is unclear which loss function and weighting strategy should be used in which situation. Thus, it is important to identify weighted loss functions that can enhance the capability of FCNs in brain structure segmentation tasks.
In related works, Ma et al. [26] performed a systematic study of the utility of 20 loss functions on typical segmentation tasks using public datasets and evaluated the performance of these loss functions in imbalanced segmentation tasks. Moreover, Ma et al. [27] compared and evaluated boundary-based loss functions, which minimize the distance between the boundaries of ground-truth and predicted segmentation labels, in an empirical study. Yeung et al. [28] focused on compound loss functions, combining Dice and cross-entropy-based losses with the modulating factor of the focal loss function [19], and evaluated which compound loss functions were effective in handling class imbalance problems. As shown in these related works, the effect of loss functions varies according to the situation of the segmentation task (e.g., the medical images used for segmentation, the number and size of segmentation target objects, and the degree of class imbalance). However, although the related works evaluated the accuracies of the loss functions, they did not discuss how the loss functions work for different segmentation targets.
We test the effect of weighted loss functions in different situations of imbalanced brain structure segmentation tasks, including binary- and multi-class segmentation tasks. In particular, we focus on weighting strategies of loss functions defined based on class frequency, predictive probability, and distance map, and aim to investigate and discuss how the loss weightings affect the performance of FCNs in brain structure segmentation tasks with different class imbalances.

2. Materials and Methods

2.1. Segmentation Target

In this study, we adopted a segmentation task of brain structures, including the cerebrum, cerebellum, brainstem, and blood vessels, on MR images. As for MR images, we used MR cisternography (MRC) and MR angiography (MRA) images (Figure 1). MRC images, i.e., heavily T2-weighted images, can clearly represent the brain surface and cerebral sulci due to the high intensity of cerebrospinal fluid, whereas MRA images can highlight blood vessels. In our group, we used MRC and MRA as clinical routine MR sequences because of the ease of segmentation processing, and segmented brain parenchyma on MRC images and blood vessels on MRA images for the planning and navigation of neurosurgeries. The brain structures have different features in the MR images. The cerebrum is the largest part of the brain and has a low-level foreground–background imbalance in the MRC images. Its surface, i.e., the cerebral sulci, has a somewhat complex shape. The cerebellum is the second largest part of the brain and is located under the cerebrum. It can be considered a middle-level imbalanced target. The brainstem is a small part of the brain and is located between the cerebrum and the spinal cord. It has a high foreground–background imbalance. The brain parenchyma, i.e., the cerebrum, cerebellum, and brainstem, appears in much the same location in every MRC image volume, although its size and shape differ between individuals. Its surface can be clearly visualized in MRC images due to the high signal intensity of the cerebrospinal fluid around it. On the other hand, blood vessels have varying locations and shapes and appear as small white spots in MRA images. Thus, they are considered a hard-to-segment target with a high foreground–background imbalance, although they are clearly visualized in MRA images. We used these segmentation targets to fundamentally evaluate the effect of loss weightings on the FCN-based segmentation of different brain structures.

2.2. Network Architecture

As an FCN architecture, we adopted a 2D U-net [4], which is one of the most popular FCN architectures for medical image segmentation. Figure 2 shows the network architecture used in this study. The U-net architecture, which consists of a symmetrical encoder–decoder architecture with skip connections, has often been adopted as a baseline FCN architecture for various medical image segmentation tasks. Many variants of the U-net architecture have been proposed for different medical image segmentation tasks, and a 3D U-net architecture [5] has been introduced for volumetric medical image segmentation. However, training the 3D U-net on full input MR image volumes is usually impractical due to the memory limitations of the graphics processing unit (GPU). In the case of the MR image volumes used in this study, it would require more than 150 GB of GPU memory, which far exceeds the memory of prevalent GPUs. To overcome the memory limitation, approaches that train 3D FCNs on resized or cropped MR image volumes have been proposed. However, resizing MR image volumes to a smaller size may cause the loss of information on segmentation targets, whereas a patch-based approach [5,29] that crops MR image volumes requires the tuning of additional hyperparameters (e.g., patch size), which may affect segmentation performance. Thus, in this study, we decided to use the simple 2D U-net architecture to reduce other factors affecting the results as much as possible.

2.3. Loss Functions

As shown in the related works [26,27,28], loss functions are an important factor for handling the class imbalance. Existing loss functions for FCN-based segmentation can be divided into four categories: distribution-based loss, region-based loss, boundary-based loss, and compound loss [26]. Distribution-based loss functions measure the dissimilarity between two distributions based on cross-entropy. Region-based loss functions quantify the mismatch or the overlap between two regions. The Dice loss function [11,12] is the most common loss function in this category. Boundary-based loss functions measure the distance between two boundaries. Euclidean distance [16] or Hausdorff distance [17] metrics can be used for loss functions in this category. Compound loss functions are defined as combinations of the distribution-, region-, and boundary-based loss functions [15,28,30,31,32].
As described in [26], most of the distribution-based and region-based loss functions can be considered variants of the cross-entropy and Dice loss functions, respectively. Moreover, boundary-based loss functions, which are formally defined in a region-based way, have similarities to the Dice loss function. Therefore, as most of the loss functions are based on the cross-entropy and Dice loss functions, we decided to use these two loss functions in this study. The cross-entropy loss $L_{\mathrm{CE}}$ and the Dice loss $L_{\mathrm{Dice}}$ are defined as
$$L_{\mathrm{CE}} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\,\log p_{i,c},$$
$$L_{\mathrm{Dice}} = 1 - \frac{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\,p_{i,c}}{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\,p_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N}(1-g_{i,c})\,p_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\,(1-p_{i,c})} = 1 - \frac{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\,p_{i,c}}{\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N} p_{i,c}},$$
where $g_{i,c}$ and $p_{i,c}$ are the ground-truth label and the predicted segmentation probability of class $c$ at pixel $i$, respectively. $N$ and $C$ are the numbers of pixels and classes in images for a training dataset, respectively.
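As a minimal sketch of the two baseline losses defined above, the following plain-Python functions evaluate them on a toy example of three pixels and two classes. The function names, the nested-list layout `g[i][c]` / `p[i][c]`, and the small constant `eps` (added only to avoid `log(0)` and division by zero) are assumptions made for illustration; they are not taken from the implementation used in this study.

```python
import math

def cross_entropy_loss(g, p, eps=1e-7):
    """L_CE = -(1/N) * sum_c sum_i g[i][c] * log(p[i][c]); eps is an assumed guard."""
    n_pixels = len(g)
    total = 0.0
    for g_i, p_i in zip(g, p):
        for g_ic, p_ic in zip(g_i, p_i):
            total += g_ic * math.log(max(p_ic, eps))
    return -total / n_pixels

def dice_loss(g, p, eps=1e-7):
    """L_Dice = 1 - 2*sum(g*p) / (sum(g) + sum(p)); eps is an assumed guard."""
    intersection = sum(g_ic * p_ic
                       for g_i, p_i in zip(g, p)
                       for g_ic, p_ic in zip(g_i, p_i))
    denominator = sum(map(sum, g)) + sum(map(sum, p))
    return 1.0 - 2.0 * intersection / (denominator + eps)

# Toy data: three pixels, two classes (one-hot labels and softmax-like outputs).
g = [[1, 0], [0, 1], [0, 1]]
p = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

ce_value = cross_entropy_loss(g, p)
dice_value = dice_loss(g, p)
```

In a real training pipeline these sums would be computed with tensor operations of the deep learning framework rather than Python loops; the loops are used here only to mirror the indices of the equations.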

2.4. Loss Weighting Strategies

In highly imbalanced segmentation tasks, FCNs are likely to ignore small-size foreground classes in the training process, which results in low segmentation accuracy for these classes. This so-called class imbalance problem can be alleviated by weighting the loss of small-size foreground classes. In this study, we adopted five loss weighting strategies defined based on three different factors: class frequency, predictive probability, and distance map. Table 1 gives an overview of the weighted loss functions used in this study. The details of the loss weightings are described below.

2.4.1. Inverse Frequency Weighting

Inverse frequency weighting [24], which is one of the most common weighting strategies, is a method for weighting each class based on the class frequency. The weight is inversely proportional to the number of pixels: the smaller the target objects are, the higher their weight becomes. The inverse frequency weight $W_c^{\mathrm{Inverse}}$ in class $c$ is defined by
$$W_c^{\mathrm{Inverse}} = \frac{1}{\left(\sum_{i=1}^{N} g_{i,c}\right)^{\alpha}},$$
where $\alpha$ is a power parameter. In this study, we used $\alpha = 1$ for the cross-entropy loss function and $\alpha = 2$ for the Dice loss function. The Dice loss function weighted by the inverse of the squared frequency is known as the generalized Dice loss function [24].
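The inverse frequency weight can be illustrated with a short sketch. The function name and the list-of-one-hot-rows layout are hypothetical, and the code assumes that every class occurs at least once (otherwise the pixel count would be zero and the division would fail).

```python
def inverse_frequency_weights(g, alpha=1.0):
    """Return W_c = 1 / (sum_i g[i][c]) ** alpha for each class c.

    Assumes each class has at least one labeled pixel.
    """
    n_classes = len(g[0])
    pixel_counts = [sum(g_i[c] for g_i in g) for c in range(n_classes)]
    return [1.0 / (count ** alpha) for count in pixel_counts]

# Toy labels: 8 background pixels (class 0) and 2 foreground pixels (class 1).
g = [[1, 0]] * 8 + [[0, 1]] * 2

weights_ce = inverse_frequency_weights(g, alpha=1.0)    # alpha used with cross-entropy
weights_dice = inverse_frequency_weights(g, alpha=2.0)  # alpha used with Dice
```

With these toy labels the foreground class receives a weight four times larger than the background for $\alpha = 1$ and sixteen times larger for $\alpha = 2$, which shows how quickly the squared variant amplifies small classes.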

2.4.2. Inverse Median Frequency Weighting

Inverse median frequency weighting [18] is a frequency-based weighting, as is the inverse frequency weighting. The inverse median frequency weight $W_c^{\mathrm{Median}}$ is computed as
$$F_c = \frac{\sum_{i=1}^{N} g_{i,c}}{N},$$
$$W_c^{\mathrm{Median}} = \frac{\mathrm{median}(F_c)}{F_c},$$
where $F_c$ is the normalized frequency of class $c$ and $\mathrm{median}(\cdot)$ denotes a function returning the median value of the input data.
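A small sketch of the median-based weight follows. It assumes that the median is taken over the normalized frequencies of all classes and that every class is present; the function name and data layout are illustrative only.

```python
from statistics import median

def median_frequency_weights(g):
    """Return W_c = median(F) / F_c, where F_c is the fraction of pixels in class c.

    Assumes the median is taken over all class frequencies and that no class is empty.
    """
    n_pixels = len(g)
    n_classes = len(g[0])
    frequencies = [sum(g_i[c] for g_i in g) / n_pixels for c in range(n_classes)]
    median_frequency = median(frequencies)
    return [median_frequency / f for f in frequencies]

# Toy labels: 10 pixels split 6 / 3 / 1 over three classes.
g = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 3 + [[0, 0, 1]]

weights = median_frequency_weights(g)
```

Here the class whose frequency equals the median keeps a weight of about 1, the larger class is down-weighted, and the smallest class is up-weighted, so the weights stay in a more moderate range than with the plain inverse frequency.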

2.4.3. Focal Weighting

Focal weighting [19] is a method for putting more focus on hard-to-classify class pixels based on predictive probability. It gives a higher weight to class pixels with lower prediction confidence and reduces the loss assigned to well-classified pixels during the training process. The focal weighting $W_{i,c}^{\mathrm{Focal}}$ is defined by
$$W_{i,c}^{\mathrm{Focal}} = (1 - p_{i,c})^{\gamma},$$
where $\gamma$ is called a focusing parameter. In this study, we used $\gamma = 2$ for the cross-entropy loss function as in [19] and $\gamma = 1$ for the Dice loss function as in [25]. Note that, for simplification, we did not consider the balancing factor $\alpha$ used in [19].
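The effect of the focusing parameter can be sketched as follows. The code multiplies each cross-entropy term by the focal weight, which is the usual focal loss form without the balancing factor; whether this matches the exact expression in Table 1 is an assumption, and the names and `eps` guard are hypothetical.

```python
import math

def focal_weight(p_ic, gamma=2.0):
    """W = (1 - p) ** gamma: close to 0 for confident pixels, close to 1 for uncertain ones."""
    return (1.0 - p_ic) ** gamma

def focal_cross_entropy(g, p, gamma=2.0, eps=1e-7):
    """Cross-entropy with each term scaled by the focal weight (assumed form)."""
    n_pixels = len(g)
    total = 0.0
    for g_i, p_i in zip(g, p):
        for g_ic, p_ic in zip(g_i, p_i):
            total += focal_weight(p_ic, gamma) * g_ic * math.log(max(p_ic, eps))
    return -total / n_pixels

# Toy data: three pixels, two classes.
g = [[1, 0], [0, 1], [0, 1]]
p = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

plain_ce = focal_cross_entropy(g, p, gamma=0.0)  # gamma = 0 recovers plain cross-entropy
focal_ce = focal_cross_entropy(g, p, gamma=2.0)
```

With $\gamma = 2$, the pixel predicted with probability 0.9 contributes with a weight of 0.01, while the pixel predicted with probability 0.6 keeps a weight of 0.16, so the loss is dominated by the less confident pixel.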

2.4.4. Distance Transform Map-Based Weighting

The distance transform map (DTM), which is computed as the Euclidean distance from the boundary of target objects, is used in the distance-based loss functions [16,17]. Figure 3b shows an example of a DTM. DTM-based weighting can be performed by multiplying prediction errors by the DTM. This weighting assigns higher weights to the pixels which are more distant from the boundary of ground-truth labels. Here, we defined the DTM-based weight $W_c^{\mathrm{DTM}}$ as
$$DTM_c = \begin{cases} 0, & x \in \partial G_c \\ \inf_{y \in \partial G_c} \lVert x - y \rVert_2, & \text{others} \end{cases}$$
$$W_c^{\mathrm{DTM}} = 1 + DTM_c,$$
where $DTM_c$ is the distance transform map in class $c$, and $\partial G_c$ denotes the boundary of the ground-truth label in class $c$. $\lVert x - y \rVert_2$ denotes the Euclidean distance between pixels $x$ and $y$ in images.
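The DTM-based weight can be illustrated on a tiny binary mask with a brute-force search. Several details are assumptions made only for this sketch: the boundary is taken to be the foreground pixels that have a background (or out-of-image) 4-neighbour, distances are measured in pixel units, and the mask is assumed to contain at least one foreground pixel.

```python
import math

def boundary_pixels(mask):
    """Foreground pixels with at least one 4-neighbour that is background
    or lies outside the image (an assumed boundary definition)."""
    height, width = len(mask), len(mask[0])
    boundary = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or not mask[rr][cc]:
                    boundary.append((r, c))
                    break
    return boundary

def dtm_weights(mask):
    """W = 1 + DTM, where DTM is the distance to the nearest boundary pixel
    (0 on the boundary itself). Requires a non-empty foreground."""
    boundary = boundary_pixels(mask)
    height, width = len(mask), len(mask[0])
    return [[1.0 + min(math.hypot(r - br, c - bc) for br, bc in boundary)
             for c in range(width)]
            for r in range(height)]

# Toy mask: a 3 x 3 foreground block centred in a 5 x 5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
weights = dtm_weights(mask)
```

The brute-force search is quadratic in the number of pixels and is meant only to make the definition concrete; for full-size slices a dedicated distance transform routine would normally be used instead.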

2.4.5. Distance Penalty Term-Based Weighting

The distance penalty term (DPT) is a distance map for weighting hard-to-segment boundary regions [20], in contrast to the DTM. Let $DPT_c$ be the distance penalty term in class $c$. Then, $DPT_c$ is defined as the inverse of $DTM_c$, and thus, it puts higher weights on the pixels closer to the boundary of ground-truth labels, in contrast with the DTM-based weighting. Figure 3c shows an example of a DPT. As with the DTM-based weighting, DPT-based weighting penalizes prediction errors with the DPT. The DPT-based weight $W_c^{\mathrm{DPT}}$ is defined by
$$W_c^{\mathrm{DPT}} = 1 + DPT_c.$$
We used the cross-entropy and Dice loss functions weighted by the above five weighting strategies. Table 1 summarizes the weighted loss functions used in this study. As for the weighted Dice loss functions, $L_{\mathrm{Dice}}^{\mathrm{Inverse}}$, $L_{\mathrm{Dice}}^{\mathrm{Median}}$, and $L_{\mathrm{Dice}}^{\mathrm{Focal}}$ put their weights on both the numerator and denominator terms as in [24], while $L_{\mathrm{Dice}}^{\mathrm{DTM}}$ and $L_{\mathrm{Dice}}^{\mathrm{DPT}}$ assign their weights to the false positive (i.e., $\sum_{c=1}^{C}\sum_{i=1}^{N}(1-g_{i,c})\,p_{i,c}$) and false negative (i.e., $\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\,(1-p_{i,c})$) terms in the denominator.
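To make the phrase "weights on both the numerator and denominator terms" concrete, the sketch below applies per-class weights to the Dice loss in the generalized Dice form of [24]. It is an illustration under that assumption and does not reproduce the exact expressions of Table 1; the DTM- and DPT-weighted variants, which act on the false positive and false negative terms, are not covered. Function and variable names are hypothetical.

```python
def class_weighted_dice_loss(g, p, class_weights, eps=1e-7):
    """1 - 2 * sum_c w_c * sum_i g*p / sum_c w_c * sum_i (g + p).

    The same weight w_c multiplies the numerator and denominator terms of class c.
    """
    numerator = 0.0
    denominator = 0.0
    for c, w_c in enumerate(class_weights):
        overlap = sum(g_i[c] * p_i[c] for g_i, p_i in zip(g, p))
        total = sum(g_i[c] + p_i[c] for g_i, p_i in zip(g, p))
        numerator += w_c * overlap
        denominator += w_c * total
    return 1.0 - 2.0 * numerator / (denominator + eps)

# Toy data: three pixels, two classes; class 0 has 1 pixel, class 1 has 2 pixels.
g = [[1, 0], [0, 1], [0, 1]]
p = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

unweighted = class_weighted_dice_loss(g, p, [1.0, 1.0])
# Inverse squared pixel counts (alpha = 2): 1 / 1**2 and 1 / 2**2.
weighted = class_weighted_dice_loss(g, p, [1.0, 0.25])
```

With unit weights the function reduces to the plain Dice loss, which is a convenient sanity check when experimenting with other weight vectors.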

2.5. Evaluation of Loss Weighting Strategies

2.5.1. Dataset

We used the MR images of 84 patients with unruptured cerebral aneurysms, which were acquired with MRC and time-of-flight MRA sequences on a 3.0 T scanner (Signa HDxt 3.0 T, GE Healthcare, WI, USA) at the University of Tokyo Hospital, Tokyo, Japan. The MR image volumes had 144–190 slices of 512 × 512 pixels with an in-plane resolution of 0.47 × 0.47 mm² and a slice thickness of 1.00 mm. As a preprocessing step, the MR images were normalized to have a mean of 0 and a standard deviation of 1. The dataset consisting of 84 cases was divided into the following three subsets: training (60 cases), validation (4 cases), and test (20 cases).
The ground-truth-labeled images for training and testing were manually created by using open-source software for medical image processing (3D Slicer, Brigham and Women’s Hospital, MA, USA); the cerebrum, cerebellum, and brainstem were annotated on MRC images, while blood vessels were annotated on MRA images. The manual annotation was performed by a biomedical engineer and a neurosurgeon. Table 2 shows the frequency ($F_c = \sum_{i=1}^{N} g_{i,c} / N$) of the foreground classes (the cerebrum, cerebellum, brainstem, and blood vessels) in the training subsets. The cerebrum was the most frequent of the foreground classes, followed by the cerebellum, brainstem, and blood vessels.

2.5.2. Segmentation Tasks

The goal of this work was to study the effect of loss weightings in different class imbalance situations. Thus, we evaluated the effect of loss weightings on both binary- and multi-class segmentation tasks. Table 3 gives an overview of the training datasets in the binary- and multi-class segmentation tasks.
Binary-class segmentation tasks: To test how the effect of loss weightings varies according to the size of a foreground class in binary-class segmentation tasks, we evaluated the segmentation performance on the binary-class segmentation task for each of the foreground classes. Note that the binary-class segmentation tasks for the cerebrum, cerebellum, and brainstem were performed using MRC images, whereas the binary-class segmentation for blood vessels was performed using MRA images.
Multi-class segmentation tasks: To test how the effect of loss weightings varies according to the imbalance of foreground classes in multi-class segmentation tasks, we evaluated the segmentation performance on the three-, four-, and five-class segmentation tasks; the three, four, and five classes include the foreground classes of (cerebrum, blood vessels), (cerebrum, cerebellum, blood vessels), and (cerebrum, cerebellum, brainstem, blood vessels), respectively. Note that the multi-class segmentation tasks were performed using multi-modal MR images which included MRC and MRA images.

2.5.3. Network Training Procedure

In the binary- and multi-class segmentation tasks, we trained the FCN model on each training dataset using the cross-entropy and Dice loss functions with or without the loss weightings. The FCN model was trained from scratch for 30 epochs with the Adam optimization algorithm [33] (learning rate $\alpha \in \{1\mathrm{e}{-3}, 1\mathrm{e}{-4}, 1\mathrm{e}{-5}\}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 1\mathrm{e}{-7}$) and a batch size of 5 in each training process. For testing, we used the best trained model among the settings {learning rate, epoch} = {1e-3, 10}, {1e-3, 20}, {1e-3, 30}, {1e-4, 10}, {1e-4, 20}, {1e-4, 30}, {1e-5, 10}, {1e-5, 20}, and {1e-5, 30}, because the conditions for good training convergence, especially the learning rate and the number of epochs, differed according to the loss weighting.
The FCN model with the weighted loss functions was implemented by using Keras with a TensorFlow backend, and the training and prediction were performed on an Ubuntu 16.04 PC (CPU: Intel Xeon Gold 5222 3.80 GHz, RAM: 384 GB) with NVIDIA Quadro RTX8000 GPU cards for deep learning.
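The selection among the nine {learning rate, epoch} settings described in this subsection can be sketched as a small grid search. The criterion used to decide which trained model is "best" is not specified in the text, so the scoring function below is purely hypothetical (for instance, it could return a validation Dice score for a saved checkpoint).

```python
from itertools import product

LEARNING_RATES = (1e-3, 1e-4, 1e-5)
CHECKPOINT_EPOCHS = (10, 20, 30)

def candidate_settings():
    """Enumerate the nine (learning rate, epoch) combinations."""
    return list(product(LEARNING_RATES, CHECKPOINT_EPOCHS))

def select_best_setting(score_fn):
    """Return the setting with the highest score.

    score_fn(learning_rate, epoch) is a hypothetical callback; higher is assumed better.
    """
    return max(candidate_settings(), key=lambda setting: score_fn(*setting))

# Made-up scores for illustration only; they are not results from this study.
def toy_score(learning_rate, epoch):
    return 1.0 if (learning_rate, epoch) == (1e-4, 20) else 0.0

best_setting = select_best_setting(toy_score)
```

Keeping the candidate grid explicit makes it easy to report which setting was finally used for each weighted loss function.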

2.5.4. Evaluation Metrics

To quantitatively evaluate the segmentation performance, we adopted the Dice similarity coefficient (DSC), surface DSC (SDSC) [34], average symmetric surface distance (ASD), and Hausdorff distance (HD). The DSC and SDSC, overlap-based metrics, evaluate the region overlaps; the DSC measures the overlap of whole regions between ground-truth and predicted labels, whereas the SDSC measures the overlap of the two surface regions. The DSC was calculated by
$$\mathrm{DSC} = \frac{2\,|G \cap P|}{|G| + |P|},$$
where $G$ and $P$ denote the regions of ground-truth and predicted labels, respectively. The SDSC was calculated by
$$\mathrm{SDSC} = \frac{|\partial G \cap B_P^{(\tau)}| + |\partial P \cap B_G^{(\tau)}|}{|\partial G| + |\partial P|},$$
where $\partial G$ and $\partial P$ denote the boundaries of ground-truth and predicted labels, respectively. $B_G^{(\tau)}, B_P^{(\tau)} \subset \mathbb{R}^3$ are the border regions of ground-truth and predicted label surfaces at tolerance $\tau$, which are defined as $B_G^{(\tau)} = \{x \in \mathbb{R}^3 \mid \exists y \in \partial G,\ \lVert x - y \rVert \le \tau\}$ and $B_P^{(\tau)} = \{x \in \mathbb{R}^3 \mid \exists y \in \partial P,\ \lVert x - y \rVert \le \tau\}$, respectively [26,34]. We here used $\tau = 1$ mm as in [26].
The ASD and HD, boundary distance-based metrics, evaluate the surface errors; the ASD measures the average surface distance between ground-truth and predicted labels, whereas the HD measures the maximum surface distance between them. The ASD was calculated by
$$\mathrm{ASD} = \frac{\sum_{x \in \partial G} D(x, \partial P) + \sum_{y \in \partial P} D(y, \partial G)}{|\partial G| + |\partial P|},$$
where $D(a, A)$ denotes the minimum Euclidean distance from a voxel $a$ to a set of voxels $A$. The HD was calculated by
$$\mathrm{HD} = \max\left\{\max_{x \in \partial G} D(x, \partial P),\ \max_{y \in \partial P} D(y, \partial G)\right\}.$$
As for the HD, the 95th-percentile HD (95HD) was used in this study, as in [27].
As the segmentation accuracy increases, the overlap-based metrics approach 1 and the boundary distance-based metrics approach 0. The evaluation metrics were implemented using the open-source code available at [35].
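The following sketch illustrates the DSC, ASD, and HD definitions on small sets of pixel coordinates. It is not the open-source implementation referenced in [35]: regions and boundaries are passed as hypothetical lists of coordinate tuples, distances are in pixel units rather than millimetres, and the plain maximum HD is shown instead of the 95th-percentile variant.

```python
import math

def dsc(region_g, region_p):
    """DSC = 2 |G ∩ P| / (|G| + |P|) for regions given as coordinate collections."""
    set_g, set_p = set(region_g), set(region_p)
    return 2.0 * len(set_g & set_p) / (len(set_g) + len(set_p))

def min_distance(point, points):
    """D(a, A): minimum Euclidean distance from a point to a set of points."""
    return min(math.dist(point, other) for other in points)

def asd(boundary_g, boundary_p):
    """Average symmetric surface distance between two boundary point sets."""
    total = (sum(min_distance(x, boundary_p) for x in boundary_g)
             + sum(min_distance(y, boundary_g) for y in boundary_p))
    return total / (len(boundary_g) + len(boundary_p))

def hd(boundary_g, boundary_p):
    """Hausdorff distance: the larger of the two directed maximum distances."""
    return max(max(min_distance(x, boundary_p) for x in boundary_g),
               max(min_distance(y, boundary_g) for y in boundary_p))

# Toy regions: two 2 x 2 squares that overlap in one column.
region_g = [(0, 0), (0, 1), (1, 0), (1, 1)]
region_p = [(0, 1), (1, 1), (0, 2), (1, 2)]
# Toy boundary point sets, chosen by hand for easy checking.
boundary_g = [(0, 0), (0, 1)]
boundary_p = [(0, 1), (0, 3)]

dsc_value = dsc(region_g, region_p)
asd_value = asd(boundary_g, boundary_p)
hd_value = hd(boundary_g, boundary_p)
```

In the toy example a single distant boundary point determines the HD but is averaged out in the ASD, which is the sensitivity to outliers that the 95th-percentile variant is meant to reduce.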
Furthermore, we used a rank score, which was defined based on [36], to comprehensively evaluate which loss weightings worked well based on the above metrics, as in [26]. The rank score was computed according to the following steps:
Step 1.
Performance assessment per case: compute the metrics $m_i(loss_j, class_k, case_l)$ $(i = 1, \ldots, N_m)$ of all loss functions $loss_j$ $(j = 1, \ldots, 12)$ for all classes $class_k$ $(k = 1, \ldots, N_c)$ in all test cases $case_l$ $(l = 1, \ldots, 20)$, where $N_m$ and $N_c$ are the numbers of metrics and classes, respectively. Note that in this case, we used four metrics $m_i \in \{\mathrm{DSC}, \mathrm{SDSC}, \mathrm{ASD}, 95\mathrm{HD}\}$ and a total of twelve loss functions, i.e., the cross-entropy and Dice loss functions with no weighting and with the Inverse, Median, Focal, DTM, and DPT weightings.
Step 2.
Statistical tests: perform Wilcoxon signed-rank pairwise statistical tests between all loss functions with the values $m_i(loss_j, class_k, case_l) - m_i(loss_{j'}, class_k, case_l)$ $(j' \neq j)$.
Step 3.
Significance scoring: compute a significance score $s_{ik}(loss_j)$ for loss functions $loss_j$, classes $class_k$, and metrics $m_i$. $s_{ik}(loss_j)$ equals the number of loss functions performing significantly worse than $loss_j$ according to the statistical tests ($p < 0.05$, not adjusted for multiplicity).
Step 4.
Rank score computing: compute the final rank score $R(loss_j)$ of each loss function from the mean significance score of all classes and metrics in each of the binary- and multi-class segmentation tasks by the following equation:
$$R(loss_j) = \frac{1}{N_m \times N_c}\sum_{i=1}^{N_m}\sum_{k=1}^{N_c} s_{ik}(loss_j).$$
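Steps 3 and 4 above can be sketched in a few lines once the pairwise p-values of Step 2 are available (they could be obtained, for example, with `scipy.stats.wilcoxon`). The data layout is hypothetical: `p_values[j][j2]` holds the p-value of the test between losses `j` and `j2`, and `is_better[j][j2]` states whether loss `j` performed better than loss `j2`. How the direction of the difference is decided is not specified in the text and is treated as an input here.

```python
def significance_scores(p_values, is_better, alpha=0.05):
    """Step 3: s[j] = number of losses that perform significantly worse than loss j
    for one (metric, class) pair."""
    n_losses = len(p_values)
    return [sum(1 for j2 in range(n_losses)
                if j2 != j and is_better[j][j2] and p_values[j][j2] < alpha)
            for j in range(n_losses)]

def rank_scores(scores_per_metric_and_class):
    """Step 4: R[j] = mean of s_ik[j] over all (metric, class) pairs."""
    n_pairs = len(scores_per_metric_and_class)
    n_losses = len(scores_per_metric_and_class[0])
    return [sum(s[j] for s in scores_per_metric_and_class) / n_pairs
            for j in range(n_losses)]

# Made-up example with three losses and one (metric, class) pair.
p_values = [[1.00, 0.01, 0.20],
            [0.01, 1.00, 0.03],
            [0.20, 0.03, 1.00]]
is_better = [[False, True, True],
             [False, False, False],
             [False, True, False]]

s_first = significance_scores(p_values, is_better)  # loss 0 beats only loss 1 significantly
s_second = [2, 0, 1]                                 # scores assumed for a second pair
final_scores = rank_scores([s_first, s_second])
```

A higher rank score therefore means that a loss function was significantly better than more of its competitors, averaged over all metrics and classes.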

3. Results

We compared the results of loss weightings (inverse frequency weighting (Inverse), inverse median frequency weighting (Median), focal weighting (Focal), distance transform map-based weighting (DTM), and distance penalty term-based weighting (DPT)) with those of no weighting (N/A). The statistical difference between N/A and each loss weighting was evaluated by the Wilcoxon signed-rank test. A p-value less than 0.05 was considered significant. Subsequently, we comprehensively evaluated the effect of loss weightings by using the rank scores.

3.1. Binary-Class Segmentation Tasks

Table 4 summarizes all the results in the binary-class segmentation tasks. Figure 4 shows the violin plots of the Dice scores. As for the cross-entropy loss function, Inverse and Median provided worse results than N/A in all the segmentation tasks. Focal, DTM, and DPT tended to improve the surface accuracy in the highly imbalanced segmentation tasks (i.e., segmentation of the brainstem and blood vessels), although the improvement was not statistically significant. As for the Dice loss function, Inverse and Median significantly improved the segmentation accuracy in the highly imbalanced segmentation tasks, compared with N/A. Focal tended to provide better results than N/A in all the binary-class segmentation tasks. The distance map-based weightings (i.e., DTM and DPT) worked well in the segmentation of brain parenchyma, but they were ineffective in the segmentation of blood vessels.
Figure 5 visualizes an example of the segmentation results of blood vessels, which are the highly imbalanced class, in the binary-class segmentation task. As for the cross-entropy loss function, N/A had difficulty in segmenting the upper blood vessels. Both Inverse and Median allowed the FCN to extract most of the upper blood vessels which N/A failed to segment, but clearly increased over-extraction. Focal provided almost the same result as N/A. Both DTM and DPT extracted a wider region of blood vessels than N/A. As for the Dice loss function, N/A had false negatives in the upper blood vessels, as with the cross-entropy loss function. It also produced a few more false positives. The class frequency-based weightings, especially Inverse, reduced the false positives as well as the false negatives. Focal provided better results than N/A, although the improvement was smaller than that of Inverse. The results of the distance map-based weightings, especially DPT, were worse than those of N/A.

3.2. Multi-Class Segmentation Tasks

Table 5 summarizes all the results in the multi-class segmentation tasks. Figure 6 shows the violin plots of the Dice scores. As for the cross-entropy loss function, Inverse and Median worsened the results in all the multi-class segmentation tasks, as in the binary-class segmentation tasks. The results of Focal, especially the surface accuracies, were equivalent to or better than those of N/A in almost all the tasks. Among the distance map-based weightings, DPT worked well for improving the segmentation accuracy. As for the Dice loss function, Inverse and Median significantly improved the segmentation accuracy of blood vessels, which were a very highly imbalanced class, in all the multi-class segmentation tasks. However, Inverse also significantly worsened the segmentation accuracy of the cerebrum and cerebellum, which were relatively large-size targets. Focal provided better results than N/A for almost all the segmentation targets. The distance map-based weightings showed inconsistent results between the multi-class segmentation tasks.
Figure 7 visualizes an example of the segmentation results in the five-class segmentation task. It shows the false positive and false negative labels as well as the predicted labels. False positives were likely to appear around the surface of the cerebrum, cerebellum, and brainstem, while false negatives tended to appear in the upper part of blood vessels. As for the cross-entropy loss function, Inverse and Median reduced the false negatives, but they increased the false positives to an even greater extent. Focal worked well for reducing the false positives, although it did not reduce the false negatives. The results of the distance map-based weightings showed that DPT was slightly effective in reducing the false positives and false negatives. As for the Dice loss function, Inverse reduced the false negatives in blood vessels, although it failed to segment the whole cerebrum. Median reduced the false negatives in blood vessels, as did Inverse. Focal slightly reduced the false positives. DTM and DPT seemed to provide almost the same results as N/A.

3.3. Rank Scoring

Table 6 shows the ranking results of loss weightings in the binary- and multi-class segmentation tasks. The distance map-based weightings for the cross-entropy loss function and the predictive probability-based weighting for the Dice loss function tended to have high rank scores in both the binary- and multi-class segmentation tasks. In the binary-class segmentation tasks, the Dice loss function with Focal showed the best ranking result. It obtained a high average DSC and SDSC of 92.8% and 93.3%, respectively. Compared with no weighting, it improved the DSC and SDSC values of all tasks by 0.2–8.1% and 0.5–12.5%, respectively. In the multi-class segmentation tasks, the cross-entropy loss function with DPT had the highest rank score, followed by the Dice loss function with Focal. In the five-class segmentation task, DPT achieved the highest average DSC and SDSC values of 93.1% and 94.6%, respectively.

4. Discussion

We evaluated the effect of loss weightings on the segmentation of the cerebrum, cerebellum, brainstem, and blood vessels from the MR images. From the segmentation results with the non-weighted loss functions, we found that the segmentation errors of the cerebrum, cerebellum, and brainstem, including false positives and false negatives, were concentrated at their edges, whereas the segmentation errors of blood vessels, especially false negatives, appeared in their upper part. This is probably because the edges of brain parenchyma and the upper blood vessels varied between cases and the FCN was biased toward learning image features in easier-to-segment majority regions. Thus, in order to improve the brain structure segmentation, it would be important to use loss weightings to make the FCN focus on learning image features around the edge of brain parenchyma and in the upper part of blood vessels. We discuss the effect of loss weightings based on the results in the binary- and multi-class segmentation tasks below. Subsequently, we also discuss the limitations of this study.

4.1. Binary-Class Segmentation Tasks

As for the cross-entropy loss function, the class frequency-based weightings (Inverse and Median) greatly increased false positives. They assign a lower uniform weight to the loss of larger-size classes, i.e., background class in the case of binary-class segmentation tasks. They gave a low uniform weight to low-confidence background pixels near the edge of the foreground, which would result in a large increase in false positives on the low-confidence background pixels, although they could also help reduce false negatives. On the other hand, the predictive probability- and the distance map-based weightings tended to improve the surface accuracy of highly imbalanced classes, i.e., the brainstem and blood vessels. Different from the class frequency-based weighting, they assign a different weight to each pixel. Using such pixel-wise weights instead of uniform weights may be appropriate for imbalanced segmentation because FCNs do not focus equally on all the pixels of the same class during training. The predictive-probability-based weighting (Focal) gives higher weights to pixels with lower prediction confidences based on the predictive probability and helps correct pixels misclassified with low prediction confidence, whereas the distance map-based weightings (DTM and DPT) define pixel-wise weights based on the distance from the edge of ground-truth labels and help correct surface segmentation errors. Thus, it is considered that these loss weightings could correct the surface error because pixels around the edge of foreground class were subject to be misclassified with low prediction confidence in the highly imbalanced segmentation tasks.
For the Dice loss function, the class frequency-based weightings significantly improved the accuracy in the highly imbalanced segmentation tasks, although they did not work well for the cross-entropy loss function. In the Dice loss function, the weight is applied to both the numerator and the denominator, which would allow the FCN to reduce false negatives without increasing false positives. The predictive probability-based weighting, which showed the best performance in Table 6, worked well for the low- and middle-level imbalanced segmentation tasks as well as the highly imbalanced segmentation tasks. This can be explained by the fact that the FCN with the Dice loss function had more pixels misclassified with low prediction confidence in the low- and middle-level imbalanced segmentation tasks than the FCN with the cross-entropy loss function. Additionally, the distance map-based weightings tended to improve the surface accuracy in the brain parenchyma segmentation. However, they were ineffective in the segmentation of blood vessels. As shown in [16], for objects with variable locations and shapes, these weightings might work more stably with a scheduling strategy, i.e., gradually increasing the weight on the mismatched region over the training epochs.
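The class-weighted Dice loss listed in Table 1, in which the class weight multiplies both the numerator and the denominator, can be sketched as follows. This is a minimal NumPy illustration; the helper `inverse_frequency_weights` assumes the weight is the reciprocal of the class frequency, which may differ from the exact definition used in this study, and the toy data are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(onehot, eps=1e-7):
    """One weight per class: 1 / (fraction of pixels labeled with that class).

    Assumed definition for illustration only.
    """
    freq = np.asarray(onehot, dtype=float).mean(axis=0)
    return 1.0 / np.maximum(freq, eps)


def class_weighted_dice_loss(probs, onehot, class_weights, eps=1e-7):
    """1 - 2 * sum_c W_c sum_i g*p / sum_c W_c sum_i (g + p); inputs have shape (N, C)."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.asarray(onehot, dtype=float)
    w = np.asarray(class_weights, dtype=float)
    overlap = (onehot * probs).sum(axis=0)   # per-class sum_i g*p
    total = (onehot + probs).sum(axis=0)     # per-class sum_i (g + p)
    return float(1.0 - 2.0 * (w * overlap).sum() / ((w * total).sum() + eps))


# 100 pixels, one of which is foreground; the prediction misses it entirely.
onehot = np.zeros((100, 2))
onehot[:, 0] = 1.0
onehot[0] = [0.0, 1.0]
all_background = np.tile([1.0, 0.0], (100, 1))

uniform = class_weighted_dice_loss(all_background, onehot, [1.0, 1.0])
inverse = class_weighted_dice_loss(all_background, onehot, inverse_frequency_weights(onehot))
```

With uniform weights, missing the single foreground pixel barely changes the loss, whereas the inverse frequency weights make the same false negative far more costly.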

4.2. Multi-Class Segmentation Tasks

The binary-class segmentation tasks included the class imbalance problem between background and foreground classes, whereas the multi-class segmentation tasks, which deal with two or more foreground classes, included the class imbalance problems not only between background and foreground classes but also among foreground classes. However, the results in the multi-class segmentation tasks showed similar tendencies to those in the binary-class segmentation tasks, although some of them were affected by the foreground–foreground class imbalance.
The class frequency-based weightings failed to improve the segmentation performance of the FCN with the cross-entropy loss function in any of the multi-class segmentation tasks because they greatly increased false positives by assigning an extremely low weight to the background pixels. For the Dice loss function, they also had a negative effect on the low- and middle-level imbalanced classes. In particular, in the five-class segmentation task, Inverse could not segment the cerebrum at all due to the foreground–foreground class imbalance, although it also provided the best DSC value for blood vessels. Thus, because of their extreme weighting, the class frequency-based weightings could work well only for objects with very high imbalance, regardless of the segmentation task. The predictive probability-based weighting worked well overall for both the cross-entropy and Dice loss functions. These results suggested that, despite the foreground–foreground class imbalance, it could enable FCNs to focus on the pixels misclassified with low prediction confidence, i.e., hard-to-segment pixels, by considering the predictive probability. Likewise, the distance map-based weightings tended to provide good segmentation results for the cross-entropy loss function. In particular, the cross-entropy loss function with DPT achieved the best performance, as indicated in Table 6b. However, the distance map-based weightings provided unstable segmentation results for the Dice loss function. In this study, we designed the Dice loss function with the distance map-based weightings by multiplying the false positive and false negative terms in the denominator by the weights; using a scheduling strategy, as mentioned above, might make the effect of these weightings more stable.
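The Dice formulation described above, with weights multiplying the false positive and false negative terms in the denominator, can be sketched as follows. This minimal NumPy example takes the weight map as a precomputed input; how the map is derived from the distance transform (DTM or DPT) is not reproduced here, and the uniform weight of 3 is purely hypothetical.

```python
import numpy as np


def distance_weighted_dice_loss(probs, onehot, weight_map, eps=1e-7):
    """Dice loss whose false-positive and false-negative terms are weighted.

    probs, onehot, weight_map: arrays of shape (N, C). weight_map is assumed to be
    precomputed from a distance map of the ground-truth labels.
    """
    probs = np.asarray(probs, dtype=float)
    onehot = np.asarray(onehot, dtype=float)
    weight_map = np.asarray(weight_map, dtype=float)
    overlap = (onehot * probs).sum()                          # sum g*p
    false_pos = (weight_map * (1.0 - onehot) * probs).sum()   # sum W*(1-g)*p
    false_neg = (weight_map * onehot * (1.0 - probs)).sum()   # sum W*g*(1-p)
    return float(1.0 - 2.0 * overlap / (2.0 * overlap + false_pos + false_neg + eps))


# Toy 1D "image": foreground occupies pixels 1 and 2; columns are (background, foreground).
fg_truth = np.array([0.0, 1.0, 1.0, 0.0])
fg_prob = np.array([0.4, 0.9, 0.6, 0.1])
onehot = np.stack([1.0 - fg_truth, fg_truth], axis=1)
probs = np.stack([1.0 - fg_prob, fg_prob], axis=1)

unweighted = distance_weighted_dice_loss(probs, onehot, np.ones_like(probs))
emphasized = distance_weighted_dice_loss(probs, onehot, 3.0 * np.ones_like(probs))
```

With all weights equal to 1, the denominator reduces to the sum of g + p, so the expression coincides with the standard Dice loss; larger weights increase the penalty on mismatched pixels only.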
Overall, the cross-entropy loss function with DPT and the Dice loss function with Focal achieved relatively high accuracy across the segmentation targets and tasks, although other weightings outperformed them for specific targets. For example, the Dice loss function with Inverse provided better DSC and SDSC results for blood vessels than that with Focal. In this study, we focused on unary weighted loss functions rather than compound loss functions; however, given that the loss weightings differ in their characteristics, combining different weighted loss functions might further improve segmentation performance.

4.3. Limitations

This study has several limitations. First, we adopted the segmentation of brain parenchyma and blood vessels on MRC and MRA images, which is performed routinely in our group. Although the results reflected the characteristics of the loss weightings, their effect might depend on the segmentation targets and tasks. To cover a wider range of applications, we should test the loss weightings in other brain structure segmentation tasks (e.g., the segmentation of white matter, gray matter, and cerebrospinal fluid on T1-weighted MR images). Second, we used the 2D U-Net architecture to investigate the effect of loss weightings with fewer hyperparameters. However, 3D FCNs should also be tested with the weighted loss functions because they have been applied to volumetric brain structure segmentation. Third, we set the parameters of the loss weightings (e.g., the focusing parameter for focal weighting) to default values based on previous studies; tuning such parameters could further improve the performance of FCNs. Finally, we focused on segmenting brain structures, including blood vessels, from the MR images of patients with cerebral aneurysms; for clinical practice, it would be desirable to automatically detect the location of aneurysms, as in [37], in addition to the segmentation.

5. Conclusions

This paper investigated how loss weightings work for FCN-based brain structure segmentation on MR images in different class imbalance situations. Using the 2D U-Net with the cross-entropy or Dice loss function as a baseline network, we tested five loss weightings, defined based on class frequency, predictive probability, and distance maps, in binary- and multi-class brain structure segmentation on MRC and MRA images. From the experimental results, we found that the cross-entropy loss function with the distance map-based weightings, especially the distance penalty term-based weighting, and the Dice loss function with the predictive probability-based weighting could stably provide good segmentation results. In the binary-class segmentation tasks, the Dice loss function with focal weighting showed the best performance and achieved a high average DSC of 92.8%, whereas in the multi-class segmentation tasks, the cross-entropy loss function with distance penalty term-based weighting provided the best performance, achieving the highest average DSC of 93.1% in the five-class segmentation task. We also found that these weighted loss functions were relatively robust to the foreground–foreground class imbalance as well as the background–foreground class imbalance. In other words, the experimental results suggested that they could work well in both binary- and multi-class segmentation. Therefore, it may be effective to use the distance penalty term-based weighting in the cross-entropy loss function and the focal weighting in the Dice loss function. We believe that these findings will help in selecting weighting strategies for loss functions and in designing advanced loss weighting strategies.
In future work, for clinical application, we will use the loss weighting strategies to address the detection and segmentation of more highly imbalanced diseased areas, such as cerebral aneurysms, as well as their surrounding structures. Moreover, we will design compound loss functions (i.e., combinations of the loss weightings) and further investigate their effect on different brain structure segmentation tasks.

Author Contributions

Conceptualization, T.S. and Y.N.; methodology, T.S. and Y.N.; software, T.S.; validation, all; formal analysis, T.S. and Y.N.; investigation, T.S.; resources, T.S., T.K. (Taichi Kin) and N.S.; data curation, T.S., T.K. (Taichi Kin) and N.S.; writing—original draft preparation, T.S.; writing—review and editing, T.K. (Toshihiro Kawase), S.O. and Y.N.; visualization, T.S.; supervision, N.S. and Y.N.; project administration, N.S. and Y.N.; funding acquisition, T.S., T.K. (Taichi Kin), N.S. and Y.N. All authors have read and agreed to the published version of the manuscript.

Funding

Parts of this research were supported by the Japan Agency for Medical Research and Development (AMED) (Grant Number JP21he1602001h0105) and JSPS KAKENHI (Grant Number 20K20216).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of Tokyo Medical and Dental University (protocol code: M2018-190 and date of approval: 29 January 2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. González-Villà, S.; Oliver, A.; Valverde, S.; Wang, L.; Zwiggelaar, R.; Lladó, X. A review on brain structures segmentation in magnetic resonance imaging. Artif. Intell. Med. 2016, 73, 45–69.
  2. Despotovic, I.; Goossens, B.; Philips, W. MRI segmentation of the human brain: Challenges, methods, and applications. Comput. Math. Methods Med. 2015, 2015, 450341.
  3. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; IEEE: NW Washington, DC, USA, 2015; pp. 3431–3440.
  4. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; LNCS 9351. pp. 234–241.
  5. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Athens, Greece, 17–21 October 2016; LNCS 9351. pp. 424–432.
  6. Bernal, J.; Kushibar, K.; Asfaw, D.S.; Valverde, S.; Oliver, A.; Marti, R.; Lladó, X. Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: A review. Artif. Intell. Med. 2019, 95, 64–81.
  7. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
  8. Zhou, T.; Ruan, S.; Canu, S. A review: Deep learning for medical image segmentation using multi-modality fusion. Array 2019, 3, 100004.
  9. Jang, J.; Eo, T.J.; Kim, M.; Choi, N.; Han, D.; Kim, D.; Hwang, D. Medical image matching using variable randomized undersampling probability pattern in data acquisition. In Proceedings of the 2014 International Conference on Electronics, Information and Communications, Kota Kinabalu, Malaysia, 15–18 January 2014; pp. 1–2.
  10. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  11. Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the Fourth International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; IEEE: NW Washington, DC, USA, 2016; pp. 566–571.
  12. Drozdzal, M.; Vorontsov, E.; Chartrand, G.; Kadoury, S.; Pal, C. The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications; Springer: Cham, Switzerland, 2016; LNCS 10008; pp. 179–187.
  13. Rahman, M.A.; Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 12–14 December 2016; LNCS 10072. pp. 234–244.
  14. Berman, M.; Triki, A.R.; Blaschko, M.B. The lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4413–4421.
  15. Wong, K.C.L.; Moradi, M.; Tang, H.; Syeda-Mahmood, T. 3D segmentation with exponential logarithmic loss for highly unbalanced object sizes. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; LNCS 11072. pp. 612–619.
  16. Kervadec, H.; Bouchtiba, J.; Desrosiers, C.; Granger, E.; Dolz, J.; Ayed, I.B. Boundary loss for highly unbalanced segmentation. Med. Image Anal. 2019, 67, 101851.
  17. Karimi, D.; Salcudean, S.E. Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks. IEEE Trans. Med. Imaging 2020, 39, 499–513.
  18. Eigen, D.; Fergus, R. Predicting depth, surface normal and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; IEEE: NW Washington, DC, USA, 2015; pp. 2650–2658.
  19. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: NW Washington, DC, USA, 2017; pp. 2980–2988.
  20. Caliva, F.; Iriondo, C.; Martinez, A.M.; Majumdar, S.; Pedoia, V. Distance map loss penalty term for semantic segmentation. In Proceedings of the 2nd International Conference on Medical Imaging with Deep Learning, London, UK, 8–10 July 2019; pp. 1–5.
  21. Salehi, S.S.M.; Erdogmus, D.; Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Proceedings of the International Workshop on Machine Learning in Medical Imaging, Quebec City, QC, Canada, 10 September 2017; LNCS 10541. pp. 379–387.
  22. Hashemi, S.R.; Salehi, S.S.M.; Erdogmus, D.; Prabhu, S.P.; Warfield, S.K.; Gholipour, A. Asymmetric loss functions and deep densely-connected networks for highly-imbalanced medical image segmentation: Application to multiple sclerosis lesion detection. IEEE Access 2018, 7, 1721–1735.
  23. Guerrero-Pena, F.A.; Fernandez, P.D.M.; Ren, T.I.; Yui, M.; Rothenberg, E.; Cunha, A. Multiclass weighted loss for instance segmentation of cluttered cells. In Proceedings of the 25th IEEE International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 2451–2455.
  24. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Cardoso, M.J. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Québec City, QC, Canada, 14 September 2017; Springer: Cham, Switzerland, 2017. LNCS 10553. pp. 240–248.
  25. Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice loss for data-imbalanced NLP tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 465–476.
  26. Ma, J.; Chen, J.; Ng, M.; Huang, R.; Li, Y.; Li, C.; Yang, X.; Martel, A.L. Loss odyssey in medical image segmentation. Med. Image Anal. 2021, 71, 102035.
  27. Ma, J.; Wei, Z.; Zhang, Y.; Wang, Y.; Lv, R.; Zhu, C.; Chen, G.; Liu, J.; Peng, C.; Wang, L.; et al. How distance transform maps boost segmentation CNNs: An empirical study. Med. Imaging Deep Learn. 2020, 121, 479–492.
  28. Yeung, M.; Sala, E.; Schönlieb, C.B.; Rundo, L. Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation. arXiv 2021, arXiv:2102.04525, Preprint.
  29. Huo, Y.; Xu, Z.; Xiong, Y.; Aboud, K.; Parvathaneni, P.; Bao, S.; Bermudez, C.; Resnick, S.M.; Cutting, L.E.; Landman, B.A. 3D whole brain segmentation using spatially localized atlas network tiles. NeuroImage 2019, 194, 105–119.
  30. Taghanaki, S.A.; Zheng, Y.; Zhou, S.K.; Georgescu, B.; Sharma, P.; Xu, D.; Comaniciu, D.; Hamarneh, G. Combo loss: Handling input and output imbalance in multi-organ segmentation. Comput. Med. Imaging Graph. 2019, 75, 24–33.
  31. Zhu, W.; Huang, Y.; Zeng, L.; Chen, X.; Liu, Y.; Qian, Z.; Du, N.; Fan, W.; Xie, X. AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med. Phys. 2018, 46, 576–589.
  32. Xue, Y.; Tang, H.; Qiao, Z.; Gong, G.; Yin, Y.; Qian, Z.; Huang, X. Shape-aware organ segmentation by predicting signed distance maps. AAAI Conf. Artif. Intell. 2020, 34, 12565–12572.
  33. Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980, Preprint.
  34. Nikolov, S.; Blackwell, S.; Zverovitch, A.; Mendes, R.; Livne, M.; De Fauw, J.; Patel, Y.; Meyer, C.; Askham, H.; Romera-Paredes, B.; et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv 2018, arXiv:1809.04430, Preprint.
  35. DeepMind. Github: Library to Compute Surface Distance Based Performance Metrics for Segmentation Tasks. Available online: https://github.com/deepmind/surface-distance (accessed on 28 April 2021).
  36. Antonelli, M.; Reinke, A.; Bakas, S.; Farahani, K.; Kopp-Schneider, A.; Landman, B.A.; Litjens, G.; Menze, B.; Ronneberger, O.; Summers, R.M.; et al. The Medical Segmentation Decathlon. arXiv 2021, arXiv:2106.05735, Preprint.
  37. Conti, V.; Militello, C.; Rundo, L.; Vitabile, S. A novel bio-inspired approach for high-performance management in service-oriented networks. IEEE Trans. Emerg. Top. Comput. 2020.
Figure 1. MR images used in this study.
Figure 2. FCN architecture. Each box represents a set of feature maps. The number of feature maps is denoted on the top or bottom of each box.
Figure 3. Distance maps for loss weighting. (a) Label image, (b) distance transform map, and (c) distance penalty term.
Figure 4. Violin plots of the segmentation results (Dice similarity coefficients) of no weighting (N/A), inverse frequency weighting (Inverse), inverse median frequency weighting (Median), focal weighting (Focal), distance transform map-based weighting (DTM), and distance penalty term-based weighting (DPT) in binary-class segmentation tasks. (a) Dataset 1: cerebrum, (b) Dataset 2: cerebellum, (c) Dataset 3: brainstem, and (d) Dataset 4: blood vessels. Compared with the results of N/A, the significantly worse and better results are shown in black and red, respectively (Wilcoxon signed-rank test, * p < 0.05, ** p < 0.01, and *** p < 0.001, not adjusted for multiplicity).
Figure 5. Visualization of the segmentation results of blood vessels in the binary-class segmentation task. (a) No weighting, (b) Inverse frequency weighting, (c) Inverse median frequency weighting, (d) Focal weighting, (e) Distance transform map-based weighting, and (f) Distance penalty term-based weighting.
Figure 6. Violin plots of the segmentation results (Dice similarity coefficients) of no weighting (N/A), inverse frequency weighting (Inverse), inverse median frequency weighting (Median), focal weighting (Focal), distance transform map-based weighting (DTM), and distance penalty term-based weighting (DPT) in multi-class segmentation tasks. (a) Dataset 1: three classes, (b) Dataset 2: four classes, and (c) Dataset 3: five classes. Compared with the results of N/A, the significantly worse and better results are shown in black and red, respectively (Wilcoxon signed-rank test, * p < 0.05, ** p < 0.01, and *** p < 0.001, not adjusted for multiplicity).
Figure 7. Visualization of the segmentation results in the five-class segmentation task. (a) No weighting, (b) inverse frequency weighting, (c) inverse median frequency weighting, (d) focal weighting, (e) distance transform map-based weighting, and (f) distance penalty term-based weighting. The segmentation results include the predicted results (left), the false positives (middle), and the false negatives (right). Note that in the result of the Dice loss function with inverse frequency weighting, there were no true positive voxels in the cerebrum class and most of the background region was overestimated as the cerebrum class; the false positives and false negatives in the cerebrum class were excluded from the figure for better visualization.
Table 1. Overview of the weighted loss functions.
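| Baseline Loss Function | Weighting Strategy | Weighting | Weighted Loss Function |
|---|---|---|---|
| Cross-entropy loss function $L_{\mathrm{CE}}$ | Class frequency-based weighting | Inverse frequency weighting | $L_{\mathrm{CE}}^{\mathrm{Inverse}} = -\frac{1}{N}\sum_{c=1}^{C} W_c^{\mathrm{Inverse}} \sum_{i=1}^{N} g_{i,c}\log p_{i,c}$ |
| Cross-entropy loss function $L_{\mathrm{CE}}$ | Class frequency-based weighting | Inverse median weighting | $L_{\mathrm{CE}}^{\mathrm{Median}} = -\frac{1}{N}\sum_{c=1}^{C} W_c^{\mathrm{Median}} \sum_{i=1}^{N} g_{i,c}\log p_{i,c}$ |
| Cross-entropy loss function $L_{\mathrm{CE}}$ | Predictive probability-based weighting | Focal weighting | $L_{\mathrm{CE}}^{\mathrm{Focal}} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{N} W_{i,c}^{\mathrm{Focal}}\, g_{i,c}\log p_{i,c}$ |
| Cross-entropy loss function $L_{\mathrm{CE}}$ | Distance map-based weighting | Distance transform map-based weighting | $L_{\mathrm{CE}}^{\mathrm{DTM}} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{N} W_c^{\mathrm{DTM}}\, g_{i,c}\log p_{i,c}$ |
| Cross-entropy loss function $L_{\mathrm{CE}}$ | Distance map-based weighting | Distance penalty term-based weighting | $L_{\mathrm{CE}}^{\mathrm{DPT}} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{N} W_c^{\mathrm{DPT}}\, g_{i,c}\log p_{i,c}$ |
| Dice loss function $L_{\mathrm{Dice}}$ | Class frequency-based weighting | Inverse frequency weighting | $L_{\mathrm{Dice}}^{\mathrm{Inverse}} = 1 - \dfrac{2\sum_{c=1}^{C} W_c^{\mathrm{Inverse}} \sum_{i=1}^{N} g_{i,c}\, p_{i,c}}{\sum_{c=1}^{C} W_c^{\mathrm{Inverse}} \sum_{i=1}^{N} (g_{i,c} + p_{i,c})}$ |
| Dice loss function $L_{\mathrm{Dice}}$ | Class frequency-based weighting | Inverse median weighting | $L_{\mathrm{Dice}}^{\mathrm{Median}} = 1 - \dfrac{2\sum_{c=1}^{C} W_c^{\mathrm{Median}} \sum_{i=1}^{N} g_{i,c}\, p_{i,c}}{\sum_{c=1}^{C} W_c^{\mathrm{Median}} \sum_{i=1}^{N} (g_{i,c} + p_{i,c})}$ |
| Dice loss function $L_{\mathrm{Dice}}$ | Predictive probability-based weighting | Focal weighting | $L_{\mathrm{Dice}}^{\mathrm{Focal}} = 1 - \dfrac{2\sum_{c=1}^{C}\sum_{i=1}^{N} W_{i,c}^{\mathrm{Focal}}\, g_{i,c}\, p_{i,c}}{\sum_{c=1}^{C}\sum_{i=1}^{N} W_{i,c}^{\mathrm{Focal}} (g_{i,c} + p_{i,c})}$ |
| Dice loss function $L_{\mathrm{Dice}}$ | Distance map-based weighting | Distance transform map-based weighting | $L_{\mathrm{Dice}}^{\mathrm{DTM}} = 1 - \dfrac{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\, p_{i,c}}{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\, p_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N} W_c^{\mathrm{DTM}} (1 - g_{i,c})\, p_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N} W_c^{\mathrm{DTM}}\, g_{i,c} (1 - p_{i,c})}$ |
| Dice loss function $L_{\mathrm{Dice}}$ | Distance map-based weighting | Distance penalty term-based weighting | $L_{\mathrm{Dice}}^{\mathrm{DPT}} = 1 - \dfrac{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\, p_{i,c}}{2\sum_{c=1}^{C}\sum_{i=1}^{N} g_{i,c}\, p_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N} W_c^{\mathrm{DPT}} (1 - g_{i,c})\, p_{i,c} + \sum_{c=1}^{C}\sum_{i=1}^{N} W_c^{\mathrm{DPT}}\, g_{i,c} (1 - p_{i,c})}$ |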
Table 2. Frequency of the foreground classes in the training subset (n = 60).

| | Cerebrum | Cerebellum | Brainstem | Blood Vessels |
|---|---|---|---|---|
| Frequency | 0.096 | 0.012 | 0.003 | 0.001 |
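As a worked illustration of how class frequency-based weights follow from such frequencies, the sketch below uses the rounded values in Table 2. It assumes that the background frequency is the remainder, that the inverse frequency weight is 1/f_c, and that the median frequency weight is median(f)/f_c; these common definitions may differ from the exact ones used in this study, and the rounded inputs make the resulting weights approximate.

```python
import numpy as np

# Foreground frequencies as reported (rounded) in Table 2.
classes = ["background", "cerebrum", "cerebellum", "brainstem", "blood vessels"]
foreground = np.array([0.096, 0.012, 0.003, 0.001])
freq = np.concatenate([[1.0 - foreground.sum()], foreground])  # background = remainder (assumption)

inverse_w = 1.0 / freq               # assumed definition: W_c = 1 / f_c
median_w = np.median(freq) / freq    # assumed definition: W_c = median(f) / f_c

for name, f, wi, wm in zip(classes, freq, inverse_w, median_w):
    print(f"{name:14s} f={f:.3f}  inverse={wi:8.2f}  median={wm:6.3f}")
```

Under these assumptions, the weight of the blood vessel class is several hundred times larger than that of the background class, which illustrates why class frequency-based weightings can be described as extreme for highly imbalanced targets.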
Table 3. Training datasets in binary- and multi-class segmentation tasks. BG, CR, CL, BS, and BV stand for background, cerebrum, cerebellum, brainstem, and blood vessels, respectively.
| Dataset | Ratio ¹ |
|---|---|
| Binary-class segmentation tasks | |
| Dataset 1: Cerebrum | BG : CR = 9 : 1 |
| Dataset 2: Cerebellum | BG : CL = 86 : 1 |
| Dataset 3: Brainstem | BG : BS = 352 : 1 |
| Dataset 4: Blood vessels | BG : BV = 749 : 1 |
| Multi-class segmentation tasks | |
| Dataset 1: Three classes | BG : CR : BV = 677 : 72 : 1 |
| Dataset 2: Four classes | BG : CR : CL : BV = 668 : 72 : 9 : 1 |
| Dataset 3: Five classes | BG : CR : CL : BS : BV = 666 : 72 : 9 : 2 : 1 |

¹ Ratio of the number of labeled voxels between foreground classes in each training dataset.
Table 4. Segmentation results of no weighting (N/A), inverse frequency weighting (Inverse), inverse median frequency weighting (Median), focal weighting (Focal), distance transform map-based weighting (DTM), and distance penalty term-based weighting (DPT) in binary-class segmentation tasks: Dice similarity coefficient (DSC), surface DSC (SDSC), average symmetric surface distance (ASD) (mm), and 95th-percentile Hausdorff distance (95HD) (mm). (a) Dataset 1: cerebrum, (b) Dataset 2: cerebellum, (c) Dataset 3: brainstem, and (d) Dataset 4: blood vessels. The results of background class are excluded in this table. Compared with the results of N/A, the significantly better and worse results are shown in bold and italic, respectively (Wilcoxon signed-rank test, p < 0.05, not adjusted for multiplicity).

(a) Dataset 1: Cerebrum

| Loss Function | Weighting | DSC | SDSC | ASD | 95HD |
|---|---|---|---|---|---|
| Cross entropy | N/A | 0.987 | 0.991 | 0.064 | 0.287 |
| Cross entropy | Inverse | 0.970 | 0.941 | 0.424 | 3.504 |
| Cross entropy | Median | 0.981 | 0.983 | 0.135 | 0.565 |
| Cross entropy | Focal | 0.986 | 0.989 | 0.073 | 0.397 |
| Cross entropy | DTM | 0.986 | 0.990 | 0.069 | 0.378 |
| Cross entropy | DPT | 0.987 | 0.992 | 0.059 | 0.328 |
| Dice | N/A | 0.986 | 0.988 | 0.102 | 0.381 |
| Dice | Inverse | 0.984 | 0.986 | 0.275 | 0.495 |
| Dice | Median | 0.985 | 0.990 | 0.234 | 0.425 |
| Dice | Focal | 0.988 | 0.993 | 0.054 | 0.308 |
| Dice | DTM | 0.987 | 0.991 | 0.061 | 0.364 |
| Dice | DPT | 0.987 | 0.992 | 0.066 | 0.341 |

(b) Dataset 2: Cerebellum

| Loss Function | Weighting | DSC | SDSC | ASD | 95HD |
|---|---|---|---|---|---|
| Cross entropy | N/A | 0.978 | 0.981 | 0.088 | 0.669 |
| Cross entropy | Inverse | 0.954 | 0.922 | 0.411 | 1.755 |
| Cross entropy | Median | 0.950 | 0.904 | 0.525 | 2.539 |
| Cross entropy | Focal | 0.976 | 0.976 | 0.166 | 2.430 |
| Cross entropy | DTM | 0.978 | 0.978 | 0.104 | 0.729 |
| Cross entropy | DPT | 0.978 | 0.980 | 0.089 | 0.713 |
| Dice | N/A | 0.976 | 0.973 | 0.221 | 1.048 |
| Dice | Inverse | 0.965 | 0.940 | 1.934 | 1.975 |
| Dice | Median | 0.968 | 0.950 | 2.037 | 4.568 |
| Dice | Focal | 0.977 | 0.980 | 0.101 | 0.686 |
| Dice | DTM | 0.974 | 0.972 | 0.153 | 0.878 |
| Dice | DPT | 0.976 | 0.975 | 0.184 | 2.331 |

(c) Dataset 3: Brainstem

| Loss Function | Weighting | DSC | SDSC | ASD | 95HD |
|---|---|---|---|---|---|
| Cross entropy | N/A | 0.963 | 0.940 | 0.501 | 4.676 |
| Cross entropy | Inverse | 0.933 | 0.874 | 1.024 | 8.518 |
| Cross entropy | Median | 0.922 | 0.849 | 0.849 | 6.510 |
| Cross entropy | Focal | 0.962 | 0.947 | 0.239 | 1.362 |
| Cross entropy | DTM | 0.965 | 0.951 | 0.280 | 1.204 |
| Cross entropy | DPT | 0.965 | 0.946 | 0.425 | 3.478 |
| Dice | N/A | 0.923 | 0.824 | 8.880 | 156.912 |
| Dice | Inverse | 0.953 | 0.921 | 0.476 | 4.770 |
| Dice | Median | 0.954 | 0.926 | 0.421 | 3.365 |
| Dice | Focal | 0.963 | 0.949 | 0.241 | 1.905 |
| Dice | DTM | 0.961 | 0.939 | 0.332 | 4.268 |
| Dice | DPT | 0.957 | 0.936 | 0.318 | 1.646 |

(d) Dataset 4: Blood vessels

| Loss Function | Weighting | DSC | SDSC | ASD | 95HD |
|---|---|---|---|---|---|
| Cross entropy | N/A | 0.785 | 0.809 | 1.415 | 12.947 |
| Cross entropy | Inverse | 0.642 | 0.700 | 2.008 | 16.978 |
| Cross entropy | Median | 0.647 | 0.690 | 2.222 | 18.620 |
| Cross entropy | Focal | 0.783 | 0.812 | 1.351 | 12.353 |
| Cross entropy | DTM | 0.786 | 0.821 | 1.419 | 12.243 |
| Cross entropy | DPT | 0.784 | 0.824 | 1.361 | 12.340 |
| Dice | N/A | 0.704 | 0.767 | 1.996 | 16.026 |
| Dice | Inverse | 0.786 | 0.826 | 1.385 | 13.364 |
| Dice | Median | 0.768 | 0.794 | 1.627 | 14.597 |
| Dice | Focal | 0.785 | 0.812 | 1.518 | 13.104 |
| Dice | DTM | 0.725 | 0.754 | 2.400 | 19.281 |
| Dice | DPT | 0.648 | 0.627 | 5.999 | 40.077 |
Table 5. Segmentation results of no weighting (N/A), inverse frequency weighting (Inverse), inverse median frequency weighting (Median), focal weighting (Focal), distance transform map-based weighting (DTM), and distance penalty term-based weighting (DPT) in the multi-class segmentation tasks: Dice similarity coefficient (DSC), surface DSC (SDSC), average symmetric surface distance (ASD) (mm), and 95th-percentile Hausdorff distance (95HD) (mm). (a) Dataset 1: three classes, (b) Dataset 2: four classes, and (c) Dataset 3: five classes. The results of background class are excluded in this table. Compared with the results of N/A, the significantly better and worse results are shown in bold and italic, respectively (Wilcoxon signed-rank test, p < 0.05, not adjusted for multiplicity).
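(a) Dataset 1: Three classes
| Loss function | Weighting | Cerebrum DSC | Cerebrum SDSC | Cerebrum ASD | Cerebrum 95HD | Blood Vessels DSC | Blood Vessels SDSC | Blood Vessels ASD | Blood Vessels 95HD |
|---|---|---|---|---|---|---|---|---|---|
| Cross entropy | N/A | 0.979 | 0.965 | 0.507 | 5.635 | 0.778 | 0.810 | 1.926 | 17.142 |
| | Inverse | 0.967 | 0.956 | 0.265 | 1.256 | 0.618 | 0.662 | 2.448 | 20.272 |
| | Median | 0.970 | 0.969 | 0.239 | 1.273 | 0.675 | 0.740 | 1.901 | 17.298 |
| | Focal | 0.979 | 0.989 | 0.093 | 0.585 | 0.796 | 0.843 | 1.195 | 12.933 |
| | DTM | 0.979 | 0.989 | 0.092 | 0.585 | 0.788 | 0.848 | 1.097 | 10.539 |
| | DPT | 0.984 | 0.992 | 0.069 | 0.492 | 0.795 | 0.836 | 1.198 | 11.321 |
| Dice | N/A | 0.985 | 0.990 | 0.266 | 0.445 | 0.771 | 0.833 | 1.225 | 11.276 |
| | Inverse | 0.896 | 0.634 | 2.290 | 17.436 | 0.800 | 0.842 | 1.177 | 11.325 |
| | Median | 0.985 | 0.986 | 0.109 | 0.479 | 0.809 | 0.848 | 1.172 | 11.654 |
| | Focal | 0.985 | 0.984 | 0.147 | 0.415 | 0.780 | 0.821 | 1.525 | 14.393 |
| | DTM | 0.984 | 0.991 | 0.068 | 0.492 | 0.760 | 0.817 | 1.354 | 11.769 |
| | DPT | 0.986 | 0.992 | 0.245 | 0.408 | 0.759 | 0.816 | 1.346 | 12.316 |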
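(b) Dataset 2: Four classes
| Loss function | Weighting | Cerebrum DSC | Cerebrum SDSC | Cerebrum ASD | Cerebrum 95HD | Cerebellum DSC | Cerebellum SDSC | Cerebellum ASD | Cerebellum 95HD | Blood Vessels DSC | Blood Vessels SDSC | Blood Vessels ASD | Blood Vessels 95HD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cross entropy | N/A | 0.985 | 0.994 | 0.057 | 0.469 | 0.978 | 0.981 | 0.082 | 0.670 | 0.792 | 0.834 | 1.209 | 11.215 |
| | Inverse | 0.966 | 0.963 | 0.221 | 1.015 | 0.939 | 0.890 | 0.472 | 1.911 | 0.623 | 0.668 | 2.375 | 19.928 |
| | Median | 0.970 | 0.968 | 0.221 | 1.009 | 0.954 | 0.938 | 0.279 | 1.397 | 0.674 | 0.738 | 1.860 | 17.051 |
| | Focal | 0.980 | 0.990 | 0.087 | 0.575 | 0.979 | 0.982 | 0.082 | 0.635 | 0.783 | 0.836 | 1.168 | 11.228 |
| | DTM | 0.986 | 0.994 | 0.059 | 0.408 | 0.977 | 0.979 | 0.142 | 2.019 | 0.781 | 0.827 | 1.247 | 11.639 |
| | DPT | 0.982 | 0.992 | 0.069 | 0.505 | 0.980 | 0.986 | 0.065 | 0.579 | 0.791 | 0.842 | 1.138 | 11.197 |
| Dice | N/A | 0.986 | 0.993 | 0.060 | 0.338 | 0.975 | 0.971 | 0.329 | 2.370 | 0.766 | 0.821 | 1.246 | 11.110 |
| | Inverse | 0.163 | 0.066 | 18.575 | 81.644 | 0.960 | 0.949 | 0.314 | 3.939 | 0.799 | 0.840 | 1.192 | 12.014 |
| | Median | 0.980 | 0.984 | 0.155 | 0.524 | 0.973 | 0.972 | 0.234 | 2.578 | 0.780 | 0.818 | 1.306 | 12.029 |
| | Focal | 0.987 | 0.994 | 0.052 | 0.352 | 0.980 | 0.986 | 0.067 | 0.543 | 0.791 | 0.834 | 1.233 | 11.518 |
| | DTM | 0.971 | 0.963 | 0.198 | 1.061 | 0.956 | 0.933 | 0.449 | 3.654 | 0.610 | 0.630 | 5.309 | 34.425 |
| | DPT | 0.985 | 0.992 | 0.064 | 0.505 | 0.978 | 0.981 | 0.085 | 0.593 | 0.786 | 0.827 | 1.289 | 12.360 |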
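(c) Dataset 3: Five classes
| Loss function | Weighting | Cerebrum DSC | Cerebrum SDSC | Cerebrum ASD | Cerebrum 95HD | Cerebellum DSC | Cerebellum SDSC | Cerebellum ASD | Cerebellum 95HD |
|---|---|---|---|---|---|---|---|---|---|
| Cross entropy | N/A | 0.981 | 0.991 | 0.083 | 0.552 | 0.977 | 0.980 | 0.127 | 0.855 |
| | Inverse | 0.971 | 0.973 | 0.179 | 0.846 | 0.950 | 0.926 | 0.346 | 1.492 |
| | Median | 0.979 | 0.987 | 0.104 | 0.609 | 0.958 | 0.949 | 0.253 | 1.252 |
| | Focal | 0.985 | 0.993 | 0.060 | 0.469 | 0.979 | 0.984 | 0.107 | 0.634 |
| | DTM | 0.980 | 0.990 | 0.085 | 0.552 | 0.979 | 0.982 | 0.093 | 0.898 |
| | DPT | 0.982 | 0.993 | 0.069 | 0.502 | 0.980 | 0.985 | 0.070 | 0.624 |
| Dice | N/A | 0.986 | 0.993 | 0.074 | 0.338 | 0.977 | 0.982 | 0.084 | 0.618 |
| | Inverse | 0.000 | 0.000 | - | - | 0.955 | 0.946 | 0.221 | 1.405 |
| | Median | 0.984 | 0.988 | 0.107 | 0.502 | 0.974 | 0.975 | 0.171 | 1.164 |
| | Focal | 0.987 | 0.995 | 0.052 | 0.291 | 0.980 | 0.986 | 0.065 | 0.567 |
| | DTM | 0.986 | 0.993 | 0.068 | 0.361 | 0.978 | 0.983 | 0.082 | 0.608 |
| | DPT | 0.985 | 0.992 | 0.098 | 0.445 | 0.974 | 0.977 | 0.095 | 0.747 |

| Loss function | Weighting | Brainstem DSC | Brainstem SDSC | Brainstem ASD | Brainstem 95HD | Blood Vessels DSC | Blood Vessels SDSC | Blood Vessels ASD | Blood Vessels 95HD |
|---|---|---|---|---|---|---|---|---|---|
| Cross entropy | N/A | 0.961 | 0.942 | 0.266 | 2.083 | 0.790 | 0.846 | 1.084 | 10.471 |
| | Inverse | 0.944 | 0.937 | 0.371 | 1.302 | 0.712 | 0.778 | 1.524 | 14.184 |
| | Median | 0.949 | 0.928 | 0.415 | 1.528 | 0.686 | 0.721 | 1.920 | 17.233 |
| | Focal | 0.962 | 0.947 | 0.267 | 1.495 | 0.782 | 0.830 | 1.263 | 12.068 |
| | DTM | 0.966 | 0.946 | 0.291 | 2.362 | 0.783 | 0.840 | 1.163 | 11.097 |
| | DPT | 0.964 | 0.952 | 0.203 | 1.343 | 0.797 | 0.855 | 1.059 | 10.703 |
| Dice | N/A | 0.960 | 0.934 | 0.389 | 2.174 | 0.774 | 0.828 | 1.234 | 11.574 |
| | Inverse | 0.961 | 0.941 | 0.391 | 2.374 | 0.801 | 0.836 | 1.196 | 12.002 |
| | Median | 0.962 | 0.941 | 0.344 | 2.329 | 0.788 | 0.829 | 1.200 | 10.648 |
| | Focal | 0.963 | 0.952 | 0.235 | 1.262 | 0.783 | 0.828 | 1.300 | 12.835 |
| | DTM | 0.964 | 0.944 | 0.217 | 1.288 | 0.773 | 0.831 | 1.221 | 11.280 |
| | DPT | 0.960 | 0.929 | 0.394 | 3.759 | 0.757 | 0.801 | 1.869 | 18.269 |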
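As an illustration of the DSC columns reported in Table 5, the following minimal sketch computes a per-class Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), while skipping the background class as the table does. It is not the authors' evaluation code: the function name `dice_per_class`, the flattened toy label maps, and the label encoding (0 = background, 1 = cerebrum, 2 = blood vessels) are assumptions made only for this example.

```python
from typing import Dict, Sequence


def dice_per_class(pred: Sequence[int], truth: Sequence[int],
                   background: int = 0) -> Dict[int, float]:
    """Return the Dice similarity coefficient for each non-background label.

    `pred` and `truth` are flattened label maps of equal length.
    The background label is skipped, mirroring how Table 5 excludes it.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")

    # Labels that occur in either map, so the denominator below is never zero.
    labels = sorted((set(pred) | set(truth)) - {background})
    scores: Dict[int, float] = {}
    for label in labels:
        n_pred = sum(1 for v in pred if v == label)
        n_truth = sum(1 for v in truth if v == label)
        overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        scores[label] = 2.0 * overlap / (n_pred + n_truth)
    return scores


# Toy example with a hypothetical encoding:
# 0 = background, 1 = cerebrum, 2 = blood vessels.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 0, 2, 0, 0, 2]
scores = dice_per_class(pred, truth)
print(scores)
```

A class that is present in the ground truth but never predicted gets a DSC of 0 under this definition, which matches the 0.000 entries in the table; surface-distance metrics such as ASD and 95HD are undefined in that case.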
Table 6. Ranking results of no weighting (N/A), inverse frequency weighting (Inverse), inverse median frequency weighting (Median), focal weighting (Focal), distance transform map-based weighting (DTM), and distance penalty term-based weighting (DPT) in (a) binary-class segmentation tasks and (b) multi-class segmentation tasks. The best results are shown in bold. The rank is determined based on the rank scores of segmentation results on all datasets.
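(a) Binary-class segmentation tasks
| Loss function | Weighting | Rank score (Dataset 1: Cerebrum) | Rank score (Dataset 2: Cerebellum) | Rank score (Dataset 3: Brainstem) | Rank score (Dataset 4: Blood Vessels) | Rank score (All) | Rank |
|---|---|---|---|---|---|---|---|
| Cross entropy | N/A | 5.25 | 7.25 | 3.25 | 6.00 | 5.44 | 4 |
| | Inverse | 0.00 | 2.25 | 1.25 | 1.25 | 1.19 | 11 |
| | Median | 1.50 | 0.75 | 0.50 | 0.75 | 0.88 | 12 |
| | Focal | 3.50 | 4.00 | 6.00 | 6.00 | 4.88 | 5 |
| | DTM | 4.25 | 6.25 | 6.50 | 6.00 | 5.75 | 2 |
| | DPT | 5.50 | 6.25 | 4.50 | 6.00 | 5.56 | 3 |
| Dice | N/A | 2.75 | 4.00 | 0.00 | 2.50 | 2.31 | 10 |
| | Inverse | 1.75 | 1.50 | 3.00 | 5.50 | 2.94 | 8 |
| | Median | 1.75 | 1.00 | 3.50 | 3.75 | 2.50 | 9 |
| | Focal | 8.50 | 4.50 | 6.50 | 4.75 | 6.06 | 1 |
| | DTM | 4.50 | 4.25 | 4.25 | 1.75 | 3.69 | 6 |
| | DPT | 5.25 | 4.00 | 4.00 | 0.00 | 3.31 | 7 |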
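(b) Multi-class segmentation tasks
| Loss function | Weighting | Rank score (Dataset 1: Three Classes) | Rank score (Dataset 2: Four Classes) | Rank score (Dataset 3: Five Classes) | Rank score (All) | Rank |
|---|---|---|---|---|---|---|
| Cross entropy | N/A | 1.50 | 5.75 | 4.13 | 4.08 | 6 |
| | Inverse | 0.63 | 0.83 | 0.81 | 0.78 | 12 |
| | Median | 1.25 | 1.92 | 0.81 | 1.28 | 11 |
| | Focal | 4.88 | 4.67 | 4.19 | 4.50 | 4 |
| | DTM | 5.63 | 5.25 | 3.69 | 4.64 | 3 |
| | DPT | 6.75 | 6.17 | 6.63 | 6.50 | 1 |
| Dice | N/A | 4.63 | 4.58 | 3.69 | 4.19 | 5 |
| | Inverse | 2.88 | 2.17 | 1.38 | 1.97 | 10 |
| | Median | 6.00 | 3.67 | 2.56 | 3.69 | 8 |
| | Focal | 3.63 | 7.50 | 6.75 | 6.31 | 2 |
| | DTM | 4.63 | 0.67 | 4.75 | 3.36 | 9 |
| | DPT | 4.88 | 4.67 | 2.44 | 3.72 | 7 |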
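The relation between the "All" rank scores and the final ranks in Table 6(a) can be checked with a short sketch. It sorts the "All" scores in descending order to reproduce the Rank column, and it also tests whether "All" agrees with the unweighted mean of the four per-dataset scores. The mean is only an observation about the reported numbers, not a procedure stated in the caption, and the helper `rank_by_score` is a hypothetical name introduced for this example.

```python
from statistics import mean

# Rows copied from Table 6(a): ([Dataset 1..4 rank scores], "All" score, reported rank).
table_6a = {
    ("Cross entropy", "N/A"):     ([5.25, 7.25, 3.25, 6.00], 5.44, 4),
    ("Cross entropy", "Inverse"): ([0.00, 2.25, 1.25, 1.25], 1.19, 11),
    ("Cross entropy", "Median"):  ([1.50, 0.75, 0.50, 0.75], 0.88, 12),
    ("Cross entropy", "Focal"):   ([3.50, 4.00, 6.00, 6.00], 4.88, 5),
    ("Cross entropy", "DTM"):     ([4.25, 6.25, 6.50, 6.00], 5.75, 2),
    ("Cross entropy", "DPT"):     ([5.50, 6.25, 4.50, 6.00], 5.56, 3),
    ("Dice", "N/A"):              ([2.75, 4.00, 0.00, 2.50], 2.31, 10),
    ("Dice", "Inverse"):          ([1.75, 1.50, 3.00, 5.50], 2.94, 8),
    ("Dice", "Median"):           ([1.75, 1.00, 3.50, 3.75], 2.50, 9),
    ("Dice", "Focal"):            ([8.50, 4.50, 6.50, 4.75], 6.06, 1),
    ("Dice", "DTM"):              ([4.50, 4.25, 4.25, 1.75], 3.69, 6),
    ("Dice", "DPT"):              ([5.25, 4.00, 4.00, 0.00], 3.31, 7),
}


def rank_by_score(scores):
    """Assign rank 1 to the highest score; ties are not handled specially."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {key: position for position, key in enumerate(ordered, start=1)}


all_scores = {key: row[1] for key, row in table_6a.items()}
ranks = rank_by_score(all_scores)

# Methods whose recomputed rank differs from the reported rank (expected: none).
mismatches = [key for key, row in table_6a.items() if ranks[key] != row[2]]

# Largest gap between the reported "All" score and the plain mean of the four datasets.
max_mean_gap = max(abs(mean(row[0]) - row[1]) for row in table_6a.values())

print("rank mismatches:", mismatches)
print("largest |mean - All|:", max_mean_gap)
```

The same sorting step applies to the "All" column of Table 6(b); the plain-mean check is shown for Table 6(a) only, since the caption does not specify how the per-dataset scores are combined.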
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Sugino, T.; Kawase, T.; Onogi, S.; Kin, T.; Saito, N.; Nakajima, Y. Loss Weightings for Improving Imbalanced Brain Structure Segmentation Using Fully Convolutional Networks. Healthcare 2021, 9, 938. https://doi.org/10.3390/healthcare9080938
