Article

U-Net-Based Semi-Automatic Semantic Segmentation Using Adaptive Differential Evolution

1 Faculty of Science and Engineering, Doshisha University, Kyoto 610-0321, Japan
2 Faculty of Advanced Science and Technology, Ryukoku University, Kyoto 612-8577, Japan
3 Graduate School of Science and Engineering, Doshisha University, Kyoto 610-0321, Japan
4 Graduate School of Science and Technology, Ryukoku University, Kyoto 612-8577, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 10798; https://doi.org/10.3390/app131910798
Submission received: 2 August 2023 / Revised: 25 September 2023 / Accepted: 27 September 2023 / Published: 28 September 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Bone semantic segmentation is essential for generating a bone simulation model for automatic diagnosis, and a convolutional neural network model is often applied to semantic segmentation. However, ground-truth (GT) images, which are generated from handwritten borderlines, are required to train such a model. It takes a great deal of time to generate accurate GTs from handwritten borderlines, which is the main reason why bone simulation has not been put to practical use for diagnosis. With the above in mind, we propose the U-net-based semi-automatic semantic segmentation method detailed in this paper to tackle the problem. Moreover, bone computed tomography (CT) images are often provided in the digital imaging and communications in medicine (DICOM) format, which contains various parameters that affect the image quality for segmentation. We therefore also propose a novel adaptive input image generator using adaptive differential evolution. We evaluate the proposed method against conventional U-net and DeepLabv3 models using open bone datasets (the spine and the femur) and our artificial bone data. Performance evaluations show that the proposed method outperforms U-net and DeepLabv3 in terms of Dice, IoU, and pixelwise accuracy, and that DeepLabv3 shows the lowest performance due to a lack of training data. We verify that the U-net-based model is effective for bone segmentation when only a limited quantity of training data is available. Moreover, we verify that the proposed method can effectively create proper GTs and input images, resulting in increased performance and reduced computational costs. We believe that the proposed method will promote the wide, practical use of bone simulation based on CT images.

1. Introduction

Medical image segmentation has recently received much attention, as its performance has improved to a level usable in medical diagnostics, including retinal disorders [1,2,3,4,5], cancer [6,7], and finger vein recognition [8]. In particular, tumor detection using computed tomography (CT) and magnetic resonance imaging (MRI) images can extract tumor regions roughly, so these techniques have reached practical use [9,10,11]. By contrast, bone semantic segmentation needs precise pixelwise region detection to generate bone models accurately. It can be used for medical diagnoses such as osteoporosis of the femur, which has a simple shape, but it is not yet at a practical stage for the spine [12,13]. As this demonstrates, the semantic segmentation of bones is still a challenging research topic. Semantic segmentation is a region extraction problem that requires a semantic understanding of each pixel and high accuracy. Before the advent of deep learning, various image processing methods based on clustering and boosting were proposed. These methods required appropriate feature descriptions for semantic segmentation drawn from various image features, including color tones and intensity, color palettes, shapes, and texture descriptors such as HOG, SURF, and SIFT [14,15,16]. They also required these features to be selected optimally for each subject image dataset. By contrast, many deep learning-based methods have been published based on general image datasets, such as those represented by ImageNet. The fully convolutional network (FCN) [17] was the first convolutional neural network (CNN)-based semantic segmentation method and replaced fully connected layers with 1 × 1 convolution layers. An FCN is based on a pretrained VGG-16 model for image classification but has shown high segmentation performance. SegNet [18] is another CNN-based method, with an encoder network and a corresponding decoder network. The novelty of SegNet is its decoder, which uses the pooling indices computed in the max-pooling step of the corresponding encoder to perform nonlinear upsampling. SegNet is often used to extract pedestrians, cars, buildings, and roads based on the CamVid dataset. CRF-RNN [19] incorporates a conditional random field (CRF) to refine detailed parts. Moreover, various other methods, such as DeepMask [20], FusionNet [21], and the DeepLab versions [22,23], have been proposed. These methods focus on segmentation performance for general images and learn from large image datasets.
Unlike the methods outlined above, medical image segmentation aims at semantic pixelwise segmentation based on CT and MRI images. Compared to segmentation of general images, medical image segmentation requires highly accurate performance to diagnose a patient’s condition. Early medical image segmentation methods were based on the above-mentioned image processing methods. As several public datasets have been published, CNN-based methods have been proposed [24]; however, it is difficult to obtain pixel-level medical image labels, known as ground truths (GTs). For this reason, state-of-the-art models for image segmentation have not been adopted, because most of them consist of deep layers and require huge datasets to optimize their numerous weight parameters. A widely successful CNN-based model for medical imaging is the U-net [25]. The U-net has a U-shaped, relatively compact architecture with skip connections. It combines high-level semantic feature maps from the decoder with corresponding low-level detailed feature maps from the encoder. Most U-net-based medical image segmentation methods use benchmark datasets [26,27]; however, a model must be constructed for each patient for an accurate diagnosis. Compared to other models, the U-net requires less training data, but the cost of creating GTs remains high.
In general, GTs are created from precise handwritten boundary lines, generated one by one, for training the U-net. The boundary of each bone part should be closed, and missing line segments cannot be admitted. Moreover, it is difficult to draw the lines in a single stroke in many cases because most bone parts, such as the spine, are complex. Therefore, it takes time to generate each patient’s bone model, even when applying the U-net. This is the main reason why automatic diagnosis has not been widely used [28,29,30]. With the above in mind, in this paper, we propose a novel semi-automatic semantic segmentation method based on the U-net. Specifically, our method comprises a semi-automatic GT generator and an optimal CT image generator. The proposed method overcomes the GT-generation problem by using rough handwritten lines as input and by formulating bone boundary generation from CT images as a clustering and combinatorial optimization problem. The proposed method only needs rough handwritten lines, so time savings in input image preparation are expected. However, most CT images are stored in the digital imaging and communications in medicine (DICOM) data format, which contains many parameters, such as window width and window center. When applying semantic segmentation, input images are generated by converting the DICOM data format to the bitmap format. The created input images differ according to these parameters, which affects the difficulty of semantic segmentation. To the best of our knowledge, optimal parameter settings have not been applied to U-net-based segmentation, and the parameters are related to each other. Therefore, we propose a DICOM parameter optimizer using a differential evolution algorithm (DE), a robust optimization algorithm for high-dimensional problems, to enhance our proposed method. As far as we know, our semi-automatic semantic segmentation using clustering and combinatorial optimization is the first attempt of its kind to enable automatic modeling for each patient. The main contributions of this study are summarized as follows:
(1)
Our method enables a novel semi-automatic GT generation by combining clustering and combinatorial optimization methods.
(2)
Our method optimizes CT images using DICOM parameters based on the adaptive DE.
(3)
We evaluate our GTs’ capability as input data for the U-net using femur, spine, and artificial bones.
To facilitate further semi-automatic semantic segmentation using the U-net approach, we have made our implementation publicly available. Please visit: https://github.com/ISDL-academic/U-Net_base_bone_segmentation (accessed on 25 September 2023).

2. Proposed Method

Figure 1 shows the components of the proposed U-net-based semi-automatic semantic segmentation method. The method provides appropriate input images for the U-net model by optimizing DICOM parameters with adaptive DE, and it automatically generates GT images from roughly handwritten lines by solving the defined combinatorial optimization problem. Each algorithm is described in detail in the following subsections.

2.1. Semi-Automatic GTs

Medical semantic segmentation aims at pixel classification. When generating a bone model, missing parts of the borderlines cannot be admitted. This is the main reason why generating such a model by hand takes time. Various CNN-based methods, such as the U-net, have been proposed to estimate semantic segmentation automatically, but these methods learn from benchmark datasets [26,27], so each patient’s GT data are essential for obtaining their unique bone models. This is why U-net-based methods need more handcrafted GT data than conventional image processing methods. The proposed method overcomes this problem using rough handwritten lines as input. The lines show the borderlines of a bone in a CT image but are often not closed, which is essential for GT data. The proposed method treats this as a combinatorial optimization problem and generates closed regions, as if drawn in a single pen stroke. Moreover, the bone of interest is sometimes divided into multiple parts, so the proposed method uses DBSCAN to extract multiple parts in an image by clustering in advance. DBSCAN gathers points of high density and is effective for point sets that contain clusters of similar density. Therefore, multiple parts are clustered accurately because the points forming the handwritten lines are distributed continuously and in an unbiased way.
Let each image $I_t$, $t = 1, \dots, T$, consist of colored pixels $p_i$, $i = 1, \dots$, obtained from the handwritten lines. The following algorithm generates the GT of $I_t$ (a minimal implementation sketch is given after the algorithm). Based on the generated GT images, the proposed method adopts a U-net, as shown in Figure 1, for segmentation.
O-0: $C = \{ p_i ; i = 1, \dots \}$,
O-1: Divide $C$ into $M$ parts using DBSCAN, $C = C_1 \cup \dots \cup C_M$,
O-2: $m \leftarrow 1$,
O-3: while $m \le M$
  min. $\sum_{i \in C_m} \sum_{j \in C_m} d_{ij} x_{ij}$, $d_{ij} = | p_i - p_j |$,
  s.t. $\sum_{j=1}^{|C_m|} x_{ij} = 1 \quad (\forall i \in C_m)$,
     $\sum_{i=1}^{|C_m|} x_{ij} = 1 \quad (\forall j \in C_m)$,
     $\sum_{i \in S} \sum_{j \in C_m \setminus S} x_{ij} \ge 1 \quad (\forall S \subset C_m, S \ne \emptyset)$,
     $x_{ij} \in \{0, 1\} \quad (\forall i, j \in C_m)$,
  Draw a line between $p_i$ and $p_j$ when $x_{ij} = 1$,
  Fill in the boundary,
  $m \leftarrow m + 1$.
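The following Python sketch illustrates steps O-0 to O-3 under simplifying assumptions: the handwritten lines are given as a binary mask, scikit-learn's DBSCAN performs step O-1, and the exact tour formulation of step O-3 is approximated by a greedy nearest-neighbor ordering rather than solved exactly as in the paper. The function name generate_gt and the eps/min_samples values are illustrative, not from the original implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN
import cv2

def generate_gt(line_mask, eps=5.0, min_samples=5):
    """Semi-automatic GT sketch: cluster handwritten-line pixels with DBSCAN,
    close each cluster with a greedy nearest-neighbor tour (an approximation of
    the exact combinatorial formulation in step O-3), and fill the boundary."""
    h, w = line_mask.shape
    gt = np.zeros((h, w), dtype=np.uint8)
    pts = np.column_stack(np.nonzero(line_mask))           # (row, col) of colored pixels (set C)
    if len(pts) == 0:
        return gt
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)   # O-1: C = C_1 ∪ … ∪ C_M
    for m in set(labels) - {-1}:                            # skip DBSCAN noise points
        cluster = pts[labels == m].astype(np.float32)
        # Greedy nearest-neighbor tour over the cluster (heuristic stand-in for O-3)
        tour = [0]
        remaining = set(range(1, len(cluster)))
        while remaining:
            last = cluster[tour[-1]]
            nxt = min(remaining, key=lambda j: np.linalg.norm(cluster[j] - last))
            tour.append(nxt)
            remaining.remove(nxt)
        polygon = cluster[tour][:, ::-1].astype(np.int32)   # (x, y) order for OpenCV
        cv2.fillPoly(gt, [polygon], 255)                    # close the boundary and fill it
    return gt
```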

2.2. Windowing by jDE

DICOM is an international standard format for medical images such as CT and MRI. DICOM has various parameters consisting of personal information such as name, gender, and hospital, as well as image composition information such as CT number, window size, and tone level. The CT number is the value of the X-ray absorption coefficient of the tissue captured by CT, with 0 for water and −1000 for air. The values range from −1000 to 4000 and must be converted to a range from 0 to 255 when outputting images. If all CT numbers are projected in the image output, the tissue irrelevant to the diagnosis is also included in the image. Thus, it is necessary to use only the CT number ranges that match the target tissue. This conversion is called windowing. Windowing uses two parameters, window center (WC) and window width (WW), to specify the center and width of the target area, respectively. Figure 2 shows an example of the distribution of CT numbers in body tissues. More detailed observations become possible by setting the window conditions according to the target tissue.
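As a concrete illustration of windowing, the sketch below maps raw CT numbers to 8-bit intensities using a given WC and WW; the function name apply_window and the commented bone-window values are illustrative assumptions, not parameters taken from the paper (the proposed method searches WC and WW with jDE).

```python
import numpy as np

def apply_window(ct_slice, wc, ww):
    """Map CT numbers to 8-bit intensities with window center (WC) and window
    width (WW): values below WC - WW/2 become 0, values above WC + WW/2 become
    255, and values in between are scaled linearly."""
    low, high = wc - ww / 2.0, wc + ww / 2.0
    windowed = np.clip(ct_slice.astype(np.float32), low, high)
    return ((windowed - low) / (high - low) * 255.0).astype(np.uint8)

# Example (illustrative values): img = apply_window(ct_numbers, wc=400, ww=1500)
```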
In our proposed method, the U-net is applied for semantic segmentation, but image preprocessing affects the final performance of the segmentation. As mentioned above, the windowing parameters are sensitive and require real-valued optimization. To estimate appropriate windowing parameters, in this paper, we optimize them using adaptive differential evolution (jDE). jDE [31] is a variant of DE, a real-valued optimization method that applies an evolutionary computation algorithm to perform a multipoint search. DE is usually applied to nonlinear, nonconvex optimization problems that cannot be differentiated. Compared to other real-valued optimization approaches, it is faster and more robust because it does not require complex computations or differentiation and has few control parameters. The parameters that need to be set in DE are the number of generations, the scaling factor, and the crossover probability. The number of generations indicates the number of search iterations, while the scaling factor controls the range of the search. The crossover probability determines how elements are inherited from the original individual in a crossover. The jDE algorithm is as follows (a minimal implementation sketch is provided after the steps):
D-0: Generate an initial population set $X = \{ x_1, \dots, x_N \}$ with a random uniform distribution within the feasible region.
D-1: $g \leftarrow 1$.
D-2: Select three individuals $x_{r_1}, x_{r_2}, x_{r_3}$ for individual $x_i$, where $r_1 \neq r_2 \neq r_3$.
D-3: Generate the mutant vector $v_i = x_{r_1} + F_i^g ( x_{r_2} - x_{r_3} )$, where $F \in [0, 1]$ is a scaling parameter.
D-4: Select $j_{\mathrm{rand}}$ randomly from $[1, n]$, where $n$ is the dimension of each $x_i$.
D-5: Cross over the mutant vector $v_i$ and the parent vector $x_i$ using the following equation,
$$ u_{i,j} = \begin{cases} v_{i,j} & \text{if } (\mathrm{rand}[0,1) \le CR_i^g) \vee (j = j_{\mathrm{rand}}), \\ x_{i,j} & \text{otherwise}, \end{cases} $$
where $CR$ represents the crossover rate.
D-6: Apply D-2 to D-5 for all individuals to generate a population of trial vectors $U = \{ u_i ; i = 1, \dots, N \}$.
D-7: Evaluate $U$ and update the solution set.
D-8: Update $F_i^g$ and $CR_i^g$ by the following equations,
$$ F_i^{g+1} = \begin{cases} F_l + \mathrm{rand}[0,1) \cdot F_u & \text{if } \mathrm{rand}[0,1) < \tau_1, \\ F_i^g & \text{otherwise}, \end{cases} $$
$$ CR_i^{g+1} = \begin{cases} \mathrm{rand}[0,1) & \text{if } \mathrm{rand}[0,1) < \tau_2, \\ CR_i^g & \text{otherwise}. \end{cases} $$
D-9: $g \leftarrow g + 1$.
D-10: Repeat D-2 through D-9 for a given number of generations.
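The sketch below is a minimal jDE implementation of steps D-0 to D-10 (DE/rand/1/bin with self-adaptive F and CR); the function signature, bound handling, and greedy replacement details are assumptions made for illustration, and the objective is assumed to be minimized.

```python
import numpy as np

rng = np.random.default_rng(0)

def jde(objective, bounds, pop_size=100, generations=10,
        f_l=0.1, f_u=0.9, tau1=0.1, tau2=0.2):
    """Minimal jDE sketch: self-adaptive F and CR per individual (Brest et al., 2006)."""
    dim = len(bounds)
    lower = np.array([b[0] for b in bounds], dtype=float)
    upper = np.array([b[1] for b in bounds], dtype=float)
    pop = lower + rng.random((pop_size, dim)) * (upper - lower)    # D-0
    F = np.full(pop_size, 0.5)
    CR = np.full(pop_size, 0.9)
    fitness = np.array([objective(x) for x in pop])
    for _ in range(generations):                                   # D-1, D-9, D-10
        for i in range(pop_size):
            # D-8: parameter self-adaptation (applied before producing the trial vector)
            F_i = f_l + rng.random() * f_u if rng.random() < tau1 else F[i]
            CR_i = rng.random() if rng.random() < tau2 else CR[i]
            # D-2: three mutually distinct individuals, all different from i
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            v = pop[r1] + F_i * (pop[r2] - pop[r3])                # D-3: mutation
            v = np.clip(v, lower, upper)
            j_rand = rng.integers(dim)                             # D-4
            cross = (rng.random(dim) <= CR_i) | (np.arange(dim) == j_rand)
            u = np.where(cross, v, pop[i])                         # D-5: binomial crossover
            fu = objective(u)                                      # D-6, D-7: evaluate and select
            if fu <= fitness[i]:
                pop[i], fitness[i], F[i], CR[i] = u, fu, F_i, CR_i
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]
```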
Using the jDE, we optimized the windowing parameters $w_i = \{ wc_i, ww_i \}$, where $wc$ is the window center and $ww$ is the window width. It takes a lot of time to compare the GT with an output image of the U-net for every candidate, so we introduce the following simple alternative method based on DICOM data elements (a sketch of this objective follows the steps below).
P-0: Generate an initial parameter set $W = \{ w_1, w_2, \dots, w_N \}$ with uniform random numbers $U(0, 1500)$, and convert the DICOM images to PNG images using $W$.
P-1: If a pixel value of a PNG image is above the threshold (in our experiment, we set it to (200, 200, 200)), the pixel is converted to a bone label. We refer to these converted PNG images as windowed images.
P-2: Execute the jDE. The objective function is set to the average IoU between each windowed image and its GT image.
P-3: Repeat P-1 to P-2 for a certain period.
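The sketch below illustrates this alternative objective (steps P-0 to P-2) under stated assumptions: pydicom is assumed for reading slices (rescale slope/intercept handling is omitted), the RGB threshold (200, 200, 200) is simplified to a single grayscale threshold of 200, and apply_window and jde refer to the earlier sketches. The negative mean IoU is returned so that the objective can be minimized.

```python
import numpy as np
import pydicom

def windowed_iou_objective(dicom_paths, gt_masks, threshold=200):
    """Build the alternative objective of P-0..P-2 (sketch): window each DICOM
    slice, label pixels above the threshold as bone, and return the negative
    mean IoU against the GT masks."""
    slices = [pydicom.dcmread(p).pixel_array.astype(np.float32) for p in dicom_paths]

    def objective(w):
        wc, ww = w
        ious = []
        for ct, gt in zip(slices, gt_masks):
            img = apply_window(ct, wc, ww)          # windowing sketch from Section 2.2
            pred = img >= threshold                 # P-1: pixels above the threshold -> bone label
            gt_bin = gt > 0
            inter = np.logical_and(pred, gt_bin).sum()
            union = np.logical_or(pred, gt_bin).sum()
            ious.append(inter / union if union else 1.0)
        return -float(np.mean(ious))                # P-2: maximize the mean IoU
    return objective

# Usage sketch: best_w, _ = jde(windowed_iou_objective(paths, gts), bounds=[(0, 1500), (0, 1500)])
```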

2.3. U-Net

The U-net [25] is a convolutional network with an encoder–decoder structure for semantic segmentation. Figure 3 shows the U-net architecture we applied. The first half of the U-shaped network is an encoder, and the second half is a decoder. The encoder consists of convolution layers and pooling layers, which repeatedly perform two 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling with stride 2 for downsampling. The number of feature maps increases as the layers get deeper, with the deepest layer having 1024 feature maps. The decoder consists of up-convolution layers performing a 2 × 2 upsampling and two 3 × 3 convolutions, each followed by a ReLU. The feature maps of the encoder at the same depth are concatenated with the feature maps obtained by upsampling, and a convolution is performed. The last layer performs a 1 × 1 convolution and assigns a class to each pixel. The skip connections combine the high-level semantic feature maps from the decoder with the corresponding low-level detailed feature maps from the encoder. Owing to the skip connections and the relatively small network described above, the U-net is often used in medical image segmentation.
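A minimal PyTorch sketch of the architecture described above is shown below. It is an assumed reconstruction, not the authors' exact implementation: padded 3 × 3 convolutions are used so that 256 × 256 inputs keep their size (the original U-net uses unpadded convolutions), and the channel counts follow the 64–1024 progression mentioned in the text.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by ReLU (padding=1 keeps the spatial size)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    """Compact U-net sketch: four down/up steps, 64..1024 feature maps, skip connections."""
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        chs = [64, 128, 256, 512, 1024]
        self.encoders = nn.ModuleList()
        prev = in_ch
        for c in chs:
            self.encoders.append(double_conv(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)                       # 2x2 max pooling, stride 2
        self.upconvs = nn.ModuleList(
            nn.ConvTranspose2d(c, c // 2, 2, stride=2) for c in chs[:0:-1])  # 2x2 up-convolutions
        self.decoders = nn.ModuleList(
            double_conv(c, c // 2) for c in chs[:0:-1])   # applied after concatenating the skip
        self.head = nn.Conv2d(chs[0], n_classes, 1)       # final 1x1 convolution -> class scores

    def forward(self, x):
        skips = []
        for enc in self.encoders[:-1]:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.encoders[-1](x)                          # bottleneck with 1024 feature maps
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))          # skip connection: concatenate encoder maps
        return self.head(x)
```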

3. Evaluations

3.1. Experimental Settings

For the performance evaluation, two datasets were used. One was an open dataset [32], and the other was a set of CT scan images of pig vertebrae that we scanned as artificial bone. From [32], one spine image set and one femur image set were used. The numbers of scan-by-scan images of the artificial bone, spine, and femur were 86, 276, and 162, respectively. Each dataset was randomly divided into training and testing sets in a 1:1 ratio. The image size was 256 × 256.
FCN, U-net, SegNet, and DeepLab are well-known segmentation methods; however, since the U-net is an extension of the FCN and SegNet is specialized for road scenes, the U-net and DeepLabv3, the latest DeepLab variant, were used as the comparison methods in our experiments. When training the proposed model and DeepLabv3, Adam was employed as the optimizer. The batch size was 32, the number of epochs was 100, and the learning rate was set to 0.001 (a minimal training-loop sketch with these settings follows).
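The sketch below shows these training settings in code, assuming the UNet sketch from Section 2.3 is in scope; the dummy tensors merely stand in for the windowed CT slices and GT masks and are not real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors stand in for windowed 256x256 CT slices and their GT masks.
images = torch.rand(86, 1, 256, 256)
masks = torch.randint(0, 2, (86, 256, 256))
loader = DataLoader(TensorDataset(images, masks), batch_size=32, shuffle=True)

model = UNet(in_ch=1, n_classes=2)                      # sketch from Section 2.3
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(100):                                # epoch number from the experimental settings
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```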
As the jDE parameters, the initial values $F_i = 0.5$ and $CR_i = 0.9$ were used for each individual; $F_i$'s lower bound $F_l$ and upper bound $F_u$ were set to 0.1 and 0.9, respectively; and $\tau_1 = 0.1$ and $\tau_2 = 0.2$ at each generation (jDE adaptively controls F and CR, so these parameters are not sensitive, but we set commonly used initial values). The number of individuals was 100, and the number of generations was 10.

3.2. Results

3.2.1. GTs

Figure 4 shows the results of the proposed semi-automatic GT generator. Due to paper-length limitations, only the results for the artificial bone are presented here. Figure 4a shows the input images, and Figure 4b,c show the results of the proposed method without steps O-0–O-1 and of the full proposed method, respectively. We found that while the borderlines in Figure 4b were connected to each other, the borderlines in Figure 4c were correctly divided into parts, thanks to DBSCAN. Handwritten GT lines are often not closed and cannot be filled with the GT color, resulting in rewrites, and it takes about 10 min per image for an experienced annotator (>2 years). In contrast, the proposed method automatically generated GT images filled with the GT color and took only 43 s to generate 86 images on an ordinary computer with an Apple M1 3.20 GHz processor and 16 GB of memory. The generated GTs were verified by experienced annotators (>2 years), who confirmed that they matched GTs made by hand by experienced annotators. From these results, we verified that the proposed semi-automatic GT generation worked effectively from handwritten lines and could reduce time-consuming GT-generation tasks. The proposed method is especially effective when the number of images is large, such as when creating a bone model of the entire body.

3.2.2. Segmentation Results

Table 1, Table 2 and Table 3 show the semantic segmentation results in terms of IoU, Dice, and pixelwise accuracy. As shown in Table 3, the proposed method had the best performance; however, the improvement is not so apparent in this metric because pixelwise accuracy includes the accuracy of the background. In Table 1 and Table 2, we observed no clear differences among the results of the U-net, DeepLabv3, and the proposed method for the femur. Compared to the artificial bone and the spine, the femur is a relatively easy bone area to extract because its shape is not complex. Therefore, DeepLabv3 worked well even in this experimental condition with a small number of training images. For the artificial bone and the spine, we verified that the proposed method outperformed the U-net, thanks to optimizing the windowing parameters with jDE. jDE is a stochastic method, but these results were stable. Moreover, based on these results, we verified that the U-net-based model was better suited than DeepLabv3 to the condition of limited training data. The architecture of DeepLabv3 has many more layers than the U-net, so DeepLabv3 needs much more training data. Figure 5, Figure 6 and Figure 7 show the segmentation results, where (a), (b), (c), (d), and (e) are the input images, the GT images generated by the proposed method, the results of DeepLabv3, the results of the U-net, and the results of the proposed method, respectively. In Figure 7a, the white color is not uniform within the bone areas because the bone contours are denser than the inside, which makes it difficult to extract whole bone parts when segmenting from these input images.
As shown in the left figures in Figure 5 and the right figures in Figure 6, the proposed method, windowing by jDE + U-net, extracted more precise areas than the U-net alone. The proposed method automatically generates GT images from handwritten lines and optimizes the DICOM image parameters for performance improvement. From these results, we verified that appropriate windowing parameters generated clearer input images for the U-net model, so the proposed method only needed to solve an easier semantic segmentation problem, thanks to the input images produced with jDE.

4. Conclusions and Future Work

Bone semantic segmentation is essential for generating a bone model to automatically diagnose osteoporosis; however, it takes much time to generate the bone borderlines, which are the input data for the bone model, because it is common to draw the borderlines by hand. To obtain borderlines automatically from input data, a neural network model is often used, but GT images are required to train such a model for semantic segmentation. Moreover, each handwritten borderline is often rough, so line completion and part segmentation are necessary to obtain GT images. Therefore, we proposed a semi-automatic GT generator that uses the clustering method DBSCAN and solves a combinatorial optimization problem.
Further, we introduced a novel input image generator for CNN models to improve performance. The proposed method incorporates the U-net model for segmentation, and the input images of the U-net are generated from DICOM data, a standard CT image format. DICOM data have various parameters related to the input images, so we can obtain clearer images, improving the performance of the U-net, if appropriate parameters are given. These parameters are strongly related to one another, so this paper proposed automatic windowing using jDE, a continuous real-valued optimizer, because these parameters are real-valued. We also defined an alternative optimization problem to reduce computational costs, instead of using U-net outputs to optimize the input images. We aimed to generate a bone model for each patient, so both optimizers needed to work with only a few images. The U-net and jDE can work with a relatively small number of images to optimize the network and DICOM parameters; we used fewer than 100 images for training.
We evaluated our proposed method using two open datasets and our scanned artificial bone. The experimental results showed that the proposed method could precisely create GT images from roughly handwritten borderlines, and the time required to create training images, which used to take about 10 min per image by hand, was significantly reduced. Moreover, the proposed method outperformed the conventional U-net and DeepLabv3 models in terms of IoU, Dice coefficient, and pixelwise accuracy by incorporating the proposed input image generator using jDE. From the experimental results, we found that DeepLabv3 showed good performance for the femur, which has a simple shape, as did the U-net and the proposed method, while the U-net and the proposed method showed high performance for the artificial bone and the spine, with the proposed method outperforming the U-net. Although DeepLabv3 requires a large number of images to train the network, it can be considered capable of coarsely extracting simple shapes, such as femurs. By contrast, the proposed method, which also adjusts the parameters of the DICOM data, performs better in extracting complex shapes. We verified that a performance improvement was observed for the segmentation images generated by the proposed method. Our main contributions are not only performance improvements but also a proposal of semi-automatic training data generation. We believe that the proposed method will allow us to generate each patient’s bone simulation for practical bone diagnosis within a reasonable time. However, there are bones with more complex shapes, such as those in the hand. For practical bone diagnoses, we should evaluate such datasets and work to improve the proposed method. In addition, the proposed method does not incorporate information on adjacent bones. We hypothesize that more precise segmentation will be available if we can model adjacency relationships, because bone shapes are continuous. Our future work includes evaluating the proposed method using more complex bone shapes and incorporating continuous bone information in order to acquire fine bone shapes accurately.

Author Contributions

Conceptualization, K.O. and D.T.; methodology, K.O.; software, Y.T.; validation, K.O., Y.T.; formal analysis, Y.T.; investigation, K.O.; data curation, S.Y. (Sohei Yamakawa), S.Y. (Shoma Yakushijin); writing—original draft preparation, K.O., Y.T.; writing—review and editing, K.O.; visualization, K.O.; supervision, K.O.; project administration, K.O. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI grant number 23H01308.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Bilal, A.; Zhu, L.; Deng, A.; Lu, H.; Wu, N. AI-Based Automatic Detection and Classification of Diabetic Retinopathy Using U-Net and Deep Learning. Symmetry 2022, 14, 1427. [Google Scholar] [CrossRef]
  2. Bilal, A.; Sun, G.; Mazhar, S.; Imran, A.; Latif, J. A Transfer Learning and U-Net-based automatic detection of diabetic retinopathy from fundus images. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2022, 10, 663–674. [Google Scholar] [CrossRef]
  3. Bilal, A.; Sun, G.; Mazhar, S.; Imran, A. Improved Grey Wolf optimization-based feature selection and classification using CNN for diabetic retinopathy detection. In Evolutionary Computing and Mobile Sustainable Networks: Proceedings of ICECMSN 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–14. [Google Scholar]
  4. Bilal, A.; Sun, G.; Mazhar, S. Diabetic retinopathy detection using weighted filters and classification using CNN. In Proceedings of the 2021 International Conference on Intelligent Technologies (CONIT), Hubli, India, 25–27 June 2021; pp. 1–6. [Google Scholar]
  5. Bilal, A.; Sun, G.; Li, Y.; Mazhar, S.; Khan, A.Q. Diabetic retinopathy detection and classification using mixed models for a disease grading database. IEEE Access 2021, 9, 23544–23553. [Google Scholar] [CrossRef]
  6. Bilal, A.; Sun, G.; Li, Y.; Mazhar, S.; Latif, J. Lung nodules detection using grey wolf optimization by weighted filters and classification using CNN. J. Chin. Inst. Eng. 2022, 45, 175–186. [Google Scholar] [CrossRef]
  7. Bilal, A.; Shafiq, M.; Fang, F.; Waqar, M.; Ullah, I.; Ghadi, Y.Y.; Long, H.; Zeng, R. IGWO-IVNet3: DL-Based Automatic Diagnosis of Lung Nodules Using an Improved Gray Wolf Optimization and InceptionNet-V3. Sensors 2022, 22, 9603. [Google Scholar] [CrossRef] [PubMed]
  8. Bilal, A.; Sun, G.; Mazhar, S. Finger-vein recognition using a novel enhancement method with convolutional neural network. J. Chin. Inst. Eng. 2021, 44, 407–417. [Google Scholar] [CrossRef]
  9. Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Pal, C.; Jodoin, P.M.; Larochelle, H. Brain tumor segmentation with deep neural networks. Med. Image Anal. 2017, 35, 18–31. [Google Scholar] [CrossRef]
  10. Bilic, P.; Christ, P.; Li, H.B.; Vorontsov, E.; Ben-Cohen, A.; Kaissis, G.; Szeskin, A.; Jacobs, C.; Mamani, G.E.H.; Chartrand, G.; et al. The liver tumor segmentation benchmark (lits). Med. Image Anal. 2023, 84, 102680. [Google Scholar] [CrossRef]
  11. Pereira, S.; Pinto, A.; Alves, V.; Silva, C.A. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 2016, 35, 1240–1251. [Google Scholar] [CrossRef]
  12. Deniz, C.M.; Xiang, S.; Hallyburton, R.S.; Welbeck, A.; Babb, J.S.; Honig, S.; Cho, K.; Chang, G. Segmentation of the proximal femur from MR images using deep convolutional neural networks. Sci. Rep. 2018, 8, 16485. [Google Scholar] [CrossRef]
  13. Chen, F.; Liu, J.; Zhao, Z.; Zhu, M.; Liao, H. Three-dimensional feature-enhanced network for automatic femur segmentation. IEEE J. Biomed. Health Inform. 2017, 23, 243–252. [Google Scholar] [CrossRef] [PubMed]
  14. Chen, W.T.; Liu, W.C.; Chen, M.S. Adaptive color feature extraction based on image color distributions. IEEE Trans. Image Process. 2010, 19, 2005–2016. [Google Scholar] [CrossRef] [PubMed]
  15. Radman, A.; Zainal, N.; Suandi, S.A. Automated segmentation of iris images acquired in an unconstrained environment using HOG-SVM and GrowCut. Digit. Signal Process. 2017, 64, 60–70. [Google Scholar] [CrossRef]
  16. Wang, S.; Zhu, W.; Liang, Z.P. Shape deformation: SVM regression and application to medical image segmentation. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 209–216. [Google Scholar]
  17. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  18. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  19. Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 7–13 December 2015; pp. 1529–1537. [Google Scholar]
  20. O Pinheiro, P.O.; Collobert, R.; Dollár, P. Learning to segment object candidates. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Cambridge, MA, USA, 7–12 December 2015; Volume 28. [Google Scholar]
  21. Quan, T.M.; Hildebrand, D.G.C.; Jeong, W.K. Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics. Front. Comput. Sci. 2021, 3, 613981. [Google Scholar] [CrossRef]
  22. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef]
  23. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  24. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef]
  25. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  26. Sekuboyina, A.; Husseini, M.E.; Bayat, A.; Löffler, M.; Liebl, H.; Li, H.; Tetteh, G.; Kukačka, J.; Payer, C.; Štern, D.; et al. VerSe: A vertebrae labelling and segmentation benchmark for multi-detector CT images. Med. Image Anal. 2021, 73, 102166. [Google Scholar] [CrossRef]
  27. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. Unet 3+: A full-scale connected unet for medical image segmentation. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059. [Google Scholar]
  28. Kim-Wang, S.Y.; Bradley, P.X.; Cutcliffe, H.C.; Collins, A.T.; Crook, B.S.; Paranjape, C.S.; Spritzer, C.E.; DeFrate, L.E. Auto-segmentation of the tibia and femur from knee MR images via deep learning and its application to cartilage strain and recovery. J. Biomech. 2023, 149, 111473. [Google Scholar] [CrossRef]
  29. Liukkonen, M.K.; Mononen, M.E.; Tanska, P.; Saarakkala, S.; Nieminen, M.T.; Korhonen, R.K. Application of a semi-automatic cartilage segmentation method for biomechanical modeling of the knee joint. Comput. Methods Biomech. Biomed. Eng. 2017, 20, 1453–1463. [Google Scholar] [CrossRef]
  30. Flannery, S.W.; Kiapour, A.M.; Edgar, D.J.; Murray, M.M.; Fleming, B.C. Automated magnetic resonance image segmentation of the anterior cruciate ligament. J. Orthop. Res. 2021, 39, 831–840. [Google Scholar] [CrossRef]
  31. Brest, J.; Greiner, S.; Boskovic, B.; Mernik, M.; Zumer, V. Self-adapting control parameters in differential evolution: A comparative study on numerical benchmark problems. IEEE Trans. Evol. Comput. 2006, 10, 646–657. [Google Scholar] [CrossRef]
  32. Cancer Imaging Archive. Available online: https://www.cancerimagingarchive.net/ (accessed on 30 September 2022).
Figure 1. Components of the proposed method.
Figure 2. CT number distribution.
Figure 3. U-net architecture. The number of channels is denoted at the bottom of the box. The arrows denote the different operations.
Figure 4. Results of generated GTs by the proposed GT generator.
Figure 5. Artificial bone segmentation results of U-net and U-net with windowing by jDE.
Figure 6. Spine segmentation results of U-net and U-net with windowing by jDE.
Figure 7. Femur segmentation results of U-net and U-net with windowing by jDE.
Table 1. Results of semantic segmentation in terms of IoU.

Method                    Artificial Bone    Spine            Femur
DeepLabv3                 0.792 ± 0.188      0.644 ± 0.115    0.870 ± 0.022
U-net                     0.811 ± 0.184      0.837 ± 0.103    0.946 ± 0.041
U-net + jDE (Proposed)    0.832 ± 0.177      0.860 ± 0.089    0.949 ± 0.027

Table 2. Results of semantic segmentation in terms of Dice.

Method                    Artificial Bone    Spine            Femur
DeepLabv3                 0.867 ± 0.167      0.777 ± 0.094    0.930 ± 0.013
U-net                     0.882 ± 0.134      0.907 ± 0.067    0.972 ± 0.023
U-net + jDE (Proposed)    0.896 ± 0.126      0.922 ± 0.057    0.973 ± 0.014

Table 3. Results of semantic segmentation in terms of pixelwise accuracy.

Method                    Artificial Bone    Spine            Femur
DeepLabv3                 0.995 ± 0.003      0.997 ± 0.001    0.998 ± 0.001
U-net                     0.995 ± 0.004      0.998 ± 0.001    0.999 ± 0.001
U-net + jDE (Proposed)    0.996 ± 0.004      0.999 ± 0.001    0.999 ± 0.001
