Article

Improvement of Retinal Vessel Segmentation Method Based on U-Net

1 School of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan 250357, China
2 Institute of Automation, Shandong Academy of Sciences, Jinan 250013, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(2), 262; https://doi.org/10.3390/electronics12020262
Submission received: 9 December 2022 / Revised: 28 December 2022 / Accepted: 2 January 2023 / Published: 4 January 2023
(This article belongs to the Special Issue New Machine Learning Technologies for Biomedical Applications)

Abstract

Retinal vessel segmentation remains a challenging task. The morphology of the retinal vessels reflects a person's health, which is essential for clinical diagnosis: accurate segmentation of the retinal vessel shape can reveal a patient's physical condition in a timely manner and help prevent blindness. Traditional retinal vessel segmentation is performed manually, which is time-consuming and laborious. With the development of convolutional neural networks, U-shaped networks (U-Nets) and their variants have shown good performance in image segmentation. However, U-Net is prone to feature loss in the encoder's convolutional layers, and its skip connections mismatch contextual feature information. We therefore propose an improved retinal vessel segmentation method based on U-Net. To extract more features in the encoder, we replace its convolutional layers with the ResNest network structure, which enhances image feature extraction. In addition, a Depthwise FCA Block (DFB) module is proposed to address the mismatched processing of local contextual features by the skip connections. Experiments on two public retinal vessel segmentation datasets, DRIVE and CHASE_DB1, comparing our method against a large number of networks, confirm its effectiveness. Our method outperforms most segmentation networks, demonstrating its significant clinical value.

1. Introduction

Retinal vessels play an important role in many fields, especially clinical medicine, where retinal vessel images are particularly important for diagnosing certain diseases and are easy to acquire and observe. Retinal vessel morphology and density provide timely feedback to the doctor; diseases such as diabetes and cardiovascular disease [1,2] can cause visual impairment in mild cases and blindness in severe cases, so the precise segmentation of retinal vessels is crucial. In the early years, retinal vessel segmentation was performed manually, a comparatively backward approach that is not only time-consuming and labor-intensive, resulting in low efficiency, but also yields unsatisfactory segmentation results. With the development of medical imaging, more and more image segmentation methods have been applied to retinal vessels and have achieved relatively good results. Accurate retinal vessel segmentation is therefore particularly important for blindness prevention [3].
In the early stage of medical image segmentation, the development of Convolutional Neural Networks (CNNs) [4] brought good results. CNNs learn features automatically through multi-layer structures and can learn features at multiple levels. Long et al. [5] proposed the FCN network structure, which classifies images at the pixel level. FCN replaces the fully connected layers of a CNN with convolutional layers, solving the segmentation problem at the semantic level by classifying each pixel. On this basis, Ronneberger et al. [6] proposed a U-shaped encoder–decoder network structure called U-Net. Between the encoder and decoder, skip connections are used to obtain better context information and global features, and four pooling layers enable multi-scale recognition of image features. Because of U-Net's excellent performance, most medical image segmentation networks are based on improvements to it. Zhou et al. [7] proposed the U-Net++ network structure, whose main purpose is to address the inefficiency of determining U-Net's depth through extensive experiments and the restriction of feature fusion to a single scale; it learns the optimal depth through deep supervision, flexibly aggregates features at more scales in the skip connections, and improves efficiency through pruning to achieve a better segmentation effect. Alom et al. [8] proposed the R2U-Net model for medical image segmentation, adding recurrent residual convolutional layers to U-Net, which allows the network to deepen while avoiding vanishing gradients and acquiring features better, improving segmentation accuracy. The Attention U-Net proposed by Oktay et al. [9] incorporates an attention mechanism into the network to make the skip connections more selective, improving attention to the segmented region and boosting performance without increasing the amount of computation. To fuse features better, Huang et al. [10] proposed UNet 3+, a multi-scale deeply supervised network structure that improves location awareness and enhances boundary segmentation while using fewer parameters. The TransUNet network structure proposed by Chen et al. [11] applies the Transformer [12] to the U-Net encoder; the Transformer extracts global information better, compensating for CNNs' limited ability to do so.
In retinal vessel image segmentation, U-Net and related networks have also seen many improvements. Jin et al. [13] proposed the DU-Net network structure, which uses deformable convolution to replace the original convolutional layers, combines low-level with high-level features, and achieves accurate segmentation according to the size and shape of the vessels. Hu et al. [14] proposed the S-UNet network structure to avoid the loss of feature information caused by downsampling in U-Net; by connecting Mi-U-Net networks in series, it also prevents overfitting on small datasets. Wang et al. [15] proposed FRNet, which introduces a FAF structure to efficiently combine features of different depths and reduce information loss. Yang et al. [16] proposed a U-Net-based improvement that adds a second decoder: the two decoders segment thin and thick vessels, respectively, and a fusion network merges their outputs, achieving accurate retinal vessel segmentation. Dong et al. [17] proposed a cascaded residual attention U-Net for retinal vessel segmentation, named CRAUnet, which uses DropBlock-like regularization to greatly reduce overfitting and an MFCA module to explore and merge helpful information instead of a direct skip connection. Yang et al. [18] proposed DCU-net, which uses deformable convolution to construct the feature extraction module and a residual channel attention module to improve transfer efficiency. Yang et al. [19] proposed RADCU-Net, a method based on residual attention and a dual-supervision cascaded U-Net, which enhances efficiency while improving the accuracy of retinal vessel segmentation.
Combining the above analysis, we propose an improved retinal vessel segmentation method based on the U-Net network structure, shown to be effective on both the DRIVE and CHASE_DB1 datasets. The main contributions of this paper are as follows:
(1)
A new network structure based on U-Net is proposed, which can be used to accurately and efficiently detect retinal blood vessels.
(2)
For the problem of partial feature loss caused by repeated convolutions, ResNest replaces the original convolutional layers of the encoder as the backbone to enhance feature extraction, so that image feature information can be extracted better.
(3)
A novel DFB network architecture is proposed to solve the feature mismatch between encoder and decoder caused by the skip connections. It better fuses low-level and high-level image features to achieve accurate vessel segmentation.

2. Materials and Methods

Aiming at the problem of retinal vessel segmentation, an improved segmentation network based on U-Net is proposed. This section describes the framework and modules of the proposed network in detail.

2.1. Network Structure

Figure 1 shows the whole network structure of the proposed improved segmentation network based on U-Net. It is composed of two parts: the main U-shaped network and the multi-scale fusion block. To address the local feature information loss caused by applying convolution multiple times during feature extraction, the original convolution modules are replaced by ResNest Blocks, so that the encoder extracts image feature information better. To reduce the loss of image features from encoding to decoding caused by the skip connections, a DFB network structure is proposed within the encoder–decoder framework to optimize the original skip connections and achieve an effective multi-scale feature representation.

2.2. Feature Extraction

Accurately extracting the characteristics of retinal vessels is essential if we want to understand the patient's condition through vessel morphology. We believe the loss of image feature information in the U-Net network structure is mainly caused by the convolutions in the encoder, so we mainly modify the encoder. Since deep convolutional neural networks have achieved good results in image processing, the ResNest [20] module replaces the encoder's convolutional blocks for better extraction of image features.
ResNet [21], one of the most successful CNN architectures, is widely used in computer vision. From ResNet to ResNeXt [22] and then to ResNest, the most successful improvement of ResNet, performance on downstream computer vision tasks has been relatively good. ResNest matches ResNet's computational efficiency while offering a better speed–accuracy trade-off. It also performs well relative to other networks of similar model complexity without introducing additional computational cost and can serve as a backbone for other tasks. The network structure is shown in Figure 2.
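To make the replacement concrete, the following is a minimal PyTorch sketch of a split-attention residual block in the spirit of ResNest [20]. The real ResNest block also uses cardinal groups and a deeper stem, so the class names, radix, and reduction factor here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttention(nn.Module):
    """Simplified split-attention over `radix` feature groups (after ResNeSt)."""
    def __init__(self, channels, radix=2, reduction=4):
        super().__init__()
        self.radix = radix
        inter = max(channels * radix // reduction, 32)
        self.fc1 = nn.Conv2d(channels, inter, 1)
        self.bn = nn.BatchNorm2d(inter)
        self.fc2 = nn.Conv2d(inter, channels * radix, 1)

    def forward(self, x):
        # x: (B, radix * C, H, W) -> split into `radix` groups of C channels
        b, rc, h, w = x.shape
        c = rc // self.radix
        splits = x.view(b, self.radix, c, h, w)
        # Global context from the sum of the groups, pooled to (B, C, 1, 1)
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)
        attn = self.fc2(F.relu(self.bn(self.fc1(gap))))       # (B, radix*C, 1, 1)
        attn = F.softmax(attn.view(b, self.radix, c, 1, 1), dim=1)
        return (splits * attn).sum(dim=1)                     # (B, C, H, W)

class ResNestBlock(nn.Module):
    """Minimal ResNeSt-style residual block used in place of U-Net's double conv."""
    def __init__(self, in_ch, out_ch, radix=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch * radix, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch * radix),
            nn.ReLU(inplace=True),
        )
        self.split_attn = SplitAttention(out_ch, radix=radix)
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, bias=False)
                         if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        return F.relu(self.split_attn(self.conv(x)) + self.shortcut(x))
```

In the encoder, such a block would simply replace each double-convolution stage, with pooling between stages as in the original U-Net.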

2.3. Depthwise FCA Block

Due to the semantic gap across U-Net's skip connections, context feature information is not processed properly, which causes a mismatch when low-level and high-level image features are fused. This network structure (see Figure 3) is therefore proposed to optimize the existing skip connection and realize multi-scale feature fusion.
The proposed structure uses depthwise separable convolution [23], a lightweight operation whose parameter count and computational cost are low compared with conventional convolution. In this paper, four depthwise separable convolutions are connected in parallel. Because too large a convolution kernel wastes resources and computation, their kernel sizes are chosen as 1, 5, 9, and 13. After the parallel depthwise separable convolutions, concatenation and cropping operations are performed, and the result finally enters the FCA Block [24] structure (see Figure 4), which better processes the feature information of low-level images. The FCA Block is a frequency channel attention network that highlights important channels in a multi-channel feature map and expresses feature information better. It also compensates for the insufficient feature information in existing channel attention methods by generalizing the global average pooling of each channel to a more general two-dimensional discrete cosine transform (DCT) form. In this way, more frequency components are introduced to make full use of the information, and the image feature information processed by the encoder is fused to the decoder more effectively.
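As a rough sketch of this design (not the authors' code), the module below wires four parallel depthwise separable convolutions with kernel sizes 1, 5, 9, and 13 into a concatenate-and-fuse step. Because the DCT basis selection of the FCA Block [24] is more involved, a plain global-average-pooling channel attention stands in for it here and is marked as such.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv [23]."""
    def __init__(self, ch, k):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch, bias=False)
        self.pointwise = nn.Conv2d(ch, ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class DFB(nn.Module):
    """Sketch of the Depthwise FCA Block: four parallel depthwise separable
    convolutions (kernels 1, 5, 9, 13), concatenation, 1x1 fusion, then a
    channel-attention step standing in for the DCT-based FCA Block."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            [DepthwiseSeparableConv(ch, k) for k in (1, 5, 9, 13)])
        self.fuse = nn.Conv2d(4 * ch, ch, 1, bias=False)
        # Placeholder channel attention; the paper uses the FCA Block [24],
        # which weights channels with multiple DCT frequency components.
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return y * self.channel_attn(y)
```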

2.4. Datasets

In order to fully reflect the performance of the proposed network structure, the method is evaluated on two common datasets: DRIVE [25] and CHASE_DB1 [26] (see Table 1).
DRIVE: There are 40 fundus photos in total, 20 used as the training set and 20 as the test set. Each image is 584 × 565 pixels with 3 channels. Each image also has a circular 45° field of view (FOV) for performance evaluation.
CHASE_DB1: A total of 28 fundus photographs of 14 children, 20 of them used as the training set and 8 as the test set. Each image is 999 × 960 pixels with 3 channels. Each image also has a circular 30° field of view (FOV) for performance evaluation.
Figure 5 shows sample images from both datasets.

2.5. Image Preprocessing

Image preprocessing is important for training the proposed model well. For the retinal vessel images, normalization is used. Since the DRIVE and CHASE_DB1 images have different sizes, we standardize the input images to 480 × 480 pixels.
Each channel of the retinal vessel image is then normalized: the per-channel mean is subtracted from each channel's features, which are then divided by the standard deviation. On this basis, we also use random flips and random cropping to augment the data and train the proposed model better (see Figure 6). To show that the preprocessing is effective, we quantify its effect (see Table 2); the models for the later comparison experiments are listed in Table 3.
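A minimal torchvision sketch of the image-side preprocessing described above. The per-channel mean and standard deviation values are hypothetical placeholders (in practice they would be computed from the training images), and for segmentation the same geometric transforms must also be applied to the label masks.

```python
import torchvision.transforms as T

# Hypothetical per-channel statistics; compute these from the training set.
MEAN = (0.50, 0.27, 0.16)
STD = (0.35, 0.20, 0.12)

train_transform = T.Compose([
    T.Resize((480, 480)),            # unify DRIVE / CHASE_DB1 image sizes
    T.RandomHorizontalFlip(),        # random flips for augmentation
    T.RandomVerticalFlip(),
    T.RandomCrop(480, padding=16),   # random cropping after padding
    T.ToTensor(),
    T.Normalize(MEAN, STD),          # subtract mean, divide by std, per channel
])
```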

3. Results

To better demonstrate the effect of the proposed U-Net-based improved network on retinal vessel segmentation, comparative experiments and ablation experiments are used to prove its effectiveness.
This section first describes the relevant evaluation indicators, then tests the basic segmentation network structure and compares it with the proposed method.

3.1. Evaluation Indicators

In order to better highlight the effectiveness of the proposed model, some evaluation indicators, including accuracy, F1-score, sensitivity, specificity, and precision, were used to evaluate the segmentation ability of retinal vascular images.
Acc (Accuracy): This reflects the proportion of correctly classified blood vessels and background pixels to the total number of pixels (see Equation (1)).
\[ \mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN} \]
Among them, TP and TN represent correctly segmented retinal vessel and background pixels, respectively; FP and FN represent incorrectly segmented retinal vessel and background pixels, respectively.
F1-score: This measures the accuracy of a binary classification model, taking into account both precision and recall; it is the harmonic mean of the two (see Equation (2)).
\[ F1 = \frac{2 \times TP}{2 \times TP + FP + FN} \]
Se (Sensitivity): Also known as true positive rate (TPR), this represents the proportion of retinal vessels correctly identified (see Equation (3)).
\[ \mathrm{Se} = \frac{TP}{TP + FN} \]
Sp (Specificity): Also known as true negative rate (TNR), this represents the proportion of pixel points with correct background pixel classification to the total number of background pixels (see Equation (4)).
\[ \mathrm{Sp} = \frac{TN}{TN + FP} \]
Pr (Precision): This represents the proportion of the number of correctly segmented retinal vessel pixels to the total number of segmented retinal vessel pixels (see Equation (5)).
\[ \mathrm{Pr} = \frac{TP}{TP + FP} \]
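For reference, Equations (1)–(5) translate directly into a few lines of NumPy. This helper, whose name and signature are ours, assumes binary masks for the prediction and ground truth.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute Acc, F1, Se, Sp, Pr (Equations (1)-(5)) from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # vessel pixels segmented correctly
    tn = np.sum(~pred & ~gt)    # background pixels segmented correctly
    fp = np.sum(pred & ~gt)     # background wrongly marked as vessel
    fn = np.sum(~pred & gt)     # vessel wrongly marked as background
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    pr = tp / (tp + fp)
    return acc, f1, se, sp, pr
```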

3.2. Experimental Setup

The proposed model is implemented in the PyTorch framework. The model was trained for 200 epochs. We used the SGD optimizer, setting the learning rate to 1 × 10−2, the momentum to 0.9, and the weight decay to 1 × 10−4. The batch size was set to 4. In addition, to speed up network training and testing, an Nvidia GeForce RTX5000 TI card was used for the above experiments (see Figure 7).
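The optimizer settings above correspond to the following PyTorch configuration; the stand-in module is only there to make the snippet self-contained.

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this would be the proposed DFB-Unet.
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=1e-4)
# Training then runs for 200 epochs with a batch size of 4.
```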
For the loss function, cross-entropy loss plus Dice loss was selected. Cross-entropy loss, a common loss function in semantic segmentation, not only captures the difference between predicted and true probabilities but also measures classifier performance in finer detail.

3.2.1. Cross-Entropy Loss

Cross-entropy is a very important concept in information theory, whose main purpose is to measure the difference between two probability distributions. For image segmentation, the cross-entropy loss is computed as the average cross-entropy over all pixels. It is defined as (see Equation (6)):

\[ \mathrm{Loss}(y, \hat{y}) = -\frac{1}{|\Omega|} \sum_{i=1}^{|\Omega|} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \]

Here, \(\Omega\) represents the pixel region, consisting of height a, width b, and K classes, so that \(y \in M_{a \times b \times K}(\{0,1\})\) and \(\hat{y} \in M_{a \times b \times K}([0,1])\).

3.2.2. Dice Loss

The Dice loss measures the overlap between the predicted segmentation and the ground truth (see Equation (7)):

\[ \mathrm{Dice\ Loss} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|} \]

where X denotes the pixel labels of the true segmented retinal vessels and Y denotes the pixel categories of the model-predicted retinal vessel segmentation.
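A sketch of the combined objective under a binary (vessel vs. background) formulation, which we assume here; the smoothing constant is our addition to avoid division by zero and is not specified in the paper.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1.0):
    """Soft Dice loss over foreground probabilities (Equation (7))."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    # eps smooths the ratio when both masks are empty (our assumption).
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def combined_loss(logits, target):
    """Cross-entropy (BCE for a binary vessel mask) plus Dice, as in Section 3.2."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return bce + dice_loss(logits, target)
```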

3.3. Ablation Experiments

To prove the effectiveness of each method, ablation experiments are used to study the contribution of each module to segmentation. First, the basic U-Net network is tested; then the ResNest Block and DFB network structures are added for performance analysis. The results are shown in Table 4.
From the ablation results, compared with the basic network structure, our proposed model improves on all indicators, especially F1-score and Pr. In the table, √ means the module was added and × means it was not.
On the DRIVE dataset, the F1-score improved from 0.8278 to 0.8687 and Pr improved from 0.7287 to 0.7863. When the DFB module is added, the F1-score, Sp, and Pr indicators all improve, which demonstrates the effectiveness of the proposed DFB module.
On the CHASE_DB1 dataset, the performance indicators also show that the proposed model works well. The key indicators F1-score and Pr benefit most, and the other indicators also outperform the basic network model.
From the ablation experiments on the two datasets, adding the DFB module improves all performance indicators except Acc and Se. Although Acc and Se decrease slightly after adding the DFB module, the overall performance of the proposed model improves, which shows its validity for retinal image segmentation.

3.4. Comparison Test with Other Models

To better highlight the effect of the proposed model, we conducted a series of experiments on the two public datasets, showing the retinal vessel segmentation results and the corresponding indicators. The results were compared with more advanced models, with quantitative and qualitative analyses of the relevant indicators.
As shown in Table 5, on the DRIVE dataset the F1-score of the proposed improved network structure based on U-Net reaches 0.8687, the Sp score 0.9703, and the Pr score 0.7863, all higher than those of the other models. Compared with Iternet, the strongest of the other models, Acc differs by 0.0080 and Se by 0.0842.
On the CHASE_DB1 dataset, as shown in Table 6, the proposed improved network structure based on U-Net still obtains relatively good results. The F1-score reaches 0.8358, the Sp score 0.9720, and the Pr score 0.7330; these scores are again the highest among the compared models. Compared with the best scores of the other models, Acc differs by 0.0060 and Se by 0.1207.
Clearly, the network structure achieves a strong segmentation effect on both datasets. From the comparison experiments, the F1-score, Sp, and Pr of the improved model based on U-Net are the highest on both the DRIVE and CHASE_DB1 datasets, while the changes in Acc and Se are not obvious. The Acc of the proposed model is not significantly lower than that of other advanced models, so the improvement is effective on the whole.
The Se results are not ideal and even decrease slightly in the ablation and comparison tests above, which may be caused by an excessive proportion of background noise pixels. When background pixel noise is too high, background pixels are mis-segmented, FN in Equation (3) increases, and Se decreases. In addition, other noise factors may be introduced during identification and be classified as background pixels, reducing the measured proportion of correctly identified retinal vessel pixels, which may also lead to a slight decrease in the Se indicator.
To verify the inference that this is what causes the decrease in the Se indicator, we apply morphological opening and closing operations to the images to remove background pixel noise and analyze the influence of noise on the Se indicator.
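A minimal OpenCV sketch of such a denoising step; the elliptical kernel and its size are our assumptions, since the paper does not specify them.

```python
import cv2

def remove_background_noise(mask, kernel_size=3):
    """Morphological opening then closing to suppress small background noise
    in a binary segmentation mask (uint8, values 0/255)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove specks
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # fill small gaps
```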
From the analysis of the Se indicator after denoising in Table 7, it can be seen that Se rises after denoising, which supports the inference above. On the whole, the proposed model has better segmentation performance.
We did the same for Acc and found that the denoised Acc metric also increased. The results are presented in Table 8.

3.5. Qualitative Analysis

To assess the retinal vessel segmentation results visually, the performance of the proposed model was analyzed qualitatively. Samples were selected from the DRIVE and CHASE_DB1 datasets, and the resulting segmentation maps are shown in Figure 8.
Figure 8 clearly shows the effect of the proposed model on retinal vessel segmentation. Due to varying light intensity in the original images, segmentation of the darker regions of the retinal vessels is not ideal, and some regions are over-segmented. Denser areas of the retinal vasculature also lead to poorer segmentation of small, thin vessels. Nevertheless, based on the qualitative and quantitative results, the proposed model has better segmentation performance for retinal vessels.
For comparison, we also show predicted images from other models (Figure 9).

4. Discussion

Retinal vessel segmentation is still a great challenge, and segmenting retinal vessels accurately is very meaningful for clinical diagnosis. However, we face not only the light intensity of the retinal vessel image and its contrast with the background but also the varying thickness and density of the retinal vessels, as well as limits on the accuracy of current image segmentation methods and techniques.
Although various variants of the U-Net network have achieved good results in medical image segmentation in recent years, problems remain: information loss caused by the multiple convolutions used when the encoder extracts feature images, and the mismatch between high-level and low-level features caused by the skip connections. To solve these problems, the convolutional layers of the encoder are replaced by the ResNest Block network structure, strengthening feature extraction as much as possible and reducing the loss of feature information. On this basis, a new DFB module is proposed, which strengthens the matching of high-level and low-level features and optimizes the original skip connections.
To highlight the segmentation performance of the proposed model, this paper adopts quantitative and qualitative analyses and uses ablation experiments to prove its effectiveness. Quantitative comparison with more advanced current models allows the proposed model's performance to be assessed across five important indicators. The proposed model performs better on three key indicators: F1-score, Sp, and Pr. Qualitative analysis shows that the segmented images are of good quality, and the ablation experiments verify the improvements brought by the proposed modules.
However, retinal vessel image segmentation will still face many challenges, such as uneven image brightness and the varying density and thickness of retinal vessels, which remain the most difficult problems in this field. In future work, we hope to make up for the shortcomings in Acc and Se and continue to optimize the proposed network model to achieve more accurate segmentation of retinal vessels.

5. Conclusions

In this paper, an improved model based on the U-Net network structure is proposed, which shows a good segmentation effect for retinal image segmentation. Comparison with more advanced network structures reflects the effectiveness of the proposed model. The contributions of this article are as follows:
(1)
In order to segment retinal vessels accurately, an improved structure based on U-Net is proposed, which can segment retinal vessels accurately and help patients understand their condition in time.
(2)
For the problem of information loss caused by repeated convolutions, ResNest replaces the original convolution operations of the encoder as the main network structure, which better extracts retinal vessel features and minimizes information loss.
(3)
To solve the mismatch between high-level and low-level features caused by the skip connections, a novel DFB network structure is proposed, which better realizes contextual feature fusion and accurate vessel segmentation.
In future research, we will improve each indicator of retinal vessel segmentation and achieve more accurate segmentation of retinal vessels.

Author Contributions

Methodology, N.W.; software, K.L.; validation, G.Z., Z.Z. and P.W.; formal analysis, N.W. and G.Z.; investigation, N.W.; resources, N.W. and G.Z.; data curation, N.W.; writing—original draft preparation, N.W.; writing—review and editing, N.W., K.L. and G.Z.; visualization, K.L. and G.Z.; supervision, Z.Z. and P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province (ZR2021MF064) and the China Postdoctoral Science Foundation (2021M702030).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Feldman-Billard, S.; Larger, É.; Massin, P. Early worsening of diabetic retinopathy after rapid improvement of blood glucose control in patients with diabetes. Diabetes Metab. 2018, 44, 4–14.
2. Cho, K.H.; Kim, C.K.; Woo, S.J.; Park, K.H.; Park, S.J. Cerebral small vessel disease in branch retinal artery occlusion. Investig. Ophthalmol. Vis. Sci. 2016, 57, 5818–5824.
3. Bourne, R.R.; Stevens, G.A.; White, R.A.; Smith, J.L.; Flaxman, S.R.; Price, H.; Jonas, J.B.; Keeffe, J.; Leasher, J.; Naidoo, K.; et al. Causes of vision loss worldwide, 1990–2010: A systematic analysis. Lancet Glob. Health 2013, 1, e339–e349.
4. O'Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458.
5. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2017, arXiv:1411.4038.
6. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
7. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2018; pp. 3–11.
8. Alom, M.Z.; Hasan, M.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv 2018, arXiv:1802.06955.
9. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
10. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. UNet 3+: A full-scale connected UNet for medical image segmentation. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.
11. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
13. Jin, Q.; Meng, Z.; Pham, T.D.; Chen, Q.; Wei, L.; Su, R. DUNet: A deformable network for retinal vessel segmentation. Knowl.-Based Syst. 2019, 178, 149–162.
14. Hu, J.; Wang, H.; Gao, S.; Bao, M.; Liu, T.; Wang, Y.; Zhang, J. S-UNet: A bridge-style U-Net framework with a saliency mechanism for retinal vessel segmentation. IEEE Access 2019, 7, 174167–174177.
15. Wang, D.; Hu, G.; Lyu, C. FRNet: An end-to-end feature refinement neural network for medical image segmentation. Vis. Comput. 2021, 37, 1101–1112.
16. Yang, L.; Wang, H.; Zeng, Q.; Liu, Y.; Bian, G. A hybrid deep segmentation network for fundus vessels via deep-learning framework. Neurocomputing 2021, 448, 168–178.
17. Dong, F.; Wu, D.; Guo, C.; Zhang, S.; Yang, B.; Gong, X. CRAUNet: A cascaded residual attention U-Net for retinal vessel segmentation. Comput. Biol. Med. 2022, 147, 105651.
18. Yang, X.; Li, Z.; Guo, Y.; Zhou, D. DCU-net: A deformable convolutional neural network based on cascade U-net for retinal vessel segmentation. Multimed. Tools Appl. 2022, 81, 15593–15607.
19. Yang, Y.; Wan, W.; Huang, S.; Zhong, X.; Kong, X. RADCU-Net: Residual attention and dual-supervision cascaded U-Net for retinal blood vessel segmentation. Int. J. Mach. Learn. Cybern. 2022, 1–16.
20. Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R. ResNeSt: Split-attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 2736–2746.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
22. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
23. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
24. Qin, Z.; Zhang, P.; Wu, F.; Li, X. FcaNet: Frequency channel attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 783–792.
25. Staal, J.; Abràmoff, M.D.; Niemeijer, M.; Viergever, M.A.; van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509.
26. Fraz, M.M.; Remagnino, P.; Hoppe, A.; Uyyanonvara, B.; Rudnicka, A.R.; Owen, C.G.; Barman, S.A. An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Trans. Biomed. Eng. 2012, 59, 2538–2548.
27. Li, L.; Verma, M.; Nakashima, Y.; Nagahara, H.; Kawasaki, R. IterNet: Retinal image segmentation utilizing structural redundancy in vessel networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 3656–3665.
28. Sun, J.; Darbehani, F.; Zaidi, M.; Wang, B. SAUNet: Shape attentive U-Net for interpretable medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2020; pp. 797–806.
29. Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Johansen, D.; de Lange, T.; Halvorsen, P.; Johansen, H.D. ResUNet++: An advanced architecture for medical image segmentation. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 225–230.
30. Zhou, Y.; Huang, W.; Dong, P.; Xia, Y.; Wang, S. D-UNet: A dimension-fusion U shape network for chronic stroke lesion segmentation. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 18, 940–950.
31. Xu, Q.; Duan, W.; He, N. DCSAU-Net: A deeper and more compact split-attention U-Net for medical image segmentation. arXiv 2022, arXiv:2202.00972.
Figure 1. Network structure of the proposed improved segmentation network based on U-Net.
Figure 2. ResNest Block structure diagram.
Figure 3. Diagram of the Depthwise FCA Block structure.
Figure 4. FCA Block.
Figure 5. (a) DRIVE dataset and (b) CHASE_DB1 dataset. From left to right: the original image, the corresponding ground truth, and the corresponding mask.
Figure 6. Examples of image preprocessing. (a) Original image. (b) Image graying and image normalization.
Figure 7. Training loss and accuracy on the datasets.
Figure 8. Retinal vessel segmentation sample images. (a) Original images; (b) segmentation prediction maps of the proposed model; (c) ground truth.
Figure 9. Predicted images from other models. The top row is the DRIVE dataset, and the bottom row is the CHASE_DB1 dataset.
Table 1. The retinal vessel segmentation datasets.

Dataset     Number of Images  Training  Validation  Pixels     Epochs
DRIVE       40                20        20          584 × 565  200
CHASE_DB1   28                20        8           999 × 960  200
Table 2. Effect of preprocessing on experimental data.

Datasets    Preprocessing  Acc     F1-Score  Se      Sp      Pr
DRIVE       No             0.9395  0.8595    0.7531  0.9672  0.7734
DRIVE       Yes            0.9403  0.8687    0.7380  0.9703  0.7863
CHASE_DB1   No             0.9429  0.8213    0.7231  0.9675  0.7135
CHASE_DB1   Yes            0.9504  0.8358    0.7413  0.9720  0.7330
Table 3. Methods compared in the experiments.

Datasets               Comparison Methods
DRIVE and CHASE_DB1    Iternet, SA_Unet, Res-Unet++, D-Unet, Att-Unet, Unet, Unet++, DCSAU_Net
Table 4. Results of ablation experiments.

Datasets    ResNest Block  DFB  Acc     F1-Score  Se      Sp      Pr
DRIVE       ×              ×    0.9334  0.8278    0.7663  0.9580  0.7287
DRIVE       √              ×    0.9406  0.8649    0.7706  0.9669  0.7823
DRIVE       √              √    0.9403  0.8687    0.7380  0.9703  0.7863
CHASE_DB1   ×              ×    0.9376  0.7770    0.7292  0.9594  0.6529
CHASE_DB1   √              ×    0.9510  0.8168    0.8063  0.9658  0.7076
CHASE_DB1   √              √    0.9504  0.8358    0.7413  0.9720  0.7330
Table 5. Comparison of performance indicators on the DRIVE dataset.

Method            Acc     F1-Score  Se (TPR)  Sp (TNR)  Pr
Iternet [27]      0.9483  0.8660    0.8222    0.9668    0.7843
SA_Unet [28]      0.9350  0.8342    0.7491    0.9616    0.7366
Res-Unet++ [29]   0.9409  0.8575    0.7696    0.9662    0.7708
D-Unet [30]       0.9422  0.8570    0.7906    0.9649    0.7708
Att-Unet          0.9438  0.8475    0.8268    0.9611    0.7579
Unet              0.9334  0.8278    0.7663    0.9580    0.7287
Unet++            0.9470  0.8641    0.8189    0.9660    0.7817
DCSAU_Net [31]    0.9406  0.8584    0.7647    0.9666    0.7719
DFB-Unet (ours)   0.9403  0.8687    0.7380    0.9703    0.7863
Table 6. Comparison of performance indicators on the CHASE_DB1 dataset.

Method            Acc     F1-Score  Se (TPR)  Sp (TNR)  Pr
DCSAU_Net         0.9466  0.8093    0.7812    0.9641    0.6973
Iternet           0.9547  0.8293    0.8620    0.9648    0.7272
Res-Unet++        0.9456  0.8113    0.7508    0.9660    0.6993
D-Unet            0.9527  0.8195    0.8538    0.9633    0.7130
Att-Unet          0.9542  0.8250    0.8480    0.9660    0.7199
SA_Unet           0.9362  0.7732    0.7543    0.9559    0.6491
Unet              0.9376  0.7770    0.7292    0.9594    0.6529
Unet++            0.9564  0.8317    0.8518    0.9672    0.7296
DFB-Unet (ours)   0.9504  0.8358    0.7413    0.9720    0.7330
Table 7. Analysis of the Se indicator before and after noise removal.

Datasets    Se before Noise Removal  Se after Noise Removal
DRIVE       0.7380                   0.7746
CHASE_DB1   0.7413                   0.7846
Table 8. Analysis of the Acc indicator before and after noise removal.

Datasets    Acc before Noise Removal  Acc after Noise Removal
DRIVE       0.9403                    0.9426
CHASE_DB1   0.9504                    0.9508