Article

ESTAN: Enhanced Small Tumor-Aware Network for Breast Ultrasound Image Segmentation

1 Department of Computer Science, University of Idaho, Idaho Falls, ID 83402, USA
2 Department of Industrial Technology, University of Idaho, Idaho Falls, ID 83402, USA
3 Department of Radiology and Imaging Sciences, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
* Author to whom correspondence should be addressed.
Healthcare 2022, 10(11), 2262; https://doi.org/10.3390/healthcare10112262
Submission received: 1 September 2022 / Revised: 1 November 2022 / Accepted: 3 November 2022 / Published: 11 November 2022

Abstract

Breast tumor segmentation is a critical task in computer-aided diagnosis (CAD) systems for breast cancer detection because accurate tumor size, shape, and location are important for further tumor quantification and classification. However, segmenting small tumors in ultrasound images is challenging due to the speckle noise, varying tumor shapes and sizes among patients, and the existence of tumor-like image regions. Recently, deep learning-based approaches have achieved great success in biomedical image analysis, but current state-of-the-art approaches achieve poor performance for segmenting small breast tumors. In this paper, we propose a novel deep neural network architecture, namely the Enhanced Small Tumor-Aware Network (ESTAN), to accurately and robustly segment breast tumors. The Enhanced Small Tumor-Aware Network introduces two encoders to extract and fuse image context information at different scales, and utilizes row-column-wise kernels to adapt to the breast anatomy. We compare ESTAN and nine state-of-the-art approaches using seven quantitative metrics on three public breast ultrasound datasets, i.e., BUSIS, Dataset B, and BUSI. The results demonstrate that the proposed approach achieves the best overall performance and outperforms all other approaches on small tumor segmentation. Specifically, the Dice similarity coefficient (DSC) of ESTAN on the three datasets is 0.92, 0.82, and 0.78, respectively; and the DSC of ESTAN on the three datasets of small tumors is 0.89, 0.80, and 0.81, respectively.

1. Introduction

Breast ultrasound (BUS) imaging is an effective screening method due to its painless, noninvasive, nonradioactive, and cost-effective nature. Breast ultrasound image segmentation aims to extract tumor region(s) from normal breast tissues in images. It is an essential step in BUS computer-aided diagnosis (CAD) systems. However, because of the speckle noise, poor image quality, and variable tumor shapes and sizes, accurate BUS image segmentation is challenging.
According to the National Cancer Institute, in the United States the relative survival rate is 99% if breast cancer is detected and treated at an early stage, but only 27% if the cancer has spread to other organs of the body [1]. Early detection of breast tumors is therefore key to reducing the mortality rate. However, in the early stages, most tumors are small and occupy a relatively small region in BUS images, which makes them challenging to distinguish from normal breast tissues. Therefore, the accurate detection of small tumors is critical for early breast cancer detection and can improve clinical decisions, treatment planning, and recovery.
Table 1. Deep learning approaches for BUS image segmentation.
Article | Year | Methods * | Dataset Size | Evaluation Metrics *
Huang et al. [2] | 2018 | FCN + Wavelet features + CRFs | 325 | TPR, FPR, JI
Yap et al. [3] | 2018 | Patch-based LeNet, U-Net, and AlexNet | 469 | TPR, FPR, F1
Amiri et al. [4] | 2020 | Transfer Learning | 163 | DSC
Nair et al. [5] | 2020 | Deep Neural Networks + Two Decoders + Simulated Data | 22230 | DSC
Zhuang et al. [6] | 2019 | U-Net + Attention gate | 1062 | TPR, Sp, F1, Pr, JI, Acc, DSC, AUC
Hu et al. [7] | 2019 | Dilated FCN + Active contour model | 570 | DSC, MAD, HD
Vakanski et al. [8] | 2020 | U-Net + Attention blocks | 510 | TPR, FPR, DSC, JI, Pr, AUC-ROC
Byra et al. [9] | 2020 | U-Net + Attention gate + Entropy maps | 269 | DSC, JI
Moon et al. [10] | 2020 | Ensemble CNNs | 246 | TPR, FPR
Lee et al. [11] | 2020 | U-Net + Channel attention module | 163 | FPR, F1, JI, AUC, Pr, Sp, TPR
Chen et al. [12] | 2022 | U-Net + Bidirectional attention + Refinement residual net | 780 | Acc, DSC, Sens, Sp, Pr, JI
Hussain et al. [13] | 2022 | U-Net + Level set | 349 | Acc, DSC, JI
Shareef et al. [14] | 2020 | U-Net + Two encoders | 725 | TPR, FPR, JI, DSC, AER, MAE, HD
* TPR: true positive rate, FPR: false positive rate, JI: Jaccard index, DSC: Dice similarity coefficient, Sp: specificity, F1: F1 score, Pr: precision, Acc: accuracy, AUC: area under curve, AER: area error rate, MAD: mean absolute deviation, HD: average Hausdorff distance, ROC: receiver operating characteristic curve, Sens: sensitivity, MAE: mean area error, CRFs: conditional random fields, and FCN: fully convolutional network.
The approaches of BUS image segmentation can be classified into traditional approaches and deep learning-based approaches. Numerous traditional approaches have been used for BUS image segmentation, such as thresholding [15,16,17,18,19,20,21], region growing [22,23], and watershed [24,25]. Despite their simplicity, these methods require knowledge and expertise in extracting features, and they are not robust due to poor scalability and high sensitivity to noise. Refer to [26] for a comprehensive review of BUS image segmentation.
Recently, several deep learning approaches [2,3,4,5,6,7,8,9,10,11,12,13,14] have been developed for BUS image segmentation; Table 1 lists the most recent ones. Huang et al. [2] proposed a fuzzy fully convolutional network for BUS image segmentation. Fuzzy logic is adopted to address the uncertainty in BUS images and feature maps. Contrast enhancement and wavelet features were applied as preprocessing techniques to augment the training data. The augmented training images and the features from the convolutional layers were transformed into a fuzzy domain by a fuzzy membership function. Context information and the human breast structure were integrated into conditional random fields (CRFs) to refine the segmentation results. Yap et al. [3] evaluated three deep learning approaches, a patch-based LeNet, a U-Net, and transfer learning with a pretrained AlexNet, on two BUS datasets (Dataset A and Dataset B). The transfer-learned AlexNet outperformed the others on Dataset A in the true positive and F-measure metrics, and the patch-based LeNet achieved the best false positives per image on Dataset B. Although these results show that deep learning approaches designed for other tasks can be adapted and trained on BUS datasets, no single approach achieved the best results across all evaluation metrics on both datasets. Amiri et al. [4] studied transfer learning and the effect of different fine-tuning configurations of the U-Net architecture to address the scarcity of ultrasound image data. Fine-tuning only the shallow layers of U-Net achieved the best results on small BUS datasets; however, there was no significant difference between fine-tuning the whole network and only the shallow layers on a large BUS dataset. Refer to [26,27] for more deep learning approaches for medical image segmentation.
In addition, Nair et al. [5] proposed a DNN with two decoders to generate BUS images and segmentation masks from raw single-plane-wave channel data. This approach showed promising results, with both the segmentation masks and the B-mode images generated by a single network from raw data. Zhuang et al. [6] proposed the RDAU-NET model, based on the U-Net architecture, for tumor segmentation in BUS images. Dilated residual blocks and attention gates replace the basic blocks and the original skip connections of U-Net, respectively. This design improves the overall sensitivity and accuracy of the model. Similarly, Hu et al. [7] proposed a method that combines a dilated fully convolutional network (DFCN) with a phase-based active contour (PBAC) model to segment breast tumors automatically. The DFCN with PBAC is more robust to noise and blurry boundaries and successfully segments tumors with a large volume of shadows.
Moreover, Vakanski et al. [8] integrated radiologists' visual attention into a U-Net model for BUS segmentation. The model uses attention blocks to suppress regions with low saliency and emphasize regions with high saliency. This study outperformed the U-Net model and successfully incorporated prior knowledge into a convolutional neural network. Byra et al. [9] proposed a deep learning segmentation approach for BUS images based on entropy parametric maps with an attention-gated U-Net. The model achieved a good improvement; however, the results and analysis are insufficient to establish the significance of the entropy maps. Furthermore, Moon et al. [10] proposed an ensemble CNN architecture for a CAD system comprising multiple models trained on original BUS images, segmented tumor images, tumor masks, and fused images. The fused images were prepared by combining an original image, the segmented tumor, and tumor shape information (TSI). The results show that the fused images achieved the best performance, and the study provides a clear guide for choosing an approach for a specific dataset size. Lee et al. [11] proposed a channel attention module with multi-scale grid average pooling for segmenting BUS images. The approach utilizes both local and global information and achieves good overall segmentation performance. Chen et al. [12] proposed a bidirectional attention and refinement network built on top of U-Net to segment breast lesions accurately; however, training such a network on a small dataset makes it difficult to avoid overfitting or underfitting. These methods achieved good overall performance. However, as shown in Figure 1, they fail to segment small tumors well. First, these methods were designed to improve the overall performance using general-purpose square kernels developed for learning features in natural images. Second, all currently available BUS datasets are small, and most deep learning-based approaches require a large and high-quality training set.
This work is inspired by recent progress in small object detection and segmentation, an important computer vision task that underlies many image-related applications, such as remote sensing, scene understanding, object tracking, instance and panoptic segmentation, aerospace detection, and image captioning. Chen et al. [28] augmented the R-CNN algorithm with a context model and a small-region proposal generator, and introduced the first benchmark dataset for small object detection. Krishna et al. [29] designed a Faster R-CNN model with a modified upsampling technique to improve small object detection. Guan et al. [30] proposed a semantic context-aware network (SCAN), which integrates a location fusion module and a context fusion module to detect semantic and contextual features. The DenseU-Net architecture was proposed by Dong et al. [31] for the semantic segmentation of small objects in urban remote sensing images; it uses residual connections and a weighted focal loss function with median frequency balancing to improve small object detection. To the best of our knowledge, STAN [14] was the first deep learning architecture designed specifically for small tumor segmentation. It employed three skip connections and two encoders to extract multi-scale contextual information from different layers of the contracting path. The Small Tumor-Aware Network outperformed other deep learning approaches for segmenting small tumors in BUS images; however, its average false positive rate (FPR) on small tumors is much larger than its FPR on large tumors.
In this paper, we extend STAN and propose a new architecture, the Enhanced Small Tumor-Aware Network (ESTAN), to achieve robust segmentation for tumors of different sizes. The new architecture has two encoder branches. The basic encoder has five blocks and learns features at different scales. The ESTAN encoder applies row-column-wise kernels to adapt to the breast anatomy during feature learning. Specifically, the human breast anatomy consists of four main layers: the skin, premammary (subcutaneous fat), mammary, and retromammary layers [32]. Each layer is characterized by a distinct texture and corresponding echo patterns in ultrasound images. The tissue layers in BUS images appear vertically stacked, with similar echo patterns propagating horizontally across the image. Breast pathology originates predominantly in the mammary layer. The row-column-wise kernels are designed to learn this breast tissue structure and thus improve the detection of small tumors in BUS images. In the decoder, each block has three skip connections that fuse rich contextual features from the two encoders. The contextual features are robust to different tumor sizes and help distinguish tumor regions from normal regions.
The rest of the paper is organized as follows: Section 2 presents the proposed architecture; Section 3 demonstrates experimental results and implementation details; and Section 4 and Section 5 are the discussion and conclusion, respectively.

2. Enhanced Small Tumor-Aware Network

In this section, we introduce the proposed Enhanced Small Tumor-Aware Network (ESTAN) for solving the issue of small tumor segmentation in BUS images. The Enhanced Small Tumor-Aware Network builds upon two observations: (1) BUS images contain tumors of a broad range of sizes, and current state-of-the-art approaches have poor performance in segmenting small tumors; and (2) the current deep learning-based approaches use square-shape kernels and have difficulty utilizing context information of BUS images, e.g., breast tissue anatomy. To alleviate these challenges, we propose ESTAN to extract and fuse image context information at different scales. The Enhanced Small Tumor-Aware Network constructs feature maps using both square and large row-column-wise kernels. These feature maps extract multi-scale context information and preserve fine-grained tumor location information. Therefore, the new design enables ESTAN to accurately segment breast tumors of different sizes and is especially effective in segmenting small tumors. The overall architecture of the proposed approach is shown in Figure 2.

2.1. Basic Encoder

The Enhanced Small Tumor-Aware Network consists of two encoders—the basic and ESTAN encoders. The basic encoder downsamples the input feature maps to extract low-level spatial and contextual information. The basic encoder comprises five blocks, where each of the first four blocks contains two convolutional layers and a max pooling layer, and the fifth block only has two convolutional layers. The basic blocks in the encoder are different from the original U-Net [33] encoder blocks since the new architecture uses two skip connections to copy feature maps from the encoder blocks to the corresponding upsampling layers in the decoder module. Figure 2c illustrates the architecture of the basic encoder.
Let $X \in \mathbb{R}^{h \times w \times c}$ denote the input images, where $h$, $w$, and $c$ are the height, width, and number of channels, respectively. Let $f$ be the convolution function for square kernels followed by a rectified linear unit (ReLU) activation function, $K_i$ be the number of kernels, and $S_i$ be the kernel size in the $i$th convolution layer. The output of the $j$th block of the basic encoder is defined by

$$B_j = \phi\left(f_{S_2, K_2}\left(f_{S_1, K_1}(X)\right)\right)$$

where $B_j$ is the output and $\phi$ is the pooling operation. The kernel sizes $S_1$ and $S_2$ in Basic Blocks 1, 2, 3, 4, and 5 are all set to 3, and the numbers of kernels $K_1$ and $K_2$ in Basic Blocks 1, 2, 3, 4, and 5 are 32, 64, 128, 256, and 512, respectively.
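The paper does not state its implementation framework; the following is a minimal sketch of the basic encoder in TensorFlow/Keras, with details the text does not specify (e.g., 2 × 2 max pooling, single-channel input) assumed for illustration.
```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, n_kernels, use_pool=True):
    """Two 3x3 convolutions with ReLU, optionally followed by max pooling (phi).

    Returns the two intermediate feature maps (used later by the skip
    connections) and the block output B_j.
    """
    s1 = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(x)   # f_{S1,K1}
    s2 = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(s1)  # f_{S2,K2}
    out = layers.MaxPooling2D(2)(s2) if use_pool else s2                     # phi (pooling)
    return s1, s2, out

# Chain the five basic blocks with 32, 64, 128, 256, and 512 kernels;
# Block 5 has no pooling layer.
inputs = layers.Input(shape=(256, 256, 1))
x, skips = inputs, []
for i, k in enumerate([32, 64, 128, 256, 512]):
    s1, s2, x = basic_block(x, k, use_pool=(i < 4))
    skips.append((s1, s2))
```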

2.2. ESTAN Encoder

The receptive field in CNNs is important in building effective feature maps that model contextual information. It defines the input image region that impacts output features, and image regions outside the receptive field of a feature will not contribute to the feature calculation. To ensure the coverage of all relevant image regions and achieve enhanced performance, many dense prediction tasks used large receptive fields [34,35]. Several techniques have been applied to increase the receptive field, such as stacking more layers, sub-sampling, and dilated convolutions [36]. However, in BUS images, a large receptive field can result in poor performance for small tumor segmentation [37]. The goal of the ESTAN encoder is to avoid the large receptive field and capture small tumors.
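As a brief illustration of how stacked layers and sub-sampling enlarge the receptive field, the following sketch applies the standard recurrence described in [34]; the layer list in the example is hypothetical and is not taken from any network in this paper.
```python
def receptive_field(layer_specs):
    """Receptive field of a stack of layers.

    layer_specs: list of (kernel_size, stride) tuples, applied in order.
    Uses the recurrence r_l = r_{l-1} + (k_l - 1) * j_{l-1} and j_l = j_{l-1} * s_l,
    where j is the cumulative stride (see [34]).
    """
    r, j = 1, 1
    for k, s in layer_specs:
        r += (k - 1) * j
        j *= s
    return r

# Three stages of (3x3 conv, 3x3 conv, 2x2 pooling): the receptive field
# grows from 1 to 36 pixels.
print(receptive_field([(3, 1), (3, 1), (2, 2)] * 3))
```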
The Small Tumor-Aware Network [14] proposed a two-encoder architecture and applied kernels of sizes 1 × 1, 3 × 3 and 5 × 5. The small kernel size can avoid a large receptive field. The two encoders fused contextual information at different scales by producing features using different sizes of receptive fields. This design improved the overall performance of small breast tumor segmentation. However, STAN produced high false positives for some BUS images with small tumors.
To overcome this problem, we redesigned the encoder by applying row-column-wise kernels. The small square kernels in STAN constructed feature maps using only square image regions. The motivation for the new design is that BUS images are composed of vertically stacked tissue layers (Figure 3). Applying row-column-wise kernels in CNNs avoids computing features across multiple anatomical layers and produces more accurate and meaningful feature maps. In addition, in this study, ESTAN is compared with nine state-of-the-art approaches on three datasets, whereas STAN was compared with only three state-of-the-art approaches on two datasets.
The ESTAN encoder comprises five blocks, named ESTAN blocks, which run in parallel with the basic encoder blocks. Each block has four square kernels and two row-column-wise kernels in two parallel branches. Such kernels can efficiently extract contextual and fine-grained details of small tumors in BUS images. Furthermore, each ESTAN block adds one extra non-linearity to the encoder block. Figure 2b illustrates the design of each ESTAN block. Let $C_i$ be the number of kernels and $A_i$ be the kernel size. The output of the $j$th ESTAN block is defined by

$$E_j = \phi\left(f_{A_5, C_5}\left(f_{A_2, C_2}\left(f_{A_1, C_1}(X)\right) + f_{A_4, C_4}\left(h_{1 \times A_3,\, C_3}\left(h_{A_3 \times 1,\, C_3}(X)\right)\right)\right)\right)$$

where $E_j$ is the output of the $j$th ESTAN block, $\phi$ is the pooling operation, and $h$ is the row-column-wise convolution function followed by a rectified linear unit (ReLU) activation, with kernel sizes $A_3 \times 1$ and $1 \times A_3$, respectively. The value of $A_3$ in ESTAN Blocks 1, 2, 3, 4, and 5 is 15, 13, 11, 9, and 7, respectively. The value of $A_5$ is 5 in ESTAN Blocks 2 and 5 and 1 in the remaining blocks. Furthermore, Block 5 has no pooling operation in either encoder. The number of kernels $C_i$ in ESTAN Blocks 1, 2, 3, 4, and 5 is 32, 64, 128, 256, and 512, respectively.
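A sketch of one ESTAN block under the same Keras assumptions as above is shown below. The kernel sizes $A_1$, $A_2$, and $A_4$ are not given explicitly in the text and are assumed here to be 3; only the branch structure follows the equation above.
```python
from tensorflow.keras import layers

def estan_block(x, n_kernels, a3, a5, use_pool=True):
    """One ESTAN block: a square-kernel branch plus a row-column-wise branch."""
    # Square-kernel branch: f_{A1,C1} then f_{A2,C2} (A1 = A2 = 3 assumed)
    sq = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(x)
    sq = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(sq)
    # Row-column-wise branch: h_{A3x1,C3}, h_{1xA3,C3}, then f_{A4,C4} (A4 = 3 assumed)
    rc = layers.Conv2D(n_kernels, (a3, 1), padding="same", activation="relu")(x)
    rc = layers.Conv2D(n_kernels, (1, a3), padding="same", activation="relu")(rc)
    rc = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(rc)
    # Fuse the branches, apply the extra non-linearity f_{A5,C5}, then pool (phi)
    e = layers.Add()([sq, rc])
    e = layers.Conv2D(n_kernels, a5, padding="same", activation="relu")(e)
    return e, (layers.MaxPooling2D(2)(e) if use_pool else e)

# Example: ESTAN Block 1 with 32 kernels, A3 = 15, A5 = 1.
# e1, x1 = estan_block(inputs, 32, a3=15, a5=1)
```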

2.3. Decoder and Skip Connections

The decoder module comprises four upsampling blocks (Up Blocks), each with one upsampling layer followed by three convolution layers. Unlike the U-Net architecture, whose decoder blocks have two convolution layers, ESTAN adds an additional convolution after the first one to control the number of channels after concatenation. Let $f$ be the convolution function followed by a rectified linear unit (ReLU) activation, $Y_i$ be the number of kernels, and $M_i$ be the kernel size. The output of the $j$th block of the decoder is defined by

$$U_j = f_{M_3, Y_3}\left(f_{M_2, Y_2}\left(f_{M_1, Y_1}(\Psi)\right)\right)$$

where $\Psi$ is the output of the upsampling layer. The kernel sizes $M_1$ and $M_3$ are 3 in all blocks; $M_2$ is 1 in Up Blocks 1, 2, and 3, and 5 in Up Block 4. In addition, $Y_1$, $Y_2$, and $Y_3$ have the same value within each Up Block, and across Up Blocks 1, 2, 3, and 4 their values are 256, 128, 64, and 32, respectively.
We introduce two skip connections that copy feature maps at different scales from the two encoders to the decoder. The first skip connection concatenates the output of $f_{S_1, K_1}$ in the basic encoder block and the output of $f_{A_5, C_5}$ in the ESTAN encoder block with the output of the upsampling layer. The second skip connection concatenates the outputs of $f_{S_2, K_2}$ and $f_{M_2, Y_2}$. The output layer applies a 1 × 1 convolution followed by a sigmoid activation to predict the final segmentation. Figure 2d illustrates the decoder block.
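The decoder block and its two concatenation points could be sketched as follows, again assuming a Keras implementation; this is a simplified sketch, and tensor shapes must of course match at each concatenation point.
```python
from tensorflow.keras import layers

def up_block(x, skip_s1, skip_e, skip_s2, n_kernels, m2):
    """One decoder Up Block with its two concatenation points.

    skip_s1: f_{S1,K1} features from the basic encoder block,
    skip_e:  f_{A5,C5} features from the ESTAN encoder block,
    skip_s2: f_{S2,K2} features from the basic encoder block.
    """
    up = layers.UpSampling2D(2)(x)                                          # Psi
    x = layers.Concatenate()([up, skip_s1, skip_e])                         # first skip connection
    x = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(x)   # f_{M1,Y1}
    x = layers.Conv2D(n_kernels, m2, padding="same", activation="relu")(x)  # f_{M2,Y2}
    x = layers.Concatenate()([x, skip_s2])                                  # second skip connection
    x = layers.Conv2D(n_kernels, 3, padding="same", activation="relu")(x)   # f_{M3,Y3}
    return x

# Output layer: 1x1 convolution with a sigmoid activation.
# mask = layers.Conv2D(1, 1, activation="sigmoid")(x)
```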

3. Experimental Results

3.1. Datasets, Evaluation Metrics and Setup

We use three public BUS datasets: BUSIS [20,26,38,39], BUSI [40], and Dataset B [3]. The BUSIS dataset contains 562 images collected from three hospitals using GE VIVID 7, LOGIQ E9, Hitachi EUB-6500, Philips iU22, and Siemens ACUSON S2000 systems; it includes 306 benign and 256 malignant breast ultrasound images. The BUSI dataset was collected at the Baheya Hospital for Early Detection & Treatment of Women's Cancer in Egypt using the LOGIQ E9 ultrasound system and the LOGIQ E9 Agile ultrasound system with ML6-15-D Matrix linear probe transducers. It has 780 images, of which 133 are normal, 487 benign, and 210 malignant, collected from 600 female patients aged 25 to 75 years; in addition, radiologists from Baheya Hospital reviewed and modified the ground truth masks. Dataset B contains only 163 breast ultrasound images, collected by the UDIAT Diagnostic Centre of the Parc Taulí Corporation, Sabadell (Spain), using a Siemens ACUSON Sequoia C512 system with a 17L5 linear array transducer (8.5 MHz); it consists of 53 malignant and 110 benign images from different women, with a mean image size of 760 × 570 pixels. The Dice loss [41] is used for training in this work.
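The Dice loss follows directly from the Dice coefficient; a minimal sketch is shown below, assuming TensorFlow tensors with values in [0, 1] and a small smoothing constant (our assumption) to avoid division by zero.
```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss: 1 - DSC between a binary mask and a predicted probability map."""
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, [-1]), tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    dsc = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
    return 1.0 - dsc
```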
The tumor size is an important variable, and Figure 4 illustrates the histograms of tumor size distributions of the three datasets based on their original resolution. The physical sizes of most tumors in the three datasets are unavailable; therefore, we define the tumor size as the length (in pixels) of the longest axis of a tumor region in the original BUS image. The distributions of BUSI and Dataset B show positive skewness where many tumors are smaller than 150 pixels. The BUSI dataset has more large tumors compared to the other datasets, and the sizes of most tumors are between 150 and 250 pixels. In addition, the images in the BUSIS dataset were collected with five different BUS workstations; thus, the image quality has large variations.
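The tumor size defined above could be measured as in the sketch below; using scikit-image's region properties (and its ellipse-based major-axis length) is an implementation choice on our part, not a detail given in the paper.
```python
import numpy as np
from skimage.measure import label, regionprops

def tumor_size(mask):
    """Length (in pixels) of the longest axis of the largest tumor region in a binary mask."""
    regions = regionprops(label(np.asarray(mask) > 0))
    if not regions:
        return 0.0  # normal image, no tumor region
    largest = max(regions, key=lambda r: r.area)
    return largest.major_axis_length
```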
To evaluate the segmentation results, both area and boundary metrics are employed. The metrics are true positive rate (TPR), false positive rate (FPR), Jaccard index (JI), Dice similarity coefficient (DSC), area error rate (AER), Hausdorff distance (HD), and mean absolute error (MAE). For detailed information about the seven metrics, refer to [26]. We perform five-fold cross-validation individually for each dataset to evaluate the test performance of all methods, and the input image size is 256 × 256 pixels for all the approaches. In this study, we compare the proposed method with nine state-of-the-art approaches: AlexNet [42], SegNet [37], U-Net [33], CE-Net [43], MultiResUNet [44], RDAU-Net [6], SCAN [30], DenseU-Net [31], and STAN [14]. These approaches have different backbone networks and different training strategies. We employ a transfer learning technique for AlexNet, which is pretrained on ImageNet. SegNet, U-Net, CE-Net, MultiResUNet, RDAU-Net, SCAN, and DenseU-Net are trained from scratch.
Note that the FPR here is calculated as the ratio between the number of false positive pixels and the total number of actual positive pixels [22,26,38], which differs from the common FPR formulation as the ratio between false positives and actual negatives. Under this definition, if the false positive regions are larger than the actual positive regions, the FPR is greater than 1. This definition is preferred for BUS image segmentation because the negative region, which forms the denominator of the standard FPR, is very large.
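A sketch of the area-based metrics using this FPR definition (false positive pixels divided by actual positive pixels) is given below; the AER formula shown is one common formulation and should be treated as our assumption. Inputs are binary NumPy masks, and a non-empty ground truth is assumed.
```python
import numpy as np

def area_metrics(pred, gt):
    """Area metrics for one image: TPR, FPR, JI, DSC, and AER."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    pos = gt.sum()                       # actual positive pixels (assumed > 0)
    return {
        "TPR": tp / pos,
        "FPR": fp / pos,                 # w.r.t. actual positives; can exceed 1
        "JI": tp / (tp + fp + fn),
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "AER": (fp + fn) / pos,          # assumed formulation of the area error rate
    }
```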
All experiments are performed on a workstation with a 3.50 GHz Intel(R) Xeon(R) CPU, 32 GB of RAM, and an Nvidia Titan Xp GPU.

3.2. Overall Performance

In this section, we compare the proposed approach with AlexNet, SegNet, U-Net, CE-Net, MultiResUNet, RDAU-Net, SCAN, DenseU-Net, and STAN. The results are shown in Figure 5 and Table 2.
Figure 5 shows the segmentation results of four sample BUS images. In the first row, the tumor in the BUS image is small, and AlexNet, U-Net, MultiResUNet, SCAN, and DenseU-Net have poor segmentation performance. In the second and third samples (second and third rows), all approaches except the proposed ESTAN produce high false positives, which demonstrates that they have difficulty distinguishing tumor regions from tumor-like regions. In Figure 5k, STAN segments small tumors accurately but still produces false tumor regions. Figure 5l shows that ESTAN segments the four images accurately without any false tumor regions.
Table 2 presents the quantitative results of all approaches on the three datasets. The proposed ESTAN achieved the best overall performance on all three datasets. AlexNet and SegNet obtained high TPRs, but at the cost of high FPRs.
To investigate the statistical significance of the results, the Wilcoxon signed-rank test was employed to compare ESTAN against all other approaches for the FPR, JI, DSC, AER, HD, and MAE metrics on the three datasets. The significance level is defined as p-value < 0.05. The p-values obtained from the Wilcoxon signed-rank test were corrected using the Holm–Bonferroni method for multiple comparisons. The results indicate a statistically significant difference for the six metrics on the three datasets, except for the cases marked with (*) in Table 2.
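The test described above could be run as in the following sketch; SciPy and statsmodels are assumed, since the paper does not name its statistics software, and the per-image score arrays are hypothetical inputs.
```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def compare_to_estan(estan_scores, other_scores_by_method, alpha=0.05):
    """Wilcoxon signed-rank tests of ESTAN vs. each method, Holm-Bonferroni corrected.

    estan_scores: per-image values of one metric for ESTAN.
    other_scores_by_method: dict mapping method name to its per-image values.
    """
    names, pvals = [], []
    for name, scores in other_scores_by_method.items():
        _, p = wilcoxon(np.asarray(estan_scores), np.asarray(scores))
        names.append(name)
        pvals.append(p)
    reject, p_corr, _, _ = multipletests(pvals, alpha=alpha, method="holm")
    return {n: (p, r) for n, p, r in zip(names, p_corr, reject)}
```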

3.3. Small Tumor Segmentation

The physical tumor sizes are not available for all images in the three datasets. Therefore, the length of the longest axis of a tumor region in the original (non-resized) BUS image is used as the criterion for selecting small tumors, and the length threshold is set to 120 pixels. BUSIS, BUSI, and Dataset B contain 49, 151, and 76 small tumors, respectively. Figure 6 illustrates the FPR comparison between overall and small tumor segmentation. All ten approaches have a higher FPR for small tumors on the BUSIS and Dataset B datasets. The FPR of AlexNet increases dramatically for small tumor segmentation. The ESTAN approach outperforms the other nine approaches and achieves the lowest false positive rate for both overall and small tumor segmentation. Table 3 shows the results of all approaches on the three datasets using the seven quantitative metrics. The Enhanced Small Tumor-Aware Network outperforms all the other nine approaches for small tumor segmentation on the three datasets. AlexNet and SegNet obtain high TPRs, but at the cost of high FPRs.

3.4. Segmentation of Tumors with Different Sizes

To demonstrate the effectiveness of the proposed ESTAN model, we split the BUSIS [20,26,38,39] dataset into four tumor-size groups. We chose the BUSIS dataset for the following reasons: (1) the images were collected from three hospitals using five ultrasound devices operated by different radiologists; (2) the ground truths of the BUSIS dataset have less bias because they were prepared by four experienced radiologists, where three radiologists generated tumor boundaries for each BUS image separately, and the fourth radiologist—a senior expert—judged and adjusted the majority voting results; and (3) all ten approaches achieved the best performance on the BUSIS dataset compared to BUSI and Dataset B. We chose the length of the longest axis of a tumor as a criterion for selecting tumor groups in the original BUS image. The first group contains 19 images with tumor sizes from 0 to 100 pixels, the second group has 30 images from 100 to 120 pixels, the third group consists of 81 images from 120 to 160 pixels, and the fourth group has 432 images from 160 to 533 pixels.
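Assigning images to the four groups from the longest-axis length could look like the brief sketch below; how boundary values are binned is our assumption, since the paper only lists the group ranges.
```python
import numpy as np

def size_group(length_px):
    """Return group index 0-3 for (0-100], (100-120], (120-160], and (>160) pixels."""
    return int(np.digitize(length_px, [100, 120, 160], right=True))
```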
Table 4 lists the JI and FPR values for the four tumor groups. AlexNet performs poorly on the smallest tumor group, with a JI of 0.57 and an FPR of 0.97, while its FPR and JI improve dramatically in the other three groups. The results for the two mid-size groups (100–120 and 120–160 pixels) are close to each other; e.g., CE-Net and SCAN each achieve the same JI in both groups (0.80 and 0.81, respectively). The results show that tumors between 0 and 100 pixels are the most difficult cases, and none of the ten approaches performs as well on them as on large tumors. On the other hand, for the fourth group of large tumors (>160 pixels), all approaches achieve better results than in the other groups. The proposed ESTAN achieves the highest JI and lowest FPR in all tumor groups.

4. Discussion

The BUS images used in this work were obtained from different ultrasound devices with non-uniform settings, and vary in image resolution, tissue depth, and contrast. It is challenging to develop and train a robust deep model that performs consistently well on BUS images from different sources. As shown in Table 2, the performance of all approaches differs on images from different datasets. For instance, DenseU-Net achieved a JI of 0.74 on the BUSIS dataset, but its JI on Dataset B is only 0.60. To improve the robustness of deep learning models for BUS image segmentation, we recommend (a) involving large and diverse BUS datasets collected from different resources in model training, and (b) redesigning network architectures and training strategies to learn robust features from ultrasound images.
The preparation of a large, diverse, and annotated BUS image dataset could be time-consuming and prohibitively expensive. Therefore, in the short term, the more feasible strategy is to develop robust deep networks and training processes. The results in Table 2, Table 3 and Table 4 indicate that the proposed two-encoder network architecture and the row-column kernels could lead to more robust segmentation results. Another possible solution to this challenge is to develop image synthesis approaches that could generate realistic and diverse BUS images.
The strengths of this study include (a) utilizing the human breast anatomical layers to design convolution kernels, (b) using two encoders to learn features and three skip connections to transfer contextual information to the decoder to locate tumors more accurately, and (c) validating the efficacy and weaknesses of the proposed approach through extensive experiments on three publicly available datasets. Although ESTAN achieved remarkable results for segmenting tumors of various sizes on the three datasets, it failed to detect tumors in 29 extremely challenging cases, because these cases had high speckle noise, extremely low contrast, and no clear tumor boundaries. To extract features at different scales, ESTAN uses two encoders that require more parameters, memory, and computational resources. Therefore, optimizing ESTAN to eliminate unnecessary parameters and operations is important, especially for resource-constrained systems such as mobile devices. In the future, we will investigate long-range semantic information to improve the current approach.

5. Conclusions

In this work, we proposed the Enhanced Small Tumor-Aware Network (ESTAN) to improve the segmentation of small tumors in BUS images. The Enhanced Small Tumor-Aware Network comprises two encoder branches that extract and fuse image context information at different scales. The proposed ESTAN encoder applies row-column-wise kernels to adapt to the breast anatomy, and the decoder has three skip connections from the two encoders to fuse features. The new design enhances performance by incorporating multi-scale features and breast anatomy into the encoder layers. The proposed architecture is sensitive to small breast tumors and identifies them accurately. In addition, the approach achieves state-of-the-art performance in segmenting tumors of different sizes. We validated the proposed approach extensively on three datasets and compared it with nine other breast tumor segmentation approaches. The results demonstrate that ESTAN achieves state-of-the-art performance on all datasets.

Author Contributions

Conceptualization, B.S., A.V., P.E.F. and M.X.; methodology, B.S., A.V. and M.X.; software, B.S., A.V. and M.X.; validation, B.S., A.V., P.E.F. and M.X.; data curation, B.S., A.V., P.E.F. and M.X.; writing—original draft preparation, B.S., A.V., P.E.F. and M.X.; writing—review and editing, B.S., A.V., P.E.F. and M.X.; website, B.S. and M.X.; funding acquisition, A.V. and M.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute for Modeling Collaboration (IMCI) at the University of Idaho through NIH Award #P20GM104420.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Schussler, N.; Ruhl, J.L.; Callaghan, C.; Ries, L.A.G.; Adamo, P. Summary Stage 2018: Codes and Coding Instructions; National Cancer Institute: Bethesda, MD, USA, 2018.
2. Huang, K.; Zhang, Y.; Cheng, H.D.; Xing, P.; Zhang, B. Fuzzy Semantic Segmentation of Breast Ultrasound Image with Breast Anatomy Constraints. arXiv 2019, arXiv:1909.06645.
3. Yap, M.H.; Pons, G.; Martí, J.; Ganau, S.; Sentís, M.; Zwiggelaar, R.; Davison, A.K.; Martí, R. Automated Breast Ultrasound Lesions Detection Using Convolutional Neural Networks. IEEE J. Biomed. Health Inform. 2018, 22, 1218–1226.
4. Amiri, M.; Brooks, R.; Rivaz, H. Fine-tuning U-Net for ultrasound image segmentation: Different layers, different outcomes. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2020, 67, 2510–2518.
5. Nair, A.A.; Washington, K.N.; Tran, T.D.; Reiter, A.; Bell, M.A.L. Deep learning to obtain simultaneous image and segmentation outputs from a single input of raw ultrasound channel data. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2020, 67, 2493–2509.
6. Zhuang, Z.; Li, N.; Raj, A.N.J.; Mahesh, V.G.V.; Qiu, S. An RDAU-NET model for lesion segmentation in breast ultrasound images. PLoS ONE 2019, 14, e0221535.
7. Hu, Y.; Guo, Y.; Wang, Y.; Yu, J.; Li, J.; Zhou, S.; Chang, C. Automatic tumor segmentation in breast ultrasound images using a dilated fully convolutional network combined with an active contour model. Med. Phys. 2018, 46, 215–228.
8. Vakanski, A.; Xian, M.; Freer, P.E. Attention-Enriched Deep Learning Model for Breast Tumor Segmentation in Ultrasound Images. Ultrasound Med. Biol. 2020, 46, 2819–2833.
9. Byra, M.; Jarosik, P.; Dobruch-Sobczak, K.; Klimonda, Z.; Piotrzkowska-Wroblewska, H.; Litniewski, J.; Nowicki, A. Breast mass segmentation based on ultrasonic entropy maps and attention gated U-Net. arXiv 2020, arXiv:2001.10061.
10. Moon, W.K.; Lee, Y.W.; Ke, H.H.; Lee, S.H.; Huang, C.S.; Chang, R.F. Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Comput. Methods Programs Biomed. 2020, 190, 105361.
11. Lee, H.; Park, J.; Hwang, J.Y. Channel Attention Module with Multi-scale Grid Average Pooling for Breast Cancer Segmentation in an Ultrasound Image. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2020, 67, 1344–1353.
12. Chen, G.; Dai, Y.; Zhang, J. C-Net: Cascaded convolutional neural network with global guidance and refinement residuals for breast ultrasound images segmentation. Comput. Methods Programs Biomed. 2022, 225, 107086.
13. Hussain, S.; Xi, X.; Ullah, I.; Inam, S.A.; Naz, F.; Shaheed, K.; Ali, S.A.; Tian, C. A Discriminative Level Set Method with Deep Supervision for Breast Tumor Segmentation. Comput. Biol. Med. 2022, 149, 105995.
14. Shareef, B.; Xian, M.; Vakanski, A. STAN: Small Tumor-Aware Network for Breast Ultrasound Image Segmentation. In Proceedings of the 17th IEEE International Symposium on Biomedical Imaging (ISBI 2020), Iowa City, IA, USA, 3–7 April 2020.
15. Ikedo, Y.; Fukuoka, D.; Hara, T.; Fujita, H.; Takada, E.; Endo, T.; Morita, T. Development of a fully automatic scheme for detection of masses in whole breast ultrasound images. Med. Phys. 2007, 34, 4378–4388.
16. Chang, R.-F.; Wu, W.-J.; Moon, W.K.; Chen, D.-R. Automatic ultrasound segmentation and morphology based diagnosis of solid breast tumors. Breast Cancer Res. Treat. 2005, 89, 179–185.
17. Yap, M.H.; Edirisinghe, E.A.; Bez, H.E. A novel algorithm for initial lesion detection in ultrasound breast images. J. Appl. Clin. Med. Phys. 2008, 9, 181–199.
18. Shan, J.; Cheng, H.D.; Wang, Y. Completely automated segmentation approach for breast ultrasound images using multiple-domain features. Ultrasound Med. Biol. 2012, 38, 262–275.
19. Shan, J.; Cheng, H.D.; Wang, Y. A novel automatic seed point selection algorithm for breast ultrasound images. In Proceedings of the 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4.
20. Xian, M.; Zhang, Y.; Cheng, H.D. Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains. Pattern Recognit. 2015, 48, 485–497.
21. Elaziz, M.A.; Oliva, D.; Ewees, A.A.; Xiong, S. Multi-level thresholding-based grey scale image segmentation using multi-objective multi-verse optimizer. Expert Syst. Appl. 2019, 125, 112–129.
22. Madabhushi, A.; Metaxas, D.N. Combining low-, high-level and empirical domain knowledge for automated segmentation of ultrasonic breast lesions. IEEE Trans. Med. Imaging 2003, 22, 155–169.
23. Massich, J.; Meriaudeau, F.; Pérez, E.; Martí, R.; Oliver, A.; Martí, J. Lesion segmentation in breast sonography. In Proceedings of the International Workshop on Digital Mammography, Catalonia, Spain, 16–18 June 2010.
24. Huang, Y.L.; Chen, D.R. Automatic contouring for breast tumors in 2-D sonography. Annu. Int. Conf. IEEE Eng. Med. Biol. Proc. 2005, 7, 3225–3228.
25. Lo, C.; Chen, R.; Chang, Y.; Yang, Y.; Hung, M.; Huang, C.; Chang, R. Multi-Dimensional Tumor Detection in Automated. IEEE Trans. Med. Imaging 2014, 33, 1503–1511.
26. Zhang, Y.; Xian, M.; Cheng, H.D.; Shareef, B.; Ding, J.; Xu, F.; Huang, K.; Zhang, B.; Ning, C.; Wang, Y. BUSIS: A Benchmark for Breast Ultrasound Image Segmentation. Healthcare 2022, 10, 729.
27. Dar, R.A.; Rasool, M.; Assad, A. Breast cancer detection using deep learning: Datasets, methods, and challenges ahead. Comput. Biol. Med. 2022, 149, 106073.
28. Chen, C.; Liu, M.-Y.; Tuzel, O.; Xiao, J. R-CNN for Small Object Detection. In Asian Conference on Computer Vision; LNCS Volume 10115; Springer: Cham, Switzerland, 2016; pp. 214–230.
29. Krishna, H.; Jawahar, C.V. Improving small object detection. In Proceedings of the 4th IAPR Asian Conference on Pattern Recognition (ACPR), Nanjing, China, 26–29 November 2017; pp. 346–351.
30. Guan, L.; Wu, Y.; Zhao, J. SCAN: Semantic context aware network for accurate small object detection. Int. J. Comput. Intell. Syst. 2018, 11, 951–961.
31. Dong, R.; Pan, X.; Li, F. DenseU-Net-Based Semantic Segmentation of Small Objects in Urban Remote Sensing Images. IEEE Access 2019, 7, 65347–65356.
32. Xu, F.; Zhang, Y.; Xian, M.; Cheng, H.D.; Zhang, B.; Ding, J.; Ning, C.; Wang, Y. Breast Anatomy Enriched Tumor Saliency Estimation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 1–5.
33. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
34. Araujo, A.; Norris, W.; Sim, J. Computing Receptive Fields of Convolutional Neural Networks. Distill 2019, 4, e21.
35. Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2016, 29, 4898–4906.
36. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016.
37. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
38. Xian, M.; Zhang, Y.; Cheng, H.D.; Xu, F.; Zhang, B.; Ding, J. Automatic breast ultrasound image segmentation: A survey. Pattern Recognit. 2018, 79, 340–355.
39. Cheng, H.D.; Shan, J.; Ju, W.; Guo, Y.; Zhang, L. Automated breast cancer detection and classification using ultrasound images: A survey. Pattern Recognit. 2010, 43, 299–317.
40. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Br. 2020, 28, 104863.
41. Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
42. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
43. Gu, Z.; Cheng, J.; Fu, H.; Zhou, K.; Hao, H.; Zhao, Y.; Zhang, T.; Gao, S.; Liu, J. CE-Net: Context Encoder Network for 2D Medical Image Segmentation. IEEE Trans. Med. Imaging 2019, 38, 2281–2292.
44. Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87.
Figure 1. Performance of state-of-the-art approaches for segmenting breast tumors with different sizes. GT: Ground truth. (a) BUS Images; (b) GT; (c) DenseU-Net; (d) CE-Net; and (e) RDAU-Net. The arrows point to BUS images with no tumor detected.
Figure 2. ESTAN architecture. (a) Overall architecture; (b) ESTAN block; (c) basic block; and (d) up block. The concatenation operator is indicated in the figure; Ai, Si, and Mi denote kernel sizes, and Ci, Ki, and Yi denote the numbers of kernels.
Figure 3. Major breast layers of a sample BUS image.
Figure 4. Histogram of tumor sizes (number of pixels).
Figure 5. Tumor segmentation examples. (a) BUS Image, (b) ground truth, (c) AlexNet, (d) SegNet, (e) U-Net, (f) CE-Net, (g) MultiResUNet, (h) RDAU-Net, (i) SCAN, (j) DenseU-Net, (k) STAN, and (l) ESTAN.
Figure 6. False positive rates of overall and small tumor segmentation on the three datasets.
Table 2. Overall performance.
BUSIS [20,26,38,39]
Methods | TPR | FPR | JI | DSC | AER | HD | MAE
AlexNet | 0.95 | 0.34 | 0.74 | 0.84 | 0.39 | 25.1 | 7.1
SegNet | 0.94 | 0.16 | 0.82 | 0.90 | 0.22 | 21.7 | 4.5
U-Net | 0.92 | 0.14 | 0.83 | 0.90 | 0.22 | 26.8 | 4.9
CE-Net | 0.91 | 0.13 | 0.83 | 0.90 | 0.22 | 21.6 | 4.5
MultiResUNet | 0.93 | 0.11 | 0.84 | 0.91 | 0.19 | 18.8 | 4.1
RDAU-NET | 0.91 | 0.11 | 0.84 | 0.91 | 0.20 | 19.3 | 4.1
SCAN | 0.91 | 0.11 | 0.83 | 0.90 | 0.20 | 26.9 | 4.9
DenseU-Net | 0.91 | 0.16 | 0.81 | 0.88 | 0.25 | 25.3 | 5.5
STAN | 0.92 | 0.09 | 0.85 | 0.91 | 0.18 | 18.9 | 3.9
ESTAN | 0.91 | 0.07 | 0.86 | 0.92 | 0.16 | 16.4 | 3.2

Dataset B [3]
Methods | TPR | FPR | JI | DSC | AER | HD | MAE
AlexNet | 0.87 | 1.17 | 0.47 | 0.61 | 1.30 | 40.8 | 14.5
SegNet | 0.85 | 0.83 | 0.60 | 0.71 | 0.98 | 41.6 | 11.4
U-Net | 0.78 | 0.41 | 0.65 | 0.75 | 0.63 | 39.6 | 10.8
CE-Net | 0.74 | 0.48 * | 0.61 | 0.72 | 0.74 | 40.1 | 10.5
MultiResUNet | 0.79 | 0.26 | 0.66 | 0.75 | 0.48 | 37.1 | 10.7
RDAU-NET | 0.78 | 0.30 * | 0.67 | 0.77 | 0.52 | 32.4 | 8.3
SCAN | 0.75 | 0.29 * | 0.65 | 0.74 | 0.54 | 43.7 | 9.9
DenseU-Net | 0.71 | 0.43 | 0.60 | 0.69 | 0.72 | 48.9 | 15.5
STAN | 0.80 | 0.27 * | 0.70 * | 0.78 | 0.47 * | 35.5 | 9.7 *
ESTAN | 0.84 | 0.22 | 0.74 | 0.82 | 0.38 | 25.5 | 7.0

BUSI [40]
Methods | TPR | FPR | JI | DSC | AER | HD | MAE
AlexNet | 0.87 | 1.14 | 0.55 | 0.68 | 1.27 | 47.4 | 14.1
SegNet | 0.77 | 0.55 | 0.62 | 0.72 | 0.78 | 46.5 | 13.3
U-Net | 0.77 | 0.56 | 0.63 | 0.73 | 0.78 | 59.0 | 13.7
CE-Net | 0.77 | 0.64 | 0.64 | 0.73 | 0.88 | 43.9 | 12.4
MultiResUNet | 0.78 | 0.37 | 0.67 | 0.75 | 0.59 | 41.2 | 12.0
RDAU-NET | 0.80 | 0.42 * | 0.68 | 0.76 | 0.62 | 39.2 | 12.0
SCAN | 0.73 | 0.43 | 0.63 | 0.72 | 0.70 | 47.0 | 13.8
DenseU-Net | 0.74 | 0.43 | 0.64 | 0.72 | 0.69 | 47.4 | 15.5
STAN | 0.76 | 0.42 * | 0.66 | 0.75 | 0.66 | 46.5 | 12.1
ESTAN | 0.80 | 0.36 | 0.70 | 0.78 | 0.56 | 34.8 | 9.9
* marks results with no statistically significant difference from ESTAN (see Section 3.2). The bold results are the best performance according to a metric.
Table 3. Performance of small tumor segmentation.
BUSIS [20,26,38,39]
Methods | TPR | FPR | JI | DSC | AER | HD | MAE
AlexNet | 0.95 | 0.77 | 0.60 | 0.73 | 0.82 | 26.3 | 9.6
SegNet | 0.92 | 0.25 | 0.75 | 0.84 | 0.33 | 22.4 | 6.2
U-Net | 0.92 | 0.30 | 0.76 | 0.84 | 0.38 | 44.2 | 8.3
CE-Net | 0.91 | 0.36 | 0.73 | 0.82 | 0.46 | 34.8 | 9.0
MultiResUNet | 0.91 | 0.23 | 0.77 | 0.84 | 0.33 | 27.7 | 8.5
RDAU-NET | 0.89 | 0.19 | 0.78 | 0.86 | 0.30 | 22.0 | 7.3
SCAN | 0.88 | 0.18 | 0.77 | 0.85 | 0.30 | 27.4 | 6.2
DenseU-Net | 0.90 | 0.50 | 0.72 | 0.81 | 0.60 | 34.5 | 8.2
STAN | 0.90 | 0.17 | 0.79 | 0.87 | 0.26 | 21.3 | 5.2
ESTAN | 0.90 | 0.11 | 0.82 | 0.89 | 0.21 | 14.9 | 3.0

Dataset B [3]
Methods | TPR | FPR | JI | DSC | AER | HD | MAE
AlexNet | 0.87 | 1.86 | 0.35 | 0.49 | 2.00 | 49.2 | 18.4
SegNet | 0.85 | 1.45 | 0.50 | 0.62 | 1.60 | 50.1 | 14.2
U-Net | 0.77 | 0.68 | 0.59 | 0.68 | 0.91 | 43.1 | 13.8
CE-Net | 0.72 | 0.88 | 0.53 | 0.63 | 1.15 | 50.0 | 14.4
MultiResUNet | 0.79 | 0.42 | 0.62 | 0.71 | 0.62 | 39.3 | 11.5
RDAU-NET | 0.78 | 0.52 | 0.62 | 0.71 | 0.73 | 34.1 | 8.8
SCAN | 0.75 | 0.50 | 0.61 | 0.70 | 0.74 | 48.7 | 11.2
DenseU-Net | 0.70 | 0.73 | 0.54 | 0.63 | 1.02 | 56.0 | 20.0
STAN | 0.81 | 0.40 | 0.67 | 0.76 | 0.59 | 35.9 | 11.1
ESTAN | 0.85 | 0.30 | 0.72 | 0.80 | 0.44 | 21.5 | 6.3

BUSI [40]
Methods | TPR | FPR | JI | DSC | AER | HD | MAE
AlexNet | 0.94 | 2.74 | 0.41 | 0.56 | 2.81 | 52.5 | 15.4
SegNet | 0.81 | 1.42 | 0.55 | 0.66 | 1.61 | 52.1 | 16.6
U-Net | 0.86 | 1.34 | 0.63 | 0.73 | 1.48 | 61.0 | 13.0
CE-Net | 0.83 | 1.86 | 0.59 | 0.69 | 2.03 | 50.9 | 13.3
MultiResUNet | 0.85 | 0.83 | 0.67 | 0.76 | 0.99 | 34.7 | 10.6
RDAU-NET | 0.87 | 0.99 | 0.68 | 0.77 | 1.13 | 33.9 | 9.9
SCAN | 0.80 | 1.13 | 0.63 | 0.73 | 1.33 | 42.4 | 12.5
DenseU-Net | 0.81 | 1.06 | 0.65 | 0.73 | 1.26 | 40.9 | 13.7
STAN | 0.86 | 1.10 | 0.67 | 0.76 | 1.25 | 49.2 | 11.3
ESTAN | 0.89 | 0.77 | 0.72 | 0.81 | 0.88 | 24.2 | 6.1
The bold results are the best performance according to a metric.
Table 4. Performance of four tumor size groups of BUSIS dataset.
Tumor Size Groups | (0–100) | (100–120) | (120–160) | (>160)
Number of Images | 19 | 30 | 81 | 432
Methods | JI / FPR | JI / FPR | JI / FPR | JI / FPR
AlexNet | 0.57 / 0.97 | 0.63 / 0.64 | 0.68 / 0.44 | 0.76 / 0.27
SegNet | 0.71 / 0.28 | 0.77 / 0.23 | 0.79 / 0.21 | 0.83 / 0.14
U-Net | 0.72 / 0.34 | 0.78 / 0.27 | 0.80 / 0.18 | 0.84 / 0.11
CE-Net | 0.62 / 0.63 | 0.80 / 0.19 | 0.80 / 0.16 | 0.84 / 0.09
MultiResUNet | 0.71 / 0.34 | 0.80 / 0.16 | 0.82 / 0.17 | 0.86 / 0.09
RDAU-NET | 0.72 / 0.26 | 0.82 / 0.14 | 0.81 / 0.17 | 0.85 / 0.09
SCAN | 0.71 / 0.24 | 0.81 / 0.14 | 0.81 / 0.16 | 0.80 / 0.09
DenseU-Net | 0.67 / 0.77 | 0.75 / 0.34 | 0.78 / 0.21 | 0.83 / 0.11
STAN | 0.76 / 0.25 | 0.81 / 0.11 | 0.83 / 0.12 | 0.86 / 0.08
ESTAN | 0.79 / 0.15 | 0.83 / 0.09 | 0.85 / 0.10 | 0.87 / 0.06

