Article

Gradually Applying Weakly Supervised and Active Learning for Mass Detection in Breast Ultrasound Images

1 Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea
2 Department of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies, Yongin 17035, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(13), 4519; https://doi.org/10.3390/app10134519
Submission received: 12 June 2020 / Revised: 26 June 2020 / Accepted: 26 June 2020 / Published: 29 June 2020

Abstract:
We propose a method for effectively utilizing weakly annotated image data in object detection tasks on breast ultrasound images. Given a problem setting where a small, strongly annotated dataset and a large, weakly annotated dataset with no bounding box information are available, training an object detection model becomes a non-trivial problem. We suggest a controlled weight for handling the effect of weakly annotated images in a two-stage object detection model. We also present a subsequent active learning scheme for safely assigning strong annotations to weakly annotated images using the trained model. Experimental results showed a 24-percentage-point increase in the correct localization (CorLoc) measure, which is the ratio of correctly localized and classified images, when the properly controlled weight was applied. Performing active learning after the model was trained showed an additional increase in CorLoc. We also tested the proposed method on the Stanford Dog dataset, obtaining comparable results, to confirm that it can be applied to general cases where strong annotations are insufficient. The presented method shows that higher performance is achievable with less annotation effort.

1. Introduction

Breast cancer is the second leading cause of death for women all over the world, while its cause still remains unknown [1]. As with most cancers, early detection plays an important role in reducing the death rate [2]. While digital mammography is the most commonly used technique for detecting breast cancer, its limitations are clear when examining dense breasts, where lesions can be hidden by tissues with similar attenuation [3]. Ultrasound imaging is a complementary method to digital mammography due to its sensitivity, cost-effectiveness, and safety. However, analyzing ultrasound images is not a straightforward task due to the presence of noise and thus requires a skilled radiologist. Computer-aided diagnosis (CAD) could reduce the dependency on the radiologist and also be beneficial for detecting breast cancer [4].
Breast ultrasound (BUS) images share the characteristics of typical ultrasound images, which are generally low in resolution and contain noise. Resolution can be enhanced using higher-frequency waves, which, on the other hand, limit the penetration depth [5]. While the usual process of diagnosing a BUS image is accompanied by clinical palpation when the lesion is palpable, a CAD system only has the BUS image available [6]. Additionally, BUS images are not taken at a fixed angle during diagnosis, unless special superimposing among all aspect angles is performed later [7]. The loss of palpation information and the diversity of image aspects make it challenging for a CAD system to improve.
Conventional methods for BUS image classification that do not use a neural network framework are based on preprocessing and feature extraction of BUS images, followed by region detection or segmentation with those features. Selected features after region detection are classified with different methods [1]. Most of these works focus on feature extraction from the image when the aim is to classify an image as benign or malignant. Other works that aim to localize the lesion use rule-based approaches, such as the deformable parts model [8].
Recent deep-learning-based frameworks conduct both classification and region detection, as annotated data have become more available. Semantic segmentation is performed on BUS images in [9] by replacing the last three fully connected layers of AlexNet with fully convolutional networks that perform pixel-wise classification. The work utilizes mask labels, which label every pixel in the image, as ground truth for all of the images. Mask annotations require more labor from a clinician and are therefore harder to obtain. Shin et al. [10] proposed a method for object localization and classification using a Faster-RCNN model. While using bounding box annotations as ground truth for the localization task, it makes use of weakly supervised data comprised only of image-level labels to aid the classification model.
We present a method for sequentially localizing and classifying BUS images based on the Faster-RCNN model presented in [11]. We train a convolutional neural network (CNN) for bounding box regression and mass classification. A fully connected network (FCN) that classifies bounding boxes as either benign, malignant, or background is trained concurrently with the former network. The ground truth information consists of bounding box coordinates and a classification label for each mass. However, BUS data consisting of only image-level classification labels are more accessible, while bounding box annotations require additional expert effort. As BUS image classification still remains a difficult problem, a large dataset is beneficial for enhancing performance.
Weakly supervised learning is a technique for machine learning with noisy, sparse annotations. A customized alteration, depending on the degree of the annotations, is needed in order to use data with different levels of supervision. Methods for utilizing image-level annotations for segmentation are proposed in [12]. An initial segmentation model is trained using a few strongly annotated images. Images with no mask annotations are given a pseudo mask ground truth generated by the initially trained model, and a second model is trained to perform both segmentation and image-level classification with these pseudo annotations. Generative adversarial networks (GANs) have also been tuned to perform semantic segmentation using image-level annotations and to generate mask annotations. Shin et al. [10] use both bounding box annotations and image-level labels to localize and classify objects using multiple-instance learning (MIL). Images without bounding box annotations are given a bounding box chosen from a bag of bounding boxes presented during the localization stage, and various methods for choosing an object among the candidates are tested.
Active learning is a mechanism for expanding a given dataset by labeling unlabeled data with the trained model. User intervention for labeling is encouraged during the whole training process. Active learning can be applied to different types of datasets and fields where data are scarce. Mask predictions for lung CT images generated by unsupervised segmentation are used as ground truth annotations for training a supervised segmentation network [13]. The segmentation network is trained multiple times while using the mask prediction from the previous model as the ground truth, progressively improving after each training session.
We propose an appropriate method for controlling the influence of weakly labeled data in a Faster-RCNN-based object detection model. The presented method shows an increase in the correct localization (CorLoc) measure, which is preferred over mean average precision (mAP) in medical imaging, and in the fraction of lesions detected, which measures localization performance. The presented method assumes a relatively small strongly annotated dataset, insufficient for achieving high classification capability, and a larger dataset of weakly labeled images, which is a typical setting in medical imaging where making strong annotations is costly.
The main contributions of this work are, first, designing a reasonable method of controlling the effect of weakly labeled data in an end-to-end object detection model and, second, designing an acceptable approach for actively assigning annotations to weakly labeled data, supplementing the insufficient annotations for object detection. The strongly annotated data, $D_{strong}$, contain a single bounding box coordinate and the box classification label per image, while the weakly labeled data, $D_{weak}$, only contain an image-level label per image. An actively annotated dataset, $D_{active}$, is newly constructed after a training session and is concatenated to $D_{strong}$ in the next training session. Individual data streams are maintained during training for the strongly annotated dataset and the weakly labeled dataset. The dataflow in the network is shown in Figure 1. The loss for $D_{strong}$ is calculated in the same manner as proposed in [11], where the losses for the region proposal network (RPN) and the RCNN-top layer are propagated separately. Images in the $D_{weak}$ dataset can contribute to the classification loss in the RCNN-top only when the RPN has proposed a correct region, so the loss for $D_{weak}$ is given less influence until this condition is believed to be satisfied. After the first training session is finished, the $D_{active}$ dataset is crafted from $D_{weak}$ by giving a prediction that is likely to contain a mass a single ground truth annotation. Images in $D_{active}$ are concatenated to $D_{strong}$, reducing the sparsity issue that the task originally conveyed. The experiments show that using $D_{weak}$ images in a conservative manner helps the classifier detect more lesions, and training with $D_{active}$ shows an additional increase in overall performance. We believe that the proposed method can be adopted in general cases where strong annotations are insufficient to train the model classifier and weak labels are more available.

2. Materials and Methods

2.1. Datasets

The proposed method is evaluated on the Seoul National University Bundang Hospital Breast Ultrasound (SNUBH BUS) dataset for BUS images and further tested on the Stanford Dog dataset for general images. While the SNUBH BUS dataset has both $D_{strong}$ and $D_{weak}$ images, the Stanford Dog dataset only contains $D_{strong}$ images. Thus, the Stanford Dog data are manually divided into $D_{strong}$ and $D_{weak}$, where only the image labels are used for images selected as $D_{weak}$.
The SNUBH dataset, collected from the Seoul National University Bundang Hospital, was obtained from the different ultrasound systems described in [10], including Philips (ATL HDI 5000, iU22), SuperSonic Imagine (Aixplorer), and Samsung Medison (RS80A). The dataset contains a total of 5624 images from 2578 patients. The $D_{strong}$ subset is comprised of 1200 images, 600 of which are benign and 600 malignant. We use 400 images from each class as the training set and 200 as the test set. The $D_{weak}$ subset is comprised of 4224 images, 3291 of which are benign and the remaining 933 malignant. All of the image labels are confirmed by biopsy results, which also means that the data represent cases where a biopsy was needed to diagnose the patient, making classification with BUS images an even more difficult task.
The Stanford Dog dataset is a collection of color images of 120 dog breeds with a total of 20,580 images, all including class labels and bounding box coordinates. In order to mimic the situation in BUS images, we select two similar-looking, medium-sized breeds to classify, the Bloodhound and the English Foxhound, and convert the images to grayscale. The number of images in each class is 187 and 157, respectively. Each class is subdivided into 20 images for the $D_{strong}$ training set, 60 for the test set, and the remaining 107 and 77 images, for the Bloodhound and English Foxhound respectively, for the $D_{weak}$ dataset. This setting enforces a situation where only a limited amount of strong annotations is available. The Stanford Dog dataset is tested to demonstrate the validity of the presented method on general images: only a limited amount of strongly annotated images is available for training, and the task is not straightforward, since the images are converted to grayscale, leaving room for improvement. The dataset is available online (http://vision.stanford.edu/aditya86/ImageNetDogs/). A summary of the number of images in both datasets is provided in Table 1.
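As an illustration of this split (the data layout, field names, and random seed below are our own assumptions, not the authors' released preprocessing), a minimal Python sketch:

```python
import random

def split_breed(images, n_strong=20, n_test=60, seed=0):
    """Split one breed's images into D_strong (train), test, and D_weak subsets.

    images: list of dicts with keys "file", "label", "box" (hypothetical layout).
    Counts follow Table 1: 20 strong training images and 60 test images per breed.
    """
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    d_strong = shuffled[:n_strong]                   # keep box and label
    d_test = shuffled[n_strong:n_strong + n_test]    # keep box and label
    # The remaining images keep only the image-level label (weak supervision).
    d_weak = [{"file": im["file"], "label": im["label"]}
              for im in shuffled[n_strong + n_test:]]
    return d_strong, d_test, d_weak
```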

2.2. Training Procedure Using the $D_{strong}$ Subset

The Faster-RCNN model is used for the object detection task, which is detecting lesions in BUS images. Faster-RCNN is a two-stage object detector, where an RPN is trained to specifically perform region proposals on feature maps. Regions of interest (RoIs) obtained from the RPN are then fed to the RCNN-top layer for classification and additional bounding box regression. Bounding box information is only given by images in the $D_{strong}$ subset. This information is used for bounding box regression in both the RPN and the RCNN-top, and for foreground–background classification in the RPN. The overall dataflow is shown in Figure 1. The loss is comprised of four terms: $L_{reg}^{RPN}$, $L_{cls}^{RPN}$, $L_{reg}^{RCNNtop}$, and $L_{cls}^{RCNNtop}$. $L_{reg}^{RPN}$ and $L_{reg}^{RCNNtop}$, the regression losses for the RPN and the RCNN-top, respectively, are obtained by calculating the smooth L1 loss between the ground truth box and the predicted box coordinates. $L_{cls}^{RPN}$ and $L_{cls}^{RCNNtop}$, the classification losses for the RPN and the RCNN-top, respectively, are obtained by calculating the cross-entropy loss between the ground truth label and the predicted label. The corresponding ground truth label and coordinates are assigned when the intersection over union (IoU) between the boxes is over 0.5. The details of calculating the four terms remain the same as in the method proposed in [11].
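For reference, a minimal PyTorch sketch of the IoU test used for target assignment and of the two loss types; the helper names and tensor shapes are our own assumptions rather than the released implementation:

```python
import torch.nn.functional as F

def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes given as floats."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def strong_losses(pred_deltas, target_deltas, class_logits, class_targets):
    """Smooth L1 regression loss and cross-entropy classification loss,
    computed for targets assigned where box_iou(...) > 0.5 (see text).
    Used for both the RPN and the RCNN-top on D_strong images."""
    l_reg = F.smooth_l1_loss(pred_deltas, target_deltas)
    l_cls = F.cross_entropy(class_logits, class_targets)
    return l_reg, l_cls
```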

2.3. Training Procedure Using the $D_{weak}$ Subset

Without bounding box annotations, neither bounding box regression nor foreground–background classification can be performed. Thus, images in the $D_{weak}$ dataset can only aid the classification procedure in the RCNN-top section. We must have a strategy for assigning labels to the RoIs proposed by the RPN in order to use $D_{weak}$ images. Although there is no complete way of determining the label of each RoI, it is known that, given an image label, there is at least one mass that should be labeled with the image label. We are able to infer the most probable RoI to be labeled by rewriting the model with random variables. Let $X_{roi}$ be indicator random variables that map each RoI to its ground truth (background, benign, malignant) and $G$ be the set of all RoIs in an image. The set $G$ is obtained as the output of the RPN. RoIs in $G$ are considered to contain distinct objects after non-maximum suppression (NMS) post-processing, which eliminates RoIs that overlap with an IoU over 0.5, keeping the RoI with the higher foreground score. The relationship of the values is defined as $malignant > benign > background$:

$$Y = \max_{i \in G}\left(X_{roi_1}, X_{roi_2}, X_{roi_3}, \ldots\right)$$

Thus, $Y$ represents the label of an image, since a single malignant lesion would make the image label malignant, and a single benign lesion would make the image label benign if there are no other malignant lesions. Subsequently, the most probable mass to be labeled given the image label can be written as follows:

$$\operatorname*{argmax}_{i \in G}\; p\left(X_{roi_i} = label \mid Y = label\right) = \operatorname*{argmax}_{i \in G}\; p\left(X_{roi_i} = label\right).$$

Because $Y$ is the maximum of all RoI labels, conditioning the probability on $Y = label$ gives no information when the probability in question is that of $X$ having the same label. Thus, it is optimal to choose the RoI with the highest probability of containing the labeled object. Let $\hat{X}$ denote the mapping between a proposed RoI and the label predicted by the RCNN-top layer. Since $\hat{X}$ is trained directly with the cross-entropy loss against $X$ when using the $D_{strong}$ dataset, $\hat{X}$ can be used as a surrogate for $X$ if suitably trained. Therefore, we take the RoI with the highest image-label score after running through the RCNN-top as the training target in the RCNN-top section and calculate the loss for a single $D_{weak}$ image as follows:

$$L_{cls}^{RCNNtop} = \mathrm{crossentropy}\left(\max_{i \in G} p(\hat{X_i}),\; p(label)\right).$$
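A minimal PyTorch sketch of this selection and loss for a single $D_{weak}$ image (the tensor layout and function name are our own; the released code may organize this differently):

```python
import torch
import torch.nn.functional as F

def weak_classification_loss(roi_class_logits, image_label):
    """RCNN-top classification loss for one D_weak image.

    roi_class_logits: tensor of shape (num_rois, num_classes), the RCNN-top
        outputs for the RoIs proposed by the RPN (after NMS).
    image_label: integer class index of the image-level label.

    The RoI with the highest predicted probability for the image label is
    taken as the training target, as in the equation above.
    """
    probs = F.softmax(roi_class_logits, dim=1)
    best = torch.argmax(probs[:, image_label])      # most probable RoI for the label
    target = torch.tensor([image_label])
    return F.cross_entropy(roi_class_logits[best].unsqueeze(0), target)
```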
However, $\hat{X}$ would not be able to replace $X$ in the early stages of training. Hence, we introduce a controlled weight for $L_{cls}^{RCNNtop}$, denoted α. We increase α from 0.01 as the training progresses, and the manner of this increase can vary. The weight α for $L_{cls}^{RCNNtop}$ was selected among the following candidates (a short code sketch of these schedules follows the list):

$$\alpha = 1 \quad \text{(constant)}$$
$$\alpha = 1 - 0.99\,(0.9^{\,step/2000}) \quad \text{(inverse exponential)}$$
$$\alpha = 0.01 + 0.99\,(step/total\ steps) \quad \text{(linear)}$$
$$\alpha = 0.01 + 0.99\,(step/total\ steps)^{5} \quad \text{(polynomial (1))}$$
$$\alpha = 0.01 + 0.99\,(step/total\ steps)^{16} \quad \text{(polynomial (2))}$$
$$\alpha = 0.01 + 0.99\,(step/total\ steps)^{32} \quad \text{(polynomial (3))}$$
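The following sketch expresses these schedules as a single Python function; the function name and arguments are our own, and power 1, 5, 16, and 32 correspond to the linear and polynomial (1), (2), (3) schedules:

```python
def alpha(step, total_steps, schedule="polynomial", power=16):
    """Controlled weight for the weak-image loss, following the candidate
    schedules listed above."""
    if schedule == "constant":
        return 1.0
    if schedule == "inverse_exponential":
        return 1.0 - 0.99 * (0.9 ** (step / 2000))
    # Linear is the polynomial schedule with power = 1.
    return 0.01 + 0.99 * (step / total_steps) ** power
```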
Changes in α over the training steps are visualized in Figure 2. The usage of $L_{cls}^{RCNNtop}$ is considered more conservative as the equation number increases. The $D_{weak}$ and $D_{strong}$ datasets are used concurrently, and the losses calculated from each image are summed as follows (a short sketch combining these terms is given after the equations):

$$L_{final} = L_{strong} + \alpha L_{weak},$$
$$L_{strong} = L_{cls}^{RPN} + L_{reg}^{RPN} + L_{cls}^{RCNNtop} + L_{reg}^{RCNNtop},$$
$$L_{weak} = L_{cls}^{RCNNtop}.$$
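Continuing the sketches above, the per-step combination of one $D_{strong}$ batch and one α-weighted $D_{weak}$ batch could look as follows (names are assumptions):

```python
def total_loss(l_rpn_cls, l_rpn_reg, l_top_cls, l_top_reg, l_weak_cls,
               step, total_steps):
    """L_final = L_strong + alpha * L_weak for one training step, combining
    one D_strong batch (four loss terms) and one D_weak batch (classification
    loss only), with alpha() as sketched above."""
    l_strong = l_rpn_cls + l_rpn_reg + l_top_cls + l_top_reg
    return l_strong + alpha(step, total_steps) * l_weak_cls
```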

2.4. $D_{active}$ Construction with $D_{weak}$ Test Results

$D_{active}$ is a dataset that we create from the $D_{weak}$ dataset by adding annotations generated by the initial model after training is finished. The $D_{active}$ dataset can aid the $D_{strong}$ dataset, since the images in $D_{strong}$ are assumed to be insufficient in this problem setting. Predicted bounding boxes and labels are not reliable by themselves, which requires the cautious selection of the images to include. Verifying whether a predicted bounding box contains an object or not is the main issue. The double prediction problem can be a benefit for solving this problem. A double prediction is the case when two different predictions are made for a single object, as seen in Figure 3. Double-predicted boxes are more likely to contain an object than other predicted boxes, since the region was predicted to contain a lesion twice. We can generate a strong annotation by selecting the correctly labeled box of the two predictions, using the image-level label to pick the correct bounding box among the two uncertain predictions.
All of the images in $D_{weak}$ are tested with the trained model, generating multiple bounding boxes with labels for each image. We iterate through the boxes in an image to check whether there is a double prediction based on the PASCAL VOC criterion, which defines boxes to be overlapping when their IoU is higher than 0.5. If multiple double-prediction pairs exist for an image, we choose the pair with the higher IoU. Once a pair is selected for an image, we annotate the image with the bounding box that holds the original image label. The newly annotated images would contain a bias towards benign, since $D_{weak}$ is biased. Thus, we only add malignant images to the $D_{active}$ dataset, both to compensate for this bias and because, in the medical imaging setting, a failure to detect a malignant lesion is critical. The newly generated $D_{active}$ dataset is used in the same manner as the $D_{strong}$ dataset, since its images now produce the same types of losses.
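A minimal sketch of this selection procedure for one $D_{weak}$ image, reusing box_iou from the Section 2.2 sketch (names and data layout are assumptions, not the released code):

```python
def build_active_annotation(pred_boxes, pred_labels, image_label):
    """Turn one D_weak image into a D_active image, or return None.

    pred_boxes:  list of (x1, y1, x2, y2) boxes predicted by the trained model.
    pred_labels: predicted class for each box.
    image_label: image-level label (for SNUBH, only malignant images are kept).

    A double prediction is a pair of boxes with IoU > 0.5 (PASCAL VOC
    criterion); the pair with the highest IoU is chosen, and the box whose
    predicted label matches the image label becomes the pseudo ground truth.
    """
    best_pair, best_iou = None, 0.5
    for i in range(len(pred_boxes)):
        for j in range(i + 1, len(pred_boxes)):
            overlap = box_iou(pred_boxes[i], pred_boxes[j])
            if overlap > best_iou:
                best_pair, best_iou = (i, j), overlap
    if best_pair is None:
        return None                                   # no double prediction found
    for idx in best_pair:
        if pred_labels[idx] == image_label:
            return {"box": pred_boxes[idx], "label": image_label}
    return None
```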

2.5. Faster-RCNN Hyperparameters and Model Details

We use the PASCAL VOC pre-trained VGG-16 [14] as the backbone for generating feature maps, only fine-tuning the layers above conv3_1, which is the method used by the original Faster-RCNN [11]. The RPN's regression and classification network was modified to use 3 × 3 convolutions instead of 1 × 1 for better detection of objects. We reduced the size of the fully connected layer in the RCNN-top to 2048 to prevent overfitting. The $D_{strong}$ dataset was augmented by horizontal flipping, which increases the number of images, and by random brightness and contrast adjustments, which preserve the number of images. Steps are used to track training progress, since epochs cannot be calculated when using two datasets of different sizes; one step corresponds to using a single batch from each dataset. The Adam optimizer was used for optimization, with a batch size of 1 for each dataset. Negative sampling for background RoIs was performed when training with $D_{weak}$ images, since choosing only the RoI matching the image label makes the distribution of RoIs unbalanced; the lowest-scoring box was labeled as background for the RCNN-top loss. Class weights were also applied to the $D_{weak}$ losses, since that dataset is biased towards benign. All of the details and code of the model are available online (https://github.com/YeolJ00/faster-rcnn-pytorch) for research purposes.
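As a rough illustration of the augmentation described above (the jitter ranges and helper names are assumptions; the released code may differ), using Pillow:

```python
import random
from PIL import ImageEnhance, ImageOps

def jitter(image):
    """Random brightness/contrast adjustment; the 0.8-1.2 range is an assumption."""
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.8, 1.2))
    return ImageEnhance.Contrast(image).enhance(random.uniform(0.8, 1.2))

def augment_strong(image, box):
    """Return the original and a horizontally mirrored copy (doubling the number
    of D_strong images), each with random brightness/contrast jitter applied."""
    w, _ = image.size
    x1, y1, x2, y2 = box
    flipped = ImageOps.mirror(image)                 # horizontal flip
    flipped_box = (w - x2, y1, w - x1, y2)           # mirror the box coordinates
    return [(jitter(image), box), (jitter(flipped), flipped_box)]
```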

3. Results

3.1. Evaluation Specifications

In this study, the model generates multiple bounding boxes for an image. Each detection is considered a true positive (TP) if the classified label of the detection matches the target ground truth (GT) class and the IoU between the predicted bounding box and the target GT box is higher than 0.5. Otherwise, it is regarded as a false positive (FP). We evaluate the performance of the model on the test images of the SNUBH and Stanford Dog datasets using measures such as correct localization (CorLoc) and the fraction of lesions detected.
CorLoc is defined as the ratio of correctly classified and localized images. A correctly classified image is an image that contains a TP detection among its predicted boxes. Although mean average precision is widely used for general deep learning models, CorLoc is more applicable to the BUS case, since detecting a positive mass is critical in medical imaging. Additionally, only a single mass in an image is labeled as GT, while there could be other possible unlabeled masses; thus, FP detections might actually contain masses. The fraction of lesions detected is a measure of localization performance, obtained as the ratio of images that have a predicted bounding box overlapping with the GT box.
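A minimal sketch of both measures over a test set, reusing box_iou from the earlier sketch (the data layout is an assumption):

```python
def evaluate(predictions, ground_truths, iou_threshold=0.5):
    """Compute CorLoc and the fraction of lesions detected over a test set.

    predictions:   per image, a list of (box, label) detections.
    ground_truths: per image, a single (box, label) ground truth pair.
    """
    correct, localized = 0, 0
    for dets, (gt_box, gt_label) in zip(predictions, ground_truths):
        overlapping = [lab for box, lab in dets
                       if box_iou(box, gt_box) > iou_threshold]
        if overlapping:                 # some box overlaps the GT lesion
            localized += 1
        if gt_label in overlapping:     # overlapping box with the correct class (TP)
            correct += 1
    n = len(ground_truths)
    return correct / n, localized / n   # CorLoc, fraction of lesions detected
```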

3.2. Experiments for Controlling the Effect of Weakly Annotated Images in SNUBH Dataset

Table 2 presents the quantitative results of the experiments. The experiments are conducted for a total of 160,000 training steps, and all hyperparameters except α are applied equally. We found that the model does not perform well when α is a constant value or increased with an inverse exponential function. We believe that the value was too high in the early stages of training: $L_{weak}$ was not penalized enough before the RPN was trained well enough to provide valid RoI proposals, which gives an incorrect loss to the classifier. Based on this idea, we compared more conservative functions for increasing α. All of the subsequent methods demonstrate an improvement in both CorLoc and the fraction of lesions detected. The fraction of lesions detected is the fraction of ground truth lesions that were given a bounding box. Performance tends to increase as α is kept low during most of the training phase, and the model exhibited the best result when α followed (2): a 24-percentage-point increase in CorLoc and a 20-percentage-point increase in the fraction of lesions detected compared to the model without the controlled weight. A slight loss of performance was shown when α followed (3). We believe this is due to the drastically increasing α when the total number of steps is 160,000, making the loss grow faster than the optimization can follow; additionally, the weakly annotated data were fully used only for a small number of steps in (3).
Qualitative results for controlling α are shown in Figure 4. The proposed schedule for α shows both solid localization of objects and classification of bounding box proposals. Figure 4 also shows a false positive detection for the proposed method, yet this false positive has a relatively low malignancy score compared to the method following (3).

3.3. Experiments for Active Learning on SNUBH Dataset

Quantitative results for the active learning experiment are shown in Table 3. The $D_{active}$ set constructed from the model trained with the proposed α weight (2) consists of 238 malignant images. Active learning aims to extend the $D_{strong}$ dataset, which is the primary dataset that trains the model. Performing active learning gives a 2.75-percentage-point increase in the CorLoc measure and a 3.75-percentage-point increase in the fraction of lesions detected. Both classification and localization performance have increased.
Figure 5 presents the qualitative results. Some masses that were difficult to detect or classify were given correct predictions after training with $D_{active}$. Both localization and classification performance are enhanced.

3.4. Experiments on Comparable Object Detectors

The proposed model was compared with the object detectors of [10,11]. A vanilla Faster-RCNN model [11] was trained with $D_{strong}$ images using the specifications introduced in [10]. The Faster-RCNN-based model of Shin et al. [10] uses weakly annotated images jointly with strong bounding box annotations, so we were able to reconstruct that model to train with the SNUBH dataset. Implementations of the models are provided online (https://github.com/YeolJ00/faster-rcnn-pytorch). Table 4 shows the results.

3.5. Experiments on the Stanford Dog Dataset

Experiments for the controlled weight and active learning were also performed on the Stanford Dog dataset.
The results for controlling α and for active learning are summarized in Table 5 and Table 6, respectively. Little increase in CorLoc was observed for the proposed α control method. We believe that the negligible performance increase is due to the large proportion of each image occupied by the bounding box, which enables the RPN to propose correct bounding boxes at an earlier stage of training, meaning that the loss is less likely to lead to a local minimum. Active learning added 23 images to the strongly annotated dataset: 10 Bloodhound boxes and 13 English Foxhound boxes. We included images from both classes, since this is not a medical imaging task where detecting a certain class is preferred. Performing active learning on the trained model shows a slight decrease in the CorLoc measure, which is a measure that ignores FP predictions. However, the widely used measure of performance for object detection tasks is mAP, which increased by 17.46 percentage points after active learning. The increase in strong annotations reduced false positive predictions, significantly increasing the precision of the model. The model performance does not vary much otherwise, due to the generally high performance. Sample prediction results can be viewed in Figure 6.

4. Conclusions and Discussion

We propose an applicable mechanism for utilizing weakly annotated images in object detection models in a setting where bounding box information is insufficient for achieving high classification performance. The proposed method enables a successful increase in the number of strong annotations by safely assigning bounding box predictions as ground truth. The method is applied to the primary task of detecting masses in BUS images and tested on the Stanford Dog dataset to verify its generality. A comparison with different variants of the method supports the reasoning behind the manner of controlling the influence of weakly annotated images. We notice that maintaining the loss from weakly annotated images at a low level, until the RPN proposes bounding boxes containing objects, guides the model to a higher classification capability. Additionally, we set specific configurations for the active learning scheme, which can be risky, since there is no way to confirm the correct assignment of GT bounding boxes. The results show that it can enhance classification performance where classification was an issue.
For future work, we plan to extend the proposed method to autonomously detect whether the RPN is proposing bounding boxes containing objects and to control the weight accordingly, instead of increasing it along a fixed schedule. This will increase the generality of the method, since the point of RPN convergence may vary depending on the size and detection difficulty of a dataset. We believe that the proposed method can be applied to typical medical imaging tasks where strong annotations are costly and weakly labeled data are relatively easy to obtain from the diagnosis procedure.

Author Contributions

Conceptualization, J.Y.; methodology, J.Y.; software, J.O.; validation, J.Y.; formal analysis, I.Y.; investigation, J.Y. and J.O.; resources, I.Y.; data curation, I.Y.; writing—original draft preparation, J.Y.; writing—review and editing, J.O.; visualization, J.O.; supervision, I.Y.; project administration, I.Y.; funding acquisition, I.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (No. 2019R1A2C1085113).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SNUBH    Seoul National University Bundang Hospital
BUS      Breast Ultrasound
GANs     Generative Adversarial Networks
CNN      Convolutional Neural Network
FCN      Fully Connected Network
RPN      Region Proposal Network
RoI      Region of Interest
CorLoc   Correct Localization
NMS      Non-Maximum Suppression
mAP      mean Average Precision
TP       True Positive
FP       False Positive
GT       Ground Truth

References

  1. Cheng, H.; Shan, J.; Ju, W.; Guo, Y.; Zhang, L. Automated breast cancer detection and classification using ultrasound images: A survey. Pattern Recognit. 2010, 43, 299–317. [Google Scholar] [CrossRef] [Green Version]
  2. Cheng, H.D.; Shi, X.; Min, R.; Hu, L.; Cai, X.; Du, H. Approaches for automated detection and classification of masses in mammograms. Pattern Recognit. 2006, 39, 646–668. [Google Scholar] [CrossRef]
  3. Stavros, A.T.; Thickman, D.; Rapp, C.L.; Dennis, M.A.; Parker, S.H.; Sisney, G.A. Solid breast nodules: Use of sonography to distinguish between benign and malignant lesions. Radiology 1995, 196, 123–134. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Drukker, K.; Gruszauskas, N.P.; Sennett, C.A.; Giger, M.L. Breast US computer-aided diagnosis workstation: Performance with a large clinical diagnostic population. Radiology 2008, 248, 392–397. [Google Scholar] [CrossRef] [Green Version]
  5. Ragesh, N.; Anil, A.; Rajesh, R. Digital image denoising in medical ultrasound images: A survey. In Proceedings of the Icgst Aiml-11 Conference, Dubai, UAE, 12 April 2011; Volume 12, p. 14. [Google Scholar]
  6. Madjar, H. Role of breast ultrasound for the detection and differentiation of breast lesions. Breast Care 2010, 5, 109–114. [Google Scholar] [CrossRef] [Green Version]
  7. Hansen, C.; Huttebrauker, N.; Schasse, A.; Wilkening, W.; Ermert, H.; Hollenhorst, M.; Heuser, L.; Schulte-Altedorneburg, G. Ultrasound breast imaging using full angle spatial compounding: In-vivo results. In Proceedings of the 2008 IEEE Ultrasonics Symposium, Beijing, China, 2–5 November 2008; pp. 54–57. [Google Scholar]
  8. Pons, G.; Martí, R.; Ganau, S.; Sentís, M.; Martí, J. Computerized detection of breast lesions using deformable part models in ultrasound images. Ultrasound Med. Biol. 2014, 40, 2252–2264. [Google Scholar] [CrossRef] [PubMed]
  9. Yap, M.H.; Goyal, M.; Osman, F.M.; Martí, R.; Denton, E.; Juette, A.; Zwiggelaar, R. Breast ultrasound lesions recognition: End-to-end deep learning approaches. J. Med. Imaging 2018, 6, 011007. [Google Scholar]
  10. Shin, S.Y.; Lee, S.; Yun, I.D.; Kim, S.M.; Lee, K.M. Joint weakly and semi-supervised deep learning for localization and classification of masses in breast ultrasound images. IEEE Trans. Med. Imaging 2018, 38, 762–774. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99. [Google Scholar]
  12. Wang, Y.; Liu, J.; Li, Y.; Lu, H. Semi-and weakly-supervised semantic segmentation with deep convolutional neural networks. In Proceedings of the 23rd ACM international conference on Multimedia, Brisbane, Australia, 13 October 2015; pp. 1223–1226. [Google Scholar]
  13. Zhang, L.; Gopalakrishnan, V.; Lu, L.; Summers, R.M.; Moss, J.; Yao, J. Self-learning to detect and segment cysts in lung CT images without manual annotation. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 1100–1103. [Google Scholar]
  14. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Figure 1. Illustration of the dataflow in the presented model. All four losses are used when training with $D_{strong}$ images, while only the classification loss in the RCNN-top is calculated for $D_{weak}$ images. Refer to [11] and Section 2.3 for the detailed methods of choosing target RoIs and calculating losses.
Figure 2. Illustration of the tendency of α when the total number of steps is 160,000. The black plot shows a log-like (inverse exponential) increase in α, which converges to 1 quickly. The blue plot is a linear increase of α. (1), (2), and (3) are the conservative, polynomial increases of α during the training phase, corresponding to the orange, red, and green plots, respectively.
Figure 3. Example of a double prediction case in Breast ultrasound (BUS) images. The bounding box in blue represents the ground truth for a benign mass. Predicted boxes are colored in orange and cyan for malignant and benign predictions, respectively.
Figure 4. Qualitative results for controlling α. Bounding boxes colored red/blue are ground truth boxes for malignant/benign masses. Bounding boxes colored orange/cyan are predictions for malignant/benign masses. Two cases are presented for each method.
Figure 5. Qualitative results for active learning. Bounding boxes colored red/blue are ground truth boxes for malignant/benign masses. Bounding boxes colored orange/cyan are predictions for malignant/benign masses. Boxes on the left are the results before active learning, and the right side shows the predictions made for the same images after active learning.
Figure 6. Prediction results for Stanford Dog images. The images on the right and left are predictions for a Bloodhound image and an English Foxhound image, respectively.
Table 1. Cardinality of SNUBH and Stanford Dog Dataset.
Role    Supervision    SNUBH (Mal. / Ben. / Total)    Stanford Dog (Bld. / Eng. / Total)
Train   Strong         400 / 400 / 800                20 / 20 / 40
Train   Weak           933 / 3291 / 4224              107 / 77 / 184
Test    Strong         200 / 200 / 400                60 / 60 / 120
Total                  5424                           354
Mal., Ben., Bld., and Eng. denote malignant, benign, Bloodhound, and English Foxhound, respectively.
Table 2. Results showing variants of the controlled weight α with the SNUBH BUS dataset.
α Control Schedule           CorLoc [%]    Fraction of Lesions Detected [%]
constant                     41.75         56.00
inverse exponential          49.75         69.25
linear                       60.25         70.50
polynomial (1)               58.75         67.00
polynomial (2) (proposed)    65.75         76.00
polynomial (3)               63.00         74.50
CorLoc and the fraction of lesions detected according to the manner in which α is increased. CorLoc measures both classification and localization performance, while the fraction of lesions detected measures only localization performance. The detailed equations are presented in Section 2.3.
Table 3. Results showing the effect of active learning in SNUBH dataset.
Active Learning    CorLoc [%]    Fraction of Lesions Detected [%]
before             65.75         76.00
after              68.50         79.75
CorLoc and the fraction of lesions detected before and after active learning.
Table 4. Results for various object detectors.
Detector                              CorLoc [%]    Fraction of Objects Detected [%]
Vanilla Faster-RCNN [11]              42.50         57.50
Weakly supervised Faster-RCNN [10]    33.75         59.00
proposed                              68.50         79.75
CorLoc and the fraction of objects detected for the different object detectors.
Table 5. Results showing variants of the controlled weight α with the Stanford Dog dataset.
α Increase Method            CorLoc [%]    Fraction of Objects Detected [%]
constant                     83.33         86.67
inverse exponential          85.83         87.50
linear                       83.33         87.50
polynomial (1)               79.17         84.17
polynomial (2) (proposed)    87.50         89.17
polynomial (3)               81.67         87.50
CorLoc and the fraction of objects detected according to the manner in which α is increased. The detailed equations remain the same as in the SNUBH BUS experiments.
Table 6. Results showing the effect of active learning in the Stanford Dog dataset.
Active Learning    CorLoc [%]    Fraction of Lesions Detected [%]    mAP [%]
before             87.50         89.17                               36.84
after              84.17         87.50                               54.30
CorLoc, the fraction of lesions detected, and the mean average precision (mAP) before and after active learning.
