1. Introduction
With the development of imaging technology, many organs of the human body can be made easily visible in images, which helps to diagnose many kinds of diseases without the need to perform surgery [1,2,3]. Due to this advantage, imaging techniques are widely used as high-quality, low-cost diagnostic methods compared with other diagnostic approaches. Many imaging techniques have been developed to suit different organ types and diseases. For example, X-ray imaging was one of the earliest imaging techniques, used to capture images of the lungs and bones [2]. Using X-ray images, doctors can assess the condition of lung cancer or locate a broken bone [2]. Ultrasound imaging techniques are currently used to capture images of the breast or thyroid region to detect and evaluate the condition of breast and thyroid cancers [3]. Computed tomography (CT) and magnetic resonance imaging (MRI) are used to capture brain images [1,4]. Using these images, doctors (radiologists) can diagnose the type and condition of a disease and determine a more accurate treatment method based on their experience. However, the disadvantage of this diagnosis and treatment approach is that doctors (radiologists) must perform every step of the diagnosis process based on their knowledge and experience of a specific type of disease. Therefore, diagnostic performance varies and depends heavily on the personal ability of the doctor (radiologist).
Computer-aided diagnosis (CAD) systems have been widely developed and used to assist doctors and to enhance the performance of diagnosis and treatment processes. Such a system uses a computer program to process the captured images of organs and provides suggestions to doctors during the diagnosis and treatment process. The advantage of this approach is that it can be trained using large amounts of data with the help of experts (experienced doctors); therefore, its performance is not affected by the personal ability of an individual doctor. Despite these advantages, the design and development of CAD systems face several difficulties, such as the performance of the processing algorithms and data collection.
Brain diseases, such as brain tumors [5], brain strokes [6], and brain disorders [7], are important problems in human health. Brain tumors are a leading cause of cancer-related death [5], whereas brain stroke also causes death and disability in elderly people with health problems such as hypertension, excessive alcohol consumption, diabetes, high cholesterol, and smoking [6]. Therefore, the classification of brain tumor/stroke images is important for the diagnosis and treatment of brain diseases. Studies on CAD systems for the brain tumor classification problem are mainly based on either handcrafted image feature extraction followed by a classification method, or deep learning networks based on convolutional neural networks (CNNs).
1.1. Handcrafted Feature-Based Methods
In the first category, previous studies mainly used expert-designed feature extraction methods to extract information from brain images and classified the input images into predefined categories using either classification or distance measurement (image retrieval) methods. Jun Cheng et al. [8] used the Fisher vector (FV) representation to extract brain image features from several subregions. With the extracted FVs, they retrieved brain tumor images from a dataset using a similarity distance measurement between two FVs. Similarly, Huang et al. [9] used a bag-of-words model to successfully retrieve brain tumor images. Ismael et al. [10] used a 2D discrete wavelet transform and 2D Gabor filtering to extract statistical information from brain images, such as the mean, variance, skewness, and contrast, to form a feature vector representing the content of the brain images. Finally, they used a multilayer perceptron (MLP) network to classify the input brain images into several predefined categories. Gurbina et al. [11] also used different types of wavelet transforms to extract information from images and a support vector machine (SVM) to identify whether a brain image contains a tumor. Zaw et al. [12] used the naïve Bayes classification method to classify brain images based on eight region properties (area, perimeter, eccentricity, equivalent diameter, solidity, convex area, major axis length, and minor axis length) and three intensity features (maximum, mean, and minimum) of the extracted brain tumor region. Minz et al. [13] extracted gray-level co-occurrence matrix (GLCM) features from brain images and used the AdaBoost classification algorithm to classify an input brain image into cancerous or non-cancerous categories. While these methods can classify brain images into predefined categories, they rely on expert-designed feature extractors (statistical, Fisher vector, wavelet transform, Gabor filter, and GLCM). As a result, the extracted image features capture only limited aspects of brain images, which can result in low classification performance.
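For illustration, the following is a minimal sketch of a handcrafted pipeline in the spirit of the GLCM features and AdaBoost classifier described for Minz et al. [13]; the specific GLCM offsets and texture properties used here are our own assumptions rather than the original configuration.

```python
# Hedged sketch: GLCM texture features + AdaBoost, assuming uint8 grayscale slices.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import AdaBoostClassifier

def glcm_features(image_8bit):
    """Build a small texture descriptor from one grayscale brain slice."""
    glcm = graycomatrix(image_8bit,
                        distances=[1],            # neighboring-pixel offset (assumed)
                        angles=[0, np.pi / 2],    # horizontal and vertical pairs (assumed)
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# images: list of uint8 arrays; labels: 1 = cancerous, 0 = non-cancerous (assumed)
# X = np.vstack([glcm_features(img) for img in images])
# clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)
```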
1.2. Deep Feature-Based Methods
Deep learning-based methods have recently been used for the detection and classification of brain tumor images, and many deep learning-based systems have been proposed for brain tumor image segmentation or classification. For example, Chen et al. [14] proposed a brain tumor segmentation network, namely CSU-Net, that combines a convolutional neural network (CNN) with a transformer architecture. This approach yielded higher segmentation accuracy than other segmentation methods, such as the 3D U-Net network.
To overcome the limitations of handcrafted-feature-based brain tumor classification methods, deep learning-based classification methods have recently been used. Abiwinanda et al. [5] proposed the use of a CNN to classify three types of brain tumors: glioma, meningioma, and pituitary tumors. They validated the classification performance of CNNs of various depths and showed that a CNN consisting of two convolution layers and one dense (fully connected) layer outperformed the other network architectures in their experiments. To reduce the number of network parameters of a CNN, Isunuri et al. [15] proposed the use of separable convolutions in CNNs for brain tumor classification. In several other studies, Swati et al. [16], Kumar et al. [17], and Deepak et al. [18] used fine-tuning methods to train CNNs for brain tumor image classification. By using pretrained CNN models (such as the AlexNet, VGG16, VGG19, residual network, and GoogLeNet architectures that were successfully trained on the ImageNet dataset for general image classification), they successfully trained classification models using small amounts of training data, which is difficult to achieve when training a network from scratch. Furthermore, Alanazi et al. [19] proposed a CNN-based classification method based on a fine-tuning procedure using two brain tumor datasets. In this approach, the authors first trained a CNN on the first dataset. The trained CNN model was then fine-tuned using the second dataset to associate the knowledge of the two datasets. They showed that this fine-tuning approach is efficient for the development of brain tumor image classification systems. Bodapati et al. [20] used two pretrained CNNs (Inception-ResNet and Xception) to extract deep image features. Using two different CNN architectures, the authors showed that they could extract two different sets of feature representations. Finally, they proposed a fusion method based on an MLP network to combine the extracted image features for classification. To extract image features at different image scales, Togacar et al. [21] proposed using image features from various stages of a CNN. The shallow convolution layers help to extract low-abstraction image features, whereas the deep convolution layers help to extract highly abstract image features. By combining low- and high-abstraction image features, they showed that their proposed method is efficient for brain tumor image classification. In a similar approach, Díaz-Pernas et al. [22] proposed a multiscale CNN that processes input images using several subnetworks with different-sized convolution kernels to extract multiscale image features. They confirmed that multiscale features are efficient for brain tumor classification. Kang et al. [23] recently proposed an ensemble approach for the brain tumor classification problem. They first used three different pretrained CNN models, namely DenseNet, InceptionNet, and residual-based networks, to extract image features. They then applied several classification methods, such as support vector machines (SVMs), to the concatenated features to classify input images into several predefined categories of brain tumor images. By using pretrained CNNs that were successfully trained on a large dataset (ImageNet), Kang's method does not need to retrain the CNNs while still utilizing the power of deep features. A similar study conducted by Deepak et al. [24] confirmed the efficiency of deep features and SVM-based classifiers in brain tumor classification systems. Kesav et al. [25] proposed a combination of detection and classification networks for brain tumor detection and classification. They first constructed a two-channel CNN to efficiently classify brain images into tumor or non-tumor images. A region-based CNN (RCNN) was then used to identify the tumor region. Most recently, Chatterjee et al. [26] proposed spatio-spatial models for classifying 3D brain scans into different types of brain tumors by learning information in both spatial and temporal spaces.
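To make the deep-feature pipelines above concrete, the sketch below extracts and concatenates features from two frozen ImageNet-pretrained backbones and feeds them to an SVM, in the spirit of Kang et al. [23]; the choice of ResNet-50 and DenseNet-121 is an illustrative assumption, not the exact published configuration.

```python
# Hedged sketch: frozen pretrained CNNs as feature extractors, SVM as classifier.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

resnet = models.resnet50(weights="IMAGENET1K_V1")
resnet.fc = nn.Identity()                  # drop the 1000-class ImageNet head
densenet = models.densenet121(weights="IMAGENET1K_V1")
densenet.classifier = nn.Identity()
resnet.eval(); densenet.eval()

@torch.no_grad()
def deep_features(batch):                  # batch: (N, 3, 224, 224), ImageNet-normalized
    # Concatenate the two penultimate-layer descriptors into one feature vector.
    return torch.cat([resnet(batch), densenet(batch)], dim=1).numpy()

# svm = SVC(kernel="rbf").fit(deep_features(train_images), train_labels)
# predicted = svm.predict(deep_features(test_images))
```

Because the backbones stay frozen, only the SVM needs to be fit, which is why this family of methods avoids retraining the CNNs.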
While these deep learning-based studies have shown that brain tumor images can be successfully classified into predefined categories, they require a large amount of training data to train the classification network, which is normally difficult to obtain in medical image processing. Therefore, their performance is significantly reduced when less data are available for training. In addition, the classification performance can be affected by the architecture of the CNN.
Table 1 summarizes previous studies on the brain tumor image classification problem in comparison to our proposed method.
The rest of our paper is organized as follows. In Section 2, we highlight the main contributions of our work. In Section 3, we present our proposed method in detail. In Section 4, we present the experiments performed with two public datasets to evaluate the performance of brain tumor image classification using a small training dataset, along with the experimental results. In Section 5, we discuss the experimental results of Section 4. Finally, Section 6 presents the conclusions of our work.
5. Discussion
A large dataset for brain tumor segmentation, namely the BraTS dataset [48,49,50], already exists. This dataset contains 3D MRI scans of human brains with tumors. Therefore, it does not fit our study, as we aim to classify 2D MRI images. In addition, although we could use tumor-free slices as non-tumor images and the remaining slices as tumor images, this creates difficulties for the training and testing of our proposed method, because slices in a 3D MRI scan exhibit high correlation, which makes it difficult to train the classification methods (SVM, MLP, and FS). Therefore, we did not use the BraTS dataset in our experiments. Instead, we used two public datasets that were used in a previous study [23] to evaluate the performance of our proposed method: the BT-Small [28] and BT-Large [29] datasets. These datasets have been used in previous studies on the brain tumor image classification problem [15,16,18,20,23] and are open to all researchers. Therefore, the use of these datasets allows other researchers to easily reproduce and compare with our study.
In a previous study conducted by Kang et al. [23], very high classification accuracies were obtained using SVM- or CNN-based methods: approximately 98.04% with the BT-Small dataset and 93.72% with the BT-Large dataset. Using the BT-Large dataset, previous studies conducted by Abiwinanda et al. [5], Swati et al. [16], Deepak et al. [18], Bodapati et al. [20], Díaz-Pernas et al. [22], and Isunuri et al. [15] reported very high classification accuracies, ranging from 84% to more than 98%, when training with a large number of images. From our experimental results, we confirmed that the lack of training data is a critical problem for a classification system, as our experiments produced lower classification accuracies than those of previous studies using the same datasets. This is because previous studies performed training on a large dataset and testing on a small dataset, whereas our study focuses on a different scenario, in which the classification system lacks training data. In this special case, the performance of conventional methods, such as SVM- or CNN-based methods (feature extraction by a CNN followed by an MLP for classification), is significantly reduced, as shown by our experimental results in Table 7 and Table 9. However, as shown in Table 8 and Table 10, our proposed method enhances the overall classification accuracy by combining the results of the SVM- and MLP-based methods with a new classification method based on the few-shot learning technique. Based on these experimental results, we believe that our proposed method is more efficient than previous methods and helps to enhance the classification accuracy for classification problems that lack training data.
As explained in Section 3.1, the BT-Large dataset contains brain images belonging to three predefined categories (meningioma, glioma, and pituitary), as shown in the examples in Figure 8. This dataset differs from the BT-Small dataset, which contains images of two categories: tumor (with a tumor region) and non-tumor (without any tumor region) images. Because the BT-Small dataset contains images with and without tumors, the difference between the images of the two categories is high. Therefore, tumor and non-tumor images can be easily distinguished. However, the difference between images among categories in the BT-Large dataset is low, because all images in the BT-Large dataset contain tumors. In addition, the BT-Small dataset contains two image classes, whereas the BT-Large dataset contains three (meningioma, glioma, and pituitary), and classifying images into a larger number of predefined categories using a small amount of training data is more difficult than classifying images into a smaller number of categories. Consequently, the classification performance on the BT-Small dataset was much better than that on the BT-Large dataset, as shown in Table 7, Table 8, Table 9 and Table 10. For a three-class classification system, random guessing produces an accuracy of about 33.333%, which is much lower than our classification accuracy (55.792%, reported in Table 10). In addition, Table 13 shows that our proposed method also outperformed various classification methods when using small amounts of data for training. From this result, we conclude that our algorithm successfully enhanced the classification performance compared to previous classification methods. Our experimental results also showed that the lack of training data leads to a reduction in the performance of conventional classification systems. Given this problem, our proposed method can be used in real-world applications where less data are available for training the classification system, such as when a new or less common disease appears.
Classifying images into a large number of predefined categories is a more difficult problem than classifying images into a smaller number of categories. This difficulty is caused by the inter-class and intra-class correlations that always exist among images in a dataset. As a result, as the number of classes (predefined categories) increases, more errors occur because of the inter-class correlation. To reduce the negative effects of this problem, a possible solution is to collect more data and perform data augmentation to make the dataset generalize to a general classification problem. In addition, we could apply a pre-classification scheme that classifies the data into some major categories before refining them into detailed categories, as sketched below.
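As a loose illustration of this pre-classification idea, the following hypothetical sketch (not part of our implemented method) routes images through a coarse tumor/non-tumor classifier before a fine-grained tumor-type classifier.

```python
# Hypothetical coarse-to-fine scheme; the two SVC stages and the label
# convention (-1 = predicted non-tumor) are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

coarse = SVC()   # stage 1: tumor vs. non-tumor
fine = SVC()     # stage 2: e.g., meningioma / glioma / pituitary

def fit_two_stage(features, is_tumor, tumor_type):
    """features: (N, D); is_tumor: 0/1 labels; tumor_type: class ids for tumor rows."""
    coarse.fit(features, is_tumor)
    fine.fit(features[is_tumor == 1], tumor_type[is_tumor == 1])

def predict_two_stage(features):
    # Only images judged to contain a tumor reach the fine-grained classifier.
    has_tumor = coarse.predict(features)
    labels = np.full(len(features), -1)
    if (has_tumor == 1).any():
        labels[has_tumor == 1] = fine.predict(features[has_tumor == 1])
    return labels
```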
Figure 11 shows some example error cases produced by the individual methods (SVM-, MLP-, and FS-based). As shown in this figure, all of these methods can falsely classify a non-tumor image as a tumor image when the input image contains noise and/or high-contrast brain regions, as shown in the images on the left side of the figure. When the tumor is relatively small or unclear, these methods can falsely classify a tumor image as a non-tumor image, as shown in the images on the right side of the figure.
As shown in Section 4, our proposed method outperformed previous methods for brain tumor image classification using a small training dataset. For demonstration purposes, Figure 12 shows some examples in which our proposed method (based on the weighted-SUM rule) enhances the classification performance over the individual classifiers. We can observe from this figure that, although an individual method (among the three methods) can falsely classify an image, fusion with the results of the other classifiers helps to correct that error and consequently enhances the classification performance of the overall system.
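For concreteness, a minimal sketch of a weighted-SUM score fusion is given below; the weight values are placeholders for illustration (in practice they would be selected on training data), and the score arrays are assumed to be aligned, probability-like outputs of the three classifiers.

```python
# Hedged sketch of weighted-SUM fusion over three classifiers' scores.
import numpy as np

def weighted_sum_fusion(svm_scores, mlp_scores, fs_scores,
                        w_svm=0.3, w_mlp=0.3, w_fs=0.4):   # placeholder weights
    """Each *_scores array has shape (num_images, num_classes)."""
    fused = w_svm * svm_scores + w_mlp * mlp_scores + w_fs * fs_scores
    return np.argmax(fused, axis=1)        # final class label per image
```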
To investigate the internal functioning of the MLP-based method for the classification problem, we extracted the class activation map (CAM) images [51] of example input images using the CNN-based network (feature extraction by a CNN followed by classification by an MLP) presented in Figure 4. The results are shown in Figure 13 and Figure 14 for the cases where the input image is a tumor image (Figure 13) and a non-tumor image (Figure 14), respectively. As shown in Figure 13, the MLP network pays more attention to the tumor region (higher weight) than to the other regions when deciding that the input image belongs to the tumor class (Figure 13c), whereas it uniformly analyzes the overall input image when deciding that the input image belongs to the non-tumor class (Figure 13b). When the input image is a non-tumor image, as shown in Figure 14, the network pays attention to the entire brain region when deciding that the input image belongs to the non-tumor class, whereas it pays attention to some high-frequency regions, such as the skull or the brain boundary, when deciding that the input image belongs to the tumor class. From these figures, we see that the MLP network pays attention to tumor regions or high-frequency regions when assigning input images to the tumor class and pays uniform attention to the entire brain region when assigning input images to the non-tumor class.
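For reference, the sketch below shows how a CAM of the kind visualized in Figures 13 and 14 can be computed following [51]; it assumes a network that ends in global average pooling followed by a single linear layer, and uses an ImageNet-pretrained ResNet-18 as a stand-in for the classifier analyzed in the text.

```python
# Hedged CAM sketch: weight the last convolutional feature maps by the
# target class's weights in the final linear layer (Zhou et al. [51]).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
backbone = torch.nn.Sequential(*list(model.children())[:-2])  # up to last conv maps

@torch.no_grad()
def class_activation_map(image, class_idx):
    """image: (1, 3, 224, 224) normalized tensor; returns an HxW heatmap in [0, 1]."""
    feats = backbone(image)                               # (1, 512, 7, 7)
    weights = model.fc.weight[class_idx]                  # (512,)
    cam = F.relu(torch.einsum("c,chw->hw", weights, feats[0]))
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return F.interpolate(cam[None, None], size=image.shape[-2:],
                         mode="bilinear", align_corners=False)[0, 0]
```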
To investigate the internal functioning of the FS network, we plotted the image features of some input image pairs before and after transformation by the FS network, as shown in Figure 15 and Figure 16. In these figures, the D value indicates the measured cosine distance between two extracted image features. The upper part of Figure 15a shows the original deep image features extracted by the pretrained CNNs; the cosine distance between the two features is approximately 0.8 (D = 0.80 in Figure 15a). This indicates that the two images are quite similar (close to 1). By transforming the input image features into another space using the FS network, as shown in the lower part of Figure 15a, the cosine distance between the two input images increases to roughly 0.98 (D = 0.98 in Figure 15a), which indicates that the two images are very close together. This is what we expect, as both belong to the same class of non-tumor images. Similarly, Figure 15b shows the case of two tumor input images. Without the FS network, the similarity of the two input images is approximately 0.81 (D = 0.81 in Figure 15b). Using our proposed FS network, the similarity becomes 0.99 (D = 0.99 in Figure 15b), which indicates that the two images belong to the same category. Figure 16 shows a case in which the two input images are from different classes. As shown in the upper part of this figure, the cosine distance between the two images is 0.70 (D = 0.70 in Figure 16) using the deep features extracted by the pretrained CNNs. This value indicates that the contents of the two images are quite similar, even though they belong to two different classes. Using our proposed FS network, the distance between the two input images is reduced to 0.01 (D = 0.01 in Figure 16), which indicates that the two images are different (the cosine distance is now close to 0). This is what we expect, because the two images are from two different classes. From Figure 15 and Figure 16, we can see that the FS network helps to adjust the similarity measurement so that two images from the same class have a high similarity, whereas two images from different classes have a very low similarity.
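For clarity, the D values discussed above can be computed as a standard cosine similarity between feature vectors, as in the sketch below; here, `fs_network` is a placeholder for our trained FS embedding network.

```python
# Hedged sketch: cosine similarity of deep features before/after the FS transform.
import torch
import torch.nn.functional as F

def cosine_d(f1, f2):
    """D is close to 1 for same-class pairs and close to 0 for different classes."""
    return F.cosine_similarity(f1.flatten(), f2.flatten(), dim=0).item()

# d_raw = cosine_d(feat_a, feat_b)                          # e.g., ~0.80 in Figure 15a
# d_fs  = cosine_d(fs_network(feat_a), fs_network(feat_b))  # e.g., ~0.98 after transform
```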
In our experiments, we used less data for training and a large amount of data for testing to simulate the situation in which a new or less common disease appears, as explained in Section 4. Training with less data is more difficult than training deep learning models with a large amount of data because of the lack of information. Due to this problem, some level of overfitting exists in the classification models, which makes the testing results worse than the training results. As a result, the classification results decrease when more data are used for testing. As shown in Table 4 and Table 5, the numbers of training images are similar for BT-Small and BT-Large, but the number of testing images in BT-Large is much larger than that in BT-Small.
As shown in Table 8, Table 10 and Table 13, our study produced higher performance than previous studies when working with small training data. While the errors produced by our proposed method are high (17% for the binary problem and 44% for the ternary problem), they are much lower than those of a random classification system. In addition, we are working within a computer-aided diagnosis (CAD) system. Therefore, the results produced by our proposed method cannot be used directly in the diagnosis and treatment process. Instead, our proposed method provides suggestions to doctors in a double-screening procedure. Therefore, it can help enhance the disease diagnosis and treatment performance of doctors.
The BT-Small and BT-Large datasets used in our experiments are public datasets that have been widely used in previous research. They contain image variations, including some mix of orientations and contrast changes. As these datasets were originally collected from MRI images of real applications, we used them, including their image variations, in our experiments.