
Breast Cancer Histopathological Image Recognition Based on Pyramid Gray Level Co-Occurrence Matrix and Incremental Broad Learning

School of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(15), 2322; https://doi.org/10.3390/electronics11152322
Submission received: 18 May 2022 / Revised: 9 July 2022 / Accepted: 12 July 2022 / Published: 26 July 2022
(This article belongs to the Section Bioelectronics)

Abstract

To recognize breast cancer histopathological images, this article proposes a combined model consisting of a pyramid gray level co-occurrence matrix (PGLCM) feature extraction model and an incremental broad learning (IBL) classification model. The PGLCM model is designed to extract fusion features of breast cancer histopathological images that reflect useful multiresolution information and facilitate better classification in the later stage. The IBL model improves classification accuracy by increasing the number of network enhancement nodes horizontally. Unlike deep neural networks, the IBL model greatly compresses training and testing time by making full use of its single-hidden-layer structure. To our knowledge, this is the first time the IBL model has been introduced into the breast cancer histopathological image recognition task. Experimental results on the four magnifications of the BreaKHis dataset show that the accuracy of both binary and eight-class classification outperforms existing algorithms. The binary classification accuracy reaches 91.45%, 90.17%, 90.90% and 90.73%, respectively, indicating the effectiveness of the established combined model and demonstrating its advantages in breast cancer histopathological image recognition.

1. Introduction

According to the latest global cancer data released in 2021 by the International Agency for Research on Cancer (IARC) of the World Health Organization, breast cancer in women (11.7%) surpassed lung cancer (11.4%) as the most common cancer in 2020 [1]. Early screening of breast cancer is fundamental to diagnosis; it can not only find hidden dangers in time but also effectively improve the survival rate of patients. Therefore, the development of an automatic recognition system based on breast cancer histopathological images is of great significance for helping doctors improve diagnostic efficiency and save medical resources. Currently, with the continuous development of artificial intelligence, machine learning and related technologies have been widely used for the identification and diagnosis of breast cancer in Computer-Aided Diagnosis (CAD) [2,3], which has vastly reduced the workload of physicians. However, providing correct and meaningful computerized diagnostic results and improving the recognition accuracy of breast cancer remain open problems.
At present, CAD of breast cancer histopathological images can be divided into two categories: image recognition based on traditional hand-crafted feature extraction and recognition based on deep learning feature extraction. Traditional hand-crafted feature extraction methods mainly use basic statistical features to describe images. Common image features include texture features, spatial features and color features [4]. Among these, texture features quantify changes in the sharpness, contrast and intensity of the target image, and can thus represent the stability of grayscale variation, the grayscale correlation of local areas and groove depth. Existing texture feature extraction methods include the wavelet transform, the Gray Level Co-occurrence Matrix (GLCM), the Local Binary Pattern (LBP) and Complete Local Binary Pattern (CLBP) descriptors [2,5,6,7] and the Gaussian pyramid [8]. Meanwhile, feature extraction methods based on deep learning have been successfully applied to histopathological image recognition of breast cancer [3,9,10], such as a fine-tuned Inception-V3 model [11] and fine-tuning a set of CNN models (AlexNet, VGGNet, ResNet, Inception-BN, etc.) [12]. Furthermore, other scholars have proposed integration models of various deep learning models, for example, a multi-view deep residual neural network (mResNet) model [13], a pre-trained convolutional neural network integration model (Adaptive VGG19, MobileNet and DenseNet) [14] and a multi-model combination (VGG16, Inception-V3 and Resnet-V2-152) [15]. In addition, some improved models based on neural network structures have emerged, such as a class-structure-based deep convolutional neural network (CSDCNN) model [16], an end-to-end CNN [17] and an improved Lenet-5 [18].
Through analysis of the existing research results, it is found that deep-learning-based breast cancer histopathological image recognition algorithms have significant advantages in accuracy, owing to their multi-hidden-layer network structures, which can automatically mine the deep features of the data that are conducive to improving the recognition effect. However, the complex structure of a deep network makes the training process quite time-consuming. To solve this problem, J.L. Chen et al. proposed a wide network structure based on the random vector functional-link neural network (RVFLNN) and the single-layer feedforward neural network to replace the deep network structure, called the Broad Learning System (BLS) [19]. At the same time, some derived networks based on BLS have been applied to image classification and arrhythmia detection [20]. These techniques provide new insights into the design of a new generation of classifiers. Inspired by these studies, this paper proposes a breast cancer histopathological image recognition method based on a pyramid gray level co-occurrence matrix (PGLCM) and an incremental broad learning (IBL) model. The proposed method describes the GLCM features of breast cancer histopathological images at multiple resolutions by constructing a Gaussian pyramid model of the images. In addition, the established IBL classification model offers high classification accuracy and real-time performance. The effectiveness of the proposed method was verified experimentally using the public Breast Cancer Histopathological Database (BreaKHis).
The remaining part of the paper proceeds as follows: Section 2 describes the dataset used for evaluation; Section 3 is concerned with the methodology used for this study; the experiments and results are presented in Section 4; Section 5 discusses the results of the proposed method, and Section 6 concludes the article.

2. Database

Histopathological examination is the "gold standard" for breast cancer diagnosis. The dataset used in this experiment is therefore the BreaKHis dataset from the P&D Laboratory of Pathological Anatomy and Cytopathology in Paraná, Brazil [2], which consists of hematoxylin and eosin (H&E)-stained histopathological images of breast cancer from 82 patients (24 benign and 58 malignant), with a total of 7909 images (3-channel RGB, 8-bit depth per channel, 700 × 460 pixels, PNG format), comprising 5429 malignant and 2480 benign tumor samples.
The amounts of the different types of data in the BreaKHis dataset are shown in Table 1. The dataset contains images of benign tumors (Benign) and malignant tumors (Malignant) at four magnifications (40X, 100X, 200X, 400X); the benign types are Adenosis (A), Fibroadenoma (F), Phyllodes Tumor (PT) and Tubular Adenoma (TA), while the malignant types are Ductal Carcinoma (DC), Lobular Carcinoma (LC), Mucinous Carcinoma (MC) and Papillary Carcinoma (PC), giving eight types in total. The file name of each breast cancer histopathological image encodes information about the image itself: the biopsy procedure, tumor category, tumor type, patient identification and magnification factor. Figure 1 shows H&E-stained breast cancer histopathological images of these eight tumor types: A (Adenosis), F (Fibroadenoma), PT (Phyllodes Tumor), TA (Tubular Adenoma), DC (Ductal Carcinoma), LC (Lobular Carcinoma), MC (Mucinous Carcinoma) and PC (Papillary Carcinoma).
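For illustration, a minimal file-name parsing sketch is given below. It assumes the commonly documented BreaKHis naming pattern (a hypothetical file such as SOB_B_TA-14-4659-40-001.png, i.e., procedure_class_type-year-slideID-magnification-sequence); the exact pattern should be checked against the downloaded dataset. Python is used purely for illustration, since the experiments in this paper were run in MATLAB.

```python
import re

# Hypothetical BreaKHis-style file name: procedure_class_type-year-slideID-magnification-seq,
# e.g. "SOB_B_TA-14-4659-40-001.png"; verify against the actual dataset before use.
PATTERN = re.compile(
    r"(?P<procedure>[A-Z]+)_(?P<category>[BM])_(?P<tumor_type>[A-Z]+)"
    r"-(?P<year>\d+)-(?P<slide>\w+)-(?P<magnification>\d+)-(?P<seq>\d+)\.png$"
)

def parse_breakhis_name(filename):
    """Return the metadata encoded in a BreaKHis-style file name, or None if it does not match."""
    m = PATTERN.match(filename)
    return m.groupdict() if m else None

print(parse_breakhis_name("SOB_B_TA-14-4659-40-001.png"))
# {'procedure': 'SOB', 'category': 'B', 'tumor_type': 'TA', 'year': '14',
#  'slide': '4659', 'magnification': '40', 'seq': '001'}
```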

3. Proposed Methodology

The breast cancer histopathological image recognition process based on the proposed PGLCM-IBL model is shown in Figure 2 and includes three stages: data preprocessing, feature extraction and classification. First, the dataset is expanded and balanced using data augmentation methods such as rotation and flipping. Second, the PGLCM features of the breast cancer histopathological images are extracted. Finally, the IBL classifier is used to recognize the breast cancer histopathological image types.

3.1. Data Preprocessing

Figure 3 shows the sample distribution of the BreaKHis dataset according to Table 1. It can be seen from Figure 3 that the data samples are unevenly distributed among the eight tumor types, with the DC type in particular being overrepresented. Therefore, in order to ensure the training effect and generalization performance of the designed model, the BreaKHis dataset was expanded and balanced using data augmentation methods such as rotation and flipping. Figure 4 shows type A tumor images after rotation and flipping, and the sketch below illustrates these operations. Table 2 shows the amounts of the different types of data in the BreaKHis dataset after data augmentation.
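A minimal sketch of these augmentation operations is shown below, using OpenCV in Python for illustration (the paper's experiments were run in MATLAB); the function name is our own.

```python
import cv2

def augment(img):
    """Rotated and flipped variants used to expand and balance the dataset (cf. Figure 4)."""
    return {
        "original": img,
        "flip_horizontal": cv2.flip(img, 1),   # mirror left-right
        "flip_vertical": cv2.flip(img, 0),     # mirror top-bottom
        "rotate_90": cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE),
        "rotate_180": cv2.rotate(img, cv2.ROTATE_180),
        "rotate_270": cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE),
    }

# Usage: variants = augment(cv2.imread("SOB_B_TA-14-4659-40-001.png"))
```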

3.2. Feature Extraction

Figure 5 shows the constructed PGLCM feature extraction model for breast cancer histopathological images. The specific feature extraction steps are: (1) converting the breast cancer histopathological images into grayscale images; (2) establishing a 5-layer Gaussian pyramid of the breast cancer histopathological images; (3) extracting 4 GLCM texture features, namely contrast, correlation, energy and homogeneity, at each resolution of the pyramid in the 0°, 45°, 90° and 135° directions with distances of 1–10; (4) fusing the GLCM features of all pyramid layers to form the pyramidal fusion features.
The following introduces the important parts involved in the feature extraction process.

3.2.1. Gaussian Pyramid

A pyramid of breast cancer histopathological images is a series of images, arranged in a pyramid shape with decreasing resolution, that all derive from the same original image; each image in the collection is called a layer. The bottom layer of the pyramid is a high-resolution representation of the original image, and the top layer is a low-resolution approximation of it; the stacked images are shaped like a "pyramid". The higher the layer, the smaller the image and the lower the resolution, so the breast cancer histopathological image pyramid can better characterize the GLCM features of the image in a multiresolution manner.
The process of establishing a Gaussian pyramid [21] is shown in Figure 6. First, the image is smoothed with a Gaussian low-pass filter (Gaussian blurring); second, the smoothed image is down-sampled, producing a series of images of decreasing size. As can be seen from Figure 6, the image at the (k + 1)-th layer of the Gaussian pyramid is obtained by Gaussian blurring and down-sampling the image at the k-th layer. The down-sampling is performed between rows and between columns: removing every other row and column gives an upper image with 1/4 of the pixels of the lower image. By repeatedly iterating these two steps, a complete Gaussian pyramid is obtained.
We denote the k-th layer of the Gaussian pyramid as G_k; the structural expression of the Gaussian pyramid is shown in Equation (1):
G_k(i, j) = \sum_{x} \sum_{y} W(x, y)\, G_{k-1}(2i + x,\; 2j + y), \quad 2 \le k \le M,\; 0 \le i < C_k,\; 0 \le j < R_k, \qquad (1)
where M is the total number of layers of the Gaussian pyramid, C_k and R_k are the numbers of pixel columns and rows of the image at the k-th layer of the pyramid, respectively, and W(x, y) is a two-dimensional Gaussian function whose expression is shown in Equation (2):
W(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2} + y^{2}}{2\sigma^{2}} \right), \qquad (2)
Equation (1) describes the main process of Gaussian blurring and image down-sampling. A complete Gaussian pyramid model is composed of G_1, G_2, …, G_M generated by the above steps, where G_1, the original image, is the bottom layer and G_M is the top layer of the Gaussian pyramid. This paper focuses on the case M = 5, that is, the 5-layer Gaussian pyramid of breast cancer histopathological images shown in Figure 5; a minimal sketch of the construction follows.
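The sketch below uses OpenCV's pyrDown, which performs the Gaussian blur and factor-of-two down-sampling of Equation (1); the helper name and the choice of Python/OpenCV are ours, not the paper's.

```python
import cv2

def gaussian_pyramid(gray, levels=5):
    """Build an M-level Gaussian pyramid: G_1 is the original image and each
    subsequent layer is the previous one Gaussian-blurred and down-sampled by 2."""
    pyramid = [gray]                                 # G_1: bottom layer, original resolution
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))     # G_{k+1} obtained from G_k
    return pyramid

# Equivalent two-step form of one pyramid step (blur, then drop every other row/column):
#   blurred = cv2.GaussianBlur(layer, (5, 5), sigmaX=1.0)
#   next_layer = blurred[::2, ::2]
```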

3.2.2. Gray Level Co-Occurrence Matrix and Its Common Features

The Gray Level Co-occurrence Matrix (GLCM) proposed by Haralick et al. [22] is a common method for describing texture-related information between image pixels, and the texture features extracted by this method have good discriminative power. The GLCM has long been a classical algorithm for texture feature extraction; it analyzes image texture by counting how often pairs of gray levels occur at a specific distance and in a specific direction. Figure 7 shows the definitions of the 0°, 45°, 90° and 135° directions. Equations (3)–(6) describe the computation of the GLCM in the 0°, 45°, 90° and 135° directions, respectively.
P_{0^\circ, d}(a, b) = \#\{ ((i, j), (m, n)) \in I : m - i = 0,\ |n - j| = d,\ f(i, j) = a,\ f(m, n) = b \}, \qquad (3)
P_{45^\circ, d}(a, b) = \#\{ ((i, j), (m, n)) \in I : (m - i,\, n - j) = (d, -d)\ \text{or}\ (-d, d),\ f(i, j) = a,\ f(m, n) = b \}, \qquad (4)
P_{90^\circ, d}(a, b) = \#\{ ((i, j), (m, n)) \in I : |m - i| = d,\ n - j = 0,\ f(i, j) = a,\ f(m, n) = b \}, \qquad (5)
P_{135^\circ, d}(a, b) = \#\{ ((i, j), (m, n)) \in I : (m - i,\, n - j) = (d, d)\ \text{or}\ (-d, -d),\ f(i, j) = a,\ f(m, n) = b \}, \qquad (6)
The parameters have the following meaning: for an image I with k gray levels, the GLCM has size p × q (with p = q = k). Starting from a pixel A(x, y) with gray value a, consider the pixel B that lies d pixels away in the direction θ and has gray value b; the number of pixel pairs (a, b) appearing in image I at distance d and direction θ gives the corresponding element of the GLCM. The sketch below illustrates this counting procedure.
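The following minimal sketch counts co-occurring gray-level pairs for one offset (distance d along a direction θ expressed as a row/column displacement); it is written in Python/NumPy for illustration and the function name is ours.

```python
import numpy as np

def glcm(gray, levels, drow, dcol, symmetric=True):
    """Count co-occurring gray-level pairs (a, b): pixel (i, j) paired with
    pixel (i + drow, j + dcol), accumulated into a levels x levels matrix."""
    P = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = gray.shape
    r0, r1 = max(0, -drow), min(rows, rows - drow)       # keep both pixels inside the image
    c0, c1 = max(0, -dcol), min(cols, cols - dcol)
    a = gray[r0:r1, c0:c1]                               # gray value of the first pixel
    b = gray[r0 + drow:r1 + drow, c0 + dcol:c1 + dcol]   # gray value of its neighbour
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    if symmetric:
        P = P + P.T                                      # count both pixel orderings
    return P

# Example: 0-degree GLCM at distance d = 1 for a 4-level toy image
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=np.uint8)
print(glcm(toy, levels=4, drow=0, dcol=1))
```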
Ulaby et al. [23] found that, among the texture features calculated from the GLCM, only four, namely contrast, correlation, energy and homogeneity, are uncorrelated; these are easy to compute yet give high classification accuracy. Therefore, in this paper, these four features are selected as the features of breast cancer histopathological images. The meaning and calculation of each texture feature are as follows:
Contrast measures how far the elements of the GLCM are distributed from the main diagonal. The contrast value reflects the diversity of grayscale intensity contained in the sample breast cancer histopathological image: the clearer the texture, the greater the contrast. It is calculated using Equation (7):
\mathrm{Contr} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i - j)^{2}\, P(i, j), \qquad (7)
where |i − j| is the grayscale difference between adjacent pixels, P(i, j) is the (i, j)-th entry of the normalized GLCM, i.e., the distribution probability of the corresponding pair of gray levels between adjacent pixels, and L is the number of gray levels of the breast cancer histopathological image.
Correlation measures how a pixel in the breast cancer histopathological image is related to its adjacent pixels; it is calculated using Equation (8):
\mathrm{Corrp} = \frac{\sum_{i} \sum_{j} (i \cdot j)\, P(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}, \qquad (8)
where \mu_x, \mu_y and \sigma_x, \sigma_y are the means and standard deviations of the row and column marginal distributions of the normalized GLCM.
Energy represents the uniformity of the gray distribution and the complexity of the texture in the breast cancer histopathological image. If the element values of the GLCM are all similar, the energy is small, indicating a fine texture; if some of the values are large and others small, the energy is larger. It is calculated using Equation (9):
\mathrm{Energ} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} P(i, j)^{2}, \qquad (9)
Homogeneity refers to how tightly the elements of the GLCM are distributed about the diagonal, which reflects the structural uniformity of gray levels in breast cancer histopathological images. It is calculated using Equation (10):
\mathrm{Homop} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \frac{P(i, j)}{1 + (i - j)^{2}}, \qquad (10)
Figure 8 shows the contrast, correlation, energy and homogeneity feature maps of the GLCM of a breast cancer histopathological image at a distance of 2 in the 90° direction. A sketch of the complete PGLCM feature extraction pipeline, combining these four statistics across the pyramid layers, is given below.
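The sketch below strings the above pieces together into the PGLCM fusion feature of Figure 5: the four GLCM statistics at the four directions and distances 1–10 are computed for every Gaussian-pyramid layer and concatenated into one vector. It uses scikit-image's graycomatrix/graycoprops for illustration; the quantisation to 64 gray levels and the exact ordering of the concatenation are our assumptions, not taken from the paper.

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops   # 'greycomatrix' in skimage < 0.19

ANGLES = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]        # 0, 45, 90, 135 degrees
DISTANCES = list(range(1, 11))                           # pixel distances 1-10
PROPS = ["contrast", "correlation", "energy", "homogeneity"]

def pglcm_features(bgr_img, levels=5, gray_levels=64):
    """Pyramid GLCM fusion feature vector (5 layers x 4 statistics x 10 distances x 4 angles)."""
    layer = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)    # step (1): grayscale conversion
    feats = []
    for _ in range(levels):                              # step (2): 5-layer Gaussian pyramid
        q = (layer.astype(np.uint16) * gray_levels // 256).astype(np.uint8)  # quantise gray levels
        P = graycomatrix(q, distances=DISTANCES, angles=ANGLES,
                         levels=gray_levels, symmetric=True, normed=True)
        for prop in PROPS:                               # step (3): four GLCM statistics
            feats.append(graycoprops(P, prop).ravel())   # 10 distances x 4 angles values each
        layer = cv2.pyrDown(layer)                       # move to the next (coarser) layer
    return np.concatenate(feats)                         # step (4): pyramidal fusion feature

# Usage: x = pglcm_features(cv2.imread("some_breakhis_image.png"))  -> 800-dimensional vector
```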

3.3. Classifier

Broad Learning (BL) is a structure that can replace deep neural networks and allows the network to be re-established efficiently by incremental learning when it needs to be extended. Compared with traditional deep neural network models, BL has the advantages of fast training and a simple structure while maintaining a certain accuracy. Based on these characteristics, this paper introduces BL into the classification of breast cancer histopathological images for the first time. Figure 9 shows the structure of the Incremental Broad Learning (IBL) model for the classification of breast cancer histopathological images. First, the features mapped from the input feature vector are used as the "feature nodes" of the network. Next, the mapped features are transformed into "enhancement nodes" with randomly generated weights. Finally, all mapped feature nodes, enhancement nodes and additional enhancement nodes are connected directly to the output, and the corresponding output coefficients are obtained by computing the pseudo-inverse of the corresponding mapping matrix. In this paper, the number of enhancement nodes is increased to improve the classification accuracy. The equation of the enhancement nodes is:
H_{m+q} = \xi\!\left( [Z_1, Z_2, \ldots, Z_n]\, W_{h_{m+q}} + \beta_{h_{m+q}} \right), \qquad q = 1, 2, \ldots, k,
where q indexes the additional enhancement nodes (k in total), ξ is the activation function, [Z_1, Z_2, …, Z_n] are the mapped features of the first n groups of input data, and W_{h_{m+q}} and β_{h_{m+q}} are the randomly generated weights and biases of the enhancement nodes.
This network uses the pseudo-inverse to calculate the weights of the feature nodes and enhancement nodes instead of updating feature kernels with backpropagation, which greatly shortens the training process. Moreover, the network is highly scalable, because it can be reconfigured by incremental learning algorithms without retraining the entire network. A minimal sketch of this idea is given below.
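The following is a simplified broad-learning classifier in Python/NumPy, not the authors' MATLAB implementation: the feature-node windows are collapsed into a single random mapping, and for clarity the output weights are recomputed with a ridge-regularised pseudo-inverse whenever enhancement nodes are added, rather than with the incremental pseudo-inverse update of [19].

```python
import numpy as np

class SimpleBLS:
    """Simplified broad learning: random feature nodes, tanh enhancement nodes,
    output weights from a ridge-regularised pseudo-inverse."""

    def __init__(self, n_feature_nodes=100, n_enhance_nodes=800, reg=1e-3, seed=0):
        self.nf, self.ne, self.reg = n_feature_nodes, n_enhance_nodes, reg
        self.rng = np.random.default_rng(seed)

    def _nodes(self, X):
        Z = X @ self.Wf + self.bf                 # mapped feature nodes Z_1 ... Z_n
        H = np.tanh(Z @ self.Wh + self.bh)        # enhancement nodes xi(Z W_h + beta_h)
        return np.hstack([Z, H])

    def fit(self, X, Y):                          # Y: one-hot class targets
        self.Wf = self.rng.standard_normal((X.shape[1], self.nf))
        self.bf = self.rng.standard_normal(self.nf)
        self.Wh = self.rng.standard_normal((self.nf, self.ne))
        self.bh = self.rng.standard_normal(self.ne)
        self._solve(X, Y)
        return self

    def add_enhancement_nodes(self, X, Y, k):
        """Broaden the network with k additional enhancement nodes and re-solve."""
        self.Wh = np.hstack([self.Wh, self.rng.standard_normal((self.nf, k))])
        self.bh = np.hstack([self.bh, self.rng.standard_normal(k)])
        self._solve(X, Y)

    def _solve(self, X, Y):
        A = self._nodes(X)                        # [feature nodes | enhancement nodes]
        # W = (A^T A + reg * I)^(-1) A^T Y  (ridge-regularised pseudo-inverse)
        self.W = np.linalg.solve(A.T @ A + self.reg * np.eye(A.shape[1]), A.T @ Y)

    def predict(self, X):
        return self._nodes(X) @ self.W            # class scores; argmax gives the label
```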

4. Experiments

In the experiment, the preprocessed breast cancer histopathological images were randomly divided into a training set (70%) and a test set (30%). The same PGLCM feature extraction operation was applied to both sets, and the mechanism by which extending the IBL network (adding network nodes) improves classification performance was explored. Model training and parameter learning are performed on the training set, and the recognition ability of the trained model is evaluated on the test set, as in the sketch below.
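A sketch of this protocol follows; X and y (PGLCM feature vectors and integer tumor labels) are assumed to come from the feature-extraction sketch in Section 3.2, SimpleBLS is the sketch from Section 3.3, and the node counts loosely follow the settings reported in Section 4.1.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: PGLCM fusion features, y: integer class labels (assumed available from the earlier
# sketches); 70% training / 30% test random split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Y_train = np.eye(int(y.max()) + 1)[y_train]       # one-hot targets for the IBL output layer
model = SimpleBLS(n_feature_nodes=100, n_enhance_nodes=800).fit(X_train, Y_train)

# Broaden the network step by step and track test accuracy.
for _ in range(2):
    model.add_enhancement_nodes(X_train, Y_train, k=5000)
    pred = model.predict(X_test).argmax(axis=1)
    print("test accuracy:", (pred == y_test).mean())
```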
In this paper, the effectiveness of the proposed method is verified by the following three experiments: (1) verifying the binary classification and eight-class classification effects of the PGLCM-IBL model on the BreaKHis dataset; (2) verifying the effectiveness of introducing the Gaussian pyramid through ablation experiments; (3) verifying the superiority of the method through comparative experiments. The hardware configuration used in the experiments is: CPU: Intel Core i5-10400; GPU: NVIDIA GeForce GTX 1650; 32 GB memory. The software platform is MATLAB 2018b, and the experiments run on the Windows 10 64-bit operating system.

4.1. Experiments Based on PGLCM-IBL

This experiment uses the confusion matrix together with the accuracy (Acc), sensitivity (Sen), specificity (Spe), positive prediction rate (Ppr) and negative prediction rate (Npr) as evaluation indexes. The larger the values of Acc, Sen, Spe, Ppr and Npr, the closer the predictions of the model are to the real situation and the better the model performance. Their expressions are:
\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN},
\mathrm{Sen} = \frac{TP}{TP + FN},
\mathrm{Spe} = \frac{TN}{TN + FP},
\mathrm{Ppr} = \frac{TP}{TP + FP},
\mathrm{Npr} = \frac{TN}{TN + FN},
Among them, TP represents true positives, FN represents false negatives, FP represents false positives, and TN represents true negatives.
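The sketch below computes these indexes from confusion-matrix counts and, as a worked check, reproduces the binary-classification values reported below (Figure 10) from TP = 3521, FN = 630, FP = 481 and TN = 3741.

```python
def binary_metrics(tp, fn, fp, tn):
    """Evaluation indexes defined above, computed from confusion-matrix counts."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "Ppr": tp / (tp + fp),
        "Npr": tn / (tn + fn),
    }

print(binary_metrics(tp=3521, fn=630, fp=481, tn=3741))
# {'Acc': 0.8673..., 'Sen': 0.8482..., 'Spe': 0.8860..., 'Ppr': 0.8798..., 'Npr': 0.8558...}
```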
Through several experiments, it was verified that a good recognition effect and high real-time performance are obtained when the number of feature node windows is 200, the number of feature nodes in each window is 100, the number of enhancement nodes is 800, and the number of additional enhancement nodes reaches 10,000. Figure 10 shows the confusion matrix of the binary classification results of breast cancer histopathological images based on the PGLCM-IBL model. As can be seen from Figure 10, on the binary classification test data, regardless of the magnification, TP = 3521, FN = 630, FP = 481, TN = 3741, Acc = 86.73%, Sen = 84.82%, Spe = 88.61%, Ppr = 87.98% and Npr = 85.59%. The green diagonal shows the numbers of correctly classified images (TP accounts for 42.05% of all test data and TN for 44.68%), while the pink cells show the numbers of images misjudged by the network (FP accounts for 5.74% of all test data and FN for 7.52%). The margins of the confusion matrix give Sen, Spe, Ppr and Npr calculated from the test data. Taking malignant tumors as an example, there are 4151 malignant images in the test set, of which 3521 are correctly classified by the network and 630 are incorrectly classified as benign tumors. As can be seen from Figure 10, the recognition accuracy for benign tumors is higher than that for malignant tumors.
Figure 11 shows the confusion matrix of the eight-class classification results of breast cancer histopathological images based on the PGLCM-IBL model. As can be seen from Figure 11, on the eight-class classification test data, regardless of the magnification, the numbers of correctly classified images lie on the green diagonal, while the pink cells show the numbers of images misjudged by the network. Taking A (Adenosis) as an example: there are 1066 A images in the test set, of which 1013 are correctly classified by the network, 11 are incorrectly classified as F (Fibroadenoma), 10 as PT (Phyllodes Tumor), and so on for TA (Tubular Adenoma), DC (Ductal Carcinoma), LC (Lobular Carcinoma), MC (Mucinous Carcinoma) and PC (Papillary Carcinoma). This yields Sen_A = 95.03%, Spe_A = 97.46%, Ppr_A = 84.49% and Npr_A = 99.26%. The results for the remaining seven classes are obtained in the same way, as shown in Table 3, yielding a final Acc of 83.43%. As shown in Figure 11, the recognition accuracy of A is the highest and that of DC (Ductal Carcinoma) is the lowest; DC is most often misrecognized as LC (Lobular Carcinoma), indicating that DC and LC tumors have some similarity, and the number of DC images misrecognized as other tumor types is also higher than for the other tumors, indicating that DC tumors have a complexity that makes them difficult to identify.

4.2. Ablation Study

To verify the effectiveness of introducing the Gaussian pyramid, ablation experiments were conducted comparing the performance of PGLCM-IBL and GLCM-IBL for binary and eight-class classification of breast cancer histopathological images. Figure 12 shows the binary and eight-class classification accuracies of PGLCM-IBL and GLCM-IBL, with the red curves showing how the GLCM-IBL classification accuracy varies with the number of additional enhancement nodes and the blue curves showing the same for PGLCM-IBL. As can be seen from Figure 12, the classification accuracy of PGLCM-IBL is consistently higher than that of GLCM-IBL and peaks when the number of additional IBL enhancement nodes is 10,000; beyond that point, with every further 5000 enhancement nodes, the network enters a saturation state in which the classification accuracy decreases. At the peak, the binary classification accuracy is 86.73%, which is 6.03% higher than that of GLCM-IBL, with a total test time of only 32.94 s, and the eight-class classification accuracy is 83.43%, which is 11.21% higher than that of GLCM-IBL, with a total test time of only 21.32 s.
The experimental results in Figure 12 show the effectiveness of introducing the Gaussian pyramid for breast cancer histopathological image recognition. In addition, they show that the PGLCM-IBL model has high real-time performance.

4.3. Comparative Experiment

In order to evaluate the performance of PGLCM-IBL for eight-class and binary classification of breast cancer histopathological images under specific magnification conditions, comparative experiments with different classification algorithms were conducted. Based on the ablation results, the number of additional IBL enhancement nodes was set to 10,000 for these experiments. Table 4 shows the results of the eight-class and binary classification experiments of different models on the BreaKHis dataset under specific magnification conditions.
In the magnification-specific eight-class classification experiments on the BreaKHis dataset, the literature [24] compared the eight-class classification accuracies of GoogLeNet, ResNet50 and the Inception-ResNet-V2 model proposed therein; except for its unsatisfactory result at 400X, Inception-ResNet-V2 outperforms GoogLeNet and ResNet50 at 40X–200X and reaches 86.7% accuracy. The literature [25] used a CNN for eight-class classification of breast cancer histopathological images and achieved 84–88.2% accuracy. The literature [26] compared the eight-class classification accuracies of the PFTAS + QDA, PFTAS + SVM, PFTAS + RF and Single-Task CNN models, where PFTAS + QDA achieved 82.15–83.37% accuracy, PFTAS + SVM achieved 79.70–85.30% accuracy, PFTAS + RF achieved 81.20–84.40% accuracy, and the magnification-independent Single-Task CNN achieved 83.08–85.67% accuracy. The PGLCM-IBL model proposed in this paper achieved 85.52–89.49% accuracy and, compared with all the above models, the highest accuracy at all four magnifications: 89.49%, 85.52%, 85.85% and 88.54%, respectively.
In the magnification-specific binary classification experiments on the BreaKHis dataset, the literature [9] used a CNN model for binary classification of breast cancer histopathological images and achieved 80.8–89.6% accuracy; the literature [27] extracted DeCAF features of breast cancer histopathological images for binary classification and achieved 81.6–84.8% accuracy; the magnification-independent Single Task CNN proposed in the literature [3] achieved 82.1–84.6% accuracy. The PGLCM-IBL model proposed in this paper achieved 90.17–91.45% accuracy and, compared with all the above models, the highest accuracy at all four magnifications, namely 91.45%, 90.17%, 90.90% and 90.73%. These results also show that the magnification has a considerable influence on the experimental results: in general, the higher the magnification of a medical image, the larger the cellular tissue appears and the harder the internal environment is to distinguish, leading to worse results.

5. Discussion

The results of experiments (1) and (2) in Sections 4.1 and 4.2 show that the PGLCM-IBL model is an effective method for breast cancer histopathological image recognition. Compared with the GLCM-IBL model, the PGLCM-IBL model embedded with the Gaussian pyramid technique has higher classification accuracy and better real-time performance. The experiments show that the PGLCM feature extraction method can extract GLCM features of breast cancer histopathological images at multiple resolutions, with better feature expression capability. The results of experiment (3) verify the superiority of the PGLCM-IBL method over other methods: as shown in Section 4.3, the proposed method achieves significantly better accuracy than the other algorithms in both the magnification-specific eight-class and binary classification experiments. This means that the PGLCM features extracted from breast cancer histopathological images help characterize different tumors under specific magnification conditions. Furthermore, the samples in Table 2 contain new data for model training and testing, indicating that our algorithm generalizes well to "fresh data". Meanwhile, the IBL classifier constructed in this study shows high real-time performance, which provides a technical reference for doctors to improve diagnostic efficiency and save medical resources. Moreover, in order to further verify the effectiveness of the PGLCM feature extraction method, in follow-up work beyond this paper we also carried out a breast cancer histopathological image recognition experiment based on PGLCM and a random forest (RF). When the number and depth of the random forest decision trees are 200 and 20, respectively, the eight-class classification accuracy reaches 92.46%, indicating that the proposed PGLCM feature extraction method is beneficial to improving the recognition effect; a sketch of this experiment is given below.
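A minimal sketch of that follow-up experiment is shown below (scikit-learn in Python for illustration); X and y are assumed to be the PGLCM fusion features and eight-class labels from the earlier sketches, and the 200-tree / depth-20 settings are the only values taken from the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X: PGLCM fusion features, y: eight-class tumor labels (assumed available).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, max_depth=20, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)
print("eight-class accuracy:", rf.score(X_test, y_test))
```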
It is worth emphasizing that, in order to avoid the effect of randomness, at least 10 independent runs were conducted for each classification algorithm in this paper, and the average of the resulting classification accuracies was taken as the reported accuracy. Meanwhile, in order to further assess the credibility of the classification results, the standard deviation of each classification algorithm across the independent runs was computed; the standard deviation of each algorithm is within \pm 3.1 \times 10^{-2}, which indicates that the experimental results are credible. However, the proposed method also has some shortcomings, mainly in the following two aspects: (1) in order to facilitate comparative experiments with other existing algorithms, only the BreaKHis dataset was used for sample selection, and the method has not been verified on other datasets; (2) limited by hardware conditions, the influence of the number of IBL enhancement nodes on the classification accuracy has not been explored further.

6. Conclusions

In this paper, we propose a breast cancer histopathological image recognition algorithm based on PGLCM-IBL and evaluate its performance on the BreaKHis dataset. The PGLCM feature extraction model enriches the scale of feature extraction and the feature representation by introducing the Gaussian pyramid technique, and the IBL model improves classification accuracy by increasing the number of enhancement nodes. The simple single-hidden-layer structure of the IBL model greatly reduces training and testing time and strikes a good balance between classification accuracy and time cost. Furthermore, the data augmentation operations on the BreaKHis dataset in the data preprocessing stage, including data expansion and balancing, also helped to improve the generalization ability of the model.

Author Contributions

Conceptualization and Formal analysis, J.L. and J.S.; Data curation, J.S. and H.S.; Writing—Original Draft, J.S. and J.L.; Writing—Review and Editing, J.L. and L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Wuyi University Hong Kong Macao Joint Research and Development under Grant 2019WGALH23, and by the Science and Technology Project of Basic and Theoretical Scientific Research in Jiangmen City under Grant 2021030102120004848.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021, 71, 209–249.
2. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 2015, 63, 1455–1462.
3. Bayramoglu, N.; Kannala, J.; Heikkilä, J. Deep learning for magnification independent breast cancer histopathology image classification. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancún, Mexico, 4–8 December 2016; pp. 2440–2445.
4. Xiaojun, B.; Tiewen, P. Memetic Algorithm Based on Content-Based Image Retrieval. Microcomput. Inf. 2010, 26, 25–27.
5. Zhang, Y.; Zhang, B.; Lu, W. Breast cancer histological image classification with multiple features and random subspace classifier ensemble. In Knowledge-Based Systems in Biomedicine and Computational Life Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 27–42.
6. Reis, S.; Gazinska, P.; Hipwell, J.H.; Mertzanidou, T.; Naidoo, K.; Williams, N.; Pinder, S.; Hawkes, D.J. Automated classification of breast cancer stroma maturity from histological images. IEEE Trans. Biomed. Eng. 2017, 64, 2344–2352.
7. Aksebzeci, B.H.; Kayaalti, Ö. Computer-aided classification of breast cancer histopathological images. In Proceedings of the 2017 Medical Technologies National Congress (TIPTEKNO), Trabzon, Turkey, 12–14 October 2017; pp. 1–4.
8. Yurong, H.; Lan, D.; Jiuzhen, L. Fusion of Gaussian Image Pyramid Feature for Low-resolution Face Recognition. J. Chin. Comput. Syst. 2021, 42, 2107–2115.
9. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. Breast cancer histopathological image classification using convolutional neural networks. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 2560–2567.
10. Akbar, S.; Peikari, M.; Salama, S.; Nofech-Mozes, S.; Martel, A. Transitioning between convolutional and fully connected layers in neural networks. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Berlin/Heidelberg, Germany, 2017; pp. 143–150.
11. Agarwal, R.; Diaz, O.; Lladó, X.; Yap, M.H.; Martí, R. Automatic mass detection in mammograms using deep convolutional neural networks. J. Med. Imaging 2019, 6, 031409.
12. Tsochatzidis, L.; Costaridou, L.; Pratikakis, I. Deep learning for breast cancer diagnosis from mammograms—A comparative study. J. Imaging 2019, 5, 37.
13. Dhungel, N.; Carneiro, G.; Bradley, A.P. Fully automated classification of mammograms using deep residual neural networks. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, Australia, 18–21 April 2017; pp. 310–314.
14. Kassani, S.H.; Kassani, P.H.; Wesolowski, M.J.; Schneider, K.A.; Deters, R. Classification of histopathological biopsy images using ensemble of deep learning networks. arXiv 2019, arXiv:1909.11870.
15. Fuquan, S.; Chenglong, C.; Kun, Z.; Chaoran, K.; Yushan, J.; Yunhui, D. Benign and Malignant Diagnosis of Breast Cancer Histopathological Image Based on Multimodel Neural Network. J. Chin. Comput. Syst. 2020, 41, 732–735.
16. Han, Z.; Wei, B.; Zheng, Y.; Yin, Y.; Li, K.; Li, S. Breast cancer multi-classification from histopathological images with structured deep learning model. Sci. Rep. 2017, 7, 4172.
17. Akogo, D.A.; Palmer, X.L. End-to-end learning via a convolutional neural network for cancer cell line classification. J. Ind.-Univ. Collab. 2019, 1, 17–23.
18. Xiaoling, Y.; Zhenqi, W.; Jia, L. Research Cancer Cell Recognition System Based on Deep Neural Network. Softw. Guide 2020, 19, 65–68.
19. Chen, C.P.; Liu, Z. Broad learning system: An effective and efficient incremental learning system without the need for deep architecture. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 10–24.
20. Li, J.; Zhang, Y.; Gao, L.; Li, X. Arrhythmia Classification Using Biased Dropout and Morphology-Rhythm Feature With Incremental Broad Learning. IEEE Access 2021, 9, 66132–66140.
21. Wang Lei, S.Y. Fast bilateral filtering using exponential functions. Electron. Lett. 2017, 53, 237–239.
22. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
23. Ulaby, F.T.; Kouyate, F.; Brisco, B.; Williams, T.L. Textural information in SAR images. IEEE Trans. Geosci. Remote Sens. 1986, GE-24, 235–245.
24. Jingwen, L.; Lican, H. Pathological Image Recognition of Breast Cancer Based on Inception-ResNet-V2. Softw. Guide 2020, 19, 225–229.
25. Bardou, D.; Zhang, K.; Ahmad, S.M. Classification of breast cancer based on histology images using convolutional neural networks. IEEE Access 2018, 6, 24680–24693.
26. Yu, L.; Ziqiang, S. Recognition algorithm of breast pathological images based on convolutional neural network. J. Jiangsu Univ. Sci. Ed. 2019, 40, 573–578.
27. Spanhol, F.A.; Oliveira, L.S.; Cavalin, P.R.; Petitjean, C.; Heutte, L. Deep features for breast cancer histopathological image classification. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 1868–1873.
Figure 1. H&E-stained breast cancer histopathological images of 8 tumor types. (a) A; (b) F; (c) PT; (d) TA; (e) DC; (f) LC; (g) MC; (h) PC.
Figure 2. Breast cancer histopathological image recognition process.
Figure 3. Sample distribution of the BreaKHis dataset.
Figure 4. Data augmentation. (a) Original Image; (b) Flip Horizontal; (c) Flip Vertical; (d) Rotate 90° Clockwise; (e) Rotate 180° Clockwise; (f) Rotate 270° Clockwise.
Figure 5. Feature extraction model of breast cancer histopathological images based on PGLCM. (a) Converting the breast cancer histopathological images into grayscale images; (b) Establishing a 5-layer Gaussian pyramid of breast cancer histopathological images; (c) Extracting 4 GLCM texture features, namely contrast, correlation, energy and homogeneity, at multiresolution of breast cancer histopathological images in 0°, 45°, 90° and 135° directions with distances of 1–10; (d) GLCM features of histopathological images are feature fused to form pyramidal fusion features.
Figure 6. The process of establishing Gaussian pyramid.
Figure 7. Common directions of GLCM. (a) θ = 0°; (b) θ = 45°; (c) θ = 90°; (d) θ = 135°.
Figure 8. Feature map of breast cancer histopathological images. (a) Original Image; (b) Contrast Feature; (c) Correlation Feature; (d) Energy Feature; (e) Homogeneity Feature.
Figure 9. Classification structure of breast cancer histopathological images based on IBL model.
Figure 10. Confusion Matrix of the binary classification results.
Figure 11. Confusion Matrix of the eight-class classification results.
Figure 12. PGLCM-IBL and GLCM-IBL classification results of breast cancer histopathological images. (a) Binary classification results; (b) Eight-class classification results.
Table 1. The amounts of different types of data in the BreaKHis dataset.
(A, F, PT, TA: benign types; DC, LC, MC, PC: malignant types.)
Magnification | A | F | PT | TA | DC | LC | MC | PC | Total
40X | 114 | 253 | 109 | 149 | 864 | 156 | 205 | 145 | 1995
100X | 113 | 260 | 121 | 150 | 903 | 170 | 222 | 142 | 2081
200X | 111 | 264 | 108 | 140 | 896 | 163 | 196 | 135 | 2013
400X | 106 | 237 | 115 | 130 | 788 | 137 | 169 | 138 | 1820
Total | 444 | 1014 | 453 | 569 | 3451 | 626 | 792 | 560 | 7909
Patients | 4 | 10 | 3 | 7 | 38 | 5 | 9 | 6 | 82
Table 2. The amounts of different types of data in the BreaKHis dataset using data augmentation.
Magnification | A | F | PT | TA | DC | LC | MC | PC | Total
40X | 912 | 856 | 872 | 894 | 864 | 884 | 905 | 870 | 7057
100X | 904 | 891 | 968 | 900 | 903 | 956 | 981 | 851 | 7354
200X | 888 | 913 | 864 | 840 | 896 | 918 | 864 | 810 | 6993
400X | 848 | 821 | 920 | 780 | 788 | 772 | 746 | 828 | 6503
Total | 3552 | 3481 | 3624 | 3414 | 3451 | 3530 | 3496 | 3359 | 27,907
Table 3. Evaluation indexes report of the eight-class classification.
Class | Sen | Spe | Ppr | Npr
A | 95.03% | 97.46% | 84.49% | 99.26%
F | 81.15% | 97.01% | 79.48% | 97.30%
PT | 95.31% | 97.90% | 87.14% | 99.29%
TA | 92.20% | 98.61% | 90.26% | 98.91%
DC | 46.72% | 96.92% | 68.17% | 92.80%
LC | 82.44% | 97.46% | 82.44% | 97.46%
MC | 81.12% | 97.28% | 81.05% | 97.30%
PC | 92.96% | 98.41% | 88.90% | 99.03%
Table 4. Comparison of accuracy rate of different classification algorithms.
Category | Model | 40X | 100X | 200X | 400X
Eight-class classification | GoogLeNet [24] | 68.7 | 65.9 | 69.1 | 62.8
 | ResNet50 [24] | 82.5 | 78.8 | 84.3 | 81
 | Inception-ResNet-V2 [24] | 86.7 | 80.3 | 83.5 | 68.5
 | CNN [25] | 88.2 | 84.6 | 83.3 | 84
 | PFTAS + QDA [26] | 82.70 | 82.15 | 83.37 | 82.40
 | PFTAS + SVM [26] | 81.65 | 79.70 | 85.30 | 82.30
 | PFTAS + RF [26] | 81.70 | 82.60 | 84.40 | 81.20
 | Single-Task CNN [26] | 83.08 | 84.15 | 85.67 | 83.10
 | Proposed method | 89.49 | 85.52 | 85.85 | 88.54
Binary classification | CNN [9] | 89.6 | 85 | 84 | 80.8
 | DeCAF features using CNN [27] | 84.6 | 84.8 | 84.2 | 81.6
 | Single Task CNN [3] | 83 | 83.1 | 84.6 | 82.1
 | Proposed method | 91.45 | 90.17 | 90.90 | 90.73
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
