Article

Persistent Homology for Breast Tumor Classification Using Mammogram Scans

1
School of Computing, The University of Buckingham, Buckingham MK18 1EG, UK
2
Independent Researcher, North York, ON M2R 1G4, Canada
3
Department of Computer Science and IT, Salahaddin University, Erbil 44001, Iraq
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(21), 4039; https://doi.org/10.3390/math10214039
Submission received: 30 September 2022 / Revised: 23 October 2022 / Accepted: 27 October 2022 / Published: 31 October 2022

Abstract

An important tool in topological data analysis is persistent homology (PH), which encodes abstract representations of the homology of data at different resolutions in the form of a persistence barcode (PB). Normally, one obtains a single PB from a digital image when using a sublevel-set filtration. In this work, we built more than one PB representation of a single image based on a landmark selection method known as local binary patterns (LBP), which encodes different types of local texture in a digital image. Starting from the top-left corner of any 3-by-3 patch selected from an input image, the LBP process subtracts the central pixel value from each of its eight neighboring pixel values. Each cell is then assigned 1 if the subtraction outcome is non-negative, and 0 otherwise, to obtain an 8-bit binary representation. This process identifies a set of landmark pixels that represent 0-simplices, and a Vietoris–Rips filtration on these landmarks yields the corresponding PB. Using LBP, we can construct up to 56 PBs from a single image if we restrict ourselves to the binary codes that have two circular transitions between 1 and 0. These 56 PBs contain detailed local and global topological and geometrical information, which can be used to design effective machine learning models. We used four different PB vectorizations, namely persistence landscapes, persistence images, Betti curves (barcode binning), and PB statistics. We tested the effectiveness of the proposed landmark-based PH on two publicly available breast abnormality detection datasets of mammogram scans. The sensitivity and specificity of the landmark-based PH were over 90% and 85%, respectively, in both datasets for the detection of abnormal breast scans. Finally, the experimental results provide new insights into using different PB vectorizations with sublevel-set filtrations and landmark-based Vietoris–Rips filtrations of digital mammogram scans.

1. Introduction

Topological data analysis (TDA) is a collection of methods from algebraic topology and geometry used to build and extract topological features from data. Persistent homology (PH), the main tool of TDA, extracts topological summaries from data in the form of connected components, loops, and cavities using a process known as filtration, which relies on a nested sequence of simplicial complexes that capture the birth and death of those topological invariants [1]. The births and deaths of these topological features are then represented as points in a persistence diagram (PD) or, equivalently, as bars in a persistence barcode (PB). Topological structures represented as PDs are stable with respect to small perturbations of the input data when the bottleneck or Wasserstein distance is used to compare PDs [2]. Although PH is mostly used when the input data have the form of a point cloud, it can also be used when the input to the TDA pipeline is an image, which has a grid structure. We demonstrate that one can construct a Vietoris–Rips filtration from digital images based on pixel landmark locations that convey different types of local textural information. In this paper, we aimed to harness the power of PH to differentiate benign breast tumors from their malignant counterparts using breast mammograms. A mammogram scan is a special type of X-ray imaging that involves exposing breast tissue to a small amount of radiation to obtain an inside picture of the breast for the purpose of abnormality/mass detection and classification.
Female breast cancer is among the four leading types of cancer in women worldwide. The World Health Organization (WHO) and cancer research agencies such as the American Cancer Society and the International Agency for Research on Cancer reported 19.3 million new cases of cancer in 2020 with 10 million deaths, and estimated that this number could rise to 28.4 million new cases by 2040 [3]. Mammogram scans have a number of advantages for detecting early signs of breast cancer in women, among them their wide deployment in hospitals, their ease of storage, the short time radiologists need to examine them, and their low cost. Radiologists face a number of difficulties in properly examining mammograms, such as low resolution, the size of the lesion within the breast tissue, the location of the lesion, and dense breast tissue in young patients. Therefore, designing sophisticated computer-aided diagnostics (CAD) to assist radiologists in making their final decision is a necessity.
The main contributions of this paper can be summarized as follows. (1) We construct 56 persistence diagrams from a single mammogram, whereby each PD is built from a set of automatically extracted mammogram pixel locations that convey a different type of textural information. (2) The space of persistence barcodes is featurized using four different methods, namely binning, barcode statistics, persistence images, and persistence landscapes, to measure the true performance of the proposed approach.

2. Methods

To build PH from digital mammograms, we relied on pixel-based landmarks that correspond to abnormality in textures. We derived our approach from a texture descriptor known as local binary patterns (LBPs), introduced more than two decades ago in [4]. Abnormality is expected to distort the local texture and structure in mammogram scans. Using LBP, we encoded this change in the local texture and structure of mammograms to assemble a set of point clouds as input to the PH pipeline. This method provides a rich source of persistent topological features for machine learning. Next, we describe our proposed landmark selection procedure and PH construction.

2.1. Image Patch Local Binary Patterns (IP-LBPs)

Since 1996, LBP has been used successfully in many pattern recognition applications, and different versions of LBP have been proposed and investigated with considerable success [5,6,7]. For any grayscale image $I$, LBP constructs a new grayscale image $\bar{I}$ by encoding each pixel $p \in I$ with an 8-bit binary representation determined by comparing the central pixel with its eight neighbors in the surrounding 3-by-3 image patch, visited in a clockwise manner. Starting from the top-left corner of any 3-by-3 patch, the LBP process subtracts the central pixel value from each of its eight neighboring pixel values. Each cell is then assigned 1 if the subtraction outcome is non-negative, and 0 otherwise (see Figure 1 for an illustration). This process results in an 8-bit binary code that can then be converted back to a decimal value representing the central pixel $(x_c, y_c)$ using the following equation:
$$\mathrm{LBP}(x_c, y_c) = \sum_{i=1}^{8} f(p_i - p_c) \times 2^{i}$$
where $p_c$ is the central pixel value; $p_i$ are the gray values of the neighboring pixels; and the function $f(x)$ is defined as follows:
$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
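As an illustration of the encoding above, the following minimal Python sketch computes the binary code of a single 3-by-3 patch. The function name `lbp_code` and the sample patch are hypothetical, and the weights follow the equation as printed ($2^i$ for $i = 1, \ldots, 8$); many LBP implementations use $2^{i-1}$ instead so that codes fall in the range 0 to 255.

```python
def lbp_code(patch):
    """Return the 8 LBP bits (clockwise from top-left) and the decimal code of a 3x3 patch."""
    center = patch[1][1]
    # Clockwise neighbour order starting at the top-left corner of the patch.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    # f(p_i - p_c): 1 if the difference is non-negative, 0 otherwise.
    bits = [1 if patch[r][c] - center >= 0 else 0 for r, c in order]
    # Weights 2^i for i = 1..8, following the equation as printed (an assumption
    # about the intended indexing; 2^(i-1) is the more common convention).
    code = sum(bit * 2 ** i for i, bit in enumerate(bits, start=1))
    return bits, code


# Hypothetical patch used only for illustration.
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
bits, code = lbp_code(patch)
print(bits, code)
```

Only the bit pattern, not the decimal value, matters for the landmark selection that follows, so the choice of weights does not affect which pixels are selected.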
In total, there are 256 binary codes that one can obtain for any 3-by-3 image patch following the LBP procedure. In [8], Ojala et al. demonstrated that only 58 binary codes out of the 256 were enough to represent 90% of textures in natural images. These 58 binary codes are known as uniform LBP (ULBP) codes, and it has been demonstrated experimentally that the histogram of ULBP codes can be used as a discriminating feature for computer vision applications [7,9]. ULBP codes encode local texture features such as edges, corners, spots, lines, and flat regions in an image, and their binary codes have either 0 or 2 circular transitions from 0 to 1 or from 1 to 0. There are 56 ULBP codes that have two circular transitions and only two ULBP codes with 0 circular transitions in their 8-bit binary representation, namely 00000000 and 11111111. Examples of ULBP codes with two circular transitions are 11000000 and 00111100, whereas a binary code such as 10101010 is not a ULBP code because it has more than two circular transitions from 1 to 0 or vice versa. We can group the 56 ULBP codes according to the number of 1's in their binary representation to form seven geometry groups $G_\lambda$ for $\lambda = 1, 2, \ldots, 7$, where $\lambda$ refers to the number of 1's in each geometry. Furthermore, each $G_\lambda$ consists of eight binary codes that can be obtained from each other by a circular rotation (see Figure 2). Starting from the top-left corner of the mammogram, we scanned the entire input image and selected the central pixel of a 3-by-3 patch as a landmark if its binary representation matched one of the geometries in Figure 2.
We can select one or more of these geometries to select landmarks from digital mammograms to construct a PD. Different rows of Figure 2 correspond to different types of texture. For example, the first and the last row correspond to flat and spot texture, row 5 (4 ones in the binary code) corresponds to edges and row 6 corresponds to corners.
For example, candidate landmarks are the central pixel positions of the first rotation of $G_1$, denoted $G_1(R_1)$, if a 3-by-3 patch's binary code is 00000001, where $R_\xi$ refers to the rotations of a specific $G_\lambda$ for $\xi = 1, 2, \ldots, 8$. We followed the same strategy to select a set of pixel locations for each of the 56 ULBP geometries depicted in Figure 2.
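The counts quoted above can be checked by brute force. The following short Python sketch (helper names are illustrative, not taken from the authors' code) enumerates all 256 codes, keeps those with at most two circular transitions, and groups the two-transition codes by their number of 1's.

```python
def circular_transitions(code):
    """Count 0->1 and 1->0 changes when reading the 8 bits of `code` around a circle."""
    bits = [(code >> k) & 1 for k in range(8)]
    return sum(bits[k] != bits[(k + 1) % 8] for k in range(8))


# Uniform codes have 0 or 2 circular transitions.
uniform = [c for c in range(256) if circular_transitions(c) <= 2]
two_transition = [c for c in uniform if circular_transitions(c) == 2]

# Group the two-transition codes by their number of 1's (lambda = 1..7).
groups = {}
for c in two_transition:
    groups.setdefault(bin(c).count("1"), []).append(format(c, "08b"))

print(len(uniform), len(two_transition))
print({lam: len(codes) for lam, codes in sorted(groups.items())})
```

Each group contains the eight circular rotations of a single run of consecutive 1's, which gives the 7 × 8 = 56 geometries used for landmark selection.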
The first two stages of Figure 3 show an example of a set of landmark pixel locations extracted from a normal and abnormal mammogram and their corresponding PDs. After selecting a set of mammogram pixel landmarks, we generated a Euclidean distance matrix D from these pixel value locations, which will then be used as input for the PH generation pipeline.
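The landmark selection and distance matrix steps can be sketched in plain Python as follows. This is a simplified illustration rather than the authors' implementation: the function names are hypothetical, the bit string is assumed to start at the top-left neighbor and proceed clockwise, and pixels outside the image are treated as zero (zero padding).

```python
import math

# Clockwise neighbour offsets (row, col), starting at the top-left neighbour.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def select_landmarks(image, target_bits):
    """Return (row, col) positions whose 3x3 LBP bit string equals `target_bits`."""
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Zero padding outside the image boundary.
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    landmarks = []
    for r in range(rows):
        for c in range(cols):
            bits = "".join(
                "1" if pixel(r + dr, c + dc) - image[r][c] >= 0 else "0"
                for dr, dc in NEIGHBOUR_OFFSETS
            )
            if bits == target_bits:
                landmarks.append((r, c))
    return landmarks


def distance_matrix(points):
    """Pairwise Euclidean distances between landmark locations."""
    return [[math.dist(p, q) for q in points] for p in points]


# Toy image (hypothetical values): two pixels whose only larger neighbour is directly above.
image = [[0, 9, 0, 0, 9, 0],
         [0, 5, 0, 0, 5, 0],
         [0, 0, 0, 0, 0, 0]]
landmarks = select_landmarks(image, "01000000")
D = distance_matrix(landmarks)
print(landmarks, D)
```

Repeating the selection for each of the 56 two-transition bit strings yields the 56 point clouds, one distance matrix per geometry.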

2.2. Persistent Homology of Digital Images

In this section, we introduce Vietoris–Rips simplicial complexes and cubical complexes as two approaches to building persistent homology features from breast mammograms.

2.2.1. Vietoris–Rips Complexes Based on Image Pixel Landmarks

In order to build topological features from data (point cloud or image), PH relies on mathematical objects known as simplices, which are the building blocks of higher-dimensional objects known as simplicial complexes. In this work, we constructed Vietoris–Rips ($VR$) simplicial complexes using the pixel locations obtained from the ULBP method. For a set of pixel landmarks $\mathcal{L}$ in $\mathbb{R}^2$, its $VR$ complex with parameter $\epsilon$, denoted $VR(\mathcal{L}, \epsilon)$, is the simplicial complex in which a vertex set $\{l_0, l_1, l_2, \ldots, l_\eta\} \subseteq \mathcal{L}$ spans an $\eta$-simplex if the Euclidean distance between any two of these landmark locations is less than or equal to the chosen value of $\epsilon$ (i.e., $d(l_i, l_j) \le \epsilon$ for all $0 \le i, j \le \eta$). As we increase the value of $\epsilon$, the $VR$ complex of the pixel locations grows. This process produces a nested sequence of $VR$ simplicial complexes known as a filtration. In other words, $VR(\mathcal{L}, \epsilon_1) \subseteq VR(\mathcal{L}, \epsilon_2)$ if $\epsilon_1 \le \epsilon_2$. Homological features that are born and vanish during the filtration process are then stored as points in a PD.
We direct the interested reader to see [1,10,11] for more mathematical details on V R construction from a point cloud, PH introduction, and mathematical backgrounds, respectively.
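To make the filtration concrete, the sketch below computes only the 0-dimensional barcode of a Vietoris–Rips filtration using a union-find structure: every landmark is born at $\epsilon = 0$, and a connected component dies at the length of the edge that merges it into another component. This is an illustrative stand-in written with the standard library (the function name and sample points are hypothetical); the experiments in this paper used the Ripser package, which also computes higher dimensions.

```python
import math


def vr_h0_barcode(points):
    """0-dimensional persistence bars (birth, death) of the Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge {i, j} enters the filtration at epsilon = d(l_i, l_j).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    bars = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge: one of them dies at eps
            bars.append((0.0, eps))
    bars.append((0.0, math.inf))     # one component persists forever
    return bars


# Hypothetical landmark locations: two close pairs that are far from each other.
landmarks = [(0, 0), (0, 1), (3, 0), (3, 1)]
bars = vr_h0_barcode(landmarks)
print(bars)
```

The two short bars reflect the merging of each close pair, and the longer bar records the scale at which the two pairs join into a single component.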

2.2.2. Cubical Complexes of Digital Images

The cubical analogue of a (VR) simplicial complex is a cubical complex, in which the role of simplices is played by cubes of different dimensions, as shown in Figure 4. A finite cubical complex in $\mathbb{R}^d$ is a union of cubes aligned on the grid $\mathbb{Z}^d$, satisfying conditions similar to those of the simplicial complex case [12]. A $d$-dimensional image is a map $\eta : I \subseteq \mathbb{Z}^d \to \mathbb{R}$. An element $v \in I$ is called a voxel, or a pixel when $d = 2$, and $\eta(v)$ is called its greyscale value. There are several ways to represent digital images as cubical complexes; the greyscale image comes with a natural filtration, which was adopted here. In one representation, voxels are vertices and cubes are built between these vertices. Here, we represent each voxel by a $d$-cube and add all of its adjacent lower-dimensional cubes. Next, we obtain a function on the resulting cubical complex $K$ by extending the values of the voxels to all of the cubes $\sigma \in K$ as follows:
$$\eta(\sigma) = \min_{\sigma \text{ face of } \tau} \eta(\tau)$$
Assume $K$ to be the resulting cubical complex built on the greyscale image $I$. Let
$$K_i = \{\sigma \in K \mid \eta(\sigma) \le i\}$$
be the $i$-th sublevel set of $K$. The set $\{K_i\}_{i \in \mathrm{Im}(\eta)}$ defines a filtration of cubical complexes indexed by the values of the greyscale function $\eta$.
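As a small illustration of the sublevel-set filtration, the sketch below counts the connected components (the Betti number in dimension zero) of each $K_i$ for a toy image. It assumes the convention in which each pixel is a 2-cube and lower-dimensional faces take the minimum value of the cubes containing them, so two pixels that share an edge or a corner are connected as soon as both are in the sublevel set (8-connectivity). The function name and image values are hypothetical; the experiments in this paper used the GUDHI library for the full computation.

```python
def sublevel_betti0(image, threshold):
    """Number of 8-connected components of the pixels with value <= threshold."""
    rows, cols = len(image), len(image[0])
    active = {(r, c) for r in range(rows) for c in range(cols) if image[r][c] <= threshold}
    seen, components = set(), 0
    for start in active:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:                      # depth-first flood fill of one component
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in active and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
    return components


# Toy greyscale patch (hypothetical values): four dark corners separated by bright pixels.
image = [[1, 5, 2],
         [5, 5, 5],
         [3, 5, 4]]
curve = [sublevel_betti0(image, i) for i in range(1, 6)]
print(curve)
```

Components appear one by one as the threshold passes each dark corner and merge into a single component once the bright pixels enter the sublevel set; the thresholds at which these events occur are the births and deaths recorded in the 0-dimensional barcode.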

2.3. Persistence Diagram Vectorization

Topological features summarized by PDs are not amenable to many machine learning and statistical tasks; for instance, PD’s Fréchet mean is not unique [13]. Hence, many vectorization approaches have been proposed to transform the data in PDs to resolve this issue and be able to apply machine learning methods. We used four methods to vectorize the topological features in PD: persistence images [14], persistence landscapes [15], binning [16], and barcode statistics. Next, we briefly describe each of these vectorization approaches.
Persistence landscape (PL). PL is one of the early vectorization methods proposed to map PDs into a stable and invertible function space using a family of piecewise linear functions $\{\Psi_k : \mathbb{R} \to \mathbb{R}\}_{k \in \mathbb{N}}$ such that $\Psi_k(\tau) = \sup\{m \ge 0 \mid \alpha_{\tau - m,\, \tau + m} \ge k\}$, where $\alpha_{i,j} = \#\{P = (p_1, p_2) \in PD \mid p_1 \le i \le j \le p_2\}$. More details of this method can be found in [15]. Restricting these functions to a closed interval $[a, b]$ and choosing a uniform discretization results in a 2-dimensional feature array (one row per function) suitable for machine learning classifiers. In this paper, we set $k = 100$ to use the 100 largest such functions in our analysis of mammogram classification.
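The definition above is equivalent to taking, at each $\tau$, the $k$-th largest value of the "tent" functions $\max(0, \min(\tau - p_1, p_2 - \tau))$ over the points of the PD. The following minimal sketch uses that formulation; the function names, the sample diagram, and the discretization parameters are illustrative assumptions, and the experiments in this paper used the GUDHI implementation.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape function evaluated at t (k starts at 1)."""
    # Height at t of the tent function of each (birth, death) point.
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_features(diagram, num_landscapes, a, b, resolution):
    """Sample the first `num_landscapes` landscapes on a uniform grid over [a, b]."""
    grid = [a + (b - a) * s / (resolution - 1) for s in range(resolution)]
    return [[landscape(diagram, k, t) for t in grid] for k in range(1, num_landscapes + 1)]


# Hypothetical diagram with two nested bars.
diagram = [(0, 4), (1, 3)]
features = landscape_features(diagram, num_landscapes=2, a=0, b=4, resolution=5)
print(features)
```

Flattening the rows of `features` gives a single vector that can be passed to a classifier.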
Persistence image (PI). PI is one of the popular vectorization methods used to transform the topological information contained in PDs into a vector. To construct a PI, first rotate the PD by $\pi/4$, then turn the rotated PD into a persistence surface $\Phi : \mathbb{R}^2 \to \mathbb{R}$ using Gaussian distributions $\Phi_\mu$ so that $\Phi(z) = \sum_{\mu \in PD} w(\mu)\, \Phi_\mu(z)$, where $w$ is a piecewise linear weight function. Finally, the persistence surface $\Phi$ is discretized by taking samples over a regular grid.
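A simplified persistence image can be sketched as follows. Several choices here are assumptions made for illustration: the PD is mapped to birth-persistence coordinates $(b, d - b)$ as in [14], the weight is linear in persistence and normalized by the largest persistence, the Gaussian is isotropic with a user-chosen `sigma`, and the surface is sampled at the cell centers of a regular grid. The experiments in this paper used the GUDHI implementation.

```python
import math


def persistence_image(diagram, resolution, extent, sigma):
    """Sample a weighted sum of Gaussians centred at (birth, persistence) points."""
    x_min, x_max, y_min, y_max = extent
    points = [(b, d - b) for b, d in diagram]            # birth-persistence coordinates
    max_pers = max((p for _, p in points), default=1.0) or 1.0
    image = []
    for row in range(resolution):
        y = y_min + (row + 0.5) * (y_max - y_min) / resolution   # cell-centre persistence
        line = []
        for col in range(resolution):
            x = x_min + (col + 0.5) * (x_max - x_min) / resolution   # cell-centre birth
            value = 0.0
            for birth, pers in points:
                weight = pers / max_pers                  # linear weight in persistence
                sq_dist = (x - birth) ** 2 + (y - pers) ** 2
                value += weight * math.exp(-sq_dist / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            line.append(value)
        image.append(line)
    return image


# Hypothetical diagram with a single point: birth 1.5, death 4.0 (persistence 2.5).
img = persistence_image([(1.5, 4.0)], resolution=4, extent=(0, 4, 0, 4), sigma=1.0)
peak = max((v, r, c) for r, line in enumerate(img) for c, v in enumerate(line))
print(peak)
```

The resulting grid is flattened row by row to obtain the feature vector; a 30-by-30 grid, as used in this paper, gives 900 features per diagram.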
Persistence binning (P-binning). This is one of the simplest vectorization methods; it counts the number of bars in a PB that intersect each of the vertical lines $V = 0, 1, 2, \ldots, \omega$. In this paper, we used $\omega = 30$ equidistant vertical lines. Thus, a topological feature vector of size $\omega$ was obtained for each dimension of the PBs.
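The binning step amounts to evaluating a Betti curve at a fixed set of filtration values. In the sketch below, the placement of the lines (equidistant over $[0, \texttt{max\_value}]$) and the half-open convention for when a bar intersects a line are assumptions, and the function name and sample barcode are hypothetical.

```python
def betti_curve(bars, num_lines, max_value):
    """Count the bars alive at each of `num_lines` equidistant filtration values."""
    lines = [max_value * s / (num_lines - 1) for s in range(num_lines)]
    # A bar (b, d) is counted at x when b <= x < d.
    return [sum(1 for b, d in bars if b <= x < d) for x in lines]


# Hypothetical barcode with one infinite bar.
bars = [(0, 2), (1, 3), (0, float("inf"))]
vector = betti_curve(bars, num_lines=4, max_value=3)
print(vector)
```

With `num_lines=30`, the same function yields a feature vector of the size used in this paper, computed separately for each homology dimension.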
Barcode statistics (P-statistics). The simplest approach to vectorize the space of PBs is to extract statistics directly from PBs. We collected only 10 statistics: average and standard deviation of birth, death and lifespan of bars, median of birth, death and lifespan of bars, and finally the number of bars. The statistics of birth of topological features in the dimension zero of PBs can be ignored as they returned zero by default.
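The ten statistics listed above can be computed directly from a list of bars, as in the following sketch. The treatment of infinite bars (excluded from the birth, death, and lifespan statistics but included in the bar count) and the use of the population standard deviation are assumptions made for this illustration, as are the function name and the sample barcode.

```python
import statistics


def barcode_statistics(bars):
    """Return 10 summary statistics of a barcode given as (birth, death) pairs."""
    finite = [(b, d) for b, d in bars if d != float("inf")]
    births = [b for b, _ in finite]
    deaths = [d for _, d in finite]
    lifespans = [d - b for b, d in finite]
    features = []
    for values in (births, deaths, lifespans):          # mean and standard deviation
        features += [statistics.mean(values), statistics.pstdev(values)]
    for values in (births, deaths, lifespans):          # medians
        features.append(statistics.median(values))
    features.append(len(bars))                          # number of bars
    return features


# Hypothetical 0-dimensional barcode.
bars = [(0.0, 1.0), (0.0, 3.0), (0.0, float("inf"))]
stats = barcode_statistics(bars)
print(stats)
```

In dimension zero of a Vietoris–Rips filtration all births are zero, which is why the birth statistics carry no information there.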

3. Dataset Description and Evaluation Scheme

Two widely used, publicly available mammogram databases were utilized to test the performance of landmark-based PH for mammogram abnormality classification: the Digital Database for Screening Mammography (DDSM) [17] and the Mini Mammographic Image Analysis Society (Mini-MIAS) database [18]. The Mini-MIAS dataset contains 113 abnormal and 209 normal mammograms of women's breasts, which include fatty, granular, calcification, architectural symmetry, and dense cases. DDSM comprises 2620 mammograms in total, from which 512 mammograms were randomly selected for our experiments, with 302 normal cases and 257 abnormal cases. Images in both datasets were cropped region of interest (ROI) images of size 128-by-128. A number of benchmark mammographic datasets are available for experimental purposes; they vary according to criteria such as the type and structure of the digital mammogram, the presence of dense, fatty or glandular tissue, the noise level in the images, and the number of benign and malignant cases. We opted to use Mini-MIAS and DDSM because images in both datasets were captured in uncontrolled conditions, so the datasets contain noisy and low-resolution images. Examples of images from the Mini-MIAS and DDSM datasets can be seen in Figure 5.
Two evaluation metrics were used: sensitivity (SE), the proportion of breast cancer cases correctly classified as having malignant tumors, and specificity (SP), the proportion of normal breast mammogram cases correctly classified as normal. We also report the accuracy and the F1-score, the latter being the harmonic mean of precision and recall.
Sensitivity and specificity are defined as follows:
$$\text{Sensitivity} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$$
where true positive (TP) refers to cancer patients correctly identified as having abnormal breast mammograms, and false negative (FN) refers to breast cancer patients misclassified as not having breast cancer.
$$\text{Specificity} = \frac{\text{True negative}}{\text{True negative} + \text{False positive}}$$
where true negative (TN) refers to the number of women correctly classified as clear of breast cancer, and false positive (FP) is the number of cases wrongly classified as breast cancer positive that are in fact clear of cancer. The accuracy and F1-score are defined as follows:
$$\text{Accuracy} = \frac{\text{TN} + \text{TP}}{\text{TN} + \text{TP} + \text{FP} + \text{FN}}, \qquad \text{F1-score} = \frac{2 \times \text{TP}}{(2 \times \text{TP}) + \text{FP} + \text{FN}}.$$
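The four formulas above translate directly into code. In the following sketch the function name and the confusion-matrix counts are hypothetical and serve only to show the arithmetic.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and F1-score from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts: 100 abnormal and 100 normal cases.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

For these counts, sensitivity is 90/100, specificity is 80/100, accuracy is 170/200, and the F1-score is 180/210.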
A support vector machine (SVM) classifier was used to differentiate abnormal mammograms from their normal counterparts, with all SVM hyperparameters optimized in a 5-fold cross validation setting.

4. Reproducibility and Implementation Details

In all experiments, we extracted 0-dimensional and 1-dimensional PDs using the Ripser package in Python (https://pypi.org/project/ripser/ (accessed on 20 August 2022)). PIs with a resolution of 30-by-30 and a linear weighting function, with the remaining parameters at their default settings, were generated using the GUDHI library in Python (https://pypi.org/project/gudhi/ (accessed on 20 August 2022)). PLs were generated with k = 100 and the remaining parameters at their default settings in the GUDHI library. The cubical complex filtration and its corresponding PD were constructed using the GUDHI library in Python. SVM classification was performed in MATLAB with standardization and tuning for the optimal hyperparameters. In other words, in each fold, we searched for the best kernel among the four kernel options available in MATLAB, which are linear, Gaussian, radial basis function, and polynomial. This means that a kernel that is best in one fold (for example, a linear kernel in the first fold) may not be the best in another, so a 5-fold cross validation may use up to four different kernels. ULBP was implemented from scratch in Python to select landmarks. Zero padding was applied during ULBP landmark selection, and mammograms were scanned with 3-by-3 patches with an overlap of 2 between two consecutive patches. The code to reproduce the results can be found in the GitHub repository (https://github.com/dashtiali/mammogram-classification (accessed on 1 October 2022)). Full details on how to use the code can be found at the GitHub link provided above. A MATLAB version of landmark selection and PH generation and visualization can be found in [19].

5. Experimental Results

In order to classify the mammogram scans, the SVM classifier was trained in a 5-fold cross validation setting based on the topological features. For each image, there were 56 PDs built on 56 point clouds extracted by the ULBP landmark selection method. There are many ways in which one can train and test a machine learning classifier on the 56 vectorized PDs. We first concatenated the topological features following the seven geometrical groups in ULBP. In other words, the features of the eight rotations of each ULBP geometry were concatenated for each PD vectorization method. In addition to the seven feature vectors obtained, we considered the PH features extracted in dimensions zero and one, which correspond to connected components and 1-dimensional cycles, respectively. The experimental results obtained from combining topological features in dimensions zero and one were better than using either dimension alone.
In Table 1 and Table 2, we report the sensitivity and specificity of the best performing ULBP geometry obtained from the best performing dimension of the PH features for each of the four PD vectorization methods. Out of the seven ULBP geometries, no geometry performed consistently across both datasets with any of the four vectorization methods. PL with G3 performed better than the other ULBP geometries on DDSM, while P-Binning with G7 performed the best on the Mini-MIAS dataset. Combining the PH features of dimensions zero and one for all 56 ULBP geometries with PL provided 92% sensitivity and 86% specificity for DDSM (see Table 3). The results reported here can be partially compared with those reported in [20], where ULBP and PH were used for mammogram abnormality classification.
The authors in [20] only used binning to vectorize PD with the KNN classifier and they reported the best classification performance of a sensitivity of 86% and specificity of 98% for Mini-MIAS together with an 82% sensitivity and 75% specificity for DDSM. Our results outperformed these results in both datasets.
Finally, in Table 4 and Table 5, we report the classification performance of SVM using the cubical complex filtration approach where we used all grayscale pixel values of the mammograms to construct one PD and then the four vectorization methods.
It can be seen that by using cubical complexes, we can obtain 98% and 94% for sensitivity and specificity, respectively, for the Mini-MIAS dataset and up to 86% of sensitivity and 81% specificity for DDSM. P-Statistics performed better than the other three vectorization methods using cubical complexes, which was not the case using landmark-based VR filtration. Using our proposed landmark based approach, one geometry alone (G7) achieved roughly the same performance for Mini-MIAS where we only used a small portion of the mammogram scan pixel values. For DDSM, we outperformed the cubical complexes if we concatenated the topological features from all geometries and obtained almost the same performance using one geometry (i.e., G3).

6. Discussion and Future Work

This study introduced a distributed method of constructing 56 PDs based on automatically extracted landmarks from breast mammograms. In general, we found that a small set of pixel landmarks was enough to detect abnormality in breast mammograms such as G3 in DDSM and G7 in the Mini-MIAS dataset. Computing the 56 PDs can be conducted in a distributed manner, which is crucial for large scale datasets. Instead of building one PD using cubical complexes, our approach provides a localized PD representation that conveys topological features linked to different types of mammogram texture distribution. Different PD vectorizations were examined where we found that it was good practice to try more than one method, as a single approach may not consistently perform well on different datasets. This work is the first step toward a more comprehensive study for different approaches of PD vectorization in medical imaging because we concluded that different types of vectorization methods affect the performance greatly, as can be seen from Table 1 to Table 5. Until now, to the best of our knowledge, there is no comprehensive analysis or a roadmap to select suitable vectorization method(s) for medical image analysis or any other image modalities. On the other hand, it is not an easy task to search the rich literature of PH vectorizations and pick the correct method suitable for the problem at hand. Nonetheless, based on the findings in this work, at this stage, we are not advising the use of a single vectorization method, as this could lead to misleading performances.
Furthermore, our analysis showed that the proposed landmark-based PH could outperform classical approaches to building topology from digital images such as cubical complexes. This points to the fact that a small set of pixel landmarks that correspond to different types of texture can be used to differentiate malignant mammogram scans from benign scans. This is particularly useful when the medical image dimensions (number of rows and columns) are very high, so that using all of the pixel values is time consuming, or when downsampling may result in the loss of critical medical information.
The main limitation of this work is the increase in the dimension of the feature vectors when combining more than one ULBP geometry, as was the case when we concatenated all ULBP geometries to boost the classification performance for the DDSM dataset in Table 3. Aggregating PDs before vectorization is one approach to address this limitation in the future. Future work will also focus on using other texture methods as landmark selection procedures, such as center-symmetric LBP or small image patches of high density. Furthermore, testing the proposed ULBP-based PH on other medical image modalities such as ultrasound and on other types of disease is included in our list of future work.

Author Contributions

Conceptualization, A.A. and R.R.; Data curation, A.A., D.A. and R.R.; Formal analysis, T.M.; Methodology, A.A., T.M. and R.R.; Software, A.A. and D.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No data is available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Carlsson, G. Topology and Data. Bull. Am. Math. Soc. 2009, 46, 255–308.
  2. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of Persistence Diagrams. Discrete Comput. Geom. 2007, 37, 103–120.
  3. WHO. International Agency for Research on Cancer; WHO: Geneva, Switzerland, 2020.
  4. Ojala, T.; Pietikäinen, M.; Harwood, D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 1996, 29, 51–59.
  5. Xu, Q.; Yang, J.; Ding, S. Texture Segmentation using LBP embedded Region Competition. ELCVIA Electron. Lett. Comput. Vis. Image Anal. 2005, 5, 41–47.
  6. Heikkilä, M.; Pietikäinen, M.; Schmid, C. Description of interest regions with local binary patterns. Pattern Recognit. 2009, 42, 425–436.
  7. Abbasi, S.; Tajeripour, F. Detection of brain tumor in 3D MRI images using local binary patterns and histogram orientation gradient. Neurocomputing 2017, 219, 526–535.
  8. Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
  9. Ahonen, T.; Hadid, A.; Pietikäinen, M. Face recognition with local binary patterns. Lect. Notes Comput. Sci. 2004, 3021, 469–481.
  10. Nanda, V.; Sazdanović, R. Simplicial models and topological inference in biological systems. In Natural Computing Series; Springer: Berlin/Heidelberg, Germany, 2014; Volume 48, pp. 109–141.
  11. Otter, N.; Porter, M.A.; Tillmann, U.; Grindrod, P.; Harrington, H.A. A Roadmap for the Computation of Persistent Homology. EPJ Data Sci. 2017, 6, 17.
  12. Garin, A.; Tauzin, G. A Topological "Reading" Lesson: Classification of MNIST using TDA. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1551–1556.
  13. Turner, K.; Mileyko, Y.; Mukherjee, S.; Harer, J. Fréchet Means for Distributions of Persistence Diagrams. Discrete Comput. Geom. 2014, 52, 44–70.
  14. Adams, H.; Emerson, T.; Kirby, M.; Neville, R.; Peterson, C.; Shipman, P.; Chepushtanova, S.; Hanson, E.; Motta, F.; Ziegelmeier, L.; et al. Persistence Images: A Stable Vector Representation of Persistent Homology. J. Mach. Learn. Res. 2017, 18, 218–252.
  15. Bubenik, P. Statistical Topological Data Analysis using Persistence Landscapes. J. Mach. Learn. Res. 2015, 16, 77–102.
  16. Asaad, A.T.; Rashid, R.D.; Jassim, S.A. Topological image texture analysis for quality assessment. In Proceedings of the SPIE—The International Society for Optical Engineering, Anaheim, CA, USA, 9–13 April 2017; Volume 10221.
  17. Heath, M.; Bowyer, K.; Kopans, D.; Moore, R.; Kegelmeyer, P. The digital database for screening mammography. In Proceedings of the Fifth International Workshop on Digital Mammography, Toronto, ON, Canada, 11–14 June 2001; pp. 212–218.
  18. Suckling, J.; Parker, J.; Dance, D.; Astley, S.; Hutt, I.; Boggis, C.; Ricketts, I.; Stamatakis, E.; Cerneaz, N.; Kok, S.; et al. Mammographic Image Analysis Society (MIAS) Database; The University of Cambridge: Cambridge, UK, 2015; Available online: https://www.repository.cam.ac.uk/handle/1810/250394 (accessed on 20 September 2012).
  19. Ali, D.; Asaad, A. DAAR Topology: A Software to Build Topological Features from Images. Available online: https://github.com/daartopology/DAAR-Topology (accessed on 16 October 2021).
  20. Asaad, A. Persistent Homology Tools for Image Analysis; The University of Buckingham: London, UK, 2020.
Figure 1. The LBP process where 1’s in the binary code is represented by the bold points on the circle.
Figure 2. Geometric representation of the ULBP method.
Figure 3. Landmark based PH construction and classification pipeline.
Figure 4. A greyscale image patch and its corresponding cubical complex filtration and persistent barcode representation in dimension zero and one. B 0 and B 1 represent Betti numbers in dimension zero and one, respectively.
Figure 5. Examples of ROI for normal and abnormal cases from the Mini-MIAS and DDSM datasets.
Table 1. The top performing ULBP geometries and PH dimension and all PD vectorizations for DDSM. Avg = average, Std = standard deviation for 5-fold cross-validation using SVM.

| Feature Type | Classification Metric | Avg ± Std |
|---|---|---|
| PD-dim0, 1, P-Binning, and G5 | Sensitivity | 85.02 ± 7.5 |
| | Specificity | 77.4 ± 2.7 |
| | Accuracy | 81.57 ± 4.6 |
| | F1-Score | 83.18 ± 4.8 |
| PD-dim1, P-Statistics, and G3 | Sensitivity | 85.1 ± 4.9 |
| | Specificity | 79.7 ± 6.6 |
| | Accuracy | 82.64 ± 2.7 |
| | F1-Score | 84.11 ± 2.4 |
| PD-dim0, 1, PI, and G3 | Sensitivity | 76.4 ± 9.4 |
| | Specificity | 66.9 ± 7.3 |
| | Accuracy | 72.1 ± 2.8 |
| | F1-Score | 74.53 ± 4.3 |
| PD-dim0, 1, PL, and G3 | Sensitivity | 86.06 ± 4.8 |
| | Specificity | 80.9 ± 4.4 |
| | Accuracy | 83.7 ± 4 |
| | F1-Score | 85.07 ± 3.7 |
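The four metrics reported in these tables, and the Avg ± Std summary over cross-validation folds, can be computed as in the sketch below. The confusion counts are made-up numbers for illustration and are not the paper's data. The use of the sample standard deviation (`statistics.stdev`) is also an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def fold_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Return the four metrics (in percent) for one fold.

    Assumes every denominator is non-zero, i.e. each fold contains both
    classes and at least one positive prediction.
    """
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "Sensitivity": 100 * sensitivity,
        "Specificity": 100 * specificity,
        "Accuracy": 100 * accuracy,
        "F1-Score": 100 * f1,
    }


def summarize(folds: List[Tuple[int, int, int, int]]) -> Dict[str, Tuple[float, float]]:
    """Return (average, sample standard deviation) of each metric across folds."""
    per_fold = [fold_metrics(*counts) for counts in folds]
    return {
        name: (mean([m[name] for m in per_fold]),
               stdev([m[name] for m in per_fold]))
        for name in per_fold[0]
    }


# Hypothetical (tp, fn, tn, fp) counts for five folds; not taken from the paper.
folds = [(18, 2, 17, 3), (17, 3, 16, 4), (19, 1, 15, 5), (16, 4, 18, 2), (18, 2, 16, 4)]

for name, (avg, std) in summarize(folds).items():
    print(f"{name}: {avg:.2f} ± {std:.1f}")
```

Reporting the spread alongside the average matters here because several rows in the tables combine a high average with a large standard deviation, which indicates that performance varies considerably between folds.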
Table 2. The top performing ULBP geometries and all PD vectorizations for Mini-MIAS.

| Feature Type | Classification Metric | Avg ± Std |
|---|---|---|
| PD-dim0, P-Binning, and G7 | Sensitivity | 97.6 ± 1.5 |
| | Specificity | 95.5 ± 3.2 |
| | Accuracy | 96.92 ± 1.5 |
| | F1-Score | 97.63 ± 1.1 |
| PD-dim0, 1, P-Statistics, and G5 | Sensitivity | 98.6 ± 2.9 |
| | Specificity | 94.6 ± 3.8 |
| | Accuracy | 97.27 ± 2.1 |
| | F1-Score | 97.9 ± 1.7 |
| PD-dim0, 1, PI, and G7 | Sensitivity | 98.1 ± 1.0 |
| | Specificity | 94.6 ± 2.1 |
| | Accuracy | 96.89 ± 1.1 |
| | F1-Score | 97.62 ± 0.9 |
| PD-dim0, 1, PL, and G7 | Sensitivity | 97.6 ± 0.1 |
| | Specificity | 92.8 ± 6.1 |
| | Accuracy | 95.94 ± 2.2 |
| | F1-Score | 96.92 ± 1.6 |
Table 3. Concatenation of all ULBP geometries together with dimensions 0 and 1 of the PD for DDSM classification using the top three PD vectorizations.

| Feature Type | Classification Metric | Avg ± Std |
|---|---|---|
| PD-dim0, 1, and PL | Sensitivity | 92.3 ± 4 |
| | Specificity | 86.5 ± 3 |
| | Accuracy | 89.62 ± 1.4 |
| | F1-Score | 90.56 ± 1.4 |
| PD-dim0, 1, and P-Binning | Sensitivity | 85.1 ± 6 |
| | Specificity | 82.6 ± 4 |
| | Accuracy | 83.57 ± 4.3 |
| | F1-Score | 84.73 ± 4.2 |
| PD-dim0, 1, and P-Statistics | Sensitivity | 87.6 ± 4 |
| | Specificity | 82.3 ± 4 |
| | Accuracy | 84.62 ± 1.5 |
| | F1-Score | 85.93 ± 1.6 |
Table 4. Cubical complex performance results for the Mini-MIAS dataset using four different vectorization methods and three homology dimensions.

| Feature Type | Sensitivity (Avg ± Std) | Specificity (Avg ± Std) | Accuracy (Avg ± Std) | F1-Score (Avg ± Std) |
|---|---|---|---|---|
| P-Binning and PD-dim0 | 99.02 ± 2.18 | 2.51 ± 3.65 | 65.17 ± 1.3 | 78.68 ± 0.9 |
| P-Binning and PD-dim1 | 99.02 ± 1.34 | 0.91 ± 2.03 | 64.6 ± 0.7 | 78.41 ± 0.5 |
| P-Binning and PD-dim0, 1 | 98.54 ± 2.18 | 2.51 ± 3.65 | 64.86 ± 1.5 | 78.45 ± 1 |
| P-Statistics and PD-dim0 | 98.54 ± 1.34 | 92.15 ± 3.25 | 96.29 ± 0.8 | 97.18 ± 0.6 |
| P-Statistics and PD-dim1 | 98.09 ± 1.07 | 96.47 ± 1.99 | 97.52 ± 0.8 | 98.09 ± 0.6 |
| P-Statistics and PD-dim0, 1 | 98.58 ± 1.3 | 94.76 ± 1.54 | 97.24 ± 1.2 | 97.88 ± 0.9 |
| PI and PD-dim0 | 88.68 ± 5.53 | 78.04 ± 13.48 | 84.95 ± 6.2 | 88.46 ± 4.7 |
| PI and PD-dim1 | 90.95 ± 4.55 | 84.98 ± 3.81 | 88.86 ± 3.6 | 91.34 ± 2.9 |
| PI and PD-dim0, 1 | 93.83 ± 2.53 | 91.24 ± 2.67 | 92.92 ± 2 | 94.49 ± 1.6 |
| PL and PD-dim0 | 93.39 ± 3.23 | 69.96 ± 12.29 | 85.17 ± 5.8 | 89.16 ± 4.2 |
| PL and PD-dim1 | 89.7 ± 8.78 | 78.51 ± 8.56 | 85.81 ± 5.3 | 88.99 ± 4.6 |
| PL and PD-dim0, 1 | 95.43 ± 6.23 | 80.22 ± 11.68 | 90.13 ± 3.8 | 92.59 ± 2.9 |
Table 5. Cubical complex performance results for the DDSM dataset using four different vectorization methods and three homology dimensions.

| Feature Type | Sensitivity (Avg ± Std) | Specificity (Avg ± Std) | Accuracy (Avg ± Std) | F1-Score (Avg ± Std) |
|---|---|---|---|---|
| P-Binning and PD-dim0 | 57.08 ± 15.13 | 62.94 ± 10 | 59.78 ± 5.5 | 59.69 ± 9.7 |
| P-Binning and PD-dim1 | 68.6 ± 17.5 | 62.26 ± 12.89 | 65.69 ± 4.2 | 67.44 ± 8.3 |
| P-Binning and PD-dim0, 1 | 69.16 ± 19.52 | 51.14 ± 22.9 | 60.87 ± 7.5 | 64.72 ± 8.8 |
| P-Statistics and PD-dim0 | 82.08 ± 7.57 | 80.57 ± 5.58 | 81.38 ± 4.4 | 82.55 ± 4.4 |
| P-Statistics and PD-dim1 | 80.13 ± 8.5 | 87.19 ± 3.88 | 83.37 ± 5.5 | 83.73 ± 5.9 |
| P-Statistics and PD-dim0, 1 | 86.03 ± 8 | 81.72 ± 4.89 | 84.05 ± 4.2 | 85.23 ± 4.3 |
| PI and PD-dim0 | 61.01 ± 9 | 81.69 ± 7.85 | 70.52 ± 6 | 68.89 ± 7.2 |
| PI and PD-dim1 | 71.62 ± 10.01 | 78.19 ± 5.35 | 74.65 ± 6.2 | 75.07 ± 7.2 |
| PI and PD-dim0, 1 | 73.19 ± 6.18 | 73.95 ± 6.48 | 73.54 ± 4.2 | 74.87 ± 4.2 |
| PL and PD-dim0 | 74.46 ± 7.17 | 71.54 ± 7.86 | 73.12 ± 5.3 | 74.89 ± 5.1 |
| PL and PD-dim1 | 81.13 ± 6.06 | 73.53 ± 7.34 | 77.63 ± 5.5 | 79.66 ± 5 |
| PL and PD-dim0, 1 | 83.74 ± 4.72 | 77.01 ± 5.03 | 80.65 ± 4.7 | 82.37 ± 4.3 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Asaad, A.; Ali, D.; Majeed, T.; Rashid, R. Persistent Homology for Breast Tumor Classification Using Mammogram Scans. Mathematics 2022, 10, 4039. https://doi.org/10.3390/math10214039
