Coronary Artery Disease Detection by Machine Learning with Coronary Bifurcation Features

Chen, Xueping; Fu, Yi; Lin, Jiangguo; Ji, Yanru; Fang, Ying; Wu, Jianhua

doi:10.3390/app10217656

by Xueping Chen ^1,†, Yi Fu ^2,†, Jiangguo Lin ^1,†, Yanru Ji ^1, Ying Fang ^1,* and Jianhua Wu ^1,*

^1 Institute of Biomechanics, School of Bioscience and Bioengineering, South China University of Technology, Guangzhou 510006, China

^2 Collaborative Innovation Center for Biomedicines & School of Medical Instruments, Shanghai University of Medicine & Health Sciences, Shanghai 200237, China

^* Authors to whom correspondence should be addressed.

^† X. Chen, Y. Fu and J. Lin contributed equally to this work.

Appl. Sci. 2020, 10(21), 7656; https://doi.org/10.3390/app10217656

Submission received: 19 September 2020 / Revised: 23 October 2020 / Accepted: 27 October 2020 / Published: 29 October 2020

(This article belongs to the Collection Advances of Biomedical Signal Processing for Disease Diagnosis, Prognosis or Severity Determination)

Abstract:

Background: Early and accurate detection of coronary artery disease (CAD) is one of the most important areas of medical research, and researchers are motivated to use machine learning techniques for quick and accurate detection of CAD. Methods: To obtain high-quality features for machine learning, we extracted coronary bifurcation features from coronary computed tomography angiography (CCTA) images using a morphometric method. Machine learning classifiers, namely logistic regression (LR), decision tree (DT), linear discriminant analysis (LDA), k-nearest neighbors (k-NN), artificial neural network (ANN), and support vector machine (SVM), were trained on the measured features and their performance was evaluated. Results: Compared with the other machine learning methods, the polynomial-SVM tuned by grid search optimization had the best performance for the detection of CAD and yielded a classification accuracy of 100.00%. Among the six examined coronary bifurcation features, the exponent of vessel diameter ($n$) and the area expansion ratio (AER) were the two key features in the detection of CAD. Conclusions: This study could help clinicians detect CAD accurately and may provide an alternative method for non-invasive diagnosis in clinical practice.

Keywords:

coronary artery disease; coronary bifurcations; machine learning; morphological features; classification performance

1. Introduction

Coronary artery disease (CAD) is one of the leading causes of death in the world [1,2]. Early detection of CAD could save patients’ lives and reduce the cost of healthcare, which is of great clinical significance. Many tools have been developed for CAD diagnosis [3,4,5], among which cardiac catheterization is the most direct and reliable approach [3]. However, cardiac catheterization is costly and time-consuming, and it is also an invasive and risky surgical procedure. Hence, few CAD patients choose cardiac catheterization as their diagnostic method, and a reliable, non-invasive method for early detection of CAD is highly desirable. Although some non-invasive diagnostic approaches, such as coronary computed tomography angiography (coronary CTA, CCTA), echocardiography, and nuclear magnetic resonance imaging (MRI), can accurately detect CAD, they lack the ability to detect CAD at an early stage. In addition, these methods usually require assistance from medical experts. Thus, they have not been effectively used for the early detection of CAD. Our previous study found that changes in the morphological features of coronary arterial trees were highly correlated with the degree of CAD lesion (not yet published). Moreover, these morphological changes had probably occurred before the CAD lesion developed [6]. Machine learning methods are widely used for pattern recognition and binary classification problems, and can be used to classify common diseases with positive or negative attributes [7]. Therefore, there is great potential in building a machine learning classifier that uses coronary morphological features for the non-invasive detection of CAD at its early stages.

In recent decades, various machine learning algorithms have been used for quick and non-invasive CAD detection, such as k-nearest neighbors (k-NN), decision tree (DT), artificial neural network (ANN) and support vector machine (SVM) [8,9,10,11]. A comparison of the CAD detection performance of current machine learning methods using different algorithms and medical features is shown in Table A1 (see Appendix A). Machine learning comprises several steps: collecting the original feature data, preprocessing the data, choosing a machine learning model, training the selected model and tuning its parameters, evaluating the selected model, and finally making predictions. Feature selection is the first step and plays a crucial role in machine learning, because it affects the detection accuracy of the classifier [12]. Studies have suggested that many features are clinically associated with CAD and vary with the degree of disease, including physiological features such as blood pressure, blood glucose, blood lipids, age, obesity degree, and overweight [1], signal-based features such as electrocardiogram (ECG) and phonocardiogram (PCG) signals [13], computer imaging-based features such as “gray value” [14], and morphological features such as diameter stenosis [15]. Although all these features can potentially be used in machine learning for the detection of CAD, the performance of the classifier depends on the quality of the features used. Selecting the most important features can significantly improve the accuracy of the medical diagnosis. Previous studies have proposed numerous feature selection algorithms to identify the most important features in CAD datasets. For example, Li et al. developed a signal feature selection method and obtained an accuracy, sensitivity, specificity, and G-mean of 95.62%, 98.48%, 89.17%, and 93.69%, respectively, with a dual-input neural network [13]. Wosiak et al. proposed an unsupervised feature selection method for medical datasets and obtained relatively good diagnostic results [16]. Nasarian et al. proposed a heterogeneous hybrid feature selection algorithm that yielded a classification accuracy of 81.23% with SMOTE and an XGBoost classifier [12]. However, the medical significance behind these selected features is not well known. Previous studies suggested that different features are associated with CAD to different degrees [12]. In addition, morphological features are the most direct features for characterizing CAD in clinical practice [4]. In other words, coronary arteries with CAD lesions have special morphological features that differ from those of normal arteries. Therefore, morphological features of the coronary arterial trees may have great potential for building a machine learning classifier for the detection of CAD.

The aim of this paper is to propose a new methodology (a combination of machine learning algorithms and imaging-based morphological measurement methods) for the detection of CAD, which may help to detect CAD at an early stage in a non-invasive way and provide a tool to save more lives. To collect the morphological feature dataset, we reconstructed three-dimensional (3D) coronary arteries from CCTA images of CAD patients and healthy individuals and measured the morphological features from the reconstructed 3D coronary arterial trees. To evaluate the quality of the morphological features, machine learning algorithms such as logistic regression (LR), DT, linear discriminant analysis (LDA), k-NN, ANN, and support vector machines (SVMs) were then applied to build classifiers for the detection of CAD. The accuracy of these algorithms was used to evaluate their prediction capability. Moreover, the algorithm with the best performance was further used to find the most important features in detecting CAD. In addition, the effects of the kernel function selection, data sampling, training dataset volume, and the dimension of input features on the classification performance of the machine learning model were also studied. The main contributions of this work were: (1) the combined use of machine learning and a morphological measurement method for CAD detection, (2) superior classification performance compared with existing studies, and (3) identification of the two most important morphological features for CAD detection.

2. Materials and Methods

We here proposed a morphometric methodology to collect the morphological features dataset for the detection of CAD. Then, we evaluated these features by different commonly used machine learning algorithms to find the best-fitted classifier for the detection of CAD. Finally, we applied the best algorithm to evaluate all the morphological features to seek the most important features for the detection of CAD.

2.1. Morphometric Features Data Collection and Selection

With the development of the hardware and software of angiographic techniques, CCTA imaging has been successfully applied to the visualization of coronary arteries in recent decades [17,18]. Many features can be obtained from CCTA images, such as geometric features, the size of calcified plaque, and the diameter stenosis rate. Selecting the most important features has a significant impact on the medical diagnostic process, since it helps to reach an accurate and quick diagnosis. In this study, we selected morphometric features for building the machine learning classifiers. These features were measured from the CCTA images of Southern Chinese populations. In total, we collected a morphometric feature dataset with 1163 variables (feature data), among which 571 variables were from patients with CAD lesion (CAD subjects) and 592 variables were from individuals without CAD lesion (non-CAD subjects). This study was approved by the Ethics Committee of the College of Basic Medicine, Southern Medical University and the Ethics Committee of the Guangdong General Hospital, Guangdong Academy of Medical Sciences, and was performed in accordance with the Declaration of Helsinki.

The morphological features were obtained from the coronary bifurcations, and the method of data collection is shown in Figure 1. Morphometric data of coronary arterial trees were extracted from CCTA images with MIMICS software (Materialise). In the MIMICS software, centerlines were formed by a series of center points located at the center of the cross-sectional plane of the 3D coronary artery. Subsequently, the best-fit diameter was calculated as twice the average radius from the points of the centerlines to the contour of the 3D coronary artery. The original morphometric data of mother vessel diameters ($D_{m}$), larger daughter vessel diameters ($D_{l}$), smaller daughter vessel diameters ($D_{s}$), and bifurcation angles (α) were determined at all bifurcations of the arterial trees. Table 1 shows the definitions of the six morphological features (α, $n$, $\frac{D_{s}^{3}}{D_{m}^{3}}$, $\frac{D_{l}^{3}}{D_{m}^{3}}$, $\frac{D_{s}^{3}}{D_{l}^{3}}$, and $AER$). The features $n$ and $AER$ represent the exponent of vessel diameter and the area expansion ratio of the coronary bifurcations, respectively. They were calculated from the mother vessel diameter and the two daughter vessel diameters. These six morphological features were selected because they were highly correlated with atherosclerotic lesion in previous studies [19,20] or were of potential clinical importance as indicated by medical experts.
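As a rough illustration of how the diameter-based features could be computed from one bifurcation, the sketch below uses only the Python standard library. The three cubed diameter ratios follow the expressions given in the text. The formulas for $n$ and $AER$ are assumptions, since Table 1 is not reproduced here: $n$ is taken as the exponent satisfying $D_{m}^{n} = D_{l}^{n} + D_{s}^{n}$ and $AER$ as $(D_{l}^{2} + D_{s}^{2})/D_{m}^{2}$, which are definitions commonly used in bifurcation morphometry. The function names and the bisection bracket are hypothetical.

```python
def diameter_exponent(d_m, d_l, d_s, lo=0.1, hi=50.0, tol=1e-10):
    """Solve d_m**n == d_l**n + d_s**n for n by bisection.

    ASSUMPTION: this definition of n is a common convention, not taken
    from the paper's Table 1. Both daughters must be narrower than the
    mother, so f(n) below decreases monotonically in n.
    """
    if not (0 < d_s <= d_l < d_m):
        raise ValueError("expected 0 < d_s <= d_l < d_m")

    def f(n):
        return (d_l / d_m) ** n + (d_s / d_m) ** n - 1.0

    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("no root inside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # root lies at a larger exponent
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bifurcation_features(d_m, d_l, d_s):
    """Return the five diameter-derived features for one bifurcation."""
    return {
        "n": diameter_exponent(d_m, d_l, d_s),
        "Ds3/Dm3": (d_s / d_m) ** 3,
        "Dl3/Dm3": (d_l / d_m) ** 3,
        "Ds3/Dl3": (d_s / d_l) ** 3,
        # ASSUMPTION: area expansion ratio = daughter areas / mother area
        "AER": (d_l ** 2 + d_s ** 2) / d_m ** 2,
    }


# Symmetric bifurcation obeying a cube law: daughters = mother * 2**(-1/3)
feats = bifurcation_features(1.0, 2 ** (-1 / 3), 2 ** (-1 / 3))
```

Under these assumed definitions, a symmetric bifurcation whose daughter diameters equal the mother diameter times $2^{-1/3}$ gives an exponent of 3, which is a convenient sanity check for the solver.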

2.2. Machine Learning Modeling Processes and Algorithms Evaluation

The aim of this study was to obtain the best machine learning classifier and to find the most important morphological features for the detection of CAD. To evaluate the quality of the selected morphological features, several commonly used classifier algorithms were applied, namely LR, DT, LDA, k-NN, ANN, and SVM. The classification performance of the SVM model with three different kernel functions (linear, polynomial, and radial basis function (RBF)) was studied first to find the best kernel function for the detection of CAD. We named these three sub-algorithms linear-SVM, polynomial-SVM, and RBF-SVM, respectively. The accuracy of the best SVM sub-algorithm was then compared with that of the other machine learning algorithms to assess the capability of all the algorithms used and to select the best algorithm for the follow-up investigations. To find the most important features for the detection of CAD, we then applied the selected algorithm to evaluate the classification performance of individual features and of their combinations.

The process flow diagram of our proposed machine learning approach was shown in Figure 2. The main steps of the proposed approach were summarized below.

Step 1. Collecting the original morphological features;

In this study, we collected six original morphological features (α, $n$, $\frac{D_{s}^{3}}{D_{m}^{3}}$, $\frac{D_{l}^{3}}{D_{m}^{3}}$, $\frac{D_{s}^{3}}{D_{l}^{3}}$, and $AER$). These morphological features were selected because they had been shown to correlate highly with CAD.

Step 2. Dividing all the features into four equal groups;

The morphological feature datasets for both CAD subjects and non-CAD subjects were randomly divided into four equal-sized subsets. The subsets of the non-CAD subjects were designated as $A1$, $A2$, $A3$, and $A4$, while the subsets of the CAD subjects were designated as $B1$, $B2$, $B3$, and $B4$ (Figure 2). This step prepared the data for studying the effect of data sampling on the classification performance of the machine learning model.
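The random division into four equal-sized subsets might look like the following minimal sketch. The function name, the fixed seed, and the choice to drop any remainder that does not divide evenly are assumptions made for illustration; the text does not say how a remainder was handled.

```python
import random


def split_into_subsets(samples, n_subsets=4, seed=0):
    """Shuffle the samples and cut them into n_subsets equal-sized parts.

    ASSUMPTION: samples left over after integer division are dropped.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    size = len(shuffled) // n_subsets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_subsets)]


# 592 non-CAD records (indices stand in for feature vectors) -> A1..A4
a1, a2, a3, a4 = split_into_subsets(range(592))
```

With the 592 non-CAD records reported in Section 2.1, each subset holds 148 records, which matches the per-subset training counts quoted for the non-CAD group in Section 2.3.2.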

Step 3. Preprocessing of the original morphological features;

To meet the data format requirements of the classifier algorithms, the original morphometric data were preprocessed as follows:

$$y = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1 \tag{1}$$

where $x_{\max}$ and $x_{\min}$ were the maximum and minimum values of each morphological feature, respectively. After preprocessing, the original morphometric data were normalized to values ranging from −1 to 1.
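Equation (1) can be sketched in a few lines of Python. The function name is hypothetical, and the guard against a constant feature is an added assumption, since the formula is undefined when the maximum equals the minimum.

```python
def scale_to_minus_one_one(values):
    """Apply y = 2 * (x - x_min) / (x_max - x_min) - 1 to one feature."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # ASSUMPTION: a constant feature is rejected rather than mapped to 0
        raise ValueError("feature is constant; Equation (1) is undefined")
    return [2.0 * (x - x_min) / (x_max - x_min) - 1.0 for x in values]


# Illustrative values only: the minimum maps to -1 and the maximum to +1
scaled = scale_to_minus_one_one([1.0, 2.0, 3.0])
```

Each feature is scaled independently, so features measured in different units (angles in degrees, dimensionless ratios) end up on the same scale before training.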

Step 4. Choosing a machine learning model, training the selected model, and parameter tuning;

Different classification algorithms (LR, DT, LDA, k-NN, ANN, linear-SVM, polynomial-SVM, and RBF-SVM) were trained on the same morphological features (75% of the data for training, the remainder for testing). Ten-fold cross-validation was used (in subsets with proportional data quantities for both classes). For the SVM classifiers, two key parameters, $C$ and $g$, needed to be pre-optimized to establish the best SVM model. Two common algorithms for parameter optimization are grid search and particle swarm optimization (PSO) [21]. In this study, the search range of the grid search algorithm was set from $2^{-8}$ to $2^{8}$, and the maximum iteration number of the PSO algorithm was set to 200 to find the near-optimal parameters.
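The grid search over $C$ and $g$ can be sketched as an exhaustive loop over a power-of-two grid. This is an illustration only, not the MATLAB/libSVM procedure used in the study: the step of one in the exponent is an assumption (only the range $2^{-8}$ to $2^{8}$ is stated), and `toy_score` is a hypothetical stand-in for the cross-validated accuracy that a real SVM training run would return.

```python
import math
from itertools import product


def power_of_two_grid(lo_exp=-8, hi_exp=8, step=1):
    """Candidate values 2**lo_exp ... 2**hi_exp (ASSUMED exponent step of 1)."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1, step)]


def grid_search(score, c_values, g_values):
    """Return (best_score, best_C, best_g) over every (C, g) pair."""
    best = None
    for c, g in product(c_values, g_values):
        s = score(c, g)
        if best is None or s > best[0]:
            best = (s, c, g)
    return best


def toy_score(c, g):
    """HYPOTHETICAL score surface peaking at C = 8, g = 0.25.

    In practice this would train an SVM with (C, g) and return the
    10-fold cross-validated accuracy.
    """
    return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 2) ** 2)


grid = power_of_two_grid()
best_score, best_c, best_g = grid_search(toy_score, grid, grid)
```

With 17 candidate values per parameter, the search evaluates 289 parameter pairs, which is why grid search is practical for two parameters but grows quickly when more are tuned.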

Step 5. Evaluating the models and making the predictions;

The statistical metrics used to evaluate the models were defined as follows [22]:

$$Accuracy = \frac{TP + TN}{TP + FN + TN + FP} \tag{2}$$

$$Sensitivity = \frac{TP}{TP + FN} \tag{3}$$

$$Specificity = \frac{TN}{TN + FP} \tag{4}$$

Accuracy is the ratio of correctly classified samples to the total number of tested samples. $TP$, $FN$, $TN$, and $FP$ represent true positives, false negatives, true negatives, and false positives, respectively.
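Equations (2) to (4) translate directly into code. The sketch below is a minimal helper with a hypothetical name, and the confusion-matrix counts in the example are made up for illustration; they are not results from this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # Equation (2)
        "sensitivity": tp / (tp + fn),     # Equation (3)
        "specificity": tn / (tn + fp),     # Equation (4)
    }


# Illustrative counts only (not data from the paper)
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

Sensitivity depends only on the CAD (positive) samples and specificity only on the non-CAD (negative) samples, so reporting both alongside accuracy shows whether errors are concentrated in one class.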

2.3. Features Evaluation

In this study, the effects of the data sampling, the volume of the training dataset, and the dimension of input features, on the classification performances of machine learning were further analyzed.

2.3.1. Effects of Data Sampling

To examine the impact of data sampling on the classification performances, we randomly selected one subset from CAD subjects and one subset from non-CAD subjects as the testing datasets, respectively. All the remaining subsets were used as the training datasets. This process was repeated 16 times for each algorithm. In this part, we only studied the classification performances of the polynomial-SVM algorithm as it showed good performances (see Results below).
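One way to read the 16 repetitions is as the $4 \times 4$ possible pairings of one non-CAD test subset with one CAD test subset. The sketch below enumerates those pairings; treating the repetitions as this exhaustive enumeration, rather than 16 independent random draws, is an assumption, and the function name is hypothetical.

```python
from itertools import product

NON_CAD = ["A1", "A2", "A3", "A4"]
CAD = ["B1", "B2", "B3", "B4"]


def sampling_schemes():
    """List every train/test assignment with one test subset per class."""
    schemes = []
    for test_a, test_b in product(NON_CAD, CAD):
        train = [s for s in NON_CAD + CAD if s not in (test_a, test_b)]
        schemes.append({"test": [test_a, test_b], "train": train})
    return schemes


schemes = sampling_schemes()
```

Each scheme holds out a quarter of each class for testing and trains on the remaining three quarters, consistent with the 75%/25% split used elsewhere in the study.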

2.3.2. Effects of the Volume of Training Dataset

Three cases were considered in this section. In Case 0, 75% of the data (444 from non-CAD subjects and 429 from CAD subjects) were used for training; in Case 1, 50% of the data (296 from non-CAD subjects and 286 from CAD subjects) were used for training; and in Case 2, 25% of the data (148 from non-CAD subjects and 143 from CAD subjects) were used for training. Given that different data sampling did not affect the classification performance (see Results below), and in order to simplify the subsequent description, we further defined the subsets of non-CAD subjects as $C1 = A1 + A2$ and $C2 = A3 + A4$, and the subsets of CAD subjects as $D1 = B1 + B2$ and $D2 = B3 + B4$. Case 0: we randomly selected three subsets from ($A1$, $A2$, $A3$, $A4$) and three from ($B1$, $B2$, $B3$, $B4$) as training datasets, and the remaining morphometric data were used as the testing datasets. Case 1: we randomly selected one subset from ($C1$, $C2$) and one from ($D1$, $D2$) as training datasets, and the remaining subsets were used as the testing datasets. Case 2: similarly, we randomly selected one subset from ($A1$, $A2$) and one from ($B1$, $B2$) as training datasets, and the remaining morphometric data (including ($A3$, $A4$) and ($B3$, $B4$)) were used as the testing datasets. This part was performed using the best classification algorithm among those given above.

2.3.3. Effects of the Dimension of Input Features

Six different morphological features were available to generate the classifier. There were $C_{6}^{k}$ ($k \in [1, 6]$) possible combinations of input features, where $k$ was the number of morphological features selected from the six. This part was performed using the best classification algorithm with 75% of the morphometric data used for training.
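The number of feature combinations at each dimension $k$ can be enumerated with the standard library. In the sketch below, the feature labels are plain-text stand-ins for the six features named in the text, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb

# Plain-text labels for the six morphological features
FEATURES = ["alpha", "n", "Ds3/Dm3", "Dl3/Dm3", "Ds3/Dl3", "AER"]


def feature_subsets(k):
    """All combinations of k features chosen from the six."""
    return list(combinations(FEATURES, k))


# Number of candidate models at each input dimension k = 1..6
counts = {k: len(feature_subsets(k)) for k in range(1, 7)}
total_models = sum(counts.values())
```

Summed over all dimensions this gives $2^{6} - 1 = 63$ feature combinations, so an exhaustive evaluation with one trained model per combination remains feasible for six features.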

The best classification algorithm was further applied to train each morphological feature to find the most important features for the detection of CAD.

2.4. Models Running Approaches

The proposed machine learning methods were run on a PC with a 2.60 GHz Intel Core i7 CPU, 16 GB RAM, and the Windows 7 operating system. All the machine learning models were run in MATLAB (MathWorks, Natick, MA, USA) using the Classification Learner code, and each model completed within ~30 min. Moreover, libSVM [23], an open-source library, was used to build the SVM models. The results of the machine learning classification are analyzed in the following sections.

3. Results

3.1. Polynomial-SVM Model with Grid Search Optimization Showed the Best Performance in the Detection of CAD

In this study, the classification performances of linear-SVM, polynomial-SVM, and RBF-SVM were first studied to choose the best SVM algorithm for the following investigation. Previous studies suggested that the SVM model could not achieve the best classification outcomes if the kernel function and parameters were not selected properly [21]. To find the best SVM algorithm for the detection of CAD, we compared the classification performances of the linear-SVM, polynomial-SVM, and RBF-SVM algorithms under three parameter-setting methods: default parameters, grid search, and PSO (Table 2). The results showed that although all SVM algorithms worked well with the default parameters, the classification performances of the SVM models were improved remarkably through parameter optimization with both grid search and PSO. Moreover, for all the SVM models, the performances optimized by grid search were better than those optimized by PSO (Table 2). In addition, the classification performances of polynomial-SVM were much better than those of linear-SVM and RBF-SVM (Table 2). This suggested that polynomial-SVM with grid search parameter optimization achieved the best performance among the three SVM models.

To compare the best SVM algorithm (polynomial-SVM) with the other machine learning algorithms, the accuracies of the polynomial-SVM algorithm and of LR, DT, LDA, k-NN, and ANN were further studied (Table 3). The aim of this part was to assess the classification capability of all the algorithms used. The results indicated that the polynomial-SVM model achieved the best performance in the detection of CAD, followed by ANN. The classification accuracies of LR, DT, LDA, k-NN, ANN, and polynomial-SVM were 96.30%, 97.00%, 92.30%, 95.70%, 98.40% and 100.00%, respectively. This suggested that polynomial-SVM was the best algorithm for the detection of CAD among all these algorithms. Hence, the following analyses were mainly based on the polynomial-SVM model (with grid search parameter optimization).

3.2. The Performance of the SVM Models Was Not Affected by Data Sampling

To examine whether the classification performances of the polynomial-SVM model depended on data sampling, 75% of the morphometric data from both non-CAD and CAD subjects were randomly selected as the training datasets, and the remaining morphometric data were used as the testing datasets. The results in Table 4 indicated that the classification performances (including training accuracy, testing accuracy, testing sensitivity, and testing specificity) of the polynomial-SVM model were similar in all sampling cases. Moreover, the fluctuation of the classification performances of the polynomial-SVM model was less than 1‰. This suggested that the classification performances were not affected by the data sampling. Furthermore, the results also showed that all classification performances (accuracy, sensitivity, and specificity) of the polynomial-SVM model were close to 100%. This suggested that the polynomial-SVM model was very effective and stable in detecting CAD by combining the measured morphological features. Table A2 (see Appendix A) also compares the classification performances of linear-SVM, polynomial-SVM, and RBF-SVM under different sampling schemes. The fluctuations of the classification performances of all the SVM models were within 5%. This further suggested that the classification performances of all SVM algorithms, and especially of the polynomial-SVM algorithm, were not affected by the data sampling.

3.3. Adequate Training Data Volume Was Necessary and Sufficient to Obtain High Detecting Performance

Although the polynomial-SVM algorithm exhibited high classification performance, this model is limited in speed when training on larger datasets [24]. However, excessively reducing the volume of training data might affect the classification performance of the polynomial-SVM model. A reasonable training data volume should not only guarantee running efficiency but also retain high classification performance. To study the effect of the training data volume on the classification performances, 75% (Case 0), 50% (Case 1) and 25% (Case 2) of the morphometric data were selected as training datasets (see “Methods” section; please note that the results of Case 0 are presented in Table 4). As shown in Table 5, although the training accuracy was not affected by the training dataset volume, the testing accuracy, testing sensitivity, and testing specificity were sensitive to it. In addition, when the volume of training data was sufficient (≥50%), all the performances of the polynomial-SVM model reached their best (close to 100%) and were not affected by the training dataset volume or the sampling method (Table 4 and Table 5 (Case 1)). These results suggested that an adequate training dataset volume (≥50%) was necessary and sufficient to obtain high detection performance.

3.4. The Effect of the Dimension and Combination of Morphological Features on the Classification Performances

To examine the effect of the input feature dimension (the number of features in a combination) on the classification performances, all the combinations of the six morphological features were examined with the polynomial-SVM model. When the dimension was $k$, there were $C_{6}^{k}$ possible combinations that could be used to build the model (see “Methods” section). Each scattered point in Figure 3a–d represents the detection result for a specific combination. The results showed that the training accuracy, testing accuracy, testing sensitivity, and testing specificity exhibited a similar trend. The classification performances of the model improved as the dimension of input features increased (Figure 3). When the dimension was $\geq 4$, all the classification performances reached a high level (over 95%) and were not sensitive to the choice of combination. However, when the dimension was $< 4$, the classification performances increased rapidly as the dimension increased, and they exhibited large dispersion among different feature combinations at a given dimension. This suggested that the choice of feature combination was also a factor affecting the classification performances of the machine learning model when the dimension was $< 4$.

3.5. Bifurcation Diameter Exponent ($n$) and Area Expansion Ratio (AER) Were Two Key Features for the CAD Detection

To find out which morphological features were the most important for the detection of CAD, we compared the classification performances of the polynomial-SVM model for each single morphological feature (Table 6). The results showed that the morphological feature $n$ exhibited the best performance among the six morphological features, followed by $AER$, while $\frac{D_{s}^{3}}{D_{m}^{3}}$ had the worst performance. Moreover, it was worth noting that, among the combinations of two input features, there were two points whose classification performances were significantly higher than those of the other points (see Figure 3). This suggested that these two detection results of the polynomial-SVM model were considerably better than the other results for two-dimensional combinations. Hence, to find out which features were used in these two detections, we further compared the classification performances of the polynomial-SVM model for the combinations of two features (Table 7). The results showed that the combinations ($n$ & $AER$) and ($n$ & $\frac{D_{l}^{3}}{D_{m}^{3}}$) achieved testing accuracies of 89.31% and 87.93%, respectively, which were significantly higher than that of the third-ranked combination, $n$ & $\frac{D_{s}^{3}}{D_{m}^{3}}$ (with a testing accuracy of 64.83%) (Table 7). These results suggested that the polynomial-SVM model using the features $n$ and $AER$ may have great clinical application prospects in the early detection of CAD.

4. Discussion

Diversity is one of the properties of medical feature datasets and greatly increases the difficulty of medical data mining. In this study, we used commonly used machine learning models (LR, DT, LDA, k-NN, ANN, linear-SVM, polynomial-SVM, and RBF-SVM) for the detection of CAD with image-based morphometric features. Six measured morphological features were used to generate the machine learning models. These morphological features represent detailed quantitative topological information about the coronary bifurcations. The results showed that the polynomial-SVM algorithm with grid search optimization had the best performance for the detection of CAD and yielded a classification accuracy of 100.00%. It was worth noting that an adequate training data volume and input feature dimension were necessary for obtaining high classification performances. Moreover, when the volume of the training dataset was large enough, the classification performances were not sensitive to the data sampling method. In addition, we found that the exponent of vessel diameter ($n$) and the area expansion ratio (AER) were the two most important features for the detection of CAD, especially in combination.

Remarkable progress has been made in applying different machine learning algorithms with medical features to detect diseases such as various types of cancer and cardiovascular diseases [12,13,14,25,26,27]. Abdar et al. proposed a nested ensemble nu-support vector classification (NE-nu-SVC) model for the diagnosis of CAD [26]. The proposed model provided accuracies of 94.66% and 98.60% for two different datasets (Z-Alizadeh Sani and Cleveland CAD datasets), respectively. Singh et al. [27] applied an extreme learning machine (ELM) to detect CAD with 100% accuracy when constructing the classifier with 31 input features. However, the detection accuracy of ELM decreased drastically to 68.48% when the number of input features was reduced to 24. Nasarian et al. summarized the existing machine learning methods for the detection of CAD using various medical databases [12], and showed that the detection accuracy of different algorithms varied with the medical database. This suggested that both the machine learning algorithm and the feature selection method play significant roles in the medical diagnostic process. Table A1 (see Appendix A) summarizes the machine learning performances for the detection of CAD with different classifier algorithms and medical features, which also supports the conclusion of Nasarian et al. that both features and machine learning algorithms can significantly affect the performance of machine learning. In this study, we therefore comprehensively compared the prediction performance of commonly used algorithms to select the best algorithm for the evaluation of the features.

The SVM model is one of the most well-known supervised machine learning techniques and has been widely used in pattern recognition and binary classification problems [28]. Since the SVM algorithm was invented, a large number of studies have addressed its optimization [21,29,30]. Syarif et al. showed that SVM parameter optimization using grid search was a powerful way to improve classification accuracy [21]. Our findings showed that although SVM worked well with the default parameters, its performance could be significantly improved by parameter optimization with both grid search and PSO. In applying support vector machine modeling to predict diabetes, Liu et al. found that the RBF-SVM achieved the best accuracy in the classification of fasting plasma glucose level ≥ 126 mg/dL vs. < 126 mg/dL, and the linear-SVM performed the best in the classification of fasting plasma glucose level ≥ 100 mg/dL vs. < 100 mg/dL [29]. The study by Patle et al. revealed that the RBF-SVM was better than the polynomial-SVM for classification when the dataset was very large [30]. In this paper, we found that the polynomial-SVM with grid search optimization performed the best in the detection of CAD when six measured morphological features were used to build the SVM classifiers (Table 2).

Previous studies showed that the features used to build a machine learning classifier can be extracted from diagnostic images (such as MRI, CTA and ultrasound) by automatic algorithms [31]. However, the clinical significance of these features was unknown and difficult to interpret. Moreover, for a given image, automatic algorithms can generate a large number of features. Reducing the number of features to speed up training has been suggested to be of great importance, especially when dealing with large datasets [32]. However, this reduction may degrade the classification results, so a reasonable feature selection method is of great significance for building a high-performance classification model. There are various automatic algorithms for feature extraction or selection, such as F-score and SVM-RFE [33,34]. These methods help to select features efficiently for machine learning. However, the F-score algorithm does not provide any mutual information among the features [33], while the SVM-RFE algorithm can only be applied to linear-kernel SVMs [34]. In the present study, the six morphometric features were selected because they had exhibited great clinical significance in terms of atherosclerosis [20] or were indicated by medical experts. The results showed that high classification performances were achieved with only six morphological features, especially for the polynomial-SVM (close to 100.00%) (Table 2, Table 3, Table 4, Table A2 and Figure 3). This suggested that a reasonable and targeted selection of features could greatly improve the classification performance of machine learning models. To the best of our knowledge, this study is the first report of machine learning models successfully detecting CAD by using morphological features measured from reconstructed 3D coronary bifurcations.

Trivedi et al., studying the effect of feature dimension on the detection of email spam, found that the SVM model achieved an accuracy of 98.5% with 1358 feature dimensions, whereas the accuracy was below 85.2% with 158 feature dimensions [35]. Our present modeling strategy showed a similar phenomenon: the average testing accuracies with the lowest (one) and highest (six) feature dimensions were 51.83% and 99.98%, respectively (Figure 3). Moreover, we found that the morphological feature $n$ achieved the best classification performance among the six morphological features, followed by $AER$, while $\frac{D_{s}^{3}}{D_{m}^{3}}$ achieved the worst (Table 6). The results also indicated that two combinations of two features achieved testing accuracies close to 90% (Figure 3), remarkably higher than those of the other two-feature combinations. Further analysis showed that these two combinations were ($n$ & $\frac{D_{l}^{3}}{D_{m}^{3}}$) and ($n$ & $AER$) (Table 7). These results were consistent with our recent study, which predicted that $\left| \frac{n - 3}{3} \right|$ and $AER$ were the two best indicators for CAD prediction (data not shown). Our results further indicated that the volume of training data was another factor that can affect the classification performance (Table 5). Rudd et al. reported that splitting the whole dataset into 80% (1652 of 2066) for training and the remaining 20% for testing achieved a high accuracy of 97.5% [31]. Our present strategy showed similarly high performance when no less than 50% of the morphometric data were used for training. In addition, the performance did not improve significantly as the volume of training data increased further from 50% to 75% (Table 4 and Table 5 (Case 1)). Specifically, using 50% and 75% of the morphometric data for training resulted in mean testing accuracies of 99.96% and 99.98%, respectively, with the polynomial-SVM model. These results suggest that the polynomial-SVM model was a stable and powerful approach for the detection of coronary artery disease.
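
To make the notion of "feature dimension" concrete, the following sketch enumerates every non-empty subset of the six morphological features. It assumes that each dimension in Figure 3 covers all subsets of that size, which the text suggests but does not state explicitly; the plain-text feature names are illustrative labels only.

```python
from itertools import combinations

# Illustrative labels for the six features named in Table 1.
FEATURES = ["alpha", "n", "Ds3/Dm3", "Dl3/Dm3", "Ds3/Dl3", "AER"]

def feature_subsets(features):
    """Yield (dimension, subset) for every non-empty subset of the features."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            yield k, subset

# Count how many candidate models there are at each feature dimension.
counts = {}
for k, _ in feature_subsets(FEATURES):
    counts[k] = counts.get(k, 0) + 1

pairs = [s for k, s in feature_subsets(FEATURES) if k == 2]
print(counts)      # number of subsets per dimension
print(len(pairs))  # number of two-feature combinations
```

Six features give 15 two-feature combinations, which agrees with the 15 rows of Table 7, and 63 non-empty subsets in total.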

Although the polynomial-SVM model performed the best in this study, the algorithm has inherent advantages and disadvantages. Compared with the other models used in this study, its main advantage is that it gave the best performance when combined with the measured morphological features. However, it also has drawbacks. For instance, parameter tuning is necessary for the polynomial-SVM model to reach its best classification performance. In addition, the polynomial-SVM model becomes time-consuming when dealing with large datasets [24]. A survey by Leo et al. indicated that different types of feature data may suit different machine learning models [36]. Our present study demonstrated that, among the models considered, the polynomial-SVM model was the best for detecting changes in the morphological features; whether it is also the best for other types of features needs further exploration.

5. Conclusions

In this work, the detection of CAD based on machine learning models with measured morphological features was proposed. The experimental results demonstrated that, among all the machine learning models considered, the polynomial-SVM model performed the best. Moreover, the exponent of vessel diameter ($n$) and the area expansion ratio ($AER$) were the two most important features for the detection of CAD. In addition, the combinations ($n$ & $\frac{D_{l}^{3}}{D_{m}^{3}}$) and ($n$ & $AER$) can significantly decrease the dimension of the input features without losing much detection accuracy. This study proposed a new methodology that combines machine learning techniques with imaging-based morphological measurement for the detection of CAD and obtained a high detection accuracy, which could help clinicians detect CAD accurately and non-invasively at an early stage. The proposed methodology may also be applied to the earlier detection of other diseases related to morphological change, such as carotid artery disease. Moreover, because changes in the morphological features of nervous tissue usually occur before the disease develops [37], the methodology could potentially be used for the early detection of neurological diseases, such as Parkinson’s disease. In future work, we will develop an automatic method for measuring the morphological features to speed up the original data acquisition process. Moreover, the proposed method will be applied to the detection of other vascular diseases to verify the feasibility of our strategy.

Author Contributions

Conceptualization, X.C., Y.F. (Ying Fang) and J.W.; methodology, X.C., Y.F. (Yi Fu) and J.L.; validation, X.C., Y.F. (Yi Fu), J.L., Y.J., Y.F. (Ying Fang) and J.W.; data curation, X.C., Y.F. (Yi Fu) and J.L.; writing—original draft preparation, X.C.; writing—review and editing, Y.F. (Yi Fu), J.L. and Y.J.; funding acquisition, Y.F. (Ying Fang), J.W. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by National Natural Science Foundation of China 11672109 (Y. Fang), 11432006 (J. Wu), 31771012 (J. Lin) and 31500759 (J. Lin), the Fundamental Research Funds for the Central Universities, SCUT, D2156960 (J. Wu), and the Guangzhou Science Technology Program 201707010062 (J. Lin).

Acknowledgments

We thank Jingxing Dai (Southern Medical University) and Yueheng Wu (Guangdong General Hospital) for providing the original CTA imaging data for this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The comparison of existing machine learning performances in the detection of CAD with the use of different algorithms and medical features. The comparison results in the table showed that selection of appropriate algorithms and medical features for machine learning had a significant impact on the accuracy of CAD detection.

Study	Year	Machine Learning Algorithms	Medical Features Datasets	Accuracy (%)
Nasarian et al. [12]	2020	Decision tree (DT)	Long beach dataset selected by ADASYN	77.53
		Gaussian Naive Bayes (GNB)		79.44
		Random Forest (RF)		80.45
		Extreme Gradient Boosting (XGBoost)		80.41
		DT	Z-Alizadeh Sani dataset selected by ADASYN	88.12
		GNB		88.72
		RF		91.56
		XGBoost		92.35
Chen et al. [14]	2020	Deep learning (DL)	Coronary computed tomography angiography (CCTA)	87.10
Aouabed et al. [26]	2019	Nested ensemble nu-Support Vector Classification (NE-nu-SVC)	Cleveland	98.60
Subramaniyam et al. [38]	2019	Taylor-gradient descent (TGD)-based actor critic neural network (ACNN)	Hungarian	82.55
Zomorodi-moghadam et al. [39]	2019	Hybrid particle swarm optimization (PSO)	Z-Alizadeh Sani	84.2
Li et al. [13]	2019	Dual-input neural network	Electrocardiogram (ECG) and phonocardiogram (PCG) signals	95.62
Betancur et al. [40]	2019	DL	Myocardial perfusion imaging (MPI)	84.8
Abdar et al. [25]	2019	N2Genetic-nuSVM	Z-Alizadeh Sani	93.08
Abdar et al. [25]	2019	NE-nu-SVC	Z-Alizadeh Sani and Cleveland	94.66 and 98.60
Qin et al. [41]	2017	Ensemble algorithm based on multiple feature selection (EA-MFS)	Z-Alizadeh Sani	93.70
Raghavendra et al. [42]	2017	Double Density-Dual Tree DWT (DD-DTDWT)	Ultrasound images	96.05
Alizadehsani et al. [43,44]	2016	Support Vector Machine (SVM)	Z-Alizadeh Sani	86.14
Alizadehsani et al. [43,44]	2013	Sequential Minimal Optimization (SMO)	Z-Alizadeh Sani	94.08
Acharya et al. [45]	2013	Gaussian Mixture Model (GMM)	Echocardiography images	100.00
		DT		88.90
		Fuzzy sugeno (FS)		90.70
		KNN		86.30
		Naive Bayes classifier (NBC)		94.4
		Radial basis probabilistic neural network (RBPNN)		82.4

Table A2 lists the classification performances of the linear-SVM, polynomial-SVM, and RBF-SVM models with grid search parameter optimization. The results indicated that, for a given SVM model, the classification performances (training accuracy, testing accuracy, testing sensitivity, and testing specificity) were similar across all sampling cases. Moreover, the fluctuations of the classification performances of all the SVM models were within 5%, and for the polynomial-SVM model the fluctuation was even less than 1%. This suggested that the classification performances were not significantly affected by the data sampling, regardless of which SVM algorithm was used. Moreover, the results showed that the accuracy, sensitivity, and specificity of all the SVM models were over 90%, and the values for the polynomial-SVM were the best (almost all were 100%). This suggested that SVM models, especially the polynomial-SVM, were very effective and stable for detecting CAD from the measured morphological features.
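
The performance measures named above can be computed from true and predicted labels as sketched below. The sketch uses the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)), which are assumed here to match the authors' usage; the label coding (1 = CAD, 0 = non-CAD) and the example predictions are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity as percentages."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1   # diseased subject correctly detected
            else:
                fn += 1   # diseased subject missed
        else:
            if p == positive:
                fp += 1   # healthy subject flagged as diseased
            else:
                tn += 1   # healthy subject correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }

# Hypothetical labels: 1 = CAD, 0 = non-CAD.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy shows whether errors fall mainly on CAD or on non-CAD subjects, which a single accuracy figure hides.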

Table A2. The classification performances of all SVM models (linear-SVM, polynomial-SVM, and RBF-SVM) with the parameter optimization method of grid search (75% of morphometric datasets for training).

Results	SVM Kernel Functions	CAD	A(1,2,3)	A(1,2,4)	A(1,3,4)	A(2,3,4)	Mean	Results	SVM Kernel Functions	CAD	A(1,2,3)	A(1,2,4)	A(1,3,4)	A(2,3,4)	Mean
Training Accuracy (%)	Linear	B(1,2,3)	96.33	96.56	96.22	96.33	96.36	Testing Accuracy (%)	Linear	B(1,2,3)	96.22	96.22	96.22	95.88	96.14
		B(1,2,4)	96.90	96.67	96.67	95.56	96.45			B(1,2,4)	95.53	95.88	95.88	96.22	95.88
		B(1,3,4)	96.45	96.56	96.33	96.22	96.39			B(1,3,4)	96.91	96.92	97.25	97.25	97.08
		B(2,3,4)	96.56	96.33	96.11	96.79	96.45			B(2,3,4)	94.83	95.17	95.17	95.86	95.26
		Mean	96.56	96.53	96.33	96.25	96.41			Mean	95.87	96.05	96.13	96.30	96.09
	Polynomial	B(1,2,3)	100	100	100	100	100		Polynomial	B(1,2,3)	100	100	100	100	100
		B(1,2,4)	100	100	100	100	100			B(1,2,4)	100	100	99.66	100	99.92
		B(1,3,4)	100	100	100	100	100			B(1,3,4)	100	100	100	100	100
		B(2,3,4)	100	100	100	100	100			B(2,3,4)	100	100	100	100	100
		Mean	100	100	100	100	100			Mean	100	100	99.92	100	99.98
	RBF	B(1,2,3)	100	99.78	99.89	99.89	99.89		RBF	B(1,2,3)	98.97	99.31	99.31	99.66	99.31
		B(1,2,4)	99.31	99.77	99.89	99.89	99.47			B(1,2,4)	97.59	98.26	98.97	98.97	98.45
		B(1,3,4)	100	99.77	99.66	99.31	99.69			B(1,3,4)	99.31	99.66	99.66	99.66	99.57
		B(2,3,4)	99.89	99.77	100	99.89	99.89			B(2,3,4)	98.97	99.66	98.97	100	99.40
		Mean	99.80	99.77	99.61	99.75	99.73			Mean	98.71	99.23	99.23	99.58	99.18
Testing Sensitivity (%)	Linear	B(1,2,3)	98.65	98.65	99.32	100	99.16	Testing Specificity (%)	Linear	B(1,2,3)	93.71	93.71	93.01	91.61	93.01
		B(1,2,4)	98.65	98.65	99.32	100	99.16			B(1,2,4)	92.31	93.01	92.31	92.31	92.49
		B(1,3,4)	98.65	98.65	100	100	99.32			B(1,3,4)	95.10	95.10	94.41	94.41	94.76
		B(2,3,4)	97.97	98.65	99.32	100	98.99			B(2,3,4)	91.55	91.55	90.85	91.55	91.38
		Mean	98.48	98.65	99.49	100	99.16			Mean	93.17	93.34	92.65	92.47	92.91
	Polynomial	B(1,2,3)	100	100	100	100	100		Polynomial	B(1,2,3)	100	100	100	100	100
		B(1,2,4)	100	100	100	100	100			B(1,2,4)	100	100	99.31	100	99.83
		B(1,3,4)	100	100	100	100	100			B(1,3,4)	100	100	100	100	100
		B(2,3,4)	100	100	100	100	100			B(2,3,4)	100	100	100	100	100
		Mean	100	100	100	100	100			Mean	100	100	99.83	100	99.96
	RBF	B(1,2,3)	98.65	99.32	99.32	91.61	97.22		RBF	B(1,2,3)	99.31	99.30	99.30	99.30	99.30
		B(1,2,4)	98.65	98.65	98.65	99.32	98.82			B(1,2,4)	96.50	98.60	99.30	98.60	98.25
		B(1,3,4)	98.65	99.32	99.32	99.32	99.15			B(1,3,4)	100	100	100	100	100
		B(2,3,4)	98.65	99.32	98.65	100	99.16			B(2,3,4)	99.30	100	99.30	100	99.65
		Mean	98.65	95.15	98.99	97.56	98.59			Mean	98.78	99.48	99.48	99.48	99.30

References

1. Benjamin, E.J.; Muntner, P.; Alonso, A.; Bittencourt, M.S.; Callaway, C.W.; Carson, A.P.; Chamberlain, A.M.; Chang, A.R.; Cheng, S.; Das, S.R.; et al. Heart Disease and Stroke Statistics-2019 Update: A Report From the American Heart Association. Circulation 2019, 139, e1–e473.
2. Benjamin, E.J.; Blaha, M.J.; Chiuve, S.E.; Cushman, M.; Das, S.R.; Deo, R.; de Ferranti, S.D.; Floyd, J.; Fornage, M.; Gillespie, C.; et al. Heart Disease and Stroke Statistics-2017 Update: A Report From the American Heart Association. Circulation 2017, 135, e146–e603.
3. Tavakol, M.; Ashraf, S.; Brener, S.J. Risks and complications of coronary angiography: A comprehensive review. Glob. J. Health Sci. 2012, 4, 65–93.
4. Nie, K.; Chen, J.H.; Yu, H.J.; Chu, Y.; Nalcioglu, O.; Su, M.Y. Quantitative analysis of lesion morphology and texture features for diagnostic prediction in breast MRI. Acad. Radiol. 2008, 15, 1513–1525.
5. De Filippo, M.; Capasso, R. Coronary computed tomography angiography (CCTA) and cardiac magnetic resonance (CMR) imaging in the assessment of patients presenting with chest pain suspected for acute coronary syndrome. Ann. Transl. Med. 2016, 4, 255.
6. Carita, P.; Guaricci, A.I.; Muscogiuri, G.; Carrabba, N.; Pontone, G. Prognostic Value and Therapeutic Perspectives of Coronary CT Angiography: A Literature Review. Biomed. Res. Int. 2018, 2018, 6528238.
7. Huang, S.; Cai, N.; Pacheco, P.P.; Narrandes, S.; Wang, Y.; Xu, W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genom. Proteom. 2018, 15, 41–51.
8. Poddar, M.G.; Kumar, V.; Sharma, Y.P. Automated diagnosis of coronary artery diseased patients by heart rate variability analysis using linear and non-linear methods. J. Med. Eng. Technol. 2015, 39, 331–341.
9. Karimi, M.; Amirfattahi, R.; Sadri, S.; Marvasti, S.A. Noninvasive detection and classification of coronary artery occlusions using wavelet analysis of heart sounds with neural networks. In Proceedings of the 3rd IEE International Seminar on Medical Applications of Signal Processing (Ref. No. 2005-1119), London, UK, 3–4 November 2005; pp. 117–120.
10. Liu, N.; Lin, Z.; Koh, Z.; Huang, G.-B.; Ser, W.; Ong, M.E.H. Patient Outcome Prediction with Heart Rate Variability and Vital Signs. J. Signal Process. Syst. 2011, 64, 265–278.
11. Asl, B.M.; Setarehdan, S.K.; Mohebbi, M. Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal. Artif. Intell. Med. 2008, 44, 51–64.
12. Nasarian, E.; Abdar, M.; Fahami, M.A.; Alizadehsani, R.; Hussain, S.; Basiri, M.E.; Zomorodi-Moghadam, M.; Zhou, X.; Pławiak, P.; Acharya, U.R.; et al. Association between work-related features and coronary artery disease: A heterogeneous hybrid feature selection integrated with balancing approach. Pattern Recognit. Lett. 2020, 133, 33–40.
13. Li, H.; Wang, X.; Liu, C.; Wang, Y.; Li, P.; Tang, H.; Yao, L.; Zhang, H. Dual-Input Neural Network Integrating Feature Extraction and Deep Learning for Coronary Artery Disease Detection Using Electrocardiogram and Phonocardiogram. IEEE Access 2019, 7, 146457–146469.
14. Chen, M.; Wang, X.; Hao, G.; Cheng, X.; Ma, C.; Guo, N.; Hu, S.; Tao, Q.; Yao, F.; Hu, C. Diagnostic performance of deep learning-based vascular extraction and stenosis detection technique for coronary artery disease. Br. J. Radiol. 2020, 93, 20191028.
15. Fan, T.; Zhou, Z.; Fang, W.; Wang, W.; Xu, L.; Huo, Y. Morphometry and hemodynamics of coronary artery aneurysms caused by atherosclerosis. Atherosclerosis 2019, 284, 187–193.
16. Wosiak, A.; Zakrzewska, D. Integrating Correlation-Based Feature Selection and Clustering for Improved Cardiovascular Disease Diagnosis. Complexity 2018, 2018, 2520706.
17. Kigka, V.I.; Rigas, G.; Sakellarios, A.; Siogkas, P.; Andrikos, I.O.; Exarchos, T.P.; Loggitsi, D.; Anagnostopoulos, C.D.; Michalis, L.K.; Neglia, D.; et al. 3D reconstruction of coronary arteries and atherosclerotic plaques based on computed tomography angiography images. Biomed. Signal Process. Control 2018, 40, 286–294.
18. Sharma, R.K.; Voelker, D.J.; Sharma, R.K.; Singh, V.N.; Bhatt, G.; Moazazi, M.; Nash, T.; Reddy, H.K. Coronary computed tomographic angiography (CCTA) in community hospitals: “Current and emerging role”. Vasc. Health Risk Manag. 2010, 6, 307–316.
19. Kaimovitz, B.; Huo, Y.; Lanir, Y.; Kassab, G.S. Diameter asymmetry of porcine coronary arterial trees: Structural and functional implications. Am. J. Physiol. Heart Circ. Physiol. 2008, 294, H714–H723.
20. Huang, X.; Yin, X.; Xu, Y.; Jia, X.; Li, J.; Niu, P.; Shen, W.; Kassab, G.S.; Tan, W.; Huo, Y. Morphometric and hemodynamic analysis of atherosclerotic progression in human carotid artery bifurcations. Am. J. Physiol. Heart Circ. Physiol. 2016, 310, H639–H647.
21. Syarif, I.; Prugel-Bennett, A.; Wills, G. SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance. Telkomnika 2016, 14, 1502.
22. Zhu, Y.; Wu, J.; Fang, Y. Study on application of SVM in prediction of coronary heart disease. J. Biomed. Eng. 2013, 30, 1180–1185.
23. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 27.
24. Nalepa, J.; Kawulok, M. Selecting training sets for support vector machines: A review. Artif. Intell. Rev. 2018, 52, 857–900.
25. Abdar, M.; Książek, W.; Acharya, U.R.; Tan, R.-S.; Makarenkov, V.; Pławiak, P. A new machine learning technique for an accurate diagnosis of coronary artery disease. Comput. Methods Programs Biomed. 2019, 179, 104992–105003.
26. Abdar, M.; Acharya, U.R.; Sarrafzadegan, N.; Makarenkov, V. NE-nu-SVC: A New Nested Ensemble Clinical Decision Support System for Effective Diagnosis of Coronary Artery Disease. IEEE Access 2019, 7, 167605–167620.
27. Singh, R.S.; Saini, B.S.; Sunkaria, R.K. Detection of coronary artery disease by reduced features and extreme learning machine. Clujul Med. 2018, 91, 166–175.
28. Awad, M.; Khanna, R. Support Vector Machines for Classification. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Awad, M., Khanna, R., Eds.; Apress: Berkeley, CA, USA, 2015; pp. 39–66.
29. Yu, W.; Liu, T.; Valdez, R.; Gwinn, M.; Khoury, M.J. Application of support vector machine modeling for prediction of common diseases: The case of diabetes and pre-diabetes. BMC Med. Inform. Decis. Mak. 2010, 10, 16.
30. Patle, A.; Chouhan, D.S. SVM kernel functions for classification. In Proceedings of the 2013 International Conference on Advances in Technology and Engineering (ICATE), Mumbai, India, 23–25 January 2013; pp. 1–9.
31. Rudd, J.M. Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification. Model Assist. Stat. Appl. 2017, 13, 341–349.
32. Brank, J.; Grobelnik, M.; Milic-Frayling, N.; Mladenic, D. Feature selection using support vector machines. In Data Mining III; Zanasi, A., Ed.; WIT: Southampton, UK, 2002; pp. 261–273.
33. Chen, Y.-W.; Lin, C.-J. Combining SVMs with Various Feature Selection Strategies. Feature Extr. 2005, 207, 315–324.
34. Cho, B.H.; Yu, H.; Kim, K.W.; Kim, T.H.; Kim, I.Y.; Kim, S.I. Application of irregular and unbalanced data to predict diabetic nephropathy using visualization and feature selection methods. Artif. Intell. Med. 2008, 42, 37–53.
35. Trivedi, S.; Dey, S.; Shikhar, P. Effect of various kernels and feature selection methods on SVM performance for detecting email spams. Int. J. Comput. Appl. 2013, 66, 18–23.
36. Leo, M.; Carcagnì, P.; Mazzeo, P.L.; Spagnolo, P.; Cazzato, D.; Distante, C. Analysis of Facial Information for Healthcare Applications: A Survey on Computer Vision-Based Approaches. Information 2020, 11, 128.
37. Pannese, E. Morphological changes in nerve cells during normal aging. Brain Struct. Funct. 2011, 216, 85–89.
38. Subramaniyam, A.; Mahapatra, R.P.; Singh, P. Taylor and Gradient Descent-Based Actor Critic Neural Network for the Classification of Privacy Preserved Medical Data. Big Data 2019, 7, 176–191.
39. Zomorodi-moghadam, M.; Abdar, M.; Davarzani, Z.; Zhou, X.; Pławiak, P.; Acharya, U.R. Hybrid particle swarm optimization for rule discovery in the diagnosis of coronary artery disease. Expert Syst. 2019, 2019, 1–17.
40. Betancur, J.; Hu, L.H.; Commandeur, F.; Sharir, T.; Einstein, A.J.; Fish, M.B.; Ruddy, T.D.; Kaufmann, P.A.; Sinusas, A.J.; Miller, E.J.; et al. Deep Learning Analysis of Upright-Supine High-Efficiency SPECT Myocardial Perfusion Imaging for Prediction of Obstructive Coronary Artery Disease: A Multicenter Study. J. Nucl. Med. 2019, 60, 664–670.
41. Qin, C.-J.; Guan, Q.; Wang, X.-P. Application of Ensemble Algorithm Integrating Multiple Criteria Feature Selection in Coronary Heart Disease Detection. Biomed. Eng. Appl. Basis Commun. 2017, 29.
42. Raghavendra, U.; Fujita, H.; Gudigar, A.; Shetty, R.; Nayak, K.; Malpe, U.; Samanth, J.; Acharya, U.R. Automated technique for coronary artery disease characterization and classification using DD-DTDWT in ultrasound images. Biomed. Signal Process. Control 2017, 40, 324–334.
43. Alizadehsani, R.; Habibi, J.; Hosseini, M.J.; Mashayekhi, H.; Boghrati, R.; Ghandeharioun, A.; Bahadorian, B.; Sani, Z.A. A data mining approach for diagnosis of coronary artery disease. Comput. Methods Programs Biomed. 2013, 111, 52–61.
44. Alizadehsani, R.; Zangooei, M.H.; Hosseini, M.J.; Habibi, J.; Khosravi, A.; Roshanzamir, M.; Khozeimeh, F.; Sarrafzadegan, N.; Nahavandi, S. Coronary artery disease detection using computational intelligence methods. Knowl. Based Syst. 2016, 109, 187–197.
45. Acharya, U.R.; Sree, S.V.; Muthu Rama Krishnan, M.; Krishnananda, N.; Ranjan, S.; Umesh, P.; Suri, J.S. Automated classification of patients with coronary artery disease using grayscale features from left ventricle echocardiographic images. Comput. Methods Programs Biomed. 2013, 112, 624–632.

Figure 1. A schematic representation of the coronary bifurcation geometry and the method for morphological data measurement. The definitions of $D_{m}$, $D_{l}$, $D_{s}$ and $α$ are given in Table 1.

Figure 2. The flow diagram of the machine learning modeling. The morphological data for both non-coronary artery disease (non-CAD) subjects and CAD subjects were divided into four equal-sized subsets. The subsets of the non-CAD subjects were designated as A1, A2, A3 and A4, and the subsets of the CAD subjects were designated as B1, B2, B3 and B4. non-CAD: non-coronary artery disease; CAD: coronary artery disease; PSO: particle swarm optimization; SVM: support vector machine.

Figure 3. (a–d) The classification performances of polynomial-SVM model with various feature dimensions. (a) Training accuracy, (b) testing accuracy, (c) testing sensitivity, and (d) testing specificity.

Table 1. The definitions of the coronary morphological features.

Features	Definitions
$D_{m}$	Diameter of mother vessel.
$D_{l}$	Diameter of larger daughter vessel.
$D_{s}$	Diameter of smaller daughter vessel.
$α$	Bifurcation angle between larger and smaller vessel axes.
$n$	The exponent of vessel diameter: $D_{m}^{n} = D_{l}^{n} + D_{s}^{n}$ .
$\frac{D_{s}^{3}}{D_{m}^{3}}$	Ratio of diameters between smaller daughter and mother vessels.
$\frac{D_{l}^{3}}{D_{m}^{3}}$	Ratio of diameters between larger daughter and mother vessels.
$\frac{D_{s}^{3}}{D_{l}^{3}}$	Ratio of diameters between smaller and larger daughter vessels.
$A E R$	Area expansion ratio: $A E R = \frac{D_{l}^{2} + D_{s}^{2}}{D_{m}^{2}}$ .
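
The definitions in Table 1 translate directly into a short computation. The sketch below derives the six features from three diameters and the bifurcation angle. Solving for the exponent $n$ by bisection is an assumed numerical approach (the source does not say how $n$ was obtained), the restriction $D_{s} \le D_{l} < D_{m}$ is an assumption that keeps the root unique, and the example diameters are hypothetical.

```python
import math

def diameter_exponent(d_m, d_l, d_s, lo=0.1, hi=50.0, tol=1e-10):
    """Solve D_m^n = D_l^n + D_s^n for n by bisection.

    Dividing by D_m^n gives g(n) = (D_l/D_m)^n + (D_s/D_m)^n - 1 = 0.
    With both daughters narrower than the mother, g decreases in n,
    so a sign change on [lo, hi] brackets a single root.
    """
    if not (0 < d_s <= d_l < d_m):
        raise ValueError("expected 0 < D_s <= D_l < D_m")
    a, b = d_l / d_m, d_s / d_m

    def g(n):
        return a ** n + b ** n - 1.0

    if g(lo) < 0 or g(hi) > 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return 0.5 * (lo + hi)

def bifurcation_features(d_m, d_l, d_s, alpha_deg):
    """Return the six morphological features defined in Table 1."""
    return {
        "alpha": alpha_deg,
        "n": diameter_exponent(d_m, d_l, d_s),
        "Ds3/Dm3": (d_s / d_m) ** 3,
        "Dl3/Dm3": (d_l / d_m) ** 3,
        "Ds3/Dl3": (d_s / d_l) ** 3,
        "AER": (d_l ** 2 + d_s ** 2) / d_m ** 2,
    }

# Hypothetical symmetric bifurcation in which each daughter carries half of D_m^3.
d = 2 ** (-1 / 3)
example = bifurcation_features(1.0, d, d, 60.0)
print(example)
```

For this symmetric example the exponent comes out at $n = 3$, the value associated with Murray's law, and $AER = 2^{1/3}$.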

Table 2. The comparison of the classification performances of three different SVM models (linear-SVM, polynomial-SVM, and radial basis function (RBF)-SVM) with the use of different parameter setting methods.

Results	SVM Kernel Functions	Parameter Setting Methods	Default	Grid Search	PSO
Training Accuracy (%)	Linear		90.86	96.56	95.24
	Polynomial		95.19	100	98.99
	RBF		88.81	99.77	97.99
Testing Accuracy (%)	Linear		90.29	96.92	94.88
	Polynomial		95.23	100	99.86
	RBF		88.05	99.66	97.65
Testing Sensitivity (%)	Linear		90.39	98.65	97.94
	Polynomial		91.64	100	99.80
	RBF		88.42	99.32	97.55
Testing Specificity (%)	Linear		87.25	95.10	91.86
	Polynomial		94.09	100	99.83
	RBF		90.09	100	97.51

RBF: radial basis function.

Table 3. The comparison of the classification accuracy among the polynomial-SVM model and other machine learning methods with the use of the same training datasets (75% of morphometric datasets for training).

Methods	LR	DT	LDA	k-NN	ANN	Polynomial-SVM
Accuracy (%)	96.30	97.00	92.30	95.70	98.40	100.00

Here, LR, DT, LDA, k-NN, and ANN denote the logistic regression, decision tree, linear discriminant analysis, k-nearest neighbors and artificial neural network classifiers, respectively. The parameters of the polynomial-SVM were optimized by grid search.

Table 4. The classification performances of the polynomial-SVM model with 75% of morphometric datasets for training.

Results	CAD	A(1,2,3)	A(1,2,4)	A(1,3,4)	A(2,3,4)	Mean
Training Accuracy (%)	B(1,2,3)	100	100	100	100	100
	B(1,2,4)	100	100	100	100	100
	B(1,3,4)	100	100	100	100	100
	B(2,3,4)	100	100	100	100	100
	Mean	100	100	100	100	100
Testing Accuracy (%)	B(1,2,3)	100	100	100	100	100
	B(1,2,4)	100	100	99.66	100	99.92
	B(1,3,4)	100	100	100	100	100
	B(2,3,4)	100	100	100	100	100
	Mean	100	100	99.92	100	99.98
Testing Sensitivity (%)	B(1,2,3)	100	100	100	100	100
	B(1,2,4)	100	100	100	100	100
	B(1,3,4)	100	100	100	100	100
	B(2,3,4)	100	100	100	100	100
	Mean	100	100	100	100	100
Testing Specificity (%)	B(1,2,3)	100	100	100	100	100
	B(1,2,4)	100	100	99.31	100	99.83
	B(1,3,4)	100	100	100	100	100
	B(2,3,4)	100	100	100	100	100
	Mean	100	100	99.83	100	99.96

Note: $A(i_{1}, i_{2}, i_{3})$ (where $i_{1}, i_{2}, i_{3} \in [1, 4]$ and $i_{1} < i_{2} < i_{3}$) represents the feature data from the non-CAD subjects, while $B(j_{1}, j_{2}, j_{3})$ (where $j_{1}, j_{2}, j_{3} \in [1, 4]$ and $j_{1} < j_{2} < j_{3}$) represents the feature data from the CAD subjects. Each combination of $A(i_{1}, i_{2}, i_{3})$ and $B(j_{1}, j_{2}, j_{3})$ forms a training dataset.
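
The sampling scheme described in this note can be enumerated as follows. The sketch assumes that the subsets left out of each training combination form the corresponding test set, which is implied by the 75% training share but not stated in the note; the function and key names are illustrative.

```python
from itertools import combinations

def training_splits(n_subsets=4, n_train=3):
    """Enumerate every pairing of non-CAD (A) and CAD (B) training subsets.

    Subsets not chosen for training are assumed to be held out for testing.
    """
    idx = range(1, n_subsets + 1)
    for a_train in combinations(idx, n_train):
        for b_train in combinations(idx, n_train):
            yield {
                "A_train": a_train,
                "B_train": b_train,
                "A_test": tuple(i for i in idx if i not in a_train),
                "B_test": tuple(j for j in idx if j not in b_train),
            }

splits = list(training_splits())
print(len(splits))   # one split per cell of Table 4
print(splits[0])

# Mean testing accuracy of Table 4: fifteen cells at 100% and one at 99.66%.
mean_testing_accuracy = round((15 * 100 + 99.66) / 16, 2)
print(mean_testing_accuracy)
```

Choosing three of four subsets for each class gives 4 x 4 = 16 sampling cases, matching the 16 cells of each block in Table 4, and the arithmetic at the end reproduces the reported mean of 99.98%.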

Table 5. The classification performances of the polynomial-SVM model with the use of different training data volumes.

Case 1	Results	CAD	C1	C2	Mean	Case 2	Results	CAD	A1	A2	Mean
50% Datasets for Training	Training Accuracy (%)	D1	100	100	100	25% Datasets for Training	Training Accuracy (%)	B1	100	100	100
		D2	100	100	100			B2	100	100	100
		Mean	100	100	100			Mean	100	100	100
	Testing Accuracy (%)	D1	99.83	100	99.92		Testing Accuracy (%)	B1	84.88	97.59	91.24
		D2	100	100	100			B2	87.29	92.78	90.04
		Mean	99.92	100	99.96			Mean	86.09	95.19	90.64
	Testing Sensitivity (%)	D1	99.66	100	99.83		Testing Sensitivity (%)	B1	73.65	96.62	85.14
		D2	100	100	100			B2	81.08	92.57	86.83
		Mean	99.83	100	99.92			Mean	77.37	94.60	85.98
	Testing Specificity (%)	D1	100	100	100		Testing Specificity (%)	B1	96.50	98.60	97.55
		D2	100	100	100			B2	93.71	93.01	93.36
		Mean	100	100	100			Mean	95.11	95.81	95.46

Note: Case 1 and Case 2 used 50% and 25% of the morphometric data for training, respectively.

Table 6. The comparison of the classification performances of the polynomial-SVM model for each morphological feature. The morphological feature $n$ exhibited the best performance among the six morphological features, followed by $AER$, while $\frac{D_{s}^{3}}{D_{m}^{3}}$ had the worst.

Variables	Training Accuracy (%)	Testing Accuracy (%)	Testing Sensitivity (%)	Testing Specificity (%)
$n$	61.63	62.41	48.65	76.76
$A E R$	54.75	55.52	75.00	35.21
$α$	52.69	55.17	77.03	32.39
$\frac{D_{l}^{3}}{D_{m}^{3}}$	51.89	51.38	33.78	69.72
$\frac{D_{s}^{3}}{D_{l}^{3}}$	49.94	47.59	33.78	69.73
$\frac{D_{s}^{3}}{D_{m}^{3}}$	40.09	38.28	17.57	58.86

Table 7. The comparison of the classification performances of the polynomial-SVM model for each combination of two features. The combinations ($n$ & $AER$) and ($n$ & $\frac{D_{l}^{3}}{D_{m}^{3}}$) performed better than the other combinations.

Combinations	Training Accuracy (%)	Testing Accuracy (%)	Testing Sensitivity (%)	Testing Specificity (%)
$n$ , $A E R$	90.95	89.31	81.08	97.89
$n$ , $\frac{D_{l}^{3}}{D_{m}^{3}}$	87.97	87.93	82.43	93.66
$n$ , $\frac{D_{s}^{3}}{D_{m}^{3}}$	66.78	64.83	33.78	97.18
$a$ , $A E R$	61.40	62.76	69.59	55.63
$\frac{D_{s}^{3}}{D_{m}^{3}}$ , $\frac{D_{l}^{3}}{D_{m}^{3}}$	59.91	61.38	40.54	83.10
$n$ , $a$	65.75	61.03	48.65	73.94
$n$ , $\frac{D_{s}^{3}}{D_{l}^{3}}$	59.91	60.34	53.38	67.61
$\frac{D_{l}^{3}}{D_{m}^{3}}$ , $\frac{D_{s}^{3}}{D_{l}^{3}}$	59.68	59.97	20.27	99.30
$\frac{D_{l}^{3}}{D_{m}^{3}}$ , $A E R$	58.88	57.59	44.59	71.13
$a$ , $\frac{D_{s}^{3}}{D_{m}^{3}}$	56.01	55.51	23.65	88.73
$a$ , $\frac{D_{s}^{3}}{D_{l}^{3}}$	52.69	54.48	27.70	82.39
$\frac{D_{s}^{3}}{D_{l}^{3}}$ , $A E R$	56.24	50.07	75.00	28.17
$\frac{D_{s}^{3}}{D_{m}^{3}}$ , $A E R$	50.63	49.31	11.49	88.73
$a$ , $\frac{D_{l}^{3}}{D_{m}^{3}}$	44.33	48.28	61.49	34.51
$\frac{D_{s}^{3}}{D_{m}^{3}}$ , $\frac{D_{s}^{3}}{D_{l}^{3}}$	44.79	43.10	22.30	64.79
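Six features yield 15 unordered pairs, which matches the 15 rows above. The sketch below enumerates the pairs and ranks them with a scoring callback. Here the callback simply looks up the testing accuracies copied from Table 7; in an actual experiment it would train and evaluate a classifier on each pair. The names `rank_feature_pairs` and `TABLE7_TEST_ACC` and the abbreviated feature labels are illustrative assumptions.

```python
from itertools import combinations

# Abbreviated labels: "Dl/Dm" stands for D_l^3 / D_m^3, and so on.
FEATURES = ["n", "AER", "a", "Dl/Dm", "Ds/Dl", "Ds/Dm"]

# Testing accuracy (%) per feature pair, copied from Table 7.
TABLE7_TEST_ACC = {
    frozenset({"n", "AER"}): 89.31,
    frozenset({"n", "Dl/Dm"}): 87.93,
    frozenset({"n", "Ds/Dm"}): 64.83,
    frozenset({"a", "AER"}): 62.76,
    frozenset({"Ds/Dm", "Dl/Dm"}): 61.38,
    frozenset({"n", "a"}): 61.03,
    frozenset({"n", "Ds/Dl"}): 60.34,
    frozenset({"Dl/Dm", "Ds/Dl"}): 59.97,
    frozenset({"Dl/Dm", "AER"}): 57.59,
    frozenset({"a", "Ds/Dm"}): 55.51,
    frozenset({"a", "Ds/Dl"}): 54.48,
    frozenset({"Ds/Dl", "AER"}): 50.07,
    frozenset({"Ds/Dm", "AER"}): 49.31,
    frozenset({"a", "Dl/Dm"}): 48.28,
    frozenset({"Ds/Dm", "Ds/Dl"}): 43.10,
}


def rank_feature_pairs(features, score_fn):
    """Score every unordered feature pair and sort from best to worst."""
    scored = [(pair, score_fn(pair)) for pair in combinations(features, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Stand-in scorer: a table lookup instead of training a model.
ranked = rank_feature_pairs(FEATURES, lambda pair: TABLE7_TEST_ACC[frozenset(pair)])
for pair, accuracy in ranked[:2]:
    print(pair, accuracy)
```

Both of the top-ranked pairs contain $n$, in line with the caption of Table 7.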

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, X.; Fu, Y.; Lin, J.; Ji, Y.; Fang, Y.; Wu, J. Coronary Artery Disease Detection by Machine Learning with Coronary Bifurcation Features. Appl. Sci. 2020, 10, 7656. https://doi.org/10.3390/app10217656
