A Machine Learning-Based Approach for the Prediction of Cardiovascular Diseases

Kamireddy, Rasool Reddy; Darapureddy, Nagadevi

doi:10.3390/ASEC2023-16352

Open Access Proceeding Paper

A Machine Learning-Based Approach for the Prediction of Cardiovascular Diseases †

by Rasool Reddy Kamireddy ¹,* and Nagadevi Darapureddy ²

¹ Department of Electronics & Communication Engineering (ECE), Malla Reddy College of Engineering and Technology, Hyderabad 500100, India
² Department of Electronics & Communication Engineering (ECE), Chaitanya Bharathi Institute of Technology, Hyderabad 500075, India
* Author to whom correspondence should be addressed.
† Presented at the 4th International Electronic Conference on Applied Sciences, 27 October–10 November 2023; Available online: https://asec2023.sciforum.net/.

Eng. Proc. 2023, 56(1), 140; https://doi.org/10.3390/ASEC2023-16352

Published: 27 November 2023

(This article belongs to the Proceedings of The 4th International Electronic Conference on Applied Sciences)


Abstract:

Heart and blood vessel disorders are referred to as cardiovascular diseases (CVDs). They are among the leading global causes of death and comprise many disorders that harm the cardiovascular system. The World Health Organization (WHO) estimates that in 2019, approximately 18 million deaths worldwide were caused by CVDs, accounting for about 32% of all deaths. Therefore, the early detection and prediction of cardiovascular disease can help identify high-risk individuals and enable timely interventions that reduce the disease’s impact and improve patient outcomes. To address this need, this study provides a machine learning (ML)-based framework for CVD detection. The proposed model includes data preprocessing, hyperparameter optimization using GridSearchCV, and classification using supervised learning approaches, namely support vector machine (SVM), K-nearest neighbors (KNN), XGBoost, random forest (RF), LightBoost (LB), and stochastic gradient descent (SGD). All models were evaluated on a publicly available dataset hosted on Kaggle. The experimental results demonstrate that the suggested ML technique attained a 92.76% detection rate with the SGD classifier on an 80:20 training/testing split, which is superior to existing approaches.

Keywords:

cardiovascular diseases; machine learning; hyperparameter optimization; supervised learning approaches

1. Introduction

The term “cardiovascular diseases” (CVDs) refers to a group of medical conditions that affect how the heart and blood vessels function. They are a major global health issue and one of the leading causes of death worldwide [1]. Cardiovascular diseases encompass a wide range of conditions, but the most common ones include coronary artery disease (CAD) [2], heart failure, stroke, arrhythmias [3], hypertension, and peripheral artery disease (PAD) [4]. The key elements of the cardiovascular system are the heart, blood vessels, and blood. The heart functions as a pump, circulating blood throughout the body and supplying vital nutrients and oxygen to organs and tissues. The blood vessels serve as a network of highways, delivering oxygen-rich blood from the heart to the rest of the body and returning oxygen-depleted blood to the heart. Depending on the particular ailment, the typical symptoms of CVDs are chest pain, shortness of breath, fatigue, dizziness, palpitations, and swelling in the legs and ankles.

Risk Factors

Cardiovascular diseases are more likely to occur when a number of risk factors are present. Age, family history, smoking, unhealthy diet, physical inactivity, obesity, high blood pressure, diabetes, and high cholesterol are some of the most prevalent risk factors. Addressing these risk factors through lifestyle modifications and medical interventions can significantly reduce the likelihood of developing CVDs. In addition, the early detection and management of risk factors are crucial in reducing the impact of CVDs on individuals and communities. Recently, researchers have concentrated on machine learning and convolutional neural network (CNN) models for this purpose. In the subsequent sections, we discuss a few recently developed models for the early diagnosis of CVDs.

2. Related Works

Over the last few years, researchers have developed numerous methods for predicting CVDs using machine learning (ML) and deep learning (DL). Here, we discuss a few recently developed approaches.

Rubini et al. [5] developed an ML model for detecting CVDs and achieved 84.71% classification accuracy with the random forest (RF) classifier. Alqahtani et al. [6] implemented an ensemble learning-based ML and DL approach and obtained 88.7% prediction accuracy. Bhatt et al. [7] proposed an enhanced approach to identify CVDs using K-modes clustering and a multilayer perceptron, reaching a detection rate of approximately 87.28%.

Fajri et al. [8] suggested a hybrid feature selection model based on Q-learning, bee swarm optimization (BSO), and support vector machine (SVM) classifiers, which achieved an accuracy of 73%. Waigi et al. [9] presented an advanced ML framework for detecting CVDs and attained 72.77% accuracy with the decision tree (DT) classifier. Shorewala [10] employed a stacking model for the identification of CVDs and obtained 75.1% classification accuracy. Nikam et al. [11] developed an ML-based model to diagnose CVDs and achieved 73.13% accuracy with the DT learning approach.

From the above literature, we observed that most of the approaches attained low performance. Therefore, we propose an enhanced ML model that predicts CVDs with the help of hyperparameter tuning.

The rest of the paper is organized as follows: Section 3 describes the presented methodology, Section 4 reports the simulation outcomes and discussion, and, finally, Section 5 presents the conclusion and future scope of the study.

3. Materials and Methods

Figure 1 shows the flow diagram of the suggested technique for the automatic screening of CVDs, which includes preprocessing, hyperparameter tuning, and classification.

3.1. Materials

In this work, we utilized the database described in [12], which includes 70,000 records with eleven distinct features grouped into three categories: objective, examination, and subjective. Figure 2 illustrates the description of the data attributes.

3.2. Data Preprocessing

Data preprocessing is an essential phase in the ML pipeline that involves transforming raw data into a format that is suitable for training a machine learning model. Proper data preprocessing can significantly increase the accuracy of the resulting framework. In this work, we removed duplicate records and records with suspicious attribute values. For example, for the blood pressure features “ap_hi” and “ap_lo”, we removed records with negative, abnormally low, or abnormally high values. Through this process, we removed 1587 records from the original database, leaving 68,413 records.
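The cleaning step described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' code: the blood-pressure bounds in `AP_HI_RANGE` and `AP_LO_RANGE` are hypothetical, since the paper does not report its cut-offs, and the small `sample` table is synthetic data that only borrows the column names `ap_hi` and `ap_lo` from the source.

```python
import pandas as pd

# Hypothetical plausibility bounds in mmHg; the paper does not state its cut-offs.
AP_HI_RANGE = (60, 250)  # systolic pressure
AP_LO_RANGE = (30, 200)  # diastolic pressure


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, then rows with implausible blood-pressure values."""
    df = df.drop_duplicates()
    plausible = (
        df["ap_hi"].between(*AP_HI_RANGE)  # bounds are inclusive
        & df["ap_lo"].between(*AP_LO_RANGE)
    )
    return df[plausible].reset_index(drop=True)


# Tiny synthetic sample standing in for the Kaggle table.
sample = pd.DataFrame(
    {
        "age": [50, 50, 61, 48, 55],
        "ap_hi": [120, 120, -150, 14000, 130],
        "ap_lo": [80, 80, 90, 90, 85],
        "cardio": [0, 0, 1, 1, 1],
    }
)

cleaned = clean_records(sample)
print(len(sample) - len(cleaned), "records removed")
```

In this sample, one exact duplicate and two rows with out-of-range systolic readings are dropped. Whatever bounds are chosen, they should be fixed before the train/test split so that both subsets are filtered identically.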

3.3. Classification

Classification is a fundamental task in data science and machine learning that involves categorizing or labeling data into predefined classes based on their features. The major objective of classification is to develop a model that can identify the class label of new, unseen instances based on the patterns and relationships learned from the training data. It is used in various applications, such as spam detection, disease diagnosis, sentiment analysis, and image recognition.

In this article, we analyzed various supervised ML models for the prognosis of CVDs, namely SVM, K-nearest neighbors (KNN), XGBoost, RF, LightBoost (LB), and stochastic gradient descent (SGD). However, the accuracy of all these classifiers depends on their hyperparameters, including the cost function, the number of neighbors and estimators, the distance measure, etc. Hence, in this study, we applied GridSearchCV with 5-fold cross-validation.

GridSearchCV stands for grid search cross-validation. It is a commonly used technique in machine learning to systematically search for the best combination of hyperparameters for a given algorithm. Hyperparameters are parameters that are not learned during training but are set before training and affect the behavior of the algorithm. GridSearchCV automates the process of trying out every combination of the candidate hyperparameter values and evaluating each one using cross-validation (CV). Here, CV ensured that each candidate was assessed on multiple subsets of the data, which reduces the risk of selecting hyperparameters that overfit a single split. Table 1 lists the hyperparameters utilized in this work.
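As a rough illustration of this procedure, the sketch below runs scikit-learn's `GridSearchCV` with 5-fold cross-validation over an SGD classifier. It is a sketch under assumptions rather than the authors' pipeline: the data come from `make_classification` (a synthetic stand-in with eleven features, not the Kaggle records), the grid is a reduced version of the SGD row of Table 1, and the `log` loss and `none` penalty are left out because their spelling differs between scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the eleven-feature CVD table (not the Kaggle data).
X, y = make_classification(n_samples=500, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Reduced grid modelled on the SGD row of Table 1 (2 x 4 x 3 = 24 candidates).
param_grid = {
    "loss": ["hinge", "modified_huber"],
    "alpha": [0.0001, 0.001, 0.01, 0.1],
    "penalty": ["l2", "l1", "elasticnet"],
}

search = GridSearchCV(
    SGDClassifier(random_state=0, max_iter=2000),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training split only
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("mean CV accuracy of best candidate:", search.best_score_)

# The held-out 20% is touched once, after the search has finished.
test_accuracy = search.score(X_test, y_test)
print("held-out accuracy:", test_accuracy)
```

Keeping the test split out of the search means the reported held-out accuracy is not influenced by the hyperparameter selection itself.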

4. Results and Discussion

To evaluate the performance of the presented ML approaches, we partitioned the preprocessed data into 80% training (54,730 records) and 20% testing (13,683 records) sets and assessed the models using several familiar evaluation measures: sensitivity, specificity, precision, F1 score, area under the ROC curve (AUC), and accuracy.
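The split sizes quoted above follow from the record counts given in the paper. The short check below reproduces them, assuming the test share is rounded up (as scikit-learn's `train_test_split` does for a fractional `test_size`); the variable names are illustrative only.

```python
import math

n_original = 70_000   # records in the source database
n_removed = 1_587     # records dropped during preprocessing
n_clean = n_original - n_removed

# Assumption: the 20% test share is rounded up, the remainder is used for training.
n_test = math.ceil(0.2 * n_clean)
n_train = n_clean - n_test

print(n_clean, n_train, n_test)  # 68413 54730 13683
```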

The proposed models were trained and tested in Python 3 using scikit-learn and Keras, the high-level TensorFlow application programming interface, and were run on Google Colaboratory (Colab) with a GPU accelerator and 12 GB of RAM.

Table 2 reports the performance of the presented ML models using the hyperparameters defined in Table 1. We observed that every model yields a lower sensitivity than specificity, which means that the models perform better on negative samples (non-CVDs) than on positive ones. Among all the classifiers, KNN obtained the lowest classification accuracy (72.13%), whereas the SGD classifier achieved the highest prediction accuracy (92.76%).

The accuracy of the implemented scheme is compared with the findings of recent studies in Table 3. The implemented SGD classifier achieved an accuracy of 92.76%, surpassing the best previously reported result (88.7% [6]) by approximately 4.1 percentage points. This improvement is substantial, particularly in disease prediction. Therefore, the presented model can serve as a predictive tool in clinical analysis, aiding doctors in the prognosis of subjects with CVDs.
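For reference, the measures in Table 2 can be computed from the four confusion-matrix counts using their standard definitions. The sketch below is illustrative: the function names and the example counts are hypothetical, and only the precision and sensitivity values in the final check are taken from Table 2.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


def f1_from(precision: float, sensitivity: float) -> float:
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, used only to exercise the definitions.
example = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(example)

# Consistency check against two rows of Table 2 (values in %).
rf_f1 = round(f1_from(76.14, 65.3), 2)     # 70.3, as reported for RF
sgd_f1 = round(f1_from(91.65, 92.08), 2)   # 91.86, as reported for SGD
print(rf_f1, sgd_f1)
```

With the hypothetical counts, sensitivity (0.80) is lower than specificity (0.90), the same pattern the text describes: a larger share of positive cases is missed than negative cases are misflagged.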

5. Conclusions and Future Scope

CVDs are among the most common diseases and a primary cause of death worldwide. A timely diagnosis can help stop the disease’s progression. Therefore, we proposed a technique to detect CVDs. The suggested technique achieved 92.76% accuracy with the SGD classifier, which is higher than the existing models. Our model can therefore be utilized as a decision support tool for the analysis of CVDs. In the future, we will aim to improve the detection accuracy of the suggested method by implementing a deep learning model. In addition, we will extend the work to identify the specific type of heart disease in subjects who have a CVD.

Author Contributions

R.R.K.: Conceptualization, Methodology, Formal Analysis, Supervision, and Software; N.D.: Writing (Original Draft, Review, and Editing). All authors have read and agreed to the published version of the manuscript.

Funding

The research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data and materials used in this research are available from the authors.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. World Health Organization. Cardiovascular Diseases (CVDs). Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 15 April 2023).
2. Ahmad, M.; Batcha, D.M.S. Coronary Artery Disease Research in India: A Scientometric Assessment of Publication during 1990–2019. arXiv 2021, arXiv:2102.11739.
3. Odutayo, A.; Wong, C.X.; Hsiao, A.J.; Hopewell, S.; Altman, D.G.; Emdin, C.A. Atrial fibrillation and risks of cardiovascular disease, renal disease, and death: Systematic review and meta-analysis. BMJ 2016, 354, i4482.
4. Olin, J.W.; Sealove, B.A. Peripheral artery disease: Current insight into the disease and its diagnosis and management. Mayo Clin. Proc. 2010, 85, 678–692.
5. Rubini, P.E.; Subasini, C.A.; Katharine, A.V.; Kumaresan, V.; Kumar, S.G.; Nithya, T.M. A Cardiovascular Disease Prediction Using Machine Learning Algorithms. Ann. Rom. Soc. Cell Biol. 2021, 25, 904–912.
6. Alqahtani, A.; Alsubai, S.; Sha, M.; Vilcekova, L.; Javed, T. Cardiovascular disease detection using ensemble learning. Comput. Intell. Neurosci. 2022, 2022, 5267498.
7. Bhatt, C.M.; Patel, P.; Ghetia, T.; Mazzeo, P.L. Effective heart disease prediction using machine learning techniques. Algorithms 2023, 16, 88.
8. Fajri, Y.A.; Wiharto, W.; Suryani, E. Hybrid Model Feature Selection with the Bee Swarm Optimization Method and Q-Learning on the Diagnosis of Coronary Heart Disease. Information 2022, 14, 15.
9. Waigi, D.; Choudhary, D.S.; Fulzele, D.P.; Mishra, D. Predicting the risk of heart disease using advanced machine learning approach. Eur. J. Mol. Clin. Med. 2020, 7, 1638–1645.
10. Shorewala, V. Early detection of coronary heart disease using ensemble techniques. Inform. Med. Unlocked 2021, 26, 100655.
11. Nikam, A.; Bhandari, S.; Mhaske, A.; Mantri, S. Cardiovascular disease prediction using machine learning models. In Proceedings of the 2020 IEEE Pune Section International Conference (PuneCon), Pune, India, 16 December 2020; pp. 22–27.
12. Kaggle. Available online: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset (accessed on 5 April 2023).

Figure 1. Block diagram of the proposed methodology.

Figure 2. Block description of the database attributes.

Table 1. Hyperparameters utilized in this work.

Classifier	Parameter	Values	Optimal Parameter Value
RF	Max depth	5, 10, 15	5
RF	Min samples leaf	50, 100, 150	50
RF	Min samples split	50, 100, 150	100
RF	Estimators	100, 200, 300, 400, 500	500
RF	Max features	auto, sqrt, log2	sqrt
RF	Criterion	gini, entropy	entropy
XGBoost	Max depth	2, 3, 4, 5, 6	2
XGBoost	Estimators	100, 200	200
XGBoost	Learning rate	0.1, 0.2, 0.3, 0.4, 0.5, 0.6	0.2
XGBoost	Subsample	0.3, 0.6, 0.9	0.9
LB	Estimators	50, 100, 200	50
LB	Learning rate	0.1, 0.2, 0.3, 0.4, 0.5, 0.6	1
LB	Max number of splits	10, 20, 50, 100	50
SVM	Kernel	rbf, poly	rbf
SVM	C	1, 10, 100, 1000	100
SVM	Gamma	1, 0.1, 0.001, 0.0001	0.0001
KNN	Neighbors	4, 5, 7, 9, 11, 13, 15, 17, 19	19
KNN	Weights	uniform, distance	uniform
KNN	Metric	minkowski, euclidean, manhattan	manhattan
SGD	Loss	hinge, log, squared_hinge, modified_huber, perceptron	log
SGD	Alpha	0.0001, 0.001, 0.01, 0.1	0.001
SGD	Penalty	l2, l1, elasticnet, none	l2

Table 2. Performance of the suggested ML models.

Classifier	Sensitivity (%)	Specificity (%)	Precision (%)	F1-Score (%)	AUC (%)	Accuracy (%)
RF	65.3	79.88	76.14	70.3	72.59	72.65
XGBoost	67.33	78	75.05	70.98	72.66	72.71
LB	68.11	78.44	75.65	71.68	73.27	73.32
SVM	66.16	78.98	75.58	70.56	72.57	72.63
KNN	68.42	75.78	73.52	70.88	72.1	72.13
SGD	92.08	93.29	91.65	91.86	92.68	92.76

Table 3. Comparison between the existing and proposed approaches.

Technique	Accuracy (%)
RF [5]	84.71
Ensemble [6]	88.7
MLP [7]	87.28
BSO-SVM [8]	73
DT [9]	72.77
Stacking Model [10]	75.1
DT [11]	73.13
The Proposed Model	92.76

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
