Predicting Heart Disease Using Sensor Networks, the Internet of Things, and Machine Learning: A Study of Physiological Sensor Data and Predictive Models

Padhy, Neelamadhab

doi:10.3390/ecsa-10-16239


School of Engineering and Technology, Department of Computer Science and Engineering, GIET University, Gunupur 765022, Odisha, India

† Presented at the 10th International Electronic Conference on Sensors and Applications (ECSA-10), 15–30 November 2023; Available online: https://ecsa-10.sciforum.net/.

Eng. Proc. 2023, 58(1), 73; https://doi.org/10.3390/ecsa-10-16239

Published: 15 November 2023

(This article belongs to the Proceedings of The 10th International Electronic Conference on Sensors and Applications)


Abstract


The Internet of Things (IoT) and sensor networks are used for structural health monitoring (SHM), here applied to monitoring the cardiovascular system. This study aimed to create a model for predicting cardiac disease using sensor networks, the IoT, and machine learning. Physiological data, such as heart rate, blood pressure, and oxygen saturation levels, were collected from patients through wearable sensors. The data were then processed and converted into an analysis-ready format. The most important predictors of heart disease were identified using feature selection techniques. Accuracy, precision, recall, F1-score, and related metrics were used to evaluate the performance of the proposed model. A support vector machine (SVM) obtained the highest accuracy, 93.87%.

Keywords:

sensory data; CVD; machine learning; performance analysis

1. Introduction

In the 21st century, heart disease is one of the leading causes of death worldwide. Several researchers have studied heart disease prediction (HDP) using machine learning, and early prediction and detection remain essential in the healthcare sector. Wearable sensors, including heart rate and blood pressure sensors, pulse oximeters, activity sensors, temperature sensors, respiration sensors, and electrocardiogram (ECG) sensors, play a crucial role in tracking physiological variables. With these sensors, a patient's cardiac condition can be monitored continuously, and irregularities can be identified early. Machine learning algorithms are well suited to this setting because they can build predictive models that identify people at high risk of developing heart disease and allow for early therapy. Structural health monitoring (SHM), which focuses on sensors that monitor health issues, is one of the leading technologies in the health industry; applied to the cardiovascular system, it allows the condition of the heart to be monitored and structural abnormalities or defects to be detected. A machine learning (ML) algorithm can detect patterns in the data that are not apparent on inspection and provide an early warning signal to patients and healthcare providers. This can enable timely interventions, such as lifestyle changes, medication, or surgery, and improve patient outcomes.
The objective of this study was to develop a predictive model for heart disease using sensor-based monitoring and machine learning algorithms. The model was trained on a dataset of physiological parameters collected from patients using wearable sensors. Its effectiveness was assessed using established metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic (ROC) curve. The ultimate objective is to create a robust and dependable model that can estimate the probability of heart disease in patients, allowing for timely interventions that can improve patient outcomes and reduce healthcare costs.
The main motivation of this article is to improve the accuracy of HDP models using sensor data. Conventional diagnostic data, such as electrocardiography (ECG) recordings and blood tests, are often limited by their reliance on intermittent measurements and subjective interpretation. Sensor data, in contrast, are provided continuously and can be analyzed using machine learning classification approaches that identify patterns and predict outcomes. The goal is to establish an accurate heart disease prediction model based on sensor-based monitoring.

This paper is organized into five sections. Section 1 introduces structural health monitoring systems and sensor devices. Section 2 reviews related work on heart disease prediction using sensors and machine learning classifiers. Section 3 describes the proposed model, Section 4 presents the results and discussion, and Section 5 concludes the paper.

2. Literature Review

Many wearable sensor-based systems have recently been proposed to improve heart disease prediction. Tang et al. [1] used mechanical sensors to monitor heart diseases, with the objective of measuring pulse waves, heart rhythms, and blood pressure (BP); they also focused on real-time sensor data for cardiovascular disease (CVD) prediction. Lin et al. [2] conducted a systematic literature review (SLR) of physiological signals for cardiovascular problems and outlined future directions. Esther et al. [3] developed an Android app that monitors a patient's records and shares the information with the patient's doctors; the app was integrated with the cloud, where the machine learning predictions were performed. Salvi et al. [4] used machine learning to present a hybrid feature selection and classification strategy for heart disease prediction in a cloud-based IoT healthcare system. Kumar, R. et al. [5] examined linear regression as a machine learning approach for air quality time-series prediction, using sensor data from three locations in Delhi and the National Capital Region to estimate the following day's air quality, and reported significant findings. Kumar, P. M. et al. [6] applied various machine learning methods to predict heart disease; logistic regression with majority voting achieved 88.59% accuracy, outperforming previous techniques. Yin and Jha [7] proposed a framework for dependable at-home healthcare assistance, focusing on data transmission between servers installed in patients' homes and remote hospitals. Ravi et al. [8] aimed to increase classifier accuracy using deep learning techniques.
They proposed deep learning algorithms and demonstrated that their method performs well for real-time on-node processing on smartphones and a wearable sensor platform, with these devices serving as data collection points.

3. Proposed Model

The proposed model consists of four phases.

Phase 1: We developed the proposed model for heart disease prediction using machine learning classification algorithms. The first step was data collection: data were gathered from different sources through sensors. For testing, we deployed sensors on the patient's body, including heart rate and blood pressure sensors, pulse oximeters, activity sensors, temperature sensors, respiration sensors, and electrocardiogram (ECG) sensors.
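Phase 1 does not specify how raw sensor streams become one analysis-ready row per patient. A minimal sketch, assuming readings arrive as hypothetical (patient_id, sensor_name, value) tuples and that each sensor series is summarized by its mean and maximum (an illustrative choice, not the study's documented procedure), could look like this:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw stream of (patient_id, sensor_name, value) tuples.
# Sensor names, units, and values are illustrative, not the study's schema.
RAW_READINGS = [
    ("p1", "heart_rate", 72.0), ("p1", "heart_rate", 80.0),
    ("p1", "spo2", 97.0), ("p1", "systolic_bp", 128.0),
    ("p2", "heart_rate", 95.0), ("p2", "heart_rate", 101.0),
    ("p2", "spo2", 93.0), ("p2", "systolic_bp", 146.0),
]


def summarize_readings(readings):
    """Group readings by patient and sensor, then reduce each series to its
    mean and maximum so that every patient becomes one fixed-length row."""
    series = defaultdict(lambda: defaultdict(list))
    for patient, sensor, value in readings:
        series[patient][sensor].append(value)
    rows = {}
    for patient, sensors in series.items():
        row = {}
        for sensor, values in sorted(sensors.items()):
            row[f"{sensor}_mean"] = mean(values)
            row[f"{sensor}_max"] = max(values)
        rows[patient] = row
    return rows


features = summarize_readings(RAW_READINGS)
print(features["p1"]["heart_rate_mean"])  # 76.0
```

Real deployments would also need time alignment and windowing of the streams; a mean/max summary is only one of many possible feature choices.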
Phase 2: The major steps were data cleansing, normalization, and feature engineering, together referred to as data preparation and analysis (DPA). The objective of this step was to prepare the data for model training. During DPA, we also identified relevant features in the dataset and performed standard statistical tests and correlation analysis. After feature engineering, we applied machine learning classification algorithms (RF, DTC, K-NN, SVM, GNB, AdaBoost, Bagging, KNN, and LR) for CVD prediction. Figure 1 presents the structural healthcare monitoring model for cardiovascular prediction. Random forest (RF) is an ensemble learning algorithm that builds multiple decision trees during training and outputs the class predicted by the majority of the individual trees. It is generally effective for high-dimensional data and for datasets with missing values. A decision tree (DT) splits the data into increasingly homogeneous subsets based on the most informative features; it is simple to interpret and can handle both numerical and categorical data. K-NN classifies an unseen patient record according to the majority class among its k nearest neighbours in the training data. A support vector machine (SVM) is a powerful classifier that finds the hyperplane separating the classes with the largest margin. It performs well on high-dimensional data and, through kernel functions, can handle both linearly and non-linearly separable data. Gaussian Naive Bayes (GNB) is a probabilistic method that computes the probability of each class given the input features, assuming that the features are conditionally independent and normally distributed within each class, and chooses the class with the highest probability. It is simple and fast, and can handle high-dimensional data.
AdaBoost is another ensemble learning technique; it combines multiple weak classifiers into a strong classifier by iteratively increasing the weight of misclassified samples. Because our dataset was large and imbalanced, we used this algorithm to address the class imbalance, since it can improve the performance of weak classifiers. Bagging is also an ensemble learning technique: it draws different bootstrap subsets of the training data, trains a classifier on each subset, and combines their predictions. We used bagging classifiers to reduce overfitting and to improve the generalization of our model to new, unseen data. Logistic regression (LR) was also used for classification, estimating the probability of a binary outcome. Mutual information feature selection (MIFS) selects relevant features by evaluating their mutual information with the target variable, that is, how much information each feature provides about the target, and keeps the features with the highest mutual information. MIFS is particularly beneficial for sensor data because it can capture strong dependencies, including non-linear ones, between the features and the target variable. By selecting the most relevant features, MIFS reduces the dimensionality of the dataset, which can improve the accuracy and efficiency of heart disease prediction models that use sensor data.
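The data preparation and feature selection steps of Phase 2 are described without concrete formulas. The minimal plain-Python sketch below assumes median imputation for missing readings, z-score normalization, Pearson correlation for the correlation analysis, and mutual information computed on already-binned features; these choices, the feature names, and the toy values are illustrative assumptions rather than the study's configuration. The ranking implements only the relevance criterion described in the text; Battiti's original MIFS algorithm additionally penalizes redundancy with already-selected features, which this sketch omits.

```python
from collections import Counter
from math import log2
from statistics import mean, median, pstdev


def impute_median(column):
    """Replace missing readings (None) with the median of the observed values."""
    fill = median([v for v in column if v is not None])
    return [fill if v is None else v for v in column]


def z_score(column):
    """Standardize to zero mean and unit population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def mutual_information(feature, target):
    """I(X;Y) in bits for two discrete sequences, from empirical frequencies."""
    n = len(feature)
    joint, px, py = Counter(zip(feature, target)), Counter(feature), Counter(target)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def top_k_by_mi(features, target, k):
    """Rank named discrete features by MI with the target and keep the best k."""
    scores = {name: mutual_information(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k], scores


# Toy data (hypothetical): systolic BP with one missing reading, binary label.
label = [0, 0, 1, 1, 0]
systolic = impute_median([120.0, None, 150.0, 160.0, 110.0])
print(systolic)                                     # [120.0, 135.0, 150.0, 160.0, 110.0]
print(round(pearson(z_score(systolic), label), 3))  # 0.886

# Toy binned features (hypothetical names) for mutual-information ranking.
target = [0, 0, 1, 1, 0, 1, 0, 1]
binned = {
    "bp_high":   [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the target: 1 bit
    "hr_high":   [0, 1, 1, 1, 0, 1, 0, 0],  # partially informative
    "device_id": [0, 0, 0, 0, 1, 1, 1, 1],  # independent of the target: 0 bits
}
selected, mi_scores = top_k_by_mi(binned, target, k=2)
print(selected)                                     # ['bp_high', 'hr_high']
```

Continuous sensor features would first need to be binned (or handled with a nearest-neighbour MI estimator) before this discrete formula applies.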
Phase 3: In this phase, we trained the model and tested its effectiveness. The best model was selected during training and then evaluated on unseen data. Performance metrics such as accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic (ROC) curve were evaluated, and hyperparameters were adjusted to improve CVD prediction accuracy.
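Phase 3 states that hyperparameters were adjusted but not how. One common approach, assumed here, is a grid search scored by k-fold cross-validation on the training data only. The plain-Python sketch below tunes the number of neighbours k for a from-scratch k-NN classifier on toy two-cluster data; the fold scheme, candidate values, and data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, point, k):
    """Majority vote among the k training points closest to `point` (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], point))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(x, y, k, folds=3):
    """Mean k-NN accuracy over `folds` folds.
    Interleaved (unshuffled) folds keep the example deterministic."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(x)) if i % folds == f]
        train_idx = [i for i in range(len(x)) if i % folds != f]
        tx = [x[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        correct = sum(knn_predict(tx, ty, x[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(x, y, candidate_ks, folds=3):
    """Return the k with the highest cross-validated accuracy, plus all scores."""
    results = {k: cv_accuracy(x, y, k, folds) for k in candidate_ks}
    best = max(results, key=results.get)  # first candidate wins ties
    return best, results


# Toy, well-separated clusters (hypothetical two-feature records).
class0 = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (1, 0.5)]
class1 = [(5, 5), (6, 5), (5, 6), (6, 6), (5.5, 5.5), (6, 5.5)]
x = [p for pair in zip(class0, class1) for p in pair]  # labels alternate 0, 1, 0, 1, ...
y = [i % 2 for i in range(len(x))]
best_k, scores = grid_search(x, y, candidate_ks=[1, 3, 5])
print(best_k, scores)  # 1 {1: 1.0, 3: 1.0, 5: 1.0}
```

With perfectly separated toy clusters every candidate scores 1.0, so the first one is kept; on real data the scores would differ, and the chosen setting would be refit on the whole training set before the single final evaluation on the held-out test set. Libraries such as scikit-learn provide the same procedure through `GridSearchCV`.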
Phase 4: The model was validated with the performance metrics on the test dataset, which was originally collected from patients as sensor data, to ensure that it was not overfitting the training data.
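A hold-out test set only guards against overfitting if it stays separate from training and reflects the class balance of the data. The sketch below, under stated assumptions, shows a stratified hold-out split and a simple train/test accuracy-gap check; the 25% test fraction, the 0.05 tolerance, and the toy labels are illustrative values, not parameters reported in the study.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling the same fraction from each class
    so that the test set keeps the original class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        indices = indices[:]  # copy before shuffling
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx += indices[:n_test]
        train_idx += indices[n_test:]
    return sorted(train_idx), sorted(test_idx)


def overfitting_gap(train_accuracy, test_accuracy, tolerance=0.05):
    """Return the accuracy gap and whether it exceeds `tolerance`
    (an illustrative threshold, not one taken from the study)."""
    gap = train_accuracy - test_accuracy
    return gap, gap > tolerance


labels = [0] * 80 + [1] * 20  # imbalanced toy labels (20% positive)
train_idx, test_idx = stratified_split(labels)
positives_in_test = sum(labels[i] for i in test_idx)
print(len(test_idx), positives_in_test)  # 25 5
print(overfitting_gap(0.99, 0.88))
```

A large positive gap between training and test accuracy is a common warning sign of overfitting; cross-validation on the training portion gives a more stable estimate than a single split.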

4. Results and Discussion

Table 1 presents a comparative analysis of the classification models, reporting accuracy, precision, recall, F1-score, true-negative rate (TNR), and true-positive rate (TPR). Nine classification models were created to implement the proposed model: random forest (RF), decision tree classifier (DTC), K-nearest neighbour (K-NN), support vector machine (SVM), Gaussian Naive Bayes (GNB), AdaBoost, bagging, KNN, and logistic regression (LR). The SVM achieved the best accuracy (93.87%), while LR achieved the highest TPR (88%), with a precision of 90.20%. Table 2 summarizes the accuracy, AUC, TNR, TPR, and F-score values derived from the confusion matrices and ROC curves of the various classifiers.
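All of the Table 1 metrics derive from the four confusion-matrix counts. The sketch below makes those definitions explicit; the counts are hypothetical and are not taken from Tables 1 or 2.

```python
def classification_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts.
    Recall and TPR are the same quantity; TNR is also called specificity."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # = TPR (sensitivity)
    tnr = tn / (tn + fp)     # specificity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall/TPR": recall, "TNR": tnr, "F1": f1}


# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.85, 'precision': 0.8, 'recall/TPR': 0.8889, 'TNR': 0.8182, 'F1': 0.8421}
```

The F1-score is the harmonic mean of precision and recall, equivalently 2·TP / (2·TP + FP + FN), which here gives 80/95.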

The Critical Observation

In these results, the SVM obtained the highest accuracy and AdaBoost the lowest; accuracies ranged from 85.20% to 93.87%. The AUC ranged from 0.85 to 0.93. The TPR and TNR are important measures for any classifier. The highest TPR and TNR were both obtained with LR (88% and 90.4%, respectively), whereas the lowest TPR was obtained with the DTC (54%). In terms of F1-score and precision, the SVM performed best (93.35% and 93.33%, respectively), making it the most reliable classifier for identifying positive cases.

Table 3 presents an error analysis of the machine learning algorithms. We used the nine models to report performance in Tables 1 and 2 and errors in Table 3. The mean-squared error (MSE) and root-mean-squared error (RMSE) are commonly used error measures for classification models. Based on the results in Table 3, the SVM had the lowest MSE and RMSE, indicating that it was the most accurate model for heart disease prediction. LR had the greatest MSE and the second-highest RMSE, indicating the largest prediction error by these measures.
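The text does not state whether MSE and RMSE were computed on predicted labels or on predicted probabilities. The sketch below, with toy values, shows both cases as an assumption-labeled illustration: for hard 0/1 predictions, MSE equals the misclassification rate (1 − accuracy), while for predicted probabilities it equals the Brier score. In either case, RMSE is the square root of MSE.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error between true labels and predictions
    (hard 0/1 labels or predicted probabilities)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
hard = [1, 0, 0, 1, 0, 1, 1, 0]               # two misclassifications out of eight
proba = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

mse_hard = mse(y_true, hard)                  # equals the error rate, 1 - 0.75
mse_proba = mse(y_true, proba)                # also known as the Brier score
print(mse_hard, sqrt(mse_hard))               # 0.25 0.5
print(round(mse_proba, 4), round(sqrt(mse_proba), 4))  # 0.125 0.3536
```

Because RMSE is defined as the square root of MSE, the two measures always rank models in the same order.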

This is a binary classification task, and the model achieved an AUC of 0.94, indicating that it separated the positive and negative classes well. AUC values range from 0 to 1, where 0.5 corresponds to a random classifier and 1 to a perfect classifier. The AUC of 0.94 shown in Figure 2 therefore suggests a good level of discrimination between the positive and negative classes. Figure 2 illustrates the performance of the binary classifier as a ROC curve, which plots the true-positive rate against the false-positive rate. Figure 3 presents a boxplot representation of CVD prediction.
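To make the AUC scale concrete, the sketch below computes AUC from its probabilistic interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are toy values; the pairwise loop is quadratic and meant only for illustration (library routines such as scikit-learn's `roc_auc_score` are preferable for real data).

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1, 0, 1]
print(roc_auc(y, [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]))  # 1.0: every positive outranks every negative
print(roc_auc(y, [0.5] * 6))                       # 0.5: constant scores, no discrimination
print(roc_auc(y, [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]))  # 0.0: ranking fully inverted
```

The middle case shows why 0.5, not 0, marks a classifier with no discriminative ability.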

Figure 4 shows the classification results for heart disease prediction. The highest precision, 93.33%, indicates that few of the cases predicted as positive were false positives. Figure 5 shows the normalized confusion matrices for the heart disease classification task. Each is a square matrix in which columns represent predicted classes and rows represent actual classes. Each entry is divided by the total number of samples in its actual class, which is why the matrix is called normalized; this allows performance to be compared across classes even though they contain different numbers of samples. The diagonal entries indicate correctly classified observations, and the off-diagonal entries indicate misclassified observations. A classifier is ideal when every diagonal value equals one, meaning that all observations are correctly classified.
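The row normalization described above can be sketched in plain Python. The labels are toy values. Rows are actual classes and columns are predicted classes, so after each row is divided by its total, the diagonal holds the per-class recall (the TNR for class 0 and the TPR for class 1 in a binary task).

```python
def confusion_matrix(y_true, y_pred, classes):
    """Counts with rows = actual class and columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total so that row i gives the fraction of
    actual-class-i samples assigned to each predicted class."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


y_true = [0, 0, 0, 0, 1, 1]  # toy labels: the classes have different sizes
y_pred = [0, 0, 0, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred, classes=[0, 1])
print(cm)                    # [[3, 1], [0, 2]]
print(normalize_rows(cm))    # [[0.75, 0.25], [0.0, 1.0]]
```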

5. Conclusions

The heart disease prediction model developed in this study has the potential to improve patient outcomes and lower healthcare costs by identifying patients at risk of developing heart disease and offering appropriate interventions and treatments. Future research can further explore sensor networks, the IoT, and machine learning in healthcare, enabling more accurate and effective predictive models for heart disease and other diseases. The SVM had the highest accuracy (93.87%), followed by LR (92.25%). Overall, the SVM and LR appear to be the most effective models for this dataset, while AdaBoost may need further optimization. The SVM had the best overall performance, with the lowest mean-squared error (MSE) and root-mean-squared error (RMSE), 0.18752 and 0.32589, respectively.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available on Kaggle (Framingham Heart Study dataset) at https://www.kaggle.com/datasets/aasheesh200/framingham-heart-study-dataset (accessed on 5 February 2023).

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Tang, C.; Liu, Z.; Li, L. Mechanical sensors for cardiovascular monitoring: From battery-powered to self-powered. Biosensors 2022, 12, 651.
2. Lin, J.; Fu, R.; Zhong, X.; Yu, P.; Tan, G.; Li, W.; Zhang, H.; Li, Y.; Zhou, L.; Ning, C. Wearable sensors and devices for real-time cardiovascular disease monitoring. Cell Rep. Phys. Sci. 2021, 2, 100541.
3. Esther, G.M.; Ahila, S.S.; Kumar, P.H. Coronary Heart Disease (Cad) Monitoring System Based On Wireless Sensors. J. Phys. Conf. Ser. 2019, 1362, 012045.
4. Salvi, S.; Dhar, R.; Karamchandani, S. IoT-Based Framework for Real-Time Heart Disease Prediction Using Machine Learning Techniques. In Innovations in Cyber Physical Systems: Select Proceedings of ICICPS 2020; Springer: Singapore, 2021; pp. 485–496.
5. Kumar, R.; Kumar, P.; Kumar, Y. Time series data prediction using IoT and machine learning technique. Procedia Comput. Sci. 2020, 167, 373–381.
6. Kumar, P.M.; Gandhi, U.D. A novel three-tier Internet of Things architecture with machine learning algorithm for early detection of heart diseases. Comput. Electr. Eng. 2018, 65, 222–235.
7. Yin, H.; Jha, N.K. A health decision support system for disease diagnosis based on wearable medical sensors and machine learning ensembles. IEEE Trans. Multi-Scale Comput. Syst. 2017, 3, 228–241.
8. Ravi, D.; Wong, C.; Lo, B.; Yang, G.Z. A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE J. Biomed. Health Inform. 2016, 21, 56–64.

Figure 1. Structural healthcare monitoring model for cardiovascular prediction.

Figure 2. True-positive rate vs. false-positive rate.

Figure 3. Boxplot representation of CVD prediction.

Figure 4. Classification results for heart disease data.

Figure 5. Normalized confusion matrices.

Table 1. Comparative analysis of the classification models.

Algorithm	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	TNR	TPR
RF	89.90	88.47	87.07	85.11	78.4%	84%
DTC	88.87	85.62	86.82	82.30	82.5%	54%
K-NN	87.16	86.34	85.99	88.40	89.2%	82%
SVM	93.87	93.33	93.67	93.35	77.6%	82%
GNB	87.25	89.20	87.54	91.20	89.2%	86%
AdaBoost	85.20	88.45	85.56	87.20	78.6%	84%
Bagging	89.54	89.99	90.20	81.25	87.7%	86%
KNN	87.55	89.25	88.89	90.20	89.7%	87%
LR	92.25	90.20	89.99	90.09	90.4%	88%

Table 2. Confusion-matrix rates and ROC AUC values for the various classifiers.

Model	Accuracy (%)	AUC	TNR	TPR	F-Score (%)
RF	89.90	0.88	78.4%	84%	85.11
DTC	88.87	0.87	82.5%	54%	82.30
K-NN	87.16	0.86	89.2%	82%	88.40
SVM	93.87	0.91	77.6%	82%	93.35
GNB	87.25	0.88	89.2%	86%	91.20
AdaBoost	85.20	0.85	78.6%	84%	87.20
Bagging	89.54	0.90	87.7%	86%	81.25
KNN	87.55	0.87	89.7%	87%	90.20
LR	92.25	0.93	90.4%	88%	90.09

Table 3. Error analysis (MSE and RMSE) of the classification models.

Model	MSE	RMSE
RF	0.37322	0.47185
DTC	0.33547	0.57805
K-NN	0.45281	0.59233
SVM	0.18752	0.32589
GNB	0.32558	0.50785
AdaBoost	0.45287	0.60890
Bagging	0.32897	0.54780

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
