Predicting Abnormal Respiratory Patterns in Older Adults Using Supervised Machine Learning on Internet of Medical Things Respiratory Frequency Data

Santana-Mancilla, Pedro C.; Castrejón-Mejía, Oscar E.; Fajardo-Flores, Silvia B.; Anido-Rifón, Luis E.

doi:10.3390/info14120625


by Pedro C. Santana-Mancilla ¹, Oscar E. Castrejón-Mejía ¹, Silvia B. Fajardo-Flores ¹ and Luis E. Anido-Rifón ²,*

¹ School of Telematics, Universidad de Colima, Colima 28040, Mexico
² atlanTTic Research Center, School of Telecommunications Engineering, University of Vigo, 36310 Vigo, Spain
* Author to whom correspondence should be addressed.

Information 2023, 14(12), 625; https://doi.org/10.3390/info14120625

Submission received: 10 October 2023 / Revised: 16 November 2023 / Accepted: 17 November 2023 / Published: 21 November 2023

(This article belongs to the Special Issue Blending Artificial Intelligence and Machine Learning with the Internet of Things: Emerging Trends, Issues and Challenges)


Abstract:

Wearable Internet of Medical Things (IoMT) technology, designed for non-invasive respiratory monitoring, has demonstrated considerable promise in the early detection of severe diseases. This paper introduces the application of supervised machine learning techniques to predict respiratory abnormalities through frequency data analysis. The principal aim is to identify respiratory-related health risks in older adults using data collected from non-invasive wearable devices. This article presents the development, assessment, and comparison of three machine learning models, underscoring their potential for accurately predicting respiratory-related health issues in older adults. The convergence of wearable IoMT technology and machine learning holds immense potential for proactive and personalized healthcare among older adults, ultimately enhancing their quality of life.

Keywords:

internet of medical things; respiratory monitoring; abnormal respiratory patterns; predictive machine learning; older adults; wearable technology

1. Introduction

In modern times, technology has become an integral part of our daily routine, with numerous initiatives aimed at improving our lives through tech-based solutions. This project specifically targets individuals aged 60 and above, a group that is vulnerable in terms of health. Around 26.3% of this population lacks health insurance coverage and faces difficulties regarding companionship and care due to changing family dynamics [1].

The growing global population of senior citizens is a critical matter that demands our attention. In 1950, there were 200 million people aged 60 and above, but this number exceeded 350 million by 1975. According to recent projections, this figure is expected to double by 2025 [2]. Older adults who have been inactive and confined to bed for extended periods may develop cardiovascular, respiratory, and musculoskeletal problems, exacerbating their illnesses [3]. Furthermore, this age group tends to recover more slowly, with 33% experiencing difficulties performing one or more daily activities among those aged 60 to 79, increasing to 50% for those over 80 [3].

Respiratory infections affecting the lower tract are a severe health risk for older individuals worldwide. Two types of lower respiratory tract infections (LRTIs) that are of concern are chronic obstructive pulmonary disease (COPD) and pneumonia [4]. With the past COVID-19 pandemic, these respiratory illnesses became the most fatal infectious diseases globally, with older adults being the most affected and requiring lengthy hospital stays [5,6,7]. The health emergency has increased the pressure on hospitals and clinics, emphasizing the need to develop healthcare models that allow patients to receive treatment at home. These new healthcare models have enhanced the quality of life for older individuals, reduced medical costs, and lessened the stress of hospitalization or transfer [8].

Internet of Medical Things (IoMT) technology is a promising solution in the medical field. The IoMT connects various medical instruments, devices, or sensors to the internet, enabling real-time data collection on a user’s physical condition and accurate diagnosis [9,10]. Additionally, integrating machine learning (ML) algorithms into this system has been proposed. ML employs historical data to create computer programs capable of autonomously solving problems effectively [11], which becomes a critical component for early detection of abnormal respiratory patterns and health risk forecasting for older adults.

New non-invasive wearable technology for respiratory monitoring has produced encouraging clinical results in early disease detection. When used correctly, this technology could significantly reduce hospitalizations and mortality rates from severe respiratory ailments, enhancing the quality of life for those affected.

Our research aims to develop a machine learning model that can be integrated into IoMT systems for respiratory monitoring. It will help in predicting potential health risks in older adults. We have established specific goals to achieve this, such as exploring different respiratory monitoring sensors, creating non-invasive IoMT system prototypes, and training our machine learning model to identify abnormal breathing patterns accurately. This approach supports enhancing respiratory health monitoring in older adults by combining advanced technology with predictive analytics.

2. Related Works

Recent research has seen increased use of machine learning and deep learning techniques for respiratory monitoring and the detection of abnormal breathing patterns. Argerich et al. [12] studied elderly patients to detect periodic breathing. The research found that the features extracted from breathing patterns and standard clinical variables are solid indicators for identifying periodic breathing events.

Furthermore, Farrukh et al. [13] contributed to the field with an innovative IoMT-enabled smart healthcare model tailored to monitor the elderly population. By utilizing machine learning techniques on IoMT datasets, the study aimed to enhance healthcare monitoring for elderly individuals.

Pham et al. [14] introduced a novel approach by merging deep learning frameworks to predict respiratory anomalies, including inception-based and transfer learning-based models. The results demonstrated that this ensemble approach outperformed existing state-of-the-art systems, promising advancements in respiratory anomaly detection.

Wang et al. [15] introduced an online measuring system to pursue unobtrusive and real-time classification of abnormal respiratory patterns. This system uses deep neural networks and depth cameras to automatically classify abnormal respiratory patterns, with potential for large-scale applications. The study examined respiratory patterns and the differences between COVID-19 and other respiratory illnesses like the flu and the common cold.

Jin et al. [16] conducted a study using the IoMT to predict outcomes for outpatients. The study found that a multi-dimensional prediction model performed better than other models in comparative experiments, indicating the potential of the IoMT in improving outpatient care.

These findings highlight the potential of supervised machine learning on the IoMT for predicting abnormal respiratory patterns in older adults.

3. Materials and Methods

3.1. Data Acquisition

Our study used four respiratory Internet of Medical Things (IoMT) prototypes to collect data. These systems were created to work with a machine learning model so that we could detect abnormal breathing patterns in older adults and anticipate any potential health problems.

We created three IoMT prototypes using an ESP8266 microcontroller and the Arduino development environment. Each prototype used different sensors and methods to collect data. We used a system-on-chip (SoC) sensor for the fourth prototype, eliminating the need for an extra microcontroller. The prototypes were based on tutorials from BioMakers University [17].

The sensors in these systems detect signals when a person breathes. These signals are analyzed to differentiate breaths from other movements, and the respiratory rate in respirations per minute (RPM) is then determined by combining temporal data with the sensor signals. The RPM data are saved for later use in creating a dataset.
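As a rough illustration of how a respiratory rate can be derived from a sampled signal, the following minimal sketch counts upward threshold crossings and ignores crossings that follow each other too closely. It is not the firmware of the prototypes: the function name `estimate_rpm`, the threshold, the minimum gap between breaths, and the synthetic sine signal are all assumptions made for this example.

```python
import math

def estimate_rpm(samples, sample_rate_hz, threshold=0.0, min_gap_s=1.0):
    """Estimate respirations per minute by counting upward threshold crossings.

    Hypothetical helper: `threshold` and `min_gap_s` are illustrative values,
    not parameters taken from the prototypes.
    """
    min_gap = int(min_gap_s * sample_rate_hz)
    breaths = 0
    last_crossing = -min_gap  # allow a crossing right at the start
    for i in range(1, len(samples)):
        rising = samples[i - 1] <= threshold < samples[i]
        # Skip crossings that come too soon after the previous one:
        # they are more likely noise or body movement than a new breath.
        if rising and i - last_crossing >= min_gap:
            breaths += 1
            last_crossing = i
    duration_min = len(samples) / sample_rate_hz / 60.0
    return breaths / duration_min if duration_min > 0 else 0.0

# Synthetic 60 s signal sampled at 10 Hz with a 0.25 Hz breathing cycle,
# i.e. one breath every 4 s (15 breaths per minute).
rate = 10
signal = [math.sin(2 * math.pi * 0.25 * (n / rate) - 0.5) for n in range(60 * rate)]
rpm = estimate_rpm(signal, rate)
print(rpm)
```

A real sensor signal would normally need filtering before this step; the minimum-gap rule is only the simplest possible way to separate breaths from other movements.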

The four prototypes are described below.

3.1.1. Microphone Sensor Prototype

As part of our research, we developed a prototype that utilizes a KY-037 microphone sensor, a NodeMCU ESP8266, and a 128 × 64 pixel OLED display. This device is designed to capture acoustic signals produced during the breathing cycle of our study participants, as shown in Figure 1. The microphone sensor is integrated into a facial mask to record changes in airflow and sound associated with inhalation and exhalation. The data collected from this prototype are vital in identifying abnormal breathing patterns in older adults.

3.1.2. Gas Sensor Prototype

The second prototype combines a gas sensor (MQ-3 model) with a NodeMCU ESP8266. The gas sensor was integrated into participants’ facial masks (see Figure 2) to monitor the chemical composition of the air they inhaled and exhaled. This prototype enabled researchers to gather essential data on the concentration of respiratory gases, which proved helpful in identifying abnormal breathing patterns in older adults.

3.1.3. Movement Sensor Prototype

For accurate measurement of chest movement during respiratory cycles, the third prototype included an accelerometer and gyroscope sensor integrated with an ESP8266 NodeMCU, as shown in Figure 3. To ensure precise recording, the sensor was placed over the chest of the participants in a supine position using a strap that held the sensor firmly against the skin while allowing for unrestricted breathing. This placement allowed the sensor to detect variations in diaphragmatic movement accurately. In addition, a radar sensor was placed in the room as a reference sensor to monitor breathing. The sensor placement accuracy and the data’s reliability were verified and compared with simultaneous radar sensing. The resulting data provide essential information for evaluating abnormal breathing patterns in older adults and offer precise details on the kinetics of respiratory movements.

3.1.4. Radar Sensor Prototype

A fourth prototype was created using a radar sensor called XeThru X4M200 [18]. This sensor operates as a system-on-chip (SoC), removing the need for an additional microcontroller (see Figure 4). The sensor manufacturer provided software for data acquisition, which allowed real-time data acquisition when the sensor was placed between 1 and 5 m from the study subjects. The software provided critical data, including the number of respirations per minute (RPM). This sensor emits low-energy radio waves that reflect off objects, including the human body. By measuring the changes in the pattern of reflected waves caused by the expansion and contraction of the chest during breathing, the sensor calculates the respiratory rate. This non-invasive detection method is particularly beneficial for continuous respiratory monitoring in clinical or home settings, as it minimizes patient discomfort and improves ease of use. Using this radar sensor prototype resulted in the generation of crucial data necessary for building a dataset to study abnormal breathing patterns in older adults.

The selection of diverse IoMT prototypes and sensors allowed for a comprehensive respiratory data collection, enriching our machine learning model and enhancing our ability to predict abnormal breathing patterns in older adults.

3.2. Dataset Construction

To move forward with the project, it was crucial to create a dataset. We used data from the three prototypes we developed and the XeThru X4M200 sensor to accomplish this. This dataset forms the basis for training the machine learning model, which will help predict abnormal respiratory rates in patients and provide more precise information about their future health.

We collected data from three individuals, aged 19 to 24 and of both genders. We used the XeThru X4M200 radar sensor to test the prototypes we created. This method ensured that the recorded information was accurate and that the data calculated by the IoMT prototypes were consistent with the radar sensor’s readings, especially regarding respiratory rates (RPM). We carefully selected various parameters to include in the dataset, which was organized in the following structure:

Age: The age of the patient (numeric).
RPM (respirations per minute): Numeric representation of the respiratory rate.
Sex: The gender of the patient (numeric):
- 0 = female
- 1 = male
Normal: Indicator of whether the RPM falls within the normal range based on the patient’s age and the ranges specified in the existing literature (numeric):
- 0 = no
- 1 = yes

Wheatley [19] documented the classification of respiratory rates in adult patients. Following other authors [20], we used this classification to define the “Normal” parameter:

Eupnea (normal relaxed breathing): 12–20 RPM
Normal range > 65 years: 12–25 RPM
Normal range > 80 years: 10–30 RPM
Bradypnea (slow respiratory rate): <12 RPM
Tachypnea (fast respiratory rate): >20 RPM
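The ranges listed above can be turned into a small labeling function for the “Normal” column. This is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, and treating the age limits as strictly greater than 65 and 80 with inclusive RPM bounds is an interpretation of the list, which does not spell out its boundary handling.

```python
def normal_rpm_range(age):
    """Return the (low, high) RPM range treated as normal for a given age.

    Assumption: "> 65" and "> 80" are strict, and range limits are inclusive.
    """
    if age > 80:
        return (10, 30)
    if age > 65:
        return (12, 25)
    return (12, 20)  # eupnea range for younger adults

def label_normal(age, rpm):
    """Return 1 if the RPM lies within the normal range for the age, else 0."""
    low, high = normal_rpm_range(age)
    return 1 if low <= rpm <= high else 0

# Hypothetical records using the dataset's column layout.
records = [
    {"age": 62, "rpm": 18, "sex": 0},
    {"age": 70, "rpm": 23, "sex": 1},
    {"age": 62, "rpm": 23, "sex": 1},
    {"age": 85, "rpm": 11, "sex": 0},
]
for record in records:
    record["normal"] = label_normal(record["age"], record["rpm"])

print([r["normal"] for r in records])
```

Under this reading, the same rate of 23 RPM is labeled normal for a 70-year-old but abnormal for a 62-year-old, which is why age has to be part of the labeling rule.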

During this data collection, a dataset with 213 records was created. It is essential to mention that the COVID-19 pandemic had a significant impact and posed limitations. Due to this, it was not possible to collect respiratory data from a larger sample size, as some of the prototypes required direct respiratory involvement.

3.3. Dataset Augmentation

Given the limited number of records obtained with the IoMT prototypes, owing to the constraints imposed by health regulations, we decided to perform data augmentation on the dataset. After a literature search, we found a comprehensive health dataset that included the respiratory rate variable from pulse oximeters and various age groups among adults. We processed this dataset to match the structure of our proposed dataset, which resulted in approximately 25,000 records from individuals aged between 40 and 90 years, including both genders. Pulse oximetry devices also fall within the category of medical devices connected to the IoT, as they can provide real-time health data for patients. The choice of this dataset therefore supports our objective of using data from an IoMT system to address the health issue we are investigating.

Subsequently, the age range was narrowed to align with this project’s scope, primarily focused on seniors aged 60 or older. Consequently, we obtained a dataset comprising 16,512 records featuring data from adults aged between 60 and 90 years, encompassing both genders [21].

By utilizing data augmentation, we significantly improved our dataset, increasing the quantity and diversity of respiratory rate data. This approach helped us to include a more comprehensive range of older adults, making our machine learning model more effective in identifying abnormal respiratory patterns within this population. Despite the difficulties presented by the pandemic, data augmentation allowed us to overcome data constraints to validate our research findings.

In summary, we obtained records from individuals aged between 19 and 24 during the initial data collection phase. This helped us validate the accuracy and functionality of our IoMT prototypes. However, as our target group comprised older adults, we needed to expand our dataset to reflect their age range. To achieve this, we resorted to data augmentation and included records from individuals between 40 and 90 years of age. We selected this age range based on the availability of reliable datasets.

Subsequently, to ensure our research aligned with the specific goal of studying abnormal breathing patterns in older people, we narrowed our age range to 60 and older. This final selection of 16,512 records represents the study’s target population, enabling us to focus on the trends and patterns most relevant to this population.

3.4. Machine Learning Models

We apply three distinct machine learning algorithms, namely K-nearest neighbors (K-NN), support vector machine (SVM), and gradient boosting, to our comprehensive dataset. The main objective is to determine which algorithm is most effective in predicting abnormal respiratory patterns in elderly individuals. By utilizing the features provided, these algorithms categorize respiratory patterns as either normal or abnormal.

The K-nearest neighbors (K-NN) algorithm classifies data points based on the majority class among their k-nearest neighbors [22]. This proximity-based approach will help us differentiate between normal and abnormal respiratory patterns by measuring the proximity of data points in our feature space. We will explore various values of “k” to find the optimal number of neighbors for our dataset.
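The majority-vote idea behind K-NN can be shown in a few lines. The sketch below uses Euclidean distance on two features and a handful of made-up training points; the function name `knn_predict`, the points, and the choice of k = 3 are assumptions for illustration and do not reflect the configuration used in the study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (age, RPM) points; 1 = normal, 0 = abnormal.
train_X = [(65, 14), (70, 16), (68, 18), (72, 28), (66, 8), (75, 30)]
train_y = [1, 1, 1, 0, 0, 0]

pred_typical = knn_predict(train_X, train_y, (69, 17))
pred_fast = knn_predict(train_X, train_y, (73, 29))
print(pred_typical, pred_fast)
```

Because the vote depends entirely on distances, features on larger numeric scales would dominate the result, so in practice the inputs are scaled before this step.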

We will also use a support vector machine (SVM) as a classification algorithm. SVM is versatile and robust, known for its ability to find a hyperplane that maximizes the margin between different classes in the data [23]. Our study aims to categorize respiratory patterns into either normal or abnormal. We will fine-tune different kernel functions and hyperparameters to optimize SVM’s performance.

In machine learning, gradient boosting is an ensemble learning technique that combines the predictions of multiple weak learners to build a predictive model [24]. This approach iteratively minimizes prediction errors to enhance the model’s accuracy. As with the other algorithms, we will experiment with different hyperparameters to optimize its performance in our analysis.
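To make the iterative error-minimization idea concrete, here is a deliberately simplified sketch of gradient boosting: one-split decision stumps on a single feature are fitted to the residuals of a squared-error loss and added with a learning rate. It is an assumption-laden toy, not the model trained in this study; the function names, the squared-error loss, the number of rounds, and the RPM values are all chosen for illustration only.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical RPM values; label 1 (normal) inside 12-20, else 0.
xs = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
ys = [1 if 12 <= x <= 20 else 0 for x in xs]
model = fit_gradient_boosting(xs, ys)
print(round(model(16), 2), round(model(8), 2), round(model(26), 2))
```

A single stump cannot represent an interval such as 12 to 20 RPM, but the sum of several stumps can, which is the sense in which weak learners are combined into a stronger model.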

3.4.1. Data Preprocessing

Prior to applying these algorithms, we conducted an exploratory data analysis (EDA) to better understand the dataset. This process involves analyzing and visualizing patterns, relationships, and trends within the data before moving on to building and training models. Exploring the respiratory rate data allowed us to identify key features relevant to our predictive task [25].

Addressing class imbalance: Given the potential class imbalance between normal and abnormal respiratory patterns, we applied oversampling and under-sampling to balance the dataset [26].
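One simple form of rebalancing is random oversampling, in which minority-class rows are duplicated until the classes are the same size. The sketch below shows only that variant (under-sampling would instead drop majority rows); the function name, the seed, and the toy rows are assumptions for illustration and not the procedure used in the study.

```python
import random
from collections import Counter

def random_oversample(rows, label_key="normal", seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced sample: 8 normal rows and 2 abnormal rows.
rows = [{"rpm": 16, "normal": 1}] * 8 + [{"rpm": 28, "normal": 0}] * 2
balanced = random_oversample(rows)
counts = Counter(r["normal"] for r in balanced)
print(counts)
```

Resampling is usually applied to the training portion only, so that duplicated rows cannot leak into the data used for evaluation.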

Feature scaling: We normalized the features to ensure that all variables contribute equally to the models’ predictions, regardless of their scale [27].

Data partitioning: The dataset was split into training and testing subsets to evaluate the performance of each algorithm. The split involved reserving 80% of the data for training and 20% for testing [28].

3.4.2. Model Evaluation

To gauge the performance of each machine learning algorithm, we utilized standard evaluation metrics [29], including precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC).

\mathrm{Precision} = \frac{TP}{TP + FP}

\mathrm{Recall} = \frac{TP}{TP + FN}

\mathrm{F1\ score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

where:

TP = true positives
FP = false positives
FN = false negatives

AUC-ROC: The area under the ROC curve depicting the models’ trade-off between true and false positive rates.

These metrics provided insights into the algorithms’ capabilities to accurately classify normal and abnormal respiratory patterns while also considering false positives and negatives.
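Unlike precision and recall, AUC-ROC is computed from scores rather than hard labels. One equivalent way to obtain it is as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below implements that pairwise definition; the function name and the example scores are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked in the right order."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores for five records.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = auc_roc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of records, so it is suitable for explanation rather than for large datasets, where a sort-based computation is preferable.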

3.4.3. Hyperparameter Tuning

We undertook hyperparameter tuning techniques to identify the optimal hyperparameters for each algorithm [30]. This process enabled us to maximize the predictive ability of each model.

3.4.4. Comparison

We compared the performance of the three models to discern the most effective algorithm for our predictive task [31]. This process guided us toward selecting the best model for predicting abnormal respiratory patterns in older adults.

3.4.5. Cross-Validation

To mitigate the risk of overfitting and obtain more robust performance estimates, we employed the K-fold cross-validation method [32].

Ultimately, the results derived from applying K-NN, SVM, and gradient boosting empowered us to identify the most appropriate model for predicting abnormal respiratory patterns in older adults. The algorithm demonstrating the highest predictive accuracy and robustness will be recommended for further integration into healthcare applications.

4. Results

In this section, we provide a detailed analysis of the findings from our research, which focuses on the ability of machine learning models to predict abnormal respiratory patterns in older adults. We examine the evaluation metrics, feature importance, and confusion matrices of each algorithm to gain a deeper understanding of the potential of applying machine learning to IoMT wearable technology for respiratory healthcare in older adults.

4.1. Exploratory Data Analysis

Exploratory data analysis (EDA) enables us to uncover essential insights from our dataset, laying the groundwork for subsequent modeling efforts. In this section, we present the outcomes of our EDA: the distributions of key numeric variables, the class imbalances, and the relationships between gender and respiratory patterns (as seen in Figure 5). Furthermore, we explore the correlations among these variables to inform our feature selection process.

4.1.1. Numeric Variable Distributions

The age distribution appears uniform, indicating a good representation of different age groups within the dataset.

The RPM distribution seems right-skewed, suggesting a higher frequency of lower values.

4.1.2. Variable Correlations

The correlations between variables are relatively low overall. The strongest correlation (though still moderate) is between RPM and respiratory pattern normality (−0.324), which makes sense, as higher or lower than normal RPM directly indicates abnormal respiratory patterns.
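A linear correlation coefficient can look modest even when a label depends strongly on a variable, because values that are abnormal on both the low and the high side partly cancel out. The following sketch computes the Pearson coefficient on made-up values to illustrate this effect; it is not an analysis of the study's data, and the function name and sample values are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical RPM values: abnormal (0) when too low or too high.
rpm = [8, 10, 14, 16, 18, 20, 24, 28]
normal = [0, 0, 1, 1, 1, 1, 0, 0]
r = pearson(rpm, normal)
print(round(r, 3))
```

In this toy sample the label is fully determined by RPM, yet the coefficient stays close to zero, which suggests reading a low correlation with a range-based label with some caution.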

The other variables, “age” and “sex”, have lower correlations with “normal”, but they could still contribute to the predictive power of the model.

4.2. Class Imbalance

There is an evident class imbalance in the distribution of the target variable (normal) with “Yes” representing a substantial majority at 87.96%, while “No” accounts for the remaining 12.04%. This striking disparity implies that our predictive models could inadvertently lean towards favoring the prevailing class, potentially undermining their ability to effectively detect abnormal respiratory patterns. This factor should be considered during the preprocessing stage to avoid model bias.
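The practical risk of this imbalance can be illustrated with a trivial baseline that labels every record as normal. The sketch below builds a hypothetical sample of 10,000 labels with the reported 87.96% / 12.04% split (the sample size and the function name are assumptions) and shows that such a baseline reaches high accuracy while detecting no abnormal pattern at all.

```python
def always_normal_metrics(labels):
    """Accuracy and abnormal-class recall of a model that always predicts 1 (normal)."""
    predictions = [1] * len(labels)
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    abnormal_total = sum(1 for y in labels if y == 0)
    abnormal_found = sum(
        1 for y, p in zip(labels, predictions) if y == 0 and p == 0
    )
    accuracy = correct / len(labels)
    recall_abnormal = abnormal_found / abnormal_total if abnormal_total else 0.0
    return accuracy, recall_abnormal

# Hypothetical sample reproducing the reported class shares.
labels = [1] * 8796 + [0] * 1204
accuracy, recall_abnormal = always_normal_metrics(labels)
print(accuracy, recall_abnormal)
```

This is why metrics such as precision, recall, and F1 score are more informative than accuracy alone when one class is rare.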

Respiratory patterns vary slightly between genders, but the difference is not significant. Both genders generally exhibit typical respiratory patterns; however, this aspect requires further examination during the modeling process.

Figure 6 shows the class imbalances.

4.3. Feature Scaling

To ensure that all variables contributed equally to our machine learning models, we employed the standard scaler to standardize the “Age” and “RPM” features. This step prevents features with larger scales from overshadowing subtler yet important predictors.

In machine learning algorithms, it is necessary to preprocess data to ensure accurate model results. A standard practice in this regard is feature scaling. Specific algorithms such as K-nearest neighbors (K-NN) and support vector machine (SVM) are sensitive to variation in feature scales. Therefore, it is necessary to normalize features to ensure that each variable has equal importance in determining the distance between data points, regardless of its original scale. This is important as it prevents features with higher orders of magnitude from being disproportionately weighted in the final model decisions. In our study, we apply normalization to align with these principles and ensure that the results are as accurate and fair as possible.

Table 1 provides a glimpse into the first 5 rows of the scaled training set features and their corresponding labels.

These standardized feature values, denoted as “Age” and “RPM”, have been transformed to have a mean of 0 and a standard deviation of 1, ensuring a consistent scale for our model inputs. Additionally, the “Normal” labels indicate whether the respiratory patterns for each corresponding record are considered normal (1) or not (0).
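The transformation to zero mean and unit standard deviation is the z-score, z = (x - mean) / std. The sketch below assumes that the standard scaler behaves like a typical z-score scaler based on the population standard deviation, and that the statistics are fitted on the training set and then reused for the test set; the function names and age values are hypothetical.

```python
import statistics

def fit_standardizer(values):
    """Return the mean and population standard deviation of a feature."""
    return statistics.fmean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using previously fitted statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical ages; the statistics come from the training values only.
train_age = [60, 70, 80, 90]
test_age = [75]
mean, std = fit_standardizer(train_age)
train_scaled = standardize(train_age, mean, std)
test_scaled = standardize(test_age, mean, std)
print(train_scaled, test_scaled)
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.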

4.4. Data Partitioning

We partitioned the dataset into an 80–20% split for training and testing, respectively, to maximize the data available for model learning while retaining enough data for a robust evaluation of performance. We adopted stratified sampling to maintain the original class distribution, a critical step given the identified class imbalance. This ensured that each subset, for training and testing, mirrored the proportions of the “Normal” and “Not Normal” classes in the overall dataset, which reduces the risk of a biased evaluation and is an essential consideration for deploying such models in a clinical setting. By maintaining this balance, we aimed to facilitate a more accurate assessment of our models’ generalizability and their ability to handle the inherent class imbalance, which is critical in predicting abnormal respiratory patterns in older adults. The dimensions of these datasets are summarized below:

Training set (features): 13,209 rows by 3 columns.
Testing set (features): 3303 rows by 3 columns.
Training set (target): 13,209 rows.
Testing set (target): 3303 rows.
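A stratified 80/20 split can be sketched by splitting each class separately and then combining the parts. In the example below, the per-class counts (14,524 normal and 1,988 abnormal) are an assumption inferred from the reported 16,512 records and 12.04% abnormal share, not figures stated in the article; the function name and the rounding rule are likewise illustrative.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Assumed class counts consistent with 16,512 records and a 12.04% abnormal share.
labels = [1] * 14524 + [0] * 1988
train_idx, test_idx = stratified_split(labels)
abnormal_in_test = sum(1 for i in test_idx if labels[i] == 0)
print(len(train_idx), len(test_idx), abnormal_in_test)
```

Under these assumed counts, the split sizes agree with the 13,209 training rows and 3303 testing rows listed above.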

4.5. Model Validation

To predict abnormal respiratory patterns in elderly individuals, we harnessed three distinctive machine learning algorithms: K-nearest neighbors (K-NN), support vector machine (SVM), and gradient boosting. Our study’s selection of these algorithms was based on strategic considerations. K-NN is known for its simplicity and effectiveness in classification problems with datasets that have moderate dimensions, like our respiratory pattern data. SVM was chosen for its robustness and ability to handle high-dimensional feature spaces, making it suitable for complex, feature-rich datasets. Gradient boosting provides a powerful and flexible approach that can iteratively improve in areas where previous models may show weaknesses, thanks to its additive learning nature. These three algorithms have shown strong performance in the literature for similar classification tasks in healthcare. Furthermore, their diversity offers an interesting comparison of performance and computational complexity, allowing us to evaluate the balance between precision and efficiency in detecting abnormal respiratory patterns.

Let us look at each algorithm’s training and prediction times.

K-nearest neighbors (K-NN)
- Training time: 0.009 s
- Prediction time: 0.168 s
Support vector machine (SVM)
- Training time: 0.484 s
- Prediction time: 0.093 s
Gradient boosting
- Training time: 0.592 s
- Prediction time: 0.006 s

All models exhibit fast training and prediction times, a promising sign for their use in real-time applications. These fast processing times make the models suitable for health monitoring through IoMT technologies. The models’ high precision levels also indicate their potential for clinical use in detecting abnormal respiratory patterns in older adults.
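Training and prediction times of this kind are typically obtained by wrapping the relevant call with a high-resolution timer. The sketch below shows one way to do so with `time.perf_counter`; the helper name `timed` and the stand-in workload are assumptions, since the article does not describe how its timings were measured.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in practice this would be a model's fit or predict call.
values = list(range(100_000, 0, -1))
ordered, seconds = timed(sorted, values)
print(f"sorted {len(ordered)} values in {seconds:.4f} s")
```

Single measurements vary with hardware and system load, so repeating the call several times and reporting the median gives a steadier figure.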

4.5.1. Evaluation Metrics

We carefully evaluated the models, taking into account various relevant metrics. The detailed evaluation metrics for each model are presented in Table 2.

The results of our predictive modeling showcase the capabilities of the machine learning models we employed (see Figure 7). The K-nearest neighbors (K-NN) model demonstrated exceptional predictive capacity, achieving a precision rate of 99.9% and boasting an AUC-ROC score of 99.9%. These metrics highlight the model’s ability to classify abnormal respiratory patterns accurately. High precision rates indicate that K-NN could help reduce false alarms in respiratory monitoring systems. This is crucial for patient trust and compliance in remote health monitoring. However, although technical metrics are encouraging, it is essential to assess the model’s performance critically in a clinical context. Real-world data complexity and patient variability come into play in such settings.

The support vector machine (SVM) showed strong performance by maintaining a balance between precision and recall at around 99.9%. Additionally, the SVM model scored an AUC-ROC of 99.9%, highlighting its ability to detect changes in respiratory patterns accurately.

The most noteworthy achievement was observed in the gradient boosting model, which achieved a precision rate of 100% and an AUC-ROC score of 100%. It showed a slight advantage in accurately classifying abnormal respiratory patterns in the test dataset compared to the previous models.

The results demonstrate the selected models’ strong predictive abilities on this dataset, supporting their potential as practical aids for detecting abnormal respiratory patterns in older people. This opens up possibilities for their use in actual healthcare situations, where early diagnosis and treatment can significantly improve patient outcomes and reduce healthcare costs.

4.5.2. Confusion Matrices

Confusion matrices are essential in evaluating predictive models, providing key insights into the models’ performance [33]. By analyzing the confusion matrices, we can understand the predictive models’ performance comprehensively, which is critical for making informed decisions. Next, we present and discuss the confusion matrices obtained from our models.

Figure 8 shows the confusion matrix for the K-nearest neighbors (K-NN). The top left quadrant represents the count of the true positives (TP), representing instances correctly classified as abnormal respiratory patterns, totaling 396 cases. On the top right, we find the number of false negatives (FN), indicating cases incorrectly labeled as normal when they were, in fact, abnormal, with a count of two. The bottom left quadrant pertains to false positives (FP), which denotes instances mistakenly classified as abnormal when normal; in this instance, there are none (0). Lastly, the bottom right quadrant corresponds to true negatives (TN), reflecting instances accurately classified as normal, with a count of 2905 cases. In summary, the K-NN model demonstrates an exceptional capacity to accurately identify abnormal respiratory patterns, characterized by a minimal number of false negatives and an absence of false positives.

The confusion matrix for the SVM model (see Figure 9) resembles that of the K-NN model. It contains 393 TP, signifying abnormal respiratory patterns that were correctly classified, and five FN, denoting abnormal patterns erroneously classified as normal. There were two FP, representing normal patterns incorrectly labeled as abnormal. Lastly, the matrix contains 2903 TN, indicating instances correctly identified as normal respiratory patterns.

This similarity highlights the SVM model’s consistent and robust performance in identifying abnormal respiratory patterns while keeping false positives and false negatives low.

When analyzing the gradient boosting model, its confusion matrix (see Figure 10) reflects the following statistics: it detected 398 TP without any FN or FP and counted 2903 TN. This error-free classification, in which all abnormal respiratory patterns are correctly identified alongside a high true negative count, confirms the outstanding performance of this model.

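The confusion matrices give a detailed view of the models’ performance: each model misclassified at most seven of the roughly 3300 test instances. Gradient boosting made no errors, K-NN produced two false negatives and no false positives, and SVM produced five false negatives and two false positives.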
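The link between these counts and the usual metrics can be made explicit with a short calculation. The following is a minimal sketch, not the authors' code: the class name `ConfusionCounts` and its methods are hypothetical, and it follows the convention of the text above, where the abnormal class is treated as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts with the abnormal class treated as positive."""
    tp: int  # abnormal, predicted abnormal
    fn: int  # abnormal, predicted normal
    fp: int  # normal, predicted abnormal
    tn: int  # normal, predicted normal

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def swapped(self) -> "ConfusionCounts":
        """Same matrix, read with the normal class treated as positive."""
        return ConfusionCounts(tp=self.tn, fn=self.fp, fp=self.fn, tn=self.tp)


# Counts as reported in the text for Figures 8 to 10.
REPORTED = {
    "K-nearest neighbors": ConfusionCounts(tp=396, fn=2, fp=0, tn=2905),
    "Support vector machine": ConfusionCounts(tp=393, fn=5, fp=2, tn=2903),
    "Gradient boosting": ConfusionCounts(tp=398, fn=0, fp=0, tn=2903),
}

if __name__ == "__main__":
    for name, cm in REPORTED.items():
        print(f"{name}: precision={cm.precision():.3f} "
              f"recall={cm.recall():.3f} f1={cm.f1():.3f}")
```

The values depend on which class is taken as positive. For the K-NN counts, treating abnormal as positive gives a recall of 396/398 (about 0.995), whereas `swapped()` gives a precision of 2905/2907 (about 0.999) and a recall of 1.000, so it is worth stating the positive class whenever such metrics are reported.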

4.5.3. Cross-Validation

Our dataset’s substantial volume of 16,512 observations provided a robust foundation for stratified five-fold cross-validation, making the evaluation less sensitive to the variability that smaller datasets might introduce. Stratification preserves the original distribution of the target variable in each fold, which is important because of the observed class imbalance. We chose five folds because this strikes a reasonable balance between robust evaluation and computational efficiency. Consistent performance across folds supports the reliability of our findings and underscores the models’ stability and potential readiness for clinical trials.

We calculated the same evaluation metrics (precision, recall, F1 score, and AUC-ROC) for each fold and report each metric’s mean and standard deviation across folds.
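The idea behind stratification and the per-fold summary can be illustrated with a small standard-library sketch. It is an assumption-laden illustration rather than the pipeline used in the study: the function names, the synthetic labels, and the use of the population standard deviation are all hypothetical choices.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, class by class.

    Shuffling within each class and dealing the indices round-robin keeps
    the class proportions of every fold close to those of the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def summarize(per_fold_scores):
    """Mean and (population) standard deviation of one metric across folds."""
    return mean(per_fold_scores), pstdev(per_fold_scores)


# Hypothetical imbalanced labels: 0 = abnormal (12%), 1 = normal (88%).
labels = [0] * 120 + [1] * 880
folds = stratified_folds(labels, k=5, seed=42)

# Share of abnormal samples in each fold; stratification keeps it at 12%.
abnormal_share = [
    sum(1 for i in fold if labels[i] == 0) / len(fold) for fold in folds
]
```

In each round, one fold serves as the validation set and the remaining four as the training set; `summarize` is then applied to the five per-fold values of each metric.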

Table 3 shows the results of the stratified five-fold cross-validation for our three models. K-nearest neighbors (K-NN) demonstrated high precision and recall, with minimal variability between folds, as evidenced by low standard deviations (0.001 for precision and 0.000 for recall). Support vector machine (SVM) also performed well, with metrics slightly below those of K-NN; its AUC-ROC near 1 indicates that it separates the positive and negative classes almost perfectly.

Gradient boosting outperformed the other models across the evaluation metrics, achieving perfect precision, recall, F1 score, and AUC-ROC. The absence of variability among folds underscores its robustness and accuracy, making it a strong candidate for real-world applications, especially in healthcare.

All three models demonstrate robust generalization capability, evidenced by good performance during cross-validation on different subsets of the training dataset.

4.5.4. Feature Importance

Feature importance indicates how influential different features (input variables) are on a model’s prediction of the target variable [34]. It helps to understand which variables contribute the most to the model’s performance and, therefore, to its predictions. Gradient boosting is often used to evaluate feature importance because it builds an ensemble of decision trees, from which importance scores can be derived directly. Figure 11 shows the feature importance of our gradient boosting prediction model.

After analyzing the importance of several features in our model, we found that RPM has the highest importance score of 0.659. This means that it plays a crucial role in predicting the normalcy of respiratory patterns. A higher RPM may indicate potential abnormalities, highlighting the significance of this feature in our model.

Age is the second most important variable, with an importance score of 0.341. This indicates that an individual’s age also affects the model’s prediction of respiratory pattern normality.

Sex has a score of 0.000 and is not a significant predictor of respiratory pattern normality.

Healthcare professionals and researchers can use feature importance insights to detect abnormal respiratory patterns early in older adults, highlighting the importance of RPM and age.
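To make the notion of feature importance concrete, the sketch below uses permutation importance, a model-agnostic technique that measures how much accuracy drops when one feature column is shuffled. This is a swapped-in illustration and not necessarily the measure behind Figure 11; the rule-based classifier, its thresholds, and the synthetic data are hypothetical and are not taken from the study.

```python
import random


def rule_based_model(age, rpm, sex):
    """Hypothetical stand-in classifier: 1 = normal, 0 = abnormal.

    The thresholds are illustrative only; sex is deliberately ignored.
    """
    upper = 20 if age < 80 else 24
    return 1 if 12 <= rpm <= upper else 0


def accuracy(rows, labels):
    hits = sum(rule_based_model(*row) == y for row, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(rows, labels, column, seed=0):
    """Accuracy drop after shuffling the values of one feature column."""
    rng = random.Random(seed)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [
        row[:column] + (value,) + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    return accuracy(rows, labels) - accuracy(permuted, labels)


# Synthetic (age, rpm, sex) rows, labeled by the stand-in model itself.
data_rng = random.Random(7)
rows = [
    (data_rng.randint(60, 95), data_rng.randint(8, 30), data_rng.randint(0, 1))
    for _ in range(500)
]
labels = [rule_based_model(*row) for row in rows]

importance = {
    name: permutation_importance(rows, labels, col)
    for col, name in enumerate(("age", "rpm", "sex"))
}
```

Because the stand-in model never reads `sex`, shuffling that column leaves accuracy unchanged and its importance is exactly zero, which mirrors how a feature that a model does not use ends up with a score of 0.000.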

5. Discussion

Machine learning algorithms have made significant progress in predicting abnormal respiratory patterns. In this study, three algorithms (K-nearest neighbors, support vector machine, and gradient boosting) displayed excellent predictive capabilities. Gradient boosting stood out among them, with nearly perfect scores across the evaluation metrics.

The number of breaths a person takes per minute, known as respirations per minute (RPM), and their age are important factors in predicting normal breathing patterns [35]. The analysis found that sex has little impact on this prediction. These findings reinforce the idea that respiratory rate is a crucial factor in determining respiratory health and highlight the influence of age on respiratory normality.

Some limitations emerge from our model evaluation.

Our models have achieved an exceptional level of accuracy, which is promising but requires careful interpretation. Such remarkable performance metrics call for additional scrutiny to validate the models’ suitability and to guard against the risk of overfitting the training data [36]. To address this risk, we took a multifaceted approach. First, we employed five-fold stratified cross-validation to assess generalizability.

In addition, we measured the models’ performance using precision, recall, F1 score, and AUC-ROC, which gives a more comprehensive evaluation than accuracy alone, especially given the class imbalance. We also analyzed feature importance, which helped us identify the crucial predictors and refine the feature set for better interpretability and generalization. Going forward, although it is beyond the scope of this study, we plan to perform clinical validation with healthcare experts to align our models with their clinical expertise.

It is possible that our limited feature set could restrict the models’ ability to understand all the factors that affect respiratory patterns, even though RPM and age are important features. To improve model performance without adding complexity, feature engineering can be used to create additional relevant features from existing data.
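As a rough illustration of what such feature engineering could look like, the sketch below derives a few extra features from a single reading. Everything in it is hypothetical: the `Reading` type, the derived feature names, and the `reference_rpm` value are assumptions for illustration, not clinical values or features from the study, and `previous_rpm` presumes that consecutive readings for the same person are available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    age: int
    rpm: float
    sex: int  # encoded as 0/1


def engineer_features(
    reading: Reading,
    previous_rpm: Optional[float] = None,
    reference_rpm: float = 16.0,
) -> dict:
    """Derive additional features from the existing age, RPM, and sex fields.

    reference_rpm is a hypothetical resting reference, not a clinical value.
    """
    return {
        "age": reading.age,
        "rpm": reading.rpm,
        "sex": reading.sex,
        # Distance of the current rate from the assumed reference rate.
        "rpm_deviation": abs(reading.rpm - reference_rpm),
        # Change since the preceding reading, or 0.0 when none exists.
        "rpm_change": 0.0 if previous_rpm is None else reading.rpm - previous_rpm,
        # Simple interaction term between the two strongest predictors.
        "age_rpm_interaction": reading.age * reading.rpm,
    }


features = engineer_features(Reading(age=72, rpm=22.0, sex=1), previous_rpm=19.0)
```

Whether any of these derived features actually helps would need to be checked with the same cross-validation procedure used for the original feature set.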

Developing and applying machine learning models in healthcare requires a meticulous, well-balanced approach. The preliminary findings of this study have significant implications. The capacity to accurately predict abnormal respiratory patterns enables early interventions and management strategies to mitigate respiratory issues, especially in elderly populations.

Moreover, deploying these models in healthcare systems and integrating them into IoMT platforms could facilitate continuous respiratory monitoring, enabling timely interventions and enhancing healthcare delivery.

6. Conclusions

This exploratory study has demonstrated that machine learning can be a powerful tool for predicting abnormal respiratory patterns, particularly when applied within the Internet of Medical Things (IoMT) framework. The gradient boosting algorithm performed exceptionally well. These results suggest that computational models have the potential to help healthcare practitioners identify and manage respiratory anomalies more effectively.

We used a data augmentation method to overcome the limitation of having a small number of records from the IoMT prototypes. The initial database consisted of records collected from only three individuals between the ages of 19 and 24, which was insufficient for a study of older adults. To expand it, we added approximately 25,000 records from a trusted health database containing respiratory rate records obtained from pulse oximeters for a broader range of ages (40 to 90 years). We selected this complementary database after a rigorous search and processed it to match our proposed format. The resulting dataset comprised 5% direct and 95% augmented records. This process improved the quantity and diversity of the dataset, allowing our machine learning models to be trained on a more representative spectrum of breathing patterns for older adults.

Our study’s key findings highlight the importance of RPM and age as predictive features in determining respiratory pattern normality. This discovery enhances our understanding of the factors influencing respiratory health and underscores their clinical relevance, especially when applied in IoMT systems. It also emphasizes how RPM and age can be helpful in real-world healthcare applications, especially in proactive management and intervention strategies for elderly populations susceptible to respiratory anomalies.

Our discoveries in this study provide new opportunities for further investigation and practical use in predicting respiratory health. However, ensuring that these models are robust, dependable, and ethically applied is crucial. Before implementing them in the real world, extensive validation, ethical considerations, and practical testing are necessary within the IoMT ecosystem and healthcare settings.

Author Contributions

Conceptualization, P.C.S.-M. and S.B.F.-F.; methodology, P.C.S.-M.; software, O.E.C.-M.; validation, L.E.A.-R. and P.C.S.-M.; formal analysis: P.C.S.-M. and L.E.A.-R.; data curation, O.E.C.-M.; writing—original draft preparation, P.C.S.-M. and O.E.C.-M.; writing—review and editing: S.B.F.-F. and L.E.A.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the Kaggle repository “Dataset of Respiration Rates in Older Adults” at https://doi.org/10.34740/kaggle/dsv/6652581 (accessed on 18 November 2023) [21].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Juárez-Ramírez, C.; Márquez-Serrano, M.; Salgado de Snyder, N.; Pelcastre-Villafuerte, B.E.; Ruelas-González, M.G.; Reyes-Morales, H. Health inequality among vulnerable groups in Mexico: Older adults, indigenous people, and migrants. Rev. Panam. Salud Publica 2014, 35, 284–290.
2. Rojas, P.M.M.; Díaz, V.R.V.; Sacramento, P.I.; Rodríguez, M.M.; Martínez, R.L.; Delgado, P.G. Mortalidad por enfermedades respiratorias en el adulto mayor. Evolución en un año. Acta Médica Cent. 2016, 10, 33–39.
3. Ibarra Cornejo, J.L.; Fernández Lara, M.J.; Aguas Alveal, E.V.; Pozo Castro, A.F.; Antillanca Hernández, B.; Quidequeo Reffers, D.G. Efectos Del Reposo Prolongado En Adultos Mayores Hospitalizados. Ann. Fam. Med. 2018, 78, 439.
4. Escobar-Rojas, A.; Castillo-Pedroza, J.; Cruz-Hervert, P.; Báez-Saldaña, R. Tendencias de morbilidad y mortalidad por neumonía en adultos mexicanos (1984–2010). Neumol. Y Cirugía Tórax 2015, 74, 4–12.
5. De Smet, R.; Mellaerts, B.; Vandewinckele, H.; Lybeert, P.; Frans, E.; Ombelet, S.; Lemahieu, W.; Symons, R.; Ho, E.; Frans, J.; et al. Frailty and Mortality in Hospitalized Older Adults with COVID-19: Retrospective Observational Study. J. Am. Med. Dir. Assoc. 2020, 21, 928–932.e1.
6. Feehan, J.; Apostolopoulos, V. Is COVID-19 the Worst Pandemic? Maturitas 2021, 149, 56–58.
7. Wu, Q.; Li, Q.; Lu, J. A One Health Strategy for Emerging Infectious Diseases Based on the COVID-19 Outbreak. J. Biosaf. Biosecurity 2022, 4, 5–11.
8. Arboleda, J.F. Sistema Para El Apoyo a La Atención Domiciliaria Mediante Redes de Sensores Inteligentes. Master’s Thesis, Universidad de Antioquia, Medellín, Colombia, 2016.
9. Santana-Mancilla, P.C.; Anido-Rifón, L.E.; Contreras-Castillo, J.; Buenrostro-Mariscal, R. Heuristic Evaluation of an IoMT System for Remote Health Monitoring in Senior Care. Int. J. Environ. Res. Public. Health 2020, 17, 1586.
10. Guzman-Sandoval, V.M.; Gaytan-Lugo, L.S.; Santana-Mancilla, P.C. I-Care: An IoMT Remote Monitoring System of Physiological Pain in Pediatric Patients. In Proceedings of the 2021 Mexican International Conference on Computer Science (ENC), IEEE, Morelia, Mexico, 9–11 August 2021; pp. 1–4.
11. Luque, N.; Ortega, M. Análisis de Sistemas Para Registros Médicos Electrónicos En Clínicas y Su Enfoque al Machine Learning. Bachelor’s Thesis, Universidad Católica San Pablo, Arequipa, Perú, 2020.
12. Argerich, S.; Herrera, S.; Benito, S.; Giraldo, B.F. Evaluation of Periodic Breathing in Respiratory Flow Signal of Elderly Patients Using SVM and Linear Discriminant Analysis. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, Orlando, FL, USA, 16–20 August 2016; pp. 4276–4279.
13. Khan, M.F.; Ghazal, T.M.; Said, R.A.; Fatima, A.; Abbas, S.; Khan, M.A.; Issa, G.F.; Ahmad, M.; Khan, M.A. An IoMT-Enabled Smart Healthcare Model to Monitor Elderly People Using Machine Learning Technique. Comput. Intell. Neurosci. 2021, 2021, 1–10.
14. Pham, L.; Ngo, D.; Tran, K.; Hoang, T.; Schindler, A.; McLoughlin, I. An Ensemble of Deep Learning Frameworks for Predicting Respiratory Anomalies. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), IEEE, Glasgow, Scotland, UK, 11–15 July 2022; pp. 4595–4598.
15. Wang, Y.; Hu, M.; Zhou, Y.; Li, Q.; Yao, N.; Zhai, G.; Zhang, X.-P.; Yang, X. Unobtrusive and Automatic Classification of Multiple People’s Abnormal Respiratory Patterns in Real Time Using Deep Neural Network and Depth Camera. IEEE Internet Things J. 2020, 7, 8559–8571.
16. Jin, Y.; Yu, H.; Zhang, Y.; Pan, N.; Guizani, M. Predictive Analysis in Outpatients Assisted by the Internet of Medical Things. Future Gener. Comput. Syst. 2019, 98, 219–226.
17. Bazán, J. BioMakers University. Available online: https://www.biomakers.ai (accessed on 30 September 2023).
18. XeThru X4M200 Respiration Sensor. X4M200 Respir. Sens. 2018. Available online: https://www.xethru.com/x4m200-respiration-sensor.html (accessed on 30 September 2023).
19. Wheatley, I. Respiratory rate 3: How to take an accurate measurement. Nurs. Times 2018, 114, 21–22.
20. Fatihah Shamsul Ariffin, F.; Munirah Kamarudin, L.; Ghazali, N.; Nishizaki, H.; Zakaria, A.; Muhammad Mamduh bin Syed Zakaria, S. Inhalation and Exhalation Detection for Sleep and Awake Activities Using Non-Contact Ultra-Wideband (UWB) Radar Signal. J. Phys. Conf. Ser. 2021, 1755, 012038.
21. Castrejón-Mejía, O.E.; Santana-Mancilla, P.C. Dataset of Respiration Rates in Older Adults. Available online: http://doi.org/10.34740/kaggle/dsv/6652581 (accessed on 20 September 2023).
22. Mezquita, Y.; Alonso, R.S.; Casado-Vara, R.; Prieto, J.; Corchado, J.M. A Review of K-NN Algorithm Based on Classical and Quantum Machine Learning. In Distributed Computing and Artificial Intelligence, Special Sessions, 17th International Conference; Rodríguez González, S., González-Briones, A., Gola, A., Katranas, G., Ricca, M., Loukanova, R., Prieto, J., Eds.; Advances in Intelligent Systems and Computing; Springer International Publishing: Cham, Switzerland, 2021; Volume 1242, pp. 189–198. ISBN 978-3-030-53828-6.
23. Noble, W.S. What Is a Support Vector Machine? Nat. Biotechnol. 2006, 24, 1565–1567.
24. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A Comparative Analysis of Gradient Boosting Algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
25. Camizuli, E.; Carranza, E.J. Exploratory Data Analysis (EDA). In The Encyclopedia of Archaeological Sciences; López Varela, S.L., Ed.; Wiley: Hoboken, NJ, USA, 2018; pp. 1–7. ISBN 978-0-470-67461-1.
26. Japkowicz, N.; Stephen, S. The Class Imbalance Problem: A Systematic Study. Intell. Data Anal. 2002, 6, 429–449.
27. Li, D.; Zhang, B.; Li, C. A Feature-Scaling-Based k-Nearest Neighbor Algorithm for Indoor Positioning Systems. IEEE Internet Things J. 2016, 3, 590–597.
28. Korjus, K.; Hebart, M.N.; Vicente, R. An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable. PLoS ONE 2016, 11, e0161788.
29. Dalianis, H. Evaluation Metrics and Evaluation. In Clinical Text Mining; Springer International Publishing: Cham, Switzerland, 2018; pp. 45–53. ISBN 978-3-319-78502-8.
30. Joy, T.T.; Rana, S.; Gupta, S.; Venkatesh, S. Hyperparameter Tuning for Big Data Using Bayesian Optimisation. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, Cancun, Mexico, 4–8 December 2016; pp. 2574–2579.
31. Montesinos-López, O.A.; Montesinos-López, A.; Cano-Paez, B.; Hernández-Suárez, C.M.; Santana-Mancilla, P.C.; Crossa, J. A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library. Genes 2022, 13, 1494.
32. Berrar, D. Cross-Validation. In Encyclopedia of Bioinformatics and Computational Biology; Elsevier: Amsterdam, The Netherlands, 2019; pp. 542–545. ISBN 978-0-12-811432-2.
33. Bhagat, M.; Bakariya, B. Implementation of Logistic Regression on Diabetic Dataset Using Train-Test-Split, K-Fold and Stratified K-Fold Approach. Natl. Acad. Sci. Lett. 2022, 45, 401–404.
34. Saarela, M.; Jauhiainen, S. Comparison of Feature Importance Measures as Explanations for Classification Models. SN Appl. Sci. 2021, 3, 272.
35. Siddiqui, H.U.R.; Shahzad, H.F.; Saleem, A.A.; Khan Khakwani, A.B.; Rustam, F.; Lee, E.; Ashraf, I.; Dudley, S. Respiration Based Non-Invasive Approach for Emotion Recognition Using Impulse Radio Ultra Wide Band Radar and Machine Learning. Sensors 2021, 21, 8336.
36. Ying, X. An Overview of Overfitting and Its Solutions. J. Phys. Conf. Ser. 2019, 1168, 022022.

Figure 1. IoMT prototype with sound sensor.

Figure 2. IoMT prototype with gas sensor.

Figure 3. IoMT prototype with movement sensors.

Figure 4. IoMT prototype with radar sensor.

Figure 5. Exploratory data analysis.

Figure 6. Class imbalance.

Figure 7. Models’ evaluation metrics.

Figure 8. Confusion matrix for K-nearest neighbors.

Figure 9. Confusion matrix for support vector machine.

Figure 10. Confusion matrix for gradient boosting.

Figure 11. Feature importance for the predictive model.

Table 1. Scaled features.

Age	RPM	Sex	Normal
−0.776	−0.256	−0.778	1
−0.083	−0.256	1.286	1
−1.007	−0.256	−0.778	1
0.264	0.364	−0.778	1
1.650	−0.566	−0.778	1

Table 2. Evaluation metrics for each model.

Model	Precision	Recall	F1 Score	AUC-ROC
K-nearest neighbors	0.999	1.000	0.999	0.999
Support vector machine	0.998	0.999	0.999	0.999
Gradient boosting	1.000	0.999	0.999	1.000

Table 3. Evaluation metrics for stratified five-fold cross-validation.

Model	Precision Mean	Precision Std	Recall Mean	Recall Std
K-nearest neighbors	0.999	0.001	1.000	0.000
Support vector machine	0.998	0.001	0.999	0.001
Gradient boosting	1.000	0.000	1.000	0.000

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
