Article

Blood Glucose Level Time Series Forecasting: Nested Deep Ensemble Learning Lag Fusion

1 Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield S10 2TN, UK
2 Department of Oncology and Metabolism, University of Sheffield, Sheffield S10 2TN, UK
3 Department of Diabetes and Endocrinology, Sheffield Teaching Hospitals, Sheffield S5 7AU, UK
* Author to whom correspondence should be addressed.
Bioengineering 2023, 10(4), 487; https://doi.org/10.3390/bioengineering10040487
Submission received: 21 March 2023 / Revised: 12 April 2023 / Accepted: 17 April 2023 / Published: 19 April 2023

Abstract
Blood glucose level prediction is a critical aspect of diabetes management. It enables individuals to make informed decisions about their insulin dosing, diet, and physical activity. This, in turn, improves their quality of life and reduces the risk of chronic and acute complications. One conundrum in developing time-series forecasting models for blood glucose level prediction is determining an appropriate length for the look-back window. On the one hand, studying short histories risks omitting relevant information. On the other hand, analysing long histories might introduce information redundancy due to the data shift phenomenon. Additionally, optimal lag lengths are inconsistent across individuals because of domain shift. Therefore, in bespoke analysis, either an optimal lag value must be found for each individual separately, or a globally suboptimal lag value must be used for all. The former approach undermines the consistency of the analysis and adds complexity. With the latter, the fine-tuned lag is not necessarily the optimal option for every individual. To cope with this challenge, this work proposes an interconnected lag fusion framework based on nested meta-learning analysis that improves the accuracy and precision of personalised blood glucose level forecasts. The proposed framework is used to generate blood glucose prediction models for patients with type 1 diabetes by scrutinising two well-established, publicly available Ohio type 1 diabetes datasets. The developed models undergo rigorous evaluation and statistical analysis from mathematical and clinical perspectives. The results achieved underpin the efficacy of the proposed method in blood glucose level time-series prediction analysis.

1. Introduction

Type 1 diabetes is a chronic metabolic disorder [1]. The disease is currently incurable [2,3]. Nevertheless, its effective management can dramatically mitigate the symptoms and the risk of associated short-term and long-term complications [4,5]. Accordingly, people with type 1 diabetes and their potential carers are normally educated on the standard practices to control the illness [6,7,8].
Self-management of type 1 diabetes is, however, burdensome and prone to human error [9,10,11]. Hence, automating the management tasks would be highly beneficial [12,13]. Some developments have already been made in this regard [14,15,16]. For example, technological breakthroughs such as continuous glucose monitoring biosensors [17,18] and insulin pumps [19,20] nowadays serve a multitude of type 1 diabetes patients. The former take regular, minimally invasive snapshots of blood glucose levels, in alignment with the general advice on frequent review of glycaemic state [21,22]. The latter semiautomate insulin administration, requiring minimal user interference [23,24,25]. Moreover, there are ongoing efforts to develop fully noninvasive continuous blood glucose level monitoring sensors to support more effective diabetes management [26,27,28,29].
Despite the advancements achieved so far, continued progress in automation is still needed to further facilitate and strengthen the management of type 1 diabetes [30,31]. In this respect, engineering accurate blood glucose predictor devices would be game changing [32,33]. Such instruments can provide early warning of possible adverse glycaemic events so that automated or nonautomated pre-emptive measures can be taken [34,35]. Additionally, these devices are a prerequisite for the advent of a closed-loop artificial pancreas, the current vision for the ultimate automated management of type 1 diabetes [36,37].
For predicting blood glucose levels, physiological, data-driven, and hybrid modelling approaches can be pursued [38,39]. In the data-driven approach, also used in this research, current and past values of diabetes-management-related variables are studied to project future blood glucose excursions [38,40].
For constructing data-driven blood glucose level predictors, one of the three main categories of time-series forecasting approaches is typically used: classical time-series forecasting, traditional machine learning, or deep learning analysis. Among these, deep learning, as a member of the modern artificial intelligence family, has proven potency in solving complicated computational tasks, including complex time-series forecasting [41,42,43,44,45,46].
Predicting the blood glucose levels of individuals with type 1 diabetes is a challenging forecasting task due to the highly erratic behaviour of glucose dynamics [47]. Thus, in line with many other time-series forecasting areas, deep learning has gained enormous popularity in the blood glucose level prediction realm [48,49]. Subsequently, extensive research has been underway to advance the analysis. Notwithstanding all the enhancements in this field so far, there still exist challenges to be addressed adequately [50]. This work contributes to addressing one such challenge.
When applying deep learning algorithms for data-driven time-series blood glucose level forecasting, lag observations of data are studied to predict specific future values. Here, a quandary is to select the appropriate length of history to be investigated. This issue is even more pronounced when considering the fact that due to the significant discrepancy in the blood glucose profile across type 1 diabetes patients, the common practice is to generate personalised models. In this circumstance, finding an optimal length of history separately for each individual entails further disparity and complexity in the analysis. To address this difficulty, the present work suggests a compound lag fusion approach by exploiting the potential of nested ensemble learning over typical ensemble learning analysis. This is the first paper, to the best of our knowledge, that incorporates nested meta-learning analysis in the field of blood glucose level prediction.
The rest of the article is outlined as follows. Section 2 reviews some recent studies on type 1 diabetes blood glucose level prediction. Section 3 concisely describes the datasets used in this research. Section 4 explains model development and assessment analysis. Section 5 presents the results of the model assessment analysis along with the relevant discussions. Finally, Section 6 summarises and concludes the work.

2. Literature Survey

In the following, a number of recent articles on data-driven blood glucose level prediction are succinctly overviewed. For further alignment with the contents of this study, the focus of this overview is on the application of state-of-the-art machine learning techniques and the use of the Ohio type 1 diabetes datasets for model development and evaluation. More comprehensive reviews of the latest developments in the blood glucose level prediction area can be found in [51,52,53,54].
A recent article offered a multitask approach for blood glucose level prediction by experimenting on the Ohio datasets [55]. The method is based on the concept of transfer learning and explicitly targets the challenge of needing extensively large amounts of data for personalised blood glucose level prediction. For this purpose, it suggests pre-training a model on a source domain and a multitask model on the whole dataset and then using these learning experiences to construct personalised models. The authors showcase the efficacy of their propositions by comparing the performance of their approach with sequential transfer learning and subject-isolated learning.
An autonomous channel setup was recently presented for deep learning blood glucose level prediction using the Ohio datasets [56]. The proposed method chooses the history length for each variable adaptively according to its time-dependency scale. The crux is to simultaneously avoid discarding useful information from variables with enduring influence and including uninformative data from variables with transient impact. The models generated in the study are compared with standard non-autonomous channel structures using mathematical and clinical assessments.
A deep learning approach based on dilated recurrent neural networks accompanied by transfer learning concepts was introduced for blood glucose level prediction [57]. In the study, personalised models are created for individuals with type 1 diabetes using an Ohio dataset. The method is examined on short-term forecasting tasks, and its superiority over standard methods, including autoregressive models, support vector regression, and conventional neural networks, is demonstrated.
Another study suggests an efficient method for univariate blood glucose level prediction [58]. In the analysis, recurrent neural networks were used as learners. The learners are trained in an end-to-end approach to predict future blood glucose levels 30 and 60 min in advance using only histories of blood glucose data. The models are developed and assessed using an Ohio dataset. The results achieved are comparable with the state-of-the-art research on the dataset. In addition to accuracy analysis, the study investigates the certainty in the predictions. To do so, a parameterised univariate Gaussian is tasked with calculating the standard deviation of the predictions as a representative of uncertainty.
Employing the concepts of the Internet of things, a study compares four broadly used models of glycaemia, including support vector machine, Bayesian regularised neural network, multilayer perceptron, and Gaussian approach [59]. These models are used to investigate the possibility of completing the data collected from 25 individuals with type 1 diabetes by mapping intricate patterns of data. The findings highlight the potential of such analysis in contributing to improved diabetes management. Further, among the approaches examined, Bayesian regularised neural networks outperform others by delivering the best root mean square error and coefficient of determination.

3. Material

For generating blood glucose level prediction models, this study uses the two well-established, publicly accessible Ohio type 1 diabetes datasets [60]. The first dataset includes data for six individuals with type 1 diabetes. The participants' ages at the time of data collection ranged from 40 to 60 years; four were female and two male. This dataset was initially released for the first blood glucose level prediction challenge at the Knowledge Discovery in Healthcare Data conference in 2018 and is referred to as the Ohio 2018 dataset hereafter. The second dataset contains data for six further people with type 1 diabetes, different from those in the first dataset. These contributors were aged between 20 and 80 years at the point of data acquisition; five were male and one female. This dataset was originally distributed for the second blood glucose level prediction challenge at the Knowledge Discovery in Healthcare Data conference in 2020. Hereafter, we refer to this dataset as the Ohio 2020 dataset.
Both datasets contain diabetes-related modalities, including blood glucose, physical activity, carbohydrate intake, and bolus insulin injection. Blood glucose and physical activity data were collected automatically using physiological sensors. For the former, a Medtronic Enlite continuous glucose monitoring device was used. For the latter, patients in the Ohio 2018 dataset wore a Basis Peak fitness band that recorded heart rate as a representative of physical activity, whereas subjects in the Ohio 2020 dataset wore an Empatica Embrace fitness band that tracked the magnitude of acceleration. In contrast, carbohydrate intake and bolus insulin data were self-reported by individuals in both datasets.
In both datasets, data were collected over eight weeks. The data come with training and testing sets already separated by the data collection and distribution team: the last ten days of data are allocated to the testing set and the remaining earlier data points to the training set. In the present study, using the training sets only, bespoke predictive models are created to forecast future blood glucose levels from historical values of blood glucose itself as the endogenous variable, along with the exogenous variables of physical activity, carbohydrate intake, and bolus insulin injection. The testing sets are then used to evaluate the generated models. Table 1 displays individuals' identification number, sex, and age information together with a short summary of the statistical properties of blood glucose as the endogenous variable in each dataset. A more comprehensive description of the Ohio datasets and the data collection process can be found in the original documentation [60].

4. Methods

This section explicates the methodological implementations for blood glucose level prediction model generation and evaluation. First, some curation steps performed to prepare the data for formal prediction modelling analysis are explained. Next, time-series forecasting models constructed for blood glucose level prediction are described. After that, the criteria considered for evaluating the generated predictive models are presented. Finally, statistical analysis operated on the model outputs is outlined.

4.1. Data Curation

The following pre-modelling curation steps are performed on the raw data to render the ensuing deep learning prediction modelling analysis more effective.

4.1.1. Missingness Treatment

The first data curation stage deals with the missing values present in the automatically collected blood glucose and physical activity data. At the beginning and end of the blood glucose and physical activity series, there are some timespans where data are absent. This unavailability occurred because the subjects did not start and stop wearing the sensing devices at exactly the same times. As an initial missing value treatment step, the head and tail of all series are trimmed by removing the void timestamps so that all variables start and end at the same point. Afterwards, linear interpolation is used to fill in missing values in the training sets of blood glucose and physical activity. For the testing sets of these modalities, linear extrapolation is used instead. Extrapolation precludes observing future values in the evaluation stage, so the models created remain applicable to real-time monitoring.
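As an illustration of this treatment, the following minimal sketch shows one way to implement it with pandas and SciPy; the function names and the two-point extrapolation helper are assumptions for illustration, not the paper's released code.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

def fill_training_gaps(series: pd.Series) -> pd.Series:
    # Training sets: linear interpolation between the surrounding known samples.
    return series.interpolate(method="linear")

def fill_testing_gaps(series: pd.Series) -> pd.Series:
    # Testing sets: linear extrapolation from past samples only, so no future
    # value is observed when filling a gap (real-time applicability).
    values = series.to_numpy(dtype=float)
    for i in np.flatnonzero(np.isnan(values)):
        known = np.flatnonzero(~np.isnan(values[:i]))
        if len(known) >= 2:
            # Fit a line through the two most recent known points and project it.
            f = interp1d(known[-2:], values[known[-2:]], fill_value="extrapolate")
            values[i] = float(f(i))
    return pd.Series(values, index=series.index)
```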

4.1.2. Sparsity Handling

The sparsity of the self-reported carbohydrate and bolus insulin data is the next pre-modelling issue to be addressed. A reasonable assumption as to the unavailable values of these modalities at the majority of timestamps is that there was no occurrence to be reported at those points. Therefore, for these two modalities, as a simple yet acceptable practice, zero values are assigned to non-reported timestamps, as sketched below.
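In code, this amounts to a simple zero fill; a minimal sketch with hypothetical column names:

```python
import pandas as pd

# Illustrative frame: unreported carbohydrate/bolus timestamps mean "no event".
df = pd.DataFrame({"carbohydrate": [None, 45.0, None],
                   "bolus": [None, None, 6.5]})
df[["carbohydrate", "bolus"]] = df[["carbohydrate", "bolus"]].fillna(0.0)
```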

4.1.3. Data Alignment

Another data curation step is to unify the frequency of the exogenous modalities and align their timestamps with those of the blood glucose level as the endogenous variable. Initially, acceleration data are downsampled from a one-minute to a five-minute frequency. For this purpose, the entries nearest to the blood glucose timestamps are kept, and the remaining data points are removed. Following that, the timestamps of all exogenous variables are aligned with those of blood glucose levels with the minimum possible shifts.
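A sketch of this alignment using pandas nearest-neighbour reindexing is given below; the five-minute tolerance bound is an assumption.

```python
import pandas as pd

def align_to_glucose(exog: pd.Series, glucose_index: pd.DatetimeIndex) -> pd.Series:
    # Keep, for every five-minute glucose timestamp, the exogenous sample
    # nearest to it (downsampling one-minute acceleration data in the process)
    # and drop the rest; `tolerance` caps how far a sample may be shifted.
    return exog.reindex(glucose_index, method="nearest",
                        tolerance=pd.Timedelta(minutes=5))
```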

4.1.4. Data Transformation

As the next data curation step, feature values are converted, as is common practice, into a standardised form that machine learning models can analyse more effectively. For each variable, first, the average of the training set values is subtracted from all values in both the training and testing sets. Then, all obtained values are divided by the standard deviation of the training set to yield unit-variance variables.
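Equivalently, in code (a minimal sketch; training-set statistics only, so no information leaks from the testing period):

```python
import pandas as pd

def standardise(train: pd.Series, test: pd.Series):
    # Centre and scale both splits using the training set's mean and
    # standard deviation, producing zero-mean, unit-variance variables.
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma
```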

4.1.5. Stationarity Inspection

Stationary time-series data have statistical characteristics, including mean and variance, that do not change over time. In this data treatment step, the stationarity condition of the time-series data is enforced. Conducting the feature transformation step explained in Section 4.1.4 stabilises the variance of the series. To stabilise the mean, the first-order differencing method is applied. Subsequently, the outcomes are examined using two prevalent statistical tests, Kwiatkowski-Phillips-Schmidt-Shin [61] and augmented Dickey-Fuller [62], both of which confirm the stationarity of the series.
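Both tests are available in statsmodels; a sketch of this inspection under the usual decision rules follows (the five percent level is an assumption):

```python
from statsmodels.tsa.stattools import adfuller, kpss

def check_stationarity(series, alpha=0.05) -> bool:
    diffed = series.diff().dropna()          # first-order differencing
    adf_p = adfuller(diffed)[1]              # ADF null: unit root (non-stationary)
    kpss_p = kpss(diffed, regression="c", nlags="auto")[1]  # KPSS null: stationary
    # Stationary if ADF rejects its null and KPSS fails to reject its null.
    return adf_p < alpha and kpss_p > alpha
```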

4.1.6. Problem Reframing

The final data curation phase translates the time-series blood glucose level prediction question into the language of supervised machine learning. Hence, pairs of independent and dependent variables need to be constructed from the time-series data. To this end, a rolling window approach is used to designate sequences of lag observations of blood glucose, physical activity, carbohydrate, and bolus insulin as the independent variables and sequences of blood glucose over the prediction horizon as the dependent variable.
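A minimal sketch of this reframing is given below; the function name and array layout are illustrative.

```python
import numpy as np

def make_supervised(features: np.ndarray, glucose: np.ndarray,
                    lag: int, horizon: int):
    # features: (T, 4) array of glucose, activity, carbohydrate, bolus;
    # glucose:  (T,) target series. A window of `lag` past steps forms X,
    # and the next `horizon` glucose values form y (sequence-to-sequence).
    X, y = [], []
    for t in range(lag, len(glucose) - horizon + 1):
        X.append(features[t - lag:t])
        y.append(glucose[t:t + horizon])
    return np.asarray(X), np.asarray(y)

# e.g. a 60 min lag and 30 min horizon at five-minute resolution:
# X, y = make_supervised(features, glucose, lag=12, horizon=6)
```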

4.2. Modelling

This subsection describes the time-series forecasting models created for blood glucose level prediction 30 and 60 min into the future. This work adopts a sequence-to-sequence approach for multi-step-ahead time-series prediction. Prior to explaining the formal modelling process, it is useful to provide a brief explanation of stacking, the ensemble learning variation used in this work.

4.2.1. Preliminary

Ensemble learning is an advanced machine learning method that attempts to improve analysis performance by combining the decisions of multiple models [63]. Stacking is a type of ensemble learning in which a meta-learner intakes predictions of a number of base learners as an input feature to make final decisions [64].
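Schematically, a stacking layer can be expressed as below; this is an illustrative sketch of the concept, not the paper's released implementation.

```python
import numpy as np

def stack_predictions(base_models, X_list):
    # Each base model sees its own input view (e.g. a different lag length);
    # their predicted sequences are concatenated feature-wise to form the
    # meta-learner's input.
    preds = [model.predict(X) for model, X in zip(base_models, X_list)]
    return np.concatenate(preds, axis=-1)

# meta_X = stack_predictions(base_models, X_list)
# meta_learner.fit(meta_X, y)   # the meta-learner makes the final decision
```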

4.2.2. Model Development

The diagram in Figure 1 displays the procedure devised in this work for model creation. According to the diagram, the models are constructed by training three categories of learners: non-stacking, stacking, and nested stacking. The models generated based on the block diagram in Figure 1 are described below.
A non-stacking model takes a specific length of historical blood glucose, physical activity, carbohydrate, and bolus insulin data as multivariate input and returns a sequence of forecasted future blood glucose levels over a predefined prediction horizon of 30 or 60 min. According to the diagram in Figure 1, for each prediction horizon of 30 and 60 min, eight non-stacking models are created in aggregate. For this purpose, a multilayer perceptron network and a long short-term memory network are trained separately on four different lag lengths of 30, 60, 90, and 120 min.
A stacking model is a meta-model that takes sequence predictions from four non-stacking models with a homogeneous learner (multilayer perceptron network or long short-term memory network) as multivariate input and fuses them to generate new prediction outputs. According to Figure 1, for each prediction horizon of 30 and 60 min, two stacking models are created, one with multilayer perceptron networks and the other with long short-term memory networks as the underlying embedded learners.
A nested stacking model is a nested meta-model. It receives the outcomes of the two stacking models described above as multivariate inputs and returns new predictions. As can be seen in Figure 1, two nested stacking models are generated for each prediction horizon of 30 and 60 min; one employs a multilayer perceptron network and the other a long short-term memory network as the nested stacking learner.
According to Figure 1, in all model creation scenarios, the learners employed are either multilayer perceptron or long short-term memory networks. For simplicity and coherency, all multilayer perceptron networks have similar architectures, consisting of an input layer and a hidden dense layer with 100 nodes, followed by another dense layer as output. Additionally, all long short-term memory networks are of the vanilla type, with an input layer, a hidden 100-node LSTM layer, and a dense output layer. Given the five-minute resolution of the time-series data investigated, the number of nodes in the output layer is 6 and 12 for the 30 min and 60 min prediction horizons, respectively. In all networks, He uniform is set as the initialiser, Adam as the optimiser, ReLU as the activation function, and mean square error as the loss function. Moreover, in all training scenarios, the number of epochs and the batch size are set to 100 and 32, respectively. In addition, the learning rate is initiated at 0.01 and then, using the ReduceLROnPlateau callback, reduced by a factor of 0.1 once the validation loss stagnates, with a patience of ten epochs.
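A minimal Keras sketch consistent with these specifications is shown below. It is assembled from the details above; the placeholder data and the validation split used to drive the callback's val_loss are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(lag_steps, n_features, out_steps):
    # Input -> 100-node hidden dense layer -> dense output layer.
    return keras.Sequential([
        layers.Input(shape=(lag_steps, n_features)),
        layers.Flatten(),
        layers.Dense(100, activation="relu", kernel_initializer="he_uniform"),
        layers.Dense(out_steps, kernel_initializer="he_uniform"),
    ])

def build_lstm(lag_steps, n_features, out_steps):
    # Vanilla LSTM: input -> 100-node LSTM layer -> dense output layer.
    return keras.Sequential([
        layers.Input(shape=(lag_steps, n_features)),
        layers.LSTM(100, activation="relu", kernel_initializer="he_uniform"),
        layers.Dense(out_steps, kernel_initializer="he_uniform"),
    ])

# Example: 60 min lag (12 steps), 4 modalities, 30 min horizon (6 steps).
X_train = np.random.rand(1000, 12, 4)   # placeholder training windows
y_train = np.random.rand(1000, 6)       # placeholder target sequences

model = build_mlp(lag_steps=12, n_features=4, out_steps=6)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss="mse")
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                              factor=0.1, patience=10)
model.fit(X_train, y_train, epochs=100, batch_size=32,
          validation_split=0.1,  # assumed split; not specified in the text
          callbacks=[reduce_lr])
```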

4.3. Model Assessment

This section describes the analyses performed to validate the functionality of the developed blood glucose level prediction models. The generated models are assessed from regression, clinical, and statistical perspectives, as discussed below.

4.3.1. Regression Evaluation

Four broadly applied regression metrics are computed to verify the performance of the constructed models from a mathematical viewpoint. Mean absolute error (Equation (1)), root mean square error (Equation (2)), and mean absolute percentage error (Equation (3)) rate the accuracy of the predictions. Further, the coefficient of determination (Equation (4)) measures the correlation between the reference and predicted blood glucose levels.
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|BGL_{i}-\widehat{BGL}_{i}\right| \quad (1)$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(BGL_{i}-\widehat{BGL}_{i}\right)^{2}} \quad (2)$$

$$\mathrm{MAPE}=\frac{100}{N}\sum_{i=1}^{N}\left|\frac{BGL_{i}-\widehat{BGL}_{i}}{BGL_{i}}\right| \quad (3)$$

$$r^{2}=1-\frac{\sum_{i=1}^{N}\left(BGL_{i}-\widehat{BGL}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(BGL_{i}-\overline{BGL}\right)^{2}} \quad (4)$$
where MAE: mean absolute error; BGL: blood glucose level; N: the size of the testing set; RMSE: root mean square error; MAPE: mean absolute percentage error; r²: coefficient of determination.
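For reference, Equations (1)-(4) translate directly into NumPy; a sketch:

```python
import numpy as np

def regression_metrics(bgl, bgl_hat):
    bgl, bgl_hat = np.asarray(bgl, float), np.asarray(bgl_hat, float)
    err = bgl - bgl_hat
    mae = np.mean(np.abs(err))                                    # Equation (1)
    rmse = np.sqrt(np.mean(err ** 2))                             # Equation (2)
    mape = np.mean(np.abs(err / bgl)) * 100                       # Equation (3)
    r2 = 1 - np.sum(err ** 2) / np.sum((bgl - bgl.mean()) ** 2)   # Equation (4)
    return mae, rmse, mape, r2
```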

4.3.2. Clinical Evaluation

Two criteria are employed to evaluate the developed models from a clinical standpoint. One criterion is the Matthews correlation coefficient [65], a measure fundamentally used for assessing the effectiveness of binary classifications. In this work, this metric, calculated as in Equation (5), is exploited to investigate the potency of the blood glucose prediction models in discriminating adverse glycaemic events from euglycaemic events. Hereby, an adverse glycaemic event is defined as a blood glucose level lower than 70 mg/dL (hypoglycaemia) or higher than 180 mg/dL (hyperglycaemia), and a euglycaemic event as a blood glucose level between 70 mg/dL and 180 mg/dL.
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TN+FN\right)}} \quad (5)$$
where TP: true positive (the count of correctly predicted adverse glycaemic events); TN: true negative (the count of correctly predicted euglycaemic events); FP: false positive (the count of euglycaemic events falsely predicted as adverse); FN: false negative (the count of adverse glycaemic events falsely predicted as euglycaemic).
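In code, the MCC can be obtained by binarising the reference and predicted traces with the thresholds above and using scikit-learn; a sketch with illustrative values:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def adverse(bgl):
    # Adverse event: hypoglycaemia (<70 mg/dL) or hyperglycaemia (>180 mg/dL).
    bgl = np.asarray(bgl, dtype=float)
    return (bgl < 70) | (bgl > 180)

bgl_true = np.array([65.0, 110.0, 190.0, 150.0])   # illustrative reference trace
bgl_pred = np.array([72.0, 105.0, 185.0, 182.0])   # illustrative predictions
mcc = matthews_corrcoef(adverse(bgl_true), adverse(bgl_pred))  # Equation (5)
```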
The other clinical evaluation criterion considered is surveillance error [66]. It is based on error grid analysis and identifies the clinical risk of inaccuracies in blood glucose level predictions. Detailed calculations of surveillance error can be found in the original article [66]; a concise elucidation of the outcome of the calculations is as follows. A unitless error value is measured for each predicted blood glucose level. Errors smaller than 0.5 indicate clinically risk-free predictions; errors between 0.5 and 1.5 indicate clinically slight-risk predictions; errors between 1.5 and 2.5 indicate clinically moderate-risk predictions; errors between 2.5 and 3.5 indicate clinically high-risk predictions; and errors greater than 3.5 indicate clinically critical-risk predictions. We adopt two evaluation metrics based on the surveillance error outcomes: one is the average of surveillance errors across the entire testing set, and the other is the proportion of surveillance errors less than 0.5 (clinically risk-free predictions) across the entire testing set.
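Given per-prediction surveillance error values computed as in [66] (the grid calculation itself is not reproduced here), the two adopted summary metrics reduce to simple aggregates; a sketch:

```python
import numpy as np

def surveillance_summary(se):
    # se: per-prediction surveillance error values computed as in [66].
    se = np.asarray(se, dtype=float)
    ase = se.mean()                 # average surveillance error (ASE)
    riskless = np.mean(se < 0.5)    # share of clinically risk-free predictions
    return ase, riskless
```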

4.3.3. Statistical Analysis

Statistical analysis is conducted for further side-by-side performance assessment of the different models. In this sense, the non-parametric Friedman test is employed to compare the outcomes of different models [67]. This test is well suited to inter-model comparative analysis across multiple datasets, as it requires no normality assumption, in contrast to the counterpart ANOVA test [68]. In this study, the test compares the performance of the different types of models, considering individuals as independent data sources. To do so, a significance level of five percent is used to examine the consistency of the results achieved for the evaluation metrics. The null hypothesis for the test is that the results of the non-stacking, stacking, and nested stacking models have identical distributions. In the next step, for cases where the global Friedman test detects a statistically significant difference amongst the models' performance, the local Nemenyi test [69], as a post hoc procedure, compares the models in a pairwise manner. In this multi-comparison analysis, the Holm-Bonferroni method is used to adjust the significance level [70]. Finally, the heuristic critical difference approach is employed to visualise the outcomes of the post hoc analysis [71]. The statistical tests are applied to all evaluation metrics in both prediction horizons of 30 and 60 min, examining multilayer perceptron and long short-term memory networks as learners separately.
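A sketch of this pipeline using SciPy and scikit-posthocs is given below; the placeholder scores array stands in for per-individual results on one metric, and the paper's Holm-Bonferroni adjustment is not repeated here.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Rows: individuals (independent data sources); columns: the three model types
# (non-stacking, stacking, nested stacking) scored on one evaluation metric.
scores = np.random.rand(12, 3)  # placeholder for the per-individual results

stat, p = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
if p < 0.05:  # global test detects a significant difference
    # Pairwise post hoc Nemenyi comparison across the three model types.
    pairwise_p = sp.posthoc_nemenyi_friedman(scores)
    print(pairwise_p)
```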

5. Results and Discussion

This section presents the outcomes of model assessment analyses and the relevant discussion. Initially, the results of regression-wise and clinical-wise evaluation investigations are given for the non-stacking, stacking, and nested stacking models. Therein, for each metric, mean and standard deviation values achieved over five model runs are reported, a common practice in deep learning to counteract the stochastic nature of the analysis. After presenting the evaluation results, the results of the statistical analysis performed for more detailed comparison inspections between different types of models are exhibited.
The full evaluation results of the non-stacking models are compartmentalised in four tables given in Appendix A. Table A1 is dedicated to models with multilayer perceptron learners created on the Ohio 2018 dataset, Table A2 to models with multilayer perceptron learners created on the Ohio 2020 dataset, Table A3 to models with long short-term memory learners created on the Ohio 2018 dataset, and Table A4 to models with long short-term memory learners created on the Ohio 2020 dataset.
In the non-stacking analysis, there are four modelling scenarios for each patient: blood glucose level prediction 30 and 60 min in advance, once assigning multilayer perceptron and once long short-term memory as the learner. As can be seen in the Appendix A tables, for each scenario, four models are created by training the learner on 30, 60, 90, or 120 min of historical data separately. Additionally, there are four parallel modelling scenarios for stacking and nested stacking analysis: blood glucose level prediction 30 and 60 min in advance, once employing multilayer perceptron and once long short-term memory as the last-level learner. On the other hand, one model is created for each scenario in stacking and nested stacking analysis because different lags are not separately studied.
To compare the stacking and nested stacking analyses with the non-stacking analyses, initially, for each patient, one of the four non-stacking models created for each modelling scenario is selected as the representative. Then, the representative non-stacking models are studied in parallel with the counterpart stacking and nested stacking models. To select the representative non-stacking models, first, the best evaluation metrics achieved in each modelling scenario are marked in bold font in the Appendix A tables. Subsequently, the model delivering the highest number of best-obtained evaluation metrics, highlighted in grey in the tables, is deemed the representative. For ease of reference, the results for these models are given in Table 2. Moreover, the complete evaluation results for the stacking and nested stacking models are recorded in Table 3 and Table 4, respectively.
After picking the representative non-stacking models, the overall performance of these models is compared with the stacking and nested stacking counterparts. To this end, first, the Friedman test is conducted on these models' outcomes. p-values less than the significance level of 5% reveal scenarios in which there is a statistically meaningful distinction in the outputs of the three types of models for a specific evaluation metric. To localise the performance differences in these cases, critical difference analysis integrated with the post hoc Nemenyi test is used. The results of the critical difference analysis are shown in Figure 2. These diagrams show the average ranking of the modelling approaches in generating superior outcomes for a given evaluation metric; models whose average rankings are not statistically significantly different are linked via a thick horizontal line. From Figure 2, the nested stacking models yielded superior evaluation outcomes overall. These findings substantiate the effectiveness of the propositions in addressing the challenge of lag optimisation while delivering enhanced outcomes.
It is noteworthy that, according to the highlighted models in the Appendix A tables, the optimal lag to investigate is inconsistent across patients, prediction horizons, and learners. In detail, the optimal lag is 30 min in 19 cases, 60 min in 19 cases, 90 min in 5 cases, and 120 min in 5 cases. Such disparity further accentuates the utility of the nested stacking analyses, which efficaciously circumvent the lag optimisation process.

6. Summary and Conclusions

This work offers a nested meta-learning lag fusion approach to address the challenge of history length optimisation in personalised blood glucose level prediction. For this purpose, in lieu of examining different lengths of history from a search space and picking a local optimum for each subject or a global suboptimum for all subjects, all the lags in the search space are studied autonomously, and the results are amalgamated. A multilayer perceptron and a long short-term memory network are initially trained on four different lags separately, resulting in four non-stacking models from each network. The outcomes of the four non-stacking multilayer perceptron models are then combined into new outcomes using a stacking multilayer perceptron model. Similarly, a stacking long short-term memory model fuses the results of the four non-stacking long short-term memory models. Finally, the decisions of the two stacking prediction models are ensembled, once using a multilayer perceptron and once using a long short-term memory network as a nested stacking model. These investigations are performed for the two prediction horizons of 30 and 60 min commonly studied in blood glucose level prediction research. The generated models undergo in-depth regression, clinical, and statistical assessments. The results obtained substantiate the effectiveness of the proposed stacking and nested stacking methods in addressing the challenge of lag optimisation in blood glucose level prediction analysis.

7. Software and Code

For developing and evaluating blood glucose level prediction models, this research used the Python 3.6 [72] programming language. The libraries and packages employed include TensorFlow [73], Keras [73], Pandas [74], NumPy [75], Sklearn [76], SciPy [77], statsmodels [78], scikit-posthocs [79], and cd-diagram [80]. The source code for the implementations is available on this GitLab repository.

Author Contributions

H.K.: conceptualisation, methodology, software, validation, formal analysis, investigation, data curation, writing the original draft, review and editing, visualisation. H.N.: conceptualisation, methodology, software, validation, formal analysis, investigation, data curation, review and editing. J.E.: conceptualisation, validation, review and editing, supervision. M.B.: conceptualisation, methodology, validation, investigation, resources, review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

University of Sheffield Institutional Open Access Fund.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Ohio datasets used in this research are publicly accessible upon request by following the instructions provided in this link.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

In this section, the complete outcomes of evaluation analysis on the non-stacking models are provided in four tables, as below.
Table A1. The evaluation results for non-stacking models created by multilayer perceptron learners using the Ohio 2018 dataset.
PID | PH (min) | LL (min) | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r² ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
559 | 30 | 30 | 19.96 ± 0.09 | 13.78 ± 0.11 | 8.83 ± 0.11 | 90.45 ± 0.08 | 0.77 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
559 | 30 | 60 | 19.65 ± 0.06 | 13.56 ± 0.03 | 8.78 ± 0.03 | 90.75 ± 0.05 | 0.77 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
559 | 30 | 90 | 19.85 ± 0.01 | 13.73 ± 0.02 | 8.81 ± 0.04 | 90.56 ± 0.01 | 0.77 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
559 | 30 | 120 | 19.88 ± 0.07 | 13.83 ± 0.05 | 8.81 ± 0.04 | 90.53 ± 0.07 | 0.77 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
559 | 60 | 30 | 33.73 ± 0.04 | 24.46 ± 0.04 | 16.49 ± 0.05 | 72.59 ± 0.06 | 0.58 ± 0.00 | 0.77 ± 0.00 | 0.33 ± 0.00
559 | 60 | 60 | 32.04 ± 0.05 | 23.12 ± 0.09 | 15.43 ± 0.11 | 75.26 ± 0.08 | 0.62 ± 0.01 | 0.79 ± 0.00 | 0.31 ± 0.00
559 | 60 | 90 | 31.67 ± 0.05 | 22.84 ± 0.06 | 15.23 ± 0.04 | 75.82 ± 0.08 | 0.64 ± 0.00 | 0.79 ± 0.00 | 0.31 ± 0.00
559 | 60 | 120 | 31.36 ± 0.06 | 22.78 ± 0.06 | 15.18 ± 0.07 | 76.30 ± 0.08 | 0.63 ± 0.00 | 0.79 ± 0.00 | 0.31 ± 0.00
563 | 30 | 30 | 18.71 ± 0.05 | 13.46 ± 0.06 | 8.47 ± 0.04 | 82.97 ± 0.09 | 0.74 ± 0.00 | 0.91 ± 0.00 | 0.19 ± 0.00
563 | 30 | 60 | 18.89 ± 0.03 | 13.33 ± 0.03 | 8.30 ± 0.02 | 82.65 ± 0.05 | 0.74 ± 0.00 | 0.91 ± 0.00 | 0.19 ± 0.00
563 | 30 | 90 | 19.09 ± 0.03 | 13.42 ± 0.03 | 8.34 ± 0.02 | 82.27 ± 0.06 | 0.74 ± 0.01 | 0.91 ± 0.00 | 0.19 ± 0.00
563 | 30 | 120 | 19.29 ± 0.01 | 13.61 ± 0.00 | 8.45 ± 0.00 | 81.91 ± 0.02 | 0.73 ± 0.01 | 0.91 ± 0.00 | 0.19 ± 0.00
563 | 60 | 30 | 30.44 ± 0.08 | 22.46 ± 0.08 | 14.40 ± 0.06 | 55.00 ± 0.23 | 0.49 ± 0.00 | 0.78 ± 0.00 | 0.33 ± 0.00
563 | 60 | 60 | 30.43 ± 0.05 | 21.75 ± 0.02 | 13.57 ± 0.02 | 55.02 ± 0.14 | 0.56 ± 0.01 | 0.80 ± 0.00 | 0.30 ± 0.00
563 | 60 | 90 | 30.65 ± 0.01 | 21.69 ± 0.04 | 13.46 ± 0.04 | 54.36 ± 0.04 | 0.57 ± 0.01 | 0.81 ± 0.00 | 0.30 ± 0.00
563 | 60 | 120 | 30.68 ± 0.15 | 21.72 ± 0.09 | 13.47 ± 0.05 | 54.28 ± 0.44 | 0.57 ± 0.00 | 0.81 ± 0.00 | 0.30 ± 0.00
570 | 30 | 30 | 18.24 ± 0.19 | 13.27 ± 0.15 | 6.74 ± 0.08 | 92.71 ± 0.15 | 0.84 ± 0.00 | 0.95 ± 0.00 | 0.13 ± 0.00
570 | 30 | 60 | 17.44 ± 0.03 | 12.47 ± 0.03 | 6.38 ± 0.03 | 93.34 ± 0.03 | 0.86 ± 0.00 | 0.96 ± 0.00 | 0.12 ± 0.00
570 | 30 | 90 | 17.58 ± 0.03 | 12.54 ± 0.03 | 6.45 ± 0.01 | 93.24 ± 0.03 | 0.86 ± 0.00 | 0.96 ± 0.00 | 0.12 ± 0.00
570 | 30 | 120 | 17.71 ± 0.13 | 12.53 ± 0.11 | 6.41 ± 0.06 | 93.13 ± 0.10 | 0.86 ± 0.00 | 0.96 ± 0.00 | 0.12 ± 0.00
570 | 60 | 30 | 30.36 ± 0.08 | 23.08 ± 0.07 | 11.89 ± 0.03 | 79.85 ± 0.10 | 0.74 ± 0.00 | 0.89 ± 0.00 | 0.22 ± 0.00
570 | 60 | 60 | 28.89 ± 0.03 | 21.33 ± 0.04 | 10.92 ± 0.01 | 81.76 ± 0.04 | 0.78 ± 0.00 | 0.91 ± 0.00 | 0.20 ± 0.00
570 | 60 | 90 | 28.95 ± 0.10 | 21.07 ± 0.09 | 10.82 ± 0.02 | 81.68 ± 0.13 | 0.79 ± 0.00 | 0.91 ± 0.00 | 0.20 ± 0.00
570 | 60 | 120 | 29.00 ± 0.14 | 20.97 ± 0.13 | 10.73 ± 0.04 | 81.62 ± 0.18 | 0.79 ± 0.00 | 0.91 ± 0.00 | 0.20 ± 0.00
575 | 30 | 30 | 24.12 ± 0.06 | 16.05 ± 0.10 | 11.43 ± 0.09 | 84.48 ± 0.07 | 0.73 ± 0.00 | 0.86 ± 0.00 | 0.24 ± 0.00
575 | 30 | 60 | 24.49 ± 0.04 | 15.93 ± 0.02 | 11.39 ± 0.02 | 84.00 ± 0.06 | 0.73 ± 0.00 | 0.85 ± 0.00 | 0.25 ± 0.00
575 | 30 | 90 | 24.38 ± 0.09 | 15.97 ± 0.13 | 11.56 ± 0.11 | 84.13 ± 0.12 | 0.74 ± 0.00 | 0.85 ± 0.00 | 0.25 ± 0.00
575 | 30 | 120 | 24.35 ± 0.09 | 16.07 ± 0.12 | 11.72 ± 0.16 | 84.17 ± 0.12 | 0.75 ± 0.00 | 0.85 ± 0.01 | 0.25 ± 0.00
575 | 60 | 30 | 36.22 ± 0.10 | 26.77 ± 0.12 | 19.49 ± 0.10 | 65.08 ± 0.19 | 0.51 ± 0.00 | 0.69 ± 0.00 | 0.40 ± 0.00
575 | 60 | 60 | 36.27 ± 0.20 | 26.24 ± 0.25 | 18.96 ± 0.17 | 64.96 ± 0.39 | 0.54 ± 0.01 | 0.70 ± 0.00 | 0.39 ± 0.00
575 | 60 | 90 | 35.90 ± 0.23 | 25.73 ± 0.11 | 18.79 ± 0.09 | 65.68 ± 0.44 | 0.55 ± 0.00 | 0.70 ± 0.00 | 0.39 ± 0.00
575 | 60 | 120 | 35.63 ± 0.17 | 25.66 ± 0.20 | 18.91 ± 0.17 | 66.19 ± 0.32 | 0.57 ± 0.01 | 0.71 ± 0.00 | 0.38 ± 0.00
588 | 30 | 30 | 18.80 ± 0.09 | 13.99 ± 0.09 | 8.63 ± 0.07 | 84.49 ± 0.15 | 0.75 ± 0.00 | 0.92 ± 0.00 | 0.19 ± 0.00
588 | 30 | 60 | 18.27 ± 0.42 | 13.61 ± 0.20 | 8.36 ± 0.06 | 85.35 ± 0.68 | 0.75 ± 0.02 | 0.93 ± 0.00 | 0.18 ± 0.00
588 | 30 | 90 | 18.07 ± 0.35 | 13.50 ± 0.15 | 8.29 ± 0.01 | 85.66 ± 0.56 | 0.76 ± 0.01 | 0.93 ± 0.00 | 0.18 ± 0.00
588 | 30 | 120 | 18.44 ± 0.67 | 13.64 ± 0.37 | 8.26 ± 0.13 | 85.06 ± 1.09 | 0.75 ± 0.02 | 0.93 ± 0.01 | 0.18 ± 0.00
588 | 60 | 30 | 30.36 ± 0.11 | 22.68 ± 0.13 | 14.16 ± 0.12 | 59.60 ± 0.28 | 0.58 ± 0.00 | 0.77 ± 0.00 | 0.31 ± 0.00
588 | 60 | 60 | 30.72 ± 0.26 | 22.76 ± 0.25 | 13.62 ± 0.16 | 58.65 ± 0.69 | 0.56 ± 0.01 | 0.79 ± 0.00 | 0.30 ± 0.00
588 | 60 | 90 | 30.58 ± 0.05 | 22.47 ± 0.10 | 13.41 ± 0.08 | 59.01 ± 0.13 | 0.56 ± 0.00 | 0.80 ± 0.00 | 0.29 ± 0.00
588 | 60 | 120 | 30.48 ± 0.25 | 22.39 ± 0.26 | 13.33 ± 0.19 | 59.29 ± 0.67 | 0.57 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
591 | 30 | 30 | 22.89 ± 0.02 | 16.68 ± 0.02 | 12.98 ± 0.02 | 80.47 ± 0.04 | 0.62 ± 0.00 | 0.79 ± 0.00 | 0.29 ± 0.00
591 | 30 | 60 | 22.98 ± 0.11 | 16.61 ± 0.05 | 12.99 ± 0.03 | 80.32 ± 0.18 | 0.65 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
591 | 30 | 90 | 23.01 ± 0.06 | 16.71 ± 0.01 | 13.12 ± 0.02 | 80.26 ± 0.09 | 0.64 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
591 | 30 | 120 | 22.97 ± 0.07 | 16.78 ± 0.05 | 13.21 ± 0.11 | 80.32 ± 0.12 | 0.64 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
591 | 60 | 30 | 35.00 ± 0.05 | 27.27 ± 0.06 | 22.01 ± 0.07 | 54.35 ± 0.14 | 0.36 ± 0.00 | 0.64 ± 0.00 | 0.45 ± 0.00
591 | 60 | 60 | 35.93 ± 0.07 | 27.77 ± 0.02 | 22.37 ± 0.07 | 51.89 ± 0.19 | 0.35 ± 0.00 | 0.63 ± 0.00 | 0.46 ± 0.00
591 | 60 | 90 | 34.98 ± 0.05 | 26.93 ± 0.08 | 21.91 ± 0.13 | 54.41 ± 0.12 | 0.39 ± 0.00 | 0.65 ± 0.00 | 0.45 ± 0.00
591 | 60 | 120 | 34.91 ± 0.07 | 27.12 ± 0.16 | 22.19 ± 0.25 | 54.60 ± 0.19 | 0.39 ± 0.00 | 0.65 ± 0.00 | 0.45 ± 0.00
Note. Values in bold indicate the best evaluation outcome for each metric in each learning scenario, and grey highlights denote the best model in each scenario based on the best-achieved evaluation metrics. PID: patient identification; PH: prediction horizon; LL: lag length; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r²: coefficient of determination; MCC: Matthews correlation coefficient; SE: surveillance error; ASE: average surveillance error.
Table A2. The evaluation results for non-stacking models created by multilayer perceptron learners using the Ohio 2020 dataset.
PID | PH (min) | LL (min) | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r² ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
540 | 30 | 30 | 23.48 ± 0.04 | 17.73 ± 0.03 | 12.88 ± 0.00 | 86.93 ± 0.04 | 0.67 ± 0.00 | 0.81 ± 0.00 | 0.28 ± 0.00
540 | 30 | 60 | 22.88 ± 0.13 | 17.45 ± 0.10 | 12.71 ± 0.04 | 87.60 ± 0.14 | 0.68 ± 0.00 | 0.81 ± 0.00 | 0.27 ± 0.00
540 | 30 | 90 | 23.41 ± 0.08 | 17.79 ± 0.04 | 12.84 ± 0.04 | 87.02 ± 0.09 | 0.68 ± 0.00 | 0.81 ± 0.00 | 0.28 ± 0.00
540 | 30 | 120 | 23.61 ± 0.13 | 17.92 ± 0.07 | 12.86 ± 0.02 | 86.79 ± 0.15 | 0.67 ± 0.00 | 0.81 ± 0.00 | 0.28 ± 0.00
540 | 60 | 30 | 40.74 ± 0.16 | 31.20 ± 0.15 | 23.55 ± 0.12 | 60.76 ± 0.32 | 0.49 ± 0.00 | 0.65 ± 0.00 | 0.45 ± 0.00
540 | 60 | 60 | 39.84 ± 0.14 | 30.49 ± 0.12 | 22.96 ± 0.13 | 62.48 ± 0.27 | 0.52 ± 0.00 | 0.66 ± 0.00 | 0.44 ± 0.00
540 | 60 | 90 | 40.15 ± 0.16 | 30.68 ± 0.15 | 23.09 ± 0.14 | 61.90 ± 0.30 | 0.52 ± 0.01 | 0.66 ± 0.00 | 0.44 ± 0.00
540 | 60 | 120 | 40.38 ± 0.16 | 30.88 ± 0.14 | 23.16 ± 0.07 | 61.45 ± 0.31 | 0.52 ± 0.00 | 0.66 ± 0.00 | 0.44 ± 0.00
544 | 30 | 30 | 17.76 ± 0.06 | 12.45 ± 0.07 | 8.47 ± 0.07 | 87.73 ± 0.09 | 0.78 ± 0.00 | 0.91 ± 0.00 | 0.18 ± 0.00
544 | 30 | 60 | 17.37 ± 0.03 | 12.14 ± 0.03 | 8.21 ± 0.03 | 88.26 ± 0.04 | 0.78 ± 0.00 | 0.92 ± 0.00 | 0.18 ± 0.00
544 | 30 | 90 | 17.61 ± 0.03 | 12.42 ± 0.04 | 8.35 ± 0.03 | 87.94 ± 0.05 | 0.77 ± 0.00 | 0.91 ± 0.00 | 0.18 ± 0.00
544 | 30 | 120 | 17.78 ± 0.10 | 12.49 ± 0.04 | 8.39 ± 0.03 | 87.71 ± 0.13 | 0.77 ± 0.00 | 0.91 ± 0.00 | 0.19 ± 0.00
544 | 60 | 30 | 29.25 ± 0.08 | 21.79 ± 0.08 | 15.29 ± 0.08 | 66.61 ± 0.19 | 0.59 ± 0.00 | 0.75 ± 0.00 | 0.32 ± 0.00
544 | 60 | 60 | 28.49 ± 0.03 | 20.74 ± 0.04 | 14.16 ± 0.05 | 68.32 ± 0.07 | 0.63 ± 0.00 | 0.78 ± 0.00 | 0.30 ± 0.00
544 | 60 | 90 | 28.92 ± 0.09 | 21.03 ± 0.02 | 14.29 ± 0.04 | 67.35 ± 0.20 | 0.63 ± 0.00 | 0.77 ± 0.00 | 0.30 ± 0.00
544 | 60 | 120 | 29.14 ± 0.12 | 21.12 ± 0.09 | 14.32 ± 0.04 | 66.86 ± 0.27 | 0.62 ± 0.00 | 0.77 ± 0.00 | 0.31 ± 0.00
552 | 30 | 30 | 14.06 ± 0.03 | 8.25 ± 0.11 | 6.48 ± 0.09 | 86.18 ± 0.05 | 0.75 ± 0.00 | 0.92 ± 0.00 | 0.14 ± 0.00
552 | 30 | 60 | 14.32 ± 0.08 | 8.91 ± 0.08 | 7.03 ± 0.06 | 85.67 ± 0.16 | 0.73 ± 0.00 | 0.91 ± 0.00 | 0.15 ± 0.00
552 | 30 | 90 | 14.47 ± 0.10 | 9.25 ± 0.09 | 7.30 ± 0.09 | 85.36 ± 0.20 | 0.72 ± 0.00 | 0.91 ± 0.00 | 0.15 ± 0.00
552 | 30 | 120 | 14.60 ± 0.08 | 9.42 ± 0.03 | 7.44 ± 0.03 | 85.09 ± 0.16 | 0.72 ± 0.00 | 0.91 ± 0.00 | 0.15 ± 0.00
552 | 60 | 30 | 23.83 ± 0.03 | 14.57 ± 0.10 | 11.75 ± 0.12 | 60.36 ± 0.09 | 0.64 ± 0.00 | 0.84 ± 0.00 | 0.22 ± 0.00
552 | 60 | 60 | 23.71 ± 0.06 | 14.94 ± 0.06 | 12.07 ± 0.06 | 60.78 ± 0.18 | 0.63 ± 0.00 | 0.84 ± 0.00 | 0.22 ± 0.00
552 | 60 | 90 | 23.75 ± 0.08 | 15.44 ± 0.09 | 12.42 ± 0.06 | 60.66 ± 0.26 | 0.64 ± 0.00 | 0.84 ± 0.00 | 0.23 ± 0.00
552 | 60 | 120 | 23.87 ± 0.07 | 15.50 ± 0.09 | 12.47 ± 0.08 | 60.25 ± 0.22 | 0.64 ± 0.00 | 0.84 ± 0.00 | 0.23 ± 0.00
567 | 30 | 30 | 22.72 ± 0.04 | 16.47 ± 0.04 | 12.48 ± 0.03 | 84.80 ± 0.05 | 0.64 ± 0.00 | 0.80 ± 0.00 | 0.28 ± 0.00
567 | 30 | 60 | 22.98 ± 0.07 | 16.63 ± 0.07 | 12.93 ± 0.07 | 84.44 ± 0.10 | 0.64 ± 0.00 | 0.80 ± 0.00 | 0.29 ± 0.00
567 | 30 | 90 | 23.48 ± 0.18 | 17.24 ± 0.15 | 13.48 ± 0.12 | 83.77 ± 0.25 | 0.62 ± 0.00 | 0.79 ± 0.00 | 0.31 ± 0.00
567 | 30 | 120 | 24.18 ± 0.20 | 17.98 ± 0.15 | 14.18 ± 0.12 | 82.78 ± 0.29 | 0.61 ± 0.00 | 0.78 ± 0.00 | 0.32 ± 0.00
567 | 60 | 30 | 38.38 ± 0.02 | 29.51 ± 0.04 | 23.24 ± 0.06 | 56.68 ± 0.04 | 0.46 ± 0.00 | 0.64 ± 0.00 | 0.47 ± 0.00
567 | 60 | 60 | 39.00 ± 0.07 | 29.36 ± 0.01 | 23.95 ± 0.01 | 55.27 ± 0.15 | 0.48 ± 0.00 | 0.64 ± 0.00 | 0.48 ± 0.00
567 | 60 | 90 | 39.46 ± 0.07 | 29.96 ± 0.01 | 24.71 ± 0.03 | 54.22 ± 0.17 | 0.46 ± 0.00 | 0.63 ± 0.00 | 0.49 ± 0.00
567 | 60 | 120 | 40.39 ± 0.15 | 30.91 ± 0.08 | 25.66 ± 0.09 | 52.01 ± 0.35 | 0.44 ± 0.00 | 0.62 ± 0.00 | 0.51 ± 0.00
584 | 30 | 30 | 23.25 ± 0.08 | 16.72 ± 0.06 | 11.00 ± 0.07 | 84.88 ± 0.10 | 0.76 ± 0.00 | 0.87 ± 0.00 | 0.23 ± 0.00
584 | 30 | 60 | 22.78 ± 0.04 | 16.92 ± 0.04 | 11.34 ± 0.03 | 85.49 ± 0.05 | 0.77 ± 0.00 | 0.87 ± 0.00 | 0.23 ± 0.00
584 | 30 | 90 | 22.80 ± 0.02 | 17.17 ± 0.03 | 11.51 ± 0.02 | 85.47 ± 0.03 | 0.76 ± 0.00 | 0.88 ± 0.00 | 0.24 ± 0.00
584 | 30 | 120 | 23.30 ± 0.10 | 17.59 ± 0.10 | 11.79 ± 0.08 | 84.82 ± 0.13 | 0.75 ± 0.00 | 0.87 ± 0.00 | 0.25 ± 0.00
584 | 60 | 30 | 37.53 ± 0.03 | 27.65 ± 0.22 | 18.33 ± 0.27 | 60.48 ± 0.07 | 0.59 ± 0.00 | 0.71 ± 0.01 | 0.37 ± 0.00
584 | 60 | 60 | 35.99 ± 0.05 | 27.29 ± 0.02 | 18.40 ± 0.03 | 63.67 ± 0.11 | 0.60 ± 0.00 | 0.72 ± 0.00 | 0.37 ± 0.00
584 | 60 | 90 | 36.04 ± 0.06 | 27.64 ± 0.06 | 18.72 ± 0.07 | 63.56 ± 0.12 | 0.59 ± 0.00 | 0.72 ± 0.00 | 0.38 ± 0.00
584 | 60 | 120 | 36.39 ± 0.04 | 27.83 ± 0.09 | 18.84 ± 0.12 | 62.85 ± 0.08 | 0.58 ± 0.00 | 0.71 ± 0.00 | 0.38 ± 0.00
596 | 30 | 30 | 18.66 ± 0.09 | 13.47 ± 0.11 | 10.09 ± 0.10 | 85.82 ± 0.14 | 0.71 ± 0.00 | 0.89 ± 0.00 | 0.21 ± 0.00
596 | 30 | 60 | 17.87 ± 0.08 | 12.89 ± 0.06 | 9.67 ± 0.03 | 86.99 ± 0.12 | 0.74 ± 0.00 | 0.89 ± 0.00 | 0.20 ± 0.00
596 | 30 | 90 | 17.87 ± 0.09 | 12.93 ± 0.06 | 9.71 ± 0.03 | 86.99 ± 0.13 | 0.75 ± 0.00 | 0.89 ± 0.00 | 0.20 ± 0.00
596 | 30 | 120 | 17.95 ± 0.05 | 12.98 ± 0.03 | 9.76 ± 0.02 | 86.89 ± 0.07 | 0.74 ± 0.00 | 0.90 ± 0.00 | 0.20 ± 0.00
596 | 60 | 30 | 30.46 ± 0.10 | 22.78 ± 0.08 | 17.57 ± 0.08 | 62.29 ± 0.25 | 0.52 ± 0.00 | 0.78 ± 0.00 | 0.33 ± 0.00
596 | 60 | 60 | 29.00 ± 0.13 | 21.43 ± 0.14 | 16.36 ± 0.13 | 65.83 ± 0.30 | 0.56 ± 0.00 | 0.80 ± 0.00 | 0.31 ± 0.00
596 | 60 | 90 | 28.79 ± 0.05 | 21.35 ± 0.07 | 16.28 ± 0.07 | 66.32 ± 0.13 | 0.57 ± 0.01 | 0.80 ± 0.00 | 0.31 ± 0.00
596 | 60 | 120 | 28.83 ± 0.16 | 21.37 ± 0.16 | 16.34 ± 0.16 | 66.22 ± 0.37 | 0.57 ± 0.01 | 0.81 ± 0.00 | 0.31 ± 0.00
Note. Values in bold indicate the best evaluation outcome for each metric in each learning scenario, and grey highlights denote the best model in each scenario based on the best-achieved evaluation metrics. PID: patient identification; PH: prediction horizon; LL: lag length; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r²: coefficient of determination; MCC: Matthews correlation coefficient; SE: surveillance error; ASE: average surveillance error.
Table A3. The evaluation results for non-stacking models created by long short-term memory learners using the Ohio 2018 dataset.
PID | PH (min) | LL (min) | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r² ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
559 | 30 | 30 | 23.12 ± 0.43 | 16.60 ± 0.66 | 11.10 ± 0.63 | 87.19 ± 0.47 | 0.74 ± 0.01 | 0.86 ± 0.01 | 0.24 ± 0.01
559 | 30 | 60 | 23.51 ± 0.36 | 16.79 ± 0.54 | 11.02 ± 0.64 | 86.76 ± 0.40 | 0.74 ± 0.01 | 0.87 ± 0.01 | 0.23 ± 0.01
559 | 30 | 90 | 25.50 ± 1.19 | 17.44 ± 0.64 | 10.71 ± 0.13 | 84.39 ± 1.44 | 0.72 ± 0.03 | 0.87 ± 0.01 | 0.23 ± 0.00
559 | 30 | 120 | 32.86 ± 13.20 | 23.72 ± 10.60 | 15.55 ± 8.01 | 71.35 ± 23.13 | 0.63 ± 0.19 | 0.78 ± 0.16 | 0.31 ± 0.15
559 | 60 | 30 | 38.39 ± 0.82 | 27.05 ± 0.53 | 16.65 ± 0.21 | 64.46 ± 1.52 | 0.57 ± 0.01 | 0.75 ± 0.00 | 0.35 ± 0.00
559 | 60 | 60 | 38.73 ± 4.41 | 27.75 ± 3.58 | 17.37 ± 1.50 | 63.53 ± 8.42 | 0.54 ± 0.07 | 0.73 ± 0.05 | 0.37 ± 0.05
559 | 60 | 90 | 37.77 ± 3.27 | 26.72 ± 2.04 | 16.92 ± 0.47 | 65.46 ± 6.01 | 0.58 ± 0.02 | 0.75 ± 0.01 | 0.35 ± 0.02
559 | 60 | 120 | 36.08 ± 1.47 | 25.38 ± 0.84 | 16.62 ± 0.25 | 68.60 ± 2.56 | 0.59 ± 0.02 | 0.75 ± 0.01 | 0.34 ± 0.01
563 | 30 | 30 | 21.59 ± 0.64 | 15.33 ± 0.45 | 9.69 ± 0.19 | 77.31 ± 1.34 | 0.72 ± 0.01 | 0.89 ± 0.00 | 0.22 ± 0.00
563 | 30 | 60 | 21.73 ± 0.46 | 15.52 ± 0.33 | 9.82 ± 0.32 | 77.03 ± 0.96 | 0.73 ± 0.00 | 0.89 ± 0.00 | 0.22 ± 0.01
563 | 30 | 90 | 24.91 ± 1.84 | 17.49 ± 1.38 | 10.96 ± 1.02 | 69.71 ± 4.55 | 0.69 ± 0.03 | 0.87 ± 0.02 | 0.24 ± 0.02
563 | 30 | 120 | 24.04 ± 1.89 | 16.94 ± 1.15 | 10.65 ± 0.72 | 71.79 ± 4.43 | 0.69 ± 0.01 | 0.87 ± 0.01 | 0.24 ± 0.01
563 | 60 | 30 | 33.02 ± 0.62 | 24.13 ± 0.61 | 15.07 ± 0.18 | 47.03 ± 2.01 | 0.51 ± 0.01 | 0.75 ± 0.02 | 0.33 ± 0.01
563 | 60 | 60 | 34.44 ± 2.48 | 25.05 ± 2.24 | 15.80 ± 1.37 | 42.17 ± 8.46 | 0.48 ± 0.09 | 0.74 ± 0.06 | 0.35 ± 0.03
563 | 60 | 90 | 34.32 ± 1.23 | 24.45 ± 1.04 | 15.16 ± 0.63 | 42.73 ± 4.13 | 0.52 ± 0.01 | 0.77 ± 0.02 | 0.34 ± 0.01
563 | 60 | 120 | 34.13 ± 1.59 | 24.66 ± 1.10 | 15.27 ± 0.62 | 43.33 ± 5.27 | 0.50 ± 0.02 | 0.76 ± 0.02 | 0.34 ± 0.01
570 | 30 | 30 | 24.78 ± 3.96 | 18.97 ± 3.76 | 8.84 ± 1.30 | 86.33 ± 4.12 | 0.82 ± 0.01 | 0.94 ± 0.01 | 0.16 ± 0.02
570 | 30 | 60 | 25.83 ± 5.11 | 19.99 ± 4.76 | 9.28 ± 1.87 | 85.02 ± 5.59 | 0.81 ± 0.03 | 0.93 ± 0.02 | 0.17 ± 0.03
570 | 30 | 90 | 23.09 ± 2.28 | 17.15 ± 2.09 | 8.26 ± 0.74 | 88.25 ± 2.30 | 0.82 ± 0.01 | 0.94 ± 0.00 | 0.15 ± 0.01
570 | 30 | 120 | 22.92 ± 1.49 | 16.16 ± 1.15 | 8.04 ± 0.65 | 88.47 ± 1.52 | 0.81 ± 0.02 | 0.94 ± 0.01 | 0.15 ± 0.01
570 | 60 | 30 | 38.34 ± 2.65 | 29.98 ± 2.52 | 13.56 ± 0.95 | 67.77 ± 4.48 | 0.75 ± 0.01 | 0.88 ± 0.01 | 0.25 ± 0.02
570 | 60 | 60 | 35.80 ± 1.50 | 26.75 ± 1.85 | 12.68 ± 0.43 | 71.95 ± 2.31 | 0.75 ± 0.00 | 0.88 ± 0.01 | 0.23 ± 0.01
570 | 60 | 90 | 37.00 ± 2.48 | 27.94 ± 1.86 | 13.17 ± 0.99 | 69.98 ± 4.09 | 0.75 ± 0.03 | 0.87 ± 0.02 | 0.24 ± 0.02
570 | 60 | 120 | 35.80 ± 2.62 | 25.82 ± 2.70 | 12.58 ± 0.95 | 71.89 ± 4.09 | 0.75 ± 0.02 | 0.88 ± 0.01 | 0.23 ± 0.02
575 | 30 | 30 | 27.20 ± 0.57 | 18.25 ± 0.45 | 13.14 ± 0.71 | 80.24 ± 0.82 | 0.69 ± 0.00 | 0.82 ± 0.02 | 0.28 ± 0.01
575 | 30 | 60 | 27.52 ± 0.76 | 18.26 ± 0.37 | 13.07 ± 0.32 | 79.77 ± 1.13 | 0.69 ± 0.01 | 0.82 ± 0.00 | 0.28 ± 0.01
575 | 30 | 90 | 28.37 ± 0.99 | 18.89 ± 0.88 | 13.78 ± 0.69 | 78.51 ± 1.51 | 0.68 ± 0.01 | 0.80 ± 0.01 | 0.30 ± 0.01
575 | 30 | 120 | 29.33 ± 1.12 | 19.83 ± 1.63 | 13.69 ± 0.60 | 77.03 ± 1.74 | 0.65 ± 0.05 | 0.80 ± 0.02 | 0.29 ± 0.01
575 | 60 | 30 | 38.09 ± 0.03 | 27.47 ± 0.52 | 20.48 ± 1.20 | 61.36 ± 0.07 | 0.54 ± 0.02 | 0.70 ± 0.00 | 0.41 ± 0.01
575 | 60 | 60 | 39.96 ± 0.84 | 28.84 ± 0.27 | 21.39 ± 1.07 | 57.46 ± 1.78 | 0.55 ± 0.03 | 0.68 ± 0.01 | 0.44 ± 0.01
575 | 60 | 90 | 38.15 ± 0.52 | 27.58 ± 0.22 | 20.56 ± 0.49 | 61.24 ± 1.06 | 0.52 ± 0.01 | 0.68 ± 0.01 | 0.42 ± 0.01
575 | 60 | 120 | 39.47 ± 1.28 | 28.64 ± 0.43 | 21.35 ± 0.44 | 58.48 ± 2.69 | 0.54 ± 0.01 | 0.67 ± 0.01 | 0.43 ± 0.01
588 | 30 | 30 | 19.23 ± 0.11 | 14.16 ± 0.11 | 8.53 ± 0.12 | 83.77 ± 0.19 | 0.74 ± 0.00 | 0.92 ± 0.00 | 0.19 ± 0.00
588 | 30 | 60 | 19.60 ± 0.23 | 14.57 ± 0.15 | 8.83 ± 0.07 | 83.13 ± 0.39 | 0.74 ± 0.01 | 0.92 ± 0.00 | 0.19 ± 0.00
588 | 30 | 90 | 20.33 ± 0.86 | 15.00 ± 0.73 | 8.87 ± 0.36 | 81.84 ± 1.54 | 0.73 ± 0.01 | 0.92 ± 0.00 | 0.19 ± 0.01
588 | 30 | 120 | 21.99 ± 1.74 | 16.39 ± 1.07 | 9.64 ± 0.77 | 78.69 ± 3.39 | 0.69 ± 0.02 | 0.91 ± 0.02 | 0.20 ± 0.02
588 | 60 | 30 | 31.32 ± 0.53 | 23.12 ± 0.56 | 14.05 ± 0.68 | 57.00 ± 1.48 | 0.57 ± 0.01 | 0.79 ± 0.02 | 0.30 ± 0.02
588 | 60 | 60 | 30.46 ± 0.60 | 22.48 ± 0.39 | 14.04 ± 0.23 | 59.33 ± 1.61 | 0.60 ± 0.01 | 0.79 ± 0.01 | 0.30 ± 0.01
588 | 60 | 90 | 32.01 ± 0.53 | 23.06 ± 0.33 | 14.11 ± 0.47 | 55.07 ± 1.48 | 0.58 ± 0.02 | 0.80 ± 0.01 | 0.30 ± 0.01
588 | 60 | 120 | 35.57 ± 4.21 | 25.60 ± 2.74 | 15.65 ± 1.69 | 44.02 ± 13.55 | 0.50 ± 0.08 | 0.76 ± 0.03 | 0.33 ± 0.03
591 | 30 | 30 | 26.00 ± 0.54 | 19.63 ± 0.54 | 15.81 ± 0.75 | 74.78 ± 1.04 | 0.58 ± 0.01 | 0.74 ± 0.00 | 0.35 ± 0.01
591 | 30 | 60 | 26.33 ± 0.42 | 19.55 ± 0.24 | 15.65 ± 0.40 | 74.16 ± 0.83 | 0.60 ± 0.00 | 0.75 ± 0.01 | 0.34 ± 0.01
591 | 30 | 90 | 27.44 ± 1.02 | 20.46 ± 0.58 | 15.63 ± 0.98 | 71.90 ± 2.10 | 0.55 ± 0.05 | 0.74 ± 0.01 | 0.34 ± 0.01
591 | 30 | 120 | 27.16 ± 0.88 | 20.13 ± 0.63 | 15.75 ± 0.85 | 72.48 ± 1.78 | 0.57 ± 0.03 | 0.74 ± 0.02 | 0.34 ± 0.01
591 | 60 | 30 | 36.51 ± 0.20 | 28.36 ± 0.26 | 23.32 ± 0.27 | 50.32 ± 0.54 | 0.37 ± 0.02 | 0.63 ± 0.00 | 0.47 ± 0.00
591 | 60 | 60 | 37.52 ± 0.93 | 28.36 ± 0.32 | 22.47 ± 0.57 | 47.52 ± 2.58 | 0.36 ± 0.04 | 0.63 ± 0.01 | 0.47 ± 0.00
591 | 60 | 90 | 37.92 ± 1.44 | 29.32 ± 1.16 | 24.31 ± 1.51 | 46.38 ± 4.10 | 0.39 ± 0.04 | 0.63 ± 0.01 | 0.48 ± 0.01
591 | 60 | 120 | 37.07 ± 1.67 | 28.38 ± 1.14 | 22.37 ± 0.89 | 48.73 ± 4.57 | 0.37 ± 0.02 | 0.63 ± 0.02 | 0.47 ± 0.02
Note. Values in bold indicate the best evaluation outcome for each metric in each learning scenario, and grey highlights denote the best model in each scenario based on the best-achieved evaluation metrics. PID: patient identification; PH: prediction horizon; LL: lag length; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r²: coefficient of determination; MCC: Matthews correlation coefficient; SE: surveillance error; ASE: average surveillance error.
Table A4. The evaluation results for non-stacking models created by long short-term memory learners using the Ohio 2020 dataset.
PID | PH (min) | LL (min) | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r² ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
540 | 30 | 30 | 25.76 ± 1.26 | 19.38 ± 0.62 | 14.84 ± 0.24 | 84.25 ± 1.55 | 0.67 ± 0.01 | 0.79 ± 0.00 | 0.31 ± 0.00
540 | 30 | 60 | 24.84 ± 0.42 | 18.48 ± 0.70 | 13.81 ± 1.24 | 85.37 ± 0.49 | 0.67 ± 0.02 | 0.80 ± 0.01 | 0.29 ± 0.02
540 | 30 | 90 | 28.02 ± 3.64 | 21.40 ± 2.68 | 15.98 ± 2.30 | 81.18 ± 4.68 | 0.63 ± 0.03 | 0.76 ± 0.03 | 0.33 ± 0.04
540 | 30 | 120 | 27.92 ± 1.82 | 21.00 ± 1.99 | 15.38 ± 2.29 | 81.48 ± 2.40 | 0.63 ± 0.02 | 0.76 ± 0.02 | 0.32 ± 0.04
540 | 60 | 30 | 42.60 ± 1.15 | 31.84 ± 0.41 | 23.25 ± 0.53 | 57.07 ± 2.32 | 0.48 ± 0.02 | 0.64 ± 0.01 | 0.45 ± 0.00
540 | 60 | 60 | 41.36 ± 0.58 | 30.69 ± 0.37 | 22.40 ± 0.20 | 59.56 ± 1.12 | 0.50 ± 0.02 | 0.66 ± 0.00 | 0.44 ± 0.00
540 | 60 | 90 | 43.78 ± 2.80 | 32.44 ± 2.02 | 23.51 ± 1.66 | 54.55 ± 5.78 | 0.50 ± 0.04 | 0.64 ± 0.02 | 0.45 ± 0.02
540 | 60 | 120 | 48.17 ± 1.39 | 34.62 ± 2.09 | 24.69 ± 2.33 | 45.10 ± 3.15 | 0.48 ± 0.04 | 0.63 ± 0.03 | 0.48 ± 0.03
544 | 30 | 30 | 21.23 ± 0.53 | 15.00 ± 0.49 | 9.93 ± 0.35 | 82.45 ± 0.87 | 0.76 ± 0.01 | 0.89 ± 0.00 | 0.21 ± 0.01
544 | 30 | 60 | 20.66 ± 0.31 | 14.71 ± 0.43 | 9.99 ± 0.53 | 83.40 ± 0.50 | 0.75 ± 0.01 | 0.88 ± 0.02 | 0.22 ± 0.01
544 | 30 | 90 | 22.55 ± 0.45 | 15.56 ± 0.37 | 10.40 ± 0.27 | 80.21 ± 0.79 | 0.72 ± 0.01 | 0.88 ± 0.01 | 0.22 ± 0.00
544 | 30 | 120 | 23.38 ± 2.94 | 16.49 ± 1.81 | 11.35 ± 1.30 | 78.51 ± 5.18 | 0.71 ± 0.04 | 0.84 ± 0.03 | 0.24 ± 0.03
544 | 60 | 30 | 31.43 ± 0.05 | 23.19 ± 0.08 | 15.59 ± 0.16 | 61.46 ± 0.12 | 0.58 ± 0.01 | 0.76 ± 0.00 | 0.32 ± 0.00
544 | 60 | 60 | 30.45 ± 0.12 | 22.09 ± 0.45 | 14.81 ± 0.52 | 63.83 ± 0.29 | 0.59 ± 0.02 | 0.78 ± 0.01 | 0.31 ± 0.01
544 | 60 | 90 | 32.39 ± 0.61 | 22.91 ± 0.32 | 15.40 ± 0.39 | 59.04 ± 1.55 | 0.57 ± 0.01 | 0.76 ± 0.01 | 0.33 ± 0.01
544 | 60 | 120 | 36.19 ± 1.38 | 25.61 ± 0.40 | 17.44 ± 0.10 | 48.85 ± 3.94 | 0.52 ± 0.04 | 0.74 ± 0.01 | 0.36 ± 0.01
552 | 30 | 30 | 16.72 ± 0.44 | 10.31 ± 0.24 | 8.04 ± 0.22 | 80.45 ± 1.01 | 0.71 ± 0.02 | 0.90 ± 0.01 | 0.16 ± 0.01
552 | 30 | 60 | 21.54 ± 3.51 | 14.67 ± 3.62 | 11.21 ± 2.37 | 66.99 ± 10.53 | 0.59 ± 0.14 | 0.85 ± 0.04 | 0.22 ± 0.04
552 | 30 | 90 | 18.81 ± 1.50 | 12.58 ± 1.52 | 9.73 ± 0.98 | 75.16 ± 3.97 | 0.69 ± 0.01 | 0.89 ± 0.01 | 0.19 ± 0.01
552 | 30 | 120 | 20.91 ± 5.44 | 14.00 ± 4.23 | 11.01 ± 3.87 | 68.05 ± 17.09 | 0.69 ± 0.08 | 0.85 ± 0.10 | 0.22 ± 0.08
552 | 60 | 30 | 25.47 ± 0.30 | 16.27 ± 0.24 | 13.02 ± 0.27 | 54.73 ± 1.05 | 0.61 ± 0.01 | 0.83 ± 0.01 | 0.24 ± 0.01
552 | 60 | 60 | 27.15 ± 1.00 | 18.20 ± 0.92 | 15.02 ± 0.93 | 48.51 ± 3.76 | 0.58 ± 0.03 | 0.78 ± 0.02 | 0.28 ± 0.02
552 | 60 | 90 | 27.51 ± 2.98 | 17.70 ± 1.96 | 14.55 ± 1.73 | 46.78 ± 11.78 | 0.56 ± 0.06 | 0.80 ± 0.04 | 0.27 ± 0.04
552 | 60 | 120 | 40.75 ± 25.37 | 32.17 ± 26.99 | 26.17 ± 21.83 | 45.82 ± 170.04 | 0.33 ± 0.44 | 0.60 ± 0.38 | 0.53 ± 0.49
567 | 30 | 30 | 26.21 ± 1.00 | 18.74 ± 1.00 | 14.41 ± 1.01 | 79.74 ± 1.56 | 0.61 ± 0.01 | 0.77 ± 0.01 | 0.32 ± 0.02
567 | 30 | 60 | 25.54 ± 0.32 | 18.38 ± 0.28 | 13.83 ± 0.55 | 80.78 ± 0.48 | 0.61 ± 0.01 | 0.78 ± 0.00 | 0.31 ± 0.01
567 | 30 | 90 | 24.64 ± 0.97 | 17.85 ± 0.81 | 13.48 ± 0.66 | 82.10 ± 1.41 | 0.60 ± 0.01 | 0.78 ± 0.01 | 0.31 ± 0.01
567 | 30 | 120 | 27.89 ± 3.45 | 20.96 ± 3.26 | 16.17 ± 2.94 | 76.86 ± 5.47 | 0.57 ± 0.05 | 0.74 ± 0.04 | 0.35 ± 0.06
567 | 60 | 30 | 43.16 ± 1.27 | 32.69 ± 1.21 | 27.34 ± 1.23 | 45.19 ± 3.24 | 0.44 ± 0.02 | 0.60 ± 0.02 | 0.53 ± 0.02
567 | 60 | 60 | 40.13 ± 1.22 | 30.57 ± 1.14 | 25.05 ± 1.96 | 52.61 ± 2.86 | 0.45 ± 0.01 | 0.62 ± 0.02 | 0.50 ± 0.03
567 | 60 | 90 | 42.89 ± 2.29 | 32.84 ± 2.03 | 26.97 ± 2.57 | 45.79 ± 5.74 | 0.41 ± 0.01 | 0.60 ± 0.02 | 0.53 ± 0.03
567 | 60 | 120 | 45.08 ± 4.52 | 34.30 ± 3.01 | 26.78 ± 0.56 | 39.83 ± 12.30 | 0.40 ± 0.06 | 0.58 ± 0.04 | 0.54 ± 0.04
584 | 30 | 30 | 26.87 ± 0.77 | 19.56 ± 0.72 | 13.10 ± 0.55 | 79.81 ± 1.16 | 0.72 ± 0.02 | 0.84 ± 0.01 | 0.26 ± 0.01
584 | 30 | 60 | 25.31 ± 1.32 | 18.27 ± 0.95 | 11.49 ± 0.52 | 82.05 ± 1.89 | 0.75 ± 0.01 | 0.86 ± 0.01 | 0.23 ± 0.01
584 | 30 | 90 | 25.93 ± 1.03 | 19.25 ± 0.82 | 13.00 ± 0.65 | 81.19 ± 1.47 | 0.74 ± 0.01 | 0.85 ± 0.01 | 0.26 ± 0.01
584 | 30 | 120 | 27.62 ± 0.80 | 20.65 ± 1.21 | 13.36 ± 0.35 | 78.66 ± 1.24 | 0.72 ± 0.02 | 0.84 ± 0.01 | 0.27 ± 0.00
584 | 60 | 30 | 41.45 ± 1.58 | 31.50 ± 1.91 | 21.43 ± 2.17 | 51.75 ± 3.64 | 0.55 ± 0.03 | 0.67 ± 0.04 | 0.42 ± 0.04
584 | 60 | 60 | 42.14 ± 1.60 | 32.72 ± 1.78 | 23.12 ± 1.60 | 50.12 ± 3.74 | 0.55 ± 0.01 | 0.64 ± 0.04 | 0.45 ± 0.03
584 | 60 | 90 | 41.75 ± 0.90 | 32.60 ± 0.83 | 22.86 ± 1.00 | 51.08 ± 2.11 | 0.56 ± 0.01 | 0.65 ± 0.02 | 0.44 ± 0.02
584 | 60 | 120 | 47.83 ± 3.54 | 37.15 ± 4.34 | 25.97 ± 4.37 | 35.58 ± 9.66 | 0.46 ± 0.05 | 0.59 ± 0.07 | 0.50 ± 0.08
596 | 30 | 30 | 19.96 ± 0.28 | 14.31 ± 0.03 | 10.83 ± 0.18 | 83.78 ± 0.45 | 0.70 ± 0.01 | 0.87 ± 0.00 | 0.23 ± 0.00
596 | 30 | 60 | 21.15 ± 0.65 | 15.31 ± 0.40 | 11.64 ± 0.41 | 81.77 ± 1.12 | 0.69 ± 0.01 | 0.86 ± 0.01 | 0.24 ± 0.01
596 | 30 | 90 | 22.54 ± 0.82 | 16.38 ± 0.95 | 12.32 ± 0.90 | 79.29 ± 1.50 | 0.66 ± 0.04 | 0.85 ± 0.01 | 0.25 ± 0.01
596 | 30 | 120 | 33.46 ± 10.29 | 25.29 ± 8.45 | 19.64 ± 6.92 | 51.54 ± 25.67 | 0.50 ± 0.16 | 0.75 ± 0.10 | 0.36 ± 0.11
596 | 60 | 30 | 30.97 ± 0.19 | 22.79 ± 0.17 | 17.23 ± 0.22 | 61.02 ± 0.48 | 0.52 ± 0.01 | 0.78 ± 0.00 | 0.33 ± 0.00
596 | 60 | 60 | 30.28 ± 0.72 | 22.17 ± 0.71 | 16.97 ± 0.45 | 62.72 ± 1.77 | 0.56 ± 0.02 | 0.79 ± 0.00 | 0.32 ± 0.01
596 | 60 | 90 | 31.70 ± 1.25 | 23.44 ± 1.22 | 17.94 ± 1.21 | 59.12 ± 3.24 | 0.52 ± 0.03 | 0.78 ± 0.01 | 0.34 ± 0.02
596 | 60 | 120 | 36.31 ± 9.68 | 27.21 ± 8.48 | 21.03 ± 6.87 | 43.87 ± 30.66 | 0.43 ± 0.21 | 0.71 ± 0.13 | 0.40 ± 0.11
Note. Values in bold indicate the best evaluation outcome for each metric in each learning scenario, and grey highlights denote the best model in each scenario based on the best-achieved evaluation metrics. PID: patient identification; PH: prediction horizon; LL: lag length; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r²: coefficient of determination; MCC: Matthews correlation coefficient; SE: surveillance error; ASE: average surveillance error.

References

  1. DiMeglio, L.A.; Evans-Molina, C.; Oram, R.A. Type 1 Diabetes. Lancet 2018, 391, 2449–2462. [Google Scholar] [CrossRef] [PubMed]
  2. Melin, J.; Lynch, K.F.; Lundgren, M.; Aronsson, C.A.; Larsson, H.E.; Johnson, S.B.; Rewers, M.; Barbour, A.; Bautista, K.; Baxter, J.; et al. Is Staff Consistency Important to Parents’ Satisfaction in a Longitudinal Study of Children at Risk for Type 1 Diabetes: The TEDDY Study. BMC Endocr. Disord. 2022, 22, 19. [Google Scholar] [CrossRef] [PubMed]
  3. Khadem, H.; Nemat, H.; Elliott, J.; Benaissa, M. Interpretable Machine Learning for Inpatient COVID-19 Mortality Risk Assessments: Diabetes Mellitus Exclusive Interplay. Sensors 2022, 22, 8757. [Google Scholar] [CrossRef] [PubMed]
  4. Yamada, T.; Shojima, N.; Noma, H.; Yamauchi, T.; Kadowaki, T. Sodium-Glucose Co-Transporter-2 Inhibitors as Add-on Therapy to Insulin for Type 1 Diabetes Mellitus: Systematic Review and Meta-Analysis of Randomized Controlled Trials. Diabetes Obes. Metab. 2018, 20, 1755–1761. [Google Scholar] [CrossRef] [PubMed]
  5. Smith, A.; Harris, C. Type 1 Diabetes: Management Strategies. Am. Fam. Physician 2018, 98, 154–162. [Google Scholar]
  6. Hamilton, K.; Stanton-Fay, S.H.; Chadwick, P.M.; Lorencatto, F.; de Zoysa, N.; Gianfrancesco, C.; Taylor, C.; Coates, E.; Breckenridge, J.P.; Cooke, D.; et al. Sustained Type 1 Diabetes Self-Management: Specifying the Behaviours Involved and Their Influences. Diabet. Med. 2021, 38, e14430. [Google Scholar] [CrossRef]
  7. Campbell, F.; Lawton, J.; Rankin, D.; Clowes, M.; Coates, E.; Heller, S.; De Zoysa, N.; Elliott, J.; Breckenridge, J.P. Follow-Up Support for Effective Type 1 Diabetes Self-Management (The FUSED Model): A Systematic Review and Meta-Ethnography of the Barriers, Facilitators and Recommendations for Sustaining Self-Management Skills after Attending a Structured Education Programme. BMC Health Serv. Res. 2018, 18, 898. [Google Scholar] [CrossRef]
  8. Cummings, C.; Benjamin, N.E.; Prabhu, H.Y.; Cohen, L.B.; Goddard, B.J.; Kaugars, A.S.; Humiston, T.; Lansing, A.H. Habit and Diabetes Self-Management in Adolescents With Type 1 Diabetes. Health Psychol. 2022, 41, 13–22. [Google Scholar] [CrossRef]
  9. McCarthy, M.M.; Grey, M. Type 1 Diabetes Self-Management From Emerging Adulthood Through Older Adulthood. Diabetes Care 2018, 41, 1608–1614. [Google Scholar] [CrossRef]
  10. Saoji, N.; Palta, M.; Young, H.N.; Moreno, M.A.; Rajamanickam, V.; Cox, E.D. The Relationship of Type 1 Diabetes Self-Management Barriers to Child and Parent Quality of Life: A US Cross-Sectional Study. Diabet. Med. 2018, 35, 1523–1530. [Google Scholar] [CrossRef]
  11. Butler, A.M.; Weller, B.E.; Rodgers, C.R.R.; Teasdale, A.E. Type 1 Diabetes Self-Management Behaviors among Emerging Adults: Racial/Ethnic Differences. Pediatr. Diabetes 2020, 21, 979–986. [Google Scholar] [CrossRef]
  12. Dai, X.; Luo, Z.C.; Zhai, L.; Zhao, W.P.; Huang, F. Artificial Pancreas as an Effective and Safe Alternative in Patients with Type 1 Diabetes Mellitus: A Systematic Review and Meta-Analysis. Diabetes Ther. 2018, 9, 1269–1277. [Google Scholar] [CrossRef]
  13. Bekiari, E.; Kitsios, K.; Thabit, H.; Tauschmann, M.; Athanasiadou, E.; Karagiannis, T.; Haidich, A.B.; Hovorka, R.; Tsapas, A. Artificial Pancreas Treatment for Outpatients with Type 1 Diabetes: Systematic Review and Meta-Analysis. BMJ 2018, 361, 1310. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Sun, J.; Liu, L.; Qiao, H. A Review of Biosensor Technology and Algorithms for Glucose Monitoring. J. Diabetes Complicat. 2021, 35, 107929. [Google Scholar] [CrossRef]
  15. Choudhary, P.; Amiel, S.A. Hypoglycaemia in Type 1 Diabetes: Technological Treatments, Their Limitations and the Place of Psychology. Diabetologia 2018, 61, 761–769. [Google Scholar] [CrossRef]
  16. Tagougui, S.; Taleb, N.; Rabasa-Lhoret, R. The Benefits and Limits of Technological Advances in Glucose Management around Physical Activity in Patients with Type 1 Diabetes. Front. Endocrinol. 2019, 10, 818. [Google Scholar] [CrossRef]
  17. Laffel, L.M.; Kanapka, L.G.; Beck, R.W.; Bergamo, K.; Clements, M.A.; Criego, A.; Desalvo, D.J.; Goland, R.; Hood, K.; Liljenquist, D.; et al. Effect of Continuous Glucose Monitoring on Glycemic Control in Adolescents and Young Adults With Type 1 Diabetes: A Randomized Clinical Trial. JAMA 2020, 323, 2388–2396. [Google Scholar] [CrossRef]
  18. Martens, T.; Beck, R.W.; Bailey, R.; Ruedy, K.J.; Calhoun, P.; Peters, A.L.; Pop-Busui, R.; Philis-Tsimikas, A.; Bao, S.; Umpierrez, G.; et al. Effect of Continuous Glucose Monitoring on Glycemic Control in Patients With Type 2 Diabetes Treated With Basal Insulin: A Randomized Clinical Trial. JAMA 2021, 325, 2262–2272. [Google Scholar] [CrossRef]
  19. Pickup, J.C. Is Insulin Pump Therapy Effective in Type 1 Diabetes? Diabet. Med. 2019, 36, 269–278. [Google Scholar] [CrossRef]
  20. Ranjan, A.G.; Rosenlund, S.V.; Hansen, T.W.; Rossing, P.; Andersen, S.; Nørgaard, K. Improved Time in Range Over 1 Year Is Associated With Reduced Albuminuria in Individuals With Sensor-Augmented Insulin Pump–Treated Type 1 Diabetes. Diabetes Care 2020, 43, 2882–2885. [Google Scholar] [CrossRef]
  21. Mian, Z.; Hermayer, K.L.; Jenkins, A. Continuous Glucose Monitoring: Review of an Innovation in Diabetes Management. Am. J. Med. Sci. 2019, 358, 332–339. [Google Scholar] [CrossRef] [PubMed]
  22. Aggarwal, A.; Pathak, S.; Goyal, R. Clinical and Economic Outcomes of Continuous Glucose Monitoring System (CGMS) in Patients with Diabetes Mellitus: A Systematic Literature Review. Diabetes Res. Clin. Pract. 2022, 186, 109825. [Google Scholar] [CrossRef] [PubMed]
  23. Burckhardt, M.A.; Smith, G.J.; Cooper, M.N.; Jones, T.W.; Davis, E.A. Real-World Outcomes of Insulin Pump Compared to Injection Therapy in a Population-Based Sample of Children with Type 1 Diabetes. Pediatr. Diabetes 2018, 19, 1459–1466. [Google Scholar] [CrossRef] [PubMed]
  24. Cardona-Hernandez, R.; Schwandt, A.; Alkandari, H.; Bratke, H.; Chobot, A.; Coles, N.; Corathers, S.; Goksen, D.; Goss, P.; Imane, Z.; et al. Glycemic Outcome Associated With Insulin Pump and Glucose Sensor Use in Children and Adolescents With Type 1 Diabetes. Data From the International Pediatric Registry SWEET. Diabetes Care 2021, 44, 1176–1184. [Google Scholar] [CrossRef]
  25. Rytter, K.; Schmidt, S.; Rasmussen, L.N.; Pedersen-Bjergaard, U.; Nørgaard, K. Education Programmes for Persons with Type 1 Diabetes Using an Insulin Pump: A Systematic Review. Diabetes. Metab. Res. Rev. 2021, 37, e3412. [Google Scholar] [CrossRef]
  26. Vashist, S.K. Non-Invasive Glucose Monitoring Technology in Diabetes Management: A Review. Anal. Chim. Acta 2012, 750, 16–27. [Google Scholar] [CrossRef]
  27. Alrezj, O.; Benaissa, M.; Alshebeili, S.A. Digital Bandstop Filtering in the Quantitative Analysis of Glucose from Near-Infrared and Midinfrared Spectra. J. Chemom. 2020, 34, e3206. [Google Scholar] [CrossRef]
  28. Khadem, H.; Nemat, H.; Elliott, J.; Benaissa, M. Signal Fragmentation Based Feature Vector Generation in a Model Agnostic Framework with Application to Glucose Quantification Using Absorption Spectroscopy. Talanta 2022, 243, 123379. [Google Scholar] [CrossRef]
  29. Khadem, H.; Eissa, M.R.; Nemat, H.; Alrezj, O.; Benaissa, M. Classification before Regression for Improving the Accuracy of Glucose Quantification Using Absorption Spectroscopy. Talanta 2020, 211, 120740. [Google Scholar] [CrossRef]
  30. Vettoretti, M.; Cappon, G.; Facchinetti, A.; Sparacino, G. Advanced Diabetes Management Using Artificial Intelligence and Continuous Glucose Monitoring Sensors. Sensors 2020, 20, 3870. [Google Scholar] [CrossRef]
  31. Nemat, H.; Khadem, H.; Elliott, J.; Benaissa, M. Causality Analysis in Type 1 Diabetes Mellitus with Application to Blood Glucose Level Prediction. Comput. Biol. Med. 2023, 153, 106535. [Google Scholar] [CrossRef]
  32. Xie, J.; Wang, Q. Benchmarking Machine Learning Algorithms on Blood Glucose Prediction for Type I Diabetes in Comparison with Classical Time-Series Models. IEEE Trans. Biomed. Eng. 2020, 67, 3101–3124. [Google Scholar] [CrossRef]
  33. Nemat, H.; Khadem, H.; Elliott, J.; Benaissa, M. Data Fusion of Activity and CGM for Predicting Blood Glucose Levels. In Knowledge Discovery in Healthcare Data 2020, Proceedings of the 5th International Workshop on Knowledge Discovery in Healthcare Data Co-Located with 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain (virtual), 29–30 August 2020; Bach, K., Bunescu, R., Marling, C., Wiratunga, N., Eds.; CEUR Workshop Proceedings: Aachen, Germany, 2020; Volume 2675, pp. 120–124. [Google Scholar]
  34. Woldaregay, A.Z.; Årsand, E.; Botsis, T.; Albers, D.; Mamykina, L.; Hartvigsen, G. Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes. J. Med. Internet Res. 2019, 21, e11030. [Google Scholar] [CrossRef]
  35. Khadem, H.; Nemat, H.; Elliott, J.; Benaissa, M. Multi-Lag Stacking for Blood Glucose Level Prediction. In Knowledge Discovery in Healthcare Data 2020, Proceedings of the 5th International Workshop on Knowledge Discovery in Healthcare Data Co-Located with 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain (virtual), 29–30 August 2020; Bach, K., Bunescu, R., Marling, C., Wiratunga, N., Eds.; CEUR Workshop Proceedings: Aachen, Germany, 2020; Volume 2675, pp. 146–150. [Google Scholar]
  36. Boughton, C.K.; Hovorka, R. Is an Artificial Pancreas (Closed-Loop System) for Type 1 Diabetes Effective? Diabet. Med. 2019, 36, 279–286. [Google Scholar] [CrossRef]
  37. Bremer, A.A.; Arreaza-Rubín, G. Analysis of “Artificial Pancreas (AP) Systems for People With Type 2 Diabetes: Conception and Design of the European CLOSE Project”. J. Diabetes Sci. Technol. 2019, 13, 268–270. [Google Scholar] [CrossRef]
  38. Woldaregay, A.Z.; Årsand, E.; Walderhaug, S.; Albers, D.; Mamykina, L.; Botsis, T.; Hartvigsen, G. Data-Driven Modeling and Prediction of Blood Glucose Dynamics: Machine Learning Applications in Type 1 Diabetes. Artif. Intell. Med. 2019, 98, 109–134. [Google Scholar] [CrossRef]
  39. Nemat, H.; Khadem, H.; Eissa, M.R.; Elliott, J.; Benaissa, M. Blood Glucose Level Prediction: Advanced Deep-Ensemble Learning Approach. IEEE J. Biomed. Health Inform. 2022, 26, 2758–2769. [Google Scholar] [CrossRef]
  40. Felizardo, V.; Garcia, N.M.; Pombo, N.; Megdiche, I. Data-Based Algorithms and Models Using Diabetics Real Data for Blood Glucose and Hypoglycaemia Prediction—A Systematic Literature Review. Artif. Intell. Med. 2021, 118, 102120. [Google Scholar] [CrossRef]
  41. Semenoglou, A.-A.; Spiliotis, E.; Assimakopoulos, V. Image-Based Time Series Forecasting: A Deep Convolutional Neural Network Approach. Neural Netw. 2023, 157, 39–53. [Google Scholar] [CrossRef]
  42. Garg, A.; Zhang, W.; Samaran, J.; Savitha, R.; Foo, C.S. An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 2508–2517. [Google Scholar] [CrossRef]
  43. De Oliveira, J.F.L.; Silva, E.G.; De Mattos Neto, P.S.G. A Hybrid System Based on Dynamic Selection for Time Series Forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 3251–3263. [Google Scholar] [CrossRef] [PubMed]
  44. Cichos, F.; Gustavsson, K.; Mehlig, B.; Volpe, G. Machine Learning for Active Matter. Nat. Mach. Intell. 2020, 2, 94–103. [Google Scholar] [CrossRef]
  45. Lim, B.; Zohren, S. Time-Series Forecasting with Deep Learning: A Survey. Philos. Trans. R. Soc. A 2021, 379, 20200209. [Google Scholar] [CrossRef]
  46. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep Learning for Time Series Classification: A Review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef]
  47. Zhu, T.; Wang, W.; Yu, M. A Novel Blood Glucose Time Series Prediction Framework Based on a Novel Signal Decomposition Method. Chaos Solitons Fractals 2022, 164, 112673. [Google Scholar] [CrossRef]
  48. Tejedor, M.; Woldaregay, A.Z.; Godtliebsen, F. Reinforcement Learning Application in Diabetes Blood Glucose Control: A Systematic Review. Artif. Intell. Med. 2020, 104, 101836. [Google Scholar] [CrossRef]
  49. Aiello, E.M.; Lisanti, G.; Magni, L.; Musci, M.; Toffanin, C. Therapy-Driven Deep Glucose Forecasting. Eng. Appl. Artif. Intell. 2020, 87, 103255. [Google Scholar] [CrossRef]
  50. Asad, M.; Qamar, U. A Review of Continuous Blood Glucose Monitoring and Prediction of Blood Glucose Level for Diabetes Type 1 Patient in Different Prediction Horizons (PH) Using Artificial Neural Network (ANN). Adv. Intell. Syst. Comput. 2020, 1038, 684–695. [Google Scholar] [CrossRef]
  51. Li, K.; Daniels, J.; Liu, C.; Herrero, P.; Georgiou, P. Convolutional Recurrent Neural Networks for Glucose Prediction. IEEE J. Biomed. Health Inform. 2020, 24, 603–613. [Google Scholar] [CrossRef]
  52. Zhang, M.; Flores, K.B.; Tran, H.T. Deep Learning and Regression Approaches to Forecasting Blood Glucose Levels for Type 1 Diabetes. Biomed. Signal Process. Control 2021, 69, 102923. [Google Scholar] [CrossRef]
  53. Tena, F.; Garnica, O.; Lanchares, J.; Hidalgo, J.I.; Cappon, G.; Herrero, P.; Sacchi, L.; Coltro, W. Ensemble Models of Cutting-Edge Deep Neural Networks for Blood Glucose Prediction in Patients with Diabetes. Sensors 2021, 21, 7090. [Google Scholar] [CrossRef]
  54. Wadghiri, M.Z.; Idri, A.; El Idrissi, T.; Hakkoum, H. Ensemble Blood Glucose Prediction in Diabetes Mellitus: A Review. Comput. Biol. Med. 2022, 147, 105674. [Google Scholar] [CrossRef]
  55. Daniels, J.; Herrero, P.; Georgiou, P. A Multitask Learning Approach to Personalized Blood Glucose Prediction. IEEE J. Biomed. Health Inform. 2022, 26, 436–445. [Google Scholar] [CrossRef]
  56. Yang, T.; Yu, X.; Ma, N.; Wu, R.; Li, H. An Autonomous Channel Deep Learning Framework for Blood Glucose Prediction. Appl. Soft Comput. 2022, 120, 108636. [Google Scholar] [CrossRef]
  57. Zhu, T.; Li, K.; Chen, J.; Herrero, P.; Georgiou, P. Dilated Recurrent Neural Networks for Glucose Forecasting in Type 1 Diabetes. J. Healthc. Inform. Res. 2020, 4, 308–324. [Google Scholar] [CrossRef]
  58. Martinsson, J.; Schliep, A.; Eliasson, B.; Mogren, O. Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks. J. Healthc. Inform. Res. 2020, 4, 1–18. [Google Scholar] [CrossRef]
  59. Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Molina-García-Pardo, J.M.; Zamora-Izquierdo, M.Á.; Martínez-Inglés, M.T. A Comparison of Different Models of Glycemia Dynamics for Improved Type 1 Diabetes Mellitus Management with Advanced Intelligent Analysis in an Internet of Things Context. Appl. Sci. 2020, 10, 4381. [Google Scholar] [CrossRef]
  60. Marling, C.; Bunescu, R. The OhioT1DM Dataset for Blood Glucose Level Prediction: Update 2020. In Proceedings of the 5th International Workshop on Knowledge Discovery in Healthcare Data Co-Located with 24th European Conference on Artificial Intelligence, KDH@ECAI 2020, Santiago de Compostela, Spain & Virtually, 29–30 August 2020; NIH Public Access: Bethesda, MD, USA, 2020; Volume 2675, pp. 71–74. [Google Scholar]
  61. Kwiatkowski, D.; Phillips, P.C.B.; Schmidt, P.; Shin, Y. Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure Are We That Economic Time Series Have a Unit Root? J. Econom. 1992, 54, 159–178. [Google Scholar] [CrossRef]
  62. Dickey, D.A.; Fuller, W.A. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. J. Am. Stat. Assoc. 1979, 74, 427–431. [Google Scholar] [CrossRef]
  63. Sagi, O.; Rokach, L. Ensemble Learning: A Survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar] [CrossRef]
  64. Breiman, L. Stacked Regressions. Mach. Learn. 1996, 24, 49–64. [Google Scholar] [CrossRef]
  65. Zhu, Q. On the Performance of Matthews Correlation Coefficient (MCC) for Imbalanced Dataset. Pattern Recognit. Lett. 2020, 136, 71–80. [Google Scholar] [CrossRef]
  66. Klonoff, D.C.; Lias, C.; Vigersky, R.; Clarke, W.; Parkes, J.L.; Sacks, D.B.; Kirkman, M.S.; Kovatchev, B. The Surveillance Error Grid. J. Diabetes Sci. Technol. 2014, 8, 658–672. [Google Scholar] [CrossRef] [PubMed]
  67. Friedman, M. A Comparison of Alternative Tests of Significance for the Problem of m Rankings. Ann. Math. Stat. 1940, 11, 86–92. [Google Scholar] [CrossRef]
  68. Fisher, R. Statistical Methods and Scientific Induction. J. R. Stat. Soc. Ser. B 1955, 17, 69–78. [Google Scholar] [CrossRef]
  69. Nemenyi, P.B. Distribution-Free Multiple Comparisons; Princeton University: Princeton, NJ, USA, 1963. [Google Scholar]
  70. Holm, S. A Simple Sequentially Rejective Multiple Test Procedure. Scand. J. Stat. 1979, 6, 65–70. [Google Scholar]
  71. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  72. Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, USA, 2009; ISBN 1441412697. [Google Scholar]
  73. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A System for Large-Scale Machine Learning. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
  74. McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; Volume 445, pp. 51–56. [Google Scholar]
  75. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array Programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef]
  76. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  77. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272. [Google Scholar] [CrossRef]
  78. Seabold, S.; Perktold, J. Statsmodels: Econometric and Statistical Modeling with Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010. [Google Scholar]
  79. Terpilowski, M. Scikit-Posthocs: Pairwise Multiple Comparison Tests in Python. J. Open Source Softw. 2019, 4, 1169. [Google Scholar] [CrossRef]
  80. Benavoli, A.; Corani, G.; Mangili, F. Should We Really Use Post-Hoc Tests Based on Mean-Ranks? J. Mach. Learn. Res. 2016, 17, 152–161. [Google Scholar]
Figure 1. Blueprint for generating non-stacking, stacking, and nested stacking blood glucose level prediction models. Rectangular and oval blocks represent sequences of lag or future data and regression learners, respectively. Note. BGL: blood glucose level; PA: physical activity; II: insulin injection; CI: carbohydrate intake; LSTM: long short-term memory; MLP: multilayer perceptron.
Figure 2. Critical difference diagrams based on the Nemenyi test for pairwise comparison of the non-stacking, stacking, and nested stacking modelling approaches with the LSTM learner: panels (a–g) show the RMSE, MAE, MAPE, r2, MCC, SE50, and ASE metrics, respectively, for the 30 min PH; panels (h–n) show the same metrics, in the same order, for the 60 min PH. Note. LSTM: long short-term memory; PH: prediction horizon; RMSE: root mean square error; MAE: mean absolute error; MAPE: mean absolute percentage error; r2: coefficient of determination; MCC: Matthew’s correlation coefficient; SE: surveillance error; ASE: average surveillance error.
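As a rough illustration of the statistics behind Figure 2, the sketch below runs a Friedman test over per-patient scores of the three modelling approaches and then a Nemenyi post-hoc test via the scikit-posthocs package cited in the references. The score values are synthetic placeholders, not results from the paper, and the exact inputs the authors used may differ.

```python
# A hedged sketch: Friedman test followed by a pairwise Nemenyi post-hoc test,
# the combination underlying a critical difference diagram.
import numpy as np
import pandas as pd
import scikit_posthocs as sp
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)
# Rows: the 12 Ohio patients; columns: one (synthetic) RMSE per approach.
scores = pd.DataFrame({
    "non-stacking": rng.normal(22.0, 2.0, 12),
    "stacking": rng.normal(21.0, 2.0, 12),
    "nested stacking": rng.normal(20.0, 2.0, 12),
})

stat, p = friedmanchisquare(*(scores[c] for c in scores.columns))
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")

# Pairwise Nemenyi p-values, the basis of a critical difference diagram
print(sp.posthoc_nemenyi_friedman(scores))
```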
Table 1. Demographic information of contributors and summary of statistical properties of blood glucose data (the focal modality) in the Ohio datasets.
(The Count through HRR columns summarise the blood glucose data.)
Dataset | PID | Sex | Age | Set | Count | Range (mg/dL) | Mean (mg/dL) | SD (mg/dL) | MR (%) | HOR (%) | ER (%) | HRR (%)
2018 | 559 | female | 40–60 | Train | 10,655 | 40–400 | 167.53 | 70.44 | 12.06 | 3.65 | 55.98 | 40.37
 | | | | Test | 2444 | 45–400 | 168.93 | 67.78 | 14.81 | 3.03 | 59.86 | 37.11
 | 563 | male | 40–60 | Train | 11,013 | 40–400 | 146.94 | 50.51 | 8.80 | 2.82 | 72.81 | 24.36
 | | | | Test | 2569 | 62–313 | 167.38 | 46.15 | 4.71 | 0.70 | 60.45 | 38.85
 | 570 | male | 40–60 | Train | 10,981 | 46–377 | 187.5 | 62.33 | 5.73 | 1.97 | 42.97 | 55.07
 | | | | Test | 2672 | 60–388 | 215.71 | 66.99 | 5.05 | 0.41 | 29.04 | 70.55
 | 575 | female | 40–60 | Train | 11,865 | 40–400 | 141.77 | 60.27 | 10.43 | 8.71 | 68.62 | 22.66
 | | | | Test | 2589 | 40–342 | 150.49 | 60.53 | 4.94 | 5.37 | 63.50 | 31.13
 | 588 | female | 40–60 | Train | 12,639 | 40–400 | 164.99 | 50.51 | 3.69 | 1.04 | 63.56 | 35.40
 | | | | Test | 2606 | 66–354 | 175.98 | 48.66 | 3.42 | 0.15 | 53.26 | 46.58
 | 591 | female | 40–60 | Train | 10,846 | 40–397 | 156.01 | 58.03 | 17.59 | 3.94 | 63.97 | 32.09
 | | | | Test | 2759 | 43–291 | 144.83 | 51.42 | 3.15 | 5.18 | 67.27 | 27.55
2020 | 540 | male | 20–40 | Train | 11,914 | 40–369 | 136.78 | 54.75 | 9.76 | 7.08 | 72.66 | 20.25
 | | | | Test | 2360 | 52–400 | 149.94 | 66.46 | 6.74 | 5.64 | 68.18 | 26.19
 | 544 | male | 40–60 | Train | 10,533 | 48–400 | 165.12 | 60.08 | 19.11 | 1.47 | 63.78 | 34.75
 | | | | Test | 2715 | 62–335 | 156.48 | 54.14 | 15.47 | 1.22 | 68.29 | 30.50
 | 552 | male | 20–40 | Train | 8661 | 45–345 | 146.88 | 54.63 | 22.30 | 3.89 | 72.05 | 24.06
 | | | | Test | 1792 | 47–305 | 138.11 | 50.23 | 85.71 | 3.57 | 80.02 | 16.41
 | 567 | female | 20–40 | Train | 10,750 | 40–400 | 154.43 | 60.88 | 24.91 | 6.75 | 63.40 | 29.84
 | | | | Test | 2388 | 40–351 | 146.25 | 55.00 | 20.18 | 8.33 | 67.38 | 24.29
 | 584 | male | 40–60 | Train | 12,027 | 40–400 | 192.34 | 65.29 | 9.13 | 0.80 | 47.69 | 51.51
 | | | | Test | 2661 | 41–400 | 170.48 | 60.76 | 12.40 | 1.01 | 61.86 | 37.13
 | 596 | male | 60–80 | Train | 10,858 | 40–367 | 147.17 | 49.34 | 25.35 | 2.08 | 73.99 | 23.93
 | | | | Test | 2663 | 49–305 | 146.98 | 50.79 | 9.76 | 2.78 | 75.07 | 22.16
Note. PID: patient identification; SD: standard deviation; MR: missingness rate; HOR: hypoglycaemic rate; ER: euglycaemic rate; HRR: hyperglycaemic rate. Hypoglycaemia, euglycaemia, and hyperglycaemia refer to when the blood glucose level is, respectively, less than 70 mg/dL, between 70 and 180 mg/dL, and more than 180 mg/dL. Both hypoglycaemia and hyperglycaemia are adverse glycaemic events.
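A minimal sketch, assuming the thresholds given in the note above, of how the hypoglycaemic (HOR), euglycaemic (ER), and hyperglycaemic (HRR) rates in Table 1 could be computed; boundary handling at exactly 70 and 180 mg/dL follows the note's wording and may differ from the authors' implementation.

```python
# Label CGM readings by glycaemic range and return the three rates in percent.
import numpy as np

def glycaemic_rates(bgl_mg_dl: np.ndarray):
    hypo = np.mean(bgl_mg_dl < 70) * 100                          # HOR (%)
    eu = np.mean((bgl_mg_dl >= 70) & (bgl_mg_dl <= 180)) * 100    # ER (%)
    hyper = np.mean(bgl_mg_dl > 180) * 100                        # HRR (%)
    return hypo, eu, hyper

readings = np.array([55, 90, 150, 210, 300, 65, 120])  # illustrative values
print(glycaemic_rates(readings))  # approx. (28.6, 42.9, 28.6)
```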
Table 2. The evaluation results for the best non-stacking models created using Ohio datasets.
Dataset | PID | Learner | PH | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r2 ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
2018 | 559 | MLP | 30 | 19.65 ± 0.06 | 13.56 ± 0.03 | 8.78 ± 0.03 | 90.75 ± 0.05 | 0.77 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 31.36 ± 0.06 | 22.78 ± 0.06 | 15.18 ± 0.07 | 76.30 ± 0.08 | 0.63 ± 0.00 | 0.79 ± 0.00 | 0.31 ± 0.00
 | | LSTM | 30 | 23.12 ± 0.43 | 16.60 ± 0.66 | 11.10 ± 0.63 | 87.19 ± 0.47 | 0.74 ± 0.01 | 0.86 ± 0.01 | 0.24 ± 0.01
 | | | 60 | 36.08 ± 1.47 | 25.38 ± 0.84 | 16.62 ± 0.25 | 68.60 ± 2.56 | 0.59 ± 0.02 | 0.75 ± 0.01 | 0.34 ± 0.01
 | 563 | MLP | 30 | 18.71 ± 0.05 | 13.46 ± 0.06 | 8.47 ± 0.04 | 82.97 ± 0.09 | 0.74 ± 0.00 | 0.91 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 30.65 ± 0.01 | 21.69 ± 0.04 | 13.46 ± 0.04 | 54.36 ± 0.04 | 0.57 ± 0.01 | 0.81 ± 0.00 | 0.30 ± 0.00
 | | LSTM | 30 | 21.59 ± 0.64 | 15.33 ± 0.45 | 9.69 ± 0.19 | 77.31 ± 1.34 | 0.72 ± 0.01 | 0.89 ± 0.00 | 0.22 ± 0.00
 | | | 60 | 33.02 ± 0.62 | 24.13 ± 0.61 | 15.07 ± 0.18 | 47.03 ± 2.01 | 0.51 ± 0.01 | 0.75 ± 0.02 | 0.33 ± 0.01
 | 570 | MLP | 30 | 17.44 ± 0.03 | 12.47 ± 0.03 | 6.38 ± 0.03 | 93.34 ± 0.03 | 0.86 ± 0.00 | 0.96 ± 0.00 | 0.12 ± 0.00
 | | | 60 | 29.00 ± 0.14 | 20.97 ± 0.13 | 10.73 ± 0.04 | 81.62 ± 0.18 | 0.79 ± 0.00 | 0.91 ± 0.00 | 0.20 ± 0.00
 | | LSTM | 30 | 22.92 ± 1.49 | 16.16 ± 1.15 | 8.04 ± 0.65 | 88.47 ± 1.52 | 0.81 ± 0.02 | 0.94 ± 0.01 | 0.15 ± 0.01
 | | | 60 | 35.80 ± 1.50 | 26.75 ± 1.85 | 12.68 ± 0.43 | 71.95 ± 2.31 | 0.75 ± 0.00 | 0.88 ± 0.01 | 0.23 ± 0.01
 | 575 | MLP | 30 | 24.12 ± 0.06 | 16.05 ± 0.10 | 11.43 ± 0.09 | 84.48 ± 0.07 | 0.73 ± 0.00 | 0.86 ± 0.00 | 0.24 ± 0.00
 | | | 60 | 35.63 ± 0.17 | 25.66 ± 0.20 | 18.91 ± 0.17 | 66.19 ± 0.32 | 0.57 ± 0.01 | 0.71 ± 0.00 | 0.38 ± 0.00
 | | LSTM | 30 | 27.20 ± 0.57 | 18.25 ± 0.45 | 13.14 ± 0.71 | 80.24 ± 0.82 | 0.69 ± 0.00 | 0.82 ± 0.02 | 0.28 ± 0.01
 | | | 60 | 38.09 ± 0.03 | 27.47 ± 0.52 | 20.48 ± 1.20 | 61.36 ± 0.07 | 0.54 ± 0.02 | 0.70 ± 0.00 | 0.41 ± 0.01
 | 588 | MLP | 30 | 18.07 ± 0.35 | 13.50 ± 0.15 | 8.29 ± 0.01 | 85.66 ± 0.56 | 0.76 ± 0.01 | 0.93 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 30.36 ± 0.11 | 22.68 ± 0.13 | 14.16 ± 0.12 | 59.60 ± 0.28 | 0.58 ± 0.00 | 0.77 ± 0.00 | 0.31 ± 0.00
 | | LSTM | 30 | 19.23 ± 0.11 | 14.16 ± 0.11 | 8.53 ± 0.12 | 83.77 ± 0.19 | 0.74 ± 0.00 | 0.92 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 30.46 ± 0.60 | 22.48 ± 0.39 | 14.04 ± 0.23 | 59.33 ± 1.61 | 0.60 ± 0.01 | 0.79 ± 0.01 | 0.30 ± 0.01
 | 591 | MLP | 30 | 22.98 ± 0.11 | 16.61 ± 0.05 | 12.99 ± 0.03 | 80.32 ± 0.18 | 0.65 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
 | | | 60 | 34.98 ± 0.05 | 26.93 ± 0.08 | 21.91 ± 0.13 | 54.41 ± 0.12 | 0.39 ± 0.00 | 0.65 ± 0.00 | 0.45 ± 0.00
 | | LSTM | 30 | 26.33 ± 0.42 | 19.55 ± 0.24 | 15.65 ± 0.40 | 74.16 ± 0.83 | 0.60 ± 0.00 | 0.75 ± 0.01 | 0.34 ± 0.01
 | | | 60 | 36.51 ± 0.20 | 28.36 ± 0.26 | 23.32 ± 0.27 | 50.32 ± 0.54 | 0.37 ± 0.02 | 0.63 ± 0.00 | 0.47 ± 0.00
2020 | 540 | MLP | 30 | 22.88 ± 0.13 | 17.45 ± 0.10 | 12.71 ± 0.04 | 87.60 ± 0.14 | 0.68 ± 0.00 | 0.81 ± 0.00 | 0.27 ± 0.00
 | | | 60 | 39.84 ± 0.14 | 30.49 ± 0.12 | 22.96 ± 0.13 | 62.48 ± 0.27 | 0.52 ± 0.00 | 0.66 ± 0.00 | 0.44 ± 0.00
 | | LSTM | 30 | 24.84 ± 0.42 | 18.48 ± 0.70 | 13.81 ± 1.24 | 85.37 ± 0.49 | 0.67 ± 0.02 | 0.80 ± 0.01 | 0.29 ± 0.02
 | | | 60 | 41.36 ± 0.58 | 30.69 ± 0.37 | 22.40 ± 0.20 | 59.56 ± 1.12 | 0.50 ± 0.02 | 0.66 ± 0.00 | 0.44 ± 0.00
 | 544 | MLP | 30 | 17.37 ± 0.03 | 12.14 ± 0.03 | 8.21 ± 0.03 | 88.26 ± 0.04 | 0.78 ± 0.00 | 0.92 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 28.49 ± 0.03 | 20.74 ± 0.04 | 14.16 ± 0.05 | 68.32 ± 0.07 | 0.63 ± 0.00 | 0.78 ± 0.00 | 0.30 ± 0.00
 | | LSTM | 30 | 21.23 ± 0.53 | 15.00 ± 0.49 | 9.93 ± 0.35 | 82.45 ± 0.87 | 0.76 ± 0.01 | 0.89 ± 0.00 | 0.21 ± 0.01
 | | | 60 | 30.45 ± 0.12 | 22.09 ± 0.45 | 14.81 ± 0.52 | 63.83 ± 0.29 | 0.59 ± 0.02 | 0.78 ± 0.01 | 0.31 ± 0.01
 | 552 | MLP | 30 | 14.06 ± 0.03 | 8.25 ± 0.11 | 6.48 ± 0.09 | 86.18 ± 0.05 | 0.75 ± 0.00 | 0.92 ± 0.00 | 0.14 ± 0.00
 | | | 60 | 23.83 ± 0.03 | 14.57 ± 0.10 | 11.75 ± 0.12 | 60.36 ± 0.09 | 0.64 ± 0.00 | 0.84 ± 0.00 | 0.22 ± 0.00
 | | LSTM | 30 | 16.72 ± 0.44 | 10.31 ± 0.24 | 8.04 ± 0.22 | 80.45 ± 1.01 | 0.71 ± 0.02 | 0.90 ± 0.01 | 0.16 ± 0.01
 | | | 60 | 25.47 ± 0.30 | 16.27 ± 0.24 | 13.02 ± 0.27 | 54.73 ± 1.05 | 0.61 ± 0.01 | 0.83 ± 0.01 | 0.24 ± 0.01
 | 567 | MLP | 30 | 22.72 ± 0.04 | 16.47 ± 0.04 | 12.48 ± 0.03 | 84.80 ± 0.05 | 0.64 ± 0.00 | 0.80 ± 0.00 | 0.28 ± 0.00
 | | | 60 | 38.38 ± 0.02 | 29.51 ± 0.04 | 23.24 ± 0.06 | 56.68 ± 0.04 | 0.46 ± 0.00 | 0.64 ± 0.00 | 0.47 ± 0.00
 | | LSTM | 30 | 24.64 ± 0.97 | 17.85 ± 0.81 | 13.48 ± 0.66 | 82.10 ± 1.41 | 0.60 ± 0.01 | 0.78 ± 0.01 | 0.31 ± 0.01
 | | | 60 | 40.13 ± 1.22 | 30.57 ± 1.14 | 25.05 ± 1.96 | 52.61 ± 2.86 | 0.45 ± 0.01 | 0.62 ± 0.02 | 0.50 ± 0.03
 | 584 | MLP | 30 | 22.78 ± 0.04 | 16.92 ± 0.04 | 11.34 ± 0.03 | 85.49 ± 0.05 | 0.77 ± 0.00 | 0.87 ± 0.00 | 0.23 ± 0.00
 | | | 60 | 35.99 ± 0.05 | 27.29 ± 0.02 | 18.40 ± 0.03 | 63.67 ± 0.11 | 0.60 ± 0.00 | 0.72 ± 0.00 | 0.37 ± 0.00
 | | LSTM | 30 | 25.31 ± 1.32 | 18.27 ± 0.95 | 11.49 ± 0.52 | 82.05 ± 1.89 | 0.75 ± 0.01 | 0.86 ± 0.01 | 0.23 ± 0.01
 | | | 60 | 41.45 ± 1.58 | 31.50 ± 1.91 | 21.43 ± 2.17 | 51.75 ± 3.64 | 0.55 ± 0.03 | 0.67 ± 0.04 | 0.42 ± 0.04
 | 596 | MLP | 30 | 17.87 ± 0.08 | 12.89 ± 0.06 | 9.67 ± 0.03 | 86.99 ± 0.12 | 0.74 ± 0.00 | 0.89 ± 0.00 | 0.20 ± 0.00
 | | | 60 | 35.99 ± 0.05 | 27.29 ± 0.02 | 18.40 ± 0.03 | 63.67 ± 0.11 | 0.60 ± 0.00 | 0.72 ± 0.00 | 0.37 ± 0.00
 | | LSTM | 30 | 19.96 ± 0.28 | 14.31 ± 0.03 | 10.83 ± 0.18 | 83.78 ± 0.45 | 0.70 ± 0.01 | 0.87 ± 0.00 | 0.23 ± 0.00
 | | | 60 | 30.28 ± 0.72 | 22.17 ± 0.71 | 16.97 ± 0.45 | 62.72 ± 1.77 | 0.56 ± 0.02 | 0.79 ± 0.00 | 0.32 ± 0.01
Note. PID: patient identification; PH: prediction horizon; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r2: coefficient of determination; MCC: Matthew’s correlation coefficient; SE: surveillance error; ASE: average surveillance error.
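The point-accuracy metrics in Tables 2–4 can be reproduced with standard scikit-learn calls, as in the hedged sketch below. The glycaemic-event MCC is computed here on three classes derived from the thresholds in Table 1's note, which is an assumption about the authors' setup, and the surveillance-error metrics (SE, ASE) would additionally require the error grid of Klonoff et al. [66], so they are omitted.

```python
# A sketch of the RMSE/MAE/MAPE/r2 metrics plus a three-class glycaemic MCC.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score, matthews_corrcoef)

def point_metrics(y_true, y_pred):
    rmse = mean_squared_error(y_true, y_pred) ** 0.5   # mg/dL
    mae = mean_absolute_error(y_true, y_pred)          # mg/dL
    mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # %
    r2 = r2_score(y_true, y_pred) * 100                # %
    return rmse, mae, mape, r2

def glycaemic_mcc(y_true, y_pred):
    # Classes 0/1/2 for <70, 70-180, >=180 mg/dL (boundary handling assumed)
    to_class = lambda g: np.digitize(np.asarray(g), [70, 180])
    return matthews_corrcoef(to_class(y_true), to_class(y_pred))

y_true = np.array([60, 110, 190, 250, 140])  # illustrative values
y_pred = np.array([65, 120, 175, 240, 150])
print(point_metrics(y_true, y_pred), glycaemic_mcc(y_true, y_pred))
```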
Table 3. The evaluation results for the stacking models created using Ohio datasets.
Dataset | PID | Learner | PH | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r2 ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
2018 | 559 | MLP | 30 | 19.00 ± 0.11 | 13.19 ± 0.08 | 8.79 ± 0.05 | 91.35 ± 0.10 | 0.78 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 31.25 ± 0.41 | 22.67 ± 0.22 | 15.22 ± 0.24 | 76.46 ± 0.61 | 0.64 ± 0.00 | 0.79 ± 0.00 | 0.31 ± 0.00
 | | LSTM | 30 | 22.90 ± 0.49 | 15.77 ± 0.17 | 9.97 ± 0.09 | 87.43 ± 0.54 | 0.76 ± 0.01 | 0.89 ± 0.00 | 0.21 ± 0.00
 | | | 60 | 34.95 ± 0.17 | 24.99 ± 0.11 | 16.61 ± 0.05 | 70.56 ± 0.29 | 0.61 ± 0.01 | 0.76 ± 0.00 | 0.33 ± 0.00
 | 563 | MLP | 30 | 18.54 ± 0.05 | 13.03 ± 0.03 | 8.10 ± 0.00 | 83.28 ± 0.08 | 0.74 ± 0.01 | 0.92 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 29.87 ± 0.18 | 21.22 ± 0.14 | 13.36 ± 0.04 | 56.67 ± 0.51 | 0.58 ± 0.01 | 0.81 ± 0.00 | 0.30 ± 0.00
 | | LSTM | 30 | 21.25 ± 0.05 | 14.97 ± 0.06 | 9.38 ± 0.02 | 78.05 ± 0.11 | 0.73 ± 0.00 | 0.89 ± 0.00 | 0.21 ± 0.00
 | | | 60 | 33.20 ± 0.16 | 23.55 ± 0.07 | 14.44 ± 0.02 | 46.46 ± 0.53 | 0.52 ± 0.00 | 0.78 ± 0.00 | 0.32 ± 0.00
 | 570 | MLP | 30 | 17.49 ± 0.11 | 12.43 ± 0.10 | 6.36 ± 0.03 | 93.30 ± 0.09 | 0.86 ± 0.01 | 0.96 ± 0.00 | 0.12 ± 0.00
 | | | 60 | 28.65 ± 0.08 | 20.90 ± 0.07 | 10.91 ± 0.04 | 82.06 ± 0.10 | 0.78 ± 0.00 | 0.91 ± 0.00 | 0.20 ± 0.00
 | | LSTM | 30 | 21.58 ± 1.50 | 15.59 ± 1.55 | 7.70 ± 0.49 | 89.77 ± 1.44 | 0.84 ± 0.01 | 0.94 ± 0.00 | 0.14 ± 0.01
 | | | 60 | 32.48 ± 0.69 | 23.55 ± 0.62 | 11.82 ± 0.06 | 76.93 ± 0.98 | 0.76 ± 0.00 | 0.89 ± 0.00 | 0.22 ± 0.00
 | 575 | MLP | 30 | 24.21 ± 0.04 | 15.70 ± 0.09 | 11.25 ± 0.19 | 84.36 ± 0.05 | 0.74 ± 0.00 | 0.86 ± 0.00 | 0.24 ± 0.00
 | | | 60 | 36.42 ± 0.41 | 26.35 ± 0.77 | 19.85 ± 1.57 | 64.68 ± 0.79 | 0.57 ± 0.02 | 0.71 ± 0.00 | 0.40 ± 0.02
 | | LSTM | 30 | 27.73 ± 0.12 | 18.09 ± 0.09 | 12.67 ± 0.09 | 79.48 ± 0.18 | 0.66 ± 0.00 | 0.82 ± 0.00 | 0.27 ± 0.00
 | | | 60 | 38.34 ± 0.09 | 27.48 ± 0.06 | 19.59 ± 0.12 | 60.86 ± 0.18 | 0.54 ± 0.00 | 0.68 ± 0.00 | 0.41 ± 0.00
 | 588 | MLP | 30 | 18.24 ± 0.19 | 13.51 ± 0.12 | 8.17 ± 0.02 | 85.39 ± 0.30 | 0.75 ± 0.01 | 0.93 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 29.65 ± 0.21 | 21.84 ± 0.18 | 13.14 ± 0.08 | 61.46 ± 0.55 | 0.57 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
 | | LSTM | 30 | 18.91 ± 0.08 | 14.03 ± 0.14 | 8.43 ± 0.25 | 84.30 ± 0.13 | 0.75 ± 0.00 | 0.92 ± 0.00 | 0.18 ± 0.01
 | | | 60 | 30.67 ± 0.20 | 22.29 ± 0.25 | 13.54 ± 0.49 | 58.76 ± 0.54 | 0.60 ± 0.01 | 0.81 ± 0.01 | 0.29 ± 0.01
 | 591 | MLP | 30 | 22.88 ± 0.07 | 16.60 ± 0.04 | 13.03 ± 0.06 | 80.49 ± 0.12 | 0.65 ± 0.00 | 0.80 ± 0.00 | 0.29 ± 0.00
 | | | 60 | 34.43 ± 0.06 | 26.80 ± 0.05 | 22.09 ± 0.09 | 55.84 ± 0.14 | 0.41 ± 0.00 | 0.65 ± 0.00 | 0.45 ± 0.00
 | | LSTM | 30 | 25.51 ± 0.01 | 18.80 ± 0.05 | 14.79 ± 0.08 | 75.73 ± 0.03 | 0.59 ± 0.00 | 0.76 ± 0.00 | 0.33 ± 0.00
 | | | 60 | 36.68 ± 0.16 | 28.44 ± 0.05 | 23.78 ± 0.03 | 49.87 ± 0.44 | 0.42 ± 0.00 | 0.64 ± 0.00 | 0.47 ± 0.00
2020 | 540 | MLP | 30 | 22.34 ± 0.02 | 17.13 ± 0.03 | 12.58 ± 0.03 | 88.18 ± 0.02 | 0.68 ± 0.00 | 0.82 ± 0.00 | 0.27 ± 0.00
 | | | 60 | 39.40 ± 0.09 | 30.32 ± 0.13 | 22.95 ± 0.10 | 63.29 ± 0.17 | 0.52 ± 0.00 | 0.66 ± 0.00 | 0.44 ± 0.00
 | | LSTM | 30 | 24.13 ± 0.14 | 18.24 ± 0.06 | 13.57 ± 0.03 | 86.20 ± 0.17 | 0.66 ± 0.00 | 0.80 ± 0.00 | 0.29 ± 0.00
 | | | 60 | 40.86 ± 0.05 | 30.62 ± 0.11 | 23.06 ± 0.18 | 60.53 ± 0.09 | 0.51 ± 0.00 | 0.66 ± 0.00 | 0.44 ± 0.00
 | 544 | MLP | 30 | 16.96 ± 0.02 | 12.01 ± 0.05 | 8.14 ± 0.08 | 88.81 ± 0.03 | 0.79 ± 0.00 | 0.92 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 28.36 ± 0.17 | 20.72 ± 0.04 | 14.21 ± 0.08 | 68.62 ± 0.37 | 0.64 ± 0.00 | 0.78 ± 0.00 | 0.30 ± 0.00
 | | LSTM | 30 | 20.85 ± 0.25 | 14.84 ± 0.20 | 10.01 ± 0.14 | 83.08 ± 0.40 | 0.73 ± 0.00 | 0.88 ± 0.00 | 0.22 ± 0.00
 | | | 60 | 31.30 ± 0.23 | 22.55 ± 0.10 | 15.44 ± 0.07 | 61.77 ± 0.57 | 0.59 ± 0.00 | 0.76 ± 0.00 | 0.33 ± 0.00
 | 552 | MLP | 30 | 14.19 ± 0.03 | 9.00 ± 0.06 | 7.10 ± 0.03 | 85.92 ± 0.05 | 0.72 ± 0.00 | 0.91 ± 0.00 | 0.15 ± 0.00
 | | | 60 | 23.78 ± 0.04 | 15.52 ± 0.20 | 12.62 ± 0.18 | 60.53 ± 0.14 | 0.61 ± 0.01 | 0.84 ± 0.00 | 0.23 ± 0.00
 | | LSTM | 30 | 17.65 ± 0.22 | 11.92 ± 0.20 | 9.79 ± 0.21 | 78.23 ± 0.53 | 0.69 ± 0.00 | 0.88 ± 0.01 | 0.19 ± 0.01
 | | | 60 | 26.93 ± 0.23 | 17.97 ± 0.17 | 15.04 ± 0.14 | 49.39 ± 0.85 | 0.58 ± 0.01 | 0.78 ± 0.00 | 0.28 ± 0.00
 | 567 | MLP | 30 | 22.67 ± 0.22 | 16.17 ± 0.22 | 12.39 ± 0.21 | 84.86 ± 0.29 | 0.64 ± 0.01 | 0.81 ± 0.00 | 0.28 ± 0.00
 | | | 60 | 37.82 ± 0.24 | 28.14 ± 0.18 | 22.42 ± 0.23 | 57.94 ± 0.52 | 0.48 ± 0.00 | 0.66 ± 0.00 | 0.46 ± 0.00
 | | LSTM | 30 | 23.74 ± 0.09 | 16.86 ± 0.14 | 12.96 ± 0.14 | 83.41 ± 0.13 | 0.62 ± 0.00 | 0.79 ± 0.00 | 0.30 ± 0.00
 | | | 60 | 38.75 ± 0.41 | 29.24 ± 0.31 | 23.40 ± 0.46 | 55.84 ± 0.92 | 0.47 ± 0.01 | 0.64 ± 0.01 | 0.48 ± 0.01
 | 584 | MLP | 30 | 21.89 ± 0.09 | 15.96 ± 0.14 | 10.64 ± 0.13 | 86.60 ± 0.11 | 0.77 ± 0.00 | 0.89 ± 0.00 | 0.22 ± 0.00
 | | | 60 | 35.42 ± 0.42 | 26.73 ± 0.52 | 17.97 ± 0.53 | 64.79 ± 0.83 | 0.60 ± 0.01 | 0.73 ± 0.01 | 0.36 ± 0.01
 | | LSTM | 30 | 24.79 ± 0.06 | 18.21 ± 0.08 | 12.51 ± 0.13 | 82.82 ± 0.08 | 0.76 ± 0.00 | 0.86 ± 0.00 | 0.25 ± 0.00
 | | | 60 | 38.65 ± 0.29 | 29.33 ± 0.12 | 20.14 ± 0.01 | 58.09 ± 0.63 | 0.60 ± 0.00 | 0.70 ± 0.00 | 0.39 ± 0.00
 | 596 | MLP | 30 | 17.76 ± 0.09 | 12.85 ± 0.09 | 9.71 ± 0.11 | 87.16 ± 0.13 | 0.75 ± 0.00 | 0.90 ± 0.00 | 0.20 ± 0.00
 | | | 60 | 28.80 ± 0.19 | 21.37 ± 0.13 | 16.53 ± 0.11 | 66.29 ± 0.44 | 0.59 ± 0.01 | 0.80 ± 0.00 | 0.31 ± 0.00
 | | LSTM | 30 | 19.06 ± 0.16 | 13.55 ± 0.08 | 10.27 ± 0.06 | 85.21 ± 0.24 | 0.72 ± 0.00 | 0.88 ± 0.00 | 0.22 ± 0.00
 | | | 60 | 30.01 ± 0.10 | 22.25 ± 0.10 | 17.31 ± 0.16 | 63.39 ± 0.25 | 0.56 ± 0.00 | 0.80 ± 0.00 | 0.32 ± 0.00
Note. PID: patient identification; PH: prediction horizon; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r2: coefficient of determination; MCC: Matthew’s correlation coefficient; SE: surveillance error; ASE: average surveillance error.
Table 4. The evaluation results for the nested stacking models created using Ohio datasets.
Dataset | PID | Learner | PH | RMSE ± SD (mg/dL) | MAE ± SD (mg/dL) | MAPE ± SD (%) | r2 ± SD (%) | MCC ± SD (%) | SE < 0.5 ± SD (%) | ASE ± SD
2018 | 559 | MLP | 30 | 19.67 ± 0.05 | 13.54 ± 0.05 | 8.89 ± 0.03 | 90.72 ± 0.05 | 0.79 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 33.44 ± 0.28 | 23.54 ± 0.16 | 15.27 ± 0.04 | 73.05 ± 0.46 | 0.63 ± 0.00 | 0.78 ± 0.00 | 0.31 ± 0.00
 | | LSTM | 30 | 19.69 ± 0.19 | 13.51 ± 0.18 | 8.83 ± 0.17 | 90.71 ± 0.18 | 0.79 ± 0.00 | 0.90 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 33.93 ± 0.48 | 23.82 ± 0.28 | 15.31 ± 0.05 | 72.25 ± 0.79 | 0.63 ± 0.01 | 0.78 ± 0.00 | 0.31 ± 0.00
 | 563 | MLP | 30 | 18.85 ± 0.10 | 13.15 ± 0.08 | 8.27 ± 0.02 | 82.72 ± 0.19 | 0.76 ± 0.01 | 0.91 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 31.82 ± 0.54 | 22.38 ± 0.38 | 13.84 ± 0.11 | 50.81 ± 1.66 | 0.55 ± 0.01 | 0.80 ± 0.01 | 0.30 ± 0.00
 | | LSTM | 30 | 19.00 ± 0.07 | 13.24 ± 0.06 | 8.31 ± 0.03 | 82.44 ± 0.13 | 0.76 ± 0.01 | 0.91 ± 0.00 | 0.19 ± 0.00
 | | | 60 | 31.65 ± 0.51 | 22.37 ± 0.61 | 13.79 ± 0.10 | 51.35 ± 1.59 | 0.55 ± 0.03 | 0.80 ± 0.01 | 0.31 ± 0.01
 | 570 | MLP | 30 | 18.34 ± 0.11 | 12.85 ± 0.08 | 6.58 ± 0.05 | 92.64 ± 0.09 | 0.86 ± 0.00 | 0.96 ± 0.00 | 0.12 ± 0.00
 | | | 60 | 31.09 ± 0.28 | 22.21 ± 0.14 | 11.54 ± 0.03 | 78.88 ± 0.38 | 0.77 ± 0.00 | 0.89 ± 0.00 | 0.21 ± 0.00
 | | LSTM | 30 | 18.57 ± 0.22 | 13.11 ± 0.12 | 6.65 ± 0.08 | 92.45 ± 0.18 | 0.86 ± 0.00 | 0.96 ± 0.00 | 0.12 ± 0.00
 | | | 60 | 31.61 ± 0.60 | 22.60 ± 0.54 | 11.53 ± 0.02 | 78.16 ± 0.84 | 0.77 ± 0.00 | 0.90 ± 0.00 | 0.21 ± 0.00
 | 575 | MLP | 30 | 26.18 ± 0.09 | 16.60 ± 0.19 | 12.40 ± 0.27 | 81.71 ± 0.12 | 0.73 ± 0.00 | 0.84 ± 0.00 | 0.26 ± 0.01
 | | | 60 | 36.98 ± 0.33 | 26.43 ± 0.50 | 19.46 ± 1.39 | 63.57 ± 0.65 | 0.54 ± 0.01 | 0.70 ± 0.01 | 0.40 ± 0.02
 | | LSTM | 30 | 26.01 ± 0.91 | 16.47 ± 0.32 | 12.02 ± 0.66 | 81.93 ± 1.25 | 0.73 ± 0.00 | 0.84 ± 0.01 | 0.25 ± 0.01
 | | | 60 | 37.05 ± 0.62 | 26.29 ± 0.28 | 18.96 ± 0.13 | 63.44 ± 1.22 | 0.54 ± 0.00 | 0.70 ± 0.00 | 0.39 ± 0.00
 | 588 | MLP | 30 | 18.50 ± 0.11 | 13.63 ± 0.08 | 8.11 ± 0.05 | 84.98 ± 0.17 | 0.74 ± 0.00 | 0.93 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 29.43 ± 0.07 | 21.42 ± 0.17 | 13.01 ± 0.42 | 62.05 ± 0.17 | 0.62 ± 0.00 | 0.82 ± 0.01 | 0.28 ± 0.01
 | | LSTM | 30 | 18.26 ± 0.14 | 13.56 ± 0.27 | 8.23 ± 0.32 | 85.37 ± 0.22 | 0.76 ± 0.01 | 0.93 ± 0.00 | 0.18 ± 0.01
 | | | 60 | 29.54 ± 0.28 | 21.33 ± 0.21 | 12.84 ± 0.09 | 61.77 ± 0.74 | 0.62 ± 0.01 | 0.82 ± 0.00 | 0.27 ± 0.00
 | 591 | MLP | 30 | 23.07 ± 0.09 | 16.48 ± 0.04 | 12.89 ± 0.06 | 80.16 ± 0.15 | 0.64 ± 0.01 | 0.80 ± 0.00 | 0.29 ± 0.00
 | | | 60 | 35.68 ± 0.11 | 27.65 ± 0.08 | 23.12 ± 0.07 | 52.56 ± 0.29 | 0.42 ± 0.00 | 0.65 ± 0.00 | 0.46 ± 0.00
 | | LSTM | 30 | 23.08 ± 0.10 | 16.52 ± 0.07 | 12.98 ± 0.08 | 80.14 ± 0.17 | 0.63 ± 0.00 | 0.80 ± 0.00 | 0.29 ± 0.00
 | | | 60 | 35.68 ± 0.21 | 27.69 ± 0.12 | 23.16 ± 0.08 | 52.57 ± 0.55 | 0.42 ± 0.00 | 0.65 ± 0.01 | 0.46 ± 0.00
2020 | 540 | MLP | 30 | 22.36 ± 0.03 | 16.96 ± 0.05 | 12.59 ± 0.03 | 88.15 ± 0.03 | 0.67 ± 0.00 | 0.82 ± 0.00 | 0.27 ± 0.00
 | | | 60 | 38.81 ± 0.26 | 29.34 ± 0.14 | 22.04 ± 0.10 | 64.38 ± 0.47 | 0.53 ± 0.01 | 0.68 ± 0.00 | 0.43 ± 0.00
 | | LSTM | 30 | 22.39 ± 0.11 | 16.99 ± 0.09 | 12.61 ± 0.08 | 88.12 ± 0.12 | 0.67 ± 0.01 | 0.81 ± 0.00 | 0.27 ± 0.00
 | | | 60 | 38.74 ± 0.18 | 29.32 ± 0.18 | 22.05 ± 0.15 | 64.52 ± 0.33 | 0.53 ± 0.01 | 0.68 ± 0.00 | 0.43 ± 0.00
 | 544 | MLP | 30 | 16.86 ± 0.11 | 11.89 ± 0.06 | 8.02 ± 0.06 | 88.94 ± 0.14 | 0.78 ± 0.00 | 0.92 ± 0.00 | 0.17 ± 0.00
 | | | 60 | 28.92 ± 0.14 | 20.88 ± 0.05 | 14.33 ± 0.02 | 67.36 ± 0.31 | 0.63 ± 0.00 | 0.77 ± 0.00 | 0.30 ± 0.00
 | | LSTM | 30 | 16.96 ± 0.15 | 11.95 ± 0.11 | 8.07 ± 0.09 | 88.80 ± 0.19 | 0.78 ± 0.01 | 0.92 ± 0.00 | 0.18 ± 0.00
 | | | 60 | 28.84 ± 0.19 | 20.81 ± 0.10 | 14.34 ± 0.13 | 67.54 ± 0.42 | 0.63 ± 0.00 | 0.77 ± 0.00 | 0.30 ± 0.00
 | 552 | MLP | 30 | 13.87 ± 0.16 | 8.88 ± 0.32 | 7.07 ± 0.24 | 86.56 ± 0.32 | 0.72 ± 0.01 | 0.92 ± 0.00 | 0.15 ± 0.01
 | | | 60 | 24.61 ± 0.11 | 16.04 ± 0.36 | 13.43 ± 0.30 | 57.73 ± 0.38 | 0.60 ± 0.00 | 0.82 ± 0.00 | 0.25 ± 0.00
 | | LSTM | 30 | 13.86 ± 0.02 | 9.00 ± 0.06 | 7.13 ± 0.06 | 86.58 ± 0.03 | 0.72 ± 0.00 | 0.92 ± 0.00 | 0.15 ± 0.00
 | | | 60 | 23.97 ± 0.44 | 15.47 ± 0.32 | 12.76 ± 0.38 | 59.91 ± 1.47 | 0.61 ± 0.00 | 0.83 ± 0.01 | 0.24 ± 0.01
 | 567 | MLP | 30 | 21.81 ± 0.28 | 15.58 ± 0.14 | 11.71 ± 0.30 | 86.00 ± 0.35 | 0.65 ± 0.01 | 0.82 ± 0.01 | 0.27 ± 0.01
 | | | 60 | 37.50 ± 0.18 | 27.95 ± 0.13 | 21.97 ± 0.18 | 58.65 ± 0.39 | 0.49 ± 0.00 | 0.66 ± 0.00 | 0.46 ± 0.00
 | | LSTM | 30 | 22.02 ± 0.07 | 15.70 ± 0.05 | 11.96 ± 0.07 | 85.72 ± 0.08 | 0.64 ± 0.00 | 0.82 ± 0.00 | 0.27 ± 0.00
 | | | 60 | 37.77 ± 0.25 | 28.19 ± 0.22 | 22.38 ± 0.36 | 58.05 ± 0.55 | 0.48 ± 0.00 | 0.66 ± 0.00 | 0.46 ± 0.00
 | 584 | MLP | 30 | 22.35 ± 0.58 | 16.74 ± 0.67 | 11.54 ± 0.54 | 86.03 ± 0.73 | 0.77 ± 0.01 | 0.88 ± 0.01 | 0.24 ± 0.01
 | | | 60 | 35.77 ± 0.49 | 27.25 ± 0.49 | 18.79 ± 0.44 | 64.11 ± 0.99 | 0.61 ± 0.01 | 0.73 ± 0.01 | 0.37 ± 0.01
 | | LSTM | 30 | 22.19 ± 0.11 | 16.54 ± 0.17 | 11.38 ± 0.17 | 86.24 ± 0.13 | 0.77 ± 0.00 | 0.88 ± 0.00 | 0.23 ± 0.00
 | | | 60 | 36.02 ± 0.06 | 27.37 ± 0.12 | 18.91 ± 0.14 | 63.60 ± 0.12 | 0.61 ± 0.00 | 0.72 ± 0.00 | 0.37 ± 0.00
 | 596 | MLP | 30 | 17.78 ± 0.24 | 12.67 ± 0.13 | 9.52 ± 0.10 | 87.13 ± 0.35 | 0.74 ± 0.00 | 0.89 ± 0.00 | 0.20 ± 0.00
 | | | 60 | 28.54 ± 0.24 | 20.79 ± 0.09 | 15.74 ± 0.27 | 66.89 ± 0.55 | 0.58 ± 0.02 | 0.81 ± 0.00 | 0.30 ± 0.00
 | | LSTM | 30 | 17.57 ± 0.25 | 12.49 ± 0.14 | 9.35 ± 0.09 | 87.43 ± 0.36 | 0.75 ± 0.01 | 0.89 ± 0.00 | 0.20 ± 0.00
 | | | 60 | 28.68 ± 0.37 | 20.97 ± 0.07 | 15.96 ± 0.31 | 66.55 ± 0.87 | 0.58 ± 0.02 | 0.81 ± 0.00 | 0.31 ± 0.00
Note. PID: patient identification; PH: prediction horizon; RMSE: root mean square error; SD: standard deviation; MAE: mean absolute error; MAPE: mean absolute percentage error; r2: coefficient of determination; MCC: Matthew’s correlation coefficient; SE: surveillance error; ASE: average surveillance error.
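To make the stacking terminology of Tables 3 and 4 concrete, the sketch below fuses out-of-fold predictions from two lag-specific base learners with a linear meta-learner. It is a generic stacking illustration with placeholder models and synthetic data, not the authors' nested deep-ensemble pipeline (which uses MLP and LSTM learners); in practice, time-ordered splits would replace the plain cross-validation shown here.

```python
# Generic two-level stacking over lag-specific base learners.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 400
X60 = rng.normal(size=(n, 12))   # features from a 60 min look-back (12 samples)
X120 = rng.normal(size=(n, 24))  # features from a 120 min look-back (24 samples)
y = X60[:, -1] * 50 + 140 + rng.normal(scale=10, size=n)  # synthetic target

base_60 = RandomForestRegressor(n_estimators=50, random_state=0)
base_120 = RandomForestRegressor(n_estimators=50, random_state=0)

# Level 0: out-of-fold predictions avoid leaking training targets upward
z60 = cross_val_predict(base_60, X60, y, cv=5)
z120 = cross_val_predict(base_120, X120, y, cv=5)

# Level 1: the meta-learner fuses the lag-specific predictions
meta = Ridge().fit(np.column_stack([z60, z120]), y)
print(meta.coef_, meta.intercept_)
```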
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
