Article

Analysis of Machine Learning Approaches’ Performance in Prediction Problems with Human Activity Patterns

by Ricardo Torres-López 1, David Casillas-Pérez 2,*, Jorge Pérez-Aracil 3, Laura Cornejo-Bueno 1, Enrique Alexandre 1 and Sancho Salcedo-Sanz 1

1 Department of Signal Processing and Communications, Universidad de Alcalá, 28805 Alcalá de Henares, Spain
2 Department of Signal Processing and Communications, Universidad Rey Juan Carlos, 28942 Fuenlabrada, Spain
3 Department of Computer Systems Engineering, Universidad Politécnica de Madrid, 28038 Madrid, Spain
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(13), 2187; https://doi.org/10.3390/math10132187
Submission received: 8 May 2022 / Revised: 15 June 2022 / Accepted: 20 June 2022 / Published: 23 June 2022

Abstract: Prediction problems in timed datasets related to human activities are especially difficult to solve, because of their specific characteristics and the scarce number of predictive (input) variables available to tackle them. In this paper, we investigate whether Machine Learning (ML) approaches can be successfully applied to these problems. We deal with timed datasets with human activity patterns, in which the input variables are exclusively related to the day or type of day on which the prediction is carried out and, usually, to the meteorology of those days. Problems with a marked human activity pattern frequently appear in mobility and traffic-related tasks, delivery prediction (packets, food), and many other activities, usually in cities. We evaluate the performance on these problems of different ML methods, such as artificial neural networks (multi-layer perceptrons, extreme learning machines) and support vector regression algorithms, together with an Analogue-type (KNN) approach, which serves as a baseline algorithm and, by looking for similar situations in the past, provides information about when ML approaches are expected to fail. The ML algorithms are evaluated on four real prediction problems with human activity patterns: school absences, bike-sharing demand, parking occupancy, and packets delivered in a post office. The results obtained show the good performance of the ML algorithms, revealing that they can deal with scarce information in all the problems considered. The results have also revealed the importance of including meteorological input variables, showing that meteorology is frequently behind demand peaks or valleys in this kind of problem. Finally, we show that having a number of similar situations in the past (training set) prevents ML algorithms from making important mistakes in their predictions.

1. Introduction

This paper deals with prediction problems in timed datasets with a marked human activity pattern [1]. Prediction in these datasets is a hard task with specific characteristics: the importance of the time tag at which the prediction is carried out, the existence of very few exogenous variables to complement the information of the time series (meteorology in the majority of cases), and the specific peculiarities and structure of each problem all add extreme difficulty to any prediction task carried out on these datasets. In many cases, Machine Learning (ML) approaches are the only techniques flexible enough to deal with this type of problem.
The analysis of human activities through the use of different algorithmic approaches, including ML methods, has gained the attention of researchers in the last few years. For example, in [2], the analysis of human daily behaviors from smart mobile data was tackled. In [3], a Random Forest (RF) approach is applied to a problem of human activity recognition, based on different predictive variables including the mobile phone position. In [4], a deep belief network is applied to a problem of human activity recognition, obtaining better results than alternative ML approaches such as Support Vector Machines (SVMs) or Artificial Neural Networks (ANNs). In close connection with this last work, [5] proposes the use of a deep residual network to solve a problem of human activity recognition with data from the Internet of Things, including a comparison with state-of-the-art ML techniques. In [6], fuzzy recognition algorithms are applied to segment a given video into a sequence of events belonging to the same human activity. In [7], the performance of five different ML methods for human activity recognition is analyzed. Specifically, naive Bayes, C4.5 decision trees, RF, multi-layer perceptrons, and SVMs are considered over real accelerometer data to recognize human activities carried out by a given person. In [8], an algorithm for detecting risky driving patterns based on speed time series is proposed. In [9], ML techniques for detecting daily routine activities and deviations from them are presented; decision trees and RF algorithms were considered in that problem.
However, there are other types of tasks that are not specifically human activities on their own, but are completely related and have a clear human activity pattern. Problems related to human mobility and traffic patterns [10,11], consumption prediction [12,13] (of goods or services such as electricity, etc.), and attendance at events or any other activity with a timetable are some examples of these problems. There are some previous works in the literature dealing with different approaches for this kind of problem, some of them applying ML algorithms. For example, in [14], a problem of electricity price forecasting is tackled by applying deep neural networks hybridized with particle swarm optimization. A Long Short-Term Memory (LSTM) neural network model was specifically considered in this problem, with a clear pattern of human activity and marked differences between different days of the week. In [15], the prediction of electricity price on the HUPX market and electricity load in Montenegro is tackled by applying an ANN. In both cases, variables related to meteorology and the day of the week are included in the prediction. In close connection with this problem, the prediction of electricity consumption at homes is another problem that involves timed data with a clear human activity pattern, as shown in [16,17,18], where the problem is tackled using ML algorithms. Furthermore, a midterm daily load forecasting problem in power systems is tackled using different ML models, including gradient boosting trees and ARIMA methods. In [19], the analysis of large-scale human mobility by the use of ML techniques is carried out. Somewhat related to that work, in [20], a study on human mobility using ML clustering techniques in the Chicago metropolitan area over a demographically representative sample of its population is carried out. The K-means algorithm is applied to a principal component analysis of the data to reduce their dimensionality.
An analysis of human mobility on average weekdays and on weekends is carried out, detecting completely different patterns, as expected. In [21], different ML approaches such as gradient boosting machines, Support Vector Regression algorithms (SVR), boosted trees, and Extreme Gradient Boosting Trees (XGBTs) are tested in a problem of bike-sharing demand prediction. Predictive variables include weather data, the day of the week, and the type of day (holidays, etc.). Excellent prediction results were reported in a problem of bike demand estimation in Seoul (South Korea). The main and common characteristic of all the problems previously reviewed is that they are usually represented as timed data, with a given resolution. Prediction problems related to these tasks are characterized by a reduced number of exogenous variables, and meteorology is usually the most important variable to take into account. Furthermore, these problems are very specific, and some of them have different patterns that affect the analysis or prediction, such as different behavior on weekends, holidays, or special days.
In this paper, we analyze the performance of several ML algorithms, such as neural networks and SVR algorithms, in different prediction problems with human activity patterns. We evaluate the effect of meteorology on each problem, as well as the importance of having a complete dataset with regular information about specific cases to obtain good-quality predictions in these problems. For comparison purposes, we include an Analogue algorithm for prediction (a version of the well-known KNN approach), which takes into account the similarity of the current situation with past events, so this approach fits the structure of these human activity pattern prediction problems. We consider four different problems that can be modeled as regression problems and solved with ML algorithms. Specifically, the prediction of students’ absences from a school in Madrid, parking occupancy in San Sebastián (Spain), bike-sharing demand in Madrid, and the prediction of the number of packets delivered by a post office in Guadalajara (Spain) are considered. All of them have in common a marked human activity pattern. The results obtained show that ML is a good option to obtain accurate prediction results in this kind of timed dataset, that meteorology is important in all the cases analyzed, and that the datasets must be informative and complete enough to avoid large errors in the predictions provided by the ML approaches.
The main contributions of this work can be summarized as follows:
  • We discuss whether ML approaches can be successfully applied to prediction problems in timed datasets when they are characterized by a marked human activity pattern.
  • We analyze the information available in this kind of problem, trying to discover the importance of each input variable considered among those available: previous values of the objective function, day of the week, type of the day (ordinary, bank holiday, etc.), and the meteorology of the zone.
  • We analyze the effect of persistence (similar days in the past produce a similar prediction) in these problems and how it reflects in the information available to train the ML approaches.
  • We show the performance of ML approaches in four real timed datasets with a clear human activity pattern on them: school absences, bike-sharing demand, parking occupancy, and packet delivery in a post office.
The rest of the paper is structured as follows: The next section describes the timed data available for this study and the specific characteristics and human activity patterns of each problem. Section 3 describes the most important characteristics of the prediction methods considered in this paper. Section 4 presents the results obtained in each prediction problem, with specific details of the ML method’s performance on each of the problems considered. Section 5 closes the paper with some final remarks and conclusions on the research carried out. We also include a table of abbreviations at the end of the manuscript to ease the reading.

2. Timed Data with Human Activity Patterns

In this section, we define what can be considered a prediction problem with a human activity pattern. We also describe some real datasets with this characteristic, which will be used as benchmarks to evaluate the performance of ML regression techniques on this type of problem.
As previously stated, many human activities can be modeled as timed data, with a specific time resolution (hourly, daily, etc.). In many cases, the time tag at which the activity is carried out is key to obtaining a prediction in the dataset. Furthermore, there are not many exogenous variables associated with these prediction problems, and in many cases, meteorology is the only extra information available. A prediction problem with these characteristics is considered a human activity pattern prediction task.
In this paper, we consider several timed datasets with such human activity patterns. Specifically, we obtained data on school absences at a private school in Madrid, data on bike-sharing demand in Madrid city (from the Public Madrid Company BiciMad), data on public parking occupancy in San Sebastián city (Basque Country, Spain), and data from a post office at Azuqueca de Henares (Guadalajara, Spain) about the number of daily packets delivered. Some of these datasets can be freely accessed and downloaded from the Spanish Government data repository at [22]. In addition, all datasets considered in this work can be accessed through the following repository: https://github.com/ksyas/HAP_DB.git (Last accessed date 22 June 2022). In all cases, a prediction problem can be formulated on these datasets as a regression problem, in which the number of absences in a school, number of bicycles rented (demand), percentage of parking occupancy, and number of packets delivered must be estimated. Daily data were considered in all cases, and one-day-ahead prediction was considered. Figure 1 shows the times series of each considered dataset, with details on the date ranges in which the data are available. In addition, we obtained meteorological data for the location of each dataset, using a Reanalysis project (see [23] for a description of Reanalysis data and its use in ML-based prediction problems). Specifically, ERA 5 Reanalysis data from the European Center for Medium-range Weather Forecasts (ECMWF) was used, from which different variables such as temperature or precipitation were downloaded for the dates in which the datasets have samples.
Table 1, Table 2, Table 3, Table 4 and Table 5 show the predictive (input) variables and also the output variables (Table 5) considered in each problem. These input variables vary with the problem, due to the very specific characteristics of each prediction task considered. Note that, in all of them, the type of day or the day of the week on which the prediction is carried out is a relevant variable for the prediction. As previously stated, we also included the objective values of the previous days and the meteorology, i.e., the temperature and rainfall of the previous days and of the prediction day itself (forecast values would be used for the latter in a real application). In the experimental section, we discuss whether meteorology is a key factor (or not) for each of these human activity pattern problems.
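As an illustration of how such inputs can be arranged, the following sketch builds lagged objective values plus same-day weather and calendar variables into a feature matrix. It uses synthetic data and hypothetical variable roles, not the authors' exact preprocessing:

```python
import numpy as np

def build_lagged_features(series, temp, rain, day_of_week, n_lags=3):
    """Stack previous target values, weather and calendar info as inputs.

    series, temp, rain, day_of_week: 1-D arrays of equal length (daily data).
    Returns X (inputs) and y (one-day-ahead targets), aligned sample-wise.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(np.concatenate([
            series[t - n_lags:t],          # previous values of the objective
            [temp[t], rain[t]],            # meteorology on the prediction day
            [day_of_week[t]],              # calendar information
        ]))
        y.append(series[t])
    return np.array(X), np.array(y)

# Toy daily series standing in for one of the datasets.
rng = np.random.default_rng(0)
n = 30
series = rng.integers(0, 50, n).astype(float)
X, y = build_lagged_features(series, rng.normal(15, 5, n),
                             rng.random(n), np.arange(n) % 7)
print(X.shape, y.shape)  # (27, 6) (27,)
```

With three lags, the first three days cannot be predicted, so 30 daily samples yield 27 input-output pairs of 6 features each.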

3. Methods: Machine Learning Regression Techniques

In this section, we briefly describe the most important characteristics of some ML regression techniques that can be applied to timed datasets with human activity patterns. In this section, we also describe the Analogue algorithm we used as a baseline for comparison purposes. We start the section by introducing regression problems as the framework we used for ML prediction in timed datasets with human activity patterns, and then, we describe the different ML regressors considered.

3.1. Regression Problems

In a data-driven regression problem, it is assumed that there exists a function f that relates the input space $\mathcal{X}$ to the output space $\mathcal{Y}$:
$$f : \mathcal{X} \to \mathcal{Y}, \qquad x \mapsto y = f(x)$$
A priori, the function f is unknown, but we have a set of input–output pairs $\mathcal{D} = \{ (x_i, y_i) \in \mathcal{X} \times \mathcal{Y} \mid 0 < i \le N \}$ that fulfill the following expression:
$$f(x_i) = y_i, \quad \forall (x_i, y_i) \in \mathcal{D}$$
Supervised ML methods for regression provide an estimation $\hat{f}$ of the true function f, which maps the whole input space $\mathcal{X}$ into the output space $\mathcal{Y}$: the estimation should fit the known set of observations $\mathcal{D}$ well, but also remain suitable over the whole domain $\mathcal{X}$.
In a regression problem, the output space $\mathcal{Y}$ is a subset of the real numbers $\mathbb{R}$ or of the Cartesian product $\mathbb{R}^n$. Consequently, the output is an infinite, dense set of real numbers. In this case, the mapping function f is assumed to be smooth to some degree, mainly continuous. The input–output pairs of observations $\mathcal{D}$ should cover most of the input domain $\mathcal{X}$, in such a way that it is represented throughout the whole domain. The main objective is to find an estimation of f that fulfills $\mathcal{D}$ but extrapolates the behavior to the whole input space. Algorithm 1 summarizes the methodology followed for implementing the human activity ML predictor.
Algorithm 1 Pseudo-code of ML methods for prediction problems with human activity patterns.
Input: Meteorological, day-of-the-week, and other calendar-related variables; persistence human activity variables.
Output: Prediction of the objective variables.
1: Build database $\mathcal{D}$ with the input–output variables at day n and their corresponding temporal lags for feeding the ML methods (SVR, ELM, MLP), according to Table 5.
2: Split the database $\mathcal{D}$ into two subsets for training (70%) and testing (30%).
3: Train the ML methods considered in Step 1 and test them on the previous subsets. Keep the statistical results for comparison.
4: return the results, the best model, and its prediction of the objective human activity variables.
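The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration on synthetic data: scikit-learn estimators stand in for the paper's implementations, and the 70/30 split is taken chronologically, which is one reasonable choice for timed data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for database D (Step 1): 6 input variables, 1 target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Step 2: chronological 70/30 train/test split.
cut = int(0.7 * len(X))
X_tr, X_te, y_tr, y_te = X[:cut], X[cut:], y[:cut], y[cut:]

# Step 3: train each method and keep its test statistics for comparison.
results = {}
for name, model in {"SVR": SVR(epsilon=0.1),
                    "ANA (KNN baseline)": KNeighborsRegressor(5)}.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = np.sqrt(np.mean((y_te - pred) ** 2))
    print(f"{name}: test RMSE = {results[name]:.3f}")

# Step 4: return the best model according to the kept statistics.
print("best:", min(results, key=results.get))
```

The same loop extends naturally to the MLP and ELM regressors described below.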

3.1.1. Support Vector Regression

SVR [24,25] is an ML approach for regression problems, also well established for function approximation. There are several versions of this approach, but in this work, we considered the classical $\epsilon$-SVR model, described in detail in [25]. This $\epsilon$-SVR version has previously been successfully applied in a large number of problems and applications in science and engineering [26]. Figure 2 shows an example of the $\epsilon$-SVR procedure for a two-dimensional regression problem.
To train this model, it is necessary to solve the following optimization problem [25]:
$$\begin{aligned}
\min_{\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\xi}^{*}} \quad & \frac{1}{2}\,\|\mathbf{w}\|^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad & y_i - \mathbf{w}^{\top} \phi(x_i) - b \le \epsilon + \xi_i, && i \in \{1, \ldots, N\} \\
& -y_i + \mathbf{w}^{\top} \phi(x_i) + b \le \epsilon + \xi_i^{*}, && i \in \{1, \ldots, N\} \\
& \xi_i,\, \xi_i^{*} \ge 0, && i \in \{1, \ldots, N\}.
\end{aligned}$$
Details on the solution process for the SVR algorithm and its tuning and optimization can be found in [25].
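The role of $\epsilon$ can be illustrated with scikit-learn's `SVR` on synthetic data (not the paper's setup): widening the $\epsilon$-tube tolerates larger residuals without penalty, so more points fall strictly inside the tube and fewer support vectors are retained.

```python
import numpy as np
from sklearn.svm import SVR

# One-dimensional noisy regression target.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
y = np.sinc(X).ravel() + rng.normal(scale=0.05, size=80)

# Compare a narrow and a wide epsilon-tube: the wide tube keeps fewer SVs.
for eps in (0.01, 0.2):
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors out of {len(X)}")
```

This sparsity is one reason the $\epsilon$-SVR formulation is attractive: only samples on or outside the tube contribute to the final model.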

3.1.2. Multi-Layer Perceptron

A Multi-Layer Perceptron (MLP) is a type of ANN successfully applied to a large number of classification and regression problems in the literature [27,28]. An MLP is usually structured into several layers: an input layer, one or more hidden layers, and an output layer. All of them consist of a number of special processing units called neurons. Usually, these layers are disposed consecutively, and each neuron of a layer is connected to the neurons of the next layer (feed-forward structure) by means of weighted links, as can be seen in Figure 3. The values of the MLP weights determine the capacity of the network to learn the problem, when enough training samples are used. Thus, the training process usually consists of assigning values to these weights in such a way that the error between the output given by the MLP and the corresponding expected output in the training set is minimized. The number of neurons in the hidden layers is another parameter to be optimized [28,29]. In our experiments, we carried out a cross-validation process, varying this parameter between 10 and 50 neurons. The best number of hidden neurons was around 20, depending on the dataset, although we observed that the reduction in the experimental Root-Mean-Squared Error (RMSE) across this range was not significant.
The classical well-known Stochastic Gradient Descent (SGD) or backpropagation algorithm is often applied to train the MLP [30]. There are also alternative training algorithms for MLPs that have previously shown excellent performance, such as the Levenberg–Marquardt algorithm [31].
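The hidden-layer-size cross-validation described above can be sketched with scikit-learn's `MLPRegressor`; the data and candidate sizes here are illustrative, not the paper's:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for one of the paper's datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=150)

# Cross-validate the number of hidden neurons, as the paper does (10 to 50).
for n_hidden in (10, 20, 50):
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                       random_state=0)
    score = cross_val_score(mlp, X, y, cv=3,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{n_hidden} hidden neurons: mean CV RMSE = {-score:.3f}")
```

The size with the lowest mean cross-validation RMSE would then be used to train the final model on the full training set.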

3.1.3. Extreme Learning Machines

An Extreme Learning Machine (ELM) [32] is a special type of training method for MLPs (see Section 3.1.2). In the ELM approach, the weights between the inputs and the hidden nodes are assigned at random, usually by using a uniform probability distribution. Then, the ELM algorithm calculates the output matrix $\mathbf{H}$ of the hidden layer and computes the Moore–Penrose pseudo-inverse [33,34] of this matrix ($\mathbf{H}^{+}$); see [32] for details. The pseudo-inverse matrix calculation can be carried out as follows:
$$\mathbf{H}^{+} = \left( \mathbf{H}^{\top} \mathbf{H} \right)^{-1} \mathbf{H}^{\top}$$
Then, the optimal values of the output layer weights can be directly obtained by multiplying $\mathbf{H}^{+}$ by the problem’s target vector (see [35] for details). The ELM method has shown excellent performance in classification and regression problems with respect to alternative ML training methods. In addition, its training is much more computationally efficient than that of other classifiers or regression approaches, such as SVM algorithms or MLPs [35].
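A minimal ELM regressor fits in a few lines of NumPy. This sketch is our illustration, not the authors' code; it uses `np.linalg.pinv` for the pseudo-inverse, which is numerically safer than forming $(\mathbf{H}^{\top}\mathbf{H})^{-1}\mathbf{H}^{\top}$ explicitly:

```python
import numpy as np

class ELMRegressor:
    """Minimal ELM sketch: random input weights, pseudo-inverse output weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Input-to-hidden weights are drawn at random and never trained.
        self.W = self.rng.uniform(-1, 1, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        H = np.tanh(X @ self.W + self.b)   # hidden-layer output matrix H
        # Output weights in one step via the Moore-Penrose pseudo-inverse H+.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Smooth synthetic target: the ELM fits it in a single linear-algebra step.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]
elm = ELMRegressor().fit(X[:200], y[:200])
rmse = np.sqrt(np.mean((elm.predict(X[200:]) - y[200:]) ** 2))
print(f"test RMSE = {rmse:.3f}")
```

Because the only trained parameters are the output weights, fitting reduces to one least-squares solve, which explains the training-time advantage over iteratively trained MLPs.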

3.2. Analogue Method for Prediction

The Analogue Method (ANA) is a prediction/reconstruction algorithm first introduced in the context of atmospheric sciences and with application in time series/timed data prediction [36,37,38]. The ANA is based on the principle that two similar states (of the atmosphere) lead to similar local effects [39]. More specifically, two states are considered as “Analogues”, when there is a resemblance between them, in terms of an analogy criterion or distance and objective variables. Thus, the ANA consists of searching for a certain number of past situations in an archive, in such a way that they present similar properties to that of a target situation for any chosen predictors or variables. The application in timed data is therefore straightforward from the original version in atmospheric sciences. Note that the ANA algorithm is a type of KNN ([40]), and we used it in this work as a baseline algorithm for comparison purposes.
In this paper, we applied the ANA to the prediction of timed data as follows: Given a partition of the data into training ($S_V$) and test ($S_T$) sets, the ANA process obtains, for each prediction to be performed on the test set, the most similar situations in the past (training set) in terms of the predictive variables. In other words, for each test sample $x_T$ (at time $T \in S_T$), we obtain the most similar situation (or the average of the $k$ most similar situations) in the past (training period), located at time $T' \in S_V$. Then, the prediction for $x_T$ is obtained as $\hat{x}_T = x_{T'}$.
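The ANA prediction step described above reduces to a few lines of NumPy; this sketch (an illustration on synthetic data) uses the Euclidean distance as the analogy criterion:

```python
import numpy as np

def analogue_predict(X_train, y_train, x_query, k=3):
    """ANA prediction: average the targets of the k most similar past situations."""
    dist = np.linalg.norm(X_train - x_query, axis=1)  # analogy criterion (distance)
    nearest = np.argsort(dist)[:k]                    # the k best analogues
    return y_train[nearest].mean()

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))    # past situations (training set)
y_train = X_train[:, 0]                # their observed outcomes
x_query = X_train[10] + 0.01           # a near-identical "new day"
pred = analogue_predict(X_train, y_train, x_query, k=1)
print(abs(pred - y_train[10]) < 0.1)   # True: the best analogue is day 10
```

When no close analogue exists in the training set, the nearest-neighbor distance is large and the prediction degrades, which is exactly the failure mode analyzed in Section 4.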

4. Experiments and Results

This section presents the results obtained by the ML regression methods described above in different prediction problems with human activity patterns. First, the performance of the ML algorithms is evaluated in these datasets, and then, the effect of including meteorology is also evaluated at this point.

4.1. Regression Metrics

We considered three common regression metrics for measuring the performance of the prediction methods considered: the RMSE, the Mean Absolute Error (MAE), and the Pearson correlation coefficient ($R^2$). We provide a brief description of these metrics here:
$$\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^{2} }, \qquad
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|,$$
$$R^{2} = \frac{ \sum_{i=1}^{N} \left( y_i - \mathbb{E}[y] \right) \left( \hat{y}_i - \mathbb{E}[\hat{y}] \right) }{ \sqrt{ \sum_{i=1}^{N} \left( y_i - \mathbb{E}[y] \right)^{2} \sum_{i=1}^{N} \left( \hat{y}_i - \mathbb{E}[\hat{y}] \right)^{2} } },$$
where $\hat{y}$ represents the prediction of the regressor and y is the ground truth; the subscript i refers to a single sample, $y_i = y[i]$. Well-performing ML methods report RMSE and MAE values close to 0 and $R^2$ close to $\pm 1$. The RMSE is more sensitive than the MAE to a single misprediction, so a low RMSE shows that the method does not commit strong mispredictions. $R^2$ is bounded in the $[-1, 1]$ interval and measures how linearly correlated the output estimated by the method and the ground truth are.
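These metrics are straightforward to compute with NumPy; a quick sketch on toy values:

```python
import numpy as np

def regression_metrics(y, y_hat):
    """RMSE, MAE and Pearson correlation, as defined above."""
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    mae = np.mean(np.abs(y - y_hat))
    r = np.corrcoef(y, y_hat)[0, 1]  # Pearson correlation coefficient
    return rmse, mae, r

# Toy ground truth and predictions.
y = np.array([10.0, 12.0, 9.0, 15.0])
y_hat = np.array([11.0, 12.0, 8.0, 14.0])
rmse, mae, r = regression_metrics(y, y_hat)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R={r:.2f}")
```

Note how the squared errors make the RMSE (0.87 here) exceed the MAE (0.75) whenever the errors are uneven across samples.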

4.2. Results

First, the results of the MLP, ELM, SVR, and ANA in the school absences dataset are reported in Table 6.
As can be seen, the ANA baseline approach performs well on this dataset, indicating that similar days in the past had similar school absence figures. The best-performing ML approach is the SVR, which outperforms the ANA approach, with the best MAE of 4.82. The effect of including meteorology in the prediction is positive, since all methods improve their results (especially the RMSE) when meteorology is considered, which indicates that bad weather has an influence on school absences. The ELM and MLP obtain the worst results, even worse than the ANA baseline. This result shows that the input variables used for the prediction are not enough for these methods to explain all the variability in the absences problem; absences can also be caused by health problems, such as flu, which may be seasonal. The last column in the table shows the training time (tt, in seconds) for the different regressors considered. As can be seen, the ANA approach is the fastest overall, the ELM is the fastest ML approach, and the SVR shows the highest training time among all the approaches evaluated. Figure 4 shows the performance of the different regressors considered on the test set of the school absences dataset, including meteorology input variables.
The second dataset with a human activity pattern where we study the performance of the ML regression methods is the bike-sharing demand data in Madrid city (BiciMad). Table 7 shows the numerical performance of the ML regression techniques in comparison with the ANA approach.
As can be seen, the best result in this dataset is obtained by the MLP algorithm, closely followed by the ELM, with RMSEs of 1428 and 1506, respectively. The SVR performs worse in this problem, obtaining an RMSE of 1552, and the ANA algorithm does not perform well in this case, with an RMSE of 1804. Note that $R^2$ reaches 83% in the result obtained by the MLP, which indicates a much better performance of the prediction algorithms on this dataset than on the previous one. Considering the meteorology-related input variables also has a positive impact on the prediction capacity of the ML approaches in this dataset, improving their performance in all cases, as expected, since meteorological information is key in this problem of predicting the number of rented bicycles (demand) in Madrid city. Compared with the absences dataset (the previously evaluated case), the ML methods report better performance when predicting bike demand: the input variables chosen are suitable for this specific problem and characterize it well.
The visual performance of the different algorithms can be seen in Figure 5 (regression approaches considering meteorology input variables). As can be seen, the accuracy of the prediction in this dataset is much better than in the previous case of the school absences dataset.
Except for the ANA method, all the methods predict the bike demand time series well.
We also evaluated here the performance of ML algorithms in the prediction of the parking occupancy in San Sebastian, Northern Spain. Table 8 shows the numerical performance of the ML algorithms in terms of the different quality metrics considered.
In this case, the best ML approach in terms of the MAE is the ELM with 104.92 . However, in terms of the RMSE and R 2 , the MLP reaches the best results with 144.52 and 87 % , respectively. Both the MLP and ELM report very good performance on this dataset as in the previous one. The difference in the MAE and RMSE shows that the ELM behaves slightly worse in predicting a few samples. The SVR obtains worse results. The ANA approach does not obtain competitive results either in this problem. As in the previous cases, considering meteorology inputs in the problem improves the performance of the ML regression techniques considered, in all cases. We can see that the input variables involved are suitable to predict parking occupancy. Figure 6 visually depicts the performance of all regressors tested in this problem. It is possible to see that the ANA approach fails to produce a correct prediction in the second part of the time series, whereas the ML approaches work consistently better in all cases.
Finally, we dealt with the problem of the prediction of the number of packets delivered in a post office at Azuqueca de Henares, Spain. Table 9 shows the performance of the different regressors considered in this problem.
In this case, the best algorithm seems to be the MLP, which obtains an MAE of 127.6 with an $R^2$ of 56%, closely followed by the SVR with 127.95 and 51%, respectively. Moreover, the MLP obtains a better RMSE than the ELM (188.45 versus 203.83), which means that it is less sensitive to strong mispredictions. The ELM results are not suitable for predicting package deliveries, and it reports worse results than its ML counterparts. The ANA obtains very poor results, in this case even worse when adding meteorological variables. In this problem, considering meteorology variables as inputs slightly improves the performance of the algorithms, but less than in the previous cases. Figure 7 visually shows the performance of the different regressors considered in this problem. Note that the number of packets delivered at Azuqueca grows exponentially in the last part of the test set (pandemic months were eliminated from the test set since they were not significant). This exponential growth is, in fact, a collateral effect of the COVID-19 pandemic, during which the number of packet deliveries boomed because of e-commerce growth. In this part of the series, all the regressors perform poorly, though the ELM seems to be the best algorithm in this region.
Finally, we analyzed the effect of having similar days in the past (training set) on the performance of the ML methods considered. We carried out this analysis on the school absences dataset, in which we calculated how many days similar to the current one (n) there are in the training set ($k[n]$). We depict this time series together with the predictions of the different ML regression techniques in Figure 8a–c. In fact, we depict $k'[n] = k[n]/20$ (instead of $k[n]$), shown as a black line in the figures, for a better visual match with the predictions of the ML algorithms. As can be seen, there are day types in the test set without a similar counterpart in the training set, which leads to a lack of similar situations in the past to train the ML algorithms. It is possible to see that the most important prediction errors of all the ML algorithms considered are produced on these days without similarity in the training set. This means that having enough information on similar day types in the past (training set) is key to obtaining good performance from the ML algorithms in these prediction problems with human activity patterns. Note that this behavior is related to the concept of the persistence of the system, a concept related in turn to the memory of the system. More specifically, persistence is an important characteristic of many complex systems in nature, related to how long the system remains in a certain state before changing to a different one [41]. In the case of prediction problems with human activity patterns, persistence is related to the information existing in the training set to predict a given unseen sample in the test set.
In other words, the different ML approaches can obtain good predictions when they are trained with similar cases, but if there is not a given persistence level in the system (i.e., there are no similar cases in the past), the ML training quality degrades, as shown in this final experiment for the school absences prediction problem.
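The similarity count $k[n]$ used in this analysis can be sketched as follows, assuming each day is summarized by a discrete day-type label (an illustration, not the authors' exact similarity criterion):

```python
import numpy as np

def count_similar_days(day_types_train, day_types_test):
    """For each test day, count how many training days share its day type (k[n])."""
    return np.array([(day_types_train == d).sum() for d in day_types_test])

# Day-type labels (e.g., ordinary, holiday, special day, ...).
train = np.array([0, 1, 2, 0, 1, 0, 3])
test = np.array([0, 3, 4])
print(count_similar_days(train, test))  # [3 1 0]: day type 4 was never seen
```

Test days where the count is zero are exactly the cases in which, as discussed above, the ML algorithms are expected to make their largest prediction errors.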

5. Conclusions

Prediction problems in timed datasets with human activity patterns present special characteristics that can be exploited to improve the performance of ML algorithms on them. In these prediction problems, the time tag (hour of the day, day of the week, etc.) at which the prediction is carried out is extremely important, together with the meteorological predictive variables usually associated with these problems. In this paper, we showed how ML approaches can be used to obtain accurate predictions in problems with a marked human activity pattern. We focused on four real prediction problems with human activity patterns: the prediction of students’ absences from a school in Madrid, parking occupancy in San Sebastián, bike-sharing demand in Madrid, and the number of packets delivered by a post office in Guadalajara. We showed that the ML prediction approaches considered (different neural networks and SVMs) can obtain good prediction accuracy on these datasets with only the information of previous values of the objective variables (autoregression), the day of the week on which the prediction is carried out, and the meteorology of the zone. We compared the results obtained using ML algorithms with those of an ANA method for prediction (a version of the KNN algorithm), showing that the ML algorithms outperform classical approaches on this type of problem. We also showed that including meteorology variables as inputs improves the performance of the ML approaches in all the cases analyzed. Finally, we discussed the effect of not having enough similar situations in the past for the ML approaches: it is in fact in these cases where the ML approaches make the most important mistakes in their predictions. The most important limitation when dealing with prediction problems with human activity patterns is the lack of input variables useful for improving the prediction performance of ML algorithms.
Meteorological variables play an important role in improving the prediction, but they are by no means the most important variables involved in these problems. It is therefore very difficult to find alternative variables that may improve the prediction beyond the day or type of day for which the prediction is obtained. Finding a different time series highly correlated with the objective one would greatly improve the performance of the ML approaches in this kind of problem, but such a series is not easy to find, even if it exists. Future work should clarify whether problem-specific predictive variables exist that may improve ML prediction results. We will also explore adaptive time–frequency analysis methods, such as empirical mode decomposition or wavelet-based decomposition, to enrich the input information used by the ML methods in this kind of timed data with human activity patterns. Finally, the application of deep learning methods could be an option for larger datasets, and it is a line of research we would like to explore as well.
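The with/without-meteorology comparison can be illustrated with the following sketch, which trains two of the considered regressor families on synthetic stand-in data. All hyperparameter values and the data-generating process are hypothetical; the paper's actual features and tuned models differ.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Synthetic stand-in data: columns 0-3 play the role of calendar/autoregressive
# inputs, columns 4-5 the role of temperature and rainfall (all hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = X[:, 1] + 0.5 * X[:, 4] + 0.1 * rng.normal(size=300)  # target partly driven by "meteorology"
X_tr, X_te, y_tr, y_te = X[:250], X[250:], y[:250], y[250:]

for name, model in [("MLP", MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)),
                    ("SVR", SVR(C=10.0, epsilon=0.01))]:
    for label, cols in [("no met.", slice(0, 4)), ("with met.", slice(0, 6))]:
        model.fit(X_tr[:, cols], y_tr)
        pred = model.predict(X_te[:, cols])
        rmse = mean_squared_error(y_te, pred) ** 0.5
        print(f"{name} ({label}): RMSE={rmse:.3f}, MAE={mean_absolute_error(y_te, pred):.3f}")
```

On data where meteorology genuinely drives part of the demand, the "with met." variant should show lower errors, mirroring the pattern observed in Tables 6, 7, 8 and 9.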

Author Contributions

Conceptualization, R.T.-L., L.C.-B., D.C.-P., S.S.-S., J.P.-A., and E.A.; methodology, R.T.-L., S.S.-S., D.C.-P., and J.P.-A.; software, R.T.-L., and L.C.-B.; validation, R.T.-L., and E.A.; investigation, R.T.-L., L.C.-B., S.S.-S., D.C.-P., and E.A.; data curation, R.T.-L.; writing—original draft preparation, R.T.-L., S.S.-S., D.C.-P., and J.P.-A.; writing—review and editing, E.A., S.S.-S., L.C.-B., and D.C.-P.; visualization, R.T.-L.; supervision, S.S.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the project PID2020-115454GB-C21 of the Spanish Ministry of Science and Innovation (MICINN).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets considered in this work can be accessed through the following repository: https://github.com/ksyas/HAP_DB.git (last accessed on 22 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
A      Absences
ANA    Analogue Method
ANN    Artificial Neural Networks
ARIMA  Autoregressive Integrated Moving Average
B      Bicycles rented
dw     Day of the week
ECMWF  European Centre for Medium-Range Weather Forecasts
XGBT   Extreme Gradient Boosting Trees
ELM    Extreme Learning Machine
hd     Holiday
HUPX   Hungarian Power Exchange
KNN    k-Nearest Neighbors
LSTM   Long Short-Term Memory
MAE    Mean Absolute Error
ML     Machine Learning
MLP    Multi-Layer Perceptron
P      Parking use
Pd     Packets delivered
R²     Pearson correlation coefficient
Ra     Rainfall
RF     Random Forest
RMSE   Root-Mean-Squared Error
SGD    Stochastic Gradient Descent
SVM    Support Vector Machines
SVR    Support Vector Regression
T      Temperature
td     Type of day

Figure 1. Time series with human activity patterns considered in this work. (a) School absences time series. (b) BiciMad time series. (c) Parking occupancy time series. (d) Packet delivery time series.
Figure 2. Example of a support-vector-regression process for a two-dimensional regression problem, with an ϵ -insensitive loss function.
Figure 3. Structure of an MLP neural network, with one hidden layer.
Figure 4. Performance of the different regression techniques considered in the prediction problem on school absences dataset. (a) ANA; (b) ELM; (c) MLP; (d) SVR.
Figure 5. Performance of the different regressors considered in the bike-sharing demand prediction problem in Madrid. (a) ANA; (b) ELM; (c) MLP; (d) SVR.
Figure 6. Performance of the different regressors considered in the prediction problem on San Sebastian parking occupancy dataset. (a) ANA; (b) ELM; (c) MLP; (d) SVR.
Figure 7. Performance of the different regressors considered in the prediction problem on packet delivery dataset at Azuqueca. (a) ANA; (b) ELM; (c) MLP; (d) SVR.
Figure 8. Performance of the different regressors considered on the school absences problem, together with the number of days in the training set similar to the current one. (a) ELM; (b) MLP; (c) SVR. The black line represents the number of similar days in the training set normalized by 20, that is, k′[n] = k[n]/20.
Table 1. Variables involved in the school absences prediction problem.

Variable | Acr. | Description/Units
Type of Day | td | School holiday (1), national holiday (2), holiday eve (3), local holiday (4), return from holidays (5), weekend (6), school day (7)
Temperature | T | °C
Rainfall | Ra | mm/24 h
Absences | A | ℕ ∪ {0}
Table 2. Variables involved in BiciMad's bike-sharing demand prediction problem.

Variable | Acr. | Description/Units
Day of the Week | dw | Monday (1), Tuesday (2), Wednesday (3), Thursday (4), Friday (5), Saturday (6), Sunday (7)
Holiday | hd | No (0), Yes (1)
Temperature | T | °C
Rainfall | Ra | mm/24 h
Bicycles Rented (demand) | B | ℕ ∪ {0}
Table 3. Variables involved in the San Sebastian parking occupancy prediction problem.

Variable | Acr. | Description/Units
Day of the Week | dw | Monday (1), Tuesday (2), Wednesday (3), Thursday (4), Friday (5), Saturday (6), Sunday (7)
Holiday | hd | No (0), Yes (1)
Temperature | T | °C
Rainfall | Ra | mm/24 h
Parking Use | P | ℕ ∪ {0}
Table 4. Variables involved in the Azuqueca post packets delivered prediction problem.

Variable | Acr. | Description/Units
Day of the Week | dw | Monday (1), Tuesday (2), Wednesday (3), Thursday (4), Friday (5), Saturday (6), Sunday (7)
Holiday | hd | No (0), Yes (1)
Temperature | T | °C
Rainfall | Ra | mm/24 h
Packets Delivered | Pd | ℕ ∪ {0}
Table 5. Input and output variables for all datasets used by the evaluated ML regression methods. The acronyms of each variable are described in Table 1, Table 2, Table 3 and Table 4. Brackets [·] indicate the time lag of each variable; that is, x[n] refers to the value of variable x on the current day, x[n − 1] on the previous day, etc.

Dataset | Input Vars. | Output Var.
School Absences | td[n], td[n + 1], td[n − 1], td[n − 2], td[n − 3], td[n − 4], A[n − 1], A[n − 2], A[n − 3], A[n − 4], T[n − 1], Ra[n − 1] | A[n]
Bicycles | dw[n], dw[n − 1], dw[n − 2], dw[n − 3], dw[n − 4], hd[n], hd[n − 1], hd[n − 2], hd[n − 3], hd[n − 4], B[n − 1], B[n − 2], B[n − 3], B[n − 4], T[n], Ra[n] | B[n]
Parking | dw[n], dw[n − 1], dw[n − 2], dw[n − 3], dw[n − 4], hd[n], hd[n − 1], hd[n − 2], hd[n − 3], hd[n − 4], P[n − 1], P[n − 2], P[n − 3], P[n − 4], T[n], Ra[n] | P[n]
Packets | dw[n], dw[n − 1], dw[n − 2], dw[n − 3], dw[n − 4], hd[n], hd[n − 1], hd[n − 2], hd[n − 3], hd[n − 4], Pd[n − 1], Pd[n − 2], Pd[n − 3], Pd[n − 4], T[n − 1], T[n − 2], T[n − 3], T[n − 4], Ra[n − 1], Ra[n − 2], Ra[n − 3], Ra[n − 4] | Pd[n]
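The lagged input layout of Table 5 can be built as in the following sketch. This is a simplified illustration (the helper name is ours, and it omits dataset-specific details such as the td[n + 1] calendar input of the school absences problem):

```python
import numpy as np

def build_lagged_dataset(series, exog, n_lags=4):
    """Build an autoregressive design matrix in the spirit of Table 5:
    inputs are the previous `n_lags` values of the target plus same-day
    exogenous variables (day type, meteorology); output is the current
    value of the target."""
    X, y = [], []
    for n in range(n_lags, len(series)):
        lags = series[n - n_lags:n][::-1]        # x[n-1], x[n-2], ..., x[n-n_lags]
        X.append(np.concatenate([lags, exog[n]]))
        y.append(series[n])
    return np.array(X), np.array(y)
```

Each row of `X` then corresponds to one day `n`, so any of the evaluated regressors can be fitted directly on `(X, y)`.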
Table 6. Performance of the evaluated ML regression methods in the prediction problem on the school absences dataset.

Met. Var. | Method | RMSE | MAE | R² | t (s)
No | ANA | 7.26 | 5.28 | 0.20 | 0.15
No | ELM | 8.50 | 5.38 | 0.13 | 0.71
No | SVR | 7.33 | 4.92 | 0.20 | 151.38
No | MLP | 8.48 | 5.43 | 0.06 | 1.40
Yes | ANA | 7.26 | 5.16 | 0.20 | 0.15
Yes | ELM | 8.20 | 5.36 | 0.16 | 0.77
Yes | SVR | 6.92 | 4.82 | 0.27 | 67.20
Yes | MLP | 7.58 | 5.20 | 0.17 | 1.44
Table 7. Performance of the evaluated ML methods in the bike-sharing demand prediction problem in Madrid.

Met. Var. | Method | RMSE | MAE | R² | t (s)
No | ANA | 1931 | 1322 | 0.70 | 0.31
No | ELM | 1595 | 1138 | 0.78 | 1.33
No | SVR | 1633 | 1219 | 0.77 | 31.08
No | MLP | 1560 | 1113 | 0.80 | 1.57
Yes | ANA | 1804 | 1288 | 0.74 | 0.35
Yes | ELM | 1506 | 1092 | 0.81 | 1.37
Yes | SVR | 1552 | 1183 | 0.81 | 31.19
Yes | MLP | 1428 | 1076 | 0.83 | 1.35
Table 8. Performance of the evaluated ML methods in the prediction problem on the San Sebastian parking occupancy dataset.

Met. Var. | Method | RMSE | MAE | R² | t (s)
No | ANA | 293.65 | 205.96 | 0.50 | 0.42
No | ELM | 147.07 | 104.88 | 0.85 | 1.87
No | SVR | 210.88 | 143.09 | 0.79 | 107.91
No | MLP | 155.32 | 113.02 | 0.87 | 1.54
Yes | ANA | 321.84 | 231.95 | 0.44 | 0.42
Yes | ELM | 146.21 | 104.92 | 0.85 | 2.13
Yes | SVR | 153.88 | 112.08 | 0.87 | >500
Yes | MLP | 144.52 | 106.96 | 0.87 | 1.47
Table 9. Performance of the evaluated ML methods in the packet delivery dataset.

Met. Var. | Method | RMSE | MAE | R² | t (s)
No | ANA | 244.85 | 146.77 | 0.28 | 0.10
No | ELM | 215.11 | 131.52 | 0.42 | 1.32
No | SVR | 206.06 | 127.73 | 0.49 | 43.42
No | MLP | 201.38 | 131.60 | 0.53 | 1.64
Yes | ANA | 259.08 | 167.21 | 0.27 | 0.08
Yes | ELM | 200.12 | 132.43 | 0.48 | 1.16
Yes | SVR | 203.83 | 127.95 | 0.51 | 294.70
Yes | MLP | 188.45 | 127.60 | 0.56 | 1.59
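For reference, the three scores reported in Tables 6, 7, 8 and 9 can be computed as in this minimal sketch (assuming, as is common, that R² denotes the squared Pearson correlation between observed and predicted values):

```python
import numpy as np

def rmse(y, p):
    """Root-mean-squared error."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(p)) ** 2)))

def mae(y, p):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(p))))

def r2(y, p):
    """Squared Pearson correlation between observed and predicted values."""
    c = np.corrcoef(np.asarray(y), np.asarray(p))[0, 1]
    return float(c ** 2)
```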
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
