Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare

Bonilla-Escribano, Pablo; Ramírez, David; Porras-Segovia, Alejandro; Artés-Rodríguez, Antonio

doi:10.3390/math9010071

Open AccessArticle

Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare

by

Pablo Bonilla-Escribano

^1,*

,

David Ramírez

¹

,

Alejandro Porras-Segovia

²

and

Antonio Artés-Rodríguez

¹

Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, Spain and Gregorio Marañón Health Research Institute, 28911 Madrid, Spain

²

Department of Psychiatry, IIS Fundación Jiménez Díaz, 28040 Madrid, Spain

^*

Author to whom correspondence should be addressed.

Mathematics 2021, 9(1), 71; https://doi.org/10.3390/math9010071

Submission received: 7 December 2020 / Revised: 20 December 2020 / Accepted: 24 December 2020 / Published: 31 December 2020

(This article belongs to the Special Issue Recent Advances in Data Science)

Download

Browse Figures

Versions Notes

Abstract

:

Variability is defined as the propensity at which a given signal is likely to change. There are many choices for measuring variability, and it is not generally known which ones offer better properties. This paper compares different variability metrics applied to irregularly (nonuniformly) sampled time series, which have important clinical applications, particularly in mental healthcare. Using both synthetic and real patient data, we identify the most robust and interpretable variability measures out of a set 21 candidates. Some of these candidates are also proposed in this work based on the absolute slopes of the time series. An additional synthetic data experiment shows that when the complete time series is unknown, as it happens with real data, a non-negligible bias that favors normalized and/or metrics based on the raw observations of the series appears. Therefore, only the results of the synthetic experiments, which have access to the full series, should be used to draw conclusions. Accordingly, the median absolute deviation of the absolute value of the successive slopes of the data is the best way of measuring variability for this kind of time series.

Keywords:

ecological momentary assessment (EMA); Hawkes process; irregularly sampled time series; variability

1. Introduction

Variability, also known as statistical dispersion, scatter, or spread, can be defined as the propensity at which a given signal is likely to change. The analysis of variability is of major importance in many fields, such as astrophysics [1,2], hydrology [3], agriculture [4], or ecology [5,6]. Measuring variability also has important applications in medicine. For example, it can be used to accurately measure the bioequivalence of doses of different drugs [7], predict medication demand in a hospital [8], or to measure heart rate variability, which in turn can be used as a biomarker of anxiety [9]. Variability is of particular usefulness in mental healthcare, as fluctuations in suicidal ideation have been identified as a phenotypic marker for stress-induced suicide risk [10]. In this paper, we focus on how to mathematically describe variability in nonuniformly sampled time series, which can be used in turn to alleviate problems in mental healthcare. It must be mentioned that there is a surge in the application of statistics and machine learning to solve other problems in medicine, such as classifying normal versus cancerous cells [11] or diagnosing mental disorders with electroencephalogram data [12]. For a discussion about how these methods should be incorporated in practice, refer to the work in [13].

A variability measure is an operator that transforms a time series into a single scalar, whose value hinges on the dispersion of the values of the sample. In order to make them easier to understand, many of those measures are expressed in the same units as the studied data and are non-negative, in such a way that a value of 0 indicates no change at all. Most of the measures commonly used in psychiatry have these characteristics. One possible way to achieve this is to summarize the expected distance with respect to a reference point [14]. For instance, if the reference point is the sample mean, and the expected distance is measured as the square root of the sum of the squared distances to the mean, divided by the sample size minus one, the dispersion will be expressed in terms of standard deviations. However, if the median and the median of the absolute distances are the selected criteria, the dispersion will be measured in terms of median absolute deviations, which can be more robust to outliers.

Another approach to express variability in the same units of the observations is to compute the difference of some sorted observations, thereby removing the need of a reference value. This can be useful when such a computation can get skewed by the inherent noise or other factors. In particular, if the difference is taken from the most extreme values, variability is measured in terms of range, but it is possible to compute it making use of the different percentiles of the dataset. In other cases, it may be convenient to measure variability independently of both the particular scale and the units, so that different datasets can be compared [15]. The simplest way to achieve this is to obtain ratios of variability measures, for example, the ratio of the standard deviation and the mean expresses variability as coefficients of variation [16].

The analysis of variability requires the definition of a variable of interest whose changes are computed. In psychiatry, this variable of interest is usually a symptom, which has traditionally been measured by questionnaires administered in clinical sites. This method of assessment presents several disadvantages, such as recall bias (difficulty to remember past events) [17] or ecological invalidity (by asking the questions outside the usual environment) [18]. To alleviate this, a research line has tried to obtain extemporaneous feedback from the participants by regularly requesting them to answer a set of questions via their electronic devices. This approach, which can be categorized as ecological momentary assessment (EMA) [19], suffers from fatigue effects. This consists in a decrease in the number of questions answered as time goes by, which can result in withdrawal from the study [20].

To decrease such fatigue, our research group has recently adopted a new sampling approach for the questions that consists in randomly asking some of them out of a fixed pool, thereby decreasing the sense of repetitiveness [21]. This approach results in studies with longer follow-up periods, at the expense of every set of responses for the same question being sampled non-equidistantly (nonuniformly). Because of this, special care must be taken when measuring variability when using this approach.

In this paper, we study and propose metrics which are suited for this kind of data. However, the findings of this research are not only useful for this specific type of design, as they are also applicable to any analysis in which a significant amount of missing data is produced, yielding a nonuniformly sampled time series. Missing data represent a particularly common issue in EMA studies and it is usually dealt with discarding data points that do not meet certain criteria, such as removing responses provided over 30 min after being prompted [22]. We expect that our findings will provide an accurate method of analyzing variability in non-equidistantly (nonuniformly) sampled time series, what will also allow to better cope with missing data in mental health time series.

The rest of the paper is organized as follows. Section 2 summarizes the main variability measures that have been used in the literature. As most of these measures have only been applied to equidistant, dense time series, in Section 3 we propose some modifications to those measures so that they can be applied to irregularly sampled data. In this section, we also present some novel methods of measuring variability that could also be advantageous for this task. Then, Section 4 analyzes these methods using synthetic data, while Section 5 makes use of real data to assess their robustness. To correctly interpret the results, Section 6 provides additional simulations. Finally, in Section 7 we summarize our findings, discuss their implications, and we point to next lines of research.

2. Review of Variability Assessment

In this section, we summarize the most important measures used for assessing variability in mental healthcare time series. One of the most popular measures is the root mean square of successive differences (RMSSD) [10], that is,

RMSSD = \sqrt{\frac{1}{T - 1} \sum_{t = 1}^{T - 1} {(x_{t + 1} - x_{t})}^{2}},

(1)

where T is the number of observations, and

x_{t}

is the time series value, for instance, the response of the patient recorded at (discrete) time t in the considered application. The square is taken so that increases and decreases are not mutually canceled. This measure assumes that changes in either direction count the same. Such an assumption is usually fulfilled in psychiatry research, as commonly used scales are sometimes measured so that higher scores indicate better health status, and sometimes so that higher scores indicate worse health status [23].

The RMSSD contains some information about the temporal evolution of the assessments, as it is only invariant to complete reversal and some pairwise random permutations of the observations. This measure, normalized by the mean or by the standard deviation of the sample, also known as coefficient of variation (CoV) and ratio of determination (RoD), respectively, has been explored in [24] to produce variability measurements less sensitive to the mean value of the observations. Finally, notice that the square root is taken to recover the natural units of the explored variable in the series, though some studies do not do it [25]. This choice is not important, since the square root is a monotone transformation: it is similar to working with standard deviations or variances.

Another relevant measure of variability is the standard deviation (SD) of the observations [26]. This measure has the advantage of providing an interpretable parametric meaning in the case that the observations are normally distributed. However, this method does not carry any information about the order of the data (any permutation yields the same SD), and this approach is not appropriate when the distribution of the data contains many outliers or it is highly nonsymmetric [27]. A related measure that has also been employed is the standard error of the mean (SEM) [28], which is given by the SD divided by the square root of the number of observations.

Although simple and nonlinear, the range of the observations, measured as the difference between the most extreme values, has also been used [29]. On the other hand, more refined measures, such as the entropy (H) or the Teager–Kaiser energy operator (TKEO) have also been explored [30]. The former comes from the Information Theory field and it is a measure of data uncertainty, but it is not sensitive to local variations. We shall estimate it as

H = - \sum_{\begin{matrix} {i = 1, . . ., N} \\ \ {i : \hat{p} (r_{i}) = 0} \end{matrix}} \hat{p} (r_{i}) \cdot {log}_{2} (\hat{p} (r_{i})),

(2)

where

R = {r_{1}, \dots, r_{N}}

is the set of N possible values in the time series and

\hat{p} (r_{t})

is the estimated probability of

r_{t}

, which is given by

\hat{p} (r_{t}) = \frac{# (r_{t} = {x_{t}}_{t = 1}^{T})}{T}

, with

# (\cdot)

being the count function. Notice that (2) is measured in bits, and the sum explicitly removes terms involving null probabilities, which otherwise would create numerical indeterminations. Indeed, it is judicious to do so, as one of the properties of the entropy is that incorporating or discarding events with zero probability does not alter its value. The TKEO is an approximation of the energy of a signal, which depends on its amplitude and frequency. As highly variable signals have high frequency components, larger TKEO values are associated with greater variability. This metric requires at least three data points to be computed, and it may produce negative values under some circumstances [31]. To avoid this, we will use the mean of the absolute value of the TKEO (MATKEO) as in [32], that is,

MATKEO = \frac{1}{T - 2} \sum_{t = 2}^{T - 1} |x_{t}^{2} - (x_{t - 1} \cdot x_{t + 1})| .

(3)

One final comment is in order. The variance terms of mixed-effect models have been used as an alternative approach to measure variability, either with raw [33] or with grand mean centered data [34]. However, we do not further analyze this model-based approach, as it is unfeasible to make fair comparisons of models tailored for specific settings, and these methods are somehow equivalent to computing the standard deviation of the data.

3. Variability Assessment of Irregularly Sampled Data

The variability measures presented above were not designed to cope with irregularly (nonuniformly) sampled time series. Therefore, in this section we propose some modifications to adapt them to such setting. First, notice that if (1) were directly applied to nonuniformly sampled data, its estimation would produce a non-desired result, as the same change in the scores, e.g., 20 units, would weight the same regardless of the time that it took such a change to happen, e.g., one day or one month. To correct this, we argue that the RMSSD in (1) can be understood as a finite-difference approximation of the square root of the mean of the squared first derivative, that is,

RMSSD = \sqrt{\frac{1}{T - 1} \sum_{t = 1}^{T - 1} {(x_{t + 1} - x_{t})}^{2}} \approx \sqrt{\frac{1}{t_{N} - t_{1}} \int_{t_{1}}^{t_{N}} {(\frac{d x_{t}}{d t})}^{2} d t},

(4)

where

t_{1}

and

t_{N}

are the times at which the first and the last observations are taken, respectively. Based on this observation, more refined approximations of the integral can be used, for instance,

\sqrt{\frac{1}{t_{N} - t_{1}} \int_{t_{1}}^{t_{N}} {(\frac{d x_{t}}{d t})}^{2} d t} \approx \sqrt{\frac{1}{t_{N} - t_{1}} \sum_{t = 1}^{T - 1} {(\frac{x_{t + 1} - x_{t}}{h_{t}})}^{2} h_{t}} = \sqrt{\frac{1}{t_{N} - t_{1}} \sum_{t = 1}^{T - 1} \frac{{(x_{t + 1} - x_{t})}^{2}}{h_{t}}},

(5)

which is a finite-difference approximation with (possibly) non-homogeneous step

h_{t}

between

x_{t + 1}

and

x_{t}

. Moreover, (5) reduces to (1) when

h_{t} = 1, \forall t = 1, \dots, T - 1

, in which case

\frac{t_{N} - t_{1}}{T - 1} = 1

as well. This measure, which we call the root mean square of successive slopes (RMSSS), is expressed in units of change in the data per unit of time. Building on this interpretation, we can propose measures that take into account the sampling interval, i.e., based on slopes. Specifically, we propose to work with the square root of the squared slopes of successive samples, that is, the absolute value. Then, we can use the measures presented in the previous section and apply them to absolute-value slopes (therefore, we shall append the subscript “as” to the abbreviations of those measures when there is risk of confusion) instead of the raw observations (the acronyms of which will end by the subscript “raw”).

We introduce other measures for summarizing the time series into a scalar, which have been useful in other fields. In particular, we shall explore the potential of (1) the interquartile range (IQR) [35], that is, the value of the 75th percentile minus that of the 25th one. (2) The Gini’s mean difference (GMD) [36], namely, the expected value of the absolute difference of every possible pair of the observed values,

GMD = \frac{1}{T^{2}} \sum_{t = 1}^{T} \sum_{t^{'} = 1}^{T} | x_{t} - x_{t^{'}} | .

(6)

Moreover, (3) the median absolute deviation (MAD) [37], given by

MAD = median ({\{| x_{t} - \tilde{x} |\}}_{t = 1}^{T}),

(7)

where

median (\cdot)

is the median operator and

\tilde{x} = median ({x_{t}}_{t = 1}^{T})

.

Notice that for clarity of presentation, (6) and (7) compute the corresponding measures for the raw scores, as the previous equations, but the variability measures can easily be assessed in terms of the absolute value of the slopes. For instance, the GMD of the absolute value of the slopes is given by

\frac{1}{{(T - 1)}^{2}} \sum_{t = 1}^{T - 1} \sum_{t^{'} = 1}^{T - 1} | z_{t} - z_{t^{'}} |,

(8)

where

z_{t} = | \frac{x_{t + 1} - x_{t}}{h_{t}} |

. For an schematic view of all the measures analyzed in this work, refer to Figure 1.

One final comment is in order. Overall, the complexity of the studied variability measures is comparable, since most measures are

O (T)

computationally complex, where

O (\cdot)

denotes the big O notation [38]. All the metrics based on the absolute value of the slopes requires the additional computation of the absolute value of the successive slopes, but this is an inexpensive operation, which requires a number of operations which increases linearly with the number of observations. On the other hand, the (exact) sorting of the observed values used to compute some metrics, like IQR, is

O (T log T)

computationally complex [39]. Further, normalized metrics (i.e., SEM, CoV, and RoD) need to iterate twice the time series, but this can be done parallel. Finally, notice that the complexity of computing H_raw also increases linearly as the number of possible values the time series can take on, N, is increased. The only metric that could be computationally more demanding is GMD, which has quadratic complexity in the number of observations, as can be seen in (6). However, for time series with few to moderate observations, the differences in time complexity are negligible in practice.

4. Comparison on Synthetic Data

In this section, we study several variability measures using synthetic data. To the best of our knowledge, there is no previous work that has simulated responses to mental health questionnaires to assess it. Therefore, we describe a possible way to achieve this, which is supported by the item response theory (IRT). The IRT states that the answers that a person provides to a questionnaire can be modeled by a transformation of a given latent variable [40], which we shall denote by

Ψ

. Without loss of generality, let

Ψ \in {ψ \in R : 0 \leq ψ \leq 100}

. For instance, if

Ψ

measures the participants’ mood, 100 may correspond with an objective, saturated feeling of elation, while 0 would be the worst possible feeling of sorrow and any value in between them represents an (objective) feeling proportional to the distance to the extremes. In our experiments, we set the possible responses to

R = {x \in Z : 0 \leq x \leq 100}

. In doing so, we account for the measurement error given by the precision of the scale. However, it is important to model other sources of noise, such as respondent inconsistency [41]. Indeed, it would be very unlikely that a participant always reports the same value in the questionnaire for the same value of

Ψ

. Furthermore, different participants, all sharing the same latent value of

Ψ

, may provide higher or lower values in the questionnaire if they tend to be optimistic or pessimistic, respectively.

Defining t as time in days, we exemplify three different changes for the latent state over 60 days as follows,

Ψ_{t} = \frac{100}{60} t

(9)

Ψ_{t} = 20 cos (\frac{2 π t}{7}) + 50

(10)

Ψ_{t} = 40 cos (\frac{2 π t}{7}) + 50 .

(11)

Thus, (9) represents a slow linear drift towards the maximum value, while (10) and (11) model oscillations around the medium value with a periodicity of one week. Notice that (11) has double the amplitude of (10). These are prototypic types of behaviors that were found to be good examples of other more general cases which were analyzed but are not shown here for simplicity. Those other more general cases include a cosine with increasing amplitude or abrupt ground levels changes as time goes on, and a cubic function. In all those cases we arrived to the same conclusions.

Another limitation that should be considered is the temporal resolution. In our experiments, we restrict the observations to a maximum of one per day. Consequently, the first step is to obtain the days when the data will be acquired by sampling uniformly without replacement the set of 60 possible days. Then, using Equation (9), (10), or (11), the true value of the latent state is obtained. To account for the respondent inconsistency, we model the answer the participants would be willing to provide as a random realization of a normal distribution with mean

Ψ_{t}

and standard deviation

S D_{n}

, as normal distributions are well-suited for modeling responses to Likert questionnaires [42]. Finally, to account for the error produced by the precision of the scale, the observed data will be projected onto the set

R

.

In order to objectively assess the variability measures, we follow an approach similar to that of [43]. That is, we evaluate the measures in a favorable scenario, and then we obtain the performance degradation by computing the Cohen’s d between the observed values in the favorable scenario and those observed in unfavorable scenarios. In particular, we compute it as

Cohen ’ s d = \frac{{\bar{x}}_{r e f} - {\bar{x}}_{c o m p}}{\sqrt{\frac{(N_{r e f} - 1) \times {(S D_{r e f})}^{2} + (N_{c o m p} - 1) \times {(S D_{c o m p})}^{2}}{N_{r e f} + N_{c o m p} - 2}}},

(12)

where

{\bar{x}}_{r e f}

is the mean in the reference set, in other words, the mean of a given variability measure in the favorable scenario. Moreover,

{\bar{x}}_{c o m p}

represents the mean of the comparison set, i.e., the mean of the same variability measure in the unfavorable scenario, and

N_{r e f}

(conversely

N_{c o m p}

) is the number of observations in the reference (conversely, comparison) dataset. Finally,

S D_{r e f}

(conversely

S D_{c o m p}

) stands for the standard deviation in the reference (conversely, comparison) dataset.

We shall complicate the problem by reducing the number of observations and increasing the variance of the Gaussian function for the same

Ψ_{t}

. In the favorable scenario, we observe the 60 possible answers with very low respondent inconsistency, i.e.,

S D_{n} = 0.5

. Figure 2 shows one realization under such favorable settings for the three explored changes, and the mean values of the variability measures over 4000 realizations are presented in Figure 3a, Figure 4a and Figure 5a. Notice that in these figures, to facilitate the interpretation, the same colormap has been used, which spans the range of all the data in the three figures. Finally, the performance degradation is depicted in Figure 3b, Figure 4b and Figure 5b, also using a single colormap, which spans the full range of the data in these three new figures. To further ease the interpretation, Figure 3b, Figure 4b and Figure 5b show the absolute value of the Cohen’s d, as it is of little relevance whether the measure underestimates or overestimates the variability in the complicated scenario: for the robustness analysis the importance lies in the magnitude of the discrepancy. As previously explained, each row in these figures corresponds to a different value of

S D_{n}

and number of observations,

N_{c o m p}

, where this reduced number of observations is obtained by randomly sampling the 60 days without replacement.

As can be seen in Figure 3a, Figure 4a and Figure 5a, some measures are insensitive to the kind of change that

Ψ

describes. For instance, RoD_raw and H_raw provide very similar values for the two cosine functions, and COV_as and RoD_as cannot distinguish any of the three profiles. On the other hand, MAD_raw and GMD_raw are difficult to interpret, as they yield higher variability values for the slow linear drift than the periodic function, while the vast majority of the remaining measures experiment a nearly twofold increase when the amplitude of the cosine is multiplied by 2. Another issue regarding interpretability is that RoD_raw, RoD_as, and CoV_as are not well-defined when the data do not vary at all, because their computation would involve a division of 0 by 0 under such a situation. Similarly, CoV_raw is undetermined if all observed values are equal to zero due to the same problem. This was not observed during the simulations, as

Ψ

was studied under three different varying conditions and there was also noise.

With respect to the degradation in the performance as the samples become more scarce and less reliable, Figure 3b shows that measures based on the raw value of the points of the time series struggle in the simple linear drift: SEM_raw is the one with the highest absolute value of Cohen’s d for most combinations of

S D_{n}

and

N_{c o m p}

, followed by RoD_raw, RMSSD and CoV_raw (recall that a magnitude of Cohen’s d greater than

0.8

is considered a large difference [44]). In general, the rest of the variability measures based on the raw scores perform worse than those based on the absolute value of the slopes, with the notable exceptions of GMD_raw, which performs exceptionally well for low values of added noise, and SD_raw and IQR_raw, which offer competitive robustness. The rest of the measures based on the slopes, except for SEM_as, MATKEO_as, IQR_as, SD_as, and RMSSS, are robust to both added noise and missing observations. Figure 4b reveals that the worst performing measure is RoD_raw, followed by SEM_raw, when

Ψ

describes a cosine with amplitude 20. When we compare Figure 3b and Figure 4b we see that even if most of the raw-based variability measures perform better with the cosine profile than in the slow drift case (especially SEM_raw for low values of noise and missing observations) they are still surpassed by slope-based ones, even if RMSSS, CoV_as, RoD_as, and SEM_as are less reliable with the periodic profile than in the linear case. When analyzing the effect of doubling the amplitude of the cosine, i.e., comparing Figure 4b and Figure 5b, we observe that RMSSS, SEM_raw, and RMSSD are the measures whose performance is more affected.

When analyzing Figure 3b, Figure 4b and Figure 5b as a whole, we see how badly SEM_raw, RoD_raw, RMSSD, CoV_raw, and MATKEO_raw systematically perform in all the cases. RMSSD is one of the less robust measures for nonuniformly sampled observations, in spite of its wide applicability with uniform sampling. On the other hand, when looking at the most reliable ways of measuring variability, we notice that CoV_as and RoD_as offer averagely robustness for the linear case, although they fail to be robust enough for the cosine changes. Therefore, they should only be used if linear changes are expected. Similarly, IQR_as, H_raw, GMD_as, and SD_as only behave reliably if the underlying change is periodic. When the sign of the the Cohen’s d value is considered (recall that Figure 3b, Figure 4b and Figure 5b show its absolute value) it is observed that the measures tend to underestimate variation as the comparison set is distorted by removing observations and adding noise. When comparing the first, sixth, and eleventh rows of these figures, we can analyze the performance with the same number of observations (i.e., 25) but increasing noise levels. The raw-based variability measures are the ones more affected by noise, notably RMSSD in the linear case. In general, noise in the responses causes more distortion than missing observations, what suggests that a small, yet reliable dataset can be more informative than a larger, noisier one. Finally, only SD_raw, and both range_raw, MAD_raw and range_as and MAD_as are consistently subject to small degradations in the three cases, but MAD_as is the most stable one for all the cases. In light of these findings, we conclude that the MAD_as is the most reasonable measure to assess variability in irregularly sampled time series.

5. Comparison on Real Data

The previous section has shown the performance of the measures under three different synthetic scenarios. As the underlying data-generating process was known, it was possible to objectively characterize the robustness of each measure by computing the Cohen’s d of the variability scores obtained from the complete time series used as a reference and a sampled version thereof. To complement that analysis, we now perform a similar experiment upon real data. However, the best reference that can be used with real data, namely, all the available responses of a participant, is already a sample of the complete time series. Therefore, we compute the variability measures for all the responses of all participants and use them as the reference sets. Then, up to 1000 degenerated time series are obtained by removing a given percentage of the observations in the reference sequence selecting the observations to discard uniformly without replacement at random, and the measures are computed thereon to obtain the comparison set. Finally, the Cohen’s d is computed between the sets in order to assess the robustness of each variability measure.

The dataset contains the responses from the 198 participants who rated at least four times (so that the most restrictive metric, MATKEO_as, can be computed) each of the following items: (a) “Today I feel the wish to live” and (b) “Today I feel tired during the day because of my sleep problems.” For rating the items, participants used a slider with integer precision on scale that goes from 0 to 100. The mean ±SD sequence length of the above-mentioned items among all participants are

16.79 \pm 11.00

and

12.01 \pm 8.01

, respectively. The mean distance ±SD among subjects is

7.85 \pm 9.66

and

11.17 \pm 12.59

days, respectively. The results are shown in Figure 6, where the absolute value of the Cohen’s d has been taken so that the results can be more easily shown on a semi-logarithmic plot. The sample is composed of patients with a history of suicidal thoughts or behaviors, and all of them gave their informed consent.

Overall, both Figure 6a,b exhibit a similar behavior, suggesting that the ensuing conclusions are not limited to a specific kind of questions, as the same pattern is observed for different topics (wish to live and sleep problems). Indeed, taking into account the error bars, there is not a significant loss in performance as more data are ignored. Other peculiarity of Figure 6 is that the Cohen’s d values are significantly smaller than those of Figure 3b, Figure 4b and Figure 5b. Furthermore, according to Figure 6, GMD_raw is the most reliable measure, clearly outperforming the rest. Equation (6) shows that such a variability measure is very reliable to simple loss of observations (without adding noise). Indeed, the synthetic data experiments also showcased very small absolute values of Cohen’s d for low levels of added noise. Finally, when analyzing in more detail Figure 6, we see that SD_raw, MATKEO_raw CoV_raw, and RMSSD and SEM_as behave robustly.

6. Additional Synthetic Data Experiment

The two previous sections have shown that there is a disagreement between the results of the synthetic data experiments and those of the real data. On one hand, synthetic experiments show that the assessment of variability becomes harder as more data are lost, and that one of the ways of obtaining robust measures is to make use of the distance among the observations by considering the absolute value of the slopes of the consecutive data points in the time series. On the other hand, results with real data depict optimistic reliability for every measure of variability, regardless of the percentage of missing observations, and they favor simple measures that are based on raw values. This discrepancy warrants an additional synthetic data analysis to ascertain if the fact that the reference used in the real data experiments is itself a sample of the complete time series can bias the results. For this purpose, let us consider the following experiment. First, the given set of days that will be used as a reference out of the 60 possible days is sampled according to a Hawkes process whose (conditional) intensity function is

λ^{*} (t) = μ + α \sum_{t_{i} < t} e^{- (t - t_{i})} .

(13)

The intensity function measures the number of times an observation is acquired per unit of time [45]. In (13),

μ

is a parameter expressing the basal intensity and

α

controls the increase of the probability of observing additional data points after the observations have been acquired. That is,

α

determines the burstiness of the locations of the data points in the sequence: low values of

α

produce more evenly spaced time series, while high values yield points that tend to follow each other forming clusters. In this way, it is also possible to study if the relative location of the points used in the reference sequence has any influence on the results. We take this cluster-producing sampling, and the simplest way of modeling it, i.e., the Hawkes process, as a relevant example of possible kinds of structures in the location of the data points in the reference set that could induce a bias.

Second, (11) is used to simulate the time series on the sampled instances obtained using the Hawkes process in (13), yielding

N_{r e f}

observations from which the variability measures will be evaluated, thereby obtaining the reference set. Third, 70% of those

N_{r e f}

observations are removed by means of uniform sampling without replacement, and the variability measures are computed in order to get the comparison set. Last, the process is repeated, and the Cohen’s d is obtained. Figure 7 shows those values of the Cohen’s d (keeping the sign) for 4000 repetitions and different values of

μ

and

α

, which, in turn, determines the mean number of observations in the reference set, explicitly denoted by “mean(

N_{r e f}

)”, for clarity. As can be seen, the magnitude of the Cohen’s d in Figure 7 lies within the same range of those of the real data analysis (Figure 6), which are significantly smaller than those of the first synthetic data experiments (Figure 3b, Figure 4b and Figure 5b). Therefore, the low density of observations in the reference set causes a bias towards low values of Cohen’s d. This stems from the fact that the lower the number of observations for the same time series, the lower are the chances of observing changes, in such a way that the comparison set becomes more and more similar to the reference one. In fact, the large majority of the variability measures attain the best results for mean sample densities as low as

(16.5 \times 100) / 60 = 27.5

%, as depicted in the bottom row of Figure 7.

However, such a bias does not equally affect all the variability measures. Indeed, the value of Cohen’s d of the measures based on the raw answers decreases faster to 0 than those based on the absolute value of the slopes, as the reference time series has fewer observations, particularly for simpler methods such as RMSSD, CoV_raw, RoD_raw, and SEM_raw. Nonetheless, MAD_as, GMA_as, IQR_as, and SE_as, all of which performed averagely to poorly in the real data experiments, have a slight bias that increases the value of the Cohen’s d as the reference losses observations. This can be explained by the fact that the absolute value of the slopes lies on a larger range of values than that of the raw observations, so the measures based on the absolute slopes can still capture some variability even when the reference observations are scant. Therefore, when the observations are removed in the comparison set, part of such variability is lost, and this discrepancy is shown in the Cohen’s d between the two sets. On the other hand, normalization contributes to increase the bias towards low Cohen’s d values. This explains why SEM_as, RoD_as, and Co_as, which are based on the absolute slopes, suffer from it. In addition, SEM_raw, RoD_raw, and Co_raw are the measures that decrease faster to zero as the reference becomes lighter. Therefore, the division by an already biased normalizing factor, such as the SD, amplifies the bias in variability measures.

With regard to the possible influence of the relative location of the observations used as a reference, Figure 7 allows us to compare different combinations of

μ

and

α

, which have roughly the same number of mean observations in the reference set. In particular, it is possible to compare: (1) the 2nd and 5th rows to observe an increase from

α = 0.5

to

α = 0.7

with

mean (N_{r e f}) \approx 45

; (2) the 8th and 10th rows to see what happens when the burstiness is increased from

α = 0.1

to

α = 0.5

with

mean (N_{r e f}) \approx 32

; and (3) the 11th and 13th rows to study the case wherein

α = 0.3

and

α = 0.7

with

mean (N_{r e f}) \approx 28

. When the differences among those rows is inspected, the values are very low and no consistent pattern is found, suggesting the density of the observations used as a benchmark is what determines the bias induced when the reference time series is incomplete.

In summary, this additional synthetic data experiment shows that using incomplete time series as references induces non-negligible bias, reducing the differences between the reference and comparison sets for simple variability measures based on raw observations. Therefore, the real data results from Section 5, which have very low density of observations in the benchmark (<15%), should not be considered for analyzing the robustness of the variability measures. Instead, the synthetic data experiments from Section 4, which have access to the complete sequence, are the ones that should be taken into account.

7. Conclusions and Future Work

In this paper, we have summarized the main variability measures used in the literature, with special emphasis in mental healthcare time series. Datasets in this kind of studies have a strong tendency to be nonuniformly sampled regardless of the experimental design; even studies that aim for data uniformity end up being nonuniform due to the large amount of missing data. We have explored the ability of different commonly used variability measures to address this issue, and we have proposed and studied the potential of other variability indicators not explored yet to cope with this inimical situation. We compared a total 21 candidates, exploring their robustness and interpretability using synthetic data. Thanks to the known profiles used in the synthetic data, some variability measures, such as MAD_raw and GMD_raw, have been identified as difficult to interpret, as they provided higher values for profiles that intuitively are less variable. On the other hand, these experiments identified measures based on the absolute value of the slopes of consecutive observations as the most robust ones. In particular, MAD_as was the best candidate.

However, we reached different conclusions in the experiments performed upon real data, as they showed that the most robust measures are the ones that are normalized and/or based on the raw values, such as RMSSD. To shed some light on this, an additional synthetic data experiment showed that low number of observations used in a benchmark (such as the one of the real data analysis) induces a non-negligible bias that favors the variability measures that are normalized and/or based on the raw value of the series. Therefore, only the results of the real data experiments should be used to determine which is the most reliable measure, as the full time series is used as a reference. We can therefore conclude that the best way to assess variability is computing the MAD_as, namely, the median absolute deviation of the absolute value of the successive slopes of the data in the time series. Finally, it would be instructive to conduct additional analyses on real and dense time series, as data density would decrease the risk of bias.

Author Contributions

Conceptualization, P.B.-E., D.R., and A.A.-R.; methodology, P.B.-E.; software, P.B.-E.; validation, D.R. and A.A.-R.; formal analysis, P.B.-E., D.R. and A.A.-R.; investigation, P.B.-E.; resources, A.A.-R.; data curation, P.B.-E.; writing—original draft preparation, P.B.-E.; writing—review and editing, P.B.-E., D.R., and A.P.-S.; visualization, P.B.-E.; supervision, D.R. and A.A.-R.; project administration, A.A.-R.; funding acquisition, A.A.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministerio de Ciencia, Innovación y Universidades under grant TEC2017-92552-EXP (aMBITION), by the Ministerio de Ciencia, Innovación y Universidades, jointly with the European Commission (ERDF), under grants TEC2017-86921-C2-2-R (CAIMAN) and RTI2018-099655-BI00 (CLARA), by the Comunidad de Madrid under grant Y2018/TCS-4705 (PRACTICO-CM), and by Fundación BBVA under project Deep-DARWIN.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of University Hospital Fundación Jiménez Díaz on the 25 June 2017 under IRB approval LSRG-1-005-16.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are not publicly available due to ethical and privacy restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

(·)_as	(subscript indicating) computed upon the absolute value of the slopes
(·)_n	(subscript indicating) of the noise
(·)_raw	(subscript indicating) computed upon the (raw) scores
CoV	Coefficient of variation
EMA	Ecological momentary assessment
GMD	Gini’s mean difference
H	Entropy
IQR	Interquartile range
IRT	Item response theory
MAD	Median absolute deviation
MATKEO	Mean of the absolute value of the TKEO
RMSSD	Root mean square of successive differences
RMSSS	Root mean square of successive slopes
RoD	Ratio of determination
SD	Standard deviation
SEM	Standard error of the mean
TKEO	Teager–Kaiser energy operator

References

Takata, T.; Mukuta, Y.; Mizumoto, Y. Modeling the variability of active galactic nuclei by an infinite mixture of Ornstein–Uhlenbeck (OU) processes. Astrophys. J. 2018, 869, 178. [Google Scholar] [CrossRef] [Green Version]
Heil, L.; Uttley, P.; Klein-Wolt, M. Power colours: Simple X-ray binary variability comparison. Mon. Not. R. Astron. Soc. 2015, 448, 3339–3347. [Google Scholar] [CrossRef] [Green Version]
Bürger, G.; Chen, Y. Regression-based downscaling of spatial variability for hydrologic applications. J. Hydrol. 2005, 311, 299–317. [Google Scholar] [CrossRef]
Armand, R.; Bockstaller, C.; Auzet, A.V.; Van Dijk, P. Runoff generation related to intra-field soil surface characteristics variability: Application to conservation tillage context. Soil Tillage Res. 2009, 102, 27–37. [Google Scholar] [CrossRef]
Köhnke, M.C.; Malchow, H. Impact of parameter variability and environmental noise on the Klausmeier model of vegetation pattern formation. Mathematics 2017, 5, 69. [Google Scholar] [CrossRef] [Green Version]
Morris, A.; Börger, L.; Crooks, E. Individual variability in dispersal and invasion speed. Mathematics 2019, 7, 795. [Google Scholar] [CrossRef] [Green Version]
Jiang, W.; Makhlouf, F.; Schuirmann, D.J.; Zhang, X.; Zheng, N.; Conner, D.; Lawrence, X.Y.; Lionberger, R. A bioequivalence approach for generic narrow therapeutic index drugs: Evaluation of the reference-scaled approach and variability comparison criterion. AAPS J. 2015, 17, 891–901. [Google Scholar] [CrossRef]
Silva-Aravena, F.; Ceballos-Fuentealba, I.; Álvarez-Miranda, E. Inventory management at a Chilean hospital pharmacy: Case study of a dynamic decision-aid tool. Mathematics 2020, 8, 1962. [Google Scholar] [CrossRef]
Kim, H.G.; Cheon, E.J.; Bai, D.S.; Lee, Y.H.; Koo, B.H. Stress and heart rate variability: A meta-analysis and review of the literature. Psychiatry Investig. 2018, 15, 235. [Google Scholar] [CrossRef] [Green Version]
Oquendo, M.A.; Galfalvy, H.C.; Choo, T.H.; Kandlur, R.; Burke, A.K.; Sublette, M.E.; Miller, J.M.; Mann, J.J.; Stanley, B.H. Highly variable suicidal ideation: A phenotypic marker for stress induced suicide risk. Mol. Psychiatry 2020, 1–8. [Google Scholar] [CrossRef]
Hajiramezanali, E.; Imani, M.; Braga-Neto, U.; Qian, X.; Dougherty, E.R. Scalable optimal Bayesian classification of single-cell trajectories under regulatory model uncertainty. BMC Genom. 2019, 20, 1–11. [Google Scholar] [CrossRef] [Green Version]
Gore, E.; Rathi, S. Surveying machine learning algorithms on EEG signals data for mental health assessment. In Proceedings of the 2019 IEEE Pune Section International Conference (PuneCon), Pune, India, 18–20 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
Wiens, J.; Saria, S.; Sendak, M.; Ghassemi, M.; Liu, V.X.; Doshi-Velez, F.; Jung, K.; Heller, K.; Kale, D.; Saeed, M.; et al. Do no harm: A roadmap for responsible machine learning for health care. Nat. Med. 2019, 25, 1337–1340. [Google Scholar] [CrossRef]
Siegel, A.F. Variability: Dealing with diversity. In Practical Business Statistics; Academic Press: Cambridge, MA, USA, 2012; pp. 95–121, Chapter 5. [Google Scholar]
Peña, D.; Linde, A. Dimensionless measures of variability and dependence for multivariate continuous distributions. Commun. Stat. Theory Methods 2007, 36, 1845–1854. [Google Scholar] [CrossRef]
Teoh, W.L.; Khoo, M.B.; Castagliola, P.; Yeong, W.C.; Teh, S.Y. Run-sum control charts for monitoring the coefficient of variation. Eur. J. Oper. Res. 2017, 257, 144–158. [Google Scholar] [CrossRef]
Boschloo, L.; Nolen, W.A.; Spijker, A.T.; Hoencamp, E.; Kupka, R.; Penninx, B.W.; Schoevers, R.A. The Mood Disorder Questionnaire (MDQ) for detecting (hypo) manic episodes: Its validity and impact of recall bias. J. Affect. Disord. 2013, 151, 203–208. [Google Scholar] [CrossRef]
Davidson, C.L.; Anestis, M.D.; Gutierrez, P.M. Ecological momentary assessment is a neglected methodology in suicidology. Arch. Suicide Res. 2017, 21, 1–11. [Google Scholar] [CrossRef]
Kim, J.; Nakamura, T.; Kikuchi, H.; Sasaki, T.; Yamamoto, Y. Co-variation of depressive mood and locomotor dynamics evaluated by ecological momentary assessment in healthy humans. PLoS ONE 2013, 8, e74979. [Google Scholar] [CrossRef] [Green Version]
Moitra, E.; Gaudiano, B.A.; Davis, C.H.; Ben-Zeev, D. Feasibility and acceptability of post-hospitalization ecological momentary assessment in patients with psychotic-spectrum disorders. Compr. Psychiatry 2017, 74, 204–213. [Google Scholar] [CrossRef] [Green Version]
Porras-Segovia, A.; Molina-Madueño, R.M.; Berrouiguet, S.; López-Castroman, J.; Barrigón, M.L.; Pérez-Rodríguez, M.S.; Marco, J.H.; Díaz-Oliván, I.; de León, S.; Courtet, P.; et al. Smartphone-based ecological momentary assessment (EMA) in psychiatric patients and student controls: A real-world feasibility study. J. Affect. Disord. 2020, 274, 733–741. [Google Scholar] [CrossRef]
Timmer, B.H.; Hickson, L.; Launer, S. Ecological momentary assessment: Feasibility, construct validity, and future applications. Am. J. Audiol. 2017, 26, 436–442. [Google Scholar] [CrossRef]
Hartley, J. Some thoughts on Likert-type scales. Int. J. Clin. Health Psychol. 2014, 14, 83–86. [Google Scholar] [CrossRef] [Green Version]
Choo, T.H.; Oquendo, M.A.; Stanley, B.; Galfalvy, H. RMSSD-based variability measures for suicidal ideation from EMA data. Biol. Psychiatry 2020, 87, S214. [Google Scholar] [CrossRef]
Babinski, D.E.; Welkie, J. Feasibility of ecological momentary assessment of negative emotion in girls with ADHD: A pilot study. Psychol. Rep. 2020, 123, 1027–1043. [Google Scholar] [CrossRef]
Shiyko, M.P.; Siembor, B.; Greene, P.B.; Smyth, J.; Burkhalter, J.E. Intra-individual study of mindfulness: Ecological momentary perspective in post-surgical lung cancer patients. J. Behav. Med. 2019, 42, 102–110. [Google Scholar] [CrossRef]
Schindler, T.M. Variability, range, interquartile range, and standard deviation. Am. Med. Writ. Assoc. AMWA J. 2015, 30, 132. [Google Scholar]
Bowen, R.; Baetz, M.; Hawkes, J.; Bowen, A. Mood variability in anxiety disorders. J. Affect. Disord. 2006, 91, 165–170. [Google Scholar] [CrossRef]
Black, A.C.; Cooney, N.L.; Justice, A.C.; Fiellin, L.E.; Pietrzak, R.H.; Lazar, C.M.; Rosen, M.I. Momentary assessment of PTSD symptoms and sexual risk behavior in male OEF/OIF/OND Veterans. J. Affect. Disord. 2016, 190, 424–428. [Google Scholar] [CrossRef] [Green Version]
Tsanas, A.; Saunders, K.; Bilderbeck, A.; Palmius, N.; Osipov, M.; Clifford, G.; Goodwin, G.; De Vos, M. Daily longitudinal self-monitoring of mood variability in bipolar disorder and borderline personality disorder. J. Affect. Disord. 2016, 205, 225–233. [Google Scholar] [CrossRef] [Green Version]
Bovik, A.C.; Maragos, P. Conditions for positivity of an energy operator. IEEE Trans. Signal Process. 1994, 42, 469–471. [Google Scholar] [CrossRef]
Jabloun, F.; Cetin, A.E.; Erzin, E. Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Process. Lett. 1999, 6, 259–261. [Google Scholar] [CrossRef] [Green Version]
Hedeker, D.; Mermelstein, R.J.; Berbaum, M.L.; Campbell, R.T. Modeling mood variation associated with smoking: An application of a heterogeneous mixed-effects model for analysis of ecological momentary assessment (EMA) data. Addiction 2009, 104, 297–307. [Google Scholar] [CrossRef] [Green Version]
Cox, R.C.; Sterba, S.K.; Cole, D.A.; Upender, R.P.; Olatunji, B.O. Time of day effects on the relationship between daily sleep and anxiety: An ecological momentary assessment approach. Behav. Res. Ther. 2018, 111, 44–51. [Google Scholar] [CrossRef]
Forejt, M.; Brázdová, Z.D.; Novák, J.; Zlámal, F.; Forbelská, M.; Bienert, P.; Mořkovská, P.; Zavřelová, M.; Pohořalá, A.; Jurášková, M.; et al. Higher energy intake variability as predisposition to obesity: Novel approach using interquartile range. Cent. Eur. J. Public Health 2017, 25, 321–325. [Google Scholar] [CrossRef]
Yitzhaki, S. Gini’s mean difference: A superior measure of variability for non-normal distributions. Metron 2003, 61, 285–316. [Google Scholar]
Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 2013, 49, 764–766. [Google Scholar] [CrossRef] [Green Version]
Knuth, D.E. The Art of Computer Programming: Fundamental Algorithms, 3rd ed.; Addison Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1997; Volume 1. [Google Scholar]
Chen, Z.; Zhang, A. A survey of approximate quantile computation on large-scale data. IEEE Access 2020, 8, 34585–34597. [Google Scholar] [CrossRef]
Allik, J. A mixed-binomial model for Likert-type personality measures. Front. Psychol. 2014, 5, 371. [Google Scholar] [CrossRef] [Green Version]
Biderman, M.D.; Reddock, C.M. The relationship of scale reliability and validity to respondent inconsistency. Personal. Individ. Differ. 2012, 52, 647–651. [Google Scholar] [CrossRef]
DeWees, T.A.; Mazza, G.L.; Golafshar, M.A.; Dueck, A.C. Investigation into the effects of using normal distribution theory methodology for Likert scale patient-reported outcome data from varying underlying distributions including floor/ceiling effects. Value Health 2020, 23, 625–631. [Google Scholar] [CrossRef]
Munoz, M.L.; van Roon, A.; Riese, H.; Thio, C.; Oostenbroek, E.; Westrik, I.; de Geus, E.J.C.; Gansevoort, R.; Lefrandt, J.; Nolte, I.M.; et al. Validity of (ultra-) short recordings for heart rate variability measurements. PLoS ONE 2015, 10, e0138921. [Google Scholar] [CrossRef] [Green Version]
Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Academic Press: Cambridge, MA, USA, 2013. [Google Scholar]
Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods; Springer: New York, NY, USA, 2003. [Google Scholar]

Figure 1. Block diagram of the analyzed variability measures. The ellipsis is used to prevent repetition of the blocks for the metrics based on the absolute value of the successive slopes. The meaning of all quantities can be found in the table of abbreviations.

Figure 2. One random realization in the favorable scenario. (a) Linear function. (b) Cosine function with amplitude 20. (c) Cosine function with amplitude 40.

Figure 3. Results for 4000 realizations with a linear function. The meaning of all quantities can be found in the table of abbreviations. (a) Mean values in the favorable scenario. (b) Performance degradation.

Figure 4. Results for 4000 realizations with a cosine function with amplitude 20. The meaning of all quantities can be found in the table of abbreviations. (a) Mean values in the favorable scenario. (b) Performance degradation.

Figure 5. Results for 4000 realizations with a cosine function with amplitude 40. The meaning of all quantities can be found in the table of abbreviations. (a) Mean values in a favorable scenario. (b) Performance degradation.

Figure 6. Performance degradation in real data for questions: (a) “Today I feel the wish to live.” (b) “Today I feel tired during the day because of my sleep problems.”

Figure 7. Results for 4000 reference subsequences obtained sampling observations with a Hawkes process of a cosine of amplitude 40, and then removing 70% of the observations. The colormap spans the full range. The meaning of all quantities can be found in the table of abbreviations.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bonilla-Escribano, P.; Ramírez, D.; Porras-Segovia, A.; Artés-Rodríguez, A. Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare. Mathematics 2021, 9, 71. https://doi.org/10.3390/math9010071

AMA Style

Bonilla-Escribano P, Ramírez D, Porras-Segovia A, Artés-Rodríguez A. Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare. Mathematics. 2021; 9(1):71. https://doi.org/10.3390/math9010071

Chicago/Turabian Style

Bonilla-Escribano, Pablo, David Ramírez, Alejandro Porras-Segovia, and Antonio Artés-Rodríguez. 2021. "Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare" Mathematics 9, no. 1: 71. https://doi.org/10.3390/math9010071

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare

Abstract

1. Introduction

2. Review of Variability Assessment

3. Variability Assessment of Irregularly Sampled Data

4. Comparison on Synthetic Data

5. Comparison on Real Data

6. Additional Synthetic Data Experiment

7. Conclusions and Future Work

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI