A Three-Stage Nonparametric Kernel-Based Time Series Model Based on Fuzzy Data

Hesamian, Gholamreza; Johannssen, Arne; Chukhrova, Nataliya

doi:10.3390/math11132800


by Gholamreza Hesamian ¹, Arne Johannssen ²,* and Nataliya Chukhrova ³

¹ Department of Statistics, Payame Noor University, Tehran 19395-3697, Iran
² Faculty of Business Administration, University of Hamburg, 20146 Hamburg, Germany
³ HafenCity University of Hamburg, 20457 Hamburg, Germany
* Author to whom correspondence should be addressed.

Mathematics 2023, 11(13), 2800; https://doi.org/10.3390/math11132800

Submission received: 27 May 2023 / Revised: 13 June 2023 / Accepted: 19 June 2023 / Published: 21 June 2023

(This article belongs to the Special Issue Mathematical Data Science with Applications in Business, Industry, and Medicine)


Abstract: In this paper, a nonlinear time series model is developed for the case when the underlying time series data are reported by $LR$ fuzzy numbers. To this end, we present a three-stage nonparametric kernel-based estimation procedure for the center as well as the left and right spreads of the unknown nonlinear fuzzy smooth function. In each stage, the nonparametric Nadaraya–Watson estimator is used to evaluate the center and the spreads of the fuzzy smooth function. A hybrid algorithm is proposed to estimate the unknown optimal bandwidths and autoregressive order simultaneously. Various goodness-of-fit measures are utilized for performance assessment of the fuzzy nonlinear kernel-based time series model and for comparative analysis. The practical applicability and superiority of the novel approach in comparison with further fuzzy time series models are demonstrated via a simulation study and some real-life applications.

Keywords: fuzzy regression; fuzzy time series model; nonparametric time series analysis; time series analysis

MSC: 03E72; 37M10; 62A86

1. Introduction

The field of time series analysis comprises methods used to analyze the characteristics of a response variable with respect to time. It takes into consideration the fact that observations made over time may have an internal structure (such as autocorrelations, trends, seasonal and/or cyclic variations) that should be accounted for. The main aims of time series analysis are as follows:

Trend analysis: to identify the underlying pattern or trend in the data over time, such as an upward or downward trend.
Seasonality analysis: to identify if the data exhibit a repeating pattern over a set period, such as daily, weekly, or yearly.
Forecasting: to forecast future values using historical data.
Anomaly detection: to identify any unusual or unexpected observations in the data that deviate from the normal pattern.
Model selection: to choose an appropriate model to represent the underlying relationships between variables in the data.
Noise reduction: to remove any unwanted variability or random fluctuations from the data to improve the accuracy of predictions and make the underlying patterns clearer.

These aims can inform decision makers, provide insight into the underlying patterns and relationships in the data, and support the development of data-driven strategies in various fields such as economics, engineering, finance, and more (see, e.g., [1,2,3,4,5,6,7,8]).

Common time series models rely on exact observations and ensure crisp predictions. However, due to various uncertainty factors, it is sometimes preferable to make predictions using imprecise values. For instance, we usually observe imprecise observations in carbon emissions, social benefits and oil reserves, among others [9]. Traditional statistical time series models fail to address prediction problems based on ambiguous or vague information represented by fuzzy data. This shortcoming can be overcome by time series models that use techniques of fuzzy statistics. In general, fuzzy statistics is a branch of statistics that deals with uncertainty and imprecision, e.g., in the data. It includes, for instance, the fields of fuzzy estimation, fuzzy regression, fuzzy clustering, and fuzzy hypothesis testing [10,11,12].

Fuzzy time series models were originally introduced in 1993 [13], and since then they have replaced conventional (crisp) time series approaches when observations are uncertain. When considering fuzzy time series models, the prediction of future values requires three principal steps. In step 1, the exact data are reported. In step 2, through the identification of fuzzy logical relations [14,15], the predictions are transformed into fuzzy quantities. Finally, step 3 provides a defuzzification approach [16,17,18,19,20,21,22] to transform the fuzzy values into crisp ones. The techniques used to identify fuzzy logical relations in step 2 primarily involve fuzzy logical relation groups and matrices [13,23,24,25,26,27,28,29,30,31,32,33,34], soft computing methods [35,36,37,38,39,40,41,42,43,44], and statistical approaches in interaction with fuzzy logic [21,24,45,46,47]. Step 2 is an essential part of the predictive power of the presented model. Fuzzy time series models that rely on imprecise observations have attracted substantial attention in recent years, mainly due to their high applicability to real-life problems.

In fact, a lot of researchers have focused on time series models using imprecise observations. The soft computing techniques employed in this framework are mostly combinations of artificial neural networks, evolutionary algorithms, fuzzy and rough sets. These approaches are widely used for crisp or fuzzy forecasts based on crisp past observations such as electricity load, stock index prices and temperature (for a review of these techniques, we refer to [48,49,50,51,52,53,54,55,56]). In addition, various methods combine techniques of time series and fuzzy regression analysis [57]. For some recent advances in fuzzy regression analysis, see [58,59,60,61,62,63].

The reliability of forecasting methods generally requires exact observations in the sample. However, often only vague information is available, given in terms of imprecise quantities. Moreover, there are various real-world problems related to biological, economic, environmental, medical and sociological data where we face inaccurate instead of accurate data. In many real-life applications, e.g., monthly CO$_{2}$ emissions, annual sea surface temperature or the water level of a lake, conventional observations are often reported as mean values. In such cases, the data obtained are not sufficiently informative since some information contained in the range of the data is neglected. To overcome this shortcoming, one alternative would be to report such data as interval-valued (comparable to conventional confidence intervals). However, a potential shortcoming of interval-valued data is the fact that all values within the interval have the same importance. To avoid this issue of interval-valued data, the reported data can alternatively be represented with the help of fuzzy numbers [64]. These fuzzy quantities can be modeled via expert opinion, or as a simple alternative, they can be constructed via a method proposed by Buckley [65]. In this approach, conventional confidence intervals are employed to construct fuzzy numbers around the conventional mean values.

In addition to the abovementioned methods, there are also fuzzy time series models that rely on fuzzy data, but comparatively few overall. In this regard, Hesamian and Akbari [66] first suggested a fuzzy semi-parametric time series model (FSPTSM) based on fuzzy data, non-fuzzy coefficients, and fuzzy smooth functions. Secondly, Zarei et al. [67] used a specific variant of the FSPTSM [66] for triangular fuzzy data and different distance measures for fuzzy data. And thirdly, Hesamian et al. [68] introduced a forward additive time series model (FATSM) for fuzzy observations.

In this paper, we develop a fuzzy nonparametric time series model (FNPTSM) for fuzzy observations that is inspired by nonparametric regression models and kernel smoothing methods [57]. As an initial idea, note that in nonparametric regression analysis, the Nadaraya–Watson estimator [69,70] is fairly common. Now, let us consider the issue of parameter estimation in the nonlinear regression model

x_{t} = f(x_{t-1}, x_{t-2}, \dots, x_{t-p}) + ϵ_{t}

with $f : \mathbb{R}^{p} \to \mathbb{R}$. Based on this general model, a simple nonparametric way of estimating the function f is to employ the kernel-based Nadaraya–Watson estimator

\hat{f}(x_{t}) = \sum_{j=p+1}^{T^{*}} w^{h}(t, j) \, x_{j}

(1)

with

w^{h}(t, j) = \frac{\sum_{i=1}^{p} K\left(\frac{x_{t-i} - x_{j-i}}{h}\right)}{\sum_{j=p+1}^{T^{*}} \sum_{i=1}^{p} K\left(\frac{x_{t-i} - x_{j-i}}{h}\right)},

where K is a kernel function and $h > 0$ is the bandwidth parameter. Note that the estimator (1) is a weighted average of $x_{p+1}, x_{p+2}, \dots, x_{T^{*}}$ using the weights $w^{h}(t, j)$. As for determining the optimal bandwidth h, the Generalized Cross Validation (GCV) criterion

\hat{h} = \arg\min_{h > 0} \mathrm{GCV}(h) = \arg\min_{h > 0} \frac{1}{T^{*} - p} \sum_{t=p+1}^{T^{*}} \left( \frac{x_{t} - \sum_{j=p+1}^{T^{*}} w^{h}(t, j) \, x_{j}}{1 - \frac{\mathrm{tr}(W_{h})}{T^{*} - p}} \right)^{2}

can be utilized, where $\mathrm{tr}(W_{h})$ is the trace of the matrix $W_{h} = [w^{h}(t, j)]$. Because the weights $w^{h}(t, j)$ are non-negative for a non-negative kernel and sum to one over j, the estimated values of f are guaranteed to lie within the range of the response variable. This beneficial property is one of the reasons why we apply the Nadaraya–Watson kernel-based estimator for our fuzzy time series model. By utilizing this idea, the proposed FNPTSM provides an estimation procedure of the unknown (nonlinear) relationship between the fuzzy observations in three stages. The advantage of this methodology is that it considerably decreases the complexity of the estimation procedure. While the other fuzzy time series models [66,67,68] are based on estimating the unknown components of the model by unifying the centers and spreads of fuzzy data and their corresponding predicted values, our proposed method provides a smooth estimation procedure according to three separate stages. In the framework of a simulation study and two real-data examples, the efficiency and appropriateness of the FNPTSM are assessed in comparison with previous time series models for fuzzy data by utilizing four established goodness-of-fit criteria.
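To make the estimator (1) and the GCV criterion concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes NumPy and a Gaussian kernel; the function names `gaussian_kernel`, `nw_weight_matrix` and `gcv` are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of K)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_weight_matrix(x, p, h, kernel=gaussian_kernel):
    """Weights w^h(t, j) for t, j = p+1..T* (0-based indices p..n-1)."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(p, len(x))
    # lags[k, i-1] holds x_{t-i} for the k-th time point t
    lags = np.column_stack([x[idx - i] for i in range(1, p + 1)])
    diff = (lags[:, None, :] - lags[None, :, :]) / h
    k = kernel(diff).sum(axis=2)          # sum of K over the p lags
    return k / k.sum(axis=1, keepdims=True)  # each row sums to one

def gcv(x, p, h, kernel=gaussian_kernel):
    """GCV score for bandwidth h, following the criterion above."""
    x = np.asarray(x, dtype=float)
    w = nw_weight_matrix(x, p, h, kernel)
    y = x[p:]
    m = len(y)                            # m = T* - p
    resid = (y - w @ y) / (1.0 - np.trace(w) / m)
    return float(np.mean(resid**2))
```

A bandwidth can then be picked from a candidate grid, e.g. `min(grid, key=lambda h: gcv(x, p, h))`. Very small bandwidths can push $\mathrm{tr}(W_{h})$ towards $T^{*} - p$ and make the score numerically unstable, so the grid should avoid them.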

The paper is organized as follows. First, we recall some necessary concepts related to fuzzy numbers in Section 2. In Section 3, the three-stage nonparametric kernel-based time series model using fuzzy data is presented. In Section 4, various application examples are given. Concluding remarks are provided in Section 5.

2. Fuzzy Numbers

In this section, we introduce basic definitions of fuzzy numbers that are needed to develop our proposed method.

A fuzzy set $\tilde{A}$ is a mapping on $X$ that assigns a specific degree of membership $0 \leq μ_{\tilde{A}}(x) \leq 1$ to each $x \in X$. In addition, a fuzzy number (FN) $\tilde{A}$ is a convex normalized fuzzy set on the real line $\mathbb{R}$ with an upper semi-continuous membership function of bounded support [71]. In many real applications, vague data a can be reported as $\tilde{A}$: “about a”. Such fuzzy data can often be represented via a special case of FNs, so-called $LR$-FNs, which split $μ_{\tilde{A}}$ into two curves: a part on the left and a part on the right of the modal value. So, when considering real-life applications in fuzzy environments, $LR$-FNs play an important role. The membership function of an $LR$-FN $\tilde{A} = (a; l_{a}, r_{a})_{LR}$ can be defined by:

μ_{\tilde{A}}(x) = \begin{cases} L\left(\frac{a - x}{l_{a}}\right) & \text{if } x \leq a, \\ R\left(\frac{x - a}{r_{a}}\right) & \text{if } x > a. \end{cases}

(2)

In (2), L and R are continuous and strictly decreasing functions from $[0, 1]$ to $[0, 1]$ satisfying $L(0) = R(0) = 1$ and $L(1) = R(1) = 0$. In addition, $a \in \mathbb{R}$ represents the modal value, while $l_{a} > 0$ and $r_{a} > 0$ are the left and right spreads of $\tilde{A}$, respectively. The set of all $LR$-FNs is represented by $F_{LR}(\mathbb{R})$. A special case of an $LR$-FN is the so-called triangular fuzzy number (TFN), whose membership function has the following form:

μ_{\tilde{A}}(x) = \begin{cases} \frac{x - (a - l_{a})}{l_{a}} & a - l_{a} \leq x \leq a, \\ \frac{a + r_{a} - x}{r_{a}} & a < x \leq a + r_{a}, \\ 0 & \text{otherwise.} \end{cases}
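The TFN membership function above translates directly into code. The sketch below is only an illustration; the function name `tfn_membership` is hypothetical, and positive spreads are assumed as in the definition.

```python
def tfn_membership(x, a, l, r):
    """Membership degree of x in the TFN with center a and spreads l, r > 0."""
    if a - l <= x <= a:
        return (x - (a - l)) / l      # rising left branch
    if a < x <= a + r:
        return (a + r - x) / r        # falling right branch
    return 0.0                        # outside the support
```

For instance, with center 2, left spread 1 and right spread 3, the point 3.5 lies halfway down the right branch, so `tfn_membership(3.5, 2, 1, 3)` evaluates to 0.5.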

There are various operations that can be defined between two $LR$-FNs, i.e., between $\tilde{A} = (a; l_{a}, r_{a})_{LR}$ and $\tilde{B} = (b; l_{b}, r_{b})_{LR}$. Since we need both operations in this paper, we define the addition of $\tilde{A}$ and $\tilde{B}$ and the scalar multiplication of $\tilde{A}$ as follows [72]:

Addition: $\tilde{A} \oplus \tilde{B} = (a + b; l_{a} + l_{b}, r_{a} + r_{b})_{LR}$
Scalar multiplication: $λ \otimes \tilde{A} = \begin{cases} (λ a; λ l_{a}, λ r_{a})_{LR} & \text{if } λ > 0, \\ (λ a; -λ r_{a}, -λ l_{a})_{RL} & \text{if } λ < 0. \end{cases}$
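The two operations can be sketched as follows. This is an illustrative example only: the class `LRFuzzyNumber` and the functions `add` and `scale` are hypothetical names, and the shape functions are not stored, so the switch from type $LR$ to type $RL$ for negative scalars is only noted in a comment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LRFuzzyNumber:
    center: float
    left: float    # left spread
    right: float   # right spread

def add(a, b):
    """Addition: centers and spreads are added componentwise."""
    return LRFuzzyNumber(a.center + b.center, a.left + b.left, a.right + b.right)

def scale(lam, a):
    """Scalar multiplication for lam != 0."""
    if lam > 0:
        return LRFuzzyNumber(lam * a.center, lam * a.left, lam * a.right)
    if lam < 0:
        # The spreads swap roles and the result is of type RL.
        return LRFuzzyNumber(lam * a.center, -lam * a.right, -lam * a.left)
    raise ValueError("lam = 0 is not covered by the definition above")
```

Multiplying by a negative scalar mirrors the number around zero, which is why the former right spread becomes the new left spread.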

Moreover, there are numerous concepts used to define distances between two $LR$-FNs $\tilde{A} = (a; l_{a}, r_{a})_{LR}$ and $\tilde{B} = (b; l_{b}, r_{b})_{LR}$ [71]. Here, we utilize the squared error distance measure D for performance evaluation of the FNPTSM in comparison with other models. It is defined as

D(\tilde{A}, \tilde{B}) = \left( \frac{(a - b)^{2} + c_{1} (l_{a} - l_{b})^{2} + c_{2} (r_{a} - r_{b})^{2}}{3} \right)^{0.5}

with $c_{1} = \int_{0}^{1} L^{-1}(α) \, dα$ and $c_{2} = \int_{0}^{1} R^{-1}(α) \, dα$ [73].
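As a rough illustration of the distance D, the sketch below approximates the constants $c_{1}$ and $c_{2}$ with a midpoint rule and evaluates D for fuzzy numbers given as (center, left spread, right spread) triples. The function names `shape_constant` and `lr_distance` are hypothetical, and the numerical integration is an assumed convenience; closed-form constants can be used instead when available.

```python
import math

def shape_constant(inverse_shape, n=10_000):
    """Midpoint-rule approximation of the integral of an inverse shape function over [0, 1]."""
    step = 1.0 / n
    return sum(inverse_shape((k + 0.5) * step) for k in range(n)) * step

def lr_distance(a, b, c1, c2):
    """Distance D between two LR-FNs given as (center, left, right) triples."""
    squared = ((a[0] - b[0]) ** 2
               + c1 * (a[1] - b[1]) ** 2
               + c2 * (a[2] - b[2]) ** 2)
    return math.sqrt(squared / 3.0)
```

For the linear shape function $L(x) = 1 - x$, the inverse is $L^{-1}(α) = 1 - α$, so `shape_constant(lambda al: 1 - al)` should return approximately 0.5.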

3. Nonparametric Kernel-Based Time Series Model for Fuzzy Data

In this section, the FNPTSM is developed along with the suggested parameter estimation method.

3.1. The Model

First, we recall the definition of fuzzy time series data.

Definition 1.

Let $\tilde{x}_{T} = \{\tilde{x}_{1}, \tilde{x}_{2}, \dots, \tilde{x}_{T}\}$ be a set of FNs of size T. Then, $\tilde{x}_{T}$ is called fuzzy time series data if $\{\tilde{x}_{1}, \tilde{x}_{2}, \dots, \tilde{x}_{T}\}$ is the vague concept of ordinary time series data $\{x_{1}, x_{2}, \dots, x_{T}\}$ [68,74].

As discussed in the Introduction, there are many situations where it is preferable to report exact data x by an FN $\tilde{x}$ as “about x”. Then, $\tilde{x}$ is the respective vague concept of x.

Definition 2.

Let $\tilde{x}_{T} = \{\tilde{x}_{1}, \tilde{x}_{2}, \dots, \tilde{x}_{T}\}$ be fuzzy time series data. The FNPTSM for fuzzy time series data $\tilde{x}_{T}$ is then defined by

\tilde{x}_{t} = \tilde{f}(\tilde{x}_{t-1}, \tilde{x}_{t-2}, \dots, \tilde{x}_{t-p}) \oplus \tilde{ϵ}_{t},

(3)

where

$\tilde{x}_{t} = (x_{t}; l_{x_{t}}, r_{x_{t}})_{LR}$,
$\tilde{f}(\tilde{x}_{t-1}, \tilde{x}_{t-2}, \dots, \tilde{x}_{t-p}) = (f(x_{t-1}, x_{t-2}, \dots, x_{t-p}); l_{f(l_{x_{t-1}}, l_{x_{t-2}}, \dots, l_{x_{t-p}})}, r_{f(r_{x_{t-1}}, r_{x_{t-2}}, \dots, r_{x_{t-p}})})_{LR}$,
$\tilde{ϵ}_{t} = (ϵ_{t}; l_{ϵ_{t}}, r_{ϵ_{t}})_{LR}$ are fuzzy errors, where $ϵ_{t} \in \mathbb{R}$ and $l_{ϵ_{t}}, r_{ϵ_{t}} \in \mathbb{R}^{+}$.

Remark 1.

Note that (3) provides an FN in the form $\tilde{x}_{t}^{*} = (x_{t}^{*}; l_{x_{t}^{*}}, r_{x_{t}^{*}})_{LR}$ with $x_{t}^{*} = f(x_{t-1}, x_{t-2}, \dots, x_{t-p}) + ϵ_{t}$, $l_{x_{t}^{*}} = l_{f(l_{x_{t-1}}, \dots, l_{x_{t-p}})} + l_{ϵ_{t}}$ and $r_{x_{t}^{*}} = r_{f(r_{x_{t-1}}, \dots, r_{x_{t-p}})} + r_{ϵ_{t}}$ with $t = 1, 2, \dots, T$. According to Definition 1, as $\{x_{1}, x_{2}, \dots, x_{T}\}$ is ordinary time series data, $\tilde{x}_{T}^{*} = \{\tilde{x}_{1}^{*}, \tilde{x}_{2}^{*}, \dots, \tilde{x}_{T}^{*}\}$ is also a vague concept of ordinary time series data $\{x_{1}^{*}, x_{2}^{*}, \dots, x_{T}^{*}\}$. Thus, the proposed fuzzy time series model (3) generates new fuzzy time series data.

3.2. Three-Stage Estimation Method for the Nonlinear Fuzzy Smooth Function

Below, we suggest a three-stage method to estimate the unknown fuzzy smooth function $\tilde{f}$ in (3). For this purpose, the fuzzy predictions are obtained based on a within-sample forecast $x_{T^{*}} = (x_{1}, x_{2}, \dots, x_{T^{*}})^{⊤}$ with $T^{*} < T$. From (3), one can get three ordinary nonlinear time series models as (1) $x_{t} = f(x_{t-1}, x_{t-2}, \dots, x_{t-p}) + ϵ_{t}$, (2) $l_{x_{t}} = l_{f(l_{x_{t-1}}, \dots, l_{x_{t-p}})} + l_{ϵ_{t}}$, and (3) $r_{x_{t}} = r_{f(r_{x_{t-1}}, \dots, r_{x_{t-p}})} + r_{ϵ_{t}}$ for $t = 1, 2, \dots, T^{*}$. Therefore, to estimate the fuzzy smooth function at $\tilde{x} = (x; l_{x}, r_{x})_{T}$ with $x = (x_{1}, x_{2}, \dots, x_{p})^{⊤}$, $l_{x} = (l_{x_{1}}, l_{x_{2}}, \dots, l_{x_{p}})^{⊤}$ and $r_{x} = (r_{x_{1}}, r_{x_{2}}, \dots, r_{x_{p}})^{⊤}$, we follow the three-stage procedure below:

Stage (1): Consider the nonlinear regression model $l_{x_{t}} = l_{f(l_{x_{t-1}}, l_{x_{t-2}}, \dots, l_{x_{t-p}})} + l_{ϵ_{t}}$. Based on the time series data $l_{x_{t}} = (l_{x_{t-1}}, \dots, l_{x_{t-p}})^{⊤}$, we employ the weighted Nadaraya–Watson estimator to estimate $l_{f}$ for a within-sample forecast $T^{*} \leq T$ at $l_{x} = (l_{x_{1}}, \dots, l_{x_{p}})^{⊤}$ as

$l_{\hat{f}(l_{x_{t}})} = \sum_{j=p+1}^{T^{*}} w^{h_{l}}(t, j) \, l_{x_{j}},$

where

$w^{h_{l}}(t, j) = \frac{\sum_{i=1}^{p} K\left(\frac{l_{x_{t-i}} - l_{x_{j-i}}}{h_{l}}\right)}{\sum_{j=p+1}^{T^{*}} \sum_{i=1}^{p} K\left(\frac{l_{x_{t-i}} - l_{x_{j-i}}}{h_{l}}\right)}$

(4)

with kernel function $K(\cdot)$ and bandwidth parameter $h_{l} > 0$. The optimal value of $h_{l}$ can be estimated by implementing the GCV criterion,

$\hat{h}_{l} = \arg\min_{h_{l} > 0} \mathrm{GCV}(h_{l}) = \arg\min_{h_{l} > 0} \frac{1}{T^{*} - p} \sum_{t=p+1}^{T^{*}} \left( \frac{l_{x_{t}} - \sum_{j=p+1}^{T^{*}} w^{h_{l}}(t, j) \, l_{x_{j}}}{1 - \frac{\mathrm{tr}(W_{h_{l}})}{T^{*} - p}} \right)^{2},$

(5)

where $\mathrm{tr}(W_{h_{l}})$ is the trace of the matrix $W_{h_{l}} = [w^{h_{l}}(t, j)]$ with $w^{h_{l}}(t, j)$ as defined in (4).
Stage (2): Consider the nonlinear regression model $r_{x_{t}} = r_{f(r_{x_{t-1}}, r_{x_{t-2}}, \dots, r_{x_{t-p}})} + r_{ϵ_{t}}$. Based on the within-sample time series forecast data $r_{x_{t}} = (r_{x_{t-1}}, \dots, r_{x_{t-p}})^{⊤}$, $t = 1, 2, \dots, T^{*}$, the weighted Nadaraya–Watson estimation of $r_{f}$ at $r_{x} = (r_{x_{1}}, \dots, r_{x_{p}})^{⊤}$ can be established via

$r_{\hat{f}(r_{x_{t}})} = \sum_{j=p+1}^{T^{*}} w^{h_{r}}(t, j) \, r_{x_{j}},$

where

$w^{h_{r}}(t, j) = \frac{\sum_{i=1}^{p} K\left(\frac{r_{x_{t-i}} - r_{x_{j-i}}}{h_{r}}\right)}{\sum_{j=p+1}^{T^{*}} \sum_{i=1}^{p} K\left(\frac{r_{x_{t-i}} - r_{x_{j-i}}}{h_{r}}\right)}$

(6)

and $h_{r} > 0$ is a bandwidth parameter. The optimal value of $h_{r}$ can be estimated using the GCV criterion,

$\hat{h}_{r} = \arg\min_{h_{r} > 0} \mathrm{GCV}(h_{r}) = \arg\min_{h_{r} > 0} \frac{1}{T^{*} - p} \sum_{t=p+1}^{T^{*}} \left( \frac{r_{x_{t}} - \sum_{j=p+1}^{T^{*}} w^{h_{r}}(t, j) \, r_{x_{j}}}{1 - \frac{\mathrm{tr}(W_{h_{r}})}{T^{*} - p}} \right)^{2},$

(7)

where $\mathrm{tr}(W_{h_{r}})$ is the trace of the matrix $W_{h_{r}} = [w^{h_{r}}(t, j)]$ with $w^{h_{r}}(t, j)$ as defined in (6).
Stage (3): Consider the nonlinear regression model $x_{t} = f(x_{t-1}, x_{t-2}, \dots, x_{t-p}) + ϵ_{t}$. Based on the within-sample time series forecast data $x_{t} = (x_{t-1}, x_{t-2}, \dots, x_{t-p})^{⊤}$, $t = 1, 2, \dots, T^{*}$, a nonparametric estimator of f can be obtained as

$\hat{f}(x_{t}) = \sum_{j=p+1}^{T^{*}} w^{h}(t, j) \, x_{j},$

where

$w^{h}(t, j) = \frac{\sum_{i=1}^{p} K\left(\frac{x_{t-i} - x_{j-i}}{h}\right)}{\sum_{j=p+1}^{T^{*}} \sum_{i=1}^{p} K\left(\frac{x_{t-i} - x_{j-i}}{h}\right)}$

(8)

and $h > 0$ is a bandwidth parameter. Similar to the previous stages, the optimal value of h is estimated with the help of the GCV criterion,

$\hat{h} = \arg\min_{h > 0} \mathrm{GCV}(h) = \arg\min_{h > 0} \frac{1}{T^{*} - p} \sum_{t=p+1}^{T^{*}} \left( \frac{x_{t} - \sum_{j=p+1}^{T^{*}} w^{h}(t, j) \, x_{j}}{1 - \frac{\mathrm{tr}(W_{h})}{T^{*} - p}} \right)^{2},$

(9)

where $\mathrm{tr}(W_{h})$ is the trace of the matrix $W_{h} = [w^{h}(t, j)]$ with $w^{h}(t, j)$ as defined in (8).

Therefore, the forecast $\tilde{x}_{T^{*}+k}$ with time lag $k \in \mathbb{N}$ can be achieved by an $LR$-FN via $\tilde{\hat{x}}_{T^{*}+k} = (\hat{x}_{T^{*}+k}; l_{\hat{x}_{T^{*}+k}}, r_{\hat{x}_{T^{*}+k}})_{LR}$ with

\begin{aligned} \hat{x}_{T^{*}+k} &= \sum_{j=p+1}^{T^{*}+k-1} \frac{\sum_{i=1}^{p} K\left(\frac{x_{t-i} - x_{j-i}}{\hat{h}}\right)}{\sum_{j=p+1}^{T^{*}+k-1} \sum_{i=1}^{p} K\left(\frac{x_{t-i} - x_{j-i}}{\hat{h}}\right)} \cdot x_{j}, \\ l_{\hat{x}_{T^{*}+k}} &= \sum_{j=p+1}^{T^{*}+k-1} \frac{\sum_{i=1}^{p} K\left(\frac{l_{x_{t-i}} - l_{x_{j-i}}}{\hat{h}_{l}}\right)}{\sum_{j=p+1}^{T^{*}+k-1} \sum_{i=1}^{p} K\left(\frac{l_{x_{t-i}} - l_{x_{j-i}}}{\hat{h}_{l}}\right)} \cdot l_{x_{j}}, \\ r_{\hat{x}_{T^{*}+k}} &= \sum_{j=p+1}^{T^{*}+k-1} \frac{\sum_{i=1}^{p} K\left(\frac{r_{x_{t-i}} - r_{x_{j-i}}}{\hat{h}_{r}}\right)}{\sum_{j=p+1}^{T^{*}+k-1} \sum_{i=1}^{p} K\left(\frac{r_{x_{t-i}} - r_{x_{j-i}}}{\hat{h}_{r}}\right)} \cdot r_{x_{j}}, \end{aligned}

where $t = T^{*} + k$.

According to Stages (1) and (2), the spreads of the fuzzy prediction $\tilde{x}_{T^{*}+k}$ are always non-negative, since for a non-negative kernel they are weighted averages of non-negative observed spreads with non-negative weights.
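The three stages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the authors' code: it uses NumPy, a Gaussian kernel (whose normalizing constant cancels in the weights), and a grid search over candidate bandwidths in place of a continuous minimization of the GCV criterion. The names `nw_fit`, `gcv_bandwidth` and `three_stage_fit` are hypothetical.

```python
import numpy as np

def nw_fit(series, p, h):
    """Return (observed, fitted, trace of W) for t = p+1..T*."""
    s = np.asarray(series, dtype=float)
    idx = np.arange(p, len(s))
    lags = np.column_stack([s[idx - i] for i in range(1, p + 1)])
    diff = (lags[:, None, :] - lags[None, :, :]) / h
    k = np.exp(-0.5 * diff**2).sum(axis=2)   # Gaussian kernel, summed over lags
    w = k / k.sum(axis=1, keepdims=True)
    return s[p:], w @ s[p:], np.trace(w)

def gcv_bandwidth(series, p, grid):
    """Pick the bandwidth in `grid` with the smallest GCV score."""
    def score(h):
        y, fitted, tr = nw_fit(series, p, h)
        denom = 1.0 - tr / len(y)
        if denom <= 0.0:                      # degenerate fit, skip this h
            return np.inf
        return np.mean(((y - fitted) / denom) ** 2)
    return min(grid, key=score)

def three_stage_fit(center, left, right, p, grid):
    """Stage (1): left spreads, Stage (2): right spreads, Stage (3): centers."""
    result = {}
    for name, s in (("left", left), ("right", right), ("center", center)):
        h = gcv_bandwidth(s, p, grid)
        _, fitted, _ = nw_fit(s, p, h)
        result[name] = (h, fitted)
    return result
```

Because each component is smoothed separately, the fitted spreads are weighted averages of the observed spreads and therefore stay non-negative whenever the observed spreads are.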

Remark 2.

Since the proposed time series model relies on fuzzy data, let us recall the previous time series models based on fuzzy data [66,67,68]. First, Hesamian and Akbari [66] proposed a fuzzy semi-parametric autoregressive integrated moving average (ARIMA) model as follows:

\tilde{x}_{i} = \bigoplus_{l=1}^{p} (θ_{l} \otimes \tilde{x}_{i-l} \oplus \tilde{f}(t_{i}) \oplus \tilde{ϵ}_{i}), \quad i = p+1, \dots, T.

The parameters of the model are estimated by employing a hybrid method including a nonparametric kernel-based method and least absolute deviations. For a second time series model based on fuzzy data, Zarei et al. [67] applied the method of [66] to estimate the model parameters and the fuzzy smooth function based on a specific distance, kernel and triangular fuzzy numbers. Finally, Hesamian et al. [68] proposed the fuzzy nonlinear time series model

\tilde{x}_{t} = \tilde{f}(\tilde{x}_{t-1}, \tilde{x}_{t-2}, \dots, \tilde{x}_{t-p}) \oplus \tilde{ϵ}_{t}, \quad t = 1, 2, \dots, T,

where

\tilde{f}(\tilde{x}_{t-1}, \tilde{x}_{t-2}, \dots, \tilde{x}_{t-p}) = \bigoplus_{l=1}^{p} \tilde{f}_{l}(\tilde{x}_{t-l}).

As for the estimation of the unknown fuzzy smooth functions $\tilde{f}_{l}$, they applied a forward additive nonparametric technique.

Remark 3.

We have extended some common performance measures used to compare the predictive accuracy of different time series models that we implement in Section 4. For this purpose, a time series model is first estimated based on a within-sample fuzzy time series dataset of size $T^{*} < T$ and then the performance of the model is evaluated via the remaining fuzzy time series dataset of size $T - T^{*}$.

Mean Forecast Error:

$MFE = \frac{\sum_{t=T^{*}+1}^{T} D^{2}(\tilde{\hat{x}}_{t}, \tilde{x}_{t})}{T - T^{*}}$
Mean Absolute Scaled Error:

$MASE = \frac{\sum_{t=T^{*}+1}^{T} q_{t}}{T - T^{*}}$

with

$q_{t} = \frac{D(\tilde{\hat{x}}_{t}, \tilde{x}_{t})}{\frac{1}{T - T^{*}} \sum_{s=T^{*}+1}^{T} D^{2}(\tilde{x}_{s}, \tilde{x}_{s-1})}$
Basis of the Index of Agreement:

$BIA = 1 - \frac{\sum_{t=T^{*}+1}^{T} D^{2}(\tilde{x}_{t}, \tilde{\hat{x}}_{t})}{\sum_{t=T^{*}+1}^{T} \left( D(\tilde{x}_{t}, \tilde{\bar{x}}) + D(\tilde{\bar{x}}, \tilde{\hat{x}}_{t}) \right)^{2}}$

with

$\tilde{\bar{x}} = \frac{\sum_{t=T^{*}+1}^{T} \tilde{x}_{t}}{T - T^{*}}$
Mean Similarity Measure:

$MSM = \frac{1}{T - T^{*}} \sum_{t=T^{*}+1}^{T} \frac{\int \min\{\tilde{\hat{x}}_{t}(x), \tilde{x}_{t}(x)\} \, dx}{\int \max\{\tilde{\hat{x}}_{t}(x), \tilde{x}_{t}(x)\} \, dx}$

Let A and B be two fuzzy time series models. As $MSM : F_{LR}(\mathbb{R}) \times F_{LR}(\mathbb{R}) \to [0, 1]$ is a similarity measure, values of MSM above 0.5 show a good degree of similarity between the fuzzy responses and their fuzzy predictions. If we observe $MSM_{B} < MSM_{A}$, then model A outperforms model B. Further, if $MFE_{A} < MFE_{B}$, $MASE_{A} < MASE_{B}$ or $BIA_{A} > BIA_{B}$, then model A performs better in terms of prediction accuracy than model B.
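The four measures can be sketched for triangular fuzzy numbers as follows. This is an illustration under explicit assumptions, not the authors' code: fuzzy numbers are rows (center, left spread, right spread) of NumPy arrays, the shapes are $L(x) = R(x) = 1 - x$ (so $c_{1} = c_{2} = 1/2$), all spreads are positive, and the integrals in MSM are approximated by sums on a common grid. All function names are hypothetical.

```python
import numpy as np

C1 = C2 = 0.5  # assumed triangular shapes L(x) = R(x) = 1 - x

def dist(a, b):
    """Distance D between fuzzy numbers stored as (center, left, right)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    sq = ((a[..., 0] - b[..., 0]) ** 2
          + C1 * (a[..., 1] - b[..., 1]) ** 2
          + C2 * (a[..., 2] - b[..., 2]) ** 2)
    return np.sqrt(sq / 3.0)

def tri_membership(grid, fn):
    a, l, r = fn
    return np.clip(np.minimum((grid - (a - l)) / l, (a + r - grid) / r), 0.0, 1.0)

def mfe(pred, obs):
    return float(np.mean(dist(pred, obs) ** 2))

def mase(pred, obs, prev_obs):
    # prev_obs[t] is the observation one step before obs[t]
    scale = np.mean(dist(obs, prev_obs) ** 2)
    return float(np.mean(dist(pred, obs) / scale))

def bia(pred, obs):
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    mean_obs = obs.mean(axis=0)  # componentwise mean of the observations
    num = np.sum(dist(obs, pred) ** 2)
    den = np.sum((dist(obs, mean_obs) + dist(mean_obs, pred)) ** 2)
    return float(1.0 - num / den)

def msm(pred, obs, n_grid=2001):
    ratios = []
    for p_, o_ in zip(pred, obs):
        lo = min(p_[0] - p_[1], o_[0] - o_[1])
        hi = max(p_[0] + p_[2], o_[0] + o_[2])
        g = np.linspace(lo, hi, n_grid)
        mp, mo = tri_membership(g, p_), tri_membership(g, o_)
        # the grid step cancels in the ratio of the two sums
        ratios.append(np.minimum(mp, mo).sum() / np.maximum(mp, mo).sum())
    return float(np.mean(ratios))
```

With these definitions, a perfect prediction gives MFE and MASE equal to zero and BIA and MSM equal to one, which matches the direction of the comparisons stated above.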

Remark 4.

While the proposed estimation procedure does not depend on the shape functions L and R corresponding to fuzzy data, the performance measures MFE, MASE and MSM depend on these shape functions. Therefore, the selected type of the shape functions L and R may affect the prediction criteria. For instance, assume that the data have been reported as $\tilde{x}_{t} = (x_{t}; l_{x_{t}}, r_{x_{t}})_{LR}$ with $L(x) = 1 - x$ and $R(x) = \sqrt{1 - x}$. That is, $c_{1} = \frac{1}{2}$ and $c_{2} = \frac{2}{3}$. Therefore, the squared distance between $\tilde{x}_{t}$ and its prediction is $D^{2}(\tilde{x}_{t}, \tilde{\hat{x}}_{t}) = \frac{1}{3} \left[ (x_{t} - \hat{x}_{t})^{2} + \frac{1}{2} (l_{x_{t}} - l_{\hat{x}_{t}})^{2} + \frac{2}{3} (r_{x_{t}} - r_{\hat{x}_{t}})^{2} \right]$. This implies that the MFE criterion is more sensitive to right spreads than to left spreads in this case. Considering $L(x) = R(x) = 1 - x$, it can be seen that $D^{2}(\tilde{x}_{t}, \tilde{\hat{x}}_{t})$ depends equally on the left and right spreads. However, when we compare the performance of fuzzy time series models, it is reasonable that the shape functions L and R are assumed to be the same for all the considered models. Thus, following this approach, the performance criteria are not sensitive to the selection of L and R since $c_{1}$ and $c_{2}$ remain fixed for each model.
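As a worked check of the two constants used in this remark, the inverse shape functions can be integrated directly, using the definitions of $c_{1}$ and $c_{2}$ from Section 2:

```latex
\begin{aligned}
L(x) = 1 - x \;&\Rightarrow\; L^{-1}(α) = 1 - α, &
c_{1} &= \int_{0}^{1} (1 - α) \, dα = 1 - \tfrac{1}{2} = \tfrac{1}{2}, \\
R(x) = \sqrt{1 - x} \;&\Rightarrow\; R^{-1}(α) = 1 - α^{2}, &
c_{2} &= \int_{0}^{1} (1 - α^{2}) \, dα = 1 - \tfrac{1}{3} = \tfrac{2}{3}.
\end{aligned}
```

Since $c_{2} > c_{1}$, a given error in the right spread contributes more to $D^{2}$ than the same error in the left spread.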

3.3. Selection of Autoregressive Order and Optimal Bandwidths

When implementing the FNPTSM (3), it is necessary to select the optimal bandwidths h, $h_{l}$ and $h_{r}$, to choose the kernel function and to determine the autoregressive order p. The procedure used to select the autoregressive order and the optimal bandwidths is proposed as follows:

(1): Let $p = 1$.
(2): (2.1) Compute $\hat{h}_{l}^{p}$ based on (5).
(2.2) Compute $\hat{h}_{r}^{p}$ based on (7).
(2.3) Compute $\hat{h}^{p}$ based on (9).
(3): Let $p = p + 1$ and return to (2). Repeat this for each candidate order and select

$\hat{p} = \arg\min_{p} RMSE_{p},$

where

$RMSE_{p} = \sqrt{\frac{\sum_{i=p+1}^{T^{*}} D^{2}(\tilde{\hat{x}}_{i}, \tilde{x}_{i})}{T^{*} - p}}.$

Then, $\hat{p}$, $\hat{h}^{\hat{p}}$, $\hat{h}_{l}^{\hat{p}}$ and $\hat{h}_{r}^{\hat{p}}$ are the optimal values.
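The selection loop above can be sketched generically in Python. The sketch makes two assumptions that are not stated in the text: the candidate orders run from 1 up to a user-chosen `max_p`, and a hypothetical callable `fit_for_order(p)` performs step (2) and returns the three GCV bandwidths together with the resulting $RMSE_{p}$.

```python
def select_order(fit_for_order, max_p):
    """Return (p_hat, bandwidths, rmse) minimizing the RMSE over p = 1..max_p.

    fit_for_order(p) is a hypothetical callable that returns a tuple
    (bandwidths, rmse) for autoregressive order p.
    """
    best = None
    for p in range(1, max_p + 1):             # steps (1) and (3)
        bandwidths, rmse = fit_for_order(p)   # step (2)
        if best is None or rmse < best[2]:
            best = (p, bandwidths, rmse)
    return best
```

Keeping the estimation step behind a callable separates the search over p from the choice of kernel and bandwidth grid, so the same loop can be reused for each kernel.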

4. Numerical Examples

In this section, the effectiveness of the FNPTSM is investigated considering a simulation study and application examples that rely on fuzzy data. Recall that there are three other time series models that are based on fuzzy data (see Remark 2), i.e., the models introduced by Hesamian and Akbari [66], Zarei et al. [67] and Hesamian et al. [68]. However, as the method of Zarei et al. [67] is based on Hesamian and Akbari’s method [66] (with a different distance measure), we omit this technique in the comparisons below. Thus, we compare our proposed method with the models suggested by Hesamian and Akbari (FSPTSM) [66] and Hesamian et al. (FATSM) [68] via three different kernel functions (Gaussian, Epanechnikov, and triweight).

Example 1.

In this example, 10 fuzzy datasets, each of size 300, are generated by the following FNPTSM:

\tilde{x}_{t} = \tilde{f}(\tilde{x}_{t-1}, \tilde{x}_{t-2}, \tilde{x}_{t-3}) \oplus \tilde{ϵ}_{t}, \quad t = 4, 5, \dots, 300,

where

(1): $\tilde{f}(\tilde{x}_{1}, \tilde{x}_{2}, \tilde{x}_{3}) = \left( x_{1} - \cos(x_{2}) - \exp\left(\frac{x_{3}}{1 + |x_{3}|}\right); \cos^{2}\left(0.9 \prod_{j=1}^{3} l_{x_{j}}\right), \exp\left(0.002 \prod_{j=1}^{3} r_{x_{j}}\right) \right)_{LR}$,
(2): $\tilde{x}_{j} = (x_{j}; l_{x_{j}}, r_{x_{j}})_{LR}$, $j = 1, 2, 3$, are the initial values with $x_{j} \sim N(0, 1)$, and $l_{x_{j}}$ and $r_{x_{j}}$ are random variables following $U(0, 0.2)$ and $U(0, 0.9)$, respectively,
(3): $\tilde{ϵ}_{t} = (ϵ_{t}; l_{ϵ_{t}}, r_{ϵ_{t}})_{LR}$ with $ϵ_{t} \sim N(0, 4)$, and $l_{ϵ_{t}}$ and $r_{ϵ_{t}}$ are random variables following $U(0, 0.4)$ and $U(0, 0.5)$, respectively, and
(4): $L(x) = 1 - x^{2}$ and $R(x) = 1 - x$.
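One such dataset could be generated roughly as follows. This is a sketch under assumptions that the text leaves open: $N(0, 4)$ is read as variance 4 (standard deviation 2), the first argument of $\tilde{f}$ is taken to be the most recent lag, and the random number generator and seed are arbitrary. The function name `simulate_fnptsm` is hypothetical.

```python
import numpy as np

def simulate_fnptsm(n=300, seed=0):
    """Simulate centers x, left spreads l and right spreads r of length n."""
    rng = np.random.default_rng(seed)
    x, l, r = np.empty(n), np.empty(n), np.empty(n)
    # Initial values for the first three time points
    x[:3] = rng.normal(0.0, 1.0, 3)
    l[:3] = rng.uniform(0.0, 0.2, 3)
    r[:3] = rng.uniform(0.0, 0.9, 3)
    for t in range(3, n):
        eps = rng.normal(0.0, 2.0)       # assumption: N(0, 4) means variance 4
        l_eps = rng.uniform(0.0, 0.4)
        r_eps = rng.uniform(0.0, 0.5)
        x1, x2, x3 = x[t - 1], x[t - 2], x[t - 3]
        x[t] = x1 - np.cos(x2) - np.exp(x3 / (1.0 + abs(x3))) + eps
        l[t] = np.cos(0.9 * l[t - 1] * l[t - 2] * l[t - 3]) ** 2 + l_eps
        r[t] = np.exp(0.002 * r[t - 1] * r[t - 2] * r[t - 3]) + r_eps
    return x, l, r
```

Calling the function with ten different seeds would give ten datasets of size 300, mirroring the design described above.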

The Gaussian, Epanechnikov, and triweight kernels are applied to predict $\tilde{x}_{t}$. Based on the 10 sample fuzzy datasets (each of size 300), the mean values of the goodness-of-fit measures and their corresponding bandwidth mean values are summarized in Table 1. Consulting the results for the FNPTSM, it is evident that the best results among various kernels are obtained via the Gaussian kernel (lowest values of $\overline{MFE}$, $\overline{MASE}$ and largest values of $\overline{BIA}$, $\overline{MSM}$). In addition, the results of the FSPTSM and FATSM can also be found in Table 1. Comparing these results with the results of the FNPTSM, it is obvious that the FNPTSM provides more accurate predictions compared to both other methods for all three kernels, as all the considered goodness-of-fit measures show better results for the FNPTSM. That is, we observe the lowest values of $\overline{MFE}$, $\overline{MASE}$ and the largest values of $\overline{BIA}$, $\overline{MSM}$ for the FNPTSM.

Example 2.

Three models, the FNPTSM, FSPTSM and FATSM, are implemented to analyze the dataset in Table 2 taken from [67].

Eighty percent of the data were used for parameter estimation and the rest were used to evaluate the fitted model. The goodness-of-fit values are given in Table 3 for the three kernels. The best results among various kernels are obtained by employing the triweight kernel (lowest values of MFE, MASE, and largest value of MSM). The results of the FSPTSM and FATSM are also given in Table 3. As for the FSPTSM, the best results are obtained based on the triweight kernel with $MFE = 1.6252$, $MASE = 1.3371$, $MSM = 0.1205$ and $BIA = 0.9443$. The best results of the FATSM are also obtained based on the triweight kernel with $MFE = 0.254$, $MASE = 0.733$, $MSM = 0.358$ and $BIA = 0.958$. However, all goodness-of-fit measures related to the FNPTSM show a better performance compared to both the FSPTSM and FATSM, i.e., the lowest values of MFE, MASE and the largest values of BIA, MSM are observed for the FNPTSM. The results show that the newly presented FNPTSM is more efficient than the FSPTSM and FATSM for the data in Table 2. The plot of the fuzzy data and corresponding estimates based on the triweight kernel is given in Figure 1 for all methods (FNPTSM, FSPTSM, FATSM).

Example 3.

In this example, we employ the FNPTSM as well as the FSPTSM and FATSM to predict the global land–ocean temperature [75]. For this purpose, we use the global land–ocean temperature from January 2000 to December 2020, as shown in Figure 2. The data are reported as average values for each month. Therefore, the data can also be interpreted as “mean of each month” and appropriately modeled via triangular fuzzy numbers (TFNs). Inspired by Buckley [65], this dataset can be used to evaluate the global land–ocean temperature with the help of a TFN $\tilde{x}_{t} = (\bar{x}_{t}; Z_{0.005} s_{t} / \sqrt{n_{t}}, 0.15 Z_{0.025} s_{t} / \sqrt{n_{t}})_{T}$, where $n_{t}$, $\bar{x}_{t}$, and $s_{t}$ denote the number of days, the mean, and the standard deviation of the global land–ocean temperature in the $t$-th month, respectively, and $Z_{\alpha}$ is the $\alpha$-quantile of the standard normal distribution. However, since we do not have daily values of the global land–ocean temperature, we model the monthly global land–ocean temperature for month $t$ via $\tilde{x}_{t} = (x_{t}; 0.2 x_{t}, 0.15 x_{t})_{T}$.
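As a minimal sketch of the fuzzification step described above, the following Python code builds a TFN $(x_{t}; 0.2 x_{t}, 0.15 x_{t})_{T}$ from each monthly value and splits a series of 252 months chronologically. The class `TFN`, the function `to_tfn`, and the numbers in `monthly_values` are hypothetical placeholders and not the temperature dataset from [75]. The guard against negative inputs is an added assumption, since a negative value would produce negative spreads.

```python
from typing import List, NamedTuple, Tuple


class TFN(NamedTuple):
    """Triangular fuzzy number (center; left spread, right spread)_T."""
    center: float
    left: float
    right: float

    def support(self) -> Tuple[float, float]:
        """Interval [center - left, center + right] of possible values."""
        return (self.center - self.left, self.center + self.right)


def to_tfn(x: float, left_ratio: float = 0.2, right_ratio: float = 0.15) -> TFN:
    """Fuzzify a crisp value x as (x; left_ratio * x, right_ratio * x)_T."""
    if x < 0:
        # Assumption: values are nonnegative so that both spreads are valid.
        raise ValueError("spreads proportional to x require x >= 0")
    return TFN(x, left_ratio * x, right_ratio * x)


# Placeholder series of 252 monthly values (January 2000 to December 2020);
# these numbers are synthetic and only stand in for the real data.
monthly_values: List[float] = [14.0 + 0.01 * t for t in range(252)]
fuzzy_series = [to_tfn(x) for x in monthly_values]

# Chronological split: the first 200 observations and the remaining 52.
train, test = fuzzy_series[:200], fuzzy_series[200:]

example = to_tfn(10.0)
print(example, example.support())
print(len(train), len(test))
```

Keeping the center and both spreads in one record makes it straightforward to pass each component separately to an estimator, which is how a stage-wise procedure would consume the data.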

In this example, 200 observations were used to estimate the parameters. A further 52 observations were used to fit the model. The goodness-of-fit values that correspond to the FNPTSM, FSPTSM and FATSM are given in Table 4. The results reveal that the FNPTSM outperforms the FSPTSM and FATSM for the global land–ocean temperature dataset. Note that the best results of the proposed FNPTSM are obtained based on the Gaussian kernel and the best results of both the FSPTSM and FATSM are given when implementing the triweight kernel. The fuzzy data, along with the corresponding estimations related to the FNPTSM (based on the Gaussian kernel) as well as of the FSPTSM and FATSM (based on the triweight kernel), are visualized in Figure 3. In comparison to the FSPTSM and FATSM, the values predicted by the FNPTSM are closer to the fuzzy observations, which reveals that the proposed FNPTSM performs better for the global land–ocean temperature dataset.

5. Conclusions

Nonparametric statistical inference deals with situations where the functional relationships of the involved distribution functions are unspecified. In this regard, nonparametric time series models have been widely used to identify the “best fit” curve for a given time series of data. However, there are numerous situations where the available data are fuzzy rather than exact. In this paper, a nonparametric kernel-based time series model that relies on fuzzy data was introduced. The Nadaraya–Watson estimator was utilized to provide a fuzzy time series model within a three-stage procedure. Some popular goodness-of-fit measures were implemented to investigate the performance of the fuzzy nonparametric time series model based on different kernel functions. The effectiveness and feasibility of the proposed time series model were also compared with the performance of existing time series models based on fuzzy data. Considering three common kernel functions (Gaussian, Epanechnikov, and triweight), the results indicated the superior performance of our proposed method in comparison to previous approaches. In addition to the performance aspect, the handling of the new nonparametric kernel-based time series model is much simpler than that of the previous methods, as we implemented an estimation procedure that is divided into three independent stages. In addition, our proposed time series model can be employed for arbitrary shapes of $LR$ fuzzy numbers. However, the model can be applied only to $LR$ fuzzy numbers, and thus a promising future direction is to develop a more general methodology that can handle arbitrary fuzzy numbers. Future studies could also focus on extending our approach to cases where the underlying time series data contain outliers. Finally, extending the proposed methodology to other nonlinear models, such as wavelet-based or neural network-based time series models, is a further idea for future research.

Author Contributions

Conceptualization, G.H.; methodology, G.H., A.J. and N.C.; software, G.H.; validation, G.H.; formal analysis, A.J.; investigation, A.J. and N.C.; writing—original draft, G.H.; writing—review & editing, A.J. and N.C.; supervision, A.J. and N.C.; project administration, N.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the respective references as mentioned in the main text.

Acknowledgments

The authors thank the three anonymous reviewers for their valuable feedback and suggestions, which were important and helpful for improving the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods; Springer: New York, NY, USA, 2009.
2. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications; Springer: London, UK, 2017.
3. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1976.
4. Chukhrova, N.; Johannssen, A. State Space Models and the Kalman Filter in Stochastic Claims Reserving: Forecasting, Filtering and Smoothing. Risks 2017, 5, 30.
5. Chukhrova, N.; Johannssen, A. Stochastic Claims Reserving Methods with State Space Representations—A Review. Risks 2021, 9, 198.
6. Palma, W. Time Series Analysis; Wiley: Hoboken, NJ, USA, 2016.
7. Tong, H. Nonlinear Time Series: A Dynamical System Approach; Oxford University Press: Oxford, UK, 1990.
8. Woodward, W.A.; Gray, H.L.; Elliott, A.C. Applied Time Series Analysis; CRC Press: Boca Raton, FL, USA, 2012.
9. Yang, X.; Liu, B. Uncertain time series analysis with imprecise observations. Fuzzy Optim. Decis. Mak. 2019, 18, 263–278.
10. Chukhrova, N.; Johannssen, A. Generalized One-Tailed Hypergeometric Test with Applications in Statistical Quality Control. J. Qual. Technol. 2020, 52, 14–39.
11. Chukhrova, N.; Johannssen, A. Non-parametric fuzzy hypothesis testing for quantiles applied to clinical characteristics of COVID-19. Int. J. Intell. Syst. 2021, 36, 2922–2963.
12. Chukhrova, N.; Johannssen, A. Employing fuzzy hypothesis testing to improve modified p charts for monitoring the process fraction nonconforming. Inf. Sci. 2023, 633, 141–157.
13. Song, Q.; Chissom, B.S. Fuzzy time series and its models. Fuzzy Sets Syst. 1993, 54, 269–277.
14. Sun, C.; Li, H. Parallel fuzzy relation matrix factorization towards algebraic formulation, universal approximation and interpretability of MIMO hierarchical fuzzy systems. Fuzzy Sets Syst. 2022, 450, 68–86.
15. Sun, C.; Li, H. Construction of universal approximations for multi-input single-output Hierarchical Fuzzy Systems. IEEE Trans. Fuzzy Syst. 2023; in press.
16. Yu, H.K. Weighted fuzzy time-series models for TAIEX forecasting. Phys. A Stat. Mech. Appl. 2005, 349, 609–624.
17. Chen, S.M.; Tanuwijaya, K. Multivariate fuzzy forecasting based on fuzzy time series and automatic clustering techniques. Expert Syst. Appl. 2011, 38, 10594–10605.
18. Huang, Y.L.; Horng, S.J.; He, M.; Fan, P.; Kao, T.W.; Khan, M.K.; Lai, J.L.; Kuo, I.H. A hybrid forecasting model for enrollments based on aggregated fuzzy time series and particle swarm optimization. Expert Syst. Appl. 2011, 38, 8014–8023.
19. Li, S.T.; Kuo, S.C.; Cheng, Y.C.; Chen, C.C. Deterministic vector long-term forecasting for fuzzy time series. Fuzzy Sets Syst. 2010, 161, 1852–1870.
20. Peng, H.W.; Wu, S.F.; Wei, C.C.; Lee, S.J. Time series forecasting with a neuro-fuzzy modeling scheme. Appl. Soft Comput. 2015, 32, 481–493.
21. Duru, O.; Bulut, E. A nonlinear clustering method for fuzzy time series: Histogram damping partition under the optimized cluster paradox. Appl. Soft Comput. 2014, 24, 742–748.
22. Bose, M.; Mali, K. A novel data partitioning and rule selection technique for modeling high-order fuzzy time series. Appl. Soft Comput. 2018, 63, 87–96.
23. Uslu, V.R.; Bas, E.; Yolcu, U.; Egrioglu, E. A fuzzy time series approach based on weights determined by the number of recurrences of fuzzy relations. Swarm Evol. Comput. 2014, 15, 19–26.
24. Bulut, E. Modeling seasonality using the fuzzy integrated logical forecasting (FILF) approach. Expert Syst. Appl. 2014, 41, 1806–1812.
25. Chen, M.Y.; Chen, B.T. Online fuzzy time series analysis based on entropy discretization and a fast Fourier transform. Appl. Soft Comput. 2014, 14, 156–166.
26. Singh, P.; Borah, B. Forecasting stock index price based on M-factors fuzzy time series and particle swarm optimization. Int. J. Approx. Reason. 2014, 55, 812–833.
27. Chen, S.M.; Chen, S.W. Fuzzy forecasting based on two-factors second-order fuzzy-trend logical relationship groups and the probabilities of trends of fuzzy logical relationships. IEEE Trans. Cyber. 2015, 45, 405–417.
28. Cheng, S.H.; Chen, S.M.; Jian, W.S. Fuzzy time series forecasting based on fuzzy logical relationships and similarity measures. Inf. Sci. 2016, 327, 272–287.
29. Sadaei, H.J.; Enayatifar, R.; Abdullah, A.H.; Gani, A. Short-term load forecasting using a hybrid model with a refined exponentially weighted fuzzy time series and an improved harmony search. Int. J. Electr. Power Energy Syst. 2014, 62, 118–129.
30. Ye, F.; Zhang, L.; Zhang, D.; Fujita, H.; Gong, Z. A novel forecasting method based on multi-order fuzzy time series and technical analysis. Inf. Sci. 2016, 367–368, 41–57.
31. Efendi, R.; Ismail, Z.; Deris, M.M. A new linguistic out-sample approach of fuzzy time series for daily forecasting of Malaysian electricity load demand. Appl. Soft Comput. 2015, 28, 422–430.
32. Talarposhtia, F.M.; Hossein, J.S.; Rasul, E.; Guimaraesc, F.G.; Mahmud, M.; Eslami, T. Stock market forecasting by using a hybrid model of exponential fuzzy time series. Int. J. Approx. Reason. 2016, 70, 79–98.
33. Wang, W.; Liu, X. Fuzzy forecasting based on automatic clustering and axiomatic fuzzy set classification. Inf. Sci. 2015, 294, 78–94.
34. Sadaei, H.J.; Enayatifar, R.; Lee, M.H.; Mahmud, M. A hybrid model based on differential fuzzy logic relationships and imperialist competitive algorithm for stock market forecasting. Appl. Soft Comput. 2016, 40, 132–149.
35. Aladag, C.H.; Yolcu, U.; Egrioglu, E. A high order fuzzy time series forecasting model based on adaptive expectation and artificial neural network. Math. Comput. Simul. 2010, 81, 875–882.
36. Chen, M.Y. A high-order fuzzy time series forecasting model for internet stock trading. Future Gener. Comput. Syst. 2014, 37, 461–467.
37. Egrioglu, E.; Aladag, C.H.; Yolcu, U. Fuzzy time series forecasting with a novel hybrid approach combining fuzzy c-means and neural networks. Expert Syst. Appl. 2013, 40, 854–857.
38. Yolcu, O.C.; Yolcu, U.; Egrioglu, E.; Aladag, C.H. High order fuzzy timeseries forecasting method based on an intersection operation. Appl. Math. Model. 2016, 40, 8750–8765.
39. Singh, P.; Borah, B. High-order fuzzy-neuro expert system for daily temperature forecasting. Knowl. Based Syst. 2013, 46, 12–21.
40. Yolcu, O.C.; Lam, H.K. A combined robust fuzzy time series method for prediction of time series. Neurocomputing 2017, 247, 87–101.
41. Yolcu, O.C.; Alpaslan, F. Prediction of TAIEX based on hybrid fuzzy time series model with single optimization process. Appl. Soft Comput. 2018, 66, 18–33.
42. Aladag, C.H. Using multiplicative neuron model to establish fuzzy logic relationships. Expert Syst. Appl. 2013, 40, 850–853.
43. Gaxiola, F.; Melin, P.; Valdez, F.; Castillo, O. Interval type-2 fuzzy weight adjustment for back propagation neural networks with application in time series prediction. Inf. Sci. 2014, 260, 1–14.
44. Wei, L.Y. A hybrid ANFIS model based on empirical mode decomposition for stock time series forecasting. Appl. Soft Comput. 2016, 42, 368–376.
45. Sadaei, H.J.; Enayatifar, R.; Guimaraes, F.G.; Mahmud, M.; Alzamil, Z.A. Combining ARFIMA models and fuzzy time series for the forecast of long memory time series. Neurocomputing 2016, 175, 782–796.
46. Torbat, S.; Khashei, M.; Bijari, M. A hybrid probabilistic fuzzy ARIMA model for consumption forecasting in commodity markets. Econ. Anal. Policy 2018, 58, 22–31.
47. Kocak, C. ARMA(p, q)-type high order fuzzy time series forecast method based on fuzzy logic relations. Appl. Soft Comput. 2017, 58, 92–103.
48. Abhishekh, S.S.G.; Singh, S.R. A score function-based method of forecasting using intuitionistic fuzzy time series. New Math. Nat. Comput. 2018, 14, 91–111.
49. Cheng, C.H.; Chen, C.H. Fuzzy time series model based on weighted association rule for financial market forecasting. Expert Syst. 2018, 35, 23–30.
50. Guan, H.; Dai, Z.; Zhao, A.; He, J. A novel stock forecasting model based on High-order-fuzzy-fluctuation trends and back propagation neural network. PLoS ONE 2018, 13, e0192366.
51. Gupta, C.; Jain, G.; Tayal, D.K.; Castillo, O. ClusFuDE: Forecasting low dimensional numerical data using an improved method based on automatic clustering, fuzzy relationships and differential evolution. Eng. Appl. Artif. Intell. 2018, 71, 175–189.
52. Gautam, S.S.; Singh, S. A refined method of forecasting based on high-order intuitionistic fuzzy time series data. Prog. Artif. Intell. 2018, 7, 339–350.
53. Li, R. Water quality forecasting of Haihe River based on improved fuzzy time series model. Desal. Water Treat. 2018, 106, 285–291.
54. Novak, V. Detection of structural breaks in time series using fuzzy techniques. Int. J. Fuzzy Logic Intell. Syst. 2018, 18, 1–12.
55. Phan, T.T.H.; Big, A.; Caillault, E.P. A new fuzzy logic-based similarity measure applied to large gap imputation for uncorrelated multivariate time series. Appl. Comput. Intel. Soft Comput. 2018, 2018, 1–15.
56. Rahim, N.F.; Othman, M.; Sokkalingam, R.; Kadir, E.A. Forecasting crude palm oil prices using fuzzy rule-based time series method. IEEE Access 2018, 6, 32216–32224.
57. Chukhrova, N.; Johannssen, A. Fuzzy regression analysis: Systematic review and bibliography. Appl. Soft Comput. 2019, 84, 105708.
58. Akbari, M.G.; Hesamian, G. Linear model with exact inputs and interval-valued fuzzy outputs. IEEE Trans. Fuzzy Syst. 2017, 26, 518–530.
59. Hesamian, G.; Akbari, M.G. Semi-parametric partially logistic regression model with exact inputs and intuitionistic fuzzy outputs. Appl. Soft Comput. 2017, 58, 517–526.
60. Hesamian, G.; Akbari, M.G.; Asadollahi, M. Fuzzy semi-parametric partially linear model with fuzzy inputs and fuzzy outputs. Expert Syst. Appl. 2017, 71, 230–239.
61. Akbari, M.G.; Hesamian, G. Elastic net oriented to fuzzy semiparametric regression model with fuzzy explanatory variables and fuzzy responses. IEEE Trans. Fuzzy Syst. 2019, 27, 2433–2442.
62. Hesamian, G.; Akbari, M.G. A fuzzy additive regression model with exact predictors and fuzzy responses. Appl. Soft Comput. 2020, 95, 106507.
63. Hesamian, G.; Torkian, F.; Johannssen, A.; Chukhrova, N. A fuzzy nonparametric regression model based on an extended center and range method. J. Comput. Appl. Math. 2023, 2023, 115377.
64. Viertl, R. Statistical Methods for Fuzzy Data; Wiley: New York, NY, USA, 2011.
65. Buckley, J.J. Fuzzy Statistics, Studies in Fuzziness and Soft Computing; Springer: Berlin, Germany, 2006.
66. Hesamian, G.; Akbari, M.G. A semi-parametric model for time series based on fuzzy data. IEEE Trans. Fuzzy Syst. 2018, 26, 2953–2966.
67. Zarei, R.; Akbari, M.G.; Chachi, J. Modeling autoregressive fuzzy time series data based on semi-parametric methods. Soft Comput. 2020, 24, 7295–7304.
68. Hesamian, G.; Torkian, F.; Yarmohammadi, M. A fuzzy nonparametric time series model based on fuzzy data. Iran. J. Fuzzy Syst. 2022, 19, 61–72.
69. Golub, G.H.; Heath, M.; Wahba, G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979, 21, 215–223.
70. Craven, P.; Wahba, G. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 1979, 31, 377–403.
71. Chukhrova, N.; Johannssen, A. Fuzzy hypothesis testing: Systematic review and bibliography. Appl. Soft Comput. 2021, 106, 107331.
72. Lee, K.H. First Course on Fuzzy Theory and Applications; Springer: Berlin, Germany, 2005.
73. Coppi, R.; D’Urso, P.; Giordani, P.; Santoro, A. Least squares estimation of a linear regression model with LR-fuzzy response. Comput. Stat. Data Anal. 2006, 51, 267–286.
74. Grzegorzewski, P. Testing statistical hypotheses with vague data. Fuzzy Sets Syst. 2000, 11, 501–510.
75. Mills, T.C. Applied Time Series Analysis: A Practical Guide to Modelling and Forecasting; Academic Press: London, UK, 2019.

Figure 1. Plot of $x - l$, $x$, $x - r$ and $\hat{x} - \hat{r}$, $\hat{x}$, $\hat{x} + \hat{r}$ for FNPTSM, FSPTSM and FATSM (based on the triweight kernel) in Example 2.

Figure 2. Time series on global temperature in Example 3.

Figure 3. Plot of $x - l$, $x$, $x - r$ and $\hat{x} - \hat{r}$, $\hat{x}$, $\hat{x} + \hat{r}$ for FNPTSM, FSPTSM and FATSM in Example 3.

Table 1. The mean performance measures of the FNPTSM, FSPTSM and FATSM corresponding to some specific kernels in Example 1.

Method	Kernel	Results	Goodness-of-Fit Criteria
FNPTSM	Gaussian		$\bar{MFE} = 1.0452$
		$\bar{\hat{h}} = 0.45$	$\bar{MASE} = 1.6089$
		$\bar{\hat{h^{l}}} = 0.04$	$\bar{BIA} = 0.9996$
		$\bar{\hat{h^{r}}} = 0.22$	$\bar{MSM} = 0.4167$
	Epanechnikov		$\bar{MFE} = 1.1728$
		$\bar{\hat{h}} = 0.66$	$\bar{MASE} = 1.6478$
		$\bar{\hat{h^{l}}} = 0.05$	$\bar{BIA} = 0.9992$
		$\bar{\hat{h^{r}}} = 0.39$	$\bar{MSM} = 0.3953$
	triweight		$\bar{MFE} = 1.2482$
		$\bar{\hat{h}} = 1.89$	$\bar{MASE} = 1.6339$
		$\bar{\hat{h^{l}}} = 0.08$	$\bar{BIA} = 0.9991$
		$\bar{\hat{h^{r}}} = 0.54$	$\bar{MSM} = 0.3721$
FSPTSM	Gaussian	${\bar{h}}_{opt} = 0.07$	$\bar{MFE} = 8.9728$
		${\hat{θ}}_{1} = 0.5575$	$\bar{MASE} = 4.2652$
		${\hat{θ}}_{2} = - 0.0956$	$\bar{BIA} = 0.9536$
		${\hat{θ}}_{3} = - 0.1247$	$\bar{MSM} = 0.2207$
	Epanechnikov	${\bar{h}}_{opt} = 0.02$	$\bar{MFE} = 5.2530$
		${\hat{θ}}_{1} = - 0.6424$	$\bar{MASE} = 4.5383$
		${\hat{θ}}_{2} = - 0.5168$	$\bar{BIA} = 0.9603$
		${\hat{θ}}_{3} = - 0.4258$	$\bar{MSM} = 0.2920$
	triweight	${\bar{h}}_{opt} = 0.13$	$\bar{MFE} = 11.5231$
		${\hat{θ}}_{1} = 0.3757$	$\bar{MASE} = 3.8643$
		${\hat{θ}}_{2} = - 0.1576$	$\bar{BIA} = 0.9738$
		${\hat{θ}}_{3} = - 0.2568$	$\bar{MSM} = 0.2133$
FATSM	Gaussian		$\bar{MFE} = 1.392$
		${\bar{\hat{h}}}_{1} = 1.8$	$\bar{MASE} = 18.028$
		${\bar{\hat{h}}}_{2} = 0.7$	$\bar{BIA} = 0.972$
		${\bar{\hat{h}}}_{3} = 0.03$	$\bar{MSM} = 0.321$
	Epanechnikov		$\bar{MFE} = 1.353$
		${\bar{\hat{h}}}_{1} = 1.75$	$\bar{MASE} = 17.548$
		${\bar{\hat{h}}}_{2} = 1.20$	$\bar{BIA} = 0.976$
		${\bar{\hat{h}}}_{3} = 0.6$	$\bar{MSM} = 0.339$
	triweight		$\bar{MFE} = 1.348$
		${\bar{\hat{h}}}_{1} = 2$	$\bar{MASE} = 16.459$
		${\bar{\hat{h}}}_{2} = 1.5$	$\bar{BIA} = 0.979$
		${\bar{\hat{h}}}_{3} = 0.5$	$\bar{MSM} = 0.349$

Table 2. Fuzzy time series data in Example 2.

t	${\tilde{x}}_{t}$	t	${\tilde{x}}_{t}$
1	${(1.7337; 0.8051)}_{T}$	15	${(2.9145; 1.1507)}_{T}$
2	${(2.3302; 0.9228)}_{T}$	16	${(2.6085; 1.1335)}_{T}$
3	${(1.3199; 0.7742)}_{T}$	17	${(3.0432; 0.4489)}_{T}$
4	${(5.0507; 0.8948)}_{T}$	18	${(6.8010; 0.9588)}_{T}$
5	${(1.4206; 1.0540)}_{T}$	19	${(4.9351; 0.8115)}_{T}$
6	${(4.0273; 0.9331)}_{T}$	20	${(3.5672; 0.6054)}_{T}$
7	${(2.8624; 1.0480)}_{T}$	21	${(3.8828; 1.1579)}_{T}$
8	${(4.7107; 1.0647)}_{T}$	22	${(0.5183; 0.9652)}_{T}$
9	${(4.1098; 1.1028)}_{T}$	23	${(3.6846; 0.8175)}_{T}$
10	${(4.4843; 1.0881)}_{T}$	24	${(3.5117; 0.4248)}_{T}$
11	${(1.3249; 1.0064)}_{T}$	25	${(2.8294; 0.7956)}_{T}$
12	${(3.2249; 0.5503)}_{T}$	26	${(2.3836; 1.0101)}_{T}$
13	${(3.2916; 0.5049)}_{T}$	27	${(3.9454; 0.7558)}_{T}$
14	${(3.5508; 0.9268)}_{T}$	28	${(3.6012; 1.0266)}_{T}$

Table 3. The performance measures of the FNPTSM, FSPTSM and FATSM corresponding to some specific kernels in Example 2.

Method	Kernel	Results	Goodness-of-Fit Criteria
FNPTSM	Gaussian	$\hat{p} = 2$	$MFE = 0.1127$
		$\hat{h} = 0.64$	$MASE = 0.4030$
		$\hat{h^{l}} = 0.22$	$BIA = 0.9629$
		$\hat{h^{r}} = 0.22$	$MSM = 0.3791$
	Epanechnikov	$\hat{p} = 2$	$MFE = 0.1133$
		$\hat{h} = 1.35$	$MASE = 0.4016$
		$\hat{h^{l}} = 0.09$	$BIA = 0.9639$
		$\hat{h^{r}} = 0.09$	$MSM = 0.3836$
	triweight	$\hat{p} = 3$	$MFE = 0.1087$
		$\hat{h} = 1.44$	$MASE = 0.3722$
		$\hat{h^{l}} = 0.65$	$BIA = 0.9626$
		$\hat{h^{r}} = 0.65$	$MSM = 0.4439$
FSPTSM	Gaussian	$\hat{h} = 0.13, \hat{p} = 3$	$MFE = 3.6124$
		${\hat{θ}}_{1} = - 0.2305$	$MASE = 2.1452$
		${\hat{θ}}_{2} = - 0.0613$	$BIA = 0.9666$
		${\hat{θ}}_{3} = - 0.4164$	$MSM = 0.0061$
	Epanechnikov	$\hat{h} = 0.16$	$MFE = 1.5749$
		$\hat{p} = 1$	$MASE = 1.3135$
		${\hat{θ}}_{1} = - 0.3192$	$BIA = 0.9438$
			$MSM = 0.1080$
	triweight	$\hat{h} = 0.23$	$MFE = 1.6252$
		$\hat{p} = 1$	$MASE = 1.3371$
		${\hat{θ}}_{1} = - 0.3437$	$BIA = 0.9443$
			$MSM = 0.1205$
FATSM	Gaussian	$\hat{p} = 2$	$MFE = 0.327$
		${\hat{h}}_{1} = 0.2$	$MASE = 0.943$
		${\hat{h}}_{2} = 0.3$	$BIA = 0.954$
			$MSM = 0.341$
	Epanechnikov	$\hat{p} = 2$	$MFE = 0.386$
		${\hat{h}}_{1} = 0.3$	$MASE = 1.302$
		${\hat{h}}_{2} = 1.5$	$BIA = 0.948$
			$MSM = 0.3208$
	triweight	$\hat{p} = 3$	$MFE = 0.254$
		${\hat{h}}_{1} = 0.5$	$MASE = 0.733$
		${\hat{h}}_{2} = 0.1$	$BIA = 0.958$
		${\hat{h}}_{3} = 0.05$	$MSM = 0.358$

Table 4. The performance measures of the FNPTSM, FSPTSM and FATSM corresponding to some specific kernels in Example 3.

Method	Kernel	Results	Goodness-of-Fit Criteria
FNPTSM	Gaussian	$\hat{p} = 2$	$MFE = 0.0061$
		$\hat{h} = 0.030$	$MASE = 14.6082$
		$\hat{h^{l}} = 0.006$	$BIA = 0.9783$
		$\hat{h^{r}} = 0.005$	$MSM = 0.3781$
	Epanechnikov	$\hat{p} = 2$	$MFE = 0.0061$
		$\hat{h} = 0.051$	$MASE = 14.6360$
		$\hat{h^{l}} = 0.013$	$BIA = 0.9782$
		$\hat{h^{r}} = 0.007$	$MSM = 0.3787$
	triweight	$\hat{p} = 2$	$MFE = 0.0067$
		$\hat{h} = 0.032$	$MASE = 14.8885$
		$\hat{h^{l}} = 0.011$	$BIA = 0.9761$
		$\hat{h^{r}} = 0.010$	$MSM = 0.3921$
FSPTSM	Gaussian	$\hat{h} = 0.05, \hat{p} = 3$	$MFE = 0.0153$
		${\hat{θ}}_{1} = 0.3912$	$MASE = 20.4387$
		${\hat{θ}}_{2} = 0.2045$	$BIA = 0.9462$
		${\hat{θ}}_{3} = - 0.1916$	$MSM = 0.3714$
	Epanechnikov	$\hat{h} = 0.10, \hat{p} = 3$	$MFE = 0.0148$
		${\hat{θ}}_{1} = 0.4011$	$MASE = 20.0997$
		${\hat{θ}}_{2} = 0.2095$	$BIA = 0.9483$
		${\hat{θ}}_{3} = - 0.1894$	$MSM = 0.3711$
	triweight	$\hat{h} = 0.11, \hat{p} = 3$	$MFE = 0.0157$
		${\hat{θ}}_{1} = 0.3643$	$MASE = 19.7872$
		${\hat{θ}}_{2} = 0.1853$	$BIA = 0.9457$
		${\hat{θ}}_{3} = - 0.2175$	$MSM = 0.3954$
FATSM	Gaussian	$\hat{p} = 3$	$MFE = 0.011$
		${\hat{h}}_{1} = 1.5$	$MASE = 19.625$
		${\hat{h}}_{2} = 1.75$	$BIA = 0.964$
		${\hat{h}}_{3} = 0.05$	$MSM = 0.310$
	Epanechnikov	$\hat{p} = 3$	$MFE = 0.010$
		${\hat{h}}_{1} = 1.25$	$MASE = 17.625$
		${\hat{h}}_{2} = 1.73$	$BIA = 0.966$
		${\hat{h}}_{3} = 0.1$	$MSM = 0.33$
	triweight	$\hat{p} = 3$	$MFE = 0.0098$
		${\hat{h}}_{1} = 1.5$	$MASE = 16.745$
		${\hat{h}}_{2} = 1.77$	$BIA = 0.960$
		${\hat{h}}_{3} = 0.13$	$MSM = 0.345$

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
