Statistical Inference of Dynamic Conditional Generalized Pareto Distribution with Weather and Air Quality Factors

Huang, Chunli; Zhao, Xu; Cheng, Weihu; Ji, Qingqing; Duan, Qiao; Han, Yufei

doi:10.3390/math10091433


by Chunli Huang ¹, Xu Zhao ¹,*, Weihu Cheng ¹, Qingqing Ji ²,³, Qiao Duan ⁴ and Yufei Han ⁵

¹ Faculty of Science, Beijing University of Technology, Beijing 100124, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

⁴ Faculty of Humanities and Social Sciences, Beijing University of Technology, Beijing 100124, China

⁵ School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China

\* Author to whom correspondence should be addressed.

Mathematics 2022, 10(9), 1433; https://doi.org/10.3390/math10091433

Submission received: 14 March 2022 / Revised: 19 April 2022 / Accepted: 21 April 2022 / Published: 24 April 2022

(This article belongs to the Special Issue Computational Statistics and Data Analysis)


Abstract: Air pollution is a major global problem, closely related to economic and social development and ecological environment construction. Air pollution data for most regions of China correlate closely with time and season and are affected by multidimensional factors such as meteorology and air quality. In contrast with classical peaks-over-threshold modeling approaches, we use a deep learning technique and three new dynamic conditional generalized Pareto distribution (DCP) models with weather and air quality factors to fit the time dependence of air pollutant concentrations, and we make statistical inferences about their application in air quality analysis. Specifically, in the three proposed DCP models, a dynamic autoregressive exponential function mechanism is applied to the time-varying scale parameter and tail index of the conditional generalized Pareto distribution, and a sufficiently high threshold is chosen using two threshold selection procedures. The probabilistic properties of the DCP model and the statistical properties of the maximum likelihood estimator (MLE) are investigated, and simulations show the stability and sensitivity of the MLE. The three proposed models are applied to fit the $\mathrm{PM}_{2.5}$ time series in Beijing from 2015 to 2021. Real data are used to illustrate the advantages of the DCP, especially in comparison with the estimation volatility of GARCH and in terms of the AIC or BIC criteria. The DCP model involving both weather and air quality factors performs better than the other two models with weather factors or air quality factors alone. Finally, a prediction model based on long short-term memory (LSTM) is used to predict $\mathrm{PM}_{2.5}$ concentration, achieving ideal results.

Keywords: generalized Pareto distribution; peaks over threshold; dynamic conditional autoregressive modeling; threshold selection; long short-term memory

MSC: 62P12

1. Introduction

Air pollution, closely related to economic and social development as well as ecological environment construction, is a global problem that damages human living environments. In recent years, the Chinese government has attached great importance to the prevention and control of air pollution. The World Air Quality Report 2021 released by the Swiss company IQAir pointed out that air quality in China continued to improve in 2021. Compared to 2020, $\mathrm{PM}_{2.5}$ concentrations decreased in 66% of Chinese cities [1]. However, China still faces environmental challenges. Air pollution consists mainly of harmful gases and particulate matter released into the atmosphere by natural or human activities. When their concentration far exceeds the self-purification capacity of the atmosphere, the composition of the atmosphere changes, endangering human health and living environments. Smog, as a seriously harmful air pollutant, has received growing attention. Severe smog levels pose a huge threat to China’s public health [2,3,4,5]. Short-term exposure to air pollution causes cough, dyspnea, headache, fatigue and other symptoms, while long-term exposure leads to respiratory diseases, cardiovascular damage, nervous system damage and other diseases, and may even lead to birth defects and death [6,7,8,9]. Climate warming, sea-level rise, acid rain, the hole in the ozone layer and particulate pollution directly highlight the environmental problems caused by air pollution, which harm human survival and development. To improve air quality and human living environments, air pollution has become a key topic for researchers, and monitoring, assessment, prediction and prevention have become important research directions. Smog is also closely linked to China’s economic development [10,11,12,13]. It is necessary to make full use of multidimensional big data and to exploit the advantages of statistics and artificial intelligence technology. By virtue of interdisciplinary development, researchers have been able to develop statistical modeling theory and deep learning technology for accurate prediction and effective control of urban air quality. As a result, a solid theoretical foundation and effective technical support can be provided for improving ecological and environmental conservation.

Fine particulate matter ($\mathrm{PM}_{2.5}$) is the most commonly studied pollutant in research on pollutant concentration, and the higher its concentration in the air, the more serious the air pollution [14,15,16]. From a statistical point of view, pollutant concentration prediction has become an important research direction in air pollution forecasting and prevention. Existing research on air pollutant concentration mainly focuses on sources, concentration distributions, fluctuations, influencing factors and adverse effects on human health. Quantitative prediction of pollutant concentration is the most common statistical approach to air pollution problems, and multivariate regression, cluster analysis and principal component analysis are the most frequently used statistical models. Chen Songxi, an academician of the Chinese Academy of Sciences, applied non-parametric statistics to national air pollution assessment and prevention research and proposed a method for adjusting spatio-temporal meteorological factors to remove the meteorological confounding effect in atmospheric environmental monitoring, providing a scientific method for accurately measuring pollutant discharge and evaluating air pollution control [17,18,19,20,21,22]. Wang et al. (2020) [23] established a spatio-temporal $\mathrm{O}_{3}$ pollution land use regression (LUR) model suitable for large cities, based on parametric, non-parametric and semi-parametric classical statistical algorithms combined with meteorological factors, which can monitor $\mathrm{O}_{3}$ concentration with high spatio-temporal resolution.

To improve prediction accuracy, classical extreme value theory (EVT) has attracted increasing attention. Researchers are less concerned with fine weather than with observations of pollutant concentrations that exceed a high threshold. Pickands (1975) [24] pointed out that observations above a certain threshold can be approximated well by the generalized Pareto distribution (GPD). As a branch of EVT, the GPD plays an important role in many fields. In hydrometeorology, the GPD model is used to analyze and forecast natural phenomena such as floods, wind and rainfall [25,26]. In finance, stock returns are non-normal and heavy-tailed and can be fitted well by a GPD [27,28]. In insurance, losses are generally non-negative with a heavy tail, and a GPD is usually used to predict the maximum loss [29,30].

Due to the strong time correlation of observations, a traditional model with fixed parameters cannot adequately fit real time-series observations. To solve this problem, many researchers have studied dynamic extreme value distribution models and dynamic over-threshold GPD models. Using the autoregressive mechanism of the GARCH model, Zhao et al. (2018) [31] established an autoregressive conditional Fréchet model with time-dependent parameters (type II GEV) for the sequence of daily maximum stock returns. They derived the maximum likelihood estimator of the model parameters and studied its large-sample properties. Chavez-Demoulin et al. (2014) [32] applied a Bayesian method to update the time-varying GPD parameters for the UBS stock price, which was a non-parametric method applied to a POT–GPD model. Kelly and Jiang (2014) [33] built a dynamic tail model with POT–GPD for panel data and measured the tail risk of the S&P 500 index. Massacci (2017) [34] studied the time-dependent dynamic parameter estimation of a GPD through a score-based approach, in order to accurately estimate the tail index from U.S. size-sorted decile stock portfolios. Shen et al. (2020) [27] established an autoregressive conditional Pareto (ACP) distribution model via an exponential function. The maximum likelihood estimator of the parameters was given, and its properties were studied. Based on the parameter estimation, they applied the ACP model to the Dow Jones Industrial Average and the S&P 500 index. Deng et al. (2020) [35] applied a dynamic model to air quality management, taking time and meteorological factors into consideration and establishing a dynamic conditional autoregressive Weibull distribution model (type III GEV) for the maximum daily pollutant concentrations. The probabilistic properties of the autoregressive model were investigated in their study. As is well known, it is difficult to fit the $\mathrm{PM}_{2.5}$ time series accurately, so model selection and statistical inference are the most important and common challenges for real data applications.

Threshold selection is a critical issue for fitting the autoregressive conditional generalized Pareto distribution model. In practice, the threshold should be chosen in advance. If the threshold is too large, the sample size of observations exceeding the threshold will be too small, which may increase the variance of the parameter estimation and affect the estimation effect. If the threshold is too small, the sample size can be increased, but the estimator is prone to bias. Choulakian and Stephens (2001) [36] transformed the threshold selection into the goodness-of-fit test of the model. Through the selection method, an appropriate threshold was chosen, allowing the exceedance to follow the GPD, and the threshold selection was carried out at the same time as testing the model. Bermudez et al. (2001) [37] used a Bayesian predictive approach to the peaks-over-threshold (POT) method, which can also be applied to small-sample situations. Bader et al. (2018) [38] proposed an automated threshold selection procedure based on a sequence of goodness-of-fit tests, and attained automatic threshold selection by applying stopping rules, which transform the results of ordered, sequentially tested hypotheses to control the false discovery rate. Yang et al. (2018) [39] developed an empirical threshold selection method based on the relationship between eigenvalues and thresholds. Schneider et al. (2021) [40] proposed selecting the threshold by minimizing the asymptotic mean squared error of the Hill estimator.

With the continuous development of artificial intelligence and machine learning, an increasing number of scholars have applied traditional machine learning methods to statistical prediction models in recent years and achieved good results in terms of accuracy and time efficiency. Boznar et al. (1993) [41] compared prediction results based on a three-layer neural network perceptron with the results generated by a traditional atmospheric diffusion model. Neagu et al. (2002) [42] used a fuzzy neural network model to predict the concentration of nitrogen oxide pollutants, achieving good results. Esfandani and Nematzadeh (2016) [43] proposed a prediction model for air quality in Tehran based on a feedback neural network. Amarpuri et al. (2019) [44] established a convolutional long short-term memory network to predict carbon dioxide emissions and achieved ideal results. An air pollution prediction model based on LSTM is a good choice for predicting $\mathrm{PM}_{2.5}$ concentrations. García et al. (2020) [45] analyzed the concentrations of nitrogen dioxide ($\mathrm{NO}_{2}$), nitrogen oxides ($\mathrm{NO}_{X}$), particulate matter ($\mathrm{PM}_{10}$) and toluene ($\mathrm{C}_{7}\mathrm{H}_{8}$) at eight sites in Madrid (Spain) through seven regression-based machine learning models and time-series models. Sánchez-Pérez et al. (2020) [46] established a complete spatio-temporal dispersion model for pollutants through a network simulation method, to obtain the concentrations of pollutants released at any time in a given space. Sayeed et al. (2021) [47] used a generalized deep convolutional neural network (CNN) model to predict air pollutants, which could predict the hourly pollutant concentration within 7 days with relatively high accuracy.

In this paper, dynamic autoregressive mechanisms are applied, and weather and air quality factors are also involved in our model. The framework of this paper follows that of [27]. The three main contributions of this paper are as follows. First, we construct a dynamic conditional generalized Pareto distribution (DCP) with both weather and air quality factors to fit the smog observations, considering the time dependence of the scale parameter and the tail index of the GPD. Second, the threshold is chosen using a threshold selection procedure rather than by specifying a quantile. Third, the $\mathrm{PM}_{2.5}$ time series are predicted by combining the DCP model with deep learning technology.

2. DCP Model

2.1. Conditional Distribution

The cumulative distribution function of a three-parameter GPD is defined as

$$G_{μ, ξ, σ}(x) = \begin{cases} 1 - \left(1 + \frac{ξ}{σ}(x - μ)\right)^{-\frac{1}{ξ}}, & ξ \neq 0, \\ 1 - \exp\left(-\frac{x - μ}{σ}\right), & ξ = 0, \end{cases} \tag{1}$$

where $μ \in (-\infty, \infty)$ is the location parameter, $ξ \in (-\infty, \infty)$ is the shape parameter and $σ \in (0, \infty)$ is the scale parameter. In Equation (1), $μ \leq x < \infty$ when $ξ \geq 0$, and $μ < x < μ - σ / ξ$ when $ξ < 0$. In particular, the GPD is an exponential distribution when $ξ = 0$. Additionally, the two-parameter GPD should be mentioned here for its extensive use, especially in parameter estimation. The classical two-parameter GPD($ξ, σ$) is obtained by taking $μ = 0$ in (1) [48]. The GPD has the important property that if $X$ is a random variable (r.v.) distributed according to a GPD($ξ, σ$), then the r.v. $Y = X - u \mid X > u$ has a GPD($ξ, σ + ξ u$) for a threshold $u$. This means that the shape parameter is not altered by the “excess over the threshold” operation.
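The threshold-stability property can be checked numerically. The following is a minimal sketch, assuming $μ = 0$ and illustrative parameter values; the function names `gpd_cdf` and `excess_cdf` are hypothetical and not part of the paper.

```python
import math

def gpd_cdf(x, xi, sigma, mu=0.0):
    """CDF of the three-parameter GPD in Equation (1)."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:              # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def excess_cdf(y, u, xi, sigma):
    """P(X - u <= y | X > u) for X ~ GPD(xi, sigma), from the definition."""
    f_u = gpd_cdf(u, xi, sigma)
    return (gpd_cdf(u + y, xi, sigma) - f_u) / (1.0 - f_u)

# Illustrative (assumed) values: the two columns should agree.
xi, sigma, u = 0.2, 1.5, 2.0
for y in (0.5, 1.0, 3.0):
    lhs = excess_cdf(y, u, xi, sigma)
    rhs = gpd_cdf(y, xi, sigma + xi * u)
    print(f"y={y}: excess CDF={lhs:.6f}, GPD(xi, sigma + xi*u)={rhs:.6f}")
```

The excess distribution is computed directly from the conditional probability, so the agreement with GPD($ξ, σ + ξ u$) is a check of the property rather than a restatement of it.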

Let $\{Q_{t}\}_{t = 1}^{n}$ be the time sequence of the daily moving average concentration of $\mathrm{PM}_{2.5}$ at time $t$ for smog occurrences, where $n$ denotes the number of observations. Let $F_{t}(q_{t} \mid \mathcal{F}_{t - 1})$ be the conditional cumulative distribution function of $Q_{t}$, where $\mathcal{F}_{t - 1}$ denotes the information set available up to time $t - 1$. In practice, the underlying distribution $F_{t}(q_{t} \mid \mathcal{F}_{t - 1})$ of the dataset is unknown. Based on the Pickands–Balkema–de Haan theorem [24,49], the standard practice under the POT framework is to model the tail region with a GPD, provided that the original distribution is in a maximum domain of attraction. The obvious limitation is that the time characteristics of $Q_{t}$ are totally ignored, which may lose sample information and make the statistical inference inaccurate if the dataset depends strongly on time. To solve this problem, Massacci (2017) [34] and Shen et al. (2020) [27] proposed a dynamic GPD framework with the parameters ($u, ζ_{t}, α_{t}$) as follows:

$$G_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1}) = 1 - \left(1 + \frac{q_{t} - u}{α_{t}}\right)^{-ζ_{t}}, \quad q_{t} > u > 0,\ ζ_{t} > 0,\ α_{t} > 0,$$

where the parameters $α_{t}$ and $ζ_{t}$ are time varying, $ζ_{t}$ is the tail index and $u$ is the selected threshold. When the POT approach is employed, based on the Pickands–Balkema–de Haan theorem [24,49], the conditional distribution of the positive excess $F_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1})$ can be approximated by the dynamic GPD if the distribution satisfies the condition of the theorem. That is,

$$\lim_{u \to +\infty} \sup_{u \leq q_{t} < +\infty} \left| F_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1}) - G_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1}) \right| = 0,$$

where

$$\begin{aligned} F_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1}) &= P(u < Q_{t} \leq q_{t} \mid Q_{t} > u; \mathcal{F}_{t - 1}) \\ &= \frac{F_{t}(q_{t} \mid \mathcal{F}_{t - 1}) - F_{t}(u \mid \mathcal{F}_{t - 1})}{1 - F_{t}(u \mid \mathcal{F}_{t - 1})}, \quad 0 < u \leq q_{t}. \end{aligned}$$

Since $F_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1})$ is assumed to be the dynamic GPD, we can rewrite $F_{t}(q_{t} \mid \mathcal{F}_{t - 1})$ as

$$\begin{aligned} F_{t}(q_{t} \mid \mathcal{F}_{t - 1}) &= P(Q_{t} \leq q_{t} \mid \mathcal{F}_{t - 1}) \\ &= [1 - F_{t}(u \mid \mathcal{F}_{t - 1})] F_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1}) + F_{t}(u \mid \mathcal{F}_{t - 1}) \\ &= P_{t} G_{t}^{u}(q_{t} \mid \mathcal{F}_{t - 1}) + 1 - P_{t}, \end{aligned} \tag{2}$$

where $P_{t} = P(Q_{t} > u \mid \mathcal{F}_{t - 1})$.

For a given threshold value $u$, we focus on the exceedance $Q_{t} > u$ and define $Y_{t} = \max(Q_{t} - u, 0)$. Based on Equation (2), the corresponding conditional cumulative distribution function $H_{t}(y_{t} \mid \mathcal{F}_{t - 1}) = P(Y_{t} \leq y_{t} \mid \mathcal{F}_{t - 1})$ of $Y_{t}$ is as follows [27,34]:

$$\begin{aligned} H_{t}(y_{t} \mid \mathcal{F}_{t - 1}) &= I(y_{t} = 0)(1 - P_{t}) + I(y_{t} > 0) F_{t}(y_{t} + u \mid \mathcal{F}_{t - 1}) \\ &= I(y_{t} = 0)(1 - P_{t}) + I(y_{t} > 0) [P_{t} G_{t}^{u}(y_{t} + u \mid \mathcal{F}_{t - 1}) + 1 - P_{t}] \\ &= I(y_{t} = 0)(1 - P_{t}) + I(y_{t} > 0) \left[1 - P_{t} \left(1 + \frac{y_{t}}{α_{t}}\right)^{-ζ_{t}}\right], \end{aligned}$$

where $I(\cdot)$ is an indicator function. $P_{t}$ is approximated as a power law multiplied by a time-varying function slowly varying at infinity. Massacci (2017) [34] parameterized the function and obtained the following formula for $P_{t}$:

$$P_{t} = (1 + u)^{-ζ_{t}}. \tag{3}$$

From Equation (3), the cumulative distribution function $H_{t}(y_{t} \mid \mathcal{F}_{t - 1})$ of $Y_{t}$ becomes

$$H_{t}(y_{t} \mid \mathcal{F}_{t - 1}) = I(y_{t} = 0) \left[1 - (1 + u)^{-ζ_{t}}\right] + I(y_{t} > 0) \left[1 - (1 + u)^{-ζ_{t}} \left(1 + \frac{y_{t}}{α_{t}}\right)^{-ζ_{t}}\right], \quad u > 0,\ α_{t} > 0,\ ζ_{t} > 0. \tag{4}$$

Inverting this distribution function, we obtain

$$Y_{t} = α_{t} I(P_{t} > Z_{t}) \left[\left(\frac{P_{t}}{Z_{t}}\right)^{\frac{1}{ζ_{t}}} - 1\right], \tag{5}$$

where $Z_{t}$ follows a uniform distribution on (0,1) and $P_{t}$ is as given in (3). Equation (4) contains three distributions from EVT: the POT framework for the GPD, the power law for the conditional probability of $Y_{t}$ being greater than 0 and the uniform distribution of $Z_{t}$.
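Equation (5) gives a direct way to simulate one excess from a uniform draw. The following is a minimal sketch with fixed, illustrative values of $α_{t}$, $ζ_{t}$ and $u$ (assumptions, not estimates from the paper); the function names are hypothetical.

```python
import random

def exceedance_prob(u, zeta):
    """P_t = (1 + u)^(-zeta_t), Equation (3)."""
    return (1.0 + u) ** (-zeta)

def draw_excess(alpha, zeta, u, z):
    """Map a uniform draw z in (0, 1] to Y_t via Equation (5)."""
    p = exceedance_prob(u, zeta)
    if z >= p:                   # indicator I(P_t > Z_t) is zero: no exceedance
        return 0.0
    return alpha * ((p / z) ** (1.0 / zeta) - 1.0)

def cdf_h(y, alpha, zeta, u):
    """H_t(y) from Equation (4) for y >= 0."""
    p = exceedance_prob(u, zeta)
    if y == 0:
        return 1.0 - p
    return 1.0 - p * (1.0 + y / alpha) ** (-zeta)

# Compare simulated frequencies with Equation (4) under assumed values.
alpha, zeta, u = 2.0, 1.5, 1.0
rng = random.Random(1)
draws = [draw_excess(alpha, zeta, u, 1.0 - rng.random()) for _ in range(20000)]
share_zero = sum(d == 0.0 for d in draws) / len(draws)
share_le_1 = sum(d <= 1.0 for d in draws) / len(draws)
print("P(Y=0):  simulated", round(share_zero, 3), "vs", round(cdf_h(0, alpha, zeta, u), 3))
print("P(Y<=1): simulated", round(share_le_1, 3), "vs", round(cdf_h(1.0, alpha, zeta, u), 3))
```

Using `1.0 - rng.random()` keeps the uniform draw away from zero, which avoids a division by zero in the ratio $P_{t} / Z_{t}$.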

2.2. Model Specification

Shen et al. (2020) [27] assumed that $ζ_{t} / α_{t} = b$ for simplicity and modeled $α_{t}$ as follows:

$$\log α_{t} = β_{0} + β_{1} \log α_{t - 1} + β_{2} \exp(-β_{3} Y_{t - 1}),$$

where $0 \leq β_{1} \leq 1$, $β_{2} > 0$ and $β_{3} > 0$.

In this paper, after threshold selection for the dynamic conditional generalized Pareto (DCP) model, we concentrate on the autoregression of both $α_{t}$ and $ζ_{t}$, which are the critical parameters reflecting the tail behavior. We impose a dynamic structure on the time-dependent parameters ($α_{t}, ζ_{t}$) and consider weather and air quality factors.

Specifically, the DCP model with weather and air quality factors assumes the form

$$\log α_{t} = β_{0} + β_{1} \log α_{t - 1} + η_{1}(Q_{t - 1}, T_{t - 1}, H_{t - 1}, W_{t - 1}, \mathrm{SO2}_{t - 1}, \mathrm{NO2}_{t - 1}, \mathrm{CO}_{t - 1}), \tag{6}$$

$$\log ζ_{t} = γ_{0} + γ_{1} \log ζ_{t - 1} + η_{2}(Q_{t - 1}, T_{t - 1}, H_{t - 1}, W_{t - 1}, \mathrm{SO2}_{t - 1}, \mathrm{NO2}_{t - 1}, \mathrm{CO}_{t - 1}), \tag{7}$$

where $β_{1}, γ_{1} \in (0, 1)$, $β_{0}, γ_{0} \in \mathbb{R}$, and $η_{1}(\cdot)$ and $η_{2}(\cdot)$ are the observation-driven functions for $\log α_{t}$ and $\log ζ_{t}$. $T_{t}, H_{t}, W_{t}$ denote the daily average temperature, average relative humidity and average wind speed, and $\mathrm{SO2}_{t}, \mathrm{NO2}_{t}, \mathrm{CO}_{t}$ denote the daily moving average concentrations of sulfur dioxide, nitrogen dioxide and carbon monoxide on day $t$, respectively. These three weather factors and three air quality factors are commonly considered in studies on smog. Other weather and air quality factors could also be considered, but adding too many factors may increase the model complexity and weaken the effect.

We use continuous monotonic exponential functions in $η_{1}(\cdot)$ and $η_{2}(\cdot)$, as in other studies in the literature [27,31], for simplicity, flexibility and easy interpretation. From Equation (5), there exists a positive association between $α_{t}$ and $Y_{t}$, while $ζ_{t}$ and $Y_{t}$ are negatively correlated. An increasing $η_{1}(\cdot)$ and a decreasing $η_{2}(\cdot)$ ensure that a large $Y_{t}$ is followed by a large $α_{t}$ and a small $ζ_{t}$, so we choose the autoregressive process with weather factors as

$$\log α_{t} = β_{0} + β_{1} \log α_{t - 1} - β_{2} \exp(-β_{3} Y_{t - 1} + β_{4} T_{t - 1} + β_{5} W_{t - 1} + β_{6} H_{t - 1}), \tag{8}$$

$$\log ζ_{t} = γ_{0} + γ_{1} \log ζ_{t - 1} + γ_{2} \exp(-γ_{3} Y_{t - 1} + γ_{4} T_{t - 1} + γ_{5} W_{t - 1} + γ_{6} H_{t - 1}), \tag{9}$$

where $Y_{t}$ is given in (5), $\{Z_{t}\} \overset{i.i.d.}{\sim} U(0, 1)$, $β_{i}, γ_{i} \in \mathbb{R}$ for $i = 0, 4, 5, 6$, $0 \leq β_{1} \neq γ_{1} < 1$, and $β_{j}, γ_{j} > 0$ for $j = 2, 3$. Equations (8) and (9) mean that an extreme event observed at time $t - 1$ (large $Y_{t - 1}$) causes the distribution of $Y_{t}$ to have a larger scale (large $α_{t}$) and a heavier tail (small $ζ_{t}$). That is why the exceedances tend to cluster in time in our examples.
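The recursion in Equations (8) and (9), combined with Equation (5), can be simulated step by step. The following is a minimal sketch; the parameter values, the all-zero weather covariates and the names `dcp_step` and `simulate_dcp` are assumptions chosen for illustration, not values or code from the paper.

```python
import math
import random

def draw_excess(alpha, zeta, u, z):
    """Equation (5): map a uniform draw z in (0, 1] to the excess Y_t."""
    p = (1.0 + u) ** (-zeta)                      # Equation (3)
    return alpha * ((p / z) ** (1.0 / zeta) - 1.0) if z < p else 0.0

def dcp_step(log_alpha, log_zeta, y_prev, temp, wind, hum, beta, gamma):
    """One update of Equations (8) and (9) from time t-1 to t."""
    drive_a = math.exp(-beta[3] * y_prev + beta[4] * temp
                       + beta[5] * wind + beta[6] * hum)
    drive_z = math.exp(-gamma[3] * y_prev + gamma[4] * temp
                       + gamma[5] * wind + gamma[6] * hum)
    new_log_alpha = beta[0] + beta[1] * log_alpha - beta[2] * drive_a
    new_log_zeta = gamma[0] + gamma[1] * log_zeta + gamma[2] * drive_z
    return new_log_alpha, new_log_zeta

def simulate_dcp(weather, beta, gamma, u, log_alpha1, log_zeta1, seed=0):
    """Simulate Y_1..Y_n; weather[t] = (T_t, W_t, H_t) for each day."""
    rng = random.Random(seed)
    log_a, log_z = log_alpha1, log_zeta1
    ys = []
    for t in range(len(weather)):
        if t > 0:                                 # parameters at t use day t-1
            temp, wind, hum = weather[t - 1]
            log_a, log_z = dcp_step(log_a, log_z, ys[-1],
                                    temp, wind, hum, beta, gamma)
        z = 1.0 - rng.random()                    # uniform on (0, 1]
        ys.append(draw_excess(math.exp(log_a), math.exp(log_z), u, z))
    return ys

# Hypothetical parameters (beta_0..beta_6, gamma_0..gamma_6) and covariates.
beta = [0.1, 0.5, 0.3, 0.5, 0.0, 0.0, 0.0]
gamma = [0.2, 0.5, 0.3, 0.5, 0.0, 0.0, 0.0]
weather = [(0.0, 0.0, 0.0)] * 200
ys = simulate_dcp(weather, beta, gamma, u=0.5, log_alpha1=0.0, log_zeta1=0.5)
print("share of exceedances:", sum(y > 0 for y in ys) / len(ys))
```

With $β_{2}, β_{3}, γ_{2}, γ_{3} > 0$, a larger $Y_{t - 1}$ raises $\log α_{t}$ and lowers $\log ζ_{t}$ in `dcp_step`, which is the clustering mechanism described in the text.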

In addition, $α_{t}$ and $ζ_{t}$ of the DCP model with air quality factors are given in Equations (10) and (11), and those of the model with mixed weather and air quality factors are given in Equations (12) and (13). For the model with air quality factors,

$$\log α_{t} = β_{0} + β_{1} \log α_{t - 1} - β_{2} \exp(-β_{3} Y_{t - 1} - β_{4} \mathrm{SO2}_{t - 1} - β_{5} \mathrm{NO2}_{t - 1} - β_{6} \mathrm{CO}_{t - 1}), \tag{10}$$

$$\log ζ_{t} = γ_{0} + γ_{1} \log ζ_{t - 1} + γ_{2} \exp(-γ_{3} Y_{t - 1} - γ_{4} \mathrm{SO2}_{t - 1} - γ_{5} \mathrm{NO2}_{t - 1} - γ_{6} \mathrm{CO}_{t - 1}), \tag{11}$$

where $β_{i}, γ_{i} \in \mathbb{R}$ for $i = 0, 4, 5, 6$, $0 \leq β_{1} \neq γ_{1} < 1$, and $β_{j}, γ_{j} > 0$ for $j = 2, 3$. For the model with mixed weather and air quality factors,

$$\log α_{t} = β_{0} + β_{1} \log α_{t - 1} - β_{2} \exp(-β_{3} Y_{t - 1} - β_{4} \mathrm{SO2}_{t - 1} - β_{5} \mathrm{CO}_{t - 1} + β_{6} W_{t - 1} + β_{7} H_{t - 1}), \tag{12}$$

$$\log ζ_{t} = γ_{0} + γ_{1} \log ζ_{t - 1} + γ_{2} \exp(-γ_{3} Y_{t - 1} - γ_{4} \mathrm{SO2}_{t - 1} - γ_{5} \mathrm{CO}_{t - 1} + γ_{6} W_{t - 1} + γ_{7} H_{t - 1}), \tag{13}$$

where $β_{i}, γ_{i} \in \mathbb{R}$ for $i = 0, 4, 5, 6, 7$, $0 \leq β_{1} \neq γ_{1} < 1$, and $β_{j}, γ_{j} > 0$ for $j = 2, 3$.

Specific details of model applications will be discussed in Section 5 and Section 6.

3. Estimation and Properties

In this section, we consider the maximum likelihood estimation method for estimating the parameters in the DCP models.

3.1. Maximum Likelihood Estimation

Taking the weather factors model as an example, we denote

$$Θ_{s} = \{θ = (β_{0}, β_{1}, β_{2}, β_{3}, β_{4}, β_{5}, β_{6}, γ_{0}, γ_{1}, γ_{2}, γ_{3}, γ_{4}, γ_{5}, γ_{6}) \mid 0 \leq β_{1} \neq γ_{1} < 1,\ β_{2} > 0,\ β_{3} > 0,\ γ_{2} > 0,\ γ_{3} > 0,\ β_{i}, γ_{i} \in \mathbb{R},\ i = 0, 4, 5, 6\}$$

as the parameter space in the DCP model with weather factors. The conditional probability function of $Y_{t}$ can be obtained according to Equation (4) as

$$h_{t}(Y_{t} \mid \mathcal{F}_{t - 1}) = I(Y_{t} = 0) \left[1 - (1 + u)^{-ζ_{t}}\right] + I(Y_{t} > 0) \left[\frac{ζ_{t}}{α_{t}} (1 + u)^{-ζ_{t}} \left(1 + \frac{Y_{t}}{α_{t}}\right)^{-ζ_{t} - 1}\right],$$

where $u > 0$, $α_{t} > 0$ and $ζ_{t} > 0$.

The corresponding log-likelihood function with respect to the parameter $θ$ is

$$L_{n}(θ) = \frac{1}{n} \sum_{t = 1}^{n} \Big\{ I(Y_{t} = 0) \log\left[1 - (1 + u)^{-ζ_{t}}\right] + I(Y_{t} > 0) \Big[\log ζ_{t} - \log α_{t} - ζ_{t} \log(1 + u) - (ζ_{t} + 1) \log\left(1 + \frac{Y_{t}}{α_{t}}\right)\Big] \Big\}. \tag{14}$$

Next, the process of maximum likelihood estimation is briefly introduced. With reference to [38,50], we adopt two threshold selection methods. The details are given in Section 5. After obtaining a sufficiently high threshold $u$, we can obtain all $Y_{t}$ from $Y_{t} = \max(Q_{t} - u, 0)$. We choose $\log α_{1}$ as $\frac{β_{0} - β_{2} / 2}{1 - β_{1}}$ and $\log ζ_{1}$ as $\frac{γ_{0} + γ_{2} / 2}{1 - γ_{1}}$, which lie in the middle of $G = \left(\frac{β_{0} - β_{2}}{1 - β_{1}}, \frac{β_{0}}{1 - β_{1}}\right) \times \left(\frac{γ_{0}}{1 - γ_{1}}, \frac{γ_{0} + γ_{2}}{1 - γ_{1}}\right)$, and obtain all $α_{t}$ and $ζ_{t}$ with the DCP model, using $Y_{t}$, $\log α_{1}$, $\log ζ_{1}$ and $θ$. Finally, we calculate the likelihood function using Equation (14) and obtain the MLE of $θ$. The details are shown in Section 6.
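The evaluation of Equation (14) for a candidate $θ$ can be sketched as follows, assuming the weather-factor recursion (8) and (9) and the mid-range initial values described above. The function name `dcp_loglik` and the argument layout are hypothetical; the numerical maximization over $θ$ (with any general-purpose optimizer) is not shown.

```python
import math

def dcp_loglik(y, weather, beta, gamma, u):
    """Average log-likelihood L_n(theta) of Equation (14).

    y       : excesses Y_1..Y_n (zero when there is no exceedance)
    weather : list of (T_t, W_t, H_t) tuples aligned with y
    beta    : (beta_0, ..., beta_6); gamma : (gamma_0, ..., gamma_6)
    """
    # Mid-range initial values for log(alpha_1) and log(zeta_1).
    log_a = (beta[0] - beta[2] / 2.0) / (1.0 - beta[1])
    log_z = (gamma[0] + gamma[2] / 2.0) / (1.0 - gamma[1])
    total = 0.0
    for t in range(len(y)):
        if t > 0:                                   # Equations (8) and (9)
            temp, wind, hum = weather[t - 1]
            lin_a = (-beta[3] * y[t - 1] + beta[4] * temp
                     + beta[5] * wind + beta[6] * hum)
            lin_z = (-gamma[3] * y[t - 1] + gamma[4] * temp
                     + gamma[5] * wind + gamma[6] * hum)
            log_a = beta[0] + beta[1] * log_a - beta[2] * math.exp(lin_a)
            log_z = gamma[0] + gamma[1] * log_z + gamma[2] * math.exp(lin_z)
        alpha, zeta = math.exp(log_a), math.exp(log_z)
        if y[t] > 0:
            total += (log_z - log_a - zeta * math.log1p(u)
                      - (zeta + 1.0) * math.log1p(y[t] / alpha))
        else:
            total += math.log(1.0 - (1.0 + u) ** (-zeta))
    return total / len(y)

# Hypothetical inputs, only to show the call.
beta = [0.5, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
gamma = [-0.5, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(dcp_loglik([1.0, 0.0, 0.3], [(0.0, 0.0, 0.0)] * 3, beta, gamma, u=1.0))
```

Because $α_{t}$ and $ζ_{t}$ are deterministic functions of past observations and $θ$, the likelihood is evaluated in a single forward pass; an estimator is then obtained by maximizing this function subject to the constraints in $Θ_{s}$.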

3.2. Statistical Properties

The dynamic evolution Equations (6) and (7) without weather and air quality factors can be rewritten as

$$\log α_{t} = β_{0} + β_{1} \log α_{t - 1} - β_{2} \exp\left\{-β_{3} α_{t - 1} I(P_{t - 1} > Z_{t - 1}) \left[(P_{t - 1} / Z_{t - 1})^{1 / ζ_{t - 1}} - 1\right]\right\}, \tag{15}$$

$$\log ζ_{t} = γ_{0} + γ_{1} \log ζ_{t - 1} + γ_{2} \exp\left\{-γ_{3} α_{t - 1} I(P_{t - 1} > Z_{t - 1}) \left[(P_{t - 1} / Z_{t - 1})^{1 / ζ_{t - 1}} - 1\right]\right\}, \tag{16}$$

where $\{Z_{t}\}$ is a sequence of i.i.d. random variables uniformly distributed on (0,1) and $P_{t}$ is as given in (3).

Next, we establish the stationarity and geometric ergodicity of the process $\{α_{t}, ζ_{t}\}$ given in (15) and (16).

Theorem 1. If the parameters satisfy $β_{2}, β_{3}, γ_{2}, γ_{3} > 0$, $β_{0}, γ_{0} \in \mathbb{R}$ and $0 \leq β_{1} \neq γ_{1} < 1$, the latent process $\{α_{t}, ζ_{t}\}$ is stationary and geometrically ergodic.

Assumption 1. Assume the parameter space $Θ$ is a compact subset of $Θ_{s}$. Suppose the observations $\{Y_{t}\}_{t = 1}^{n}$ are generated from a stationary and ergodic DCP process with the true parameter $θ_{0}$, which is in the interior of $Θ$.

Denote $L_{n}(θ)$ based on $θ$ and an arbitrary initial value $(\tilde{α}_{1}, \tilde{ζ}_{1})$ as $\tilde{L}_{n}(θ)$.

Theorem 2 (Consistency). Under Assumption 1, there exists a sequence $\{\hat{θ}_{n}\}_{n \geq 1}$ of local maximizers of $\tilde{L}_{n}(θ)$ such that $\hat{θ}_{n} \to_{p} θ_{0}$ and $\| \hat{θ}_{n} - θ_{0} \| \leq τ_{n}$, where $τ_{n} = O_{p}(n^{-r})$, $0 < r < 1 / 2$.

Theorem 3 (Asymptotic Normality). Under the same conditions as in Theorem 2, $\sqrt{n} (\hat{θ}_{n} - θ_{0}) \xrightarrow{d} N(0, M_{0}^{-1})$, where $\hat{θ}_{n}$ is given in Theorem 2 and $M_{0}$ is the Fisher information matrix evaluated at $θ_{0}$.

Theorems 2 and 3 show the existence and asymptotic normality, respectively, of the MLE $\hat{θ}_{n}$. However, the uniqueness of the MLE must still be established. Proposition 1 addresses this.

Proposition 1 (Asymptotic Uniqueness). Under the same conditions as in Theorem 2, $P(\hat{θ}_{n} \text{ is the unique global maximizer of } \tilde{L}_{n}(θ) \text{ over } Θ) \to 1$, where $\hat{θ}_{n}$ is given in Theorem 2.

The proofs of Theorems 1–3 and Proposition 1 are shown in Appendix A.

4. Long Short-Term Memory Model

The main purpose of LSTM is to solve the problem of long-distance dependency in the training of recurrent neural networks (RNNs). A cell memory unit is added to the RNN, together with a forget gate, an input gate and an output gate, so that information transmission can be controlled. The update of the state (namely the memory unit) is also based on these “gates”, which ensure that the LSTM model can retain long-distance information. Under the influence of the memory unit, these “gates” stay in a controllable range. LSTM can then save, update and read long-distance information, and the problems of exploding or vanishing gradients during training are largely alleviated. For air pollution time series, more comprehensive long-distance dependence information can thus be extracted. From the perspective of the whole model, the main components are the output gate $o_{t}$, the input gate $i_{t}$, the memory unit $C_{t}$ and the forget gate $f_{t}$. The structure is shown in Figure 1.

The first “gate” that the LSTM passes through is the “forget gate”, which discards part of the information in the previous memory unit. This step is realized by a sigmoid function, which uses the weighted values of the current input and the output of the previous moment to obtain a number in the range of 0–1 that controls the information transfer. The value 1 represents complete retention and 0 represents complete discarding. The details are given in Equation (17):

$$f_{t} = σ(W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f}). \tag{17}$$

The input gate controls what information is added to the cell, and the calculation process is shown in Equations (18) and (19):

$$i_{t} = σ(W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i}), \tag{18}$$

$$C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot \tanh(W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c}). \tag{19}$$

The output gate controls what information is used for the task output at this moment, and the calculation process is shown in Equations (20) and (21):

$$o_{t} = σ(W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o}), \tag{20}$$

$$h_{t} = o_{t} \cdot \tanh(C_{t}). \tag{21}$$

In the above equations, $W_{i}$, $W_{f}$, $W_{c}$ and $W_{o}$ denote the weight matrices of the input gate, forget gate, candidate memory and output gate; $b_{i}$, $b_{f}$, $b_{c}$ and $b_{o}$ denote the corresponding bias vectors; $σ$ and $\tanh$ denote the activation functions; $o_{t}$ denotes the output gate; $i_{t}$ denotes the input gate; $C_{t}$ denotes the memory unit; $f_{t}$ denotes the forget gate; $x_{t}$ denotes the input at time $t$; and $h_{t}$ denotes the output at time $t$.
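A single LSTM time step following Equations (17) to (21) can be written out explicitly. The following is a minimal pure-Python sketch for small dimensions. It assumes the standard formulation in which the candidate memory has its own weight matrix, named `W_c` here, and it writes the output-gate weights as `W_o`; the function names and the dictionary layout are hypothetical, and a practical model would use a deep learning library instead.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns (h_t, C_t)."""
    v = h_prev + x_t                                   # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], v)]    # Eq. (17)
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], v)]    # Eq. (18)
    g = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], v)]  # candidate
    c = [ft * cp + it * gt
         for ft, cp, it, gt in zip(f, c_prev, i, g)]                     # Eq. (19)
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], v)]    # Eq. (20)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                     # Eq. (21)
    return h, c

# Hypothetical weights: hidden size 1, input size 1 (rows have length 2).
params = {
    "W_f": [[0.5, -0.2]], "b_f": [0.1],
    "W_i": [[0.3, 0.8]],  "b_i": [0.0],
    "W_c": [[-0.4, 0.6]], "b_c": [0.0],
    "W_o": [[0.2, 0.1]],  "b_o": [0.0],
}
h, c = [0.0], [0.0]
for x in ([0.3], [0.7], [0.2]):        # a short illustrative input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The memory unit `c` is carried forward between steps alongside the output `h`, which is how information from earlier inputs can persist across many time steps.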

5. Simulation Study

In this section, the performance of the MLE for the DCP models is investigated using six numerical experiments. We generate data from the three DCP models given in (5) and (8)–(13), with the parameters shown in Table 1. These sets of parameters are the MLEs obtained from an analysis of real observations in Beijing from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, where the weather factors are from the China Meteorological Data Service Center and the air quality factors are from the China National Environmental Monitoring Center. In addition, the estimates of $β_{1}$ and $β_{2}$ are close to 0, especially in the three models from 3 January 2015 to 8 August 2020, which indicates that the scale parameter $α_{t}$ can be considered constant to a certain extent (a simplification left for future research). Because more attention is given to the tail index $ζ_{t}$ and to the wider applicability of the DCP models, we made no changes to $α_{t}$.

Figure 2 displays a line graph of the $\mathrm{PM}_{2.5}$ concentration time series in Beijing. As shown in Figure 2, with the improvement in the national environmental governance level and public awareness of environmental protection, the $\mathrm{PM}_{2.5}$ concentration generally shows a downward trend. Figure 2 shows that 2018 is a noteworthy year with significant governance effects, suggesting that $\mathrm{PM}_{2.5}$ concentrations after this need to be analyzed separately. Hence, real data from 1 January 2018 to 8 August 2020 are also fitted to the three DCP models, in addition to the real data from 3 January 2015 to 8 August 2020. According to the World Air Quality Report 2021 released by IQAir [1], China has seen a 21% overall reduction in annual $\mathrm{PM}_{2.5}$ concentrations since 2018, which justifies the separate analysis of the data from 1 January 2018 to 8 August 2020.

Threshold selection is a key issue in extreme value analysis based on the POT method. For these two sets of observations, from 3 January 2015 to 8 August 2020 and 1 January 2018 to 8 August 2020, we select two thresholds determined by Bader et al. (2018) [38] and Davison and Smith (1990) [50]. Bader et al. (2018) [38] proposed an automated threshold selection procedure using a stop rule that controls the false discovery rate in ordered hypothesis testing. The ForwardStop rule provides an automated selection procedure combined with sequential hypothesis testing when the level of desired error control and a set of thresholds are given. Based on the goodness-of-fit of the GPD, Davison and Smith (1990) [50] proposed a threshold selection approach where the threshold is chosen as the lowest value above which the GPD fits the exceedances adequately. In this study, the threshold selection results were 2.4660 using the method of Bader et al. (2018) [38] for

{PM}_{2.5}

data from 3 January 2015 to 8 August 2020 and 0.5716 using the method of Davison and Smith (1990) [50] for

{PM}_{2.5}

data from 1 January 2018 to 8 August 2020, and approximately

4 %

and

18 %

of the corresponding real data exceeded these two thresholds, respectively, ensuring a sufficiently high value.
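To illustrate the stopping rule, the sketch below applies the ForwardStop criterion to a sequence of p-values, one per candidate threshold in increasing order, where each p-value is assumed to come from a goodness-of-fit test of the GPD above that threshold. The rule rejects the first k hypotheses for the largest k at which the running mean of -log(1 - p_i) stays at or below the chosen level. The function names, the example level of 0.05 and the convention of returning the first non-rejected threshold are assumptions made for this illustration; the tests that produce the p-values are not shown.

```python
import math

def forward_stop(p_values, level=0.05):
    """Return k_hat, the number of leading hypotheses rejected by ForwardStop."""
    k_hat, running = 0, 0.0
    for k, p in enumerate(p_values, start=1):
        running += -math.log(max(1.0 - p, 1e-300))  # guard against p == 1
        if running / k <= level:
            k_hat = k
    return k_hat

def select_threshold(thresholds, p_values, level=0.05):
    """Pick the first candidate threshold that is not rejected (None if all are)."""
    k_hat = forward_stop(p_values, level)
    return thresholds[k_hat] if k_hat < len(thresholds) else None
```

For example, with candidate thresholds 1.0, 1.5, 2.0 and 2.5 and p-values 0.001, 0.002, 0.5 and 0.9, the first two thresholds are rejected and 2.0 is selected.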

Tables 2–7 show the averages of the mean values and the standard deviations with different sample sizes from the three DCP models in the above two periods, together with the corresponding root mean squared error (RMSE) and absolute bias (Abias), which measure the estimation effect. We obtained simulated exceedances

Y_{t}

with lengths of 1000 and 2000, respectively. The experiments were repeated 500 times for each sample size. As shown in Tables 2–7, the RMSE and Abias values for the parameter estimates based on the period from 1 January 2018 to 8 August 2020 were mostly smaller than those based on the period from 3 January 2015 to 8 August 2020, those for the sample size of 2000 were mostly smaller than those for the sample size of 1000, and the estimation of the tail index

ζ_{t}

was better than that of

α_{t}

. These RMSE and Abias values support the validity of our estimation.
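The two accuracy measures can be computed from the replicated estimates of a single parameter as sketched below. The exact definition of Abias is not spelled out in this section, so the code assumes it is the absolute difference between the average estimate and the true value; the function names are illustrative.

```python
import math

def rmse(estimates, true_value):
    """Root mean squared error of replicated estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def abias(estimates, true_value):
    """Absolute bias, taken here as |mean(estimates) - true_value| (assumed definition)."""
    return abs(sum(estimates) / len(estimates) - true_value)
```

In the setting above, `estimates` would hold the 500 MLEs of one parameter for a given model, period and sample size.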

To enable observation of the performance of our model more directly, Figure 3 depicts the dynamics of the tail index

{\hat{ζ}}_{t}

estimated by MLE (red line) and the simulated tail index

ζ_{t}

(black line) under the experiments for n = 2000. We can see that the estimated tail index

{\hat{ζ}}_{t}

was almost the same as the simulated tail index

ζ_{t}

in the three DCP models. In addition, we calculated the correlation between the two series; the results were 0.9328, 0.9559, 0.9921, 0.9964, 0.9591 and 0.9679, corresponding to Figure 3a–f, respectively, which quantifies the similarity of the two curves. Although the simulation performance cannot be judged from this similarity alone, Figure 3 supports the adequacy of our estimation.

6. Real Data Applications

In this section, to verify the performance of the DCP models, we consider the weather and air quality time series in Beijing from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, obtained from the China Meteorological Data Service Center and the China National Environmental Monitoring Center, as the sample for our experimental analysis. Based on the observations from these two periods, we employed the three DCP models (one with weather factors, one with air quality factors and one with mixed weather and air quality factors) given in Equations (5) and (8)–(13) to fit the smog data and used the MLE method described in Section 3 to estimate the parameters. In both cases, the DCP models showed their superiority in reflecting the time dependence of the pollutant concentration, providing a potential warning signal for smog prevention and control. First, we checked the observations for fat tails using an exponential QQ plot. Figure 4 shows that the real data for

{PM}_{2.5}

in Beijing from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020 are fat tailed.
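The coordinates of an exponential QQ plot can be obtained as in the following sketch, which pairs theoretical unit-exponential quantiles with the ordered sample values. The plotting positions i/(n + 1) and the function name are assumptions of this illustration and are not taken from the procedure used for Figure 4.

```python
import math

def exponential_qq_points(exceedances):
    """Pair theoretical unit-exponential quantiles with ordered sample values."""
    ys = sorted(exceedances)
    n = len(ys)
    # The i-th theoretical quantile is -log(1 - i / (n + 1))
    return [(-math.log(1.0 - i / (n + 1.0)), y) for i, y in enumerate(ys, start=1)]
```

For exponentially distributed data the points lie close to a straight line, whereas an upward-bending (convex) pattern at the upper end suggests a tail heavier than exponential.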

From the results in Table 1, we can see that the tail index

ζ_{t}

is more affected by

Y_{t - 1}

, which is consistent with Figure 5. The estimated tail index of the DCP can reflect the severity of smog to some extent and may even play an early warning role for smog disruption. The graph of the estimated tail index

ζ_{t}

and positive exceedances

Y_{t}

from the three DCP models is given in Figure 5, which shows that there is a strong negative correlation between

ζ_{t}

and

Y_{t}

, and the tail index volatility is more intuitive. It is interesting to note that Figure 5a,c,e and Figure 5b,d,f have very similar variation tendencies, and it can clearly be seen that the tail index

ζ_{t}

starts to decline in the middle of each year, and at the end of the year the tail index becomes lower, which can be regarded as an effective indicator for measuring the level of smog.

Using the estimated parameters given in Table 1, we generated a sequence of fitted

{\hat{Y}}_{t}

values based on Equation (5) and plotted the line graphs of the fitted

{\hat{Y}}_{t}

values and real exceedances

Y_{t}

during the period from 1 January 2018 to 8 August 2020, as shown in Figure 6. It can be seen that the true values

Y_{t}

and the estimated values

{\hat{Y}}_{t}

from the three models were almost consistent in trend, and the three models were more sensitive to the estimation of

Y_{t}

, but the model with mixed weather factors and air quality factors showed values closest to the true values, which also verifies the superiority of the mixed model over the other two models and is consistent with the conclusion mentioned in Section 5. A comparison between the fitted

{\hat{Y}}_{t}

and real

Y_{t}

exceedances during the period from 3 January 2015 to 8 August 2020 was also performed, and similar results were obtained. Due to limited space, only Figure 6 is shown.

Next, we compared the estimated variances from the DCP models and GARCH, as shown in Figure 7. Similarly to [27], we calculated the conditional variance

\begin{matrix} Var (Y_{t} | F_{t - 1}) = \frac{α_{t}^{2}}{{(1 + u)}^{ζ_{t}} (ζ_{t} - 1)} [\frac{2}{ζ_{t} - 2} - \frac{1}{{(1 + u)}^{ζ_{t}} (ζ_{t} - 1)}] . \end{matrix}

(22)
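The conditional variance follows from the first two conditional moments. The derivation below is a sketch that assumes the conditional law implied by the log-likelihood in Appendix A: given the past, Y_t equals zero with probability 1 - (1 + u)^{-ζ_t} and otherwise has density (ζ_t / α_t)(1 + y / α_t)^{-(ζ_t + 1)} on y > 0. The symbol p_t is shorthand introduced here.

```latex
% p_t is shorthand (introduced here) for the conditional exceedance probability.
\begin{aligned}
p_t &= (1+u)^{-\zeta_t}, \\
E(Y_t \mid F_{t-1}) &= \frac{p_t\,\alpha_t}{\zeta_t - 1}, \qquad \zeta_t > 1, \\
E(Y_t^2 \mid F_{t-1}) &= \frac{2\,p_t\,\alpha_t^2}{(\zeta_t - 1)(\zeta_t - 2)}, \qquad \zeta_t > 2, \\
Var(Y_t \mid F_{t-1}) &= E(Y_t^2 \mid F_{t-1}) - E(Y_t \mid F_{t-1})^2 \\
&= \frac{\alpha_t^2}{(1+u)^{\zeta_t}(\zeta_t - 1)}
   \left[\frac{2}{\zeta_t - 2} - \frac{1}{(1+u)^{\zeta_t}(\zeta_t - 1)}\right].
\end{aligned}
```

The variance is finite only when ζ_t > 2, so periods with a low estimated tail index correspond to very large conditional volatility.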

Figure 7 shows that the standard deviations given by the DCP models and GARCH had similar trends, indicating that the DCP models could accurately reflect the volatility in a sense. Compared to the estimated volatility of GARCH, the DCP models are more sensitive in smog instances, thus potentially playing a better role in early warning. This is clearest in Figure 7e, where the fluctuation is largest.

We computed AIC and BIC from the DCP and dynamic conditional Weibull (DCW) model given in [35]. The results are presented in Table 8. As shown in Table 8, the DCP model is more suitable than the DCW model, based on AIC and BIC criteria.
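For reference, the two criteria can be computed from a maximized log-likelihood as sketched below, using the standard definitions AIC = 2k - 2 log L and BIC = k log n - 2 log L. The function names are illustrative, and the number of parameters k depends on which DCP or DCW specification is fitted. Because L_n(θ) in Appendix A is an average over n observations, the total log-likelihood passed to these functions would be n times that average.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for total log-likelihood log_lik and k parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion for a sample of size n."""
    return k * math.log(n) - 2 * log_lik
```

Under both criteria, the model with the smaller value is preferred.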

Finally, we used our proposed three models (5) and (8)–(13) to predict the daily PM_2.5 values from 9 August 2020 to 31 December 2021. We present only the results for the mixed models in (5), (12) and (13) here, with a training sample from 1 January 2018 to 8 August 2020, since similar results were obtained from the three models. The tail index

ζ_{t}

given in (13) and PM_2.5 given in (5) were predicted by using the real weather and air quality factors and the parameter estimation results given in Table 1. In order to analyze the fluctuating tendency and correlations of

ζ_{t}

and PM_2.5, the prediction results are presented together in Figure 8. From Figure 8, we can see that there is a strong negative correlation between

ζ_{t}

and PM_2.5, which enables the tail index to be used as a warning signal for air pollution. Furthermore, compared with the real smog values, the predictions of the future variation of PM_2.5 perform relatively well, as the real and predicted values are relatively close and have a similar tendency.

In addition, we used weather and air quality factors (including

S O_{2}, N O_{2}, C O, O_{3}

) with the LSTM technique to predict daily PM_2.5 values. The air pollution data from 1 January 2015 to 25 November 2019 were used as training data, those from 26 November 2019 to 8 August 2020 were used as verification data and those from 9 August 2020 to 31 December 2021 were used as test data. The LSTM network was trained by using the weather, air quality factors and

{PM}_{2.5}

time series in Beijing to construct a training set. Then, various weather and air quality factors were input into the test set to predict the

{PM}_{2.5}

from 9 August 2020 to 31 December 2021 over a long period. As shown in Figure 9, the trend of the prediction for

{PM}_{2.5}

was accurate, especially when the true values of

{PM}_{2.5}

were less than 100. To better evaluate the experimental results quantitatively, the RMSE and coefficient of determination (

R^{2}

) were calculated, and the results were 20.14 and 0.65, respectively.
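The two evaluation measures can be computed from the observed and predicted series as in the following sketch. It uses the usual definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; the function names are illustrative.

```python
import math

def prediction_rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is expressed in the units of the series being predicted, while R^2 is unit-free, so the two measures are complementary.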

7. Conclusions

In this paper, we investigated the prediction of pollutant concentrations using statistical inference methods and deep learning techniques. On the one hand, we proposed three models combined with the autoregressive structure under the POT framework. After obtaining two sufficiently high thresholds selected using the methods of Bader et al. (2018) [38] and Davison and Smith (1990) [50], the DCP models provided a direct dynamic modeling of exceedances in the

{PM}_{2.5}

time series, such that the scale parameter and tail index of the conditional generalized Pareto distribution changed over time. Weather and air quality factors were added to the DCP models for better performance and higher efficiency. The maximum likelihood estimation method was introduced to estimate the parameters in the DCP models, and its asymptotic properties were investigated. Simulation studies were carried out to demonstrate the validity and sufficiency of the estimation, revealing that the parameter estimation of the DCP models was not sufficiently accurate but the tail index dynamics could be well approximated in the DCP models. Real data applications were used to present the superiority of the DCP models, showing that they could shed new light on the prevention and control of smog. On the other hand, based on the factors used in the mixed DCP model, we used LSTM to study the prediction of pollutant concentrations, and achieved satisfactory results. This paper aimed to improve the prediction ability for the concentration of pollutants, and valuable results were achieved. Given the requirements of the air pollution control target for further promoting ecological and environmental protection in the next five years, the proposed approaches and results in our paper are useful. To some extent, they could provide a theoretical basis and effective tools for improving the national air quality forecasting system, thus benefiting public health.

Nevertheless, there are still some points to be considered. In the DCP model with the autoregressive structure, it is meaningful to add weather and air quality factors, enriching the model and making it consistent, stable and sensitive. However, the relationship between the factors has not been scrutinized carefully, resulting in a lack of attention to its impacts on the model. In addition, it is possible to obtain better results when we compare other estimation methods. Therefore, prediction, as an important direction in our study of air pollutant concentrations, still has a long way to go. Combining artificial intelligence and machine learning, the prediction accuracy will certainly be improved by using a forecast combination, synthesizing the methods used to obtain the estimated results. Finally, with the help of combination forecasting, the advantages of individual forecasts are retained, and effective information is fully utilized to comprehensively forecast air pollution. In this work, we strive to make valuable advances in the intersection of statistics and machine learning and to provide effective theoretical and technical support for national continuous improvement of the modernization level of ecological environmental governance.

Author Contributions

Data curation, X.Z. and C.H.; funding acquisition, X.Z.; methodology, X.Z. and W.C.; project administration, X.Z.; software, X.Z., C.H. and Q.J.; supervision, W.C.; writing—original draft, X.Z., C.H. and Q.J.; writing—review and editing, Q.D. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant number 11801019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The weather data and the air quality data were obtained from the China Meteorological Data Service Center and the China National Environmental Monitoring Center.

Acknowledgments

The authors would like to thank the referees and editors for their very helpful and constructive comments, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The procedure for the proof follows Shen (2020) [27] and Zhao et al. (2018) [31]. The main difference from Zhao et al. (2018) [31] is the distribution of observations

{Y_{t}}

, and the main difference from Shen (2020) [27] is that we consider the autoregressive structure on both the scale and tail index series.

Proof of Theorem 1.

The proof of the theorem is similar to Shen et al. (2020) [27] and Zhao et al. (2018) [31], so this proof is omitted. □

Before proving Theorems 2 and 3, we first provide some lemmas.

Lemma A1 (Identifiability).

If

Y_{t} (θ) = Y_{t} (θ_{0})

almost surely (a.s.) for all t, we have

θ = θ_{0}

, where

{Y_{t}}

is given in (5).

Proof.

Denote

α_{t} = α_{t} (θ), ζ_{t} = ζ_{t} (θ)

and

α_{t}^{0} = α_{t} (θ_{0}), ζ_{t}^{0} = ζ_{t} (θ_{0}), P_{t}^{0} = {(1 + u)}^{- ζ_{t}^{0}}

. From

Y_{t} (θ) = Y_{t} (θ_{0})

a.s., we arrive at

α_{t} I (P_{t} > Z_{t}) [{(P_{t} / Z_{t})}^{1 / ζ_{t}} - 1] = α_{t}^{0} I (P_{t}^{0} > Z_{t}) [{(P_{t}^{0} / Z_{t})}^{1 / ζ_{t}^{0}} - 1] a . s .,

so

\begin{matrix} α_{t} [{(P_{t} / Z_{t})}^{1 / ζ_{t}} - 1] & = & α_{t}^{0} [{(P_{t}^{0} / Z_{t})}^{1 / ζ_{t}^{0}} - 1] a . s ., \\ I (P_{t} > Z_{t}) & = & I (P_{t}^{0} > Z_{t}) a . s . . \end{matrix}

After straightforward manipulations,

α_{t} {(P_{t} / Z_{t})}^{1 / ζ_{t}} - α_{t}^{0} {(P_{t}^{0} / Z_{t})}^{1 / ζ_{t}^{0}} = α_{t} - α_{t}^{0} a . s .,

Denote

ℑ_{t} = σ (Y_{t}, Y_{t - 1}, \dots)

, then

Z_{t} ⊥ ℑ_{t - 1}

and

α_{t}, α_{t}^{0}, ζ_{t}, ζ_{t}^{0}, P_{t}, P_{t}^{0} \in ℑ_{t - 1} .

Therefore, the above equation holds if and only if

α_{t} = α_{t}^{0}

and

ζ_{t} = ζ_{t}^{0}

a . s .

From the autoregressive equations of

log α_{t}

and

log ζ_{t}

,

log α_{t} = log α_{t}^{0}

and

log ζ_{t} = log ζ_{t}^{0}

a . s .

can be rewritten as

\begin{matrix} β_{0}^{0} + β_{1}^{0} log α_{t - 1} - β_{2}^{0} exp {- β_{3}^{0} Y_{t - 1}} & = & β_{0} + β_{1} log α_{t - 1} - β_{2} exp {- β_{3} Y_{t - 1}}, \\ γ_{0}^{0} + γ_{1}^{0} log ζ_{t - 1} + γ_{2}^{0} exp {- γ_{3}^{0} Y_{t - 1}} & = & γ_{0} + γ_{1} log ζ_{t - 1} + γ_{2} exp {- γ_{3} Y_{t - 1}} . \end{matrix}

After rearrangement, the above two equations can be expressed as

\begin{matrix} β_{0}^{0} - β_{0} + (β_{1}^{0} - β_{1}) log α_{t - 1} & = & β_{2}^{0} exp {- β_{3}^{0} Y_{t - 1}} - β_{2} exp {- β_{3} Y_{t - 1}}, \\ γ_{0}^{0} - γ_{0} + (γ_{1}^{0} - γ_{1}) log ζ_{t - 1} & = & γ_{2} exp {- γ_{3} Y_{t - 1}} - γ_{2}^{0} exp {- γ_{3}^{0} Y_{t - 1}} . \end{matrix}

Since

α_{t - 1} \in ℑ_{t - 2}, ζ_{t - 1} \in ℑ_{t - 2}

and

Y_{t - 1} \notin ℑ_{t - 2}

, then

β_{i} = β_{i}^{0}

and

γ_{i} = γ_{i}^{0}

must hold for

i = 0, 1, 2, 3

. □

In the following, we denote (

α_{t} (θ)

,

ζ_{t} (θ)

) (or (

α_{t}

,

ζ_{t}

) for simplicity) as the time-dependent scale parameter and tail index based on

θ

and the true initial (

α_{1}^{0}

,

ζ_{1}^{0}

), and denote (

α_{t} (θ_{0})

,

ζ_{t} (θ_{0})

) (or (

α_{t}^{0}

,

ζ_{t}^{0}

) for simplicity) as the unobserved true hidden process based on the true

θ_{0}

and the true initial (

α_{1}^{0}

,

ζ_{1}^{0}

), and denote the t-th iterate series (

{\tilde{α}}_{t} (θ)

,

\tilde{ζ_{t}} (θ)

) (or (

{\tilde{α}}_{t}

,

{\tilde{ζ}}_{t}

) for simplicity) as the scale parameter and tail index series based on

θ

and an arbitrary initial (

{\tilde{α}}_{1}

,

{\tilde{ζ}}_{1}

), and denote

(α_{L}, α_{U})

and

(ζ_{L}, ζ_{U})

as the uniform bound of

α_{t}

(or

{\tilde{α}}_{t}

) and

ζ_{t}

(or

{\tilde{ζ}}_{t}

) for all

θ \in Θ

due to the compactness of

Θ

and boundedness of

- β_{2} exp (- β_{3} Y_{t - 1})

and

γ_{2} exp (- γ_{3} Y_{t - 1})

.

Given

(α_{t}, ζ_{t})

, the conditional log-likelihood function of

Y_{t}

is expressed as

l_{t} (θ) = I (Y_{t} = 0) log [1 - {(1 + u)}^{- ζ_{t}}] + I (Y_{t} > 0) [log ζ_{t} - log α_{t} - ζ_{t} log (1 + u) - (ζ_{t} + 1) log (1 + \frac{Y_{t}}{α_{t}})] .

Due to the conditional independence, the log-likelihood function is then given by

L_{n} (θ) = \frac{1}{n} \sum_{t = 1}^{n} l_{t} (θ) .
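A direct transcription of these two expressions is sketched below. The filter in `average_log_lik` uses only the covariate-free autoregressive recursions for log α_t and log ζ_t, so it illustrates the structure of the likelihood rather than the full models with weather and air quality factors; the function names and the treatment of the initial values as known inputs are assumptions.

```python
import math

def log_lik_t(y, alpha, zeta, u):
    """Conditional log-likelihood l_t of one observation given (alpha_t, zeta_t)."""
    if y == 0:
        return math.log(1.0 - (1.0 + u) ** (-zeta))
    return (math.log(zeta) - math.log(alpha)
            - zeta * math.log(1.0 + u)
            - (zeta + 1.0) * math.log(1.0 + y / alpha))

def average_log_lik(ys, beta, gamma, u, alpha1, zeta1):
    """L_n(theta): average of l_t, filtering alpha_t and zeta_t forward in time."""
    b0, b1, b2, b3 = beta
    g0, g1, g2, g3 = gamma
    log_a, log_z = math.log(alpha1), math.log(zeta1)
    total = 0.0
    for y in ys:
        total += log_lik_t(y, math.exp(log_a), math.exp(log_z), u)
        # Update the scale parameter and tail index for the next time point
        log_a = b0 + b1 * log_a - b2 * math.exp(-b3 * y)
        log_z = g0 + g1 * log_z + g2 * math.exp(-g3 * y)
    return total / len(ys)
```

Maximizing `average_log_lik` over the parameters with a numerical optimizer would give an estimate of θ for this simplified specification.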

We denote

l_{t} (θ)

and

L_{n} (θ)

based on

θ

and an arbitrary initial value

({\tilde{α}}_{1}, {\tilde{ζ}}_{1})

as

{\tilde{l}}_{t} (θ)

and

{\tilde{L}}_{n} (θ)

.

Lemma A2.

Under the same conditions as in Theorem 2,

E_{θ_{0}} (\frac{\partial}{\partial θ} l_{t} (θ_{0})) = 0

and

M_{0} = {Var}_{θ_{0}} (\frac{\partial}{\partial θ} l_{t} (θ_{0}))

= - E_{θ_{0}} (\frac{\partial^{2}}{\partial θ \partial θ^{T}} l_{t} (θ_{0}))

and

M_{0}

is finite and positive definite.

Proof.

The proof of the lemma is similar to Zhao et al. (2018) [31], so the proof is omitted. □

Lemma A3.

Under the same conditions as in Theorem 2, if

| | Φ - Φ_{0} | | < τ_{n}

and

τ_{n} ↘ 0

, we have

(a) sup_{1 \leq t \leq n} | α_{t} - α_{t}^{0} | = O (τ_{n}), (b) sup_{1 \leq t \leq n} | \frac{\partial α_{t}}{\partial Φ} - \frac{\partial α_{t}^{0}}{\partial Φ} | = O (τ_{n}), (c) sup_{1 \leq t \leq n} | \frac{\partial^{2} α_{t}}{\partial Φ_{i} \partial Φ_{j}} - \frac{\partial^{2} α_{t}^{0}}{\partial Φ_{i} \partial Φ_{j}} | = O (τ_{n}),

uniformly over

| | Φ - Φ_{0} | | < τ_{n}

, where

Φ = (β_{0}, β_{1}, β_{2}, β_{3})

and

Φ_{0} = (β_{0}^{0}, β_{1}^{0}, β_{2}^{0}, β_{3}^{0})

.

Proof.

The proof of the lemma is similar to Zhao et al. (2018) [31], so the proof is omitted. □

Lemma A4.

Under the same conditions as in Theorem 2, if

| | Ψ - Ψ_{0} | | < τ_{n}

and

τ_{n} ↘ 0

, we have

(a) sup_{1 \leq t \leq n} | ζ_{t} - ζ_{t}^{0} | = O (τ_{n}), (b) sup_{1 \leq t \leq n} | \frac{\partial ζ_{t}}{\partial Ψ} - \frac{\partial ζ_{t}^{0}}{\partial Ψ} | = O (τ_{n}), (c) sup_{1 \leq t \leq n} | \frac{\partial^{2} ζ_{t}}{\partial Ψ_{i} \partial Ψ_{j}} - \frac{\partial^{2} ζ_{t}^{0}}{\partial Ψ_{i} \partial Ψ_{j}} | = O (τ_{n}),

uniformly over

| | Ψ - Ψ_{0} | | < τ_{n}

, where

Ψ = (γ_{0}, γ_{1}, γ_{2}, γ_{3})

and

Ψ_{0} = (γ_{0}^{0}, γ_{1}^{0}, γ_{2}^{0}, γ_{3}^{0})

.

Proof.

The proof of the lemma is similar to Zhao et al. (2018) [31], so the proof is omitted. □

Lemma A5.

Under the same conditions as in Theorem 2,

\frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} L_{n} (θ_{n}) \to_{p} - m_{θ_{i} θ_{j}} (θ_{0})

, uniformly over

| | θ_{n} - θ_{0} | | < τ_{n}

, where

τ_{n} \sim n^{- r}, r > 0

,

m_{θ_{i} θ_{j}} (θ_{0}) = - E_{θ_{0}} (\frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} l_{1} (θ_{0}))

.

Proof.

We only prove the case for

\frac{\partial^{2}}{\partial β_{0}^{2}} L_{n} (θ_{n}) \to_{p} - m_{β_{0} β_{0}} (θ_{0})

, as the proof for other cases is similar. From the law of large numbers, we know that

\frac{\partial^{2}}{\partial β_{0}^{2}} L_{n} (θ_{0}) \to_{p} - m_{β_{0} β_{0}} (θ_{0})

. Then, we need to prove that

\frac{\partial^{2}}{\partial β_{0}^{2}} L_{n} (θ_{n}) - \frac{\partial^{2}}{\partial β_{0}^{2}} L_{n} (θ_{0}) \to_{p} 0

uniformly over

| | θ_{n} - θ_{0} | | < τ_{n}

, where

τ_{n} \sim n^{- r}, r > 0

.

By repeatedly applying the autoregressive formula,

log α_{t}

can be expressed as

log α_{t} = β_{0} \sum_{k = 1}^{t - 1} β_{1}^{k - 1} - β_{2} \sum_{k = 1}^{t - 1} β_{1}^{k - 1} exp (- β_{3} Y_{t - k}) + β_{1}^{t - 1} log α_{1}^{0} .
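The unrolled expression can be checked numerically against the one-step recursion log α_t = β_0 + β_1 log α_{t-1} - β_2 exp(-β_3 Y_{t-1}). The sketch below implements both forms; the function names and the convention that `ys[0]` stores Y_1 are assumptions of this illustration.

```python
import math

def log_alpha_recursive(t, beta, ys, log_alpha1):
    """Apply the one-step recursion t - 1 times, starting from log alpha_1."""
    b0, b1, b2, b3 = beta
    val = log_alpha1
    for s in range(2, t + 1):
        val = b0 + b1 * val - b2 * math.exp(-b3 * ys[s - 2])  # ys[s - 2] is Y_{s-1}
    return val

def log_alpha_unrolled(t, beta, ys, log_alpha1):
    """Evaluate the closed-form sum obtained by unrolling the recursion."""
    b0, b1, b2, b3 = beta
    geo = sum(b1 ** (k - 1) for k in range(1, t))
    weighted = sum(b1 ** (k - 1) * math.exp(-b3 * ys[t - k - 1])  # ys[t - k - 1] is Y_{t-k}
                   for k in range(1, t))
    return b0 * geo - b2 * weighted + b1 ** (t - 1) * log_alpha1
```

Both functions return the same value up to floating-point error for any t from 1 to len(ys) + 1, which confirms the index ranges in the sums.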

We have

\frac{\partial α_{t}}{\partial β_{0}} = α_{t} \sum_{k = 1}^{t - 1} β_{1}^{k - 1}, \frac{\partial^{2} α_{t}}{\partial β_{0}^{2}} = α_{t} {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2},

\frac{\partial L_{n} (θ)}{\partial β_{0}} = \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) \sum_{k = 1}^{t - 1} β_{1}^{k - 1} [\frac{(ζ_{t} + 1) Y_{t}}{α_{t} + Y_{t}} - 1], \frac{\partial^{2} L_{n} (θ)}{\partial β_{0}^{2}} = - \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} \frac{(ζ_{t} + 1) α_{t} Y_{t}}{{(α_{t} + Y_{t})}^{2}} .

Then

\begin{matrix} | \frac{\partial^{2}}{\partial β_{0}^{2}} L_{n} (θ_{n}) - \frac{\partial^{2}}{\partial β_{0}^{2}} L_{n} (θ_{0}) | \\ = & | \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) Y_{t} [{(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} \frac{(ζ_{t} + 1) α_{t}}{{(α_{t} + Y_{t})}^{2}} - {(\sum_{k = 1}^{t - 1} {(β_{1}^{0})}^{k - 1})}^{2} \frac{(ζ_{t}^{0} + 1) α_{t}^{0}}{{(α_{t}^{0} + Y_{t})}^{2}}] | \\ \leq & \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) Y_{t} | {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} \frac{(ζ_{t} + 1) α_{t}}{{(α_{t} + Y_{t})}^{2}} - {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} \frac{(ζ_{t}^{0} + 1) α_{t}^{0}}{{(α_{t}^{0} + Y_{t})}^{2}} | \\ + \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) Y_{t} | {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} \frac{(ζ_{t}^{0} + 1) α_{t}^{0}}{{(α_{t}^{0} + Y_{t})}^{2}} - {(\sum_{k = 1}^{t - 1} {(β_{1}^{0})}^{k - 1})}^{2} \frac{(ζ_{t}^{0} + 1) α_{t}^{0}}{{(α_{t}^{0} + Y_{t})}^{2}} | \\ = : & I + I I, \end{matrix}

where

\begin{matrix} I & = & \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) Y_{t} {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} | \frac{(ζ_{t} + 1) α_{t}}{{(α_{t} + Y_{t})}^{2}} - \frac{(ζ_{t}^{0} + 1) α_{t}^{0}}{{(α_{t}^{0} + Y_{t})}^{2}} | \\ = & \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) Y_{t} {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} \frac{| α_{t} (ζ_{t} + 1) {(α_{t}^{0} + Y_{t})}^{2} - α_{t}^{0} (ζ_{t}^{0} + 1) {(α_{t} + Y_{t})}^{2} |}{{(α_{t} + Y_{t})}^{2} {(α_{t}^{0} + Y_{t})}^{2}} \\ \leq & \frac{1}{n} \sum_{t = 1}^{n} \frac{I (Y_{t} > 0) Y_{t} [| α_{t}^{0} (α_{t} α_{t}^{0} + 2 α_{t} Y_{t} + Y_{t}^{2}) | | ζ_{t} - ζ_{t}^{0} | + | α_{t} α_{t}^{0} (ζ_{t}^{0} + 1) - Y_{t}^{2} (ζ_{t} + 1) | | α_{t} - α_{t}^{0} |]}{{(1 - β_{1})}^{2} {(α_{t} + Y_{t})}^{2} {(α_{t}^{0} + Y_{t})}^{2}} \\ \leq & \frac{1}{n {(1 - β_{1})}^{2}} \sum_{t = 1}^{n} I (Y_{t} > 0) \frac{[| α_{t}^{0} (α_{t} α_{t}^{0} + 2 α_{t} Y_{t} + Y_{t}^{2}) | | ζ_{t} - ζ_{t}^{0} | + | α_{t} α_{t}^{0} (ζ_{t}^{0} + 1) - Y_{t}^{2} (ζ_{t} + 1) | | α_{t} - α_{t}^{0} |]}{α_{t} {(α_{t}^{0})}^{2}} \\ \leq & \frac{1}{n {(1 - β_{1})}^{2}} \sum_{t = 1}^{n} I (Y_{t} > 0) [(\frac{Y_{t}^{2}}{α_{t} α_{t}^{0}} + \frac{2 Y_{t}}{α_{t}^{0}} + 1) | ζ_{t} - ζ_{t}^{0} | + | \frac{ζ_{t}^{0} + 1}{α_{t}^{0}} - \frac{Y_{t}^{2} (ζ_{t} + 1)}{α_{t} {(α_{t}^{0})}^{2}} | | α_{t} - α_{t}^{0} |], \end{matrix}

It is known that

{α_{t}, ζ_{t}}

is bounded, so

E (Y_{t}) = α_{t} / [(ζ_{t} - 1) {(1 + u)}^{ζ_{t}}] < \infty, E (Y_{t}^{2}) = 2 α_{t}^{2} / [(2 - ζ_{t}) (1 - ζ_{t}) {(1 + u)}^{ζ_{t}}] < \infty

and

\frac{Y_{t}^{2}}{α_{t} α_{t}^{0}} + \frac{2 Y_{t}}{α_{t}^{0}} + 1, \frac{ζ_{t}^{0} + 1}{α_{t}^{0}} - \frac{Y_{t}^{2} (ζ_{t} + 1)}{α_{t} {(α_{t}^{0})}^{2}}

are bounded too. Therefore, by Lemmas A3(a) and A4(a),

I \sim \frac{1}{n} \sum_{t = 1}^{n} (| ζ_{t} - ζ_{t}^{0} | + | α_{t} - α_{t}^{0} |) \sim O_{p} (τ_{n}) \to 0 .

\begin{matrix} I I & = & \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) \frac{| (ζ_{t}^{0} + 1) α_{t}^{0} | Y_{t}}{{(α_{t}^{0} + Y_{t})}^{2}} | {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} - {(\sum_{k = 1}^{t - 1} {(β_{1}^{0})}^{k - 1})}^{2} | \\ \leq & \frac{2 τ_{n}}{{(1 - C_{b})}^{3}} \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) \frac{| (ζ_{t}^{0} + 1) α_{t}^{0} | Y_{t}}{{(α_{t}^{0} + Y_{t})}^{2}} \leq \frac{2 τ_{n}}{{(1 - C_{b})}^{3}} \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) | \frac{ζ_{t}^{0} + 1}{α_{t}^{0}} | Y_{t} \\ \leq & \frac{2 M τ_{n}}{{(1 - C_{b})}^{3}} \frac{1}{n} \sum_{t = 1}^{n} I (Y_{t} > 0) Y_{t} = O_{p} (τ_{n}) \to 0 . \end{matrix}

where the first inequality comes from the fact that

\sum_{k = 1}^{t} β_{1}^{k - 1} < 1 / (1 - β_{1}) \leq 1 / (1 - C_{b})

and

\begin{matrix} | {(\sum_{k = 1}^{t - 1} β_{1}^{k - 1})}^{2} - {(\sum_{k = 1}^{t - 1} {(β_{1}^{0})}^{k - 1})}^{2} | \leq | {(\frac{1}{1 - β_{1}})}^{2} - {(\frac{1}{1 - β_{1}^{0}})}^{2} | \leq \frac{2 τ_{n}}{{(1 - C_{b})}^{3}} = O_{p} (τ_{n}), \end{matrix}

where

C_{b}

is a constant and

0 < C_{b} < 1

. The last inequality of

I I

shows the boundedness of

{α_{t}^{0}, ζ_{t}^{0}}

, so there exists

| (ζ_{t}^{0} + 1) / α_{t}^{0} | \leq M

, and

E (Y_{t}) = α_{t} / [(ζ_{t} - 1) {(1 + u)}^{ζ_{t}}] < \infty

. □

Lemma A6.

Under the same conditions as in Theorem 2, two positive constants C and

C_{b} < 1

exist such that for all

θ \in Θ

and

t \geq 1

,

(a) $| α_{t} - {\tilde{α}}_{t} | \leq C C_{b}^{t - 1}$ , (b) $| \frac{\partial α_{t}}{\partial Φ} - \frac{\partial {\tilde{α}}_{t}}{\partial Φ} | \leq C t C_{b}^{t - 1}$ , (c) $| \frac{\partial^{2} α_{t}}{\partial Φ_{i} \partial Φ_{j}} - \frac{\partial^{2} {\tilde{α}}_{t}}{\partial Φ_{i} \partial Φ_{j}} | \leq C t^{2} C_{b}^{t - 1}$ ,
(d) $| ζ_{t} - {\tilde{ζ}}_{t} | \leq C C_{b}^{t - 1}$ , (e) $| \frac{\partial ζ_{t}}{\partial Ψ} - \frac{\partial {\tilde{ζ}}_{t}}{\partial Ψ} | \leq C t C_{b}^{t - 1}$ , (f) $| \frac{\partial^{2} ζ_{t}}{\partial Ψ_{i} \partial Ψ_{j}} - \frac{\partial^{2} {\tilde{ζ}}_{t}}{\partial Ψ_{i} \partial Ψ_{j}} | \leq C t^{2} C_{b}^{t - 1} .$

Proof.

The proof is omitted because it follows from direct calculation. □

Lemma A7.

Under the same conditions as in Theorem 2,

(a) $\frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} {\tilde{L}}_{n} (θ) \to_{p} - m_{θ_{i} θ_{j}} (θ_{0})$ , uniformly over $| | θ - θ_{0} | | < τ_{n}$ , where $τ_{n} \sim n^{- r}, r > 0$ ,
(b) ${(τ_{n}^{*})}^{- 1} (\frac{\partial}{\partial θ} {\tilde{L}}_{n} (θ_{0}) - \frac{\partial}{\partial θ} L_{n} (θ_{0})) \to_{p} 0$ if $τ_{n}^{*} n \to \infty .$

Proof.

(a) By the result of Lemma A5, we have

\frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} L_{n} (θ) - \frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} L_{n} (θ_{0}) \to_{p} 0

uniformly over

| | θ - θ_{0} | | < τ_{n}

, so we need to prove that

\frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} {\tilde{L}}_{n} (θ) - \frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} L_{n} (θ) \to_{p} 0

uniformly over the claimed region. Using Lemma A6, this follows by the same method as in the proof of Lemma A5, so the details are omitted.

(b) Here we only prove for

\frac{\partial}{\partial β_{0}} {\tilde{L}}_{n} (θ_{0})

, as other proofs are similar.

\begin{matrix} | \frac{1}{τ_{n}^{*}} (\frac{\partial}{\partial β_{0}} {\tilde{L}}_{n} (θ_{0}) - \frac{\partial}{\partial β_{0}} L_{n} (θ_{0})) | = | \frac{1}{n τ_{n}^{*}} \sum_{t = 1}^{n} I (Y_{t} > 0) (\sum_{k = 1}^{t - 1} β_{1}^{k - 1}) (\frac{{\tilde{ξ}}_{t} Y_{t} - {\tilde{α}}_{t}}{{\tilde{α}}_{t} + Y_{t}} - \frac{ξ_{t} Y_{t} - α_{t}}{α_{t} + Y_{t}}) | \\ \leq & \frac{1}{n τ_{n}^{*}} \sum_{t = 1}^{n} (\sum_{k = 1}^{t - 1} β_{1}^{k - 1}) I (Y_{t} > 0) | \frac{{\tilde{ξ}}_{t} Y_{t} - {\tilde{α}}_{t}}{{\tilde{α}}_{t} + Y_{t}} - \frac{ξ_{t} Y_{t} - α_{t}}{α_{t} + Y_{t}} | \\ \leq & \frac{1}{n τ_{n}^{*} (1 - β_{1})} \sum_{t = 1}^{n} I (Y_{t} > 0) | \frac{{\tilde{ξ}}_{t} Y_{t} - {\tilde{α}}_{t}}{{\tilde{α}}_{t} + Y_{t}} - \frac{ξ_{t} Y_{t} - α_{t}}{α_{t} + Y_{t}} | \\ = & \frac{1}{n τ_{n}^{*} (1 - β_{1})} \sum_{t = 1}^{n} I (Y_{t} > 0) | \frac{{\tilde{ξ}}_{t} Y_{t} - {\tilde{α}}_{t}}{{\tilde{α}}_{t} + Y_{t}} - \frac{{\tilde{ξ}}_{t} Y_{t} - \tilde{α_{t}}}{α_{t} + Y_{t}} + \frac{{\tilde{ξ}}_{t} Y_{t} - \tilde{α_{t}}}{α_{t} + Y_{t}} - \frac{ξ_{t} Y_{t} - α_{t}}{α_{t} + Y_{t}} | \\ = & \frac{1}{n τ_{n}^{*} (1 - β_{1})} \sum_{t = 1}^{n} I (Y_{t} > 0) | \frac{- ({\tilde{ξ}}_{t} Y_{t} - {\tilde{α}}_{t}) ({\tilde{α}}_{t} - α_{t})}{({\tilde{α}}_{t} + Y_{t}) (α_{t} + Y_{t})} + \frac{Y_{t} ({\tilde{ξ}}_{t} - ξ_{t})}{α_{t} + Y_{t}} - \frac{{\tilde{α}}_{t} - α_{t}}{α_{t} + Y_{t}} | \\ \leq & \frac{C}{n τ_{n}^{*} (1 - β_{1})} \sum_{t = 1}^{n} I (Y_{t} > 0) [C_{b}^{t - 1} (\frac{| {\tilde{ξ}}_{t} Y_{t} - {\tilde{α}}_{t} |}{α_{L}^{2}} + \frac{1}{α_{L}} + 1)] \\ \leq & \frac{C}{n τ_{n}^{*} (1 - β_{1})} \sum_{t = 1}^{n} I (Y_{t} > 0) [C_{b}^{t - 1} (\frac{{\tilde{ξ}}_{t} Y_{t}}{α_{L}^{2}} + \frac{α_{U}}{α_{L}^{2}} + \frac{1}{α_{L}} + 1)], \end{matrix}

where the second-to-last inequality comes from Lemma A6 (a) and (d). Next, we need to prove the boundedness of

\sum_{t = 1}^{n} C_{b}^{t - 1} {\tilde{ξ}}_{t} Y_{t}

.

\begin{matrix} E (\sum_{t = 1}^{n} C_{b}^{t - 1} ξ_{t} Y_{t}) \leq E (\sum_{t = 1}^{\infty} C_{b}^{t - 1} ξ_{t} Y_{t}) = \sum_{t = 1}^{\infty} C_{b}^{t - 1} E (ξ_{t} Y_{t}) \\ = & \sum_{t = 1}^{\infty} C_{b}^{t - 1} E [E (ξ_{t} Y_{t}) | ξ_{t}] = \sum_{t = 1}^{\infty} C_{b}^{t - 1} E [\frac{α_{t} ξ_{t}}{(ξ_{t} - 1) {(1 + u)}^{ξ_{t}}}] \leq \sum_{t = 1}^{\infty} C_{b}^{t - 1} E (\frac{ξ_{U} α_{U}}{ξ_{L} - 1}) < \infty . \end{matrix}

Therefore, when

n τ_{n}^{*} \to \infty

,

\frac{1}{τ_{n}^{*}} [\frac{\partial}{\partial β_{0}} {\tilde{L}}_{n} (θ_{0}) - \frac{\partial}{\partial β_{0}} L_{n} (θ_{0})] \to_{p} 0 .

□

Lemma A8.

Under the same conditions as in Theorem 2,

\frac{1}{\sqrt{n}} \sum_{t = 1}^{n} \frac{\partial l_{t} (θ_{0})}{\partial θ} \to N (0, M_{0}),

where

M_{0}

is the Fisher information matrix at

θ_{0}

.

Proof.

The proof of the lemma is similar to that of Zhao et al. (2018) [31], so it is omitted. □

Proof of Theorem 2.

Theorem 2 can be proved using Lemmas A6–A8. The argument follows Shen et al. (2020) [27], so the details are omitted. The main difference is that we denote

f_{n} (t, y) = τ_{n}^{- 2} {\tilde{L}}_{n} (β_{0} + τ_{n} t, Φ^{0} + τ_{n} y), Φ^{0} = (β_{1}^{0}, β_{2}^{0}, β_{3}^{0}, γ_{0}^{0}, γ_{1}^{0}, γ_{2}^{0}, γ_{3}^{0})

, where

t \in R, y \in R^{7}

. □

Proof of Theorem 3.

Theorem 3 can be proved using Lemmas A7 and A8. The argument follows Zhao et al. (2018) [31], so the details are omitted. □

References

1. 2021 World Air Quality Report. 2022. Available online: https://www.iqair.com/world-air-quality-report (accessed on 9 April 2022).
2. Sun, H.; Yang, X.; Leng, Z. Research on the spatial effects of haze pollution on public health: Spatial–temporal evidence from the Yangtze River Delta urban agglomerations, China. Environ. Sci. Pollut. Res. 2022, 1–20.
3. Shen, W.T.; Yu, X.; Zhong, S.B.; Ge, H.R. Population health effects of air pollution: Fresh evidence from China health and retirement longitudinal survey. Front. Public Health 2021, 9, 779552.
4. Maji, K.J.; Dikshit, A.K.; Arora, M.; Deshpande, A. Estimating premature mortality attributable to PM2.5 exposure and benefit of air pollution control policies in China for 2020. Sci. Total Environ. 2018, 612, 683–693.
5. Bell, J.E.; Brown, C.L.; Conlon, K.; Herring, S.; Kunkel, K.E.; Lawrimore, J.; Luber, G.; Schreck, C.; Smith, A.; Uejio, C. Changes in extreme events and the potential impacts on human health. J. Air Waste Manag. Assoc. 2018, 68, 265–287.
6. Raaschou-Nielsen, O.; Andersen, Z.J.; Beelen, R.; Samoli, E.; Stafoggia, M.; Weinmayr, G.; Hoffmann, B.; Fischer, P.; Nieuwenhuijsen, M.J.; Brunekreef, B.; et al. Air pollution and lung cancer incidence in 17 European cohorts: Prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). Lancet Oncol. 2013, 14, 813–822.
7. Gan, W.Q.; Davies, H.W.; Koehoorn, M.; Brauer, M. Association of long-term exposure to community noise and traffic-related air pollution with coronary heart disease mortality. Am. J. Epidemiol. 2012, 175, 898–906.
8. Brunekreef, B.; Beelen, R.; Hoek, G.; Schouten, L.; Bausch-Goldbohm, S.; Fischer, P.; Armstrong, B.; Hughes, E.; Jerrett, M.; van den Brandt, P. Effects of long-term exposure to traffic-related air pollution on respiratory and cardiovascular mortality in the Netherlands: The NLCS-AIR study. Res. Rep. 2009, 139, 5–71.
9. Kim, S.Y.; Sheppard, L.; Kim, H. Health effects of long-term air pollution: Influence of exposure prediction methods. Epidemiology 2009, 20, 442–450.
10. Cheng, Z.; Li, L.; Liu, J. Identifying the spatial effects and driving factors of urban PM2.5 pollution in China. Ecol. Indic. 2017, 82, 61–75.
11. Gan, T.; Yang, H.; Liang, W.; Liao, X. Do economic development and population agglomeration inevitably aggravate haze pollution in China? New evidence from spatial econometric analysis. Environ. Sci. Pollut. Res. 2021, 28, 5063–5079.
12. Ma, R.; Wang, C.; Jin, Y.; Zhou, X. Estimating the effects of economic agglomeration on haze pollution in Yangtze River Delta China using an econometric analysis. Sustainability 2019, 11, 1893.
13. Xie, Q.; Xu, X.; Liu, X. Is there an EKC between economic growth and smog pollution in China? New evidence from semiparametric spatial autoregressive models. J. Clean. Prod. 2019, 220, 873–883.
14. Zhang, X.; Xu, X.; Ding, Y.; Liu, Y.; Zhang, H.; Wang, Y.; Zhong, J. The impact of meteorological changes from 2013 to 2017 on PM2.5 mass reduction in key regions in China. Sci. China Earth Sci. 2019, 62, 1885–1902.
15. Fontes, T.; Li, P.; Barros, N.; Zhao, P. Trends of PM2.5 concentrations in China: A long term approach. J. Environ. Manag. 2017, 196, 719–732.
16. Pui, D.Y.H.; Chen, S.C.; Zuo, Z. PM2.5 in China: Measurements, sources, visibility and health effects, and mitigation. Particuology 2014, 13, 1–26.
17. Liang, X.; Zou, T.; Guo, B.; Li, S.; Zhang, H.; Zhang, S.; Huang, H.; Chen, S.X. Assessing Beijing’s PM2.5 pollution: Severity, weather impact, APEC and winter heating. Proc. R. Soc. A 2015, 471, 20150257.
18. Zhang, S.; Guo, B.; Dong, A.; He, J.; Xu, Z.; Chen, S.X. Cautionary tales on air-quality improvement in Beijing. Proc. R. Soc. A 2017, 473, 20170457.
19. Chen, L.; Guo, B.; Huang, J.; He, J.; Wang, H.; Zhang, S.; Chen, S.X. Assessing air-quality in Beijing-Tianjin-Hebei Region: The method and mixed tales of PM2.5 and O₃. Atmos. Environ. 2018, 193, 290–301.
20. Wu, H.; Zheng, X.; Zhu, J.; Lin, W.; Zheng, H.; Chen, X.; Wang, W.; Wang, Z.; Chen, S.X. Improving PM2.5 forecasts in China using an initial error transport model. Environ. Sci. Technol. 2020, 54, 10493–10501.
21. Wan, Y.; Xu, M.; Huang, H.; Chen, S.X. A spatio-temporal model for the analysis and prediction of fine particulate matter concentration in Beijing. Environmetrics 2020, 32, e2648.
22. Zhu, Y.; Liang, Y.; Chen, S. Assessing local emission for air pollution via data experiments. Atmos. Environ. 2021, 252, 118323.
23. Wang, J.; Cohan, D.S.; Xu, H. Spatiotemporal ozone pollution LUR models: Suitable statistical algorithms and time scales for a megacity scale. Atmos. Environ. 2020, 237, 117671.
24. Pickands, J. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131.
25. Tencaliec, P.; Favre, A.C.; Naveau, P.; Prieur, C.; Nicolet, G. Flexible semiparametric generalized Pareto modeling of the entire range of rainfall amount. Environmetrics 2020, 31, e2582.
26. Gharib, A.; Davies, E.G.R.; Goss, G.G.; Faramarzi, M. Assessment of the combined effects of threshold selection and parameter estimation of generalized Pareto distribution with applications to flood frequency analysis. Water 2017, 9, 692.
27. Shen, Z.Y.; Chen, Y.; Shi, R.X. Modeling tail index with autoregressive conditional Pareto model. J. Bus. Econ. Stat. 2022, 40, 458–466.
28. Chen, Y.; Yu, W. Setting the margins of Hang Seng Index Futures on different positions using an APARCH-GPD Model based on extreme value theory. Phys. A Stat. Mech. Its Appl. 2020, 544, 123207.
29. Park, E.; Brorsen, B.W.; Harri, A. Using Bayesian Kriging for spatial smoothing in crop insurance rating. Am. J. Agric. Econ. 2019, 101, 330–351.
30. Liu, X.H.; Zhang, X.; Xue, J.Y. Fraud risk measurement of basic medical insurance for urban and rural residents in China. Econ. Comput. Econ. Cybern. Stud. Res. 2019, 53, 277–296.
31. Zhao, Z.; Zhang, Z.; Chen, R. Modeling maxima with autoregressive conditional Fréchet model. J. Econom. 2018, 207, 325–351.
32. Chavez-Demoulin, V.; Embrechts, P.; Sardy, S. Extreme-quantile tracking for financial time series. J. Econom. 2014, 181, 44–52.
33. Kelly, B.; Jiang, H. Tail risk and asset prices. Rev. Financ. Stud. 2014, 27, 2841–2871.
34. Massacci, D. Tail risk dynamics in stock returns: Links to the macroeconomy and global markets connectedness. Manag. Sci. 2017, 63, 3072–3089.
35. Deng, L.; Yu, M.X.; Zhang, Z.J. Statistical learning of the worst regional smog extremes with dynamic conditional modeling. Atmosphere 2020, 11, 665.
36. Choulakian, V.; Stephens, M.A. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 2001, 43, 478–484.
37. Bermudez, P.Z.; Turkman, M.A.A.; Turkman, K.F. A predictive approach to tail probability estimation. Extremes 2001, 4, 295–314.
38. Bader, B.; Yan, J.; Zhang, X.B. Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate. Ann. Appl. Stat. 2018, 12, 310–329.
39. Yang, X.; Zhang, J.; Ren, W.X. Threshold selection for extreme value estimation of vehicle load effect on bridges. Int. J. Distrib. Sens. Netw. 2018, 14, 1550147718757698.
40. Schneider, L.F.; Krajina, A.; Krivobokova, T. Threshold selection in univariate extreme value analysis. Extremes 2021, 24, 881–913.
41. Boznar, M.; Lesjak, M.; Mlakar, P. A neural network-based method for short-term predictions of ambient SO₂ concentrations in highly polluted industrial areas of complex terrain. Atmos. Environ. Part B Urban Atmos. 1993, 27, 221–230.
42. Neagu, C.D.; Avouris, N.; Kalapanidas, E.; Palade, V. Neural and neuro-fuzzy integration in a knowledge-based system for air quality prediction. Appl. Intell. 2002, 17, 141–169.
43. Esfandani, M.A.; Nematzadeh, H. Predicting air pollution in Tehran: Genetic algorithm and back propagation neural network. J. Data Min. 2016, 4, 49–54.
44. Amarpuri, L.; Yadav, N.; Kumar, G.; Agrawal, S. Prediction of CO₂ emissions using deep learning hybrid approach: A case study in Indian context. In Proceedings of the 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 August 2019.
45. Menéndez García, L.A.; Sánchez Lasheras, F.; García Nieto, P.J.; Álvarez de Prado, L.; Bernardo Sánchez, A. Predicting Benzene concentration using machine learning and time series algorithms. Mathematics 2020, 8, 2205.
46. Sánchez-Pérez, J.F.; Mena-Requena, M.R.; Cánovas, M. Mathematical modeling and simulation of a gas emission source using the network simulation method. Mathematics 2020, 8, 1996.
47. Sayeed, A.; Lops, Y.; Choi, Y.; Jung, J.; Salman, A.K. Bias correcting and extending the PM forecast by CMAQ up to 7 days using deep convolutional neural networks. Atmos. Environ. 2021, 253, 118376.
48. Kang, S.; Song, J. Parameter and quantile estimation for the generalized Pareto distribution in peaks over threshold framework. J. Korean Stat. Soc. 2017, 46, 487–501.
49. Balkema, A.A.; De Haan, L. Residual life time at great age. Ann. Probab. 1974, 2, 792–804.
50. Davison, A.C.; Smith, R.L. Models for exceedances over high thresholds. J. R. Stat. Soc. Ser. B (Methodol.) 1990, 52, 393–425.

Figure 1. Structure of LSTM.

Figure 2. The graph of the $\mathrm{PM}_{2.5}$ time series in Beijing.

Figure 3. Tail index $\hat{ζ}_{t}$ estimated by MLE and simulated tail index $ζ_{t}$. (a) The DCP model with weather factors from 2015 to 2020. (b) The DCP model with weather factors from 2018 to 2020. (c) The DCP model with air quality factors from 2015 to 2020. (d) The DCP model with air quality factors from 2018 to 2020. (e) The DCP model with mixed factors from 2015 to 2020. (f) The DCP model with mixed factors from 2018 to 2020.

Figure 4. Exponential QQ plot of real $\mathrm{PM}_{2.5}$ data in Beijing: (a) from 3 January 2015 to 8 August 2020; (b) from 1 January 2018 to 8 August 2020.

Figure 5. Estimated tail index $\hat{ζ}_{t}$ from the three DCP models and positive exceedances $Y_{t}$. (a) The DCP model with weather factors from 3 January 2015 to 8 August 2020 with $u = 2.4660$. (b) The DCP model with weather factors from 1 January 2018 to 8 August 2020 with $u = 0.5716$. (c) The DCP model with air quality factors from 3 January 2015 to 8 August 2020 with $u = 2.4660$. (d) The DCP model with air quality factors from 1 January 2018 to 8 August 2020 with $u = 0.5716$. (e) The DCP model with mixed factors from 3 January 2015 to 8 August 2020 with $u = 2.4660$. (f) The DCP model with mixed factors from 1 January 2018 to 8 August 2020 with $u = 0.5716$.

Figure 6. The line graphs of fitted $\hat{Y}_{t}$ values and real exceedances $Y_{t}$.

Figure 7. Estimated standard deviation for DCP vs. GARCH. (a) The DCP model with weather factors from 3 January 2015 to 8 August 2020. (b) The DCP model with weather factors from 1 January 2018 to 8 August 2020. (c) The DCP model with air quality factors from 3 January 2015 to 8 August 2020. (d) The DCP model with air quality factors from 1 January 2018 to 8 August 2020. (e) The DCP model with mixed factors from 3 January 2015 to 8 August 2020. (f) The DCP model with mixed factors from 1 January 2018 to 8 August 2020.

Figure 8. The line graphs of predicted $\hat{Y}_{t}$ values and real exceedances $Y_{t}$.

Figure 9. The long-term prediction of $\mathrm{PM}_{2.5}$ values in Beijing.

Table 1. Parameter estimates of the DCP models. Weather1 and Weather2 denote the DCP models with weather factors fitted to the data from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, respectively. Air1 and Air2 (air quality factors) and Mixed1 and Mixed2 (mixed factors) are defined analogously.

	Weather1	Weather2	Air1	Air2	Mixed1	Mixed2
$β_{0}$	0.2989	0.0950	−0.0191	0.5932	−0.0186	0.8154
$β_{1}$	0.0000	0.9210	0.0033	0.3744	0.0000	0.0000
$β_{2}$	0.0000	0.0303	0.0000	0.0602	0.0000	0.0696
$β_{3}$	6.4028	33.6097	2.1760	0.0001	2.8285	0.2035
$β_{4}$	8.2352	0.1524	0.0042	2.0413	0.3989	2.9834
$β_{5}$	0.9159	0.4756	0.3541	0.7755	6.1266	0.0001
$β_{6}$	4.4001	1.0734	0.0001	0.0001	0.1056	0.6691
$β_{7}$					0.6196	0.0001
$γ_{0}$	−1.3618	−0.0116	−0.3471	0.3087	−1.9032	−0.6329
$γ_{1}$	0.3739	0.1931	0.3162	0.0182	0.0475	0.0524
$γ_{2}$	2.1928	1.2830	1.3070	1.0492	3.2466	2.0631
$γ_{3}$	0.3367	0.8814	0.0001	0.5450	0.0001	0.1949
$γ_{4}$	0.0666	0.0192	0.1809	0.1848	0.0655	0.1533
$γ_{5}$	0.1691	0.3138	0.2600	0.2209	0.1140	0.0785
$γ_{6}$	0.0001	0.0001	0.2353	0.2472	0.1141	0.1859
$γ_{7}$					0.0001	0.0001

Table 2. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with weather factors from 3 January 2015 to 8 August 2020.

		n = 1000				n = 2000
Parameter	True Value	Mean	SD	RMSE	Abias	Mean	SD	RMSE	Abias
$β_{0}$	0.2989	0.7935	1.4991	1.5771	0.6634	0.5055	0.9202	0.9422	0.3709
$β_{1}$	0.0000	0.2133	0.2898	0.3596	0.2133	0.1525	0.2499	0.2925	0.1525
$β_{2}$	0.0000	0.6980	1.6177	1.7604	0.6980	0.2982	0.9659	1.0100	0.2982
$β_{3}$	6.4028	6.8126	16.8403	16.8285	8.1358	8.5570	16.9945	17.1137	8.6371
$β_{4}$	8.2352	2.8520	2.4664	5.9203	5.4315	4.1413	2.5876	4.8417	4.1485
$β_{5}$	0.9159	0.8824	0.6961	0.6962	0.5726	0.8390	0.6126	0.6168	0.4842
$β_{6}$	4.4001	1.6663	1.5977	3.1656	2.8693	2.3185	1.4552	2.5390	2.1497
$γ_{0}$	−1.3618	−1.0625	0.8750	0.9239	0.6968	−1.2880	0.9361	0.9380	0.6209
$γ_{1}$	0.3739	0.3519	0.1439	0.1455	0.1155	0.3514	0.1119	0.1140	0.0887
$γ_{2}$	2.1928	1.9536	0.9417	0.9707	0.7376	2.1537	1.0054	1.0051	0.6710
$γ_{3}$	0.3367	1.1742	5.1284	5.1912	1.0035	1.0593	5.0468	5.0933	0.8586
$γ_{4}$	0.0666	0.1206	0.1802	0.1880	0.0764	0.0795	0.0662	0.0674	0.0363
$γ_{5}$	0.1691	0.3291	0.2729	0.3161	0.1722	0.2380	0.1296	0.1467	0.0889
$γ_{6}$	0.0001	0.0259	0.0639	0.0689	0.0258	0.0117	0.0237	0.0263	0.0116
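
The summary statistics reported in Tables 2–7 can be computed from a set of replicated estimates along the following lines. This is a minimal sketch rather than the authors' code. It assumes that Sd is the sample standard deviation, that RMSE is the root mean squared deviation from the true value, and that Abias is the mean absolute deviation from the true value (an assumed reading, since the term is not defined in this part of the paper). The function name `summarize_estimates` and the simulated replicates are hypothetical.

```python
import math
import random
import statistics


def summarize_estimates(estimates, true_value):
    """Summarize replicated estimates of a single parameter.

    ASSUMPTION: 'abias' is the mean absolute deviation from the true value;
    the paper's exact definition is not given in this section.
    """
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # sample SD, n - 1 denominator
    rmse = math.sqrt(statistics.fmean((e - true_value) ** 2 for e in estimates))
    abias = statistics.fmean(abs(e - true_value) for e in estimates)
    return {"mean": mean, "sd": sd, "rmse": rmse, "abias": abias}


# Hypothetical replicates: 500 noisy draws around a chosen true value.
# These are NOT the paper's simulation results.
rng = random.Random(0)
true_value = 0.3739
estimates = [rng.gauss(true_value, 0.1) for _ in range(500)]
summary = summarize_estimates(estimates, true_value)
print(summary)
```

Under these definitions, RMSE² equals the squared bias plus (n − 1)/n times the squared sample SD, which gives a quick consistency check on any row of a table built this way.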

Table 3. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with air quality factors from 3 January 2015 to 8 August 2020.

		n = 1000				n = 2000
Parameter	True Value	Mean	SD	RMSE	Abias	Mean	SD	RMSE	Abias
$β_{0}$	−0.0191	0.6258	1.8437	1.9514	0.7166	0.6809	2.0694	2.1826	0.7723
$β_{1}$	0.0033	0.2380	0.3223	0.3985	0.2368	0.1968	0.2839	0.3434	0.1959
$β_{2}$	0.0000	1.3443	2.2793	2.6442	1.3443	1.0134	2.1222	2.3499	1.0134
$β_{3}$	2.1760	13.6808	28.8885	31.0683	13.3849	14.1147	28.1351	30.5374	13.8677
$β_{4}$	0.0042	3.7857	6.2986	7.3412	3.7834	4.3086	8.0154	9.0910	4.3064
$β_{5}$	0.3541	1.3850	1.9586	2.2116	1.3261	1.2559	1.7233	1.9435	1.1972
$β_{6}$	0.0001	1.6231	2.5665	3.0344	1.6230	1.4659	2.3098	2.7337	1.4658
$γ_{0}$	−0.3471	−0.3156	0.2649	0.2665	0.1966	−0.3339	0.2246	0.2247	0.1594
$γ_{1}$	0.3162	0.2823	0.1097	0.1147	0.0875	0.2996	0.0882	0.0896	0.0689
$γ_{2}$	1.3070	1.5039	0.7812	0.8049	0.3558	1.3508	0.2809	0.2840	0.1981
$γ_{3}$	0.0001	0.1969	1.8029	1.8118	0.1968	0.0543	0.5805	0.5825	0.0542
$γ_{4}$	0.1809	0.2698	0.2871	0.3003	0.1461	0.2182	0.2626	0.2650	0.0926
$γ_{5}$	0.2600	0.3429	0.2274	0.2418	0.1666	0.2927	0.1313	0.1352	0.0965
$γ_{6}$	0.2353	0.3471	0.3084	0.3277	0.2023	0.2860	0.1691	0.1764	0.1214

Table 4. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with mixed weather and air quality factors from 3 January 2015 to 8 August 2020.

		n = 1000				n = 2000
Parameter	True Value	Mean	SD	RMSE	Abias	Mean	SD	RMSE	Abias
$β_{0}$	−0.0186	0.4915	1.5853	1.6639	0.5978	0.3881	1.2049	1.2705	0.4428
$β_{1}$	0.0000	0.2219	0.3009	0.3736	0.2219	0.2228	0.3111	0.3824	0.2228
$β_{2}$	0.0000	0.5849	1.6337	1.7337	0.5849	0.3853	1.2311	1.2888	0.3853
$β_{3}$	2.8285	7.7282	19.5914	20.1758	7.7747	8.2657	19.3077	20.0401	7.9301
$β_{4}$	0.3989	1.5797	2.8494	3.0817	1.6143	1.7926	2.9913	3.2974	1.7668
$β_{5}$	6.1266	2.9585	4.8771	5.8116	5.1223	3.1062	4.3778	5.3150	4.7418
$β_{6}$	0.1056	1.0466	1.1051	1.4507	0.9986	0.9204	0.9381	1.2418	0.8764
$β_{7}$	0.6196	1.3400	2.0752	2.1947	1.3241	1.4397	2.2763	2.4174	1.4125
$γ_{0}$	−1.9032	−1.7337	0.9557	0.9697	0.7988	−1.8052	0.8761	0.8807	0.7080
$γ_{1}$	0.0475	0.0730	0.0899	0.0933	0.0705	0.0666	0.0803	0.0824	0.0628
$γ_{2}$	3.2466	3.0196	0.9698	0.9950	0.8249	3.0970	0.8919	0.9035	0.7355
$γ_{3}$	0.0001	0.0243	0.3376	0.3381	0.0242	0.0007	0.0033	0.0034	0.0006
$γ_{4}$	0.0655	0.0712	0.0517	0.0520	0.0319	0.0688	0.0326	0.0328	0.0236
$γ_{5}$	0.1140	0.1621	0.1388	0.1467	0.0754	0.1370	0.0646	0.0685	0.0486
$γ_{6}$	0.1141	0.1626	0.1192	0.1286	0.0717	0.1345	0.0587	0.0621	0.0428
$γ_{7}$	0.0001	0.0144	0.0311	0.0342	0.0143	0.0088	0.0136	0.0161	0.0087

Table 5. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with weather factors from 1 January 2018 to 8 August 2020.

		n = 1000				n = 2000
Parameter	True Value	Mean	SD	RMSE	Abias	Mean	SD	RMSE	Abias
$β_{0}$	0.0950	0.7499	1.5926	1.7205	0.6839	0.3280	0.8420	0.8728	0.2556
$β_{1}$	0.9210	0.6431	0.3598	0.4543	0.3029	0.8072	0.2381	0.2637	0.1355
$β_{2}$	0.0303	0.4727	1.5367	1.5976	0.4729	0.1854	0.7942	0.8084	0.1764
$β_{3}$	33.6097	24.5359	38.6197	39.6337	36.2003	29.1089	40.3405	40.5506	36.7034
$β_{4}$	0.1524	1.4197	1.9596	2.3321	1.3589	0.4509	0.6220	0.6893	0.3917
$β_{5}$	0.4756	1.0975	1.2445	1.3901	0.9800	0.5554	0.5925	0.5972	0.4135
$β_{6}$	1.0734	1.5782	1.6604	1.7339	1.2105	1.1625	0.8826	0.8862	0.6713
$γ_{0}$	−0.0116	0.1267	0.2986	0.3289	0.2612	0.0886	0.2156	0.2376	0.1884
$γ_{1}$	0.1931	0.1890	0.0823	0.0823	0.0669	0.1917	0.0574	0.0573	0.0451
$γ_{2}$	1.2830	1.1413	0.3072	0.3380	0.2702	1.1741	0.2223	0.2473	0.2020
$γ_{3}$	0.8814	1.0276	0.7411	0.7547	0.4188	0.9004	0.3328	0.3330	0.2592
$γ_{4}$	0.0192	0.0256	0.0366	0.0371	0.0223	0.0222	0.0204	0.0206	0.0163
$γ_{5}$	0.3138	0.4060	0.1500	0.1759	0.1187	0.3642	0.0810	0.0953	0.0718
$γ_{6}$	0.0001	0.0185	0.0297	0.0349	0.0184	0.0107	0.0165	0.0196	0.0106

Table 6. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with air quality factors from 1 January 2018 to 8 August 2020.

		n = 1000				n = 2000
Parameter	True Value	Mean	SD	RMSE	Abias	Mean	SD	RMSE	Abias
$β_{0}$	0.5932	0.9593	1.1273	1.1842	0.4807	0.6085	0.0871	0.0884	0.0667
$β_{1}$	0.3744	0.2434	0.2857	0.3140	0.2832	0.3525	0.0805	0.0834	0.0613
$β_{2}$	0.0602	0.3507	1.1224	1.1583	0.3455	0.0638	0.0350	0.0351	0.0271
$β_{3}$	0.0001	10.7321	25.8047	27.9236	10.7320	1.4567	9.0550	9.1625	1.4566
$β_{4}$	2.0413	4.6154	7.3362	7.7678	4.3355	2.1536	0.3725	0.3887	0.2452
$β_{5}$	0.7755	1.4947	2.1804	2.2939	1.3944	0.8279	0.5246	0.5267	0.2043
$β_{6}$	0.0001	1.0952	2.2492	2.4996	1.0951	0.0561	0.1118	0.1249	0.0560
$γ_{0}$	0.3087	0.3379	0.1903	0.1923	0.1488	0.3385	0.1461	0.1490	0.1185
$γ_{1}$	0.0182	0.0546	0.0650	0.0744	0.0516	0.0392	0.0456	0.0502	0.0358
$γ_{2}$	1.0492	0.9473	0.2153	0.2380	0.1910	0.9757	0.1639	0.1795	0.1448
$γ_{3}$	0.5450	0.4920	0.2818	0.2864	0.2076	0.4996	0.2112	0.2158	0.1548
$γ_{4}$	0.1848	0.2208	0.1069	0.1128	0.0833	0.2031	0.0690	0.0714	0.0420
$γ_{5}$	0.2209	0.2539	0.0943	0.0998	0.0752	0.2445	0.0598	0.0642	0.0441
$γ_{6}$	0.2472	0.2810	0.0979	0.1035	0.0771	0.2714	0.0581	0.0629	0.0479

Table 7. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with mixed weather and air quality factors from 1 January 2018 to 8 August 2020.

		n = 1000				n = 2000
Parameter	True Value	Mean	SD	RMSE	Abias	Mean	SD	RMSE	Abias
$β_{0}$	0.8154	0.8577	0.9534	0.9534	0.3941	0.6715	0.9247	0.9349	0.2631
$β_{1}$	0.0000	0.2211	0.2985	0.3712	0.2211	0.0501	0.1502	0.1582	0.0501
$β_{2}$	0.0696	0.3003	0.9290	0.9563	0.3031	0.0770	0.0731	0.0734	0.0559
$β_{3}$	0.2035	12.2266	26.3745	28.9616	12.1660	6.1382	14.5687	15.7176	6.0234
$β_{4}$	2.9834	2.8016	3.4427	3.4440	2.8849	2.5224	1.1180	1.2083	0.7971
$β_{5}$	0.0001	1.7871	3.2419	3.6989	1.7870	0.9101	1.7622	1.9817	0.9100
$β_{6}$	0.6691	1.0305	1.0784	1.1364	0.8332	0.9123	0.9829	1.0116	0.5585
$β_{7}$	0.0001	1.5230	2.4326	2.8679	1.5229	0.6633	1.4938	1.6330	0.6632
$γ_{0}$	−0.6329	−0.4751	0.3628	0.3953	0.3139	−0.9250	1.1942	1.2282	0.6621
$γ_{1}$	0.0524	0.0620	0.0667	0.0673	0.0538	0.0630	0.0559	0.0569	0.0429
$γ_{2}$	2.0631	1.8830	0.3908	0.4299	0.3385	2.3010	1.1443	1.1677	0.6600
$γ_{3}$	0.1949	0.1904	0.0851	0.0852	0.0628	0.2905	0.9609	0.9647	0.1879
$γ_{4}$	0.1533	0.1799	0.0607	0.0662	0.0471	0.1689	0.0915	0.0927	0.0539
$γ_{5}$	0.0785	0.0982	0.0416	0.0460	0.0344	0.0877	0.0740	0.0745	0.0319
$γ_{6}$	0.1859	0.2243	0.0631	0.0738	0.0533	0.2061	0.1122	0.1139	0.0612
$γ_{7}$	0.0001	0.0123	0.0210	0.0243	0.0122	0.0079	0.0143	0.0163	0.0078

Table 8. Comparison of the DCP and DCW models based on AIC and BIC criteria.

		AIC			BIC
		Weather	Air	Mixed	Weather	Air	Mixed
3 January 2015–8 August 2020	DCP	588.9960	535.8680	531.0784	666.7786	613.6507	619.9729
3 January 2015–8 August 2020	DCW	3327.6850	3907.5960	4908.1610	3411.0240	3990.9350	5002.6110
1 January 2018–8 August 2020	DCP	1035.3950	1019.2530	1001.3240	1035.3950	1086.0280	1077.6380
1 January 2018–8 August 2020	DCW	2717.4300	3219.0270	1700.4180	2788.9740	3290.5710	1700.4180
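
For readers who want to relate the two criteria in Table 8, the sketch below uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. It is an illustration, not the authors' code. The helper `implied_log_n` and the parameter counts (k = 14 for the weather model and k = 16 for the mixed model, counting every parameter listed in Table 1) are assumptions introduced here.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def implied_log_n(aic_value, bic_value, k):
    """Recover ln n from a reported (AIC, BIC) pair.

    Uses the identity BIC - AIC = k * (ln n - 2).
    """
    return (bic_value - aic_value) / k + 2


# DCP rows of Table 8 for 3 January 2015 to 8 August 2020.
# ASSUMPTION: k counts all parameters in Table 1 (14 weather, 16 mixed).
log_n_weather = implied_log_n(588.9960, 666.7786, k=14)
log_n_mixed = implied_log_n(531.0784, 619.9729, k=16)
print(round(log_n_weather, 3), round(log_n_mixed, 3))
```

Under these assumed parameter counts, the two rows imply nearly the same value of ln n, as expected for models fitted to the same set of exceedances. The same identity also shows that AIC and BIC coincide only when ln n = 2, so the rows of Table 8 in which the two criteria are identical may be worth checking against the published version.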

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

