Article

Statistical Inference of Dynamic Conditional Generalized Pareto Distribution with Weather and Air Quality Factors

1
Faculty of Science, Beijing University of Technology, Beijing 100124, China
2
University of Chinese Academy of Sciences, Beijing 100049, China
3
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
4
Faculty of Humanities and Social Sciences, Beijing University of Technology, Beijing 100124, China
5
School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(9), 1433; https://doi.org/10.3390/math10091433
Submission received: 14 March 2022 / Revised: 19 April 2022 / Accepted: 21 April 2022 / Published: 24 April 2022
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Abstract

Air pollution is a major global problem, closely related to economic and social development and ecological environment construction. Air pollution data for most regions of China have a close correlation with time and seasons and are affected by multidimensional factors such as meteorology and air quality. In contrast with classical peaks-over-threshold modeling approaches, we use a deep learning technique and three new dynamic conditional generalized Pareto distribution (DCP) models with weather and air quality factors for fitting the time-dependence of the air pollutant concentration and make statistical inferences about their application in air quality analysis. Specifically, in the proposed three DCP models, a dynamic autoregressive exponential function mechanism is applied for the time-varying scale parameter and tail index of the conditional generalized Pareto distribution, and a sufficiently high threshold is chosen using two threshold selection procedures. The probabilistic properties of the DCP model and the statistical properties of the maximum likelihood estimation (MLE) are investigated, simulating and showing the stability and sensitivity of the MLE estimations. The three proposed models are applied to fit the PM 2.5 time series in Beijing from 2015 to 2021. Real data are used to illustrate the advantages of the DCP, especially compared to the estimation volatility of GARCH and AIC or BIC criteria. The DCP model involving both the mixed weather and air quality factors performs better than the other two models with weather factors or air quality factors alone. Finally, a prediction model based on long short-term memory (LSTM) is used to predict PM 2.5 concentration, achieving ideal results.

1. Introduction

Air pollution, closely related to economic and social development as well as ecological environment construction, is a global problem that destroys human living environments. In recent years, the Chinese government has attached great importance to the prevention and control of air pollution. The World Air Quality Report 2021 released by the Swiss company IQAir pointed out that air quality in China continued to improve in 2021. Compared to 2020, PM 2.5 concentrations decreased in 66% of Chinese cities [1]. However, China still faces environmental challenges. Air pollution is mainly composed of harmful gases and particulate matter, which are released into the atmosphere by natural or human activities, and its concentration is far beyond the self-purification capacity of the atmosphere, resulting in changes in the composition of the atmosphere, endangering human health and living environments. Smog, as a seriously harmful air pollutant, has received growing attention. Severe smog levels pose a huge threat to China’s public health [2,3,4,5]. Short-term exposure to air pollution will cause cough, dyspnea, headache, fatigue and other phenomena, while long-term exposure to air pollution will lead to respiratory diseases, cardiovascular damage, nervous system damage and other diseases, and may even lead to birth defects and death [6,7,8,9]. Climate warming, sea-level rises, acid rain, the hole in the ozone layer and other particulate pollution directly highlight the environmental problems caused by air pollution, harming human survival and development. To improve air quality and human living environments, air pollution has become a key topic for researchers, and monitoring, assessment, prediction and prevention have become important research directions in the study of air pollution. Smog is also closely linked to China’s economic development [10,11,12,13]. 
It is necessary to make full use of multidimensional big data and give full play to the advantages of statistics and artificial intelligence technology. By virtue of interdisciplinary development, researchers have been able to vigorously develop statistical modeling theory and deep learning technology for accurate prediction and effective control of urban air quality. As a result, a solid theoretical foundation and effective technical support can be provided for improving the capabilities and level of ecological and environmental conservation.
Fine particulate matter ( PM 2.5 ) is the most common object in the studies of pollutant concentration, and the higher the concentration in the air, the more serious the air pollution [14,15,16]. From a statistical point of view, pollutant concentration prediction has become an important research direction in air pollution forecasting and prevention. The existing research on air pollutant concentration mainly focuses on the sources, concentration distributions, fluctuations, affecting factors, adverse effects on human health and so on. Quantitative prediction of pollutant concentration is the most common statistical method for dealing with air pollution problems, and multivariate regression, cluster analysis and principal component analysis are the most frequently used statistical models. Chen Songxi, an academician of the Chinese Academy of Sciences, applied non-parametric statistics to the national air pollution assessment and prevention research and proposed a method for adjusting spatio-temporal meteorological factors to remove the meteorological confounding effect in atmospheric environmental monitoring, providing a scientific method for accurately measuring pollutant discharge and evaluating air pollution control [17,18,19,20,21,22]. Wang et al. (2020) [23] established a spatio-temporal O 3 pollution land use regression (LUR) model suitable for large cities based on parametric, non-parametric and semi-parametric classical statistical algorithms combined with meteorological factors, with the ability to monitor O 3 concentration with high spatio-temporal resolution.
In order to improve the prediction accuracy, classical extreme value theory (EVT) has attracted more and more attention. Compared with fine weather, researchers are more concerned with observations of pollutant concentrations exceeding a certain high threshold. Pickands (1975) [24] pointed out that observations above a certain threshold can be approximated well by the generalized Pareto distribution (GPD). As a branch of EVT, GPD plays an important role in many fields. In the field of hydrometeorology, the GPD model is used to analyze and forecast natural phenomena such as floods, wind and rainfall [25,26]. In the financial field, stock yield is non-normal and thick-tailed, which can be well fitted by a GPD [27,28]. In the field of insurance, insurance losses are generally non-negative with a thick tail, and a GPD is usually used to predict the maximum loss [29,30].
Due to the strong time correlation of observations, a traditional model with fixed parameters cannot perfectly fit the time-series observations in reality. To solve this problem, many researchers have conducted in-depth research on the dynamic extreme value distribution model and the dynamic over-threshold GPD model. Using the autoregressive mechanism of the GARCH model, Zhao et al. (2018) [31] established an autoregressive conditional Fréchet model with time-dependent parameters (type II GEV) for the sequence of daily maximum stock returns. They derived the maximum likelihood estimators of the model parameters and studied their large-sample properties. Chavez-Demoulin et al. (2014) [32] applied a Bayesian method to update the time-varying GPD parameters for the UBS stock price, which was a non-parametric method applied to a POT–GPD model. Kelly and Jiang (2014) [33] built a dynamic tail model with POT–GPD for panel data and measured the tail risk of the S&P 500 index. Massacci (2017) [34] studied the time-dependent dynamic parameter estimation of a GPD through the score-based approach, in order to accurately estimate the tail index from U.S. size-sorted decile stock portfolios. Shen et al. (2020) [27] established an autoregressive conditional Pareto (ACP) distribution model via an exponential function. The maximum likelihood estimation of the parameter was given, and its properties were studied. Based on the parameter estimation, they applied the ACP model to the Dow Jones Industrial Average and the S&P 500 index. Deng et al. (2020) [35] applied a dynamic model to air quality management, taking time and meteorological factors into consideration and establishing a dynamic conditional autoregressive Weibull distribution model (type III GEV) via the maximum daily pollutant concentrations. The probabilistic properties of the autoregressive model were investigated in their study.
As is well known, it is difficult to fit the PM 2.5 time series accurately, so the model selection and statistical inferences are the most important and common challenges for real data applications.
Threshold selection is a critical issue for fitting the autoregressive conditional generalized Pareto distribution model. In practice, the threshold should be chosen in advance. If the threshold is too large, the sample size of observations exceeding the threshold will be too small, which may increase the variance of the parameter estimation and affect the estimation effect. If the threshold is too small, the sample size can be increased, but the estimator is prone to bias. Choulakian and Stephens (2001) [36] transformed the threshold selection into the goodness-of-fit test of the model. Through the selection method, an appropriate threshold was chosen, allowing the exceedance to follow the GPD, and the threshold selection was carried out at the same time as testing the model. Bermudez et al. (2001) [37] used a Bayesian predictive approach to the peaks-over-threshold (POT) method, which can also be applied to small-sample situations. Bader et al. (2018) [38] proposed an automated threshold selection procedure based on a sequence of goodness-of-fit tests, and attained automatic threshold selection by applying stopping rules, which transform the results of ordered, sequentially tested hypotheses to control the false discovery rate. Yang et al. (2018) [39] developed an empirical threshold selection method based on the relationship between eigenvalues and thresholds. Schneider et al. (2021) [40] proposed selecting the threshold by minimizing the asymptotic mean squared error of the Hill estimator.
With the continuous development of artificial intelligence and machine learning, an increasing number of scholars have applied traditional machine learning methods to statistical prediction models in recent years and achieved good results in terms of accuracy and time efficiency. Boznar et al. (1993) [41] compared prediction results based on the three-layer neural network perceptron with the results generated by a traditional atmospheric diffusion model. Neagu et al. (2002) [42] used a fuzzy neural network model to predict the concentration of nitrogen oxide pollutants, achieving good results. Esfandani and Nematzadeh (2016) [43] proposed a prediction model for air quality in Tehran based on a feedback neural network. Amarpuri et al. (2019) [44] established a convolutional long short-term memory network to predict carbon dioxide emissions and achieved ideal results. An air pollution prediction model based on LSTM is a good choice for predicting PM 2.5 concentrations. García et al. (2020) [45] analyzed the concentrations of nitrogen dioxide ($\mathrm{NO}_2$), nitrogen oxides ($\mathrm{NO}_x$), particulate matter ($\mathrm{PM}_{10}$) and toluene ($\mathrm{C}_7\mathrm{H}_8$) at eight sites in Madrid (Spain) through seven regression-based machine learning models and time-series models. Sánchez-Pérez et al. (2020) [46] established a complete spatio-temporal dispersion model for pollutants through a network simulation method, to obtain the concentrations of pollutants released at any time in a given space. Sayeed et al. (2021) [47] used a generalized deep convolutional neural network (CNN) model to predict air pollutants, which could predict the hourly pollutant concentration within 7 days with relatively high accuracy.
In this paper, dynamic autoregressive mechanisms are applied, and weather and air quality factors are also involved in our model. The framework of this paper follows that of [27]. The three main contributions of this paper are as follows. First, we construct a dynamic conditional generalized Pareto distribution (DCP) with both weather and air quality factors to fit the smog observations, considering the time-dependency of the scale parameter and the tail index of the GPD. Secondly, the threshold is chosen using a threshold selection program rather than by specifying a quantile. Thirdly, the PM 2.5 time series are predicted by combining the DCP model with deep learning technology.

2. DCP Model

2.1. Conditional Distribution

The cumulative distribution function of a three-parameter GPD is defined as
$$G_{\mu,\xi,\sigma}(x)=\begin{cases}1-\left[1+\dfrac{\xi}{\sigma}(x-\mu)\right]^{-1/\xi}, & \xi\neq 0,\\[4pt] 1-\exp\left(-\dfrac{x-\mu}{\sigma}\right), & \xi=0,\end{cases}\tag{1}$$
where $\mu\in(-\infty,\infty)$ is the location parameter, $\xi\in(-\infty,\infty)$ is the shape parameter and $\sigma\in(0,\infty)$ is the scale parameter. In Equation (1), $\mu\le x<\infty$ when $\xi\ge 0$, and $\mu<x<\mu-\sigma/\xi$ when $\xi<0$. In particular, the GPD is an exponential distribution when $\xi=0$. Additionally, the two-parameter GPD should be mentioned here for its extensive use, especially in parameter estimation. The classical two-parameter GPD($\xi,\sigma$) is obtained by taking $\mu=0$ in (1) [48]. The GPD has an important property: if $X$ is a random variable (r.v.) distributed according to a GPD($\xi,\sigma$), then the r.v. $Y=X-u\mid X>u$ has a GPD($\xi,\sigma+\xi u$) for a threshold $u$. This means that the shape parameter does not alter in the “excess over the threshold” operation.
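The threshold-stability property stated above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `gpd_cdf` and `excess_cdf` and the parameter values are illustrative assumptions, not part of the original paper.

```python
import math

def gpd_cdf(x, mu, xi, sigma):
    """CDF of the three-parameter GPD in Equation (1)."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:  # beyond the upper endpoint, possible only when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def excess_cdf(y, u, xi, sigma):
    """P(X - u <= y | X > u) for X ~ GPD(xi, sigma) with mu = 0."""
    f_u = gpd_cdf(u, 0.0, xi, sigma)
    return (gpd_cdf(u + y, 0.0, xi, sigma) - f_u) / (1.0 - f_u)

# Hypothetical parameter values, chosen only for illustration.
xi, sigma, u = 0.25, 2.0, 1.5
for y in (0.5, 1.0, 4.0):
    lhs = excess_cdf(y, u, xi, sigma)
    rhs = gpd_cdf(y, 0.0, xi, sigma + xi * u)
    print(f"y={y}: excess CDF={lhs:.6f}, GPD(xi, sigma + xi*u) CDF={rhs:.6f}")
```

The two columns agree up to rounding, which illustrates that only the scale parameter changes when passing to excesses over $u$.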
Let $\{Q_t\}_{t=1}^{n}$ be the time sequence of the daily moving average concentration of PM 2.5 at time $t$ for smog occurrences, where $n$ denotes the number of observations. Let $F_t(q_t\mid\mathcal{F}_{t-1})$ be the conditional cumulative distribution of $Q_t$, where $\mathcal{F}_{t-1}$ denotes the information set available up to time $t-1$. In practice, the underlying distribution $F_t(q_t\mid\mathcal{F}_{t-1})$ of the dataset is unknown. Based on the famous Pickands–Balkema–de Haan theorem [24,49], the standard practice is to employ GPD modeling for the tail region under the POT framework if the original distribution is in the maximum domain of attraction. The obvious limitation is that the time characteristics of $Q_t$ are totally ignored, which may result in the loss of some sample information and cause the statistical inference result to be inaccurate if the dataset depends strongly on time. To solve this problem, Massacci (2017) [34] and Shen et al. (2020) [27] proposed a dynamic GPD framework under the parameters $(u,\zeta_t,\alpha_t)$ as follows:
$$G_t^{u}(q_t\mid\mathcal{F}_{t-1})=1-\left(1+\frac{q_t-u}{\alpha_t}\right)^{-\zeta_t},\quad q_t>u>0,\ \zeta_t>0,\ \alpha_t>0,$$
where the parameters $\alpha_t$ and $\zeta_t$ are time varying, $\zeta_t$ is the tail index and $u$ is the selected threshold. When the POT approach is employed, based on the Pickands–Balkema–de Haan theorem [24,49], the conditional distribution of the positive excess $F_t^{u}(q_t\mid\mathcal{F}_{t-1})$ can be approximated by the dynamic GPD if the distribution satisfies the condition of the theorem. That is,
$$\lim_{u\to+\infty}\ \sup_{u\le q_t<+\infty}\left|F_t^{u}(q_t\mid\mathcal{F}_{t-1})-G_t^{u}(q_t\mid\mathcal{F}_{t-1})\right|=0,$$
where
$$F_t^{u}(q_t\mid\mathcal{F}_{t-1})=P(u<Q_t\le q_t\mid Q_t>u;\mathcal{F}_{t-1})=\frac{F_t(q_t\mid\mathcal{F}_{t-1})-F_t(u\mid\mathcal{F}_{t-1})}{1-F_t(u\mid\mathcal{F}_{t-1})},\quad 0<u\le q_t.$$
$F_t^{u}(q_t\mid\mathcal{F}_{t-1})$ is assumed to be the dynamic GPD, so we can rewrite $F_t(q_t\mid\mathcal{F}_{t-1})$ as
$$F_t(q_t\mid\mathcal{F}_{t-1})=P(Q_t\le q_t\mid\mathcal{F}_{t-1})=\left[1-F_t(u\mid\mathcal{F}_{t-1})\right]F_t^{u}(q_t\mid\mathcal{F}_{t-1})+F_t(u\mid\mathcal{F}_{t-1})=P_t\,G_t^{u}(q_t\mid\mathcal{F}_{t-1})+1-P_t,$$
where $P_t=P(Q_t>u\mid\mathcal{F}_{t-1})$.
For a given threshold value $u$, we focus on the exceedance $Q_t>u$ and define $Y_t=\max(Q_t-u,0)$. Based on Equation (2), the corresponding conditional cumulative distribution function $P(Y_t\le y_t\mid\mathcal{F}_{t-1})=H_t(y_t\mid\mathcal{F}_{t-1})$ of $Y_t$ is as follows [27,34]:
$$\begin{aligned}H_t(y_t\mid\mathcal{F}_{t-1})&=I(y_t=0)(1-P_t)+I(y_t>0)\,F_t(y_t+u\mid\mathcal{F}_{t-1})\\&=I(y_t=0)(1-P_t)+I(y_t>0)\left[P_t\,G_t^{u}(y_t+u\mid\mathcal{F}_{t-1})+1-P_t\right]\\&=I(y_t=0)(1-P_t)+I(y_t>0)\left[1-P_t\left(1+\frac{y_t}{\alpha_t}\right)^{-\zeta_t}\right],\end{aligned}$$
where $I(\cdot)$ is an indicator function. $P_t$ is approximated as a power law multiplied by a time-varying function slowly varying at infinity. Massacci (2017) [34] parameterized the function and obtained the following formula for $P_t$:
$$P_t=(1+u)^{-\zeta_t}.\tag{3}$$
From Equation (3), the cumulative distribution function $H_t(y_t\mid\mathcal{F}_{t-1})$ of $Y_t$ becomes
$$H_t(y_t\mid\mathcal{F}_{t-1})=I(y_t=0)\left[1-(1+u)^{-\zeta_t}\right]+I(y_t>0)\left[1-(1+u)^{-\zeta_t}\left(1+\frac{y_t}{\alpha_t}\right)^{-\zeta_t}\right],\quad u>0,\ \alpha_t>0,\ \zeta_t>0.\tag{4}$$
By inverting the distribution function of $Y_t$, we obtain
$$Y_t=\alpha_t\,I(P_t>Z_t)\left[\left(\frac{P_t}{Z_t}\right)^{1/\zeta_t}-1\right],\tag{5}$$
where $Z_t$ follows a uniform distribution on $(0,1)$ and $P_t$ is as given in (3). Equation (4) contains three distributions from EVT: the POT framework for the GPD, the power law for the conditional probability of $Y_t$ greater than 0 and the uniform distribution of $Z_t$.
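The inverse-transform representation of $Y_t$ can be illustrated with a short simulation. This is a sketch under stated assumptions: the function names and the parameter values are hypothetical, and $\alpha_t$, $\zeta_t$ are held fixed for a single time step.

```python
import random

def exceed_prob(u, zeta):
    """P_t = (1 + u)^(-zeta_t), as in Equation (3)."""
    return (1.0 + u) ** (-zeta)

def sample_exceedance(alpha, zeta, u, z):
    """Map a uniform draw z in (0, 1] to Y_t, as in Equation (5)."""
    p = exceed_prob(u, zeta)
    if p <= z:  # the indicator I(P_t > Z_t) is zero
        return 0.0
    return alpha * ((p / z) ** (1.0 / zeta) - 1.0)

def h_cdf(y, alpha, zeta, u):
    """Conditional CDF of Y_t at y >= 0, as in Equation (4)."""
    return 1.0 - exceed_prob(u, zeta) * (1.0 + y / alpha) ** (-zeta)

# Hypothetical values for one time step.
rng = random.Random(0)
alpha, zeta, u = 1.2, 2.5, 0.5
draws = [sample_exceedance(alpha, zeta, u, 1.0 - rng.random()) for _ in range(20000)]
share_positive = sum(d > 0 for d in draws) / len(draws)
print(f"P_t = {exceed_prob(u, zeta):.4f}, simulated share of Y_t > 0 = {share_positive:.4f}")
```

The simulated share of positive exceedances should be close to $P_t$, and applying `h_cdf` to a positive draw recovers $1-Z_t$.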

2.2. Model Specification

Shen et al. (2020) [27] assumed that $\zeta_t/\alpha_t=b$ for simplicity, and modeled $\alpha_t$ as follows:
$$\log\alpha_t=\beta_0+\beta_1\log\alpha_{t-1}+\beta_2\exp(-\beta_3Y_{t-1}),$$
where $0\le\beta_1\le 1$, $\beta_2>0$, $\beta_3>0$.
In this paper, after threshold selection for the dynamic conditional generalized Pareto (DCP) model, we concentrate on the autoregression of both $\alpha_t$ and $\zeta_t$, which are the critical parameters reflecting the tail behavior. We impose a dynamic structure on the time-dependent parameters $(\alpha_t,\zeta_t)$ and consider weather and air quality factors.
Specifically, the DCP model with weather and air quality factors assumes the form
$$\log\alpha_t=\beta_0+\beta_1\log\alpha_{t-1}+\eta_1(Q_{t-1},T_{t-1},H_{t-1},W_{t-1},SO2_{t-1},NO2_{t-1},CO_{t-1}),\tag{6}$$
$$\log\zeta_t=\gamma_0+\gamma_1\log\zeta_{t-1}+\eta_2(Q_{t-1},T_{t-1},H_{t-1},W_{t-1},SO2_{t-1},NO2_{t-1},CO_{t-1}),\tag{7}$$
where $\beta_1,\gamma_1\in(0,1)$, $\beta_0,\gamma_0\in\mathbb{R}$, and $\eta_1(\cdot)$ and $\eta_2(\cdot)$ are the observation-driven functions for $\log\alpha_t$ and $\log\zeta_t$. $T_t$, $H_t$ and $W_t$ denote the daily average temperature, average relative humidity and average wind speed, and $SO2_t$, $NO2_t$ and $CO_t$ denote the daily moving average concentrations of sulfur dioxide, nitrogen dioxide and carbon monoxide on day $t$, respectively. These three weather factors and three air quality factors are commonly considered in studies on smog. Other weather and air quality factors could also be considered, but adding too many factors increases the model complexity and may weaken the effect of each factor.
We use continuous monotonic exponential functions in $\eta_1(\cdot)$ and $\eta_2(\cdot)$, as in other studies in the literature [27,31], for simplicity, flexibility and easy interpretation. From Equation (5), there exists a positive association between $\alpha_t$ and $Y_t$, while $\zeta_t$ and $Y_t$ are negatively correlated. An increasing $\eta_1(\cdot)$ and a decreasing $\eta_2(\cdot)$ ensure that a large $Y_t$ is followed by a large $\alpha_t$ and a small $\zeta_t$, so we choose the autoregressive process with weather factors as
$$\log\alpha_t=\beta_0+\beta_1\log\alpha_{t-1}-\beta_2\exp(-\beta_3Y_{t-1}+\beta_4T_{t-1}+\beta_5W_{t-1}+\beta_6H_{t-1}),\tag{8}$$
$$\log\zeta_t=\gamma_0+\gamma_1\log\zeta_{t-1}+\gamma_2\exp(-\gamma_3Y_{t-1}+\gamma_4T_{t-1}+\gamma_5W_{t-1}+\gamma_6H_{t-1}),\tag{9}$$
where $Y_t$ is given in (5), $\{Z_t\}\overset{\text{i.i.d.}}{\sim}U(0,1)$, $\beta_i,\gamma_i\in\mathbb{R}$ for $i=0,4,5,6$, $0\le\beta_1\neq\gamma_1<1$, and $\beta_j,\gamma_j>0$ for $j=2,3$. Equations (8) and (9) mean that an extreme event observed at time $t-1$ (large $Y_{t-1}$) causes the distribution of $Y_t$ to have a larger scale (large $\alpha_t$) and a heavier tail (small $\zeta_t$). This is why the exceedances tend to cluster in time in our examples.
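To make the direction of the feedback concrete, the following minimal sketch performs one update of the weather-factor recursion. The sign convention assumed here is a minus sign in front of $\beta_2$ and in front of $\beta_3$ and $\gamma_3$ inside the exponential, which matches the stated interpretation that a large $Y_{t-1}$ raises $\alpha_t$ and lowers $\zeta_t$. The function name and all numerical values are hypothetical.

```python
import math

def dcp_weather_step(log_alpha_prev, log_zeta_prev, y_prev, temp, wind, hum, beta, gamma):
    """One step of the weather-factor recursions for log(alpha_t) and log(zeta_t).

    beta and gamma are sequences (b0, ..., b6) and (g0, ..., g6).
    """
    drive_a = math.exp(-beta[3] * y_prev + beta[4] * temp + beta[5] * wind + beta[6] * hum)
    drive_z = math.exp(-gamma[3] * y_prev + gamma[4] * temp + gamma[5] * wind + gamma[6] * hum)
    log_alpha = beta[0] + beta[1] * log_alpha_prev - beta[2] * drive_a
    log_zeta = gamma[0] + gamma[1] * log_zeta_prev + gamma[2] * drive_z
    return log_alpha, log_zeta

# Hypothetical parameter values, chosen only for illustration.
beta = (0.1, 0.5, 0.3, 0.8, 0.01, -0.02, 0.005)
gamma = (0.2, 0.6, 0.4, 0.9, 0.01, -0.02, 0.005)
calm = dcp_weather_step(0.0, 0.5, 0.0, 10.0, 2.0, 50.0, beta, gamma)
extreme = dcp_weather_step(0.0, 0.5, 3.0, 10.0, 2.0, 50.0, beta, gamma)
print("after a calm day:    alpha=%.4f zeta=%.4f" % (math.exp(calm[0]), math.exp(calm[1])))
print("after an extreme day: alpha=%.4f zeta=%.4f" % (math.exp(extreme[0]), math.exp(extreme[1])))
```

With identical weather inputs, the step following the extreme day yields the larger scale and the smaller tail index.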
In addition, $\alpha_t$ and $\zeta_t$ of the DCP model with air quality factors and of the model with mixed weather and air quality factors are specified analogously, in Equations (10) and (11) and in Equations (12) and (13), respectively.
$$\log\alpha_t=\beta_0+\beta_1\log\alpha_{t-1}-\beta_2\exp(-\beta_3Y_{t-1}-\beta_4SO2_{t-1}-\beta_5NO2_{t-1}-\beta_6CO_{t-1}),\tag{10}$$
$$\log\zeta_t=\gamma_0+\gamma_1\log\zeta_{t-1}+\gamma_2\exp(-\gamma_3Y_{t-1}-\gamma_4SO2_{t-1}-\gamma_5NO2_{t-1}-\gamma_6CO_{t-1}),\tag{11}$$
where $\beta_i,\gamma_i\in\mathbb{R}$ for $i=0,4,5,6$, $0\le\beta_1\neq\gamma_1<1$, and $\beta_j,\gamma_j>0$ for $j=2,3$.
$$\log\alpha_t=\beta_0+\beta_1\log\alpha_{t-1}-\beta_2\exp(-\beta_3Y_{t-1}-\beta_4SO2_{t-1}-\beta_5CO_{t-1}+\beta_6W_{t-1}+\beta_7H_{t-1}),\tag{12}$$
$$\log\zeta_t=\gamma_0+\gamma_1\log\zeta_{t-1}+\gamma_2\exp(-\gamma_3Y_{t-1}-\gamma_4SO2_{t-1}-\gamma_5CO_{t-1}+\gamma_6W_{t-1}+\gamma_7H_{t-1}),\tag{13}$$
where $\beta_i,\gamma_i\in\mathbb{R}$ for $i=0,4,5,6,7$, $0\le\beta_1\neq\gamma_1<1$, and $\beta_j,\gamma_j>0$ for $j=2,3$.
Specific details of model applications will be discussed in Section 5 and Section 6.

3. Estimation and Properties

In this section, we consider the maximum likelihood estimation method for estimating the parameters in the DCP models.

3.1. Maximum Likelihood Estimation

Taking the weather factors model as an example, we denote $\Theta_s=\{\theta=(\beta_0,\beta_1,\beta_2,\beta_3,\beta_4,\beta_5,\beta_6,\gamma_0,\gamma_1,\gamma_2,\gamma_3,\gamma_4,\gamma_5,\gamma_6)\mid 0\le\beta_1\neq\gamma_1<1,\ \beta_2>0,\ \beta_3>0,\ \gamma_2>0,\ \gamma_3>0,\ \beta_i,\gamma_i\in\mathbb{R},\ i=0,4,5,6\}$ as the parameter space in the DCP model with weather factors. The conditional probability density function of $Y_t$ can be obtained according to Equation (4) as
$$h_t(Y_t\mid\mathcal{F}_{t-1})=I(Y_t=0)\left[1-(1+u)^{-\zeta_t}\right]+I(Y_t>0)\left[\frac{\zeta_t}{\alpha_t}(1+u)^{-\zeta_t}\left(1+\frac{Y_t}{\alpha_t}\right)^{-\zeta_t-1}\right],$$
where $u>0$, $\alpha_t>0$, $\zeta_t>0$.
The corresponding log-likelihood function with respect to the parameter $\theta$ is
$$L_n(\theta)=\frac{1}{n}\sum_{t=1}^{n}\left\{I(Y_t=0)\log\left[1-(1+u)^{-\zeta_t}\right]+I(Y_t>0)\left[\log\zeta_t-\log\alpha_t-\zeta_t\log(1+u)-(\zeta_t+1)\log\left(1+\frac{Y_t}{\alpha_t}\right)\right]\right\}.\tag{14}$$
Next, the process of maximum likelihood estimation is briefly introduced. With reference to [38,50], we adopt two threshold selection methods. The details are given in Section 5. After obtaining a sufficiently high threshold $u$, we can obtain all $Y_t$ from $Y_t=\max(Q_t-u,0)$. We choose $\log\alpha_1$ as $\dfrac{\beta_0-\beta_2/2}{1-\beta_1}$ and $\log\zeta_1$ as $\dfrac{\gamma_0+\gamma_2/2}{1-\gamma_1}$, which lies in the middle of $G=\left[\dfrac{\beta_0-\beta_2}{1-\beta_1},\dfrac{\beta_0}{1-\beta_1}\right]\times\left[\dfrac{\gamma_0}{1-\gamma_1},\dfrac{\gamma_0+\gamma_2}{1-\gamma_1}\right]$, and obtain all $\alpha_t$ and $\zeta_t$ with the DCP model, using $Y_t$, $\log\alpha_1$, $\log\zeta_1$ and $\theta$. Finally, we calculate the likelihood function using Equation (14) and obtain the MLE of $\theta$. The details are shown in Section 6.
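The evaluation of the log-likelihood described above can be sketched as follows for the weather-factor model. This is an illustrative sketch, not the authors' implementation: the function name, the data and the parameter values are hypothetical, and the sign convention of the recursion (minus signs before $\beta_2$, $\beta_3$ and $\gamma_3$) is an assumption consistent with the model interpretation. Maximizing the returned value over $\theta$ would require a numerical optimizer, which is not shown.

```python
import math

def dcp_weather_loglik(y, weather, beta, gamma, u):
    """Average log-likelihood of the DCP model with weather factors.

    y       : exceedances Y_t = max(Q_t - u, 0)
    weather : list of (T_t, W_t, H_t) tuples aligned with y
    beta    : (b0, ..., b6); gamma : (g0, ..., g6)
    """
    log_a = (beta[0] - beta[2] / 2.0) / (1.0 - beta[1])     # log alpha_1
    log_z = (gamma[0] + gamma[2] / 2.0) / (1.0 - gamma[1])  # log zeta_1
    total = 0.0
    for y_t, (temp, wind, hum) in zip(y, weather):
        alpha, zeta = math.exp(log_a), math.exp(log_z)
        if y_t > 0.0:
            total += (log_z - log_a - zeta * math.log1p(u)
                      - (zeta + 1.0) * math.log1p(y_t / alpha))
        else:
            total += math.log(1.0 - (1.0 + u) ** (-zeta))
        # Update the time-varying parameters for the next time step.
        drive_a = math.exp(-beta[3] * y_t + beta[4] * temp + beta[5] * wind + beta[6] * hum)
        drive_z = math.exp(-gamma[3] * y_t + gamma[4] * temp + gamma[5] * wind + gamma[6] * hum)
        log_a = beta[0] + beta[1] * log_a - beta[2] * drive_a
        log_z = gamma[0] + gamma[1] * log_z + gamma[2] * drive_z
    return total / len(y)

# Hypothetical data and parameters, for illustration only.
u = 0.5
q = [0.2, 0.9, 1.7, 0.4, 0.6]
y = [max(q_t - u, 0.0) for q_t in q]
weather = [(5.0, 2.0, 40.0), (6.0, 1.5, 55.0), (4.0, 1.0, 70.0), (7.0, 3.0, 35.0), (8.0, 2.5, 45.0)]
beta = (0.1, 0.5, 0.3, 0.8, 0.01, -0.02, 0.005)
gamma = (0.2, 0.6, 0.4, 0.9, 0.01, -0.02, 0.005)
print(dcp_weather_loglik(y, weather, beta, gamma, u))
```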

3.2. Statistical Properties

The dynamic evolution Equations (6) and (7) without weather and air quality factors can be rewritten as
$$\log\alpha_t=\beta_0+\beta_1\log\alpha_{t-1}-\beta_2\exp\left\{-\beta_3\,\alpha_{t-1}\,I(P_{t-1}>Z_{t-1})\left[(P_{t-1}/Z_{t-1})^{1/\zeta_{t-1}}-1\right]\right\},\tag{15}$$
$$\log\zeta_t=\gamma_0+\gamma_1\log\zeta_{t-1}+\gamma_2\exp\left\{-\gamma_3\,\alpha_{t-1}\,I(P_{t-1}>Z_{t-1})\left[(P_{t-1}/Z_{t-1})^{1/\zeta_{t-1}}-1\right]\right\},\tag{16}$$
where $\{Z_t\}$ is a sequence of i.i.d. random variables uniformly distributed on $(0,1)$ and $P_t$ is as given in (3).
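The covariate-free recursion above can be simulated directly. The sketch below is illustrative only (hypothetical function name and parameter values). It starts the latent process at the midpoint of the rectangle $G$ from Section 3.1 and prints the observed range of $\log\alpha_t$ and $\log\zeta_t$, which stays inside $G$ because the exponential term lies in $(0,1]$ when $Y_{t-1}\ge 0$.

```python
import math
import random

def simulate_dcp(n, beta, gamma, u, seed=0):
    """Simulate (log alpha_t, log zeta_t, Y_t) without weather or air quality factors."""
    b0, b1, b2, b3 = beta
    g0, g1, g2, g3 = gamma
    rng = random.Random(seed)
    # Start in the middle of the rectangle G.
    log_a = (b0 - b2 / 2.0) / (1.0 - b1)
    log_z = (g0 + g2 / 2.0) / (1.0 - g1)
    path = []
    for _ in range(n):
        alpha, zeta = math.exp(log_a), math.exp(log_z)
        p = (1.0 + u) ** (-zeta)            # exceedance probability P_t
        z = 1.0 - rng.random()              # uniform draw on (0, 1]
        y = alpha * ((p / z) ** (1.0 / zeta) - 1.0) if p > z else 0.0
        path.append((log_a, log_z, y))
        log_a = b0 + b1 * log_a - b2 * math.exp(-b3 * y)
        log_z = g0 + g1 * log_z + g2 * math.exp(-g3 * y)
    return path

# Hypothetical parameter values, chosen only for illustration.
beta = (0.05, 0.3, 0.2, 1.0)
gamma = (0.3, 0.5, 0.4, 1.0)
path = simulate_dcp(2000, beta, gamma, u=0.5)
lo_a, hi_a = (beta[0] - beta[2]) / (1 - beta[1]), beta[0] / (1 - beta[1])
lo_z, hi_z = gamma[0] / (1 - gamma[1]), (gamma[0] + gamma[2]) / (1 - gamma[1])
print("log alpha range:", min(s[0] for s in path), max(s[0] for s in path), "bounds:", lo_a, hi_a)
print("log zeta range: ", min(s[1] for s in path), max(s[1] for s in path), "bounds:", lo_z, hi_z)
```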
Next, we establish the stationarity and geometric ergodicity of the process $\{\alpha_t,\zeta_t\}$ given in (15) and (16).
Theorem 1.
If the parameters satisfy $\beta_2,\beta_3,\gamma_2,\gamma_3>0$, $\beta_0,\gamma_0\in\mathbb{R}$ and $0\le\beta_1\neq\gamma_1<1$, the latent process $\{\alpha_t,\zeta_t\}$ is stationary and geometrically ergodic.
Assumption 1.
Assume the parameter space $\Theta$ is a compact subset of $\Theta_s$. Suppose the observations $\{Y_t\}_{t=1}^{n}$ are generated from a stationary and ergodic DCP process with the true parameter $\theta_0$, which is in the interior of $\Theta$.
Denote $L_n(\theta)$ based on $\theta$ and an arbitrary initial value $(\tilde{\alpha}_1,\tilde{\zeta}_1)$ as $\tilde{L}_n(\theta)$.
Theorem 2 (Consistency).
Under Assumption 1, there exists a sequence $\{\hat{\theta}_n\}_{n\ge 1}$ of local maximizers of $\tilde{L}_n(\theta)$ such that $\hat{\theta}_n\xrightarrow{p}\theta_0$ and $\|\hat{\theta}_n-\theta_0\|\le\tau_n$, where $\tau_n=O_p(n^{-r})$, $0<r<1/2$.
Theorem 3 (Asymptotic Normality).
Under the same conditions as in Theorem 2, $\sqrt{n}(\hat{\theta}_n-\theta_0)\xrightarrow{d}N(0,M_0^{-1})$, where $\hat{\theta}_n$ is given in Theorem 2 and $M_0$ is the Fisher information matrix evaluated at $\theta_0$.
Theorems 2 and 3 show the existence and asymptotic normality, respectively, of the MLE $\hat{\theta}_n$. However, the uniqueness of the MLE remains to be established. Proposition 1 addresses this.
Proposition 1 (Asymptotic Uniqueness).
Under the same conditions as in Theorem 2, $P(\hat{\theta}_n\ \text{is the unique global maximizer of}\ \tilde{L}_n(\theta)\ \text{over}\ \Theta)\to 1$, where $\hat{\theta}_n$ is given in Theorem 2.
The proofs of Theorems 1–3 and Proposition 1 are shown in Appendix A.

4. Long Short-Term Memory Model

The main purpose of LSTM is to solve the problem of long-distance dependency in the training of recurrent neural networks (RNNs). To this end, a cell memory unit is set up, and a forget gate, an input gate and an output gate are introduced into the RNN, so that information transmission can be controlled. The update of the state (namely, the memory unit) is also based on these “gates”, which ensure that the LSTM model can retain long-distance information. Under the influence of the memory unit, the outputs of these “gates” stay within a controllable range. The LSTM can then save, update and read long-distance information, and the problems of exploding and vanishing gradients during training are largely alleviated. In the time-series data for air pollution, more comprehensive long-distance dependence information can therefore be extracted. From the perspective of the whole model, the main components are as follows: output gate $o_t$, input gate $i_t$, memory unit $C_t$ and forget gate $f_t$. The structure is shown in Figure 1.
The first “gate” that the LSTM passes through is the “forget gate”, which discards part of the information in the previous memory unit. This step is realized by a sigmoid function, which uses the weighted values of the current input and the output of the previous moment to obtain a number in the range of 0–1, which controls the information transfer. The value 1 represents complete retention and 0 represents complete discarding. The details are given in Equation (17):
$$f_t=\sigma\left(W_f\cdot[h_{t-1},x_t]+b_f\right).\tag{17}$$
The input gate controls what information is added to the cell, and the calculation process is shown in Equations (18) and (19):
$$i_t=\sigma\left(W_i\cdot[h_{t-1},x_t]+b_i\right),\tag{18}$$
$$C_t=f_t\cdot C_{t-1}+i_t\cdot\tanh\left(W_c\cdot[h_{t-1},x_t]+b_c\right).\tag{19}$$
The output gate controls what information is used for the task output at this moment, and the calculation process is shown in (20) and (21):
$$o_t=\sigma\left(W_o\cdot[h_{t-1},x_t]+b_o\right),\tag{20}$$
$$h_t=o_t\cdot\tanh(C_t).\tag{21}$$
In the above equations, $W_i$, $W_f$ and $W_o$ denote the weight matrices of the corresponding gates and $W_c$ that of the candidate memory, $b_i$, $b_f$, $b_o$ and $b_c$ denote the corresponding bias terms, $\sigma$ and $\tanh$ denote the activation functions, $o_t$ denotes the output gate, $i_t$ denotes the input gate, $C_t$ denotes the memory unit, $f_t$ denotes the forget gate, $x_t$ denotes the input at time $t$ and $h_t$ denotes the output at time $t$.
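The gate equations can be traced in a few lines of plain Python. This is a minimal sketch of a single LSTM step, not the network used in the paper: it assumes the standard formulation with a separate candidate weight matrix, stored here under the hypothetical key `W_c`, and the dictionary keys, dimensions and values are all illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: forget gate, input gate, memory update, output gate."""
    hx = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], hx)]
    i = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], hx)]
    g = [math.tanh(v) for v in affine(params["W_c"], params["b_c"], hx)]
    o = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], hx)]
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

# Hidden size 2, input size 1; all weights zero so every gate outputs 0.5.
zeros_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_b = [0.0, 0.0]
params = {"W_f": zeros_w, "b_f": zeros_b, "W_i": zeros_w, "b_i": zeros_b,
          "W_c": zeros_w, "b_c": zeros_b, "W_o": zeros_w, "b_o": zeros_b}
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
print("h_t =", h, "C_t =", c)
```

With zero weights, the forget gate keeps exactly half of the previous memory, which makes the role of each gate easy to verify by hand.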

5. Simulation Study

In this section, the performance of the MLE for the DCP models is investigated using six numerical experiments. We generate data from the three DCP models given in (5) and (8)–(13), with the parameters shown in Table 1. These sets of parameters are the MLEs obtained from an analysis of real observations in Beijing from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, where the weather factors are from the China Meteorological Data Service Center and the air quality factors are from the China National Environmental Monitoring Center. In addition, the estimates of $\beta_1$ and $\beta_2$ are close to 0, especially in the three models fitted to the data from 3 January 2015 to 8 August 2020, which indicates that the scale parameter $\alpha_t$ can be considered approximately constant (a simplification left for future research). Because our focus is on the tail index $\zeta_t$ and on the wider applicability of the DCP models, we made no changes to $\alpha_t$.
Figure 2 displays a line graph of the PM 2.5 concentration time series in Beijing. As shown in Figure 2, with the improvement in the national environmental governance level and public awareness of environmental protection, the PM 2.5 concentration generally shows a downward trend. Figure 2 shows that 2018 is a noteworthy year with significant governance effects, suggesting that PM 2.5 concentrations after this need to be analyzed separately. Hence, real data from 1 January 2018 to 8 August 2020 are also fitted to the three DCP models, in addition to the real data from 3 January 2015 to 8 August 2020. According to the World Air Quality Report 2021 released by IQAir [1], China has seen a 21% overall reduction in annual PM 2.5 concentrations since 2018, which justifies the separation of the data from 1 January 2018 to 8 August 2020.
Threshold selection is a key issue in extreme value analysis based on the POT method. For the two sets of observations, from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, we select two thresholds using the methods of Bader et al. (2018) [38] and Davison and Smith (1990) [50]. Bader et al. (2018) [38] proposed an automated threshold selection procedure using a stopping rule that controls the false discovery rate in ordered hypothesis testing. The ForwardStop rule provides an automated selection procedure combined with sequential hypothesis testing when the level of desired error control and a set of candidate thresholds are given. Based on the goodness-of-fit of the GPD, Davison and Smith (1990) [50] proposed a threshold selection approach where the threshold is chosen as the lowest value above which the GPD fits the exceedances adequately. In this study, the selected thresholds were 2.4660 using the method of Bader et al. (2018) [38] for the PM 2.5 data from 3 January 2015 to 8 August 2020 and 0.5716 using the method of Davison and Smith (1990) [50] for the PM 2.5 data from 1 January 2018 to 8 August 2020. Approximately 4% and 18% of the corresponding real data exceeded these two thresholds, respectively, so both thresholds are sufficiently high.
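The ForwardStop rule mentioned above can be sketched in a few lines. Given ordered p-values $p_1,\dots,p_m$ from goodness-of-fit tests at increasing candidate thresholds, the rule rejects the first $\hat{k}=\max\{k:-\tfrac{1}{k}\sum_{i\le k}\log(1-p_i)\le\alpha\}$ hypotheses. The sketch below assumes the p-values have already been computed (the tests themselves are not shown), and the function names and example numbers are hypothetical.

```python
import math

def forward_stop(p_values, level):
    """Return k_hat, the number of ordered hypotheses rejected by ForwardStop."""
    running, k_hat = 0.0, 0
    for k, p in enumerate(p_values, start=1):
        running += math.inf if p >= 1.0 else -math.log1p(-p)
        if running / k <= level:
            k_hat = k
    return k_hat

def select_threshold(thresholds, p_values, level=0.05):
    """Pick the first candidate threshold that is not rejected, or None if all are."""
    k_hat = forward_stop(p_values, level)
    return thresholds[k_hat] if k_hat < len(thresholds) else None

# Hypothetical candidate thresholds and p-values, for illustration only.
candidates = [1.0, 1.5, 2.0, 2.5]
p_vals = [0.01, 0.02, 0.5, 0.9]
print("rejected:", forward_stop(p_vals, 0.05))
print("selected threshold:", select_threshold(candidates, p_vals))
```

In this toy example the GPD fit is rejected at the two lowest candidates, so the third candidate is selected.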
Tables 2–7 show the mean values and the standard deviations of the parameter estimates for different sample sizes from the three DCP models in the above two periods. We also calculated the corresponding root mean squared error (RMSE) and absolute bias (Abias) to measure the estimation effect, as shown in the same tables. We obtained simulated exceedances $Y_t$ with lengths of 1000 and 2000, respectively. The experiments were repeated 500 times for each sample size. As shown in Tables 2–7, the RMSE and Abias values for the parameter estimates using the real data from 1 January 2018 to 8 August 2020 were mostly smaller than those from 3 January 2015 to 8 August 2020, those for the sample size of 2000 were mostly smaller than those for the sample size of 1000, and the parameter estimation of the tail index $\zeta_t$ was better than that of $\alpha_t$. The values of RMSE and Abias support the validity of our estimation.
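The two accuracy measures can be computed from the replicated estimates as sketched below. The source does not spell out the formula for Abias, so the definition used here, the absolute difference between the mean estimate and the true value, is an assumption. The numbers are made up for illustration.

```python
import math

def rmse(estimates, truth):
    """Root mean squared error of replicated estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def abias(estimates, truth):
    """Absolute bias, taken here as |mean(estimates) - truth| (assumed definition)."""
    return abs(sum(estimates) / len(estimates) - truth)

# Made-up replicated estimates of a parameter whose true value is 1.0.
replicates = [0.9, 1.1, 1.0, 1.2]
print("RMSE =", rmse(replicates, 1.0))
print("Abias =", abias(replicates, 1.0))
```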
To show the performance of our model more directly, Figure 3 depicts the dynamics of the tail index $\hat{\zeta}_t$ estimated by MLE (red line) and the simulated tail index $\zeta_t$ (black line) in the experiments with n = 2000. The estimated tail index $\hat{\zeta}_t$ was almost the same as the simulated tail index $\zeta_t$ in the three DCP models. In addition, we calculated the correlation between the two series; the results were 0.9328, 0.9559, 0.9921, 0.9964, 0.9591 and 0.9679 for Figure 3a–f, respectively, which quantifies the similarity of the two curves. However, the similarity of the two curves alone is not enough to judge the quality of the simulation. Taken together with the RMSE and Abias results, Figure 3 illustrates the adequacy of our estimation.
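The caveat about curve similarity can be made concrete with a small sketch, assuming the reported correlation is the usual Pearson coefficient (the text does not name it). Because this coefficient is invariant to shifts and positive rescaling, a systematically biased estimate can still have a correlation of 1 with the simulated series. The numbers below are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical "simulated" tail index and a deliberately biased "estimate".
simulated = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0 * v + 1.0 for v in simulated]

r = pearson(simulated, estimated)
mean_gap = sum(estimated) / len(estimated) - sum(simulated) / len(simulated)
print(r, mean_gap)
```

Here the correlation is perfect although the estimate is far from the simulated values on average, which is why bias-type measures are needed alongside it.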

6. Real Data Applications

In this section, to verify the performance of the DCP models, we consider the weather and air quality time series in Beijing from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, obtained from the China Meteorological Data Service Center and the China National Environmental Monitoring Center, as the sample for our experimental analysis. Based on the observations from these two periods, we employed the three DCP models (one with weather factors, one with air quality factors and one with mixed weather and air quality factors) given in Equations (5) and (8)–(13) to fit the smog data and used the MLE method described in Section 3 to estimate the parameters. In both cases, the DCP models showed their superiority in reflecting the time dependence of the pollutant concentration, providing a potential warning signal for smog prevention and control. First, we checked the observations for fat tails using an exponential QQ plot. Figure 4 shows that the real data for PM2.5 in Beijing from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020 are fat tailed.
From the results in Table 1, we can see that the tail index $\zeta_t$ is more affected by $Y_{t-1}$, which is consistent with Figure 5. The estimated tail index of the DCP can reflect the severity of smog to some extent and may even play an early warning role for smog episodes. The estimated tail index $\zeta_t$ and the positive exceedances $Y_t$ from the three DCP models are plotted in Figure 5, which shows a strong negative correlation between $\zeta_t$ and $Y_t$ and makes the volatility of the tail index easier to see. It is interesting to note that Figure 5a,c,e and Figure 5b,d,f have very similar variation tendencies. The tail index $\zeta_t$ clearly starts to decline in the middle of each year and becomes lower at the end of the year, so it can be regarded as an effective indicator of the level of smog.
Using the estimated parameters given in Table 1, we generated a sequence of fitted values $\hat{Y}_t$ based on Equation (5) and plotted the fitted values $\hat{Y}_t$ against the real exceedances $Y_t$ for the period from 1 January 2018 to 8 August 2020, as shown in Figure 6. The true values $Y_t$ and the estimated values $\hat{Y}_t$ from the three models were almost consistent in trend, and all three models tracked changes in $Y_t$ sensitively. The model with mixed weather and air quality factors gave values closest to the true values, which verifies the superiority of the mixed model over the other two models and is consistent with the conclusion mentioned in Section 5. A comparison between the fitted $\hat{Y}_t$ and the real exceedances $Y_t$ for the period from 3 January 2015 to 8 August 2020 was also performed, and similar results were obtained. Due to limited space, only Figure 6 is shown.
Next, we compared the estimated variances from the DCP models and GARCH, as shown in Figure 7. Similarly to [27], we calculated the conditional variance
$$
\mathrm{Var}(Y_t \mid \mathcal{F}_{t-1}) = \frac{\alpha_t^2}{(1+u)^{\zeta_t}(\zeta_t-1)}\left[\frac{2}{\zeta_t-2}-\frac{1}{(1+u)^{\zeta_t}(\zeta_t-1)}\right].
$$
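The conditional variance above can be checked numerically. The sketch below is illustrative only: it assumes the representation $Y_t = \alpha_t I(P_t > Z_t)[(P_t/Z_t)^{1/\zeta_t} - 1]$ with $P_t = (1+u)^{-\zeta_t}$ used in the Appendix, and additionally assumes that $Z_t$ is uniform on (0, 1). The function names and parameter values are hypothetical.

```python
import random


def dcp_conditional_variance(alpha, zeta, u):
    """Closed-form Var(Y_t | F_{t-1}) for scale alpha, tail index zeta, threshold u."""
    if zeta <= 2:
        raise ValueError("the conditional variance is finite only for zeta > 2")
    p = (1.0 + u) ** (-zeta)  # P(Y_t > 0) = (1 + u)^(-zeta)
    return alpha ** 2 * p / (zeta - 1.0) * (2.0 / (zeta - 2.0) - p / (zeta - 1.0))


def draw_exceedance(alpha, zeta, u, rng):
    """One draw of Y_t = alpha * I(P > Z) * ((P / Z)^(1/zeta) - 1), Z assumed U(0, 1)."""
    p = (1.0 + u) ** (-zeta)
    z = 1.0 - rng.random()  # lies in (0, 1], so the division below is safe
    return alpha * ((p / z) ** (1.0 / zeta) - 1.0) if p > z else 0.0


# Hypothetical fixed values of alpha_t, zeta_t and u for a Monte Carlo check.
rng = random.Random(0)
alpha, zeta, u = 1.0, 10.0, 0.1
draws = [draw_exceedance(alpha, zeta, u, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
mc_variance = sum((d - mean) ** 2 for d in draws) / len(draws)
closed_form = dcp_conditional_variance(alpha, zeta, u)
print(mc_variance, closed_form)
```

With a large tail index the simulated variance should be close to the closed form; for $\zeta_t \le 2$ the variance is not finite, which the function signals with an error.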
Figure 7 shows that the standard deviations given by the DCP models and GARCH had similar trends, indicating that the DCP models could accurately reflect the volatility in a sense. Compared to the estimated volatility of GARCH, the DCP models are more sensitive in smog instances, thus potentially playing a better role in early warning. This is clearest in Figure 7e, where the fluctuation is largest.
We computed AIC and BIC from the DCP and dynamic conditional Weibull (DCW) model given in [35]. The results are presented in Table 8. As shown in Table 8, the DCP model is more suitable than the DCW model, based on AIC and BIC criteria.
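For reference, the two criteria can be computed from a maximized log-likelihood as in the following sketch, which uses the standard definitions AIC = 2k − 2 log L and BIC = k log n − 2 log L. The log-likelihood, parameter count and sample size below are made-up values and are not taken from Table 8. Note that $L_n(\theta)$ in the Appendix is an average, so the total log-likelihood would be $n L_n(\theta)$.

```python
import math


def aic(total_loglik, n_params):
    """Akaike information criterion; smaller values indicate a preferred model."""
    return 2 * n_params - 2 * total_loglik


def bic(total_loglik, n_params, n_obs):
    """Bayesian information criterion; penalizes parameters by log(n_obs)."""
    return n_params * math.log(n_obs) - 2 * total_loglik


# Hypothetical inputs: total log-likelihood, number of parameters, sample size.
example_aic = aic(-100.0, 8)
example_bic = bic(-100.0, 8, 1000)
print(example_aic, example_bic)
```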
Finally, we used our three proposed models (5) and (8)–(13) to predict the daily PM2.5 values from 9 August 2020 to 31 December 2021. We present only the results for the mixed model in (5), (12) and (13) here, with a training sample from 1 January 2018 to 8 August 2020, since similar results were obtained from the three models. The tail index $\zeta_t$ given in (13) and PM2.5 given in (5) were predicted using the real weather and air quality factors and the parameter estimates given in Table 1. In order to analyze the fluctuating tendency and correlations of $\zeta_t$ and PM2.5, the prediction results are presented together in Figure 8. From Figure 8, we can see that there is a strong negative correlation between $\zeta_t$ and PM2.5, which enables the tail index to be used as a warning signal for air pollution. Furthermore, the model predicts the future variation of PM2.5 relatively well: the real and predicted values are relatively close and have a similar tendency.
In addition, we used weather and air quality factors (including SO2, NO2, CO and O3) with the LSTM technique to predict daily PM2.5 values. The air pollution data from 1 January 2015 to 25 November 2019 were used as training data, those from 26 November 2019 to 8 August 2020 were used as validation data and those from 9 August 2020 to 31 December 2021 were used as test data. The LSTM network was trained on a training set constructed from the weather factors, air quality factors and PM2.5 time series in Beijing. Then, the weather and air quality factors of the test set were fed into the trained network to predict PM2.5 over the long period from 9 August 2020 to 31 December 2021. As shown in Figure 9, the trend of the prediction for PM2.5 was accurate, especially when the true values of PM2.5 were less than 100. To evaluate the experimental results quantitatively, the RMSE and coefficient of determination ($R^2$) were calculated, and the results were 20.14 and 0.65, respectively.
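The two evaluation metrics can be sketched as below, assuming the usual definitions of RMSE over paired values and $R^2 = 1 - SS_{res}/SS_{tot}$; the text does not spell out which variant was used. The true and predicted values are made-up numbers, not the PM2.5 series.

```python
import math


def prediction_rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted daily values.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(prediction_rmse(observed, predicted), r_squared(observed, predicted))
```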

7. Conclusions

In this paper, we investigated the prediction of pollutant concentrations using statistical inference methods and deep learning techniques. On the one hand, we proposed three models combined with the autoregressive structure under the POT framework. After two sufficiently high thresholds were selected using the methods of Bader et al. (2018) [38] and Davison and Smith (1990) [50], the DCP models provided a direct dynamic modeling of exceedances in the PM2.5 time series, such that the scale parameter and tail index of the conditional generalized Pareto distribution changed over time. Weather and air quality factors were added to the DCP models for better performance and higher efficiency. The maximum likelihood estimation method was introduced to estimate the parameters in the DCP models, and its asymptotic properties were investigated. Simulation studies were carried out to demonstrate the validity and sufficiency of the estimation, revealing that the parameter estimation of the DCP models was not sufficiently accurate but the tail index dynamics could be well approximated in the DCP models. Real data applications were used to present the superiority of the DCP models, showing that they could shed new light on the prevention and control of smog. On the other hand, based on the factors used in the mixed DCP model, we used LSTM to study the prediction of pollutant concentrations and achieved satisfactory results. This paper aimed to improve the ability to predict pollutant concentrations, and valuable results were achieved. Given the air pollution control targets for further promoting ecological and environmental protection in the next five years, the approaches and results proposed in this paper are useful. To some extent, they could provide a theoretical basis and effective tools for improving the national air quality forecasting system, thus benefiting public health.
Nevertheless, there are still some points to be considered. In the DCP model with the autoregressive structure, it is meaningful to add weather and air quality factors, enriching the model and making it consistent, stable and sensitive. However, the relationship between the factors has not been scrutinized carefully, so its impact on the model has received little attention. In addition, better results might be obtained with other estimation methods. Therefore, prediction, as an important direction in our study of air pollutant concentrations, still has a long way to go. By combining artificial intelligence and machine learning, the prediction accuracy is expected to improve through a forecast combination that synthesizes the estimates obtained by the individual methods. With combination forecasting, the advantages of the individual forecasts are retained, and the available information is fully utilized to forecast air pollution comprehensively. In this work, we strive to make valuable advances at the intersection of statistics and machine learning and to provide effective theoretical and technical support for the continuous national improvement of the modernization level of ecological environmental governance.

Author Contributions

Data curation, X.Z. and C.H.; funding acquisition, X.Z.; methodology, X.Z. and W.C.; project administration, X.Z.; software, X.Z., C.H. and Q.J.; supervision, W.C.; writing—original draft, X.Z., C.H. and Q.J.; writing—review and editing, Q.D. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant number 11801019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The weather data and the air quality data were obtained from the China Meteorological Data Service Center and the China National Environmental Monitoring Center.

Acknowledgments

The authors would like to thank the referees and editors for their very helpful and constructive comments, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The procedure for the proof follows Shen et al. (2020) [27] and Zhao et al. (2018) [31]. The main difference from Zhao et al. (2018) [31] is the distribution of the observations $\{Y_t\}$, and the main difference from Shen et al. (2020) [27] is that we consider the autoregressive structure on both the scale and tail index series.
Proof of Theorem 1.
The proof of the theorem is similar to those in Shen et al. (2020) [27] and Zhao et al. (2018) [31], so it is omitted. □
Before proving Theorems 2 and 3, we first provide some lemmas.
Lemma A1 (Identifiability).
If $Y_t(\theta) = Y_t(\theta_0)$ almost surely (a.s.) for all $t$, we have $\theta = \theta_0$, where $\{Y_t\}$ is given in (5).
Proof. 
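Denote $\alpha_t = \alpha_t(\theta)$, $\zeta_t = \zeta_t(\theta)$, $P_t = (1+u)^{-\zeta_t}$ and $\alpha_t^0 = \alpha_t(\theta_0)$, $\zeta_t^0 = \zeta_t(\theta_0)$, $P_t^0 = (1+u)^{-\zeta_t^0}$. From $Y_t(\theta) = Y_t(\theta_0)$ a.s., we arrive at
$$
\alpha_t I(P_t > Z_t)\left[(P_t/Z_t)^{1/\zeta_t} - 1\right] = \alpha_t^0 I(P_t^0 > Z_t)\left[(P_t^0/Z_t)^{1/\zeta_t^0} - 1\right] \quad \text{a.s.},
$$
so
$$
\alpha_t\left[(P_t/Z_t)^{1/\zeta_t} - 1\right] = \alpha_t^0\left[(P_t^0/Z_t)^{1/\zeta_t^0} - 1\right] \quad \text{a.s.}, \qquad I(P_t > Z_t) = I(P_t^0 > Z_t) \quad \text{a.s.}
$$
After straightforward manipulations,
$$
\alpha_t (P_t/Z_t)^{1/\zeta_t} - \alpha_t^0 (P_t^0/Z_t)^{1/\zeta_t^0} = \alpha_t - \alpha_t^0 \quad \text{a.s.}
$$
Denote $\mathcal{F}_t = \sigma(Y_t, Y_{t-1}, \ldots)$; then $Z_t$ is independent of $\mathcal{F}_{t-1}$ and $\alpha_t, \alpha_t^0, \zeta_t, \zeta_t^0, P_t, P_t^0 \in \mathcal{F}_{t-1}$. Therefore, the above equation holds if and only if $\alpha_t = \alpha_t^0$ and $\zeta_t = \zeta_t^0$ a.s. From the autoregressive equations of $\log\alpha_t$ and $\log\zeta_t$, $\log\alpha_t = \log\alpha_t^0$ and $\log\zeta_t = \log\zeta_t^0$ a.s. can be rewritten as
$$
\begin{aligned}
\beta_0^0 + \beta_1^0 \log\alpha_{t-1} - \beta_2^0 \exp\{-\beta_3^0 Y_{t-1}\} &= \beta_0 + \beta_1 \log\alpha_{t-1} - \beta_2 \exp\{-\beta_3 Y_{t-1}\},\\
\gamma_0^0 + \gamma_1^0 \log\zeta_{t-1} + \gamma_2^0 \exp\{-\gamma_3^0 Y_{t-1}\} &= \gamma_0 + \gamma_1 \log\zeta_{t-1} + \gamma_2 \exp\{-\gamma_3 Y_{t-1}\}.
\end{aligned}
$$
After rearrangement, the above two equations can be expressed as
$$
\begin{aligned}
\beta_0^0 - \beta_0 + (\beta_1^0 - \beta_1)\log\alpha_{t-1} &= \beta_2^0 \exp\{-\beta_3^0 Y_{t-1}\} - \beta_2 \exp\{-\beta_3 Y_{t-1}\},\\
\gamma_0^0 - \gamma_0 + (\gamma_1^0 - \gamma_1)\log\zeta_{t-1} &= \gamma_2 \exp\{-\gamma_3 Y_{t-1}\} - \gamma_2^0 \exp\{-\gamma_3^0 Y_{t-1}\}.
\end{aligned}
$$
Since $\alpha_{t-1} \in \mathcal{F}_{t-2}$, $\zeta_{t-1} \in \mathcal{F}_{t-2}$ and $Y_{t-1} \notin \mathcal{F}_{t-2}$, $\beta_i = \beta_i^0$ and $\gamma_i = \gamma_i^0$ must hold for $i = 0, 1, 2, 3$. □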
In the following, we denote $(\alpha_t(\theta), \zeta_t(\theta))$ (or $(\alpha_t, \zeta_t)$ for simplicity) as the time-dependent scale parameter and tail index based on $\theta$ and the true initial values $(\alpha_1^0, \zeta_1^0)$; $(\alpha_t(\theta_0), \zeta_t(\theta_0))$ (or $(\alpha_t^0, \zeta_t^0)$ for simplicity) as the unobserved true hidden process based on the true $\theta_0$ and the true initial values $(\alpha_1^0, \zeta_1^0)$; the $t$-th iterate series $(\tilde{\alpha}_t(\theta), \tilde{\zeta}_t(\theta))$ (or $(\tilde{\alpha}_t, \tilde{\zeta}_t)$ for simplicity) as the scale parameter and tail index series based on $\theta$ and arbitrary initial values $(\tilde{\alpha}_1, \tilde{\zeta}_1)$; and $(\alpha_L, \alpha_U)$ and $(\zeta_L, \zeta_U)$ as the uniform bounds of $\alpha_t$ (or $\tilde{\alpha}_t$) and $\zeta_t$ (or $\tilde{\zeta}_t$) for all $\theta \in \Theta$, which exist due to the compactness of $\Theta$ and the boundedness of $\beta_2 \exp(-\beta_3 Y_{t-1})$ and $\gamma_2 \exp(-\gamma_3 Y_{t-1})$.
Given ( α t , ζ t ) , the conditional log-likelihood function of Y t is expressed as
$$
l_t(\theta) = I(Y_t = 0)\log\left[1 - (1+u)^{-\zeta_t}\right] + I(Y_t > 0)\left[\log\zeta_t - \log\alpha_t - \zeta_t\log(1+u) - (\zeta_t + 1)\log\left(1 + \frac{Y_t}{\alpha_t}\right)\right].
$$
Due to the conditional independence, the log-likelihood function is then given by
$$
L_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} l_t(\theta).
$$
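The averaged log-likelihood can be evaluated recursively, as in the following minimal sketch. It is written under stated assumptions: it uses only the autoregressive equations for $\log\alpha_t$ and $\log\zeta_t$ that appear in this Appendix (without the weather and air quality covariates of models (8)–(13)), and the function name, argument order and toy inputs are hypothetical.

```python
import math


def dcp_average_loglik(y, theta, u, alpha1, zeta1):
    """Average log-likelihood L_n(theta) for exceedances y = [Y_1, ..., Y_n].

    theta = (b0, b1, b2, b3, g0, g1, g2, g3); alpha1 and zeta1 are initial values.
    Covariates are omitted, so this is a simplified version of the DCP models.
    """
    b0, b1, b2, b3, g0, g1, g2, g3 = theta
    log_alpha, log_zeta = math.log(alpha1), math.log(zeta1)
    total = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            prev = y[t - 1]
            # log alpha_t = b0 + b1 log alpha_{t-1} - b2 exp(-b3 Y_{t-1})
            log_alpha = b0 + b1 * log_alpha - b2 * math.exp(-b3 * prev)
            # log zeta_t = g0 + g1 log zeta_{t-1} + g2 exp(-g3 Y_{t-1})
            log_zeta = g0 + g1 * log_zeta + g2 * math.exp(-g3 * prev)
        alpha_t, zeta_t = math.exp(log_alpha), math.exp(log_zeta)
        if y_t > 0:
            total += (log_zeta - log_alpha - zeta_t * math.log1p(u)
                      - (zeta_t + 1.0) * math.log1p(y_t / alpha_t))
        else:
            total += math.log(1.0 - (1.0 + u) ** (-zeta_t))
    return total / len(y)


# Toy check with made-up inputs: alpha_t = zeta_t = 1 for all t and u = 1.
toy_value = dcp_average_loglik([0.0, 1.0], (0, 0, 0, 1, 0, 0, 0, 1), 1.0, 1.0, 1.0)
print(toy_value)
```

A numerical optimizer applied to the negative of this function would give a maximum likelihood estimate; that step is left out here.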
We denote $l_t(\theta)$ and $L_n(\theta)$ based on $\theta$ and an arbitrary initial value $(\tilde{\alpha}_1, \tilde{\zeta}_1)$ as $\tilde{l}_t(\theta)$ and $\tilde{L}_n(\theta)$, respectively.
Lemma A2.
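Under the same conditions as in Theorem 2, $E_{\theta_0}\left(\frac{\partial}{\partial\theta} l_t(\theta_0)\right) = 0$ and $M_0 = \mathrm{Var}_{\theta_0}\left(\frac{\partial}{\partial\theta} l_t(\theta_0)\right) = -E_{\theta_0}\left(\frac{\partial^2}{\partial\theta\,\partial\theta^{T}} l_t(\theta_0)\right)$, and $M_0$ is finite and positive definite.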
Proof. 
The proof of the lemma is similar to Zhao et al. (2018) [31], so the proof is omitted. □
Lemma A3.
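Under the same conditions as in Theorem 2, if $\|\Phi - \Phi_0\| < \tau_n$ and $\tau_n \to 0$, we have
$$
\text{(a)}\ \sup_{1\le t\le n}\left|\alpha_t - \alpha_t^0\right| = O(\tau_n),\quad
\text{(b)}\ \sup_{1\le t\le n}\left|\frac{\partial\alpha_t}{\partial\Phi} - \frac{\partial\alpha_t^0}{\partial\Phi}\right| = O(\tau_n),\quad
\text{(c)}\ \sup_{1\le t\le n}\left|\frac{\partial^2\alpha_t}{\partial\Phi_i\,\partial\Phi_j} - \frac{\partial^2\alpha_t^0}{\partial\Phi_i\,\partial\Phi_j}\right| = O(\tau_n),
$$
uniformly over $\|\Phi - \Phi_0\| < \tau_n$, where $\Phi = (\beta_0, \beta_1, \beta_2, \beta_3)$ and $\Phi_0 = (\beta_0^0, \beta_1^0, \beta_2^0, \beta_3^0)$.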
Proof. 
The proof of the lemma is similar to Zhao et al. (2018) [31], so the proof is omitted. □
Lemma A4.
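Under the same conditions as in Theorem 2, if $\|\Psi - \Psi_0\| < \tau_n$ and $\tau_n \to 0$, we have
$$
\text{(a)}\ \sup_{1\le t\le n}\left|\zeta_t - \zeta_t^0\right| = O(\tau_n),\quad
\text{(b)}\ \sup_{1\le t\le n}\left|\frac{\partial\zeta_t}{\partial\Psi} - \frac{\partial\zeta_t^0}{\partial\Psi}\right| = O(\tau_n),\quad
\text{(c)}\ \sup_{1\le t\le n}\left|\frac{\partial^2\zeta_t}{\partial\Psi_i\,\partial\Psi_j} - \frac{\partial^2\zeta_t^0}{\partial\Psi_i\,\partial\Psi_j}\right| = O(\tau_n),
$$
uniformly over $\|\Psi - \Psi_0\| < \tau_n$, where $\Psi = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)$ and $\Psi_0 = (\gamma_0^0, \gamma_1^0, \gamma_2^0, \gamma_3^0)$.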
Proof. 
The proof of the lemma is similar to Zhao et al. (2018) [31], so the proof is omitted. □
Lemma A5.
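Under the same conditions as in Theorem 2, $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} L_n(\theta_n) \xrightarrow{p} m_{\theta_i\theta_j}(\theta_0)$, uniformly over $\|\theta_n - \theta_0\| < \tau_n$, where $\tau_n \sim n^{-r}$, $r > 0$, and $m_{\theta_i\theta_j}(\theta_0) = E_{\theta_0}\left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} l_1(\theta_0)\right)$.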
Proof. 
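We only prove the case $\frac{\partial^2}{\partial\beta_0^2} L_n(\theta_n) \xrightarrow{p} m_{\beta_0\beta_0}(\theta_0)$, as the proof for the other cases is similar. From the law of large numbers, we know that $\frac{\partial^2}{\partial\beta_0^2} L_n(\theta_0) \xrightarrow{p} m_{\beta_0\beta_0}(\theta_0)$. Then, we need to prove that $\frac{\partial^2}{\partial\beta_0^2} L_n(\theta_n) - \frac{\partial^2}{\partial\beta_0^2} L_n(\theta_0) \xrightarrow{p} 0$ uniformly over $\|\theta_n - \theta_0\| < \tau_n$, where $\tau_n \sim n^{-r}$, $r > 0$.
By repeatedly applying the autoregressive formula, $\log\alpha_t$ can be expressed as
$$
\log\alpha_t = \beta_0\sum_{k=1}^{t-1}\beta_1^{k-1} - \beta_2\sum_{k=1}^{t-1}\beta_1^{k-1}\exp(-\beta_3 Y_{t-k}) + \beta_1^{t-1}\log\alpha_1^0.
$$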
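The unrolled expression can be checked against the one-step recursion numerically. The sketch below assumes the autoregressive equation $\log\alpha_t = \beta_0 + \beta_1\log\alpha_{t-1} - \beta_2\exp(-\beta_3 Y_{t-1})$ used in the proof of Lemma A1; the parameter values and exceedances are made up.

```python
import math


def log_alpha_recursive(t, beta, y, log_alpha1):
    """log alpha_t from the one-step recursion; y[s - 1] holds Y_s."""
    b0, b1, b2, b3 = beta
    value = log_alpha1
    for s in range(2, t + 1):
        value = b0 + b1 * value - b2 * math.exp(-b3 * y[s - 2])  # uses Y_{s-1}
    return value


def log_alpha_unrolled(t, beta, y, log_alpha1):
    """log alpha_t from the closed-form sum over k = 1, ..., t - 1."""
    b0, b1, b2, b3 = beta
    geometric = sum(b1 ** (k - 1) for k in range(1, t))
    weighted = sum(b1 ** (k - 1) * math.exp(-b3 * y[t - k - 1]) for k in range(1, t))
    return b0 * geometric - b2 * weighted + b1 ** (t - 1) * log_alpha1


# Hypothetical parameters (b0, b1, b2, b3) and exceedances Y_1, ..., Y_5.
beta = (0.1, 0.6, 0.3, 0.5)
y = [0.0, 1.2, 0.0, 0.4, 2.0]
lhs = log_alpha_recursive(5, beta, y, math.log(1.5))
rhs = log_alpha_unrolled(5, beta, y, math.log(1.5))
print(lhs, rhs)
```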
We have
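$$
\frac{\partial\alpha_t}{\partial\beta_0} = \alpha_t\sum_{k=1}^{t-1}\beta_1^{k-1}, \qquad
\frac{\partial^2\alpha_t}{\partial\beta_0^2} = \alpha_t\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2,
$$
$$
\frac{\partial L_n(\theta)}{\partial\beta_0} = \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\sum_{k=1}^{t-1}\beta_1^{k-1}\left[\frac{(\zeta_t + 1)Y_t}{\alpha_t + Y_t} - 1\right], \qquad
\frac{\partial^2 L_n(\theta)}{\partial\beta_0^2} = -\frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\frac{(\zeta_t + 1)\alpha_t Y_t}{(\alpha_t + Y_t)^2}.
$$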
Then
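$$
\begin{aligned}
\left|\frac{\partial^2}{\partial\beta_0^2} L_n(\theta_n) - \frac{\partial^2}{\partial\beta_0^2} L_n(\theta_0)\right|
&= \left|\frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t\left[\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\frac{(\zeta_t + 1)\alpha_t}{(\alpha_t + Y_t)^2} - \left(\sum_{k=1}^{t-1}(\beta_1^0)^{k-1}\right)^2\frac{(\zeta_t^0 + 1)\alpha_t^0}{(\alpha_t^0 + Y_t)^2}\right]\right|\\
&\le \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t\left|\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\frac{(\zeta_t + 1)\alpha_t}{(\alpha_t + Y_t)^2} - \left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\frac{(\zeta_t^0 + 1)\alpha_t^0}{(\alpha_t^0 + Y_t)^2}\right|\\
&\quad + \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t\left|\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\frac{(\zeta_t^0 + 1)\alpha_t^0}{(\alpha_t^0 + Y_t)^2} - \left(\sum_{k=1}^{t-1}(\beta_1^0)^{k-1}\right)^2\frac{(\zeta_t^0 + 1)\alpha_t^0}{(\alpha_t^0 + Y_t)^2}\right|\\
&=: I + II,
\end{aligned}
$$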
where
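$$
\begin{aligned}
I &= \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\left|\frac{(\zeta_t + 1)\alpha_t}{(\alpha_t + Y_t)^2} - \frac{(\zeta_t^0 + 1)\alpha_t^0}{(\alpha_t^0 + Y_t)^2}\right|\\
&= \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2\frac{\left|\alpha_t(\zeta_t + 1)(\alpha_t^0 + Y_t)^2 - \alpha_t^0(\zeta_t^0 + 1)(\alpha_t + Y_t)^2\right|}{(\alpha_t + Y_t)^2(\alpha_t^0 + Y_t)^2}\\
&\le \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t\,\frac{\left|\alpha_t^0(\alpha_t\alpha_t^0 + 2\alpha_t Y_t + Y_t^2)\right|\left|\zeta_t - \zeta_t^0\right| + \left|\alpha_t\alpha_t^0(\zeta_t^0 + 1) - Y_t^2(\zeta_t + 1)\right|\left|\alpha_t - \alpha_t^0\right|}{(1 - \beta_1)^2(\alpha_t + Y_t)^2(\alpha_t^0 + Y_t)^2}\\
&\le \frac{1}{n(1 - \beta_1)^2}\sum_{t=1}^{n} I(Y_t > 0)\,\frac{\left|\alpha_t^0(\alpha_t\alpha_t^0 + 2\alpha_t Y_t + Y_t^2)\right|\left|\zeta_t - \zeta_t^0\right| + \left|\alpha_t\alpha_t^0(\zeta_t^0 + 1) - Y_t^2(\zeta_t + 1)\right|\left|\alpha_t - \alpha_t^0\right|}{\alpha_t(\alpha_t^0)^2}\\
&= \frac{1}{n(1 - \beta_1)^2}\sum_{t=1}^{n} I(Y_t > 0)\left[\left(\frac{Y_t^2}{\alpha_t\alpha_t^0} + \frac{2Y_t}{\alpha_t^0} + 1\right)\left|\zeta_t - \zeta_t^0\right| + \left|\frac{\zeta_t^0 + 1}{\alpha_t^0} - \frac{Y_t^2(\zeta_t + 1)}{\alpha_t(\alpha_t^0)^2}\right|\left|\alpha_t - \alpha_t^0\right|\right].
\end{aligned}
$$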
It is known that $\{\alpha_t, \zeta_t\}$ is bounded, so $E(Y_t) = \alpha_t/[(\xi_t - 1)(1 + u)^{\xi_t}] < \infty$, $E(Y_t^2) = 2\alpha_t^2/[(2 - \xi_t)(1 - \xi_t)(1 + u)^{\xi_t}] < \infty$, and $\frac{Y_t^2}{\alpha_t\alpha_t^0} + \frac{2Y_t}{\alpha_t^0} + 1$ and $\frac{\zeta_t^0 + 1}{\alpha_t^0} - \frac{Y_t^2(\zeta_t + 1)}{\alpha_t(\alpha_t^0)^2}$ are bounded too. Therefore, by Lemmas A3(a) and A4(a),
$$
I \lesssim \frac{1}{n}\sum_{t=1}^{n}\left(\left|\zeta_t - \zeta_t^0\right| + \left|\alpha_t - \alpha_t^0\right|\right) = O_p(\tau_n) \to 0.
$$
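$$
\begin{aligned}
II &= \frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,\frac{\left|(\xi_t^0 + 1)\alpha_t^0\right| Y_t}{(\alpha_t^0 + Y_t)^2}\left|\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2 - \left(\sum_{k=1}^{t-1}(\beta_1^0)^{k-1}\right)^2\right|\\
&\le \frac{2\tau_n}{(1 - C_b)^3}\,\frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,\frac{\left|(\xi_t^0 + 1)\alpha_t^0\right| Y_t}{(\alpha_t^0 + Y_t)^2}\\
&\le \frac{2\tau_n}{(1 - C_b)^3}\,\frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\left|\frac{\xi_t^0 + 1}{\alpha_t^0}\right| Y_t\\
&\le \frac{2M\tau_n}{(1 - C_b)^3}\,\frac{1}{n}\sum_{t=1}^{n} I(Y_t > 0)\,Y_t = O_p(\tau_n) \to 0,
\end{aligned}
$$
where the first inequality comes from the fact that $\sum_{k=1}^{t}\beta_1^{k-1} < 1/(1 - \beta_1) \le 1/(1 - C_b)$ and
$$
\left|\left(\sum_{k=1}^{t-1}\beta_1^{k-1}\right)^2 - \left(\sum_{k=1}^{t-1}(\beta_1^0)^{k-1}\right)^2\right| \le \left|\left(\frac{1}{1 - \beta_1}\right)^2 - \left(\frac{1}{1 - \beta_1^0}\right)^2\right| \le \frac{2\tau_n}{(1 - C_b)^3} = O_p(\tau_n),
$$
where $C_b$ is a constant and $0 < C_b < 1$. The last inequality for $II$ follows from the boundedness of $\{\alpha_t^0, \zeta_t^0\}$, so that there exists $M$ with $|(\xi_t^0 + 1)/\alpha_t^0| \le M$, and from $E(Y_t) = \alpha_t/[(\xi_t - 1)(1 + u)^{\xi_t}] < \infty$. □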
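The step bounding the difference of the two squared reciprocals by $2\tau_n/(1 - C_b)^3$ can be read as a mean value argument. The following is a sketch of one way to obtain it, under the assumption (not stated explicitly above) that $\beta_1, \beta_1^0 \in (0, C_b]$ and $|\beta_1 - \beta_1^0| < \tau_n$. For some $\bar{\beta}$ between $\beta_1$ and $\beta_1^0$,

$$
\left|\frac{1}{(1 - \beta_1)^2} - \frac{1}{(1 - \beta_1^0)^2}\right|
= \frac{2}{(1 - \bar{\beta})^3}\left|\beta_1 - \beta_1^0\right|
\le \frac{2\tau_n}{(1 - C_b)^3},
$$

since $\frac{d}{d\beta}(1 - \beta)^{-2} = 2(1 - \beta)^{-3}$ is increasing in $\beta$ on $(0, 1)$ and $\bar{\beta} \le C_b$.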
Lemma A6.
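Under the same conditions as in Theorem 2, two positive constants $C$ and $C_b < 1$ exist such that for all $\theta \in \Theta$ and $t \ge 1$,
  • (a) $|\alpha_t - \tilde{\alpha}_t| \le C C_b^{t-1}$, (b) $\left|\frac{\partial\alpha_t}{\partial\Phi} - \frac{\partial\tilde{\alpha}_t}{\partial\Phi}\right| \le C t C_b^{t-1}$, (c) $\left|\frac{\partial^2\alpha_t}{\partial\Phi_i\,\partial\Phi_j} - \frac{\partial^2\tilde{\alpha}_t}{\partial\Phi_i\,\partial\Phi_j}\right| \le C t^2 C_b^{t-1}$,
  • (d) $|\zeta_t - \tilde{\zeta}_t| \le C C_b^{t-1}$, (e) $\left|\frac{\partial\zeta_t}{\partial\Psi} - \frac{\partial\tilde{\zeta}_t}{\partial\Psi}\right| \le C t C_b^{t-1}$, (f) $\left|\frac{\partial^2\zeta_t}{\partial\Psi_i\,\partial\Psi_j} - \frac{\partial^2\tilde{\zeta}_t}{\partial\Psi_i\,\partial\Psi_j}\right| \le C t^2 C_b^{t-1}$.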
Proof. 
The proof is omitted because it follows from direct calculation. □
Lemma A7.
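Under the same conditions as in Theorem 2,
  • (a) $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\tilde{L}_n(\theta) \xrightarrow{p} m_{\theta_i\theta_j}(\theta_0)$, uniformly over $\|\theta - \theta_0\| < \tau_n$, where $\tau_n \sim n^{-r}$, $r > 0$,
  • (b) $(\tau_n)^{-1}\left(\frac{\partial}{\partial\theta}\tilde{L}_n(\theta_0) - \frac{\partial}{\partial\theta}L_n(\theta_0)\right) \xrightarrow{p} 0$ if $n\tau_n \to \infty$.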
Proof. 
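(a) By the result of Lemma A5, we have $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}L_n(\theta) - \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}L_n(\theta_0) \xrightarrow{p} 0$ uniformly over $\|\theta - \theta_0\| < \tau_n$, so we need to prove that $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\tilde{L}_n(\theta) - \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}L_n(\theta) \xrightarrow{p} 0$ uniformly over the claimed region. Based on Lemma A6, this follows by the same method as in the proof of Lemma A5, so the details are omitted.
(b) Here we only give the proof for $\frac{\partial}{\partial\beta_0}\tilde{L}_n(\theta_0)$, as the other cases are similar.
$$
\begin{aligned}
\left|\frac{1}{\tau_n}\left(\frac{\partial}{\partial\beta_0}\tilde{L}_n(\theta_0) - \frac{\partial}{\partial\beta_0}L_n(\theta_0)\right)\right|
&= \left|\frac{1}{n\tau_n}\sum_{t=1}^{n} I(Y_t > 0)\sum_{k=1}^{t-1}\beta_1^{k-1}\left[\frac{\tilde{\xi}_t Y_t - \tilde{\alpha}_t}{\tilde{\alpha}_t + Y_t} - \frac{\xi_t Y_t - \alpha_t}{\alpha_t + Y_t}\right]\right|\\
&\le \frac{1}{n\tau_n}\sum_{t=1}^{n}\sum_{k=1}^{t-1}\beta_1^{k-1} I(Y_t > 0)\left|\frac{\tilde{\xi}_t Y_t - \tilde{\alpha}_t}{\tilde{\alpha}_t + Y_t} - \frac{\xi_t Y_t - \alpha_t}{\alpha_t + Y_t}\right|\\
&\le \frac{1}{n\tau_n(1 - \beta_1)}\sum_{t=1}^{n} I(Y_t > 0)\left|\frac{\tilde{\xi}_t Y_t - \tilde{\alpha}_t}{\tilde{\alpha}_t + Y_t} - \frac{\xi_t Y_t - \alpha_t}{\alpha_t + Y_t}\right|\\
&= \frac{1}{n\tau_n(1 - \beta_1)}\sum_{t=1}^{n} I(Y_t > 0)\left|\frac{\tilde{\xi}_t Y_t - \tilde{\alpha}_t}{\tilde{\alpha}_t + Y_t} - \frac{\tilde{\xi}_t Y_t - \tilde{\alpha}_t}{\alpha_t + Y_t} + \frac{\tilde{\xi}_t Y_t - \tilde{\alpha}_t}{\alpha_t + Y_t} - \frac{\xi_t Y_t - \alpha_t}{\alpha_t + Y_t}\right|\\
&= \frac{1}{n\tau_n(1 - \beta_1)}\sum_{t=1}^{n} I(Y_t > 0)\left|-\frac{(\tilde{\xi}_t Y_t - \tilde{\alpha}_t)(\tilde{\alpha}_t - \alpha_t)}{(\tilde{\alpha}_t + Y_t)(\alpha_t + Y_t)} + \frac{Y_t(\tilde{\xi}_t - \xi_t)}{\alpha_t + Y_t} - \frac{\tilde{\alpha}_t - \alpha_t}{\alpha_t + Y_t}\right|\\
&\le \frac{C}{n\tau_n(1 - \beta_1)}\sum_{t=1}^{n} I(Y_t > 0)\,C_b^{t-1}\left[\frac{|\tilde{\xi}_t Y_t - \tilde{\alpha}_t|}{\alpha_L^2} + \frac{1}{\alpha_L} + 1\right]\\
&\le \frac{C}{n\tau_n(1 - \beta_1)}\sum_{t=1}^{n} I(Y_t > 0)\,C_b^{t-1}\left[\frac{\tilde{\xi}_t Y_t}{\alpha_L^2} + \frac{\alpha_U}{\alpha_L^2} + \frac{1}{\alpha_L} + 1\right],
\end{aligned}
$$
where the second-to-last inequality comes from Lemma A6 (a) and (d). Next, we need to prove the boundedness of $\sum_{t=1}^{n} C_b^{t-1}\tilde{\xi}_t Y_t$.
$$
\begin{aligned}
E\sum_{t=1}^{n} C_b^{t-1}\xi_t Y_t
&\le E\sum_{t=1}^{\infty} C_b^{t-1}\xi_t Y_t
= \sum_{t=1}^{\infty} C_b^{t-1}E(\xi_t Y_t)
= \sum_{t=1}^{\infty} C_b^{t-1}E\left[E(\xi_t Y_t \mid \xi_t)\right]\\
&= \sum_{t=1}^{\infty} C_b^{t-1}E\left[\frac{\alpha_t\xi_t}{(\xi_t - 1)(1 + u)^{\xi_t}}\right]
\le \sum_{t=1}^{\infty} C_b^{t-1}E\left[\frac{\xi_U\alpha_U}{\xi_L - 1}\right] < \infty.
\end{aligned}
$$
Therefore, when $n\tau_n \to \infty$,
$$
\frac{1}{\tau_n}\left(\frac{\partial}{\partial\beta_0}\tilde{L}_n(\theta_0) - \frac{\partial}{\partial\beta_0}L_n(\theta_0)\right) \xrightarrow{p} 0. \qquad □
$$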
Lemma A8.
Under the same conditions as in Theorem 2,
$$
\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{\partial l_t(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, M_0),
$$
where M 0 is the Fisher information matrix at θ 0 .
Proof. 
The proof of the lemma is similar to that of Zhao et al. (2018) [31], so it is omitted. □
Proof of Theorem 2.
Theorem 2 can be proved using Lemmas A6–A8. The argument is similar to that of Shen et al. (2020) [27], where the details can be found, so it is omitted. The main difference is that we denote $f_n(t, y) = \tau_n^{-2}\tilde{L}_n(\beta_0 + \tau_n t, \Phi_0 + \tau_n y)$, $\Phi_0 = (\beta_1^0, \beta_2^0, \beta_3^0, \gamma_0^0, \gamma_1^0, \gamma_2^0, \gamma_3^0)$, where $t \in \mathbb{R}$, $y \in \mathbb{R}^7$. □
Proof of Theorem 3.
Theorem 3 can be proved using Lemmas A7 and A8. The argument is similar to that of Zhao et al. (2018) [31], where the details can be found, so it is omitted. □

References

  1. 2021 World Air Quality Report. 2022. Available online: https://www.iqair.com/world-air-quality-report (accessed on 9 April 2022).
  2. Sun, H.; Yang, X.; Leng, Z. Research on the spatial effects of haze pollution on public health: Spatial–temporal evidence from the Yangtze River Delta urban agglomerations, China. Environ. Sci. Pollut. Res. 2022, 1–20.
  3. Shen, W.T.; Yu, X.; Zhong, S.B.; Ge, H.R. Population health effects of air pollution: Fresh evidence from China health and retirement longitudinal survey. Front. Public Health 2021, 9, 779552.
  4. Maji, K.J.; Dikshit, A.K.; Arora, M.; Deshpande, A. Estimating premature mortality attributable to PM2.5 exposure and benefit of air pollution control policies in China for 2020. Sci. Total Environ. 2018, 612, 683–693.
  5. Bell, J.E.; Brown, C.L.; Conlon, K.; Herring, S.; Kunkel, K.E.; Lawrimore, J.; Luber, G.; Schreck, C.; Smith, A.; Uejio, C. Changes in extreme events and the potential impacts on human health. J. Air Waste Manag. Assoc. 2018, 68, 265–287.
  6. Raaschou-Nielsen, O.; Andersen, Z.J.; Beelen, R.; Samoli, E.; Stafoggia, M.; Weinmayr, G.; Hoffmann, B.; Fischer, P.; Nieuwenhuijsen, M.J.; Brunekreef, B.; et al. Air pollution and lung cancer incidence in 17 European cohorts: Prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). Lancet Oncol. 2013, 14, 813–822.
  7. Gan, W.Q.; Davies, H.W.; Koehoorn, M.; Brauer, M. Association of long-term exposure to community noise and traffic-related air pollution with coronary heart disease mortality. Am. J. Epidemiol. 2012, 175, 898–906.
  8. Brunekreef, B.; Beelen, R.; Hoek, G.; Schouten, L.; Bausch-Goldbohm, S.; Fischer, P.; Armstrong, B.; Hughes, E.; Jerrett, M.; van den Brandt, P. Effects of long-term exposure to traffic-related air pollution on respiratory and cardiovascular mortality in the Netherlands: The NLCS-AIR study. Res. Rep. 2009, 139, 5–71.
  9. Kim, S.Y.; Sheppard, L.; Kim, H. Health effects of long-term air pollution: Influence of exposure prediction methods. Epidemiology 2009, 20, 442–450.
  10. Cheng, Z.; Li, L.; Liu, J. Identifying the spatial effects and driving factors of urban PM2.5 pollution in China. Ecol. Indic. 2017, 82, 61–75.
  11. Gan, T.; Yang, H.; Liang, W.; Liao, X. Do economic development and population agglomeration inevitably aggravate haze pollution in China? New evidence from spatial econometric analysis. Environ. Sci. Pollut. Res. 2021, 28, 5063–5079.
  12. Ma, R.; Wang, C.; Jin, Y.; Zhou, X. Estimating the effects of economic agglomeration on haze pollution in Yangtze River Delta China using an econometric analysis. Sustainability 2019, 11, 1893.
  13. Xie, Q.; Xu, X.; Liu, X. Is there an EKC between economic growth and smog pollution in China? New evidence from semiparametric spatial autoregressive models. J. Clean. Prod. 2019, 220, 873–883.
  14. Zhang, X.; Xu, X.; Ding, Y.; Liu, Y.; Zhang, H.; Wang, Y.; Zhong, J. The impact of meteorological changes from 2013 to 2017 on PM2.5 mass reduction in key regions in China. Sci. China Earth Sci. 2019, 62, 1885–1902.
  15. Fontes, T.; Li, P.; Barros, N.; Zhao, P. Trends of PM2.5 concentrations in China: A long term approach. J. Environ. Manag. 2017, 196, 719–732.
  16. Pui, D.Y.H.; Chen, S.C.; Zuo, Z. PM2.5 in China: Measurements, sources, visibility and health effects, and mitigation. Particuology 2014, 13, 1–26.
  17. Liang, X.; Zou, T.; Guo, B.; Li, S.; Zhang, H.; Zhang, S.; Huang, H.; Chen, S.X. Assessing Beijing’s PM2.5 pollution: Severity, weather impact, APEC and winter heating. Proc. R. Soc. A 2015, 471, 20150257.
  18. Zhang, S.; Guo, B.; Dong, A.; He, J.; Xu, Z.; Chen, S.X. Cautionary tales on air-quality improvement in Beijing. Proc. R. Soc. A 2017, 473, 20170457.
  19. Chen, L.; Guo, B.; Huang, J.; He, J.; Wang, H.; Zhang, S.; Chen, S.X. Assessing air-quality in Beijing-Tianjin-Hebei Region: The method and mixed tales of PM2.5 and O3. Atmos. Environ. 2018, 193, 290–301.
  20. Wu, H.; Zheng, X.; Zhu, J.; Lin, W.; Zheng, H.; Chen, X.; Wang, W.; Wang, Z.; Chen, S.X. Improving PM2.5 forecasts in China using an initial error transport model. Environ. Sci. Technol. 2020, 54, 10493–10501.
  21. Wan, Y.; Xu, M.; Huang, H.; Xi Chen, S. A spatio-temporal model for the analysis and prediction of fine particulate matter concentration in Beijing. Environmetrics 2020, 32, e2648.
  22. Zhu, Y.; Liang, Y.; Chen, S. Assessing local emission for air pollution via data experiments. Atmos. Environ. 2021, 252, 118323.
  23. Wang, J.; Cohan, D.S.; Xu, H. Spatiotemporal ozone pollution LUR models: Suitable statistical algorithms and time scales for a megacity scale. Atmos. Environ. 2020, 237, 117671.
  24. Pickands, J. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131.
  25. Tencaliec, P.; Favre, A.C.; Naveau, P.; Prieur, C.; Nicolet, G. Flexible semiparametric generalized Pareto modeling of the entire range of rainfall amount. Environmetrics 2020, 31, e2582.
  26. Gharib, A.; Davies, E.G.R.; Goss, G.G.; Faramarzi, M. Assessment of the combined effects of threshold selection and parameter estimation of generalized Pareto distribution with applications to flood frequency analysis. Water 2017, 9, 692.
  27. Shen, Z.Y.; Chen, Y.; Shi, R.X. Modeling tail index with autoregressive conditional Pareto model. J. Bus. Econ. Stat. 2022, 40, 458–466.
  28. Chen, Y.; Yu, W. Setting the margins of Hang Seng Index Futures on different positions using an APARCH-GPD Model based on extreme value theory. Phys. A Stat. Mech. Its Appl. 2020, 544, 123207.
  29. Park, E.; Brorsen, B.W.; Harri, A. Using Bayesian Kriging for spatial smoothing in crop insurance rating. Am. J. Agric. Econ. 2019, 101, 330–351.
  30. Liu, X.H.; Zhang, X.; Xue, J.Y. Fraud risk measurement of basic medical insurance for urban and rural residents in China. Econ. Comput. Econ. Cybern. Stud. Res. 2019, 53, 277–296.
  31. Zhao, Z.; Zhang, Z.; Chen, R. Modeling maxima with autoregressive conditional Fréchet model. J. Econom. 2018, 207, 325–351.
  32. Chavez-Demoulin, V.; Embrechts, P.; Sardy, S. Extreme-quantile tracking for financial time series. J. Econom. 2014, 181, 44–52.
  33. Kelly, B.; Jiang, H. Tail risk and asset prices. Rev. Financ. Stud. 2014, 27, 2841–2871.
  34. Massacci, D. Tail risk dynamics in stock returns: Links to the macroeconomy and global markets connectedness. Manag. Sci. 2017, 63, 3072–3089.
  35. Deng, L.; Yu, M.X.; Zhang, Z.J. Statistical learning of the worst regional smog extremes with dynamic conditional modeling. Atmosphere 2020, 11, 665.
  36. Choulakian, V.; Stephens, M.A. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 2001, 43, 478–484.
  37. Bermudez, P.Z.; Turkman, M.A.A.; Turkman, K.F. A predictive approach to tail probability estimation. Extremes 2001, 4, 295–314.
  38. Bader, B.; Yan, J.; Zhang, X.B. Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate. Ann. Appl. Stat. 2018, 12, 310–329.
  39. Yang, X.; Zhang, J.; Ren, W.X. Threshold selection for extreme value estimation of vehicle load effect on bridges. Int. J. Distrib. Sens. Netw. 2018, 14, 1550147718757698.
  40. Schneider, L.F.; Krajina, A.; Krivobokova, T. Threshold selection in univariate extreme value analysis. Extremes 2021, 24, 881–913.
  41. Boznar, M.; Lesjak, M.; Mlakar, P. A neural network-based method for short-term predictions of ambient SO2 concentrations in highly polluted industrial areas of complex terrain. Atmos. Environ. Part B Urban Atmos. 1993, 27, 221–230.
  42. Neagu, C.D.; Avouris, N.; Kalapanidas, E.; Palade, V. Neural and neuro-fuzzy integration in a knowledge-based system for air quality prediction. Appl. Intell. 2002, 17, 141–169.
  43. Esfandani, M.A.; Nematzadeh, H. Predicting air pollution in Tehran: Genetic algorithm and back propagation neural network. J. Data Min. 2016, 4, 49–54.
  44. Amarpuri, L.; Yadav, N.; Kumar, G.; Agrawal, S. Prediction of CO2 emissions using deep learning hybrid approach: A case study in Indian context. In Proceedings of the 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 August 2019.
  45. Menéndez García, L.A.; Sánchez Lasheras, F.; García Nieto, P.J.; Álvarez de Prado, L.; Bernardo Sánchez, A. Predicting Benzene concentration using machine learning and time series algorithms. Mathematics 2020, 8, 2205.
  46. Sánchez-Pérez, J.F.; Mena-Requena, M.R.; Cánovas, M. Mathematical modeling and simulation of a gas emission source using the network simulation method. Mathematics 2020, 8, 1996.
  47. Sayeed, A.; Lops, Y.; Choi, Y.; Jung, J.; Salman, A.K. Bias correcting and extending the PM forecast by CMAQ up to 7 days using deep convolutional neural networks. Atmos. Environ. 2021, 253, 118376.
  48. Kang, S.; Song, J. Parameter and quantile estimation for the generalized Pareto distribution in peaks over threshold framework. J. Korean Stat. Soc. 2017, 46, 487–501.
  49. Balkema, A.A.; De Haan, L. Residual life time at great age. Ann. Probab. 1974, 2, 792–804.
  50. Davison, A.C.; Smith, R.L. Models for exceedances over high thresholds. J. R. Stat. Soc. Ser. B (Methodol.) 1990, 52, 393–425.
Figure 1. Structure of LSTM.
Figure 2. The graph of PM 2.5 time series in Beijing.
Figure 3. Tail index $\hat{\zeta}_t$ estimated by MLE and simulated tail index $\zeta_t$. (a) The DCP model with weather factors from 2015 to 2020. (b) The DCP model with weather factors from 2018 to 2020. (c) The DCP model with air quality factors from 2015 to 2020. (d) The DCP model with air quality factors from 2018 to 2020. (e) The DCP model with mixed factors from 2015 to 2020. (f) The DCP model with mixed factors from 2018 to 2020.
Figure 4. Exponential QQ plot of real PM 2.5 data in Beijing: (a) from 3 January 2015 to 8 August 2020; (b) from 1 January 2018 to 8 August 2020.
Figure 5. Estimated tail index $\hat{\zeta}_t$ from the three DCP models and positive exceedances $Y_t$. (a) The DCP model with weather factors from 3 January 2015 to 8 August 2020 with $u = 2.4660$. (b) The DCP model with weather factors from 1 January 2018 to 8 August 2020 with $u = 0.5716$. (c) The DCP model with air quality factors from 3 January 2015 to 8 August 2020 with $u = 2.4660$. (d) The DCP model with air quality factors from 1 January 2018 to 8 August 2020 with $u = 0.5716$. (e) The DCP model with mixed factors from 3 January 2015 to 8 August 2020 with $u = 2.4660$. (f) The DCP model with mixed factors from 1 January 2018 to 8 August 2020 with $u = 0.5716$.
Figure 6. The line graphs of fitted $\hat{Y}_t$ values and real exceedances $Y_t$.
Figure 7. Estimated standard deviation for DCP vs. GARCH. (a) The DCP model with weather factors from 3 January 2015 to 8 August 2020. (b) The DCP model with weather factors from 1 January 2018 to 8 August 2020. (c) The DCP model with air quality factors from 3 January 2015 to 8 August 2020. (d) The DCP model with air quality factors from 1 January 2018 to 8 August 2020. (e) The DCP model with mixed factors from 3 January 2015 to 8 August 2020. (f) The DCP model with mixed factors from 1 January 2018 to 8 August 2020.
Figure 8. The line graphs of predicted $\hat{Y}_t$ values and real exceedances $Y_t$.
Figure 9. The long-term prediction of PM 2.5 values in Beijing.
Table 1. Parameter estimation of the DCP models. Weather1 and Weather2 denote the DCP model with weather factors fitted from 3 January 2015 to 8 August 2020 and from 1 January 2018 to 8 August 2020, respectively. Air1 and Air2 (air quality factors) and Mixed1 and Mixed2 (mixed factors) are defined analogously. Empty cells mark parameters that are not part of the corresponding model.

| Parameter | Weather1 | Weather2 | Air1 | Air2 | Mixed1 | Mixed2 |
|---|---|---|---|---|---|---|
| $\beta_0$ | 0.2989 | 0.0950 | −0.0191 | 0.5932 | −0.0186 | 0.8154 |
| $\beta_1$ | 0.0000 | 0.9210 | 0.0033 | 0.3744 | 0.0000 | 0.0000 |
| $\beta_2$ | 0.0000 | 0.0303 | 0.0000 | 0.0602 | 0.0000 | 0.0696 |
| $\beta_3$ | 6.4028 | 33.6097 | 2.1760 | 0.0001 | 2.8285 | 0.2035 |
| $\beta_4$ | 8.2352 | 0.1524 | 0.0042 | 2.0413 | 0.3989 | 2.9834 |
| $\beta_5$ | 0.9159 | 0.4756 | 0.3541 | 0.7755 | 6.1266 | 0.0001 |
| $\beta_6$ | 4.4001 | 1.0734 | 0.0001 | 0.0001 | 0.1056 | 0.6691 |
| $\beta_7$ | | | | | 0.6196 | 0.0001 |
| $\gamma_0$ | −1.3618 | −0.0116 | −0.3471 | 0.3087 | −1.9032 | −0.6329 |
| $\gamma_1$ | 0.3739 | 0.1931 | 0.3162 | 0.0182 | 0.0475 | 0.0524 |
| $\gamma_2$ | 2.1928 | 1.2830 | 1.3070 | 1.0492 | 3.2466 | 2.0631 |
| $\gamma_3$ | 0.3367 | 0.8814 | 0.0001 | 0.5450 | 0.0001 | 0.1949 |
| $\gamma_4$ | 0.0666 | 0.0192 | 0.1809 | 0.1848 | 0.0655 | 0.1533 |
| $\gamma_5$ | 0.1691 | 0.3138 | 0.2600 | 0.2209 | 0.1140 | 0.0785 |
| $\gamma_6$ | 0.0001 | 0.0001 | 0.2353 | 0.2472 | 0.1141 | 0.1859 |
| $\gamma_7$ | | | | | 0.0001 | 0.0001 |
Table 2. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with weather factors from 3 January 2015 to 8 August 2020.

| Para | True Value | Mean (n = 1000) | Sd (n = 1000) | RMSE (n = 1000) | Abias (n = 1000) | Mean (n = 2000) | Sd (n = 2000) | RMSE (n = 2000) | Abias (n = 2000) |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | 0.2989 | 0.7935 | 1.4991 | 1.5771 | 0.6634 | 0.5055 | 0.9202 | 0.9422 | 0.3709 |
| $\beta_1$ | 0.0000 | 0.2133 | 0.2898 | 0.3596 | 0.2133 | 0.1525 | 0.2499 | 0.2925 | 0.1525 |
| $\beta_2$ | 0.0000 | 0.6980 | 1.6177 | 1.7604 | 0.6980 | 0.2982 | 0.9659 | 1.0100 | 0.2982 |
| $\beta_3$ | 6.4028 | 6.8126 | 16.8403 | 16.8285 | 8.1358 | 8.5570 | 16.9945 | 17.1137 | 8.6371 |
| $\beta_4$ | 8.2352 | 2.8520 | 2.4664 | 5.9203 | 5.4315 | 4.1413 | 2.5876 | 4.8417 | 4.1485 |
| $\beta_5$ | 0.9159 | 0.8824 | 0.6961 | 0.6962 | 0.5726 | 0.8390 | 0.6126 | 0.6168 | 0.4842 |
| $\beta_6$ | 4.4001 | 1.6663 | 1.5977 | 3.1656 | 2.8693 | 2.3185 | 1.4552 | 2.5390 | 2.1497 |
| $\gamma_0$ | −1.3618 | −1.0625 | 0.8750 | 0.9239 | 0.6968 | −1.2880 | 0.9361 | 0.9380 | 0.6209 |
| $\gamma_1$ | 0.3739 | 0.3519 | 0.1439 | 0.1455 | 0.1155 | 0.3514 | 0.1119 | 0.1140 | 0.0887 |
| $\gamma_2$ | 2.1928 | 1.9536 | 0.9417 | 0.9707 | 0.7376 | 2.1537 | 1.0054 | 1.0051 | 0.6710 |
| $\gamma_3$ | 0.3367 | 1.1742 | 5.1284 | 5.1912 | 1.0035 | 1.0593 | 5.0468 | 5.0933 | 0.8586 |
| $\gamma_4$ | 0.0666 | 0.1206 | 0.1802 | 0.1880 | 0.0764 | 0.0795 | 0.0662 | 0.0674 | 0.0363 |
| $\gamma_5$ | 0.1691 | 0.3291 | 0.2729 | 0.3161 | 0.1722 | 0.2380 | 0.1296 | 0.1467 | 0.0889 |
| $\gamma_6$ | 0.0001 | 0.0259 | 0.0639 | 0.0689 | 0.0258 | 0.0117 | 0.0237 | 0.0263 | 0.0116 |
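The summary columns reported in Tables 2 to 7 can be computed from a vector of replicate estimates of a single parameter. The sketch below is only an illustration, not the authors' code. It assumes that Abias denotes the mean absolute deviation of the estimates from the true value and that Sd is the sample standard deviation; this section defines neither. The function name `mc_summary` and the synthetic replicates are hypothetical.

```python
import numpy as np


def mc_summary(estimates, true_value):
    """Summarise replicate estimates of one parameter.

    Assumed definitions (not stated in the tables themselves):
      sd    = sample standard deviation of the estimates (ddof=1)
      rmse  = sqrt(mean((estimate - true_value)**2))
      abias = mean(|estimate - true_value|)
    """
    est = np.asarray(estimates, dtype=float)
    err = est - true_value
    return {
        "mean": float(est.mean()),
        "sd": float(est.std(ddof=1)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "abias": float(np.mean(np.abs(err))),
    }


# Synthetic stand-in for 500 replicate MLEs of one parameter.
# The true value is taken from Table 1 (gamma_1, Weather1); the noise is made up.
rng = np.random.default_rng(0)
replicates = 0.3739 + 0.1 * rng.standard_normal(500)
summary = mc_summary(replicates, true_value=0.3739)
print(summary)
```

Under these definitions the RMSE is never smaller than Abias, which matches the ordering of the two columns in every row of the tables.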
Table 3. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with air quality factors from 3 January 2015 to 8 August 2020.

| Para | True Value | Mean (n = 1000) | Sd (n = 1000) | RMSE (n = 1000) | Abias (n = 1000) | Mean (n = 2000) | Sd (n = 2000) | RMSE (n = 2000) | Abias (n = 2000) |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.0191 | 0.6258 | 1.8437 | 1.9514 | 0.7166 | 0.6809 | 2.0694 | 2.1826 | 0.7723 |
| $\beta_1$ | 0.0033 | 0.2380 | 0.3223 | 0.3985 | 0.2368 | 0.1968 | 0.2839 | 0.3434 | 0.1959 |
| $\beta_2$ | 0.0000 | 1.3443 | 2.2793 | 2.6442 | 1.3443 | 1.0134 | 2.1222 | 2.3499 | 1.0134 |
| $\beta_3$ | 2.1760 | 13.6808 | 28.8885 | 31.0683 | 13.3849 | 14.1147 | 28.1351 | 30.5374 | 13.8677 |
| $\beta_4$ | 0.0042 | 3.7857 | 6.2986 | 7.3412 | 3.7834 | 4.3086 | 8.0154 | 9.0910 | 4.3064 |
| $\beta_5$ | 0.3541 | 1.3850 | 1.9586 | 2.2116 | 1.3261 | 1.2559 | 1.7233 | 1.9435 | 1.1972 |
| $\beta_6$ | 0.0001 | 1.6231 | 2.5665 | 3.0344 | 1.6230 | 1.4659 | 2.3098 | 2.7337 | 1.4658 |
| $\gamma_0$ | −0.3471 | −0.3156 | 0.2649 | 0.2665 | 0.1966 | −0.3339 | 0.2246 | 0.2247 | 0.1594 |
| $\gamma_1$ | 0.3162 | 0.2823 | 0.1097 | 0.1147 | 0.0875 | 0.2996 | 0.0882 | 0.0896 | 0.0689 |
| $\gamma_2$ | 1.3070 | 1.5039 | 0.7812 | 0.8049 | 0.3558 | 1.3508 | 0.2809 | 0.2840 | 0.1981 |
| $\gamma_3$ | 0.0001 | 0.1969 | 1.8029 | 1.8118 | 0.1968 | 0.0543 | 0.5805 | 0.5825 | 0.0542 |
| $\gamma_4$ | 0.1809 | 0.2698 | 0.2871 | 0.3003 | 0.1461 | 0.2182 | 0.2626 | 0.2650 | 0.0926 |
| $\gamma_5$ | 0.2600 | 0.3429 | 0.2274 | 0.2418 | 0.1666 | 0.2927 | 0.1313 | 0.1352 | 0.0965 |
| $\gamma_6$ | 0.2353 | 0.3471 | 0.3084 | 0.3277 | 0.2023 | 0.2860 | 0.1691 | 0.1764 | 0.1214 |
Table 4. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with mixed weather and air quality factors from 3 January 2015 to 8 August 2020.

| Para | True Value | Mean (n = 1000) | Sd (n = 1000) | RMSE (n = 1000) | Abias (n = 1000) | Mean (n = 2000) | Sd (n = 2000) | RMSE (n = 2000) | Abias (n = 2000) |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | −0.0186 | 0.4915 | 1.5853 | 1.6639 | 0.5978 | 0.3881 | 1.2049 | 1.2705 | 0.4428 |
| $\beta_1$ | 0.0000 | 0.2219 | 0.3009 | 0.3736 | 0.2219 | 0.2228 | 0.3111 | 0.3824 | 0.2228 |
| $\beta_2$ | 0.0000 | 0.5849 | 1.6337 | 1.7337 | 0.5849 | 0.3853 | 1.2311 | 1.2888 | 0.3853 |
| $\beta_3$ | 2.8285 | 7.7282 | 19.5914 | 20.1758 | 7.7747 | 8.2657 | 19.3077 | 20.0401 | 7.9301 |
| $\beta_4$ | 0.3989 | 1.5797 | 2.8494 | 3.0817 | 1.6143 | 1.7926 | 2.9913 | 3.2974 | 1.7668 |
| $\beta_5$ | 6.1266 | 2.9585 | 4.8771 | 5.8116 | 5.1223 | 3.1062 | 4.3778 | 5.3150 | 4.7418 |
| $\beta_6$ | 0.1056 | 1.0466 | 1.1051 | 1.4507 | 0.9986 | 0.9204 | 0.9381 | 1.2418 | 0.8764 |
| $\beta_7$ | 0.6196 | 1.3400 | 2.0752 | 2.1947 | 1.3241 | 1.4397 | 2.2763 | 2.4174 | 1.4125 |
| $\gamma_0$ | −1.9032 | −1.7337 | 0.9557 | 0.9697 | 0.7988 | −1.8052 | 0.8761 | 0.8807 | 0.7080 |
| $\gamma_1$ | 0.0475 | 0.0730 | 0.0899 | 0.0933 | 0.0705 | 0.0666 | 0.0803 | 0.0824 | 0.0628 |
| $\gamma_2$ | 3.2466 | 3.0196 | 0.9698 | 0.9950 | 0.8249 | 3.0970 | 0.8919 | 0.9035 | 0.7355 |
| $\gamma_3$ | 0.0001 | 0.0243 | 0.3376 | 0.3381 | 0.0242 | 0.0007 | 0.0033 | 0.0034 | 0.0006 |
| $\gamma_4$ | 0.0655 | 0.0712 | 0.0517 | 0.0520 | 0.0319 | 0.0688 | 0.0326 | 0.0328 | 0.0236 |
| $\gamma_5$ | 0.1140 | 0.1621 | 0.1388 | 0.1467 | 0.0754 | 0.1370 | 0.0646 | 0.0685 | 0.0486 |
| $\gamma_6$ | 0.1141 | 0.1626 | 0.1192 | 0.1286 | 0.0717 | 0.1345 | 0.0587 | 0.0621 | 0.0428 |
| $\gamma_7$ | 0.0001 | 0.0144 | 0.0311 | 0.0342 | 0.0143 | 0.0088 | 0.0136 | 0.0161 | 0.0087 |
Table 5. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with weather factors from 1 January 2018 to 8 August 2020.

| Para | True Value | Mean (n = 1000) | Sd (n = 1000) | RMSE (n = 1000) | Abias (n = 1000) | Mean (n = 2000) | Sd (n = 2000) | RMSE (n = 2000) | Abias (n = 2000) |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | 0.0950 | 0.7499 | 1.5926 | 1.7205 | 0.6839 | 0.3280 | 0.8420 | 0.8728 | 0.2556 |
| $\beta_1$ | 0.9210 | 0.6431 | 0.3598 | 0.4543 | 0.3029 | 0.8072 | 0.2381 | 0.2637 | 0.1355 |
| $\beta_2$ | 0.0303 | 0.4727 | 1.5367 | 1.5976 | 0.4729 | 0.1854 | 0.7942 | 0.8084 | 0.1764 |
| $\beta_3$ | 33.6097 | 24.5359 | 38.6197 | 39.6337 | 36.2003 | 29.1089 | 40.3405 | 40.5506 | 36.7034 |
| $\beta_4$ | 0.1524 | 1.4197 | 1.9596 | 2.3321 | 1.3589 | 0.4509 | 0.6220 | 0.6893 | 0.3917 |
| $\beta_5$ | 0.4756 | 1.0975 | 1.2445 | 1.3901 | 0.9800 | 0.5554 | 0.5925 | 0.5972 | 0.4135 |
| $\beta_6$ | 1.0734 | 1.5782 | 1.6604 | 1.7339 | 1.2105 | 1.1625 | 0.8826 | 0.8862 | 0.6713 |
| $\gamma_0$ | −0.0116 | 0.1267 | 0.2986 | 0.3289 | 0.2612 | 0.0886 | 0.2156 | 0.2376 | 0.1884 |
| $\gamma_1$ | 0.1931 | 0.1890 | 0.0823 | 0.0823 | 0.0669 | 0.1917 | 0.0574 | 0.0573 | 0.0451 |
| $\gamma_2$ | 1.2830 | 1.1413 | 0.3072 | 0.3380 | 0.2702 | 1.1741 | 0.2223 | 0.2473 | 0.2020 |
| $\gamma_3$ | 0.8814 | 1.0276 | 0.7411 | 0.7547 | 0.4188 | 0.9004 | 0.3328 | 0.3330 | 0.2592 |
| $\gamma_4$ | 0.0192 | 0.0256 | 0.0366 | 0.0371 | 0.0223 | 0.0222 | 0.0204 | 0.0206 | 0.0163 |
| $\gamma_5$ | 0.3138 | 0.4060 | 0.1500 | 0.1759 | 0.1187 | 0.3642 | 0.0810 | 0.0953 | 0.0718 |
| $\gamma_6$ | 0.0001 | 0.0185 | 0.0297 | 0.0349 | 0.0184 | 0.0107 | 0.0165 | 0.0196 | 0.0106 |
Table 6. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with air quality factors from 1 January 2018 to 8 August 2020.

| Para | True Value | Mean (n = 1000) | Sd (n = 1000) | RMSE (n = 1000) | Abias (n = 1000) | Mean (n = 2000) | Sd (n = 2000) | RMSE (n = 2000) | Abias (n = 2000) |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | 0.5932 | 0.9593 | 1.1273 | 1.1842 | 0.4807 | 0.6085 | 0.0871 | 0.0884 | 0.0667 |
| $\beta_1$ | 0.3744 | 0.2434 | 0.2857 | 0.3140 | 0.2832 | 0.3525 | 0.0805 | 0.0834 | 0.0613 |
| $\beta_2$ | 0.0602 | 0.3507 | 1.1224 | 1.1583 | 0.3455 | 0.0638 | 0.0350 | 0.0351 | 0.0271 |
| $\beta_3$ | 0.0001 | 10.7321 | 25.8047 | 27.9236 | 10.7320 | 1.4567 | 9.0550 | 9.1625 | 1.4566 |
| $\beta_4$ | 2.0413 | 4.6154 | 7.3362 | 7.7678 | 4.3355 | 2.1536 | 0.3725 | 0.3887 | 0.2452 |
| $\beta_5$ | 0.7755 | 1.4947 | 2.1804 | 2.2939 | 1.3944 | 0.8279 | 0.5246 | 0.5267 | 0.2043 |
| $\beta_6$ | 0.0001 | 1.0952 | 2.2492 | 2.4996 | 1.0951 | 0.0561 | 0.1118 | 0.1249 | 0.0560 |
| $\gamma_0$ | 0.3087 | 0.3379 | 0.1903 | 0.1923 | 0.1488 | 0.3385 | 0.1461 | 0.1490 | 0.1185 |
| $\gamma_1$ | 0.0182 | 0.0546 | 0.0650 | 0.0744 | 0.0516 | 0.0392 | 0.0456 | 0.0502 | 0.0358 |
| $\gamma_2$ | 1.0492 | 0.9473 | 0.2153 | 0.2380 | 0.1910 | 0.9757 | 0.1639 | 0.1795 | 0.1448 |
| $\gamma_3$ | 0.5450 | 0.4920 | 0.2818 | 0.2864 | 0.2076 | 0.4996 | 0.2112 | 0.2158 | 0.1548 |
| $\gamma_4$ | 0.1848 | 0.2208 | 0.1069 | 0.1128 | 0.0833 | 0.2031 | 0.0690 | 0.0714 | 0.0420 |
| $\gamma_5$ | 0.2209 | 0.2539 | 0.0943 | 0.0998 | 0.0752 | 0.2445 | 0.0598 | 0.0642 | 0.0441 |
| $\gamma_6$ | 0.2472 | 0.2810 | 0.0979 | 0.1035 | 0.0771 | 0.2714 | 0.0581 | 0.0629 | 0.0479 |
Table 7. Mean, standard deviation, RMSE and Abias of 500 corresponding parameter values estimated by MLE in the DCP with mixed weather and air quality factors from 1 January 2018 to 8 August 2020.

| Para | True Value | Mean (n = 1000) | Sd (n = 1000) | RMSE (n = 1000) | Abias (n = 1000) | Mean (n = 2000) | Sd (n = 2000) | RMSE (n = 2000) | Abias (n = 2000) |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_0$ | 0.8154 | 0.8577 | 0.9534 | 0.9534 | 0.3941 | 0.6715 | 0.9247 | 0.9349 | 0.2631 |
| $\beta_1$ | 0.0000 | 0.2211 | 0.2985 | 0.3712 | 0.2211 | 0.0501 | 0.1502 | 0.1582 | 0.0501 |
| $\beta_2$ | 0.0696 | 0.3003 | 0.9290 | 0.9563 | 0.3031 | 0.0770 | 0.0731 | 0.0734 | 0.0559 |
| $\beta_3$ | 0.2035 | 12.2266 | 26.3745 | 28.9616 | 12.1660 | 6.1382 | 14.5687 | 15.7176 | 6.0234 |
| $\beta_4$ | 2.9834 | 2.8016 | 3.4427 | 3.4440 | 2.8849 | 2.5224 | 1.1180 | 1.2083 | 0.7971 |
| $\beta_5$ | 0.0001 | 1.7871 | 3.2419 | 3.6989 | 1.7870 | 0.9101 | 1.7622 | 1.9817 | 0.9100 |
| $\beta_6$ | 0.6691 | 1.0305 | 1.0784 | 1.1364 | 0.8332 | 0.9123 | 0.9829 | 1.0116 | 0.5585 |
| $\beta_7$ | 0.0001 | 1.5230 | 2.4326 | 2.8679 | 1.5229 | 0.6633 | 1.4938 | 1.6330 | 0.6632 |
| $\gamma_0$ | −0.6329 | −0.4751 | 0.3628 | 0.3953 | 0.3139 | −0.9250 | 1.1942 | 1.2282 | 0.6621 |
| $\gamma_1$ | 0.0524 | 0.0620 | 0.0667 | 0.0673 | 0.0538 | 0.0630 | 0.0559 | 0.0569 | 0.0429 |
| $\gamma_2$ | 2.0631 | 1.8830 | 0.3908 | 0.4299 | 0.3385 | 2.3010 | 1.1443 | 1.1677 | 0.6600 |
| $\gamma_3$ | 0.1949 | 0.1904 | 0.0851 | 0.0852 | 0.0628 | 0.2905 | 0.9609 | 0.9647 | 0.1879 |
| $\gamma_4$ | 0.1533 | 0.1799 | 0.0607 | 0.0662 | 0.0471 | 0.1689 | 0.0915 | 0.0927 | 0.0539 |
| $\gamma_5$ | 0.0785 | 0.0982 | 0.0416 | 0.0460 | 0.0344 | 0.0877 | 0.0740 | 0.0745 | 0.0319 |
| $\gamma_6$ | 0.1859 | 0.2243 | 0.0631 | 0.0738 | 0.0533 | 0.2061 | 0.1122 | 0.1139 | 0.0612 |
| $\gamma_7$ | 0.0001 | 0.0123 | 0.0210 | 0.0243 | 0.0122 | 0.0079 | 0.0143 | 0.0163 | 0.0078 |
Table 8. Comparison of the DCP and DCW models based on AIC and BIC criteria.

| Period | Model | AIC (Weather) | AIC (Air) | AIC (Mixed) | BIC (Weather) | BIC (Air) | BIC (Mixed) |
|---|---|---|---|---|---|---|---|
| 3 January 2015–8 August 2020 | DCP | 588.9960 | 535.8680 | 531.0784 | 666.7786 | 613.6507 | 619.9729 |
| 3 January 2015–8 August 2020 | DCW | 3327.6850 | 3907.5960 | 4908.1610 | 3411.0240 | 3990.9350 | 5002.6110 |
| 1 January 2018–8 August 2020 | DCP | 1035.3950 | 1019.2530 | 1001.3240 | 1035.3950 | 1086.0280 | 1077.6380 |
| 1 January 2018–8 August 2020 | DCW | 2717.4300 | 3219.0270 | 1700.4180 | 2788.9740 | 3290.5710 | 1700.4180 |
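For readers who want to reproduce a comparison like Table 8, the sketch below computes AIC and BIC from a maximised log-likelihood using the standard definitions AIC = 2k − 2 log L and BIC = k ln n − 2 log L, where k is the number of estimated parameters and n is the number of observations. It is a minimal illustration, not the authors' code: the function name `info_criteria` and the numerical inputs are hypothetical and are not taken from the paper.

```python
import math


def info_criteria(loglik, k, n):
    """Return (AIC, BIC) for a model fitted by maximum likelihood.

    loglik : maximised log-likelihood
    k      : number of estimated parameters
    n      : number of observations used in the fit
    """
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    return aic, bic


# Hypothetical inputs for two competing models fitted to the same data.
n_obs = 1500
aic_a, bic_a = info_criteria(loglik=-280.0, k=14, n=n_obs)
aic_b, bic_b = info_criteria(loglik=-276.0, k=16, n=n_obs)

# Lower values indicate the preferred model under each criterion.
print(f"model A: AIC={aic_a:.4f}, BIC={bic_a:.4f}")
print(f"model B: AIC={aic_b:.4f}, BIC={bic_b:.4f}")

# For any single model, BIC - AIC = k * (ln n - 2), independent of the likelihood.
assert abs((bic_a - aic_a) - 14 * (math.log(n_obs) - 2)) < 1e-9
```

Because BIC penalises each parameter by ln n rather than 2, it can rank two models differently from AIC once n exceeds about 7 (ln n > 2), as happens with these hypothetical inputs.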
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Huang, C.; Zhao, X.; Cheng, W.; Ji, Q.; Duan, Q.; Han, Y. Statistical Inference of Dynamic Conditional Generalized Pareto Distribution with Weather and Air Quality Factors. Mathematics 2022, 10, 1433. https://doi.org/10.3390/math10091433


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
