Next Article in Journal
Perioperative Hypothermia—A Narrative Review
Next Article in Special Issue
Contrasting Trends of Surface PM2.5, O3, and NO2 and Their Relationships with Meteorological Parameters in Typical Coastal and Inland Cities in the Yangtze River Delta
Previous Article in Journal
Changes in Adolescents’ Psychosocial Functioning and Well-Being as a Consequence of Long-Term COVID-19 Restrictions
Previous Article in Special Issue
Influence of Wind Speed on CO2 and CH4 Concentrations at a Rural Site
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Modeling the Characteristics of Unhealthy Air Pollution Events: A Copula Approach

by
Nurulkamal Masseran
Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
Int. J. Environ. Res. Public Health 2021, 18(16), 8751; https://doi.org/10.3390/ijerph18168751
Submission received: 5 July 2021 / Revised: 9 August 2021 / Accepted: 17 August 2021 / Published: 19 August 2021
(This article belongs to the Special Issue Advances in Air Pollution Meteorology Research)

Abstract

:
This study proposes the concept of duration (D) and severity (S) measures, which were derived from unhealthy air pollution events. In parallel with that, the application of a copula model is proposed to evaluate unhealthy air pollution events with respect to their duration and severity characteristics. The bivariate criteria represented by duration and severity indicate their structural dependency, long-tail, and non-identically marginal distributions. A copula approach can provide a good statistical tool to deal with these issues and enable the extraction of valuable information from air pollution data. Based on the copula model, several statistical measurements are proposed for describing the characteristics of unhealthy air pollution events, including the Kendall’s τ correlation of the copula, the conditional probability of air pollution severity based on a given duration, the joint OR/AND return period, and the conditional D|S and conditional S|D return periods. A case study based on air pollution data indices was conducted in Klang, Malaysia. The results indicate that a copula approach is beneficial for deriving valuable information for planning and mitigating the risks of unhealthy air pollution events.

1. Introduction

Air pollution is an important issue that needs to be addressed worldwide, particularly in urban areas. This issue relates to the condition of the air environment corresponding to unhealthy air pollution events [1,2]. Unhealthy air pollution events can be characterized based on their duration and severity. The duration of an air pollution event is determined based on the period in which the air pollution index (API) values are harmful to human health for consecutive periods [3], while severity measures the magnitude of the air pollution event based on the cumulative effect of an unhealthy API during a particular air pollution event. These measures are related because the severity of an air pollution event always depends on its duration. Separate analyses of the duration and severity of an air pollution event cannot reveal their significant association. To model their relationship, joint modeling is required, in which the marginal distributions of both variables are combined [4,5]. Several different bivariate distributions have been proposed by previous researchers in order to model the joint characteristics of bivariate variables. For example, bivariate normal distribution has been applied in many research areas, particularly for the cases where each considered variable can be easily described by a Gaussian/normal distribution [6,7,8,9]. In a similar vein, bivariate lognormal distribution has been employed to describe a phenomenon that can be represented by a lognormal-distributed marginal [10,11]. The Gumbel logistic model [12,13] and the Gumbel mixed model [14,15] were proposed to describe a phenomenon that can be represented by random variables corresponding to a Gumbel-distributed marginal. A bivariate exponential model [16,17] and a Nagao–Kadoya bivariate exponential model [7] have been proposed to model a phenomenon where their marginal random variables can be represented by an exponential model [18]. Apart from that, there are various forms of bivariate gamma models, such as Royen’s Bivariate Gamma model [19], the Izawa Bigamma model [20], the Moran model [21], Schmeiser and Lal’s Bivariate Gamma model [22], and the Loaciga and Leipnik Bivariate Gamma model [23], have been proposed to model the bivariate random variable with a combination of marginal Gamma distribution [24,25,26]. However, if the marginal distribution among the variables is non-identically distributed, obtaining the analytical or closed form of the joint probability model is difficult [27], which makes it challenging to use the available bivariate models to describe the statistical properties of the data. Fortunately, this issue can be resolved using the copula approach [28,29].
Copula functions have the advantage of providing a platform for representing a multivariate distribution for any type or form of marginal variable distribution involved [30,31]. Several researchers have employed the copula model to analyze the air pollution data. Sak et al. [32], for instance, used a copula model to quantify the pollution risk of PM2.5 concentration in several cities in China. Chan and So [33] used a copula approach to model extreme spatial air pollution events in Guangdong province, China. Falk et al. [34] used a copula approach to evaluate the joint exceedance probabilities for air pollutants in Milan, Italy. Kim et al. [35] characterized the dependence between several cities in China based on a measure of directional dependence estimated from the PM2.5 pollutant variable. Masseran and Hussain [36] modelled and visualized the dependency fluctuation among the air pollution variables in Malaysia using a dynamic Copula approach. He at al. [37] employed a copula model to investigate the time-varying correlations between meteorological factors and atmospheric pollutants in the cities of Beijing and Guangzhou, China. However, most of the available literature focuses on the copula analysis of real values or the magnitude of air pollution events. Thus, this study attempts look at a different perspective by attempting to investigate another important aspect of air pollution events, which is characterized by their duration and severity size. In summary, all of the previous studies found that the copula approach could provide good modeling flexibility for the multivariate distribution of random variables. However, when the dimensionality of the dataset increases, a single copula model parameter is unreliable to describe a simultaneous relationship among the variables [28,29]. To overcome this problem, a vine-copula approach needs to be adopted [38,39]. Fortunately, for the case of bivariate data analysis, the problem of dimensional complexity does not occur [40,41].
A bivariate copula of air pollution severity and duration can effectively reveal the significant relationship between these characteristics. In addition, the copula model can be used to measure the conditional probability and return period of air pollution episodes, which are essential for predicting air pollution events and mitigating their impacts. This information can be beneficial to environmental management authorities in planning for and measuring the risks associated with unhealthy air pollution events. The rest of this paper is organized as follows: Section 2 describes the study area and data. Section 3 provides a description of copula modeling on air pollution characteristics. A variety of copula models are presented in Section 4. In Section 5, an application of flexible parameter estimation and model selection are proposed based on pseudo maximum likelihood approach. In Section 6, the performance of each fitted copula model on the characteristics of unhealthy air pollution data are evaluated. Apart from that, Section 6 also presents several valuable statistical measures for assessing air pollution risk, including (i) the Kendall’s τ correlation of the copula to measure the dependency between the duration and severity of an air pollution event, (ii) the conditional probability of a certain air pollution severity given the air pollution duration, (iii) the joint OR/AND return period, and (iv) the conditional D|S and S|D return periods. Finally, in Section 7, the conclusions about the overall findings are provided.

2. Study Area and Data

This study investigates the API data in the area of Klang, Peninsular Malaysia, which is located at latitude 101°26′44.023 E and longitude 3°2′41.701 N. Klang is one of the largest cities in Malaysia, with a land area of approximately 573 km2. This city has a dense population and is very actively involved in a range of important economic and industrial activities, particularly with the import and export trade. Klang has also been recognized as the 13th busiest trans-shipment port and the 16th busiest container port in the world [42]. Although many advantages are associated with its rapid development, economic activity, and dense population, Klang is prone to poor air quality, which makes it an ideal case for the investigation and continuous evaluation of the Malaysia’s air quality [43]. Figure 1 shows maps of Klang and Peninsular Malaysia [44].
In Malaysia, the responsibility for collecting, supervising, and reporting API data is held by the Department of Environment (DOE), Malaysia. As a case study, hourly API data in Klang for the period of 1 January 1997 to 31 August 2020 was used in the analysis. A total of five major pollutant variables are recorded hourly, including carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), and particulate matter less than 10 microns in size (PM10). The observed data for O3, CO, NO2, and SO2, have been measured in terms of the parts per million (ppm) unit mass of a contaminant, while the observed PM10 data have been measured in terms of micrograms per cubic meter (μg/m3). Thus, to measure the API indices, these pollutant variables need to be standardized to derive individual indices. Based on each individual indices, their values can then be integrated based on the highest sub-indices to determine the API indices at particular times [45]. A detailed calculation on the standardization and determination of the sub-indices values for each pollutant variable can be referred to in Masseran and Safari [46]. Figure 2 illustrate the schematic process for determining the API value based on five pollutant variables [47].
Based on the API index values, the DOE classifies API values higher than 100 as unhealthy air pollution events [48], and consecutive API values greater than 100 in a given period indicate the duration of an air pollution event. Assuming that the random variable Di represents the duration of any pollution event, then i = 1, 2, 3, …, n. Mathematically, the duration of an air pollution event can be determined as follows:
D i = j = 1 N I i A P I j , f o r i = 1 , 2 , 3 , , n ,
where N is the total number of observations, and I i A P I j is an indicator function, which is determined as follows:
I i A P I j = 1 , i f A P I j > 100 , 0 , i f A P I j 100 .
For any pollution event of a certain duration, the severity of that event can be determined based on the cumulative API values obtained during that period using the following equation:
S i = j = 1 D i A P I j ,
where Si is a random variable representing the severity of a pollution event, i = 1, 2, 3, …, n. Figure 3 shows a graphical representation of the process used to determine the severity and duration of an air pollution event.

3. Copula Description of Air Pollution Characteristics

Sklar’s theorem [49] provides an important foundation for the theory and application of the copula model. Copulas play a significant role as mapping functions that enable the combination of uniformly distributed marginal models. Specifically, copulas represent a joint distribution and describe the dependence structure of any arbitrarily distributed dependent variables [50]. Based on Sklar’s theorem, a copula function can be constructed using the following equation:
H x 1 , x 2 , , x 2 = C F 1 x 1 , F 2 x 2 , , F n x n = C u 1 , u 2 , , u n
where H x 1 , x 2 , , x 2 denotes an n-dimensional distribution function, F i x i = u i is a specified univariate marginal distribution function for random variable Xi with U ~ u n i f o r m 0 , 1 , and C denotes the copula function [29]. In the context of the bivariate model used in air pollution studies, assuming that the random variables X1 = D and X2 = S represent the duration and severity of an air pollution event, respectively, the copula model described in Equation (4) can be simplified as follows:
H d , s = C F D d , F S s = C u 1 , u 2 .
Equation (5) defines a two-dimensional copula as a mapping function of 0 , 1 2 0 , 1 , which implies that a bivariate distribution function can be defined in 0 , 1 2 with the standard univariate margins u 1 and u 2 [31]. Thus, the bivariate density function of a copula can be obtained as follows:
c u 1 , u 2 = 2 C u 1 , u 2 u 1 u 2 .
Based on the copula model, the dependency between the pollutant duration and severity can be determined using Kendall’s τ correlation. This correlation provides a scale-invariant measure for determining this dependency based on the concept of concordance [51]. As mentioned by Klein et al. [52], Kendall’s τ has good resistance to outliers and provides a wider class of dependencies. Generally, the Kendall’s τ correlation can be related to the copula parameter, θ, using the following equation:
τ = 4 0 , 1 C u , v ; θ c u 1 , u 2 ; θ d u 1 d u 2 1 .
Another important piece of information that can be determined is the conditional probability of the air pollution severity given a certain air pollution duration. This information can benefit authorities who are tasked with evaluating the patterns and trends of air pollution behavior, particularly during critical air pollution events. Mathematically, this conditional probability can be computed using the following formula [40]:
P S s | D d = F S s C F D d , F S s 1 F D d = u 2 C u 1 , u 2 1 u 1 .
The copula model can also be used to estimate the return periods of air pollution characteristics. An understanding of the information provided by the return period can serve as a basis for planning and developing monitoring systems to reduce the risk associated with extreme air pollution events. The return period provides insight into the average recurrence interval between unhealthy air pollution events. This information could be beneficial for evaluating the patterns and trends of air pollution behaviors, particularly during critical air pollution events. According to Shiau [40], the copula model is useful for providing information about the joint and conditional return periods of a bivariate event. The joint return period for D d or S s can be determined as follows:
T D S = E L P D d S s = E L 1 F D S d , s = E L 1 C F D d , F S s ,
where E L is the expected inter-arrival time of an air pollution event. Likewise, the joint return period for D d and S s can be determined as follows:
T D S = E L P D d S s = E L 1 F D d F S s + F D S d , s = E L 1 F D d F S s + C F D d , F S s .
On the other hand, the conditional return period for D, given S s , can be determined as follows:
T D | S s = E L 1 F S s × 1 1 F D d F S s + F D S d , s = 1 1 F S s 1 F D d F S s + C F D d , F S s ,
Additionally, the conditional return period for S given D d can be determined as follows:
T S | D d = E L 1 F D d × 1 1 F D d F S s + F D S d , s = 1 1 F D d 1 F D d F S s + C F D d , F S s .

4. Copula Models

Various copula models are available in the literature for modeling the relationship between the duration–severity variables. However, to obtain an accurate result, comparing several copula models is important to find the one that provides the best representation of the dataset.

4.1. Clayton Copula

The Clayton copula model is useful for capturing the positive dependence of bivariate variables in which the strength of the dependency is dictated by the parameter 0 θ < . With a particular rotation, this copula is appropriate for modeling either a positive or negative dependence structure that exists in a dataset [53]. The bivariate joint cumulative distribution function (CDF) and probability density function (PDF) of a Clayton copula are given as:
C u 1 , u 2 = u 1 θ + u 2 θ 1 1 θ ,
c u 1 , u 2 = θ + 1 u 1 θ + u 2 θ 1 1 + 2 θ θ u 1 u 2 θ 1 .
The Kendall’s τ correlation determined by the Clayton copula is given as:
τ = θ θ + 2

4.2. Ali–Mikhail–Haq (AMH) Copula

The AMH copula is suitable for datasets in which there is a weaker dependence among the variables [53]. However, this copula is always used for comparison with other copulas in terms of model fitting. The bivariate joint CDF and PDF of the AMH copula are given as:
C u 1 , u 2 = u 1 u 2 1 θ 1 u 1 1 u 2 ,
c u 1 , u 2 = 1 θ 1 u 1 1 u 2 + 2 θ u 1 u 2 1 θ 1 u 1 1 u 2 3 ,
With the parameter space [−1,1), the Kendall’s τ correlation determined by the AMH copula is given as:
τ = 3 θ 2 3 θ 2 1 θ 2 3 θ 2 ln 1 θ .

4.3. Frank Copula

The Frank copula provides a versatile dependency measure because it can accommodate the entire range of dependencies, τ θ 1 , 1 [41]. Thus, with the correct rotation, the Frank copula is an appropriate model for describing either a positive or negative dependence structure in the dataset [54]. The bivariate joint CDF and PDF of the Frank copula are given as:
C u 1 , u 2 = 1 θ ln 1 + e θ u 1 1 e θ u 2 1 e θ 1 ,
c u 1 , u 2 = θ e θ u 1 + u 2 e θ 1 e θ u 1 + u 2 e θ u 1 e θ u 2 + e θ 2 ,
The Kendall’s τ correlation determined by the Frank copula is given as:
τ = 1 + 4 D 1 θ 1 θ ,
where D 1 θ = 0 θ t k e t 1 d t θ is a Debye function [50].

4.4. Plackett Copula

The Plackett copula is also appropriate for modeling either the positive or negative dependence structures in a dataset [53]. The bivariate joint CDF and PDF of the Plackett copula are given as:
C u 1 , u 2 = 1 + θ 1 u 1 + u 2 1 + θ 1 u 1 + u 2 2 4 θ θ 1 u 1 u 2 1 2 2 θ 1 ,
c u 1 , u 2 = 1 + θ 1 u 1 + u 2 2 4 θ θ 1 u 1 u 2 3 2 θ 1 + θ 1 u 1 + u 2 2 u 1 u 2
with the parameter space 0 , . However, the Plackett copula has no analytical form for Kendall’s τ correlation. Thus, the measure of Kendall’s τ correlation must be obtained by the numerical integration of Equation (7) [50].

4.5. Gumbel–Hougaard (GH) Copula

The GH copula is suitable to be used to model the positive dependence structure among variables [41,53]. The GH copula can be extended to model the negative dependence by using the concept of rotations [38]. The bivariate joint CDF and PDF of the GH copula are given as:
C u 1 , u 2 = e ln u 1 θ + ln u 1 θ 1 θ ,
c u 1 , u 2 = C u 1 , u 2 ln u 1 + ln u 2 θ 1 ϑ 2 θ 2 θ 1 ϑ 1 θ + 1 u 1 u 2 ,
with the parameter space 1 , , and ϑ = ln u 1 θ + ln u 2 θ . The Kendall’s τ correlation determined by the GH copula is given as:
τ = 1 1 θ .

4.6. Joe Copula

The Joe copula is also suitable for describing the positive dependence among variables [41,53]. As reported by McNeil et al. [55], the Joe copula is a good choice for cases in which the dataset exhibits a higher positive correlation among the random variables involved. Apart from that, the Joe copula can be extended to model negative dependence by using the concept of rotations [38]. The bivariate joint CDF and PDF of the Joe copula are given as:
C u 1 , u 2 = 1 1 u 1 θ + 1 u 2 θ 1 u 1 θ 1 u 2 θ 1 θ ,
c u 1 , u 2 = 1 1 u 1 θ + 1 u 2 θ 1 u 1 θ 1 u 2 θ 1 θ u 1 u 2 ,
with the parameter space 1 , . The Kendall’s τ correlation for the Joe copula is obtained as follows:
τ = 1 + 2 + 2 γ + 2 ln 2 + Ψ 1 θ + Ψ 1 θ 2 + θ θ + θ θ 2 ,
where γ = lim n i = 1 n 1 i ln n 0.57721 is a Euler constant, and Ψ x = d d x ln Γ x is a digamma function [38].

5. Parameter Estimation and Model Selection

Several methods are available in the literature for estimating copula parameters. In this study, the pseudo maximum likelihood estimation (pseudo-MLE) method is used to estimate the parameter of each copula model. This method has the advantage of not requiring accurate information of the parametric form in the marginal model for each variable. Thus, this method provides a more flexible approach than other estimation methods [31].

5.1. Pseudo Maximum Likelihood Estimation (Pseudo-MLE)

In general, the pseudo-MLE is a semi-parametric method that uses a nonparametric empirical distribution (NED) to represent the marginal distribution of the variables involved in the copula model [56]. The NED for both the duration and severity variables is determined as follows:
F ^ i u = j = 1 n 1 U i j u n + 1 , i = 1 , 2 .
Next, using the NED as a basis for the marginal models, the copula parameter is estimated using the maximum likelihood approach by maximizing the pseudo-log-likelihood function as follows:
log L θ = j = 1 n ln c F ^ 1 u 1 j , F ^ 2 u 2 j ; θ .

5.2. Cross-Validation Copula Information Criterion (cvCIC)

The cvCIC criterion is used to select the copula model that best fits a dataset. Hofert et al. [29] briefly reported that the cvCIC measure for a fitted copula model can be determined as follows:
x v n = i = 1 n log c θ n , i F n , i U i n ,
where θ n , i represents the maximum pseudo-likelihood estimate, and F n , i U i = F n , 1 , i u 1 , F n , 2 , i u 2 can be computed using the following equation:
F n , j , i u k = 1 n 1 U k j u n , i f u min k 1 , 2 , , n \ i U k j , 1 n , o t h e r w i s e .
The copula model that maximizes the cvCIC criterion provides the best-fitted model to the dataset [57]. A detailed discussion of the cvCIC criterion can be found in [58,59].

6. Results and Discussion

Before conducting a detailed analysis of the structure of the dependency of the variables for duration and severity, providing a statistical summary of the dataset is useful. Figure 4 show that the time series plot corresponds to an unhealthy air pollution events threshold in Klang for the period of January 1, 1997 to August 31, 2020. Based on the observed API data, the proportion of unhealthy air pollution events is found to be 2.44%. As described by Masseran and Safari [48], although this proportion is quite small, however, these events provide the most valuable information, particularly on pollution risk management and mitigation.
Table 1 show the descriptive statistics for the severity and duration data. Based on Table 1, the center measures for both variables provided by their means and medians show a large discrepancy, and the measure of spread shows a very large range between the maximum and minimum values. In fact, their standard deviations are also very large, which indicates significant variation in both variables. These findings are more pronounced for severity, but both have significant skewness with a long right-tail distribution. These results are based on measures of skewness and kurtosis. A long right-tail distribution also indicates the presence of extreme and rare events for both variables.
The empirical distribution plots shown in Figure 5 illustrate these findings. The scatter plot in Figure 5 also indicates that the duration and severity have a strong positive dependence, with a Pearson correlation coefficient of 0.95. Note that the asterisks *** indicate the significant correlation at 1% significance level. However, the Pearson correlation measure is not a reliable method for dealing with data that exhibit skewness, long-tail, and non-identically marginal distributions [28]. Kendall’s τ correlation, which is derived from a fitted copula model that takes into account the dependence structure among the variables, could provide a more reliable measurement [29]. In addition, some large data points are observed to deviate from the diagonal line, which could be because of the skewness of the data set. To check further, a K-plot is used to reveal the dependent relationships between the points close to the curve, which imply high dependence. If the points are close to the diagonal, this indicates relatively weaker dependence [60]. Figure 6 shows that all of the data points for air pollution severity–duration are very close to the curve, thus confirming the positive relationship between the two.
Based on the empirical statistics and plots, first, the marginal model is determined for the duration–severity data, which serves as a building block for constructing a copula model. As indicated by Table 1 and Figure 5, the distributions of both the duration and severity data are highly skewed to the right. In the literature, the exponential, gamma, lognormal, and Weibull distributions of the models are used to represent marginal models of the duration and severity data [40,61]. Figure 7 shows the fitted marginal distribution used to represent the marginal model. The lognormal and gamma distributions seem to provide an approximate representation of the duration and severity datasets. However, to ensure the accuracy of these fitted models, their goodness of fit needs to be evaluated.
Table 2 presents the results of the goodness-of-fit evaluation on each of the fitted statistical distributions obtained using the Kolmogorov–Smirnov (KS) test. We found that the p-values corresponding to the K–S statistics for all of the fitted distributions of both variables are not significant at α = 0.05 , which means that the null hypotheses (i.e., the data follow a specified distribution) is rejected. Thus, although the plots in Figure 7 show a good approximation, these fitted distributions cannot reliably represent a marginal model. An accurate marginal model is critical for ensuring precision in copula modeling [62]. To overcome this weakness, an empirical distribution determined using Equation (30) will be used to represent a marginal model.
Based on the empirical distribution, a pseudo-MLE method is used to estimate the parameters of each fitted copula model for the dataset. Table 3 shows the parameter estimates, cvCIC, and Kendall’s τ correlation obtained from each fitted copula model. Based on the estimated parameters for each fitted copula model, the Joe copula model was found to obtain the maximum cvCIC criterion value. Thus, we can conclude that the Joe copula is better than the other copula models in describing the severity–duration relationship. This result is supported by the graphical representation given in Figure 8 of each fitted copula model, in which it is clearly shown that the relationship between the duration and severity of air pollution events is of highly positive dependence with a stronger structural dependency in the right-upper tails of their distributions. Except for the AMH copula model, all of the fitted copula models provide a good representation of the dataset. However, as determined by the cvCIC criterion shown in Table 3, the model that best represents the data is the Joe copula, which has the maximum cvCIC criterion value. The Kendall’s τ correlations determined for each copula model do not differ very much, with the exception of that of the AMH copula. The Kendall’s τ correlation (0.9195) determined by the Joe copula model is slightly lower that that obtained for the Pearson correlation (0.9500) and is more reliable because it considers the properties of data skewness and long-tail behavior in its computation. Next, based on the fitted copula model, the dependence structure of the severity–duration relationship can be analyzed. This information could benefit the governing authorities who are responsible for managing the risk of extreme air pollution, particularly during critical air pollution events. In addition, the copula approach can work as a simulated model for air pollution risk assessment. For any given air pollution duration size, the analyst can employ the copula model to determine the estimated severity size of air pollution event. Figure 9 shows the PDF and CDF of the fitted Joe copula as density and contour plots.
The PDF and CDF density and contour plots depict the probable behaviors of the duration–severity relationship and confirm the dependence structure shown in the scatter plot of Figure 8. However, the information provided by the PDF and CDF are described with respect to their joint probability distribution of the duration–severity relationship. Therefore, PDF and CDF information are referred to as the likelihood or probability of a certain duration period and severity level occurring simultaneously. The PDF and CDF contour plots efficiently illustrate the positive dependence and stronger structural dependency in the right-upper tails of the duration–severity data. From the fitted Joe copula, the probabilities of air pollution events under certain circumstances with a specific severity or duration can determined. Figure 10 shows the curves of various conditional probabilities of a certain severity level given the duration of an air pollution event. For example, the authorities might want to determine the risk probability for an air pollution severity of less than 8000 and 15,000 for a given duration exceeding 64 h. By normalizing these values using the probability integral transform to obtain a uniform (0,1), the computed probabilities obtained are Pr X 2 0.90 | X 1 0.89 = 0.908 and Pr X 2 0.9608 | X 1 0.89 = 1 , respectively.
Based on the fitted Joe copula model, the return periods for recurrent air pollution events of different durations and severities can be computed, as shown in Table 4. The rows correspond to the first and second columns in Table 4, indicating given specific levels of duration and severity sizes that represent various scenarios related to air pollution events. On the other hand, all of the values represented in the third, fourth, fifth, and sixth columns indicate the estimated joint and conditional return periods for the air pollution duration–severity pairs in Klang. If the authorities want to evaluate the expected recurrence of air pollution events of a particular duration and severity, four types of information can be determined, as described in Equations (9)–(12). For example, for an air pollution event with a duration of 50 h, with a possibility that this event will achieve a severity level of 20,000, its joint OR return period ( D 50 h     S 20,000 ) is approximately 34.4 days, and its AND return period ( D 50 h     S 20,000 ) is approximately 135.4 days. The conditional D|S return period for D = 50 h given S 20,000 is approximately 4153.6 days, and the conditional S|D return period for S = 20,000 given D 50   h is approximately 1056 days. The above information could be very beneficial to the authorities for air pollution even planning and for mitigating the risks associated with unhealthy air pollution events.

7. Conclusions

This study proposes the concept of duration and severity measures for evaluating unhealthy air pollution events. Since these bivariate criteria indicate the properties of structural dependency, long-tail, and non- identically marginal distributions, a copula model is proposed to deal with these issues. In particular, this study presents the application of a copula model in evaluating the behaviors of unhealthy air pollution events with respect to their duration and severity characteristics. The duration of an air pollution event was defined as a period in which the air pollution index values qualify as unhealthy for a period of consecutive days. On the other hand, the severity was defined as a magnitude of an air pollution event based on the cumulative effect of an unhealthy API during air pollution events. These two characteristics are related because the severity of an air pollution event always depends on its duration. Thus, considering their dependency, copula models were used to jointly model a combination of the marginal distributions of both variables.
A case study was conducted using data from Klang, Malaysia. To ensure the accuracy of the marginal model, an empirical distribution approach was proposed in the copula model. A total of six types of copula models, namely the Clayton, Ali–Mikhail–Haq, Frank, Plackett, Gumbel–Hougaard, and Joe copulas, were considered for evaluating the relationship between the duration and severity of unhealthy air pollution events in Klang. Based on its efficient fit, the Joe copula was found to best fit the data. Based on the Joe copula model, several valuable statistical measures for assessing air pollution risk were proposed, including (i) the Kendall’s τ correlation of the copula to measure the dependency between the duration and severity of an air pollution event, (ii) the conditional probability of a certain air pollution severity given the air pollution duration, (iii) the joint OR/AND return period, and (iv) the conditional D|S and S|D return periods. All of the these measures could be very beneficial for planning and mitigating the risks associated with unhealthy air pollution events. On the other hand, this approach is also applicable in other areas such as for natural hazards, extreme wind speeds, climate change, extreme temperature analysis, etc. Particularly, for the purpose of investigating the severity impact of some particular risky events corresponding to their duration size. Further research is recommended to evaluate how the trivariate relationship corresponds to the duration, severity, and intensity of air pollution event. A vine-copula approach could be used to analyze various combination structures represented by a duration–severity–intensity relationship.

Funding

This research and the APC was funded by UNIVERSITI KEBANGSAAN MALAYSIA, grant number GP-2020-K020446.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The author is indebted the Malaysian Department of Environment for providing air pollution data.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Murena, F. Measuring air quality over large urban areas: Development and application of an air pollution index at the urban area of Naples. Atmos. Environ. 2004, 38, 6195–6202. [Google Scholar] [CrossRef]
  2. Masseran, N. Modeling fluctuation of PM10 data with existence of volatility effect. Environ. Eng. Sci. 2017, 34, 816–827. [Google Scholar] [CrossRef]
  3. Masseran, N. Power-law behaviors of the duration size of unhealthy air pollution events. Stoch. Environ. Res. Risk Assess. 2021, 35, 1499–1508. [Google Scholar] [CrossRef]
  4. Genest, C.; Quessy, J.-F.; Rémillard, B. Goodness-of-fit procedures for copula models based on the probability integral trans-formation. Scand. J. Stat. 2006, 32, 337–366. [Google Scholar] [CrossRef]
  5. Huard, D.; Évin, G.; Favre, A.-C. Bayesian copulas selection. Comput. Stat. Data Anal. 2006, 51, 809–822. [Google Scholar] [CrossRef]
  6. Marcotte, D.; Henry, E. Automatic joint set clustering using a mixture of bivariate normal distributions. Int. J. Rock Mech. Min. Sci. 2002, 39, 323–334. [Google Scholar] [CrossRef]
  7. Lai, C.D.; Balakrishnan, N. Continuous Bivariate Distributions; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar] [CrossRef]
  8. Tran, K.P.; Castagliola, P.; Celano, G. Monitoring the ratio of population means of a bivariate normal distribution using CUSUM type control charts. Stat. Pap. 2016, 59, 387–413. [Google Scholar] [CrossRef]
  9. Choi, J.-E.; Shin, D.W. Three regime bivariate normal distribution: A new estimation method for co-value-at-risk, CoVaR. Eur. J. Financ. 2019, 25, 1817–1833. [Google Scholar] [CrossRef]
  10. Yue, S. The bivariate lognormal distribution to model a multivariate flood episode. Hydrol. Process. 2000, 14, 2575–2588. [Google Scholar] [CrossRef]
  11. Pundir, S.; Amala, R. Detecting diagnostic accuracy of two biomarkers through a bivariate log-normal ROC curve. J. Appl. Stat. 2015, 42, 2671–2685. [Google Scholar] [CrossRef]
  12. Gumbel, E.J. Bivariate logistic distributions. J. Am. Stat. Assoc. 1961, 56, 335–349. [Google Scholar] [CrossRef]
  13. Yue, S. The Gumbel logistic model for representing a multivariate storm event. Adv. Water Resour. 2000, 24, 179–185. [Google Scholar] [CrossRef]
  14. Yue, S.; Ouarda, T.; Bobée, B.; Legendre, P.; Bruneau, P. The gumbel mixed model for flood frequency analysis. J. Hydrol. 1999, 226, 88–100. [Google Scholar] [CrossRef]
  15. Yue, S. A bivariate extreme value distribution applied to flood frequency analysis. Hydrol. Res. 2001, 32, 49–64. [Google Scholar] [CrossRef]
  16. Marshall, A.W.; Ingram, O. A multivariate exponential distribution. J. Am. Stat. Assoc. 1967, 62, 30–44. [Google Scholar] [CrossRef]
  17. Bacchi, B.; Becciu, G.; Kottegoda, N. Bivariate exponential model applied to intensities and durations of extreme rainfall. J. Hydrol. 1994, 155, 225–236. [Google Scholar] [CrossRef]
  18. Won, J.; Choi, J.; Lee, O.; Park, M.J.; Kim, S. Two ways to quantify Korean drought frequency: Partial duration series and bivariate exponential distribution, and application to climate change. Atmosphere 2020, 11, 476. [Google Scholar] [CrossRef]
  19. Royen, T. Expansions for the multivariate chi-square distribution. J. Multivar. Anal. 1991, 38, 213–232. [Google Scholar] [CrossRef] [Green Version]
  20. Izawa, T. Two or multi-dimensional gamma-type distribution and its application to rainfall data. Pap. Meteorol. Geophys. 1965, 15, 167–200. [Google Scholar] [CrossRef] [Green Version]
  21. Moran, P.A.P. Statistical inference with bivariate gamma distribution. Biometrika 1969, 54, 385–394. [Google Scholar] [CrossRef]
  22. Schmeiser, B.W.; Lal, R. Bivariate gamma random vectors. Oper. Res. 1982, 30, 355–374. [Google Scholar] [CrossRef]
  23. Loaciga, H.A.; Leipnik, R.B. Correlated gamma variables in the analysis of microbial densities in water. Adv. Water Resour. 2005, 28, 329–335. [Google Scholar] [CrossRef]
  24. Yue, S.; Ouarda, T.; Bobée, B. A review of bivariate gamma distributions for hydrological application. J. Hydrol. 2001, 246, 1–18. [Google Scholar] [CrossRef]
  25. Nadarajah, S. A bivariate gamma model for drought. Water Resour. Res. 2007, 43, W08501. [Google Scholar] [CrossRef] [Green Version]
  26. Antão, E.M.; Soares, C.G. Approximation of the joint probability density of wave steepness and height with a bivariate gam-ma distribution. Ocean. Eng. 2016, 126, 402–410. [Google Scholar] [CrossRef]
  27. Lambert, P.; Vandenhende, F. A copula-based model for multivariate non-normal longitudinal data: Analysis of a dose titra-tion safety study on a new antidepressant. Stat. Med. 2002, 21, 3197–3217. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Joe, H. Dependence Modeling with Copulas; Chapman and Hall/CRC: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
  29. Hofert, M.; Kojadinovic, I.; Mächler, M.; Yan, J. Elements of Copula Modeling with R; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar] [CrossRef]
  30. Kojadinovic, I.; Yan, J. Modeling multivariate distributions with continuous margins using the copula R package. J. Stat. Softw. 2010, 34, 17192. [Google Scholar] [CrossRef] [Green Version]
  31. Zhang, L.; Singh, V.P. Copulas and Their Applications in Water Resources Engineering; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar] [CrossRef] [Green Version]
  32. Sak, H.; Yang, G.; Li, B.; Li, W. A copula-based model for air pollution portfolio risk and its efficient simulation. Stoch. Environ. Res. Risk Assess. 2017, 31, 2607–2616. [Google Scholar] [CrossRef]
  33. Chan, R.K.S.; So, M.K.P. Multivariate modelling of spatial extremes based on copulas. J. Stat. Comput. Simul. 2018, 88, 2404–2424. [Google Scholar] [CrossRef]
  34. Falk, M.; Padoan, S.A.; Wisheckel, F. Generalized pareto copulas: A key to multivariate extremes. J. Multivar. Anal. 2019, 174, 104538. [Google Scholar] [CrossRef] [Green Version]
  35. Kim, J.-M.; Lee, N.; Xiao, X. Directional dependence between major cities in China based on copula regression on air pollution measurements. PLoS ONE 2019, 14, e0213148. [Google Scholar] [CrossRef] [PubMed]
  36. Masseran, N.; Hussain, S. Copula modelling on the dynamic dependence structure of multiple air pollutant variables. Mathematics 2020, 8, 1910. [Google Scholar] [CrossRef]
  37. He, S.; Li, Z.; Wang, W.; Yu, M.; Liu, L.; Alam, M.N.; Gao, Q.; Wang, T. Dynamic relationship between meteorological condi-tions and air pollutants based on a mixed Copula model. Int. J. Climatol. 2021, 41, 2611–2624. [Google Scholar] [CrossRef]
  38. Czado, C. Analyzing Dependent Data with Vine Copulas; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar] [CrossRef]
  39. Aas, K. Pair-copula constructions for financial applications: A review. Econometrics 2016, 4, 43. [Google Scholar] [CrossRef] [Green Version]
  40. Shiau, J.T. Fitting drought duration and severity with two-dimensional copulas. Water Resour. Manag. 2006, 20, 795–815. [Google Scholar] [CrossRef]
  41. Latif, S.; Mustafa, F. Bivariate flood distribution analysis under parametric copula framework: A case study for Kelantan River basin in Malaysia. Acta. Geophys. 2020, 68, 821–859. [Google Scholar] [CrossRef]
  42. Al-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H.; Razali, A.M. Modeling unhealthy air pollution index using a peaks-over-threshold method. Environ. Eng. Sci. 2018, 35, 101–110. [Google Scholar] [CrossRef]
  43. Masseran, N.; Safari, M.A.M. Intensity–duration–frequency approach for risk assessment of air pollution events. J. Environ. Manage. 2020, 264, 110429. [Google Scholar] [CrossRef] [PubMed]
  44. Google Maps. 2019. Available online: https://www.google.com/maps/place/Klang,+Selangor/@3.2467558,101.2650693,9.1z/data=!4m5!3m4!1s0x31cc534c4ffe81cf:0xeb61f5772fd54514!8m2!3d3.044917!4d101.4455621 (accessed on 25 March 2019).
  45. AL-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H. Hierarchical-generalized Pareto model for estimation of unhealthy air pol-lution index. Environ. Model. Assess. 2020, 25, 555–564. [Google Scholar] [CrossRef]
  46. Masseran, N.; Safari, M. Mixed POT-BM approach for modeling unhealthy air pollution events. Int. J. Environ. Res. Public Heal. 2021, 18, 6754. [Google Scholar] [CrossRef] [PubMed]
  47. A guide to air pollutant index in Malaysia (API). Available online: https://aqicn.org/images/aqi-scales/malaysia-api-guide.pdf (accessed on 4 June 2020).
  48. Masseran, N.; Safari, M.A.M. Risk assessment of extreme air pollution based on partial duration series: IDF approach. Stoch. Environ. Res. Risk Assess. 2020, 34, 545–559. [Google Scholar] [CrossRef]
  49. Sklar, A. Fonctions de repartition à n dimensionls et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231. [Google Scholar]
  50. Chowdhary, H.; Escobar, L.A.; Singh, V.P. Identification of suitable copulas for bivariate frequency analysis of flood peak and flood volume data. Hydrol. Res. 2011, 42, 193–216. [Google Scholar] [CrossRef]
  51. Nelsen, R.B. An Introduction to Copulas; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  52. Klein, B.; Schumann, A.H.; Pahlow, M. Copulas—New risk assessment methodology for Dam safety. Flood Risk Assess. Manag. 2010, 149–185. [Google Scholar] [CrossRef]
  53. Yusof, F.; Hui-Mean, F.; Suhaila, J.; Yusof, Z.; Jamaludin, S.S.S. Characterisation of drought properties with bivariate copula analysis. Water Resour. Manag. 2013, 27, 4183–4207. [Google Scholar] [CrossRef]
  54. Tosunoglu, F.; Can, I. Application of copulas for regional bivariate frequency analysis of meteorological droughts in Turkey. Nat. Hazards 2016, 82, 1457–1477. [Google Scholar] [CrossRef]
  55. McNeil, A.J.; Frey, R.; Embrechts, P. Quantitative risk Management: Concepts, Techniques and Tools—Revised Edition; Princeton University Press: Princeton, NJ, USA, 2015. [Google Scholar]
  56. Genest, C.; Ghoudi, K.; Rivest, L. A semi-parametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 1995, 82, 543–552. [Google Scholar] [CrossRef]
  57. Hofert, M.; Kojadinovic, I.; Maechler, M.; Yan, J.; Nešlehová, J.G. Copula: Multivariate Dependence with Copulas. R Package Version 0.999-19.1. Available online: https://cran.r-project.org/web/packages/copula/index.html (accessed on 13 January 2021).
  58. Grønneberg, S.; Hjort, N.L. The copula information criteria. Scand. J. Stat. 2014, 41, 436–459. [Google Scholar] [CrossRef] [Green Version]
  59. Jordanger, L.A.; Tjøstheim, D. Model selection of copulas: AIC versus a cross validation copula information criterion. Stat. Probab. Lett. 2014, 92, 249–255. [Google Scholar] [CrossRef]
  60. Genest, C.; Boies, J.-C. Detecting dependence with Kendall plots. Am. Stat. 2003, 57, 275–284. [Google Scholar] [CrossRef]
  61. Xu, Y.; Huang, G.; Fan, Y. Multivariate flood risk analysis for Wei River. Stoch. Environ. Res. Risk Assess. 2015, 31, 225–242. [Google Scholar] [CrossRef]
  62. Kim, G.; Silvapulle, M.J.; Silvapulle, P. Comparison of semiparametric and parametric methods for estimating copulas. Comput. Stat. Data Anal. 2006, 51, 2836–2850. [Google Scholar] [CrossRef]
Figure 1. Maps of (a) Peninsular Malaysia (Klang location identified by red dot) and (b) the city of Klang.
Figure 1. Maps of (a) Peninsular Malaysia (Klang location identified by red dot) and (b) the city of Klang.
Ijerph 18 08751 g001
Figure 2. The process of determining the API value.
Figure 2. The process of determining the API value.
Ijerph 18 08751 g002
Figure 3. Time series plot corresponding to unhealthy air pollution event threshold.
Figure 3. Time series plot corresponding to unhealthy air pollution event threshold.
Ijerph 18 08751 g003
Figure 4. Time series plot corresponds to unhealthy threshold of air pollution events.
Figure 4. Time series plot corresponds to unhealthy threshold of air pollution events.
Ijerph 18 08751 g004
Figure 5. Empirical plots of the relationship of air pollution severity and duration.
Figure 5. Empirical plots of the relationship of air pollution severity and duration.
Ijerph 18 08751 g005
Figure 6. K-plot of the relationship between the duration and severity of air pollution events.
Figure 6. K-plot of the relationship between the duration and severity of air pollution events.
Ijerph 18 08751 g006
Figure 7. Fitted statistical distributions of duration and severity data.
Figure 7. Fitted statistical distributions of duration and severity data.
Ijerph 18 08751 g007
Figure 8. Fitted copula models for the severity–duration dataset.
Figure 8. Fitted copula models for the severity–duration dataset.
Ijerph 18 08751 g008
Figure 9. PDF and CDF plots for the fitted Joe copula model corresponding to their contour representations.
Figure 9. PDF and CDF plots for the fitted Joe copula model corresponding to their contour representations.
Ijerph 18 08751 g009
Figure 10. Conditional probability of severity level given the duration.
Figure 10. Conditional probability of severity level given the duration.
Ijerph 18 08751 g010
Table 1. Descriptive statistics for the severity and duration data.
Table 1. Descriptive statistics for the severity and duration data.
VariableMeanMedianMin. ValueMax. ValueStd. DeviationSkewnessKurtosis
Duration (hours)21.393.001.00224.0064.812.789.23
Severity 2876.50367.00102.0036,677.0015,193.683.3813.12
Table 2. Goodness-of-fit evaluations of the fitted statistical distributions.
Table 2. Goodness-of-fit evaluations of the fitted statistical distributions.
Variable Fitted DistributionKS-Statisticp-Value
DurationExponential0.38430.0000
Gamma0.18340.0009
Lognormal0.18770.0006
Weibull0.22270.0002
SeverityExponential0.39730.0000
Gamma0.29690.0000
Lognormal0.18340.0009
Weibull0.21390.0005
Table 3. Parameter estimates and cvCIC and Kendall’ τ measures obtained by each fitted copula model.
Table 3. Parameter estimates and cvCIC and Kendall’ τ measures obtained by each fitted copula model.
Copula Model Parameter Estimate (θ)cVCICKendall’s τ
Clayton22.8549.1080.9195
Ali–Mikhail–Haq135.6890.3333
Frank48142.7080.9195
Placket846.5221.5300.9195
Gumbel12.42199.3610.9194
Joe25.38255.2580.9195
Table 4. Joint and conditional return periods for the air pollution duration–severity pairs in Klang using the Joe copula.
Table 4. Joint and conditional return periods for the air pollution duration–severity pairs in Klang using the Joe copula.
Duration SizeAPI SeverityJoint OR Return
Period, T D S (Days)
Joint AND Return Period, T D S (Days)Conditional D|S
Return Period, T D | S s (Days)
Conditional S|D
Return Period, T S | D d (Days)
50-h1004.4 34.434.6268.5
100010.834.484.3268.5
10,00034.456.4720.9439.9
20,00034.4 135.44153.61056.0
30,00034.4406.34328.53168.0
80-h 1004.456.456.7721.1
100010.856.4138.1721.1
10,00054.858.2743.2743.2
20,00056.4135.44153.61730.7
30,00056.4406.37382.45192.0
100-h 1004.4101.6102.02336.4
100010.8101.6248.62336.4
10,00056.4101.61297.92336.4
20,000101.5135.54153.93115.4
30,000101.6406.37382.49345.6
120-h 1004.4169.3170.06490.0
100010.8169.3414.26490.0
10,00056.4169.32163.16490.0
20,000135.4169.35193.56491.8
30,000169.3406.37382.415,576.0
150-h 1004.4225.7226.711,537.9
100010.8225.7552.311,537.9
10,00056.4225.72884.111,537.9
20,000135.4225.76922.711,537.9
30,000225.7406.37382.420,768.1
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Masseran, N. Modeling the Characteristics of Unhealthy Air Pollution Events: A Copula Approach. Int. J. Environ. Res. Public Health 2021, 18, 8751. https://doi.org/10.3390/ijerph18168751

AMA Style

Masseran N. Modeling the Characteristics of Unhealthy Air Pollution Events: A Copula Approach. International Journal of Environmental Research and Public Health. 2021; 18(16):8751. https://doi.org/10.3390/ijerph18168751

Chicago/Turabian Style

Masseran, Nurulkamal. 2021. "Modeling the Characteristics of Unhealthy Air Pollution Events: A Copula Approach" International Journal of Environmental Research and Public Health 18, no. 16: 8751. https://doi.org/10.3390/ijerph18168751

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop