Article

Mixed POT-BM Approach for Modeling Unhealthy Air Pollution Events

by Nurulkamal Masseran 1,* and Muhammad Aslam Mohd Safari 2
1 Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
2 Department of Mathematics, Faculty of Science, Universiti Putra Malaysia, UPM, Serdang 43400, Selangor, Malaysia
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2021, 18(13), 6754; https://doi.org/10.3390/ijerph18136754
Submission received: 17 April 2021 / Revised: 8 June 2021 / Accepted: 18 June 2021 / Published: 23 June 2021

Abstract

This article proposes a novel data selection technique, the mixed peak-over-threshold–block-maxima (POT-BM) approach, for modeling unhealthy air pollution events. The POT technique is first used to form groups of data points that exceed a particular threshold u; these groups are defined as POT blocks. A declustering technique is then applied to overcome the dependence that occurs among adjacent POT blocks. Finally, the BM concept is used to select the maximum data point in each POT block. Results show that the extreme data points selected by the mixed POT-BM approach satisfy the independence property of extreme events and yield fitted models with satisfactory precision. Overall, this study concludes that the mixed POT-BM approach provides a balanced tradeoff between bias and variance in the statistical modeling of extreme-value events. A case study was conducted on unhealthy air pollution events, defined by air pollution index values exceeding the threshold u = 100, in Klang, Malaysia.

1. Introduction

In practical statistical modeling, the phenomena under investigation often include rare events with very high or very low values relative to the bulk of the data. Such rare events affect the characteristics of the data distribution, producing high skewness and kurtosis, which correspond to long- and heavy-tailed behavior. In environmental phenomena, extreme events are observed in various fields, such as extreme precipitation and flooding [1], extreme wind speeds [2], droughts [3], extreme temperatures [4], natural hazards [5], and air pollution events [6,7].
The presence of rare and extreme events leads to a heavy-tail property that makes commonly used normal/Gaussian models unsuitable for describing the data distribution [8,9]. To address this issue, two statistical extreme-value models have been developed, namely the generalized extreme-value (GEV) and generalized Pareto distribution (GPD) models [10,11]. Both models have been found to be useful tools for providing insights into the behavior of extreme air pollution events [12,13,14,15,16,17,18,19,20,21]. For example, Al-Dhurafi et al. [12] employed the GPD model to represent the probability distribution of unhealthy air pollution index values for three highly polluted urban areas in Peninsular Malaysia. Martins et al. [13] employed the GEV and GPD models to investigate the behavior of air pollutants in two large urban regions (Metropolitan Area of São Paulo and Metropolitan Area of Rio de Janeiro) in South America. Masseran et al. [18] used the GPD model to investigate the risk of occurrence of unhealthy air pollution events for eight urban areas in Peninsular Malaysia. Reyes et al. [19] estimated the trend in high ozone levels for seven of the meteorological stations located in the Metropolitan Area of Mexico City using the quantile function determined from a fitted GEV model. Battista et al. [21] evaluated the behavior of the air pollution level in the city of Rome (Italy) using the GEV model. Overall, these studies found that the results obtained with extreme-value models provide valuable information as a basis for managing the risk of air pollution events.
Conceptually, the GEV model is derived from the block-maxima (BM) approach, which generally uses a data selection method based on annual maximum extreme events, and this implies that only one extreme event per year can be used for GEV modeling [22,23,24]. However, this process can lead to a loss of the information contained in other large-sample values [12,25]. In reality, hazardous phenomena that occur within a year may not only be those that exert a maximum effect but also those that are the second or third most serious, which may lead to extreme events in different time periods within the year [26]. The BM approach fails to represent actual phenomena involving the occurrence of multiple extreme events within a year [27]. Therefore, a single maximum point per year definitely reduces the quality of the data representing events of interest, which could significantly contribute to the outcome bias of the extreme value analysis [28]. Although the BM size can be adjusted to include additional extreme data points, no clear rules exist to determine the optimal block size for practical applications.
GPD modeling based on the peak-over-threshold (POT) approach defines extreme events as data points greater than a particular threshold level [29,30]. This approach enables flexible selection of extreme data points, allowing a wide range of extreme events to be included in the analysis [31,32,33]. A larger number of selected data points in the extreme-value modeling process provides better precision in the inference and parameter estimation of the statistical model, as well as in its quantile estimates [34,35]. However, extreme data points determined with the POT approach tend to be dependent on one another, which introduces bias into the statistical modeling process. Although the threshold can be adjusted to reduce this dependence, the process of determining an optimal threshold provides only a subjective solution. Such a subjectively selected threshold may not have an interpretable meaning for the problem under investigation [12,18]. For a problem related to an unhealthy air pollution event (air pollution index (API) > 100), a fixed threshold u = 100 can provide a meaningful analysis [36].
In summary, the BM and POT approaches have their own advantages and limitations. Therefore, this article proposes an alternative approach by combining the BM and POT approaches to provide improved tradeoff results in terms of the bias and variance of the fitted model, as seen in particular for a practical application in air pollution modeling.

2. Study Area and API Data

Klang, which is located in Peninsular Malaysia at a latitude of 3°2′41.701″ N and a longitude of 101°26′44.023″ E, is one of Malaysia’s largest cities, with a land area of approximately 573 km². The population density in Klang is among the highest of any city in Malaysia, at 1034 people per square kilometer [37]. Klang is also the center of important industrial and economic interests of the country. Moreover, Klang has been recognized as the 13th busiest trans-shipment port and 16th busiest container port in the world [38,39]. However, the rapid development of urban commercial and industrial areas in Klang in recent decades has elevated its risk of atmospheric pollution. Therefore, given the importance of the city’s industrial activities, monitoring and evaluating the behavior of extreme pollution events in Klang is crucial. Figure 1 presents a map of Peninsular Malaysia with the location of Klang [40].
The API in Malaysia is derived from five main pollutants: suspended particulate matter less than ten microns in size (PM10), nitrogen dioxide (NO2), carbon monoxide (CO), ozone (O3), and sulfur dioxide (SO2). Afroz et al. [41] reported that three major sources influence the concentration levels of the CO, SO2, NO2, O3, and PM10 pollutant variables in Malaysia. They categorized these sources as mobile sources, stationary sources, and open burning sources, which in turn can be decomposed into various factors, including industrial and construction activities, transportation exhaust emission, soil dust, open burning activity, haze events, and so on [42,43,44,45,46]. The observed data for CO, SO2, NO2, and O3 are quantified in parts per million (ppm). The observed PM10 data are quantified in micrograms per cubic meter (μg/m³). Thus, to measure the API, these pollutant variables need to be standardized to derive individual sub-indices. Based on the observed data, the sub-API indices can be calculated using the mathematical formulas provided by the Department of Environment Malaysia [47]. The sub-API value for the CO pollutant can be computed using the following equation:
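$$\mathrm{Idx}(\mathrm{CO}) = \begin{cases} \mathrm{CO} \times 11.11111, & \text{if } \mathrm{CO} < 9 \text{ ppm}, \\ 100 + \{[\mathrm{CO} - 9] \times 16.66667\}, & \text{if } 9 \le \mathrm{CO} < 15 \text{ ppm}, \\ 200 + \{[\mathrm{CO} - 15] \times 6.66667\}, & \text{if } 15 \le \mathrm{CO} < 30 \text{ ppm}, \\ 300 + \{[\mathrm{CO} - 30] \times 10\}, & \text{if } \mathrm{CO} \ge 30 \text{ ppm}. \end{cases} \tag{1}$$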
The sub-API value for the O3 pollutant can be computed using the following equation:
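$$\mathrm{Idx}(\mathrm{O_3}) = \begin{cases} \mathrm{O_3} \times 1000, & \text{if } \mathrm{O_3} < 0.2 \text{ ppm}, \\ 200 + \{[\mathrm{O_3} - 0.2] \times 500\}, & \text{if } 0.2 \le \mathrm{O_3} < 0.4 \text{ ppm}, \\ 300 + \{[\mathrm{O_3} - 0.4] \times 1000\}, & \text{if } \mathrm{O_3} \ge 0.4 \text{ ppm}. \end{cases} \tag{2}$$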
The sub-API value for the NO2 pollutant can be computed using the following equation:
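$$\mathrm{Idx}(\mathrm{NO_2}) = \begin{cases} \mathrm{NO_2} \times 588.23529, & \text{if } \mathrm{NO_2} < 0.17 \text{ ppm}, \\ 100 + \{[\mathrm{NO_2} - 0.17] \times 232.56\}, & \text{if } 0.17 \le \mathrm{NO_2} < 0.6 \text{ ppm}, \\ 200 + \{[\mathrm{NO_2} - 0.6] \times 166.667\}, & \text{if } 0.6 \le \mathrm{NO_2} < 1.2 \text{ ppm}, \\ 300 + \{[\mathrm{NO_2} - 1.2] \times 250\}, & \text{if } \mathrm{NO_2} \ge 1.2 \text{ ppm}. \end{cases} \tag{3}$$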
The sub-API value for the SO2 pollutant can be computed using the following equation:
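$$\mathrm{Idx}(\mathrm{SO_2}) = \begin{cases} \mathrm{SO_2} \times 2500, & \text{if } \mathrm{SO_2} < 0.04 \text{ ppm}, \\ 100 + \{[\mathrm{SO_2} - 0.04] \times 384.61\}, & \text{if } 0.04 \le \mathrm{SO_2} < 0.3 \text{ ppm}, \\ 200 + \{[\mathrm{SO_2} - 0.3] \times 333.333\}, & \text{if } 0.3 \le \mathrm{SO_2} < 0.6 \text{ ppm}, \\ 300 + \{[\mathrm{SO_2} - 0.6] \times 500\}, & \text{if } \mathrm{SO_2} \ge 0.6 \text{ ppm}. \end{cases} \tag{4}$$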
The sub-API value for the PM10 pollutant can be computed using the following equation:
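$$\mathrm{Idx}(\mathrm{PM_{10}}) = \begin{cases} \mathrm{PM_{10}}, & \text{if } \mathrm{PM_{10}} < 50\ \mu\text{g/m}^3, \\ 50 + \{[\mathrm{PM_{10}} - 50] \times 0.5\}, & \text{if } 50 \le \mathrm{PM_{10}} < 350\ \mu\text{g/m}^3, \\ 200 + \{[\mathrm{PM_{10}} - 350] \times 1.4286\}, & \text{if } 350 \le \mathrm{PM_{10}} < 420\ \mu\text{g/m}^3, \\ 300 + \{[\mathrm{PM_{10}} - 420] \times 1.25\}, & \text{if } 420 \le \mathrm{PM_{10}} < 500\ \mu\text{g/m}^3, \\ 400 + [\mathrm{PM_{10}} - 500], & \text{if } \mathrm{PM_{10}} \ge 500\ \mu\text{g/m}^3. \end{cases} \tag{5}$$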
Based on these individual indices, the API value at a particular time is the highest value among the sub-indices [48,49]. Figure 2 shows a schematic illustration of the process of determining the API value. Table 1 describes the air quality statuses corresponding to API values at a particular time [47].
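The sub-index formulas in Equations (1)–(5) and the rule that the API is the highest sub-index can be sketched in a few lines of Python. This is a minimal illustration, not the Department of Environment's implementation: the names `SEGMENTS`, `sub_index`, and `api` are hypothetical, and the breakpoints are simply transcribed from the equations above.

```python
def _piecewise(value, segments):
    """Evaluate base + (value - lower) * slope on the segment containing value.

    segments: list of (lower_bound, base_index, slope), sorted by lower_bound.
    """
    lower, base, slope = segments[0]
    for seg in segments:
        if value >= seg[0]:
            lower, base, slope = seg
    return base + (value - lower) * slope

# Breakpoints transcribed from Equations (1)-(5).
# Gas concentrations are in ppm; PM10 is in micrograms per cubic meter.
SEGMENTS = {
    "CO":   [(0, 0, 11.11111), (9, 100, 16.66667), (15, 200, 6.66667), (30, 300, 10)],
    "O3":   [(0, 0, 1000), (0.2, 200, 500), (0.4, 300, 1000)],
    "NO2":  [(0, 0, 588.23529), (0.17, 100, 232.56), (0.6, 200, 166.667), (1.2, 300, 250)],
    "SO2":  [(0, 0, 2500), (0.04, 100, 384.61), (0.3, 200, 333.333), (0.6, 300, 500)],
    "PM10": [(0, 0, 1), (50, 50, 0.5), (350, 200, 1.4286), (420, 300, 1.25), (500, 400, 1)],
}

def sub_index(pollutant, concentration):
    """Sub-API index for one pollutant (hypothetical helper name)."""
    return _piecewise(concentration, SEGMENTS[pollutant])

def api(concentrations):
    """Return (API value, dominant pollutant): the highest sub-index wins."""
    subs = {p: sub_index(p, c) for p, c in concentrations.items()}
    dominant = max(subs, key=subs.get)
    return subs[dominant], dominant

# Illustrative (made-up) concentrations for one hour:
print(api({"PM10": 200, "O3": 0.05, "CO": 2}))  # -> (125.0, 'PM10')
```

In this made-up example the PM10 sub-index (125) exceeds the O3 (50) and CO (about 22) sub-indices, so the hour would be classified as unhealthy (API > 100) with PM10 as the dominant pollutant.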
The higher the API value at a particular time, the greater the threat of the occurrence of extreme pollution events. Such scenarios disrupt the economic activities of the country, lead to a high risk of health problems among populations, and negatively affect environmental ecosystems. Thus, it is important to investigate the behavior of air pollution events in order to gain an understanding of these issues. This study used hourly API data from Klang for the period from 1 January 1997 to 31 August 2020 as a case study. Details of the raw API data and how to request access are available on the Department of Environment Malaysia website [50]. This study was interested in the occurrence of extreme air pollution events determined by unhealthy API indices.

3. Extreme-Value Modeling

3.1. BM Approach

In the BM approach, the extreme behavior of an air pollution event is evaluated based on the maximum API value recorded in a particular period, which is defined as a block. Let $X_1, X_2, \ldots, X_n$ be random variables representing the API data with a common distribution function $F$. The maximum API value in a particular block can then be written as $Y = \max(X_1, X_2, \ldots, X_n)$. The distribution function of the random variable $Y$ is determined from the following equation:
$$P(Y \le y) = P(X_1 \le y, X_2 \le y, \ldots, X_n \le y) = F^n(y). \tag{6}$$
The function $F^n$ can be used as an accurate approximation for the probability distribution of the variable $Y = \max(X_1, X_2, \ldots, X_n)$ even when the original random variables $X_1, X_2, \ldots, X_n$ do not strictly satisfy the independent and identically distributed conditions. This flexibility makes extreme-value analysis based on the BM approach plausible for various types of problems, including those involving complex phenomena [51].
In a practical application involving a dataset, the actual distribution function $F$ of the phenomenon under investigation is unknown. This implies that $F$ must first be estimated from the observed data before it can be used in Equation (6). However, small discrepancies in the estimate of $F$ lead to substantial discrepancies in $F^n$, so subsequent analyses based on $F^n$ inherit large errors and may produce incorrect results [52]. To overcome this problem, $F$ is left unspecified, and $F^n$ is instead approximated by a particular limiting distribution as $n \to \infty$. Mathematically, this limiting distribution $G(y)$ is valid if sequences of constants $\{a_n > 0\}$ and $\{c_n\}$ exist, such that:
$$P\left\{\frac{Y - c_n}{a_n} \le y\right\} \to G(y), \quad \text{as } n \to \infty, \tag{7}$$
where G(y) is also a nondegenerate function [53]. According to the literature, the limiting distribution in Equation (7) provides the form of the GEV distribution as the following equation:
$$G(y) = \begin{cases} \exp\left\{-\left[1 - \kappa\left(\dfrac{y - \xi}{\alpha}\right)\right]^{1/\kappa}\right\}, & \text{for } \kappa \ne 0, \\ \exp\left[-\exp\left\{-\left(\dfrac{y - \xi}{\alpha}\right)\right\}\right], & \text{for } \kappa = 0, \end{cases} \tag{8}$$
where the location, scale, and shape parameters are denoted by $\xi$, $\alpha$, and $\kappa$, respectively. These parameters must be estimated to provide a practical distribution function that can be used to represent the data distribution. However, owing to the problem of small sample size deriving from the BM approach, estimation using the likelihood-based approach, such as maximum likelihood estimation, will not generate satisfactory results. As described by Hosking et al. [54], the L-moment method is an effective alternative approach to overcome this issue. The L-moment estimators of the GEV model are given as follows:
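$$\hat{\xi} = \hat{\lambda}_1 - \frac{\hat{\alpha}}{\hat{\kappa}}\left\{1 - \Gamma(1 + \hat{\kappa})\right\}, \tag{9}$$
$$\hat{\alpha} = \frac{\hat{\lambda}_2\,\hat{\kappa}}{\left(1 - 2^{-\hat{\kappa}}\right)\Gamma(1 + \hat{\kappa})}, \tag{10}$$
$$\hat{\kappa} = 7.859c + 2.9554c^2, \tag{11}$$
where $c = \dfrac{2}{3 + \hat{\tau}_3} - \dfrac{\log(2)}{\log(3)}$. In conjunction with Equations (9)–(11), the L-moment estimators $\hat{\lambda}_1$, $\hat{\lambda}_2$, $\hat{\lambda}_3$ and $\hat{\tau}_3 = \hat{\lambda}_3 / \hat{\lambda}_2$ must be estimated from the probability weighted moments determined from the following equation: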
$$\beta_r = \frac{\xi + \alpha\left[1 - (r + 1)^{-\kappa}\,\Gamma(1 + \kappa)\right]/\kappa}{r + 1}, \tag{12}$$
where $\lambda_1 = \beta_0$, $\lambda_2 = 2\beta_1 - \beta_0$, and $\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0$. The unbiased estimator of $\beta_r$ is determined as follows:
$$b_r = \sum_{i=1}^{n}\left[\frac{(i-1)(i-2)(i-3)\cdots(i-r)}{n(n-1)(n-2)\cdots(n-r)}\; y_{(i)}\right], \quad \text{for } r = 0, 1, 2, \ldots, \tag{13}$$
where $y_{(i)}$ is the $i$th order statistic determined from the sample data (for additional details on the L-moment estimators of the GEV model, refer to [55,56,57]). The quantile function of the GEV model can be obtained by inverting its cumulative distribution function (CDF) [58], which is given as follows:
$$G^{-1}(p; \xi, \alpha, \kappa) = \begin{cases} \xi + \dfrac{\alpha}{\kappa}\left[1 - (-\ln(p))^{\kappa}\right], & \text{for } \kappa \ne 0, \\ \xi - \alpha \ln(-\ln(p)), & \text{for } \kappa = 0. \end{cases} \tag{14}$$
The quantile function in Equation (14) is crucial for determining the return level associated with an extreme pollution event. For a return period of $T$ time blocks, the probability that the corresponding extreme level is exceeded in any one block is $1/T$. Thus, for any particular return period $T$, the critical value $y_T$ of the maximum API level can be determined using the following equation:
$$P(Y > y_T) = 1 - G(y_T) = \frac{1}{T}. \tag{15}$$
Equivalently, the maximum API level corresponding to a particular return period $T$ can be computed as follows:
$$y_T = G^{-1}\left(\frac{T - 1}{T}\right). \tag{16}$$
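To make the estimation chain in Equations (9)–(16) concrete, the following Python sketch computes the sample probability weighted moments, converts them to L-moment estimates of the GEV parameters, and evaluates a return level. The function names are hypothetical, and the code follows the sign convention of Equation (8) for the shape parameter; it is an illustration of the formulas rather than the authors' implementation.

```python
import math

def sample_pwm(y, r):
    """Unbiased estimator b_r of the probability weighted moment, Equation (13)."""
    ys = sorted(y)
    n = len(ys)
    total = 0.0
    for i, value in enumerate(ys, start=1):
        weight = 1.0
        for j in range(1, r + 1):
            weight *= (i - j) / (n - j)   # (i-1)...(i-r) / ((n-1)...(n-r))
        total += weight * value
    return total / n

def gev_lmoment_fit(y):
    """L-moment estimates (xi, alpha, kappa) from Equations (9)-(11).

    Assumes the estimated kappa is not exactly zero.
    """
    b0, b1, b2 = (sample_pwm(y, r) for r in range(3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    t3 = l3 / l2
    c = 2 / (3 + t3) - math.log(2) / math.log(3)
    kappa = 7.859 * c + 2.9554 * c ** 2
    g = math.gamma(1 + kappa)
    alpha = l2 * kappa / ((1 - 2 ** (-kappa)) * g)
    xi = l1 - alpha / kappa * (1 - g)
    return xi, alpha, kappa

def gev_quantile(p, xi, alpha, kappa):
    """Quantile function of the GEV model, Equation (14)."""
    if abs(kappa) < 1e-12:
        return xi - alpha * math.log(-math.log(p))
    return xi + alpha / kappa * (1 - (-math.log(p)) ** kappa)

def gev_return_level(T, xi, alpha, kappa):
    """Return level y_T = G^{-1}((T - 1) / T), Equation (16)."""
    return gev_quantile((T - 1) / T, xi, alpha, kappa)
```

With annual blocks, `gev_return_level(10, xi, alpha, kappa)` would give the API level expected to be exceeded by the annual maximum once every 10 years on average.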

3.2. POT Approach

This approach uses a particular threshold value to isolate data points considered extreme from the rest of the dataset before determining a statistical model to describe an extreme phenomenon. To relate this approach to the analysis of an unhealthy air pollution event, based on Table 1, an API value exceeding the threshold of 100 is defined as an unhealthy air pollution event. Thus, a fixed threshold value of $u = 100$ has a significant meaning in air pollution studies. Let $X_1, X_2, \ldots, X_n$ represent random variables in the API data following an unknown distribution function $F$. Mathematically, an unhealthy air pollution event can be represented by a conditional event corresponding to values greater than the threshold $u$. The conditional exceedance distribution function $F^{[u]}$ is determined as follows:
$$F^{[u]}(x) = \Pr(X \le x \mid X > u) = \frac{\Pr\{X \le x, X > u\}}{\Pr\{X > u\}} = \frac{F(x) - F(u)}{1 - F(u)}, \quad x \ge u. \tag{17}$$
This conditional exceedance distribution function has the left endpoint $\alpha(F^{[u]}) = \inf\{y : F^{[u]}(y) > 0\}$, which is equal to $u$. However, if we let $Y = X - u$ represent the random variable of the data above the threshold $u$, then the empirical cumulative distribution function $\hat{F}_k(y;\,\cdot)$ is given by the following equation:
$$\hat{F}_k(y;\,\cdot) = \left(\hat{F}_k(y;\,\cdot)\right)^{[u]}. \tag{18}$$
Equations (17) and (18) provide the same information [51]. To determine the parametric form of this density function, the limiting distribution of the normalized values exceeding the threshold must be derived. According to the literature, as the threshold approaches the endpoint of the variable, the limiting distribution of the cumulative density in Equation (18) will follow a GPD [11,59], which can be given as follows:
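$$F(x) = P(X \le x \mid X > u) = \begin{cases} 1 - \left(1 + \xi\left(\dfrac{x - u}{\sigma}\right)\right)^{-1/\xi}, & \text{if } \xi \ne 0, \\ 1 - e^{-\left(\frac{x - u}{\sigma}\right)}, & \text{if } \xi = 0. \end{cases} \tag{19}$$
Meanwhile, in terms of the random variable $Y = X - u$, we obtain:
$$G(y) = P(Y \le y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi}, & \text{if } \xi \ne 0, \\ 1 - e^{-y/\sigma}, & \text{if } \xi = 0, \end{cases} \tag{20}$$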
where $1 + \xi y / \sigma > 0$. The threshold, shape, and scale parameters are represented by $u$, $\xi$, and $\sigma$, respectively [60,61]. The tail behavior influenced by the existence of extreme data is described by the shape parameter $\xi$. Under the sign convention of Equation (20), if $\xi = 0$, then the upper tail of the data has the properties of a medium-sized tail, and the GPD reduces to an exponential distribution. If $\xi > 0$, then the distribution exhibits long-tailed (heavy-tailed) behavior, and the GPD takes a Pareto-type form. If $\xi < 0$, then the distribution is short-tailed, with a finite upper endpoint. Moreover, the scale parameter $\sigma$ governs the spread, and thus the mean, of the data exceeding the threshold $u$ [25].
To determine the parameter estimates of the GPD model, a method based on its likelihood function can be used, as a large number of data points is commonly available under the POT approach. For the values $y_1, y_2, \ldots, y_k$, where $k$ is the total number of data points exceeding the threshold $u$, the log-likelihood function of the GPD is given as follows:
$$\log(L) = -k \log \sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{k} \log\left(1 + \frac{\xi y_i}{\sigma}\right). \tag{21}$$
However, no analytical solution can be obtained from Equation (21). Thus, a numerical procedure must be utilized to obtain a final solution for the parameter estimation.
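As a rough illustration of what such a numerical procedure involves, the sketch below evaluates the log-likelihood of Equation (21) and maximizes it with a coarse grid search. The grid search is a deliberately simple stand-in chosen to keep the example dependency-free; it is not the optimizer used in the study, and the function names are hypothetical. In practice a gradient-based or simplex optimizer would normally replace it.

```python
import math

def gpd_loglik(excesses, xi, sigma):
    """Log-likelihood of Equation (21) for excesses y_i = x_i - u."""
    if sigma <= 0:
        return -math.inf
    k = len(excesses)
    if abs(xi) < 1e-12:
        # Exponential limit of the GPD as xi -> 0.
        return -k * math.log(sigma) - sum(excesses) / sigma
    total = 0.0
    for y in excesses:
        arg = 1 + xi * y / sigma
        if arg <= 0:
            # Outside the support 1 + xi * y / sigma > 0.
            return -math.inf
        total += math.log(arg)
    return -k * math.log(sigma) - (1 + 1 / xi) * total

def gpd_grid_fit(excesses, xi_grid, sigma_grid):
    """Return the (xi, sigma) pair on the grid with the highest log-likelihood."""
    best_ll, best_xi, best_sigma = -math.inf, None, None
    for xi in xi_grid:
        for sigma in sigma_grid:
            ll = gpd_loglik(excesses, xi, sigma)
            if ll > best_ll:
                best_ll, best_xi, best_sigma = ll, xi, sigma
    return best_xi, best_sigma
```

The grid resolution bounds the accuracy of the estimates, so the result is best treated as a starting value for a proper optimizer rather than a final estimate.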
Next, similar to the GEV model, the concept of the return period can be employed in the GPD model. A return period is interpreted as the average period between extreme events exceeding a threshold u within a particular period of time [27], which is derived based on knowledge of its distribution function. The return period of the GPD model is given as the following equation [53]:
$$P(Y > y \mid Y > u) = \left[1 + \xi\left(\frac{y - u}{\sigma}\right)\right]^{-1/\xi}. \tag{22}$$
If we let $\zeta_u = P(X > u) = k/n$, where $k$ is the number of data points $x_i$ exceeding the threshold $u$ and $n$ is the total number of observations, then the return period presented in Equation (22) can be simplified as:
$$P(Y > y) = \zeta_u\left[1 + \xi\left(\frac{y - u}{\sigma}\right)\right]^{-1/\xi}, \tag{23}$$
which implies that the API level $y_u$ exceeded once every $m$ observations on average can be determined using the following equation:
$$\frac{1}{m} = \zeta_u\left[1 + \xi\left(\frac{y_u - u}{\sigma}\right)\right]^{-1/\xi}. \tag{24}$$
Specifically, for an unhealthy air pollution event indicated by the threshold $u = 100$, Equation (24) can be rearranged as
$$y_u = u + \frac{\sigma}{\xi}\left[(m\,\zeta_u)^{\xi} - 1\right]. \tag{25}$$
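The step from Equation (24) to Equation (25) can be checked numerically. The short Python sketch below, with hypothetical function names and made-up parameter values, computes the return level from Equation (25) and substitutes it back into Equation (23) to confirm that the exceedance probability equals 1/m.

```python
def gpd_return_level(m, u, sigma, xi, zeta_u):
    """m-observation return level y_u from Equation (25); requires xi != 0."""
    return u + sigma / xi * ((m * zeta_u) ** xi - 1)

def exceedance_probability(y, u, sigma, xi, zeta_u):
    """Unconditional exceedance probability from Equation (23); requires xi != 0."""
    return zeta_u * (1 + xi * (y - u) / sigma) ** (-1 / xi)

# Illustrative values only (not estimates from the paper):
# hourly data, so m = 87600 observations is roughly a 10-year return period.
m, u, sigma, xi, zeta_u = 87600, 100.0, 15.0, 0.1, 0.02
y_u = gpd_return_level(m, u, sigma, xi, zeta_u)
print(abs(exceedance_probability(y_u, u, sigma, xi, zeta_u) - 1 / m) < 1e-12)  # -> True
```

Because the data are hourly, a return period stated in years has to be converted to a number of observations m before Equation (25) is applied.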
In addition, by manipulating the return period formula, information on the expected return level corresponding to a particular return period can be obtained [62] as follows:
$$u = G^{-1}\left(1 - \frac{1}{PR(u)}\right), \tag{26}$$
where $G^{-1}$ is the quantile function of the GPD model.

3.3. Mixed POT-BM Approach

The relationship between the GEV model (BM approach) and the GPD model (POT approach) can be described using asymptotic model characterization [63]. As described above, for a sequence of independent random variables $X_1, X_2, \ldots, X_n$ with a common distribution function $F$, the maximum value of the data in a particular block is $Y = \max(X_1, X_2, \ldots, X_n)$. For large $n$, the asymptotic distribution function of $Y$ is $P(Y \le y) \approx G(y)$, where $G(y)$ is the GEV model presented in Equation (8). In parallel, for an adequately large threshold $u$, the distribution function of the random variable $X - u$ conditional on $X > u$ can be approximated by the GPD model presented in Equation (20) [53]. In the mixed POT-BM approach, for a series of random variables $X_1, X_2, \ldots, X_n$, the POT data are determined first by grouping the data points that satisfy $X > u$ into blocks. However, some extreme events may occur in sequence over time. For example, an extreme air pollution event will likely be followed by another extreme air pollution event for several consecutive hours or days. This scenario violates the independence assumption for the selected POT data points. To overcome this problem, a declustering technique can be used to filter dependent consecutive extreme values exceeding the threshold $u$. Declustering is performed by selecting only the POT data blocks with a minimum separation $r$ from one another. According to Karim et al. [64], $r = 240$ h is sufficient as the minimum number of separation hours between two extreme events.
After the POT blocks have been declustered, the BM concept is applied to select the maximum value of the data points in each POT block. This step implies that the extreme data points selected by the mixed POT-BM approach satisfy, or at least satisfactorily approximate, the properties of independent variables. A pragmatic choice of block length, such as one year, is common practice in the BM approach, but it yields a small number of selected data points and therefore high variance in the statistical modeling. Selecting all data points exceeding the threshold u, as in the POT approach, avoids this loss of data, but the resulting dependence violates the asymptotic properties of the extreme-value model, thereby implying a high bias. The mixed POT-BM approach can be a reasonable alternative for providing a balanced tradeoff between the bias and variance generated by the data selection in extreme-value statistical modeling.
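The selection procedure described in this section can be sketched in Python as follows. The sketch rests on one assumed reading of the declustering rule, namely runs declustering: exceedances of u that lie fewer than r time steps apart are merged into the same POT block, and a gap of at least r steps starts a new block. The function name `pot_bm_maxima` is hypothetical, and the paper's exact filtering rule may differ in detail.

```python
def pot_bm_maxima(series, u=100.0, r=240):
    """Return [(index, value)] of block maxima from declustered POT blocks.

    Assumed reading of the procedure: exceedances of u that are fewer than
    r time steps apart belong to one POT block (runs declustering), and the
    largest value in each block is kept (the BM step).
    """
    maxima = []
    last_exceedance = None  # index of the most recent value above u
    for t, x in enumerate(series):
        if x <= u:
            continue
        if last_exceedance is None or t - last_exceedance >= r:
            maxima.append((t, x))      # open a new POT block
        elif x > maxima[-1][1]:
            maxima[-1] = (t, x)        # new maximum within the current block
        last_exceedance = t
    return maxima
```

For hourly API data with u = 100 and r = 240, each returned pair is the peak of one unhealthy episode, and these peaks are the observations passed on to the GEV fit.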
In a similar vein, the relationship between the GEV model (BM approach) and the GPD model (POT approach) described above leads to an intuitive justification: for an adequately large threshold u, several independent block maxima governed by the GEV model can be obtained, which implies that the mixed POT-BM approach can achieve satisfactory behavior similar to that of the BM approach. Thus, the distribution corresponding to the mixed POT-BM approach can be approximated by the GEV model presented in Equation (8). Moreover, the POT-BM approach can be a highly reliable method, as it can determine more extreme data points than the common BM approach. A larger number of independent extreme data points increases GEV modeling precision, as a low estimation variance can be obtained. In addition, the model bias problem can be avoided by integrating the declustering technique in the POT-BM approach.
By taking advantage of the large number of extreme data points generated by the POT-BM approach, maximum likelihood estimation can be used instead of the L-moments method for parameter estimation. According to Coles [53], the likelihood-based method is equipped with a convenient set of “off-the-shelf” inference properties and can quantify uncertainties during estimation. The log-likelihood function of the GEV model for $\kappa \ne 0$ can be written as
$$\ell(\theta \mid y) = -n \log(\alpha) + \sum_{i=1}^{n}\left[\left(\frac{1}{\kappa} - 1\right)\ln(z_i) - (z_i)^{1/\kappa}\right], \tag{27}$$
where $\theta = [\xi, \alpha, \kappa]$ and $z_i = 1 - (\kappa/\alpha)(y_i - \xi)$ [57,65]. However, the log-likelihood function of the GEV model presented in Equation (27) cannot provide a simple analytical solution for the parameters $\xi$, $\alpha$, and $\kappa$. Thus, the parameters that maximize the log-likelihood function can be determined via a numerical optimization technique, which is available in many statistical software environments, such as R [66].
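The objective that such an optimizer would maximize can be written directly from Equation (27). The Python sketch below uses hypothetical function names and the sign convention of Equation (8); it returns negative infinity outside the support so that a general-purpose optimizer can reject infeasible parameter values. The companion `gev_cdf` function is included only so that the log-likelihood can be checked against a numerical derivative of Equation (8).

```python
import math

def gev_loglik(y, xi, alpha, kappa):
    """Log-likelihood of Equation (27); defined for kappa != 0 and alpha > 0."""
    if kappa == 0:
        raise ValueError("Equation (27) applies only to kappa != 0")
    if alpha <= 0:
        return -math.inf
    total = -len(y) * math.log(alpha)
    for yi in y:
        z = 1 - (kappa / alpha) * (yi - xi)
        if z <= 0:
            return -math.inf  # observation outside the support
        total += (1 / kappa - 1) * math.log(z) - z ** (1 / kappa)
    return total

def gev_cdf(y, xi, alpha, kappa):
    """GEV distribution function of Equation (8) for kappa != 0."""
    z = 1 - kappa * (y - xi) / alpha
    if z <= 0:
        # Below the lower bound (kappa < 0) or above the upper bound (kappa > 0).
        return 0.0 if kappa < 0 else 1.0
    return math.exp(-z ** (1 / kappa))
```

Minimizing the negative of `gev_loglik` over (xi, alpha, kappa), starting from the L-moment estimates, is one common way to obtain the maximum likelihood fit.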

4. Results and Discussion

Before a detailed analysis is conducted, it is worthwhile to examine the descriptive statistics of the data. Figure 3 illustrates the time series plot for the observed hourly API data in the Klang area for the period from 1 January 1997 to 31 August 2020, and Table 2 presents several descriptive measures of the data.
The descriptive statistics show that, most of the time, the API values in Klang are in the healthy range (below the threshold of 100) and fluctuate around the mean of 55.222. However, the variance is relatively high (20.970). This high variability derives from unhealthy pollution events that occur repeatedly, as shown by the spikes in the time series plot. The maximum API value recorded in the 24-year dataset is approximately 541, which indicates a hazardous air quality status (as described in Table 1). These anomalous API values are closely related to the occurrence of haze events in Malaysia [67]. As reported by the Department of Environment Malaysia [68], haze episodes recurred in 2005, 2012, 2014, and 2015. In particular, the API value of approximately 541 stems from a severe haze event that occurred on 11 August 2005. During that period, the government of Malaysia declared a state of emergency in the Klang area. This result implies that the risks of unhealthy and extreme air pollution events in Klang should not be taken lightly. Meanwhile, the skewness and kurtosis measures indicate that the data are not normally distributed, which supports the application of extreme-value modeling to the data.
Moreover, as an API greater than 100 indicates the occurrence of unhealthy air pollution events, a fixed threshold u = 100 was essential in this study as the basis for the extreme-value modeling and analysis. For the POT approach, the selected threshold can affect GPD modeling precision. Because this study used a fixed threshold u = 100, a mean residual life (MRL) plot was used to assess the suitability of the POT approach at that threshold. For a valid threshold, the MRL plot should be approximately linear in u above that threshold [53]. Figure 4 shows the MRL plot for the API data. At the threshold u = 100, approximate linear behavior is satisfied. Thus, u = 100 is a valid threshold for extreme-value modeling using the GPD model. Table 3 provides the results of the parameter estimation for each fitted extreme-value model. The estimated parameters of the GEV model fitted with the BM approach have the largest standard errors among the approaches, which implies less precise results. The parameters of the GPD model fitted with the POT approach have the smallest standard errors, which implies high precision. The estimated parameters for the GEV approximation based on the POT-BM approach have considerably smaller standard errors than those of the GEV model based on the BM approach and only slightly larger standard errors than those of the GPD model based on the POT approach. This result indicates that the GEV model with the BM approach was the least precise method for the extreme air pollution data, while both the POT method and the GEV model with the POT-BM approach produced better results with small standard errors. However, as described above, another important criterion that needs to be evaluated for these methods is their dependency behavior, in order to avoid the bias problem in extreme-value modeling.
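The quantity displayed in an MRL plot is the sample mean excess over each candidate threshold. For data that follow a GPD above some threshold, the mean excess is a linear function of any higher threshold, which is why approximate linearity is the diagnostic. A minimal Python sketch, with hypothetical function names and without the confidence bands that a full MRL plot would normally include, is:

```python
def mean_excess(data, u):
    """Sample mean of (x - u) over the observations with x > u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)

def mrl_curve(data, thresholds):
    """Points (u, mean excess) that an MRL plot would display."""
    return [(u, mean_excess(data, u)) for u in thresholds]

# Tiny made-up example: three of the four values exceed u = 100.
print(mrl_curve([90, 105, 110, 130], [100, 108]))  # -> [(100, 15.0), (108, 12.0)]
```

Applied to the hourly API series over a range of thresholds around 100, the resulting curve is what Figure 4 summarizes.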
Based on the results of the parameter estimation, Figure 5 compares the fitted extreme-value models and shows clearly which model is the most precise in terms of the standard error of the parameter estimates. According to Figure 5, the fitted density of the GEV model based on the BM approach is relatively rough, indicating that this model represents the occurrence of extreme phenomena in the dataset less precisely. Meanwhile, GPD modeling based on the POT approach and the GEV approximation based on the POT-BM approach provide improved model fits with high precision. The comparison of PP plots in Figure 6 accords with the results in Figure 5 and Table 3.
Apart from the precision of the fitted models in terms of their standard error, which represents the variance in the model estimation, another important criterion is the independence assumption for the selected extreme data. Violation of the independence assumption leads to bias in the fitted models. Figure 7 shows the autocorrelation function of the data selected by the different extreme-value approaches. The original observed hourly API data (top left) depend very strongly on their previous time lags, which clearly violates the independence assumption. The use of the BM approach (top right) eliminated the dependence, whereas the use of the POT approach (bottom left) reduced it only slightly, and the dependence remained clearly significant. The use of the mixed POT-BM approach (bottom right) also made it possible to solve the problem of dependency behaviors.
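The dependence check behind Figure 7 relies on the sample autocorrelation function. A minimal Python sketch of that statistic is given below; the function name is hypothetical. For an independent sequence of length n, sample autocorrelations at nonzero lags are expected to fall mostly within about ±1.96/√n, which is the usual reference band on such plots.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a non-negative lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

def acf(x, max_lag):
    """Autocorrelations for lags 0..max_lag."""
    return [autocorrelation(x, k) for k in range(max_lag + 1)]
```

Computing `acf` on the raw hourly series and on the maxima returned by each selection method reproduces the kind of comparison shown in the four panels.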
Taken together, the results presented in Table 3 and Figures 5–7 indicate that extreme-value modeling based on the BM approach is unbiased but less precise, with large estimation variance. By contrast, although the POT approach tends to demonstrate high precision in extreme-value modeling, it suffers from bias due to the violation of the independence assumption. The mixed POT-BM approach provides an improved tradeoff between bias and variance: it satisfies the unbiasedness requirement of extreme-value modeling with much lower variance than the BM approach. In a similar vein, Figure 8 presents a comparison of the return level plots of the fitted extreme-value models. Based on Figure 8, the return level estimate determined by the GPD model based on the POT approach has a small confidence interval, but it is clearly biased compared with the empirical data. The return level estimate determined by the GEV model based on the BM approach agrees satisfactorily with the empirical data. However, the return level plot derived from the GEV model with the POT-BM approach was found to produce a better result. The plot shows that the estimated return level curve within the range of the observed data is accurate with a small confidence interval, particularly for short return periods, such as 5 or 10 years. For long return periods, the confidence interval of the return level estimate is larger, which implies that the accuracy of the estimate decreases. Thus, to provide a better assessment, we suggest that the GEV (POT-BM) model be re-fitted with the most recent data to obtain an up-to-date short-term (5 or 10 years ahead) evaluation of extreme pollution events over time.
This finding implies that the results carry considerable uncertainty when used in practice. Even so, the mixed POT-BM approach improves precision, giving a narrow confidence interval for the return level estimates. Return periods shorter than 10 years yield satisfactory return level estimates for extreme API events. For moderate return periods of 10 to 50 years, the precision of the estimates decreases as the confidence interval widens. Return levels for periods longer than 50 years are not recommended as a reference for air pollution management planning and decisions in Klang. Overall, this study concludes that the mixed POT-BM approach is a satisfactory alternative for selecting extreme data points in the statistical modeling of extreme events such as unhealthy air pollution.
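To make the return level discussion concrete, the sketch below evaluates the standard GEV quantile formula, z_m = mu + (sigma/xi) * [(-log(1 - 1/m))^(-xi) - 1], at the POT-BM point estimates from Table 3. The function names are hypothetical, and the return period m is counted in blocks: converting it to years would need the average number of POT blocks per year, which is not stated in this section, so the printed values should not be read as the levels plotted in Figure 8. Confidence intervals are not computed here.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function for a nonzero shape parameter xi."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0:
        # z lies below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def gev_return_level(m, mu, sigma, xi):
    """Level exceeded on average once every m blocks (m > 1, xi != 0)."""
    if m <= 1:
        raise ValueError("m must be greater than 1")
    if xi == 0:
        raise ValueError("this sketch covers only the xi != 0 case")
    y = -math.log(1.0 - 1.0 / m)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Point estimates for the GEV approximation based on POT-BM (Table 3).
params = dict(mu=110.863, sigma=12.911, xi=0.788)
for m in (5, 10, 50):
    print(m, round(gev_return_level(m, **params), 1))
```

Because the fitted shape parameter is large and positive, the return level grows quickly with m, which is consistent with the widening uncertainty described above for long return periods.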

5. Conclusions

This article proposes a novel alternative approach for selecting extreme event data for extreme-value modeling, called the mixed POT-BM method. A case study was conducted using API data from Klang, Malaysia, for the period from 1 January 1997 to 31 August 2020. The extreme events of interest were unhealthy air pollution events, defined as API values exceeding the threshold u = 100. Because this threshold was used as the basis for determining the POT blocks, the selected extreme data points exhibited dependency, which implies bias in the statistical modeling process. To overcome the dependency among adjacent POT blocks, a declustering technique was employed that filters consecutive dependent extreme data points so that they are separated from one another by at least 240 h. Applying the declustering technique to the POT blocks resolved the dependency problem for the selected extreme data. The BM concept was then integrated to determine the maximum data point in each POT block. The extreme data points determined by the mixed POT-BM approach satisfied the independence property of extreme events, with satisfactory precision in the fitted model. The estimated parameters of the GEV approximation based on the POT-BM approach had considerably lower standard errors than those of the original GEV model with the BM approach. In addition, the return level plot derived from the GEV model with the POT-BM approach gave a good result: the return level estimates for short return periods (5 or 10 years) were more accurate than those of the other extreme-value models. Overall, this study concludes that the POT-BM method provides an improved balance in the tradeoff between bias and variance in extreme-value modeling. Thus, the POT-BM approach is a satisfactory practical alternative for the analysis and modeling of extreme-value events.
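The three steps summarized above (POT blocks, declustering, block maxima) can be sketched in a few lines. This is one possible reading under stated assumptions, not the authors' implementation: the function name `pot_bm_maxima` is hypothetical, a POT block is taken to be a run of consecutive hourly values above u, and declustering is implemented by merging blocks whose separation is shorter than `min_gap` hours, since the exact filtering rule is not spelled out in this section.

```python
def pot_bm_maxima(series, u=100.0, min_gap=240):
    """Return (index, value) of the maximum of each declustered POT block.

    series  : hourly values (for example, API readings)
    u       : threshold; a POT block is a run of consecutive values > u
    min_gap : blocks closer than this many hours are merged (assumed rule)
    """
    # Step 1 (POT): locate runs of consecutive exceedances as (start, end).
    blocks = []
    start = None
    for i, v in enumerate(series):
        if v > u:
            if start is None:
                start = i
        elif start is not None:
            blocks.append((start, i - 1))
            start = None
    if start is not None:
        blocks.append((start, len(series) - 1))

    # Step 2 (declustering): merge blocks separated by fewer than min_gap hours.
    merged = []
    for s, e in blocks:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Step 3 (BM): keep only the maximum within each merged block.
    maxima = []
    for s, e in merged:
        idx = max(range(s, e + 1), key=lambda i: series[i])
        maxima.append((idx, series[idx]))
    return maxima


# Toy series with a short gap so that the merging step is visible.
toy = [50, 120, 150, 90, 130, 60, 60, 60, 60, 110, 105]
print(pot_bm_maxima(toy, u=100, min_gap=3))
```

In the toy series, the first two runs above the threshold are two hours apart and are merged into one block, while the last run is far enough away to form its own block; each block then contributes a single maximum to the sample used for the GEV fit.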

Author Contributions

Conceptualization, methodology, formal analysis, visualization, writing—original draft preparation, funding acquisition, N.M.; validation, writing—review and editing M.A.M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universiti Kebangsaan Malaysia, grant number GP-2020-K020446, and the APC was funded by Universiti Kebangsaan Malaysia, grant number PP-FST-2021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Due to confidentiality agreements, supporting data can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available from the Department of Environment Malaysia, https://www.doe.gov.my/portalv1/en/at (accessed on 4 March 2021).

Acknowledgments

The authors are indebted to the Malaysian Department of the Environment for providing air pollution data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tabari, H. Extreme value analysis dilemma for climate change impact assessment on global flood and extreme precipitation. J. Hydrol. 2021, 593, 125932. [Google Scholar] [CrossRef]
  2. Xu, H.; Lin, N.; Huang, M.; Lou, W. Design tropical cyclone wind speed when considering climate change. J. Struct. Eng. 2020, 146, 04020063. [Google Scholar] [CrossRef] [Green Version]
  3. Sarailidis, G.; Vasiliades, L.; Loukas, A. Analysis of streamflow droughts using fixed and variable thresholds. Hydrol. Process. 2019, 33, 414–431. [Google Scholar] [CrossRef]
  4. O’Sullivan, J.; Sweeney, C.; Parnell, A.C. Bayesian spatial extreme value analysis of maximum temperatures in County Dublin, Ireland. Environmetrics 2020, 31, e2621. [Google Scholar] [CrossRef] [Green Version]
  5. Beirlant, J.; Kijko, A.; Reynkens, T.; Einmahl, J.H.J. Estimating the maximum possible earthquake magnitude using extreme value methodology: The Groningen case. Nat. Hazards 2019, 98, 1091–1113. [Google Scholar] [CrossRef] [Green Version]
  6. Masseran, N. Modeling fluctuation of PM10 data with existence of volatility effect. Environ. Eng. Sci. 2017, 34, 816–827. [Google Scholar] [CrossRef]
  7. AL-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H. Hierarchical-Generalized Pareto model for estimation of unhealthy air pollution index. Environ. Model. Assess. 2020, 25, 555–564. [Google Scholar] [CrossRef]
  8. Resnick, S.I. Heavy-Tail Phenomena: Probability and Statistical Modeling; Springer: New York, NY, USA, 2007. [Google Scholar]
  9. Bradley, B.O.; Taqqu, M.S. Financial Risk and Heavy Tails. In Handbook of Heavy Tailed Distributions in Finance; Rachev, S.T., Ed.; Elsevier: Amsterdam, The Netherlands, 2003; Volume 1, pp. 35–103. [Google Scholar]
  10. Fisher, R.A.; Tippett, L.H.C. Limiting forms of the frequency distribution of the largest or smallest member of sample. Math. Proc. Camb. Philos. Soc. 1928, 24, 180–190. [Google Scholar] [CrossRef]
  11. Pickands, J. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131. [Google Scholar]
  12. Al-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H.; Razali, A.M. Modeling unhealthy air pollution index using a peaks-over-threshold method. Environ. Eng. Sci. 2018, 35, 101–110. [Google Scholar] [CrossRef]
  13. Martins, L.D.; Wikuats, C.F.H.; Capucim, M.N.; de Almeida, D.S.; da Costa, S.C.; Albuquerque, T.; Carvalho, V.S.B.; de Freitas, E.D.; de Fátima Andrade, M.; Martins, J.A. Extreme value analysis of air pollution data and their comparison between two large urban regions of South America. Weather Clim. Extrem. 2017, 18, 44–54. [Google Scholar] [CrossRef]
  14. Eastoe, E.M.; Tawn, J.A. Modelling non-stationary extremes with application to surface level ozone. J. R. Stat. Soc. Ser. C Appl. Stat. 2009, 58, 25–45. [Google Scholar] [CrossRef]
  15. Gyarmati-Szabó, J.; Bogachev, L.V.; Chen, H. Nonstationary POT modelling of air pollution concentrations: Statistical analysis of the traffic and meteorological impact. Environmetrics 2017, 28, e2449. [Google Scholar] [CrossRef] [Green Version]
  16. Hazarika, S.; Borah, P.; Prakash, A. The assessment of return probability of maximum ozone concentrations in an urban environment of Delhi: A Generalized Extreme Value analysis approach. Atmos. Environ. 2019, 202, 53–63. [Google Scholar] [CrossRef]
  17. Kütchenhoff, H.; Thamerus, M. Extreme value analysis of Munich air pollution data. Environ. Ecol. Stat. 1996, 3, 127–141. [Google Scholar] [CrossRef] [Green Version]
  18. Masseran, N.; Razali, A.M.; Ibrahim, K.; Latif, M.T. Modeling air quality in main cities of Peninsular Malaysia by using a generalized Pareto model. Environ. Monit. Assess. 2016, 188, 65. [Google Scholar] [CrossRef] [PubMed]
  19. Reyes, H.J.; Vaquera, H.; Villaseñor, J.A. Estimation of trends in high urban ozone levels using the quantiles of (GEV). Environmetrics 2010, 21, 470–481. [Google Scholar] [CrossRef]
  20. Su, F.-C.; Jia, C.; Batterman, B. Extreme value analyses of VOC exposures and risks: A comparison of RIOPA and NHANES datasets. Atmos. Environ. 2012, 62, 97–106. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Battista, G.; Pagliaroli, T.; Mauri, M.; Basilicata, C.; Vollaro, R.D.L. Assessment of the Air Pollution Level in the City of Rome (Italy). Sustainability 2016, 8, 838. [Google Scholar] [CrossRef] [Green Version]
  22. Gumbel, E.J. Statistics of Extremes; Colombia University Press: New York, NY, USA, 1958. [Google Scholar]
  23. Huang, W.; Xu, S.; Nnaji, S. Evaluation of GEV model for frequency analysis of annual maximum water levels in the coast of United States. Ocean Eng. 2008, 35, 1132–1147. [Google Scholar] [CrossRef]
  24. Nguyen, T.-H.; Outayek, S.E.; Lim, S.H.; Nguyen, V.-T.-V. A systematic approach to selecting the best probability models for annual maximum rainfalls—A case study using data in Ontario (Canada). J. Hydrol. 2017, 553, 49–58. [Google Scholar] [CrossRef]
  25. Li, Z.; Li, C.; Xu, Z.; Zhou, X. Frequency analysis of precipitation extremes in Heihe River basin based on generalized Pareto distribution. Stoch. Environ. Res. Risk Assess. 2014, 28, 1709–1721. [Google Scholar] [CrossRef]
  26. Xia, J.; Du, H.; Zeng, S.; She, D.; Zhang, Y.; Yan, Z.; Ye, Y. Temporal and spatial variations and statistical models of extreme runoff in Huaihe River Basin during 1956–2010. J. Geogr. Sci. 2012, 22, 1045–1060. [Google Scholar] [CrossRef]
  27. Vrban, S.; Wang, Y.; McBean, E.A.; Binns, A.; Gharabaghi, B. Evaluation of stormwater infrastructure design storms de-veloped using partial duration and annual maximum series models. J. Hydrol. Eng. 2018, 23, 04018051. [Google Scholar] [CrossRef]
  28. Madsen, H.; Rasmussen, P.F.; Rosbjerg, D. Comparison of annual maximum series and partial duration series methods for modeling extreme hydrologic events: 1. At-site modeling. Water Resour. Res. 1997, 33, 747–757. [Google Scholar] [CrossRef]
  29. Davison, A.C. Modelling Excesses over High Thresholds, with an Application. In Statistical Extremes and Applications; de Oliveira, J.T., Ed.; Springer: Dordrecht, The Netherlands, 1984; Volume 131, pp. 461–482. [Google Scholar]
  30. Leadbetter, M.R. On a basis for ‘Peaks over Threshold’ modeling. Stat. Probab. Lett. 1991, 12, 357–362. [Google Scholar] [CrossRef]
  31. Lang, M.; Ouarda, T.B.M.J.; Bobee, B. Towards operational guidelines for over-threshold modeling. J. Hydrol. 1999, 225, 103–117. [Google Scholar] [CrossRef]
  32. Brabson, B.B.; Palutikof, J.P. Tests of the generalized Pareto distribution for predicting extreme wind speeds. J. Appl. Meteorol. Climatol. 2000, 39, 1627–1640. [Google Scholar] [CrossRef] [Green Version]
  33. Cebrián, A.C.; Denuit, M.; Lambert, P. Generalized Pareto fit to the society of actuaries’ large claims database. N. Am. Actuar. J. 2003, 7, 18–36. [Google Scholar] [CrossRef]
  34. Khaliq, M.N.; Ouarda, T.B.M.J.; Ondo, J.-C.; Gachon, P.; Bobee, B. Frequency analysis of sequence of dependent and/or non-stationary hydro-meteorological observations: A review. J. Hydrol. 2006, 329, 534–552. [Google Scholar] [CrossRef]
  35. Palutikof, J.P.; Brabson, B.B.; Lister, D.H.; Adcock, S.T. A review of methods to calculate extreme wind speeds. Meteorol. Appl. 1999, 6, 119–132. [Google Scholar] [CrossRef]
  36. Masseran, N.; Safari, M.A.M. Risk assessment of extreme air pollution based on partial duration series: IDF approach. Stoch. Environ. Res. Risk Assess. 2020, 34, 545–559. [Google Scholar] [CrossRef]
  37. Port Klang ICM Webpage. Available online: https://luas.gov.my/icm/knowledge_center/bckground_demogrphy.htm (accessed on 5 June 2021).
  38. Masseran, N. Power-law behaviors of the duration size of unhealthy air pollution events. Stoch. Environ. Res. Risk Assess. 2021, in press. [Google Scholar] [CrossRef]
  39. Masseran, N.; Safari, M.A.M. Modeling the transition behaviors of PM10 pollution index. Environ. Monit. Assess. 2020, 192, 441. [Google Scholar] [CrossRef]
  40. Google. 2019. Available online: https://maps.googleapis.com/maps/api/geocode/json?address=Klang%2CSelangor&key=xxx (accessed on 25 March 2019).
  41. Afroz, R.; Hassan, M.N.; Ibrahim, N.A. Review of air pollution and health impacts in Malaysia. Environ. Res. 2003, 92, 71–77. [Google Scholar] [CrossRef]
  42. Awang, M.B.; Jaafar, A.B.; Abdullah, A.M.; Ismail, M.B.; Hassan, M.N.; Abdullah, R.; Johan, S.; Noor, H. Air quality in Malaysia: Impacts, management issues and future challenges. Respirology 2000, 5, 183–196. [Google Scholar] [CrossRef] [PubMed]
  43. Azid, A.; Juahir, H.; Toriman, M.E.; Endut, A.; Kamarudin, M.K.A.; Abdul Rahman, M.N.; Che Hasnam, C.N.; Mohd Saudi, A.S.; Yunus, K. Source Apportionment of Air Pollution: A Case Study In Malaysia. J. Teknol. 2015, 72, 83–88. [Google Scholar] [CrossRef] [Green Version]
  44. Dominick, D.; Juahir, H.; Latif, M.T.; Zain, S.M.; Aris, A.Z. Spatial assessment of air quality patterns in Malaysia using multivariate analysis. Atmos. Environ. 2012, 60, 172–181. [Google Scholar] [CrossRef]
  45. Latif, M.T.; Othman, M.; Idris, N.; Juneng, L.; Abdullah, A.M.; Hamzah, W.P.; Khan, M.F.; Sulaiman, N.M.N.; Jewaratnam, J.; Aghamohammadi, N.; et al. Impact of regional haze towards air quality in Malaysia: A review. Atmos. Environ. 2018, 177, 28–44. [Google Scholar] [CrossRef]
  46. Jamhari, A.A.; Sahani, M.; Latif, M.T.; Chan, K.M.; Tan, H.S.; Khan, M.F.; Tahir, N.M. Concentration and source identification of polycyclic aromatic hydrocarbons (PAHs) in PM10 of urban, industrial and semi-urban areas in Malaysia. Atmos. Environ. 2014, 86, 16–27. [Google Scholar] [CrossRef]
  47. Department of Environment. A Guide to Air Pollutant Index in Malaysia (API); Ministry of Science, Technology and the Environment: Kuala Lumpur, Malaysia, 1997; Available online: https://aqicn.org/images/aqi-scales/malaysia-api-guide.pdf (accessed on 4 June 2020).
  48. Al-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H.; Safari, M.A.M. Modeling the Air Pollution Index based on its structure and descriptive status. Air Qual. Atmos. Health 2018, 11, 171–179. [Google Scholar] [CrossRef]
  49. AL-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H. Compositional time series analysis for air pollution index data. Stoch. Environ. Res. Risk Assess. 2018, 32, 2903–2911. [Google Scholar] [CrossRef]
  50. Department of Environment Malaysia. Available online: https://www.doe.gov.my/portalv1/en/ (accessed on 23 August 2020).
  51. Reiss, R.-D.; Thomas, M. Statistical Analysis of Extreme Values: With Application to Insurance, Finance, Hydrology and Other Fields; Die Deutsche Bibliothek: Berlin, Germany, 2007. [Google Scholar]
  52. Masseran, N.; Mohd Safari, M.A. Intensity–duration–frequency approach for risk assessment of air pollution events. J. Environ. Manag. 2020, 264, 110429. [Google Scholar] [CrossRef]
  53. Coles, S. An Introduction to Statistical Modeling of Extreme Values; Springer: London, UK, 2001. [Google Scholar]
  54. Hosking, J.R.M.; Wallis, J.R.; Wood, E.F. Estimation of the generalized extreme-value distribution by the method of probability weighted moments. Technometrics 1985, 27, 251–261. [Google Scholar] [CrossRef]
  55. Hosking, J.R.M. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B Methodol. 1990, 52, 105–124. [Google Scholar] [CrossRef]
  56. Wang, Q.J. Direct sample estimators of L moments. Water Resour. Res. 1996, 32, 3617–3619. [Google Scholar] [CrossRef]
  57. Martins, E.S.; Stedinger, J.R. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour. Res. 2000, 36, 737–744. [Google Scholar] [CrossRef]
  58. Katz, R.W.; Parlange, M.B.; Naveau, P. Statistics of extremes in hydrology. Adv. Water Resour. 2002, 25, 1287–1304. [Google Scholar] [CrossRef] [Green Version]
  59. Ribatet, M. POT: Modelling peak over a threshold. R News 2007, 7, 33–36. [Google Scholar]
  60. Southworth, H.; Heffernan, J.E. texmex: Statistical Modelling of Extreme Values; R Package, Version 2.1; Microsoft R Application Network: Redmond, WA, USA, 2014. [Google Scholar]
  61. Masseran, N.; Hussain, S.I. Copula modelling on the dynamic dependence structure of multiple air pollutant variables. Mathematics 2020, 8, 1910. [Google Scholar] [CrossRef]
  62. Zhou, S.-M.; Deng, Q.-H.; Lui, W.-W. Extreme air pollution events: Modeling and prediction. J. Cent. South Univ. 2012, 19, 1668–1672. [Google Scholar] [CrossRef]
  63. Ding, Y.; Cheng, B.; Jiang, Z. A newly-discovered GPD-GEV relationship together with comparing their models of extreme precipitation in summer. Adv. Atmos. Sci. 2008, 25, 507. [Google Scholar] [CrossRef]
  64. Karim, F.; Hasan, M.; Marvanek, S. Evaluating annual maximum and partial duration series for estimating frequency of small magnitude floods. Water 2017, 9, 481. [Google Scholar] [CrossRef] [Green Version]
  65. Hosking, J.R.M. Algorithm as 215: Maximum-likelihood estimation of the parameters of the generalized extreme-value distribution. J. R. Stat. Soc. Ser. C Appl. Stat. 1985, 34, 301–310. [Google Scholar] [CrossRef]
  66. Gilleland, E.; Katz, R.W. extRemes 2.0: An Extreme Value Analysis Package in R. J. Stat. Softw. 2016, 72, 1–39. [Google Scholar] [CrossRef] [Green Version]
  67. Othman, J.; Sahani, M.; Mahmud, M.; Ahmad, M.K.S. Transboundary smoke haze pollution in Malaysia: Inpatient health impacts and economic valuation. Environ. Pollut. 2014, 189, 194–201. [Google Scholar] [CrossRef] [PubMed]
  68. Department of Environment (DOE). Chronology of Haze Episodes in Malaysia. Available online: https://www.doe.gov.my/portalv1/wp-content/uploads/2015/09/Chronology-of-Haze-Episodes-in-Malaysia.pdf (accessed on 4 June 2021).
Figure 1. (a) Map of Peninsular Malaysia (Klang is identified by the red dot); (b) map of Klang.
Figure 2. The process of determining the API value.
Figure 3. Time series plot corresponding to unhealthy air pollution event threshold.
Figure 4. MRL plot corresponding to threshold u = 100.
Figure 5. Comparison of fitted models.
Figure 6. PP plot for each fitted model.
Figure 7. Autocorrelation function of data based on different extreme-value approaches.
Figure 8. Comparison of return level plot of fitted models.
Table 1. Air quality statuses corresponding to API values.
Pollution Index | Status | Health Effect
0–50 | Good | Low pollution with no ill effects on health
51–100 | Moderate | Moderate pollution that poses no ill effects on health
101–200 | Unhealthy | Worsens the health conditions of high-risk individuals with heart and lung complications
201–300 | Very unhealthy | Worsens the health conditions and reduces the tolerance to physical exercise of individuals with heart and lung complications; affects public health
>300 | Hazardous | Hazardous to high-risk individuals and public health in general
Table 2. Descriptive statistics of API data in the Klang area.
Location | Mean | Standard Deviation | Min. Value | Max. Value | Skewness | Kurtosis
Klang | 55.222 | 20.970 | 0 | 543 | 4.537 | 65.133
Table 3. Results of parameter estimation for each fitted extreme-value model.
Model | Parameter Estimated | Standard Error
GEV based on BM | Location = 143.805 | 13.293
GEV based on BM | Scale = 46.404 | 17.071
GEV based on BM | Shape = 0.415 | 0.127
GPD based on POT | Shape = 0.2933 | 0.017
GPD based on POT | Scale = 23.7401 | 0.511
GEV approximation based on POT-BM | Location = 110.863 | 1.234
GEV approximation based on POT-BM | Scale = 12.911 | 1.376
GEV approximation based on POT-BM | Shape = 0.788 | 0.118
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Masseran, N.; Safari, M.A.M. Mixed POT-BM Approach for Modeling Unhealthy Air Pollution Events. Int. J. Environ. Res. Public Health 2021, 18, 6754. https://doi.org/10.3390/ijerph18136754
