Applying Machine Learning for Threshold Selection in Drought Early Warning System

Luo, Hui; Bhardwaj, Jessica; Choy, Suelynn; Kuleshov, Yuriy

doi:10.3390/cli10070097

¹ SPACE Research Centre, School of Science, Royal Melbourne Institute of Technology (RMIT) University, Melbourne 3000, Australia
² Bureau of Meteorology, Docklands 3008, Australia
* Author to whom correspondence should be addressed.

Climate 2022, 10(7), 97; https://doi.org/10.3390/cli10070097

Submission received: 26 May 2022 / Revised: 24 June 2022 / Accepted: 27 June 2022 / Published: 30 June 2022

(This article belongs to the Special Issue Climate Change, Sustainable Development and Disaster Risks)

Abstract

This study investigates the relationship between the Normalized Difference Vegetation Index (NDVI) and meteorological drought category to identify NDVI thresholds that correspond to different drought categories. The gridded evaluation was performed across a 34-year period from 1982 to 2016 on a monthly time scale for the Grassland and Temperate regions of Australia. To label the drought category of each grid cell within a climate zone, we use the Australian Gridded Climate Dataset (AGCD) across a 120-year period from 1900 to 2020 on a monthly scale and calculate the percentiles corresponding to drought categories. The drought category classification model takes NDVI data as input and outputs a drought category. We then propose a threshold selection algorithm to identify the NDVI threshold that marks the boundary between two adjacent drought categories. The performance of the drought category classification model is evaluated using the accuracy metric, and visual interpretation is performed using heat maps. The drought classification model provides a concept for evaluating drought severity, as well as the relationship between NDVI data and drought severity. The results of this study demonstrate the potential application of this concept in drought early warning systems.

Keywords:

drought classification; machine learning; normalized difference vegetation index; Australian gridded climate dataset

1. Introduction

Drought is a costly natural hazard with widespread impacts on societies, economies, and the environment [1,2]. In most cases, the effects of drought can accumulate and linger for many years after the event has ended [3]. It is therefore imperative to build drought early warning systems that support local stakeholders in making effective decisions for drought-vulnerable communities [4].

In general terms, drought results from a rainfall deficiency over a long dry period. However, there is no uniform definition of drought. Types of drought include meteorological drought (when dry conditions within an area are driven by climatic factors) and agricultural drought (when dry conditions affect agricultural productivity and crops), among others. To measure drought severity, numerous indices have been proposed based on meteorological, hydrological, agricultural, and socioeconomic information [5,6,7,8]. One of the most commonly used indices for assessing meteorological drought is the Standardized Precipitation Index (SPI) [11], which expresses precipitation anomalies as the number of standard deviations from the long-term mean. The SPI has been adopted as a universal drought index by numerous hydrological and meteorological services [12]. Aitkenhead et al. [8] develop a region-specific drought risk index that integrates drought vulnerability, exposure, and hazard indices to evaluate drought in the Northern Murray–Darling Basin. Vicente-Serrano et al. [9] propose a climatic drought index, the standardized precipitation evapotranspiration index (SPEI), based on precipitation and temperature data. Sayari et al. [10] use three drought indices, the SPI, the Precipitation Index Percent of Normal (PNPI), and the Agricultural Rainfall Index (ARI), to monitor drought intensity and duration in the Kashafrood basin in northeast Iran. From the perspective of agricultural applications, the Normalized Difference Vegetation Index (NDVI) [13] provides an indication of crop health. Chua et al. [14] present a case study illustrating that the Vegetation Health Index (VHI), the outgoing longwave radiation (OLR) anomaly, and the SPI accurately capture the spatial and temporal aspects of the severe 2015–2016 El Niño-induced drought in Papua New Guinea. Lotsch et al. [15] use time series of global precipitation and satellite NDVI data to study the spatial and temporal variability between terrestrial ecosystems and precipitation regimes.

In this study, we examine the relationship between NDVI and drought category. NDVI is related to the biomass and greenness of vegetation and serves as an indicator of vegetation health at a consistent, global scale [16,17]. Our problem can be formulated as follows: given the NDVI data and the Australian Gridded Climate Dataset (AGCD), return the optimal NDVI threshold values that represent the boundaries between drought categories, namely, extreme drought, severe drought, moderate drought, mild drought, and no drought. Specifically, we aim to find NDVI threshold values that explain why a larger (or smaller) NDVI value corresponds to a different drought category, and to determine whether these thresholds vary across climate regions.

A drought classification model can be built from the available datasets to further explore the factors that influence drought category classification [6,18,19]. Machine learning techniques are often adopted for drought classification. In particular, Santos et al. [20] analyse short-, medium-, and long-term droughts in Brazil from 1998 to 2015 based on SPI data, studying four types of drought events as drought categories. An et al. [21] propose a deep convolutional neural network to classify maize drought stress into three categories. Lima et al. [22] develop a classification system to evaluate four drought indices: the SPI, Percent of Normal Precipitation (PNP), the Deciles Method (DM), and the Rainfall Anomaly Index (RAI). Felsche et al. [23] use 30 atmospheric and soil variables and apply an artificial neural network to classify drought versus no drought in two European domains. They then use Shapley values to explain the classification model and calculate the percentage contribution of each input dataset to the model. Quang Tri et al. [24] establish drought classification maps for the Ba River basin, Vietnam, based on the SPI, the Soil and Water Assessment Tool (SWAT) model, and a hydrological drought index. Vidyarthi et al. [25] extract knowledge from an artificial neural network (ANN) drought classification model to improve the comprehensibility of this black-box model. Moreira et al. [26] classify drought severity using loglinear models based on 12-month SPI data from 14 rainfall stations, covering the period from September 1932 to June 2006. Rani et al. [27] devise a classifier to assess drought severity by predicting climate conditions, focusing on the drought-vulnerable Indian state of Andhra Pradesh. Their model couples a feed-forward ANN, which predicts future rainfall, with fuzzy c-means clustering, which partitions the forecast data into low, medium, and high rainfall. Danandeh Mehr [28] adopts gradient-boosting decision trees to classify drought severity into three categories: wet, normal, and dry events. Danandeh Mehr et al. [7] present a fuzzy random forest model for meteorological drought classification using the SPEI from 1961 to 2015 in the Central Antalya Basin, Turkey. Won et al. [29] study the impact of two drought indices, the SPI and the Evaporative Demand Drought Index (EDDI), on future droughts in South Korea. Paulo et al. [30] propose a Markov chain approach to characterise the stochasticity of droughts, aiming to predict transitions from one severity class to another up to three months ahead. Chiang et al. [31] compare four models, including support vector machines and artificial neural networks, for forecasting reservoir drought status over the next few days, using four kinds of input features, such as reservoir storage capacity and inflows. Malik et al. [32] propose several heuristic approaches to forecast meteorological droughts using the Effective Drought Index (EDI) in Uttarakhand State, India.

Drought classification is an active and heavily researched area of machine learning (ML) application. When specifying drought thresholds, prior work [4] often relies on human-defined decision rules with static thresholds. Such rules require rich domain knowledge from experts and can be subjective. The objectives of this work are: (1) uncovering the relationship between NDVI and meteorological drought category (derived from the AGCD dataset); (2) applying our drought classification model in two areas of Australia with different climate zones, namely, the Temperate and Grassland regions; and (3) pinpointing the “optimal” NDVI thresholds that separate the drought categories.

The paper is organised as follows. Section 2 introduces the study area, data source, and methodology in the study. Section 3 describes the experimental setup and experimental results, while Section 4 discusses the results of drought classification findings, as well as their geospatial and temporal impact. Section 5 summarises the major observations and findings.

2. Materials and Methods

2.1. The Study Area

To study the relationship between NDVI data and drought category, we selected two areas of Australia with different climate zones: the Temperate Region in Victoria and the Grassland Region in New South Wales (NSW). In the temperate region in Victoria, annual rainfall ranges from 300 to 800 mm. Rainfall is distinctly seasonal: summers tend to be mild to warm, with rainfall averages usually below 100 mm, and winters tend to be markedly wet, with rainfall between 100 and 500 mm. The most common land uses in this region are grazing modified pastures, dryland cropping, and production forestry. In the grassland region in NSW, annual rainfall ranges from 50 to 400 mm. Summers tend to be hot and dry, with rainfall averages between 25 and 50 mm, and winters are also comparatively dry, with rainfall ranging from 25 to 100 mm. The most common land uses in this region are grazing native vegetation and grazing modified pastures (https://www.awe.gov.au/abares/aclump/land-use) (accessed on 10 May 2022). The main orographic feature in Australia is the Great Dividing Range, which borders the country’s eastern seaboard and comprises a complex range of hills, mountains, and plateaus ranging in altitude from 300 to 2100 m. In both study regions, however, the topography is relatively flat, with maximum elevations below 250 m (http://www.bom.gov.au/cgi-bin/climate/change/averagemaps.cgi?map=rain&season=0608) (accessed on 22 May 2022). The latitude and longitude ranges of the two climate zones are given in Table 1, and their locations are shown in Figure 1, where the two study regions are highlighted in red and green, respectively.

2.2. Data Source

We adopt two datasets, AGCD and NDVI, which are summarised in Table 2 and described in detail below.

2.2.1. Australian Gridded Climate Data (AGCD)

The AGCD is the Australian Bureau of Meteorology’s operational dataset for monthly gridded rainfall analysis [33] (AGCD can be accessed from https://dapds00.nci.org.au/thredds/catalog/zv2/agcd/v2/precip/total/catalog.html) (accessed on 12 May 2022). It provides precipitation estimates at a very fine spatial resolution by applying statistical interpolation methods to rain gauge data.

2.2.2. Normalised Difference Vegetation Index (NDVI)

The NDVI is a widely used remote-sensing index (https://www.usgs.gov/centers/eros/science/usgs-eros-archive-avhrr-normalized-difference-vegetation-index-ndvi-composites) (accessed on 4 May 2022) computed from the red band reflectance (RED, around 640 nm) and the near-infrared band reflectance (NIR, around 830 nm) as the normalised difference $\mathrm{NDVI} = (\mathrm{NIR} - \mathrm{RED})/(\mathrm{NIR} + \mathrm{RED})$, so its values lie between −1 and 1 (NDVI data were obtained from https://www.ncei.noaa.gov/data/avhrr-land-normalized-difference-vegetation-index/access/) (accessed on 24 May 2022).
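To make the normalised-difference formula concrete, here is a minimal Python sketch for a single pixel. The function name ndvi and the sample reflectances are illustrative assumptions rather than code or values from this study; the NDVI used here was obtained ready-made from NCEI, so the sketch only illustrates the formula.

```python
def ndvi(nir: float, red: float) -> float:
    """Return (NIR - RED) / (NIR + RED) for one pixel.

    For non-negative reflectances the result lies in [-1, 1]. Dense green
    vegetation reflects strongly in NIR and absorbs red light, pushing NDVI
    towards 1.
    """
    denominator = nir + red
    if denominator == 0:
        # Both reflectances are zero, so the index is undefined.
        raise ValueError("NIR + RED must be non-zero")
    return (nir - red) / denominator


# Illustrative (made-up) reflectances, not values from the study.
print(round(ndvi(0.50, 0.08), 3))  # 0.724, a well-vegetated pixel
print(round(ndvi(0.20, 0.18), 3))  # 0.053, sparse or stressed vegetation
```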

2.3. Methodology

Next, we introduce our solution to obtain a drought classification model using real NDVI and AGCD datasets.

2.3.1. Method Overview

In this section, we describe the proposed three-phase approach: data wrangling, model training, and threshold selection, presented in Section 2.3.2, Section 2.3.3, and Section 2.3.4, respectively. The first phase, data wrangling, processes the datasets to obtain the training dataset for the second phase. In the second phase, model training, we build a drought category classifier using the training dataset. In the third phase, we propose a threshold selection algorithm to obtain four NDVI threshold values, $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$, which indicate the boundaries between five drought categories: extreme drought, severe drought, moderate drought, mild drought, and no drought. The framework of our solution is illustrated in Figure 2.

2.3.2. Data Wrangling

In this stage, we are given the two input datasets, “NDVI” and “AGCD”, which are further pre-processed by data wrangling to obtain two output datasets “monthly averaged NDVI” and “labelled drought category”.

To obtain the first output dataset, “monthly averaged NDVI”, we calculated the average NDVI value for each month, since the raw NDVI data are provided on a daily basis. To obtain the second output dataset, “labelled drought category”, we first defined levels of drought category to denote varying drought severity, because drought classification requires a quantitative measure of drought conditions. Following previous studies [13,34,35], we divided drought severity into five categories: extreme drought, severe drought, moderate drought, mild drought, and no drought. We then used AGCD percentile values to label the drought category, according to the rules in Table 3.
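The monthly averaging step can be sketched as follows, assuming the daily NDVI values have already been extracted as (date, grid cell, value) records. The record layout, the function name monthly_mean_ndvi, and the choice to skip NaN (masked) values are assumptions for illustration; the text does not state how missing days were handled.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_mean_ndvi(records):
    """Average daily NDVI into monthly values for each grid cell.

    records: iterable of (day, cell, value), where day is a datetime.date.
    Returns {(year, month, cell): mean NDVI}. NaN values (e.g. masked
    pixels) are skipped, which is an assumption of this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, cell, value in records:
        if math.isnan(value):
            continue
        key = (day.year, day.month, cell)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical daily records for one grid cell.
daily = [(date(2015, 1, 1), "cell_a", 0.30), (date(2015, 1, 2), "cell_a", 0.40),
         (date(2015, 1, 3), "cell_a", float("nan")), (date(2015, 2, 1), "cell_a", 0.20)]
print(monthly_mean_ndvi(daily))  # January: mean of 0.30 and 0.40; February: 0.20
```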

The percentile indicates the ranking of a particular value compared with all values in an area (http://www.bom.gov.au/jsp/ncc/climate_averages/rainfall-percentiles/index.jsp#how) (accessed on 10 May 2022). For instance, if 10 rainfall values are available for an area, we first sort them in ascending order. The k-th percentile, where k ranges from 0 to 100, is then the value below which approximately k% of the sorted values fall; for example, the second-smallest of the 10 values lies near the 20th percentile.
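The sketch below shows one common percentile definition, the nearest-rank method, applied to ten made-up rainfall values. This definition is an assumption for illustration: the text does not specify which definition was used, and implementations that interpolate between ranks (NumPy's percentile function does so by default) can return slightly different values.

```python
import math


def nearest_rank_percentile(values, k):
    """k-th percentile (0 <= k <= 100) using the nearest-rank method.

    Sort ascending and return the value at rank ceil(k / 100 * n);
    k = 0 returns the minimum.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


# Ten made-up monthly rainfall totals for one area.
rain = [42.0, 3.5, 18.2, 7.9, 55.1, 12.4, 0.8, 30.6, 25.0, 9.3]
print(nearest_rank_percentile(rain, 20))  # 3.5, the second-smallest value
```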

In our case, the AGCD percentiles were calculated from the AGCD data for 1900 to 2020: for each month, we collected the AGCD values over the 120 years and calculated the 5th, 10th, 20th, and 30th percentiles. The results for the 12 months in the two study areas are shown in Table 4 and Table 5, respectively. For example, an AGCD value of 10 in January in the Temperate Region of Australia is labelled “severe drought”, since 10 lies between the 5th-percentile value of 6.97 and the 10th-percentile value of 12.15 in Table 4.
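Assuming the category boundaries implied by the 5th, 10th, 20th, and 30th percentiles (Table 3 gives the authoritative rules), the labelling step can be sketched as follows. The function name label_drought and the treatment of values lying exactly on a percentile boundary are illustrative assumptions.

```python
from bisect import bisect_right

CATEGORIES = ("extreme drought", "severe drought", "moderate drought",
              "mild drought", "no drought")


def label_drought(value, p5, p10, p20, p30):
    """Map a monthly AGCD value to a drought category via percentile cut-offs.

    Assumed rule: below P5 -> extreme, [P5, P10) -> severe,
    [P10, P20) -> moderate, [P20, P30) -> mild, P30 and above -> no drought.
    """
    cutoffs = (p5, p10, p20, p30)
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("percentile cut-offs must be non-decreasing")
    # bisect_right counts how many cut-offs are <= value, i.e. the category index.
    return CATEGORIES[bisect_right(cutoffs, value)]


# January, Temperate Region: P5 = 6.97 and P10 = 12.15 come from the text.
# The P20 and P30 arguments are placeholders; any values >= 12.15 give the
# same result for this input.
print(label_drought(10, 6.97, 12.15, 20.0, 30.0))  # severe drought
```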

Additionally, Figure 3 shows, as an example, the distribution of NDVI values for the five drought categories in January 2015 in the Temperate Region of Australia. Figure 3 is purely descriptive: it shows the actual number of data points at different NDVI values. We found that the NDVI values of the five drought categories overlap substantially. To alleviate this issue, we defined a “safe interval” for each drought category. A safe interval [a, b] for a particular drought category is specified by a lower bound a and an upper bound b, and only the grids whose NDVI falls inside the safe interval were selected for the training dataset. To obtain the bounds a and b, we first calculated the maximal NDVI value for each drought category. The upper bound b of a category was then taken as the average of the maximal values of that category and the adjacent, less severe category, and the lower bound a of a category is equal to the upper bound b of the adjacent, more severe category. For example, if the maximal NDVI values of extreme, severe, moderate, and mild drought are 0.1, 0.2, 0.3, and 0.4, respectively, some data points labelled moderate drought and mild drought are unreasonable because their NDVI values overlap. We therefore set the upper bound of the safe interval for moderate drought to (0.3 + 0.4)/2 = 0.35 and cut off the data points labelled moderate drought with NDVI values larger than 0.35.
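The safe-interval rule can be sketched as follows. The category keys, the outer bounds of −1 and 1 for extreme drought and no drought, and the inclusive interval ends are assumptions of this sketch. The no-drought maximal value in the usage line is hypothetical, because the example in the text gives only the four drought categories. Since the mild interval starts where the moderate interval ends, the filter also drops mild-drought points below 0.35.

```python
ORDER = ("extreme", "severe", "moderate", "mild", "none")  # most to least severe


def safe_intervals(max_ndvi, ndvi_min=-1.0, ndvi_max=1.0):
    """Return {category: (a, b)} using the averaging rule described in the text.

    max_ndvi maps every category in ORDER to its maximal NDVI value. The outer
    bounds ndvi_min (extreme drought) and ndvi_max (no drought) are assumptions.
    """
    intervals = {}
    lower = ndvi_min
    for i, category in enumerate(ORDER):
        if i + 1 < len(ORDER):
            # b = mean of this category's maximum and the next category's maximum
            upper = (max_ndvi[category] + max_ndvi[ORDER[i + 1]]) / 2
        else:
            upper = ndvi_max
        intervals[category] = (lower, upper)
        lower = upper  # a of the next category equals b of this one
    return intervals


def filter_training_points(points, intervals):
    """Keep (ndvi, category) pairs whose NDVI lies inside that category's safe interval."""
    return [(x, c) for x, c in points if intervals[c][0] <= x <= intervals[c][1]]


# Maxima from the example in the text; the no-drought value 0.9 is hypothetical.
bounds = safe_intervals({"extreme": 0.1, "severe": 0.2, "moderate": 0.3,
                         "mild": 0.4, "none": 0.9})
points = [(0.36, "moderate"), (0.30, "moderate"), (0.34, "mild"), (0.38, "mild")]
print(filter_training_points(points, bounds))  # [(0.3, 'moderate'), (0.38, 'mild')]
```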

2.3.3. Model Training

After obtaining the two datasets, “monthly averaged NDVI” and “labelled drought category”, from the first stage, we fitted the model. We used a decision tree to build the drought category classification model, which, given an NDVI value, outputs one of five categories: extreme drought, severe drought, moderate drought, mild drought, or no drought. A decision tree repeatedly divides the dataset into smaller subsets, forming a tree structure. The resulting tree consists of decision nodes (the conditions, shown in red rectangles) and leaf nodes (the drought categories, shown in green rectangles). A decision node generates two or more branches, each representing a range of values of the attribute being tested. The topmost decision node in a tree is called the root node. A leaf node represents the final decision, that is, the drought category reached by following the branches from the root.
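To illustrate how a single-feature decision tree maps NDVI to a drought category, here is a small from-scratch sketch that uses Gini impurity to choose splits. The split criterion, the maximum depth, and the synthetic training points are assumptions, since the text does not state the tree's hyperparameters or implementation. In practice, a library implementation such as scikit-learn's DecisionTreeClassifier would typically be used instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return the NDVI threshold minimising weighted Gini impurity, or None."""
    pairs = sorted(zip(xs, ys))
    n, best = len(pairs), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal NDVI values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return None if best is None else best[1]


def build_tree(xs, ys, depth=0, max_depth=4):
    """Leaf: a category string. Decision node: (threshold, left, right)."""
    majority = Counter(ys).most_common(1)[0][0]
    if depth >= max_depth or len(set(ys)) == 1:
        return majority
    threshold = best_split(xs, ys)
    if threshold is None:
        return majority
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return majority  # guard against a degenerate split
    return (
        threshold,
        build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, x):
    """Follow decision nodes from the root until a leaf (category) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# Synthetic, hypothetical training points (monthly NDVI, drought category).
xs = [0.05, 0.08, 0.15, 0.18, 0.25, 0.28, 0.35, 0.38, 0.50, 0.60]
ys = ["extreme", "extreme", "severe", "severe", "moderate",
      "moderate", "mild", "mild", "none", "none"]
tree = build_tree(xs, ys)
print(predict(tree, 0.20))  # severe
```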

2.3.4. Threshold Selection

Based on the drought classification model trained in the second stage, we further analysed the model to pinpoint the NDVI threshold values. To this end, we proposed a threshold selection algorithm. The intuition is that we can pass a sequence of NDVI values through the classification model and record the drought category predicted for each value. The NDVI thresholds are the values at which the predicted category changes as NDVI increases, for example, from moderate drought to mild drought.
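The intuition above can be sketched as a scan that records where the predicted category changes. The toy_model below is hypothetical: it is a step function whose cut-offs mirror the January values reported for the Temperate Region, standing in for a trained classifier. For a model whose category changes monotonically with NDVI, each threshold obtained by Algorithm A1 (the largest NDVI assigned to a category) equals the corresponding transition point minus one step. Unlike taking the maximum of each list, the scan also exposes non-monotone behaviour, where a category reappears further along the NDVI axis.

```python
def category_transitions(model, step=0.001):
    """Scan NDVI from -1 to 1 and record where the predicted category changes.

    model: callable mapping one NDVI value to a category label (assumed interface).
    Returns (x, previous_category, new_category) tuples, where x is the first
    scanned value at which the new category is predicted.
    """
    n_steps = round(2 / step)
    previous = model(-1.0)
    transitions = []
    for i in range(1, n_steps + 1):
        # Integer counter plus rounding avoids drift from repeated `x += step`.
        x = round(-1.0 + i * step, 10)
        current = model(x)
        if current != previous:
            transitions.append((x, previous, current))
            previous = current
    return transitions


def toy_model(x):
    """Hypothetical stand-in for a trained classifier (a simple step function)."""
    if x <= 0.00:
        return "extreme"
    if x <= 0.03:
        return "severe"
    if x <= 0.06:
        return "moderate"
    if x <= 0.22:
        return "mild"
    return "none"


for x, old, new in category_transitions(toy_model):
    print(f"{old} -> {new} at NDVI {x}")
```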

The pseudo-code of the threshold selection algorithm is shown in Algorithm A1 (see Appendix A). We initialise an NDVI value x to −1, the lower bound of the NDVI range, and create four empty lists to collect the NDVI values that the model assigns to extreme, severe, moderate, and mild drought, respectively. No list is needed for no drought, because the largest value in the mild-drought list already marks the boundary with no drought. We then increase x from −1 to 1 in steps of 0.001; at each step, we obtain the drought category c predicted by the pre-trained classification model from the second stage and append x to the list for that category. Finally, we take the maximum value of each of the four lists as the NDVI thresholds $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$, which separate the drought categories from extreme drought to no drought.

3. Results

In this section, we evaluate the proposed framework experimentally on the real datasets. The experiments address the following questions:

Q1. How does the drought classification model perform in terms of accuracy?
Q2. What NDVI threshold values does the threshold selection algorithm yield in the two study areas?

3.1. Performance Measurement

We evaluate the effectiveness of the proposed method using accuracy, defined as the number of correct classifications divided by the total number of test samples.

3.2. Experimental Results

3.2.1. Model performance (Q1)

In our experiment, the data from 1982–2014 were used for training, and the data from 2015–2016 were used for testing. The accuracy of the drought classification model for the two study areas is reported in Table 6 and Table 7, respectively. When computing accuracy, we take the total number of grids with drought as the total number of test samples and the number of correctly classified grids as the number of correct classifications.
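Per-month accuracy can be sketched as below, assuming each test grid is available as a (month, true category, predicted category) record. The record layout and the function name monthly_accuracy are hypothetical, and which grids enter the denominator follows the rule stated above and is left to the caller.

```python
from collections import defaultdict


def monthly_accuracy(records):
    """Accuracy per month: correctly classified grids / grids evaluated.

    records: iterable of (month, true_category, predicted_category),
    one record per test grid (a hypothetical layout).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for month, truth, predicted in records:
        total[month] += 1
        correct[month] += int(truth == predicted)
    return {month: correct[month] / total[month] for month in total}


# Made-up test grids, not results from the study.
test_grids = [("2015-05", "mild", "mild"), ("2015-05", "severe", "mild"),
              ("2015-05", "moderate", "moderate"), ("2015-06", "extreme", "mild")]
print(monthly_accuracy(test_grids))  # May: 2 of 3 grids correct; June: 0 of 1
```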

In addition, Figure 4 shows heat maps of the geolocated grids for the Temperate Region of Australia for the 12 months of 2015; heat maps for the remaining region and year combinations are given in Appendix B. For the Temperate Region of Australia, the y-axis spans latitudes from −39.30° to −36.75° and the x-axis spans longitudes from 141.00° to 150.10°, both in increments of 0.05°. For the Grassland Region of Australia, the y-axis spans latitudes from −35.85° to −33.66° and the x-axis spans longitudes from 141.00° to 145.56°, again in increments of 0.05°.

In the heat maps, two colours distinguish the two cases:

Cream, which denotes grids that are not classified correctly;
Red, which indicates grids that are classified correctly.

3.2.2. NDVI threshold values (Q2)

We report the NDVI ranges for the drought categories in each of the 12 months of 2015 for the two climate zones in Table 8 and Table 9, respectively. For example, in January in the Temperate Region of Victoria, Australia, the NDVI threshold values $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$ are 0.00, 0.03, 0.06, and 0.22, respectively.

4. Discussion

4.1. Drought Classification Performance Impacted by the Data Noise

As shown in Table 6 and Table 7, the accuracies in the two study areas are modest, since a high percentage of the data are not labelled consistently with their NDVI values when the training data are generated in the data wrangling phase. For the Temperate Region of Australia, the highest accuracy of the drought classification model is 28.73%, in May 2015. For the Grassland Region of Australia, the highest accuracy is 71.69%, in October 2016. For comparison, Felsche et al. [23] report a drought classifier precision of around 22% for Lisbon and 18% for Munich, although precision and accuracy are not directly comparable metrics. Peters et al. [36] use z-scores of the NDVI distribution, computed from weekly NDVI values, to estimate the probability that vegetation condition deviates from its normal status; however, this method is mainly a good indicator of short-term weather conditions. Our method, by contrast, can classify longer-term drought severity, since the model draws on 34 years of drought information. In addition, the drought classification model allows the NDVI threshold values to be analysed easily and quickly.

Based on the results, we find that a major challenge for a drought classifier is handling NDVI values that are inconsistent with the assigned drought category, including the no-drought category [27]. We refer to such inconsistent NDVI data as data noise (i.e., outliers in the model training), and this noise is also a limitation of our solution. Intuitively, the greener the vegetation, the less severe the drought. However, some inconsistent or noisy data points may destabilise the drought classifier. As shown in Figure 3, which presents the distribution of NDVI values for the five drought categories in January 2015 in the Temperate Region of Victoria, the NDVI distributions of the grids are similar across categories, and the NDVI values of the five categories overlap heavily. For example, an NDVI value labelled moderate drought may be smaller than one labelled extreme drought. This issue should be addressed to improve classification accuracy. It would therefore be worthwhile to study how to reduce such cases or to find a more accurate way to obtain the actual drought category associated with NDVI values.

4.2. Drought Classification Model Limited by Inputs

Our drought classifier is trained on only two kinds of data, the NDVI data and the AGCD data. On the one hand, the drought category is labelled purely by comparing the monthly AGCD data with the percentile information. As a result, many data points have relatively high NDVI values but a more severe drought label. On the other hand, drought conditions are affected by multiple factors, such as temperature and soil moisture, whereas we use only NDVI data as the input to the classifier. We recognise that relying solely on NDVI data limits the performance of the drought classification model. Mishra et al. [37] present fundamental concepts of drought indices, such as the SPI, the Palmer Drought Severity Index (PDSI), and the Crop Moisture Index (CMI). Mo et al. [38] propose a drought classification model based on the mean of the SPI, total soil moisture (SM) percentiles, and the standardized runoff index (SRI). Sen et al. [39] present a novel Actual Precipitation Index (API) for drought classification that reflects actual conditions based on original hydrometeorological records. Hao et al. [40] explore a drought classification approach for a multivariate drought index that combines the SPI, the Standardized Soil Moisture Index (SSI), and the SRI.

4.3. Geospatial and Temporal Impact

The accuracies reported in Table 6 and Table 7 show that the drought classification model performs better in the studied Grassland Region of NSW than in the Temperate Region of Victoria in most cases. In terms of geographical size, the Temperate Region is larger than the Grassland Region (Table 1); however, both areas were selected manually to represent two different climate zones. As shown in Table 8 and Table 9, the NDVI boundaries of the drought categories in the Grassland Region are larger than those in the Temperate Region, owing to the different climate conditions. Temporally, the classification performance in 2016 is superior to that in 2015. The NDVI ranges also vary from month to month in both study areas (Table 8 and Table 9). For example, the NDVI values for extreme drought are relatively large in March 2015 in the Grassland Region.

5. Conclusions

In this study, we investigate the relationship between NDVI values and drought categories across a 34-year period from 1982 to 2016 on a monthly time scale, in a gridded manner, for two areas with different climate zones: the Temperate Region of Victoria, Australia, and the Grassland Region of NSW, Australia. We then propose a threshold selection algorithm to identify the NDVI threshold values that separate the drought categories. The analysis yields NDVI boundary values that distinguish different drought severity categories. The results are promising, as they indicate that NDVI thresholds are not fixed but vary across climate regions and drought categories. The accuracy of the classifier would be a concern if it were incorporated into a drought early warning system; nonetheless, this study provides a promising initial proof of concept that illustrates the value of including region-specific vegetation health datasets. For the wider research community, our model offers a way to gauge drought severity and an important basis for understanding the relationship between NDVI data and drought severity in future drought classification models, and it sheds light on how a single input (NDVI) relates to drought severity category across the 12 months of the year. However, our method still requires further improvement to obtain more stable drought severity categories and to increase the accuracy of the classification model.

Author Contributions

Conceptualization, H.L., J.B., S.C. and Y.K.; methodology, H.L. and J.B.; software, H.L. and J.B.; validation, H.L.; formal analysis, H.L.; investigation, H.L. and J.B.; resources, S.C.; data curation, H.L. and J.B.; writing—original draft preparation, H.L.; writing—review and editing, H.L., J.B., S.C. and Y.K.; visualization, H.L.; supervision, S.C. and Y.K.; project administration, S.C. and Y.K.; funding acquisition, S.C. and Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

AGCD data sourced from the Bureau of Meteorology (BoM). NDVI data provided by the National Centers for Environmental Information (NCEI).

Acknowledgments

We would like to acknowledge the BoM and the NCEI for making the data available for research.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The threshold selection algorithm is given below.

Algorithm A1: Threshold Selection Algorithm

Input: a pre-trained drought classification model
Output: four thresholds $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$
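def select_thresholds(model, step=0.001):
    # model: the pre-trained classifier wrapped as a callable mapping one NDVI value
    # to 'extreme', 'severe', 'moderate', 'mild' or a no-drought label (assumed
    # interface; for a scikit-learn classifier: lambda x: clf.predict([[x]])[0])
    lists = {"extreme": [], "severe": [], "moderate": [], "mild": []}  # list_1..list_4
    n_steps = round(2 / step)
    for i in range(n_steps + 1):
        x = -1 + i * step   # sweep NDVI from -1 to 1; integer counter avoids float drift
        c = model(x)        # drought category predicted for NDVI value x
        if c in lists:      # labels outside the four lists (no drought) are ignored
            lists[c].append(x)
    # tau_1..tau_4: largest NDVI assigned to each category (None if never predicted)
    return tuple(max(v) if v else None for v in lists.values())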

Appendix B

Additional heat maps of the geolocated grids for the 12 months are shown in Figure A1 (Temperate Region of Australia, 2016), Figure A2 (Grassland Region of Australia, 2015), and Figure A3 (Grassland Region of Australia, 2016).

Figure A1. Heat maps of the geolocated grids for the Temperate Region of Australia for the 12 months of 2016. Cream denotes grids that are not classified correctly; red indicates grids that are classified correctly. Panels (a–l) correspond to the 12 months.

Figure A2. Heat maps of the geolocated grids for the Grassland Region of Australia for the 12 months of 2015. Cream denotes grids that are not classified correctly; red indicates grids that are classified correctly. Panels (a–l) correspond to the 12 months.

Figure A3. Heat maps of the geolocated grids for the Grassland Region of Australia for the 12 months of 2016. Cream denotes grids that are not classified correctly; red indicates grids that are classified correctly. Panels (a–l) correspond to the 12 months.


Figure 1. Geographical visualization of the Temperate Region of Australia (highlighted in red) and the Grassland Region of Australia (highlighted in green).

Figure 2. The framework of our solution. Bold blue text denotes the output of each phase.

Figure 3. The statistical distribution of NDVI values for the five drought categories in January 2015 for the Temperate Region of Australia. Panels (a–e) show the distribution for each of the five categories.

Figure 4. Heatmaps of the geolocated grids for the Temperate Region of Australia for the 12 months of 2015. Cream denotes grids that are classified incorrectly; red denotes grids that are classified correctly. Panels (a–l) correspond to the 12 months.

Table 1. Latitude and longitude ranges of the two climate zones used in our experiments.

Area	Latitude Range	Longitude Range
Temperate Region of Australia (Victoria)	[−36.75°, −39.30°]	[141.00°, 150.10°]
Grassland Region of Australia (NSW)	[−33.66°, −35.85°]	[141.00°, 145.56°]
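Table 1 lists only bounding ranges, whereas the climate zones in Figure 1 are irregular. As a simplified sketch, each box can be turned into a boolean mask over a 0.05° grid. The grid extent below is a hypothetical choice, and a faithful mask would instead be rasterised from the climate-zone boundaries.

```python
import numpy as np

# Table 1 extents as (south, north) latitude and (west, east) longitude, in degrees
REGIONS = {
    "Temperate (Victoria)": ((-39.30, -36.75), (141.00, 150.10)),
    "Grassland (NSW)": ((-35.85, -33.66), (141.00, 145.56)),
}

# Hypothetical 0.05-degree grid-cell centres covering south-eastern Australia
lats = np.linspace(-40.0, -33.0, 141)
lons = np.linspace(140.0, 151.0, 221)


def region_mask(name: str) -> np.ndarray:
    """Boolean (lat, lon) mask of cells whose centres fall inside the region's bounding box."""
    (south, north), (west, east) = REGIONS[name]
    lat_ok = (lats >= south) & (lats <= north)
    lon_ok = (lons >= west) & (lons <= east)
    return lat_ok[:, None] & lon_ok[None, :]


for name in REGIONS:
    print(name, int(region_mask(name).sum()), "cells")
```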

Table 2. Dataset description.

Dataset	Temporal Range	Latitude Range	Longitude Range	Spatial Resolution
NDVI	Daily, 1981–2021	[89.975°, −89.975°]	[−179.975°, 179.975°]	0.05° × 0.05° grid
AGCD	Monthly, 1900–2020	[−10.0°, −44.5°]	[112.0°, 156.25°]	0.05° × 0.05° grid
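The NDVI product in Table 2 is daily, while the analysis runs at a monthly time scale. The sketch below shows one way to aggregate, assuming a monthly mean of the valid daily values; a maximum-value composite is a common alternative, and neither is confirmed here as the authors' procedure. The data cube is synthetic.

```python
import numpy as np

# Hypothetical daily NDVI cube for one year: (days, lat, lon), with NaN for missing observations
dates = np.arange("2015-01-01", "2016-01-01", dtype="datetime64[D]")
rng = np.random.default_rng(seed=1)
daily = rng.uniform(-0.1, 0.8, size=(dates.size, 4, 5))
daily[rng.random(daily.shape) < 0.2] = np.nan  # simulate cloud or fill values

# Calendar month (0 = January ... 11 = December) of each day
month_index = dates.astype("datetime64[M]").astype(int) % 12

# Assumed aggregation: mean of the valid daily values in each month -> shape (12, lat, lon)
monthly = np.stack([np.nanmean(daily[month_index == m], axis=0) for m in range(12)])
print(monthly.shape)
```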

Table 3. The relationship between drought severity and AGCD percentile values.

Drought Severity	AGCD Percentile Range
Extreme drought	<5
Severe drought	5–10
Moderate drought	10–20
Mild drought	20–30
No drought	>30
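Table 3 can be read as a lookup from a rainfall percentile to a drought category. The table does not say which class a boundary value (for example, exactly the 5th percentile) belongs to; this sketch assumes it falls into the milder class.

```python
from bisect import bisect_right

# Upper percentile bounds from Table 3; assumption: a value on a bound belongs to the milder class
PERCENTILE_BOUNDS = [5, 10, 20, 30]
CATEGORIES = ["extreme drought", "severe drought", "moderate drought", "mild drought", "no drought"]


def category_from_percentile(p: float) -> str:
    """Map an AGCD rainfall percentile (0-100) to a drought category."""
    if not 0 <= p <= 100:
        raise ValueError(f"percentile out of range: {p}")
    return CATEGORIES[bisect_right(PERCENTILE_BOUNDS, p)]


for p in (2.5, 5, 12, 29.9, 30, 75):
    print(p, category_from_percentile(p))
```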

Table 4. Precipitation amounts (mm) corresponding to drought percentiles for the Temperate Region of Australia in different months.

Percentile	January	February	March	April	May	June
5	6.97	4.72	9.42	13.34	20.63	26.34
10	12.15	7.93	13.57	18.09	26.99	33.37
20	21.72	14.33	21.45	26.09	36.93	44.85
30	28.44	21.03	28.30	33.82	46.29	54.71
Percentile	July	August	September	October	November	December
5	32.90	31.24	33.02	22.72	21.34	13.35
10	39.29	39.04	39.34	31.47	26.92	19.34
20	49.50	51.94	48.81	43.21	35.34	28.93
30	58.78	62.64	56.36	52.37	43.00	36.56
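Table 4 expresses the percentile bounds as monthly rainfall totals, so a month's rainfall can be compared directly with that month's column. The sketch transcribes two months of Table 4 and assumes that an amount equal to a percentile value falls into the milder class. How grid values relate to these regional amounts is not shown here, so the function `category_from_rainfall` is illustrative only. The example also shows that the same 10 mm total maps to different categories in January and July.

```python
from bisect import bisect_right

# Table 4 (Temperate Region): rainfall (mm) at the 5th, 10th, 20th and 30th percentiles.
# Only two months are transcribed to keep the sketch short.
TEMPERATE_THRESHOLDS_MM = {
    "January": [6.97, 12.15, 21.72, 28.44],
    "July": [32.90, 39.29, 49.50, 58.78],
}
CATEGORIES = ["extreme drought", "severe drought", "moderate drought", "mild drought", "no drought"]


def category_from_rainfall(month: str, rainfall_mm: float) -> str:
    """Compare a monthly rainfall total with that month's percentile amounts."""
    return CATEGORIES[bisect_right(TEMPERATE_THRESHOLDS_MM[month], rainfall_mm)]


print(category_from_rainfall("January", 10.0))  # severe drought (between 6.97 and 12.15 mm)
print(category_from_rainfall("July", 10.0))  # extreme drought (below 32.90 mm)
```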

Table 5. Precipitation amounts (mm) corresponding to drought percentiles for the Grassland Region of Australia in different months.

Percentile	January	February	March	April	May	June
5	0.82	0.68	0.74	0.88	4.63	5.38
10	2.01	1.39	1.40	2.23	7.05	8.86
20	4.55	3.28	3.20	5.06	11.38	13.73
30	7.45	6.03	6.23	8.18	16.08	17.21
Percentile	July	August	September	October	November	December
5	7.92	6.10	6.10	3.46	2.56	2.33
10	10.70	8.81	8.60	6.13	4.45	3.88
20	16.28	14.36	13.13	10.89	8.15	6.44
30	20.25	19.50	16.91	15.44	12.68	9.50
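Values like those in Tables 4 and 5 come from percentiles of the long AGCD record, computed separately for each calendar month. The sketch below shows that step for a single grid cell using `numpy.percentile` with its default linear interpolation. The record is synthetic gamma-distributed rainfall, not AGCD data, and the paper may use a different percentile estimator.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_years = 2020 - 1900 + 1  # years 1900-2020 inclusive
# Placeholder record: synthetic monthly rainfall totals (mm), shape (years, months); NOT AGCD data
rainfall = rng.gamma(shape=2.0, scale=15.0, size=(n_years, 12))

# Percentiles that bound the drought categories in Table 3
levels = [5, 10, 20, 30]
# Computed over the years (axis 0) for each calendar month, giving shape (4, 12)
thresholds = np.percentile(rainfall, levels, axis=0)

print(np.round(thresholds[:, 0], 2))  # January amounts at the 5th/10th/20th/30th percentiles
```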

Table 6. Monthly accuracy of the drought classification model in 2015 and 2016 (Temperate Region of Australia, Victoria).

Year	January	February	March	April	May	June
2015	18.21%	22.10%	8.08%	12.13%	28.73%	18.97%
2016	12.40%	18.40%	11.71%	24.13%	20.50%	19.29%
Year	July	August	September	October	November	December
2015	21.33%	22.57%	18.49%	20.39%	14.46%	18.97%
2016	17.06%	25.53%	5.07%	26.86%	18.88%	22.40%
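The accuracies in Tables 6 and 7 and the red/cream heatmaps rest on the same per-grid comparison between the model's category and the AGCD-derived label. A minimal sketch, assuming both are stored as (lat, lon) arrays of integer category codes and that cells outside the climate zone are NaN in the reference array; `monthly_accuracy` and the toy arrays are hypothetical.

```python
import numpy as np


def monthly_accuracy(predicted: np.ndarray, reference: np.ndarray) -> tuple:
    """Return (accuracy, per-cell correctness mask) for one month.

    Both inputs are (lat, lon) arrays of category codes; NaN in `reference`
    marks cells outside the climate zone, which are excluded from the score.
    """
    valid = ~np.isnan(reference)
    correct = (predicted == reference) & valid  # True -> red, False -> cream in the heatmap
    return correct.sum() / valid.sum(), correct


# Toy 2 x 3 example with codes 0 = extreme ... 4 = no drought (hypothetical values)
reference = np.array([[0.0, 1.0, 4.0], [2.0, np.nan, 3.0]])
predicted = np.array([[0.0, 2.0, 4.0], [2.0, 1.0, 1.0]])
acc, mask = monthly_accuracy(predicted, reference)
print(f"accuracy = {acc:.2%}")  # 3 of 5 valid cells match
```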

Table 7. Monthly accuracy of the drought classification model in 2015 and 2016 (Grassland Region of Australia, NSW).

Year	January	February	March	April	May	June
2015	6.03%	21.39%	32.77%	0.92%	26.14%	31.94%
2016	5.37%	28.27%	1.87%	21.54%	11.96%	29.62%
Year	July	August	September	October	November	December
2015	31.51%	19.47%	28.21%	36.34%	6.72%	24.07%
2016	44.88%	66.27%	14.26%	71.68%	47.99%	18.61%

Table 8. The NDVI range for different drought categories for 12 months in 2015 (Temperate Region of Australia).

Month	Extreme Drought	Severe Drought	Moderate Drought	Mild Drought
January	[−1, 0.00]	[0.00, 0.03]	[0.03, 0.06]	[0.06, 0.22]
February	[−1, 0.00]	[0.00, 0.06]	[0.06, 0.12]	[0.12, 0.24]
March	[−1, 0.00]	[0.00, 0.05]	[0.05, 0.10]	[0.10, 0.21]
April	[−1, 0.00]	[0.00, 0.02]	[0.02, 0.05]	[0.05, 0.17]
May	[−1, 0.00]	[0.00, 0.01]	[0.01, 0.02]	[0.02, 0.13]
June	[−1, 0.00]	[0.00, 0.00]	[0.00, 0.00]	[0.00, 0.07]
July	[−1, 0.00]	[0.00, 0.00]	[0.00, 0.01]	[0.01, 0.12]
August	[−1, 0.00]	[0.00, 0.01]	[0.01, 0.02]	[0.02, 0.19]
September	[−1, 0.00]	[0.00, 0.01]	[0.01, 0.03]	[0.03, 0.24]
October	[−1, 0.00]	[0.00, 0.03]	[0.03, 0.06]	[0.06, 0.28]
November	[−1, 0.00]	[0.00, 0.06]	[0.06, 0.12]	[0.12, 0.28]
December	[−1, 0.00]	[0.00, 0.03]	[0.03, 0.07]	[0.07, 0.25]

Table 9. The NDVI range for different drought categories for 12 months in 2015 (Grassland Region of Australia).

Month	Extreme Drought	Severe Drought	Moderate Drought	Mild Drought
January	[−1, 0.04]	[0.04, 0.12]	[0.12, 0.17]	[0.17, 0.21]
February	[−1, 0.07]	[0.07, 0.15]	[0.15, 0.18]	[0.18, 0.22]
March	[−1, 0.08]	[0.08, 0.15]	[0.15, 0.18]	[0.18, 0.22]
April	[−1, 0.04]	[0.04, 0.13]	[0.13, 0.16]	[0.16, 0.20]
May	[−1, 0.03]	[0.03, 0.11]	[0.11, 0.14]	[0.14, 0.17]
June	[−1, 0.00]	[0.00, 0.04]	[0.04, 0.13]	[0.13, 0.17]
July	[−1, 0.02]	[0.02, 0.09]	[0.09, 0.15]	[0.15, 0.19]
August	[−1, 0.04]	[0.04, 0.15]	[0.15, 0.21]	[0.21, 0.26]
September	[−1, 0.03]	[0.03, 0.18]	[0.18, 0.23]	[0.23, 0.29]
October	[−1, 0.06]	[0.06, 0.17]	[0.17, 0.21]	[0.21, 0.26]
November	[−1, 0.07]	[0.07, 0.15]	[0.15, 0.18]	[0.18, 0.21]
December	[−1, 0.03]	[0.03, 0.15]	[0.15, 0.18]	[0.18, 0.21]
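Once the thresholds are known, classifying a new NDVI observation amounts to a search over that month's thresholds. In the threshold-selection algorithm each $τ_{k}$ is the largest NDVI value the model assigns to category k, so a value equal to $τ_{k}$ stays in category k; values above $τ_{4}$ are treated here as no drought, which is an assumption. The sketch transcribes two months of Table 9; `category_from_ndvi` is a hypothetical helper.

```python
from bisect import bisect_left

# Table 9 (Grassland Region, 2015): upper NDVI bounds tau_1..tau_4 for two months
GRASSLAND_2015_TAU = {
    "January": [0.04, 0.12, 0.17, 0.21],
    "September": [0.03, 0.18, 0.23, 0.29],
}
CATEGORIES = ["extreme drought", "severe drought", "moderate drought", "mild drought", "no drought"]


def category_from_ndvi(month: str, ndvi: float) -> str:
    """Assign an NDVI value to a drought category; a value equal to tau_k stays in category k."""
    return CATEGORIES[bisect_left(GRASSLAND_2015_TAU[month], ndvi)]


print(category_from_ndvi("January", 0.15))  # moderate drought
print(category_from_ndvi("September", 0.15))  # severe drought
```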


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Luo, H.; Bhardwaj, J.; Choy, S.; Kuleshov, Y. Applying Machine Learning for Threshold Selection in Drought Early Warning System. Climate 2022, 10, 97. https://doi.org/10.3390/cli10070097

