1. Introduction
The monitoring of water extent, use, and quality is becoming increasingly important for water and food security given that flood and drought events are becoming more frequent, intense, and unpredictable under climate change [1,2]. Accurate monitoring and auditing of water supplies is important for the defence of human rights and livelihoods against threats to water security from issues such as privatization, theft, and redirection [3,4].
The spectral range and variability of inland surface waters, adjacency effects, and spectral similarity with other land cover classes, as well as atmospheric and topographic (hill shade) illumination issues, present different challenges from those of ocean colour remote sensing. Inland surface waters can exhibit high concentrations of chlorophyll and phytoplankton biomass, mineral particles, detritus, suspended solids, and coloured dissolved organic matter, and in shallow waters their depths and bottom materials vary over space and time [5,6,7]. The extent of surface water has changed significantly due to human activities and climatic change, and inter-annual changes of water bodies are challenging to capture, particularly in arid environments, because of their high seasonal variation and abrupt climatically induced changes [8]. Global Surface Water datasets have been found to underestimate the presence and extent of small and turbid water bodies in arid ecoregions that are essential for fauna and agriculturalists [9]. These water bodies are typically ephemeral, filling up in rainy seasons and sometimes disappearing completely in dry seasons, which poses a challenge for monitoring efforts [9].
Most mapping efforts have required some form of exception handling and manual editing [6], and the highest-performing water indices have relied on Shortwave Infrared (SWIR) bands to capture turbid waters. However, SWIR bands have lower spatial resolutions than the visible bands (Blue, Green, and Red) and the Near-Infrared (NIR) band on multispectral sensors, such as Landsat and Sentinel-2, and are currently not common or feasible for most publicly accessible drones/UAVs and archival aerial photography. Indices are widely used for surface water mapping (Table 1), especially in large-scale and time series analyses, because of their high computational efficiency and easy implementation [10].
The limitations of the normalized difference indices are well known. For example, the popular ‘NDWI’ expressed as a normalized difference between the NIR and SWIR1 bands (which is essentially the same as the NDMI) is known to confuse built-up (urban) areas, sand, exposed rocks, dark soils, and shadows with water bodies [9,21]. Similarly, ‘MNDWI’ [22], which is essentially the same as the ‘NDSI’ [14], is known to confuse vegetated areas with water bodies, particularly inundated vegetation or vegetated water [9], as well as buildings and shadows [21]. All these indices, except for the normalized difference Green/NIR index, require SWIR bands that are not available on most higher resolution sensors.
The perception of colour does not require knowledge of optical properties [23], and colourimetry provides the potential for quantifiable measurements and easily communicable ontologies [24].
Hue angle has been considered the most perceptually intuitive way for humans to consistently differentiate and communicate their observations about land cover features [25,26]. In natural colour (R,G,B: Red, Green, Blue), clear water appears blue or black in satellite imagery, phytoplankton is typically perceived as having a green hue, suspended particulate matter appears yellow, and coloured dissolved organic matter appears brown; however, when these features are mixed, their spectral signatures are more difficult to separate uniquely [23]. Colourimetric approaches based on hue have been applied to aquatic detection and classification studies, typically for RGB combinations of the SWIR1, NIR, and Red bands [23,27,28,29,30,31]. The technique has proven effective for Global Surface Water mapping [6].
It can be very challenging to separate water from other spectrally similar land cover classes, including high albedo features, such as urban rooftops and other infrastructure, mines, industrial sites, photovoltaic farms, snow, ice, and clouds, as well as low albedo non-water surfaces, including hill shade and urban features such as asphalt roads, airport runways, building shadows, and coal and waste heaps [6,7]. Existing solutions to these problems include index thresholding, decision tree or rule-based approaches, spectral mixture analysis, linear discriminant analysis [19], and supervised or unsupervised classification schemes, including machine learning algorithms [9,10]. Alternative approaches have combined existing water indices with other methods, including colour space transformation, principal components analysis, image segmentation, and topographic masking [32], or a combination of multiple water indices with ensembles for collaborative decision-making [33]. However, differences among water bodies have rarely been considered, and accuracies have been inconsistent among water body types [32].
Holism is considered a fundamental characteristic of landscape ecology [34]. It is commonly simplified as ‘the whole is more than the sum of its composing parts’, where each element receives its significance or measure based on its position and relationship with the surrounding elements [35]. It has been referred to as a ‘shuttle analysis’, where zooming in and out from space to the smallest element in the landscape progressively reveals the details necessary to understand the landscape [36]. Holistic analysis provides simplification by reducing observations to better understand complex Earth systems while attempting to maintain an understanding of the systems in their entirety [37]. Loucks [38] noted the need to balance holism and reductionism in ecological studies in order to ‘explain outcomes by looking at parts of complex systems (reductionist view) against the desire to understand how the parts work together in a fully functioning system (holistic view)’. Here, an approach of ‘holistic reduction’ will be demonstrated, showing how it can facilitate the classification of visually recognizable ecological or land cover features across seasons and scales. Applying holistic reduction, the myriad expressions of water, from shallow to deep, and from clear to turbid, can be reduced to a singular class of water. In turn, the way that this singular class relates to, and differs spectrally from, the rest of the landscape in different ecoregions around the world, and how it varies across the seasons, need to be considered holistically.
Satellite Imagery Interpretation (SII) has previously been proposed as a more rigorous analogue to Aerial Photographic Interpretation (API) for multispectral imagery analysis, with the potential for direct and repeatable quantitative measurement [24]. SII is formalized by the creation of interpretation keys that summarize visual perception cues for the identification of land cover classes, supporting the development of colourimetric or index-based ontologies for their classification.
This study aimed to create a holistic reductionist framework to identify the most suitable surface water indices for global monitoring with minimal sampling effort. The framework combines SII with a decision matrix [39] to assess and compare the strengths and limitations of existing indices with:
- (1) A new globally applicable multispectral index for mapping and monitoring surface water extent that is able to include all the expressions of water, regardless of turbidity, across all seasons.
- (2) A new index with the same capabilities for sensors limited to the visible and NIR bands, for local and retrospective monitoring of surface water extent at the highest possible resolution.
The proposed approach addresses a recurring demand from analysts and decision-makers dealing with the monitoring, planning, and conservation of water resources.
2. Methods
2.1. Understanding Temporal Variability via Visualization
A holistic approach to surface water mapping must first consider temporal variability in water expression, not only to classify water accurately across the seasons, but also to avoid false positives in change detection studies.
The Global Surface Water (GSW) dataset [6] characterizes this variability through a set of layers: Water Occurrence, Occurrence Change Intensity, Seasonality, Annual Recurrence, Transitions, and Maximum Extent. Lakes and rivers can vary seasonally in water level/volume, extent, and turbidity [40], as illustrated in Figure 1.
Additionally, vegetation and agricultural features also change seasonally. Their colourimetric expression in multispectral false colour RGB combinations varies dramatically, as do relative contrasts in different landscapes across the seasons. For example, water, intensely irrigated crops, and hill shadows can appear similarly dark and under-saturated, particularly in Winter (Figure 2). Seasonal observations are therefore important to understand surface water extents and dynamics, and any indices created to map and monitor water need to take this variation into account.
2.2. Analysis Design
This study was conducted in phases. The first phase evaluated the existing water indices from the literature (Table 1) together with the new indices proposed in this study (Table 2), using SII at 100 m resolution and a decision matrix to define a set of visually discernible evaluation criteria. The criteria focused on known issues for discriminating surface water in a large and diverse area during the seasonal extremes of Summer and Winter. Interpretation keys with colourimetric descriptions were created for the water and non-water features that were compared in the decision matrix. Interpretation keys are formalized by a set of examples that are as mutually exclusive as possible to support reliable recognition and communication of features [24,41,42,43]. The examples can be based on existing reference data or maps. For each existing and new index, the decision matrix tallied scores representing the symptomatic omission and commission of the different water and non-water features across the landscape. The scores were based on visual observation of the features defined in the interpretation keys, and of similar features around the study area identified from the reference maps. The decision matrix was also used to guide choices during the development of the new water indices. The index with the highest score was considered the best. All criteria were given a value of 1 to reduce the subjectivity involved with weighted scores. The severity or impact of each criterion is better assessed in the subsequent phases, particularly with the formal quantitative accuracy assessments.
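The unweighted tally can be sketched in a few lines of Python. This is a minimal illustration rather than the study's scoring sheet: the criterion names in the example are hypothetical, and the rule that a pass adds +1 while a failure subtracts 1 is an assumption (the text only states that every criterion carries a value of 1).

```python
from typing import Dict, List, Tuple

# Outcome per criterion for one index: True = handled well (no symptomatic
# omission/commission observed), False = problem observed.
Outcomes = Dict[str, bool]


def score_index(outcomes: Outcomes) -> int:
    """Unweighted tally in which every criterion has a value of 1.

    ASSUMPTION: a pass adds +1 and a failure subtracts 1.
    """
    return sum(1 if passed else -1 for passed in outcomes.values())


def rank_indices(matrix: Dict[str, Outcomes]) -> List[Tuple[str, int]]:
    """Return (index name, score) pairs, best (highest score) first."""
    scores = [(name, score_index(outcomes)) for name, outcomes in matrix.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical criteria and outcomes, for illustration only.
    matrix = {
        "INDEX_A": {"turbid lakes (Summer)": True, "hill shade (Winter)": False, "urban (Summer)": True},
        "INDEX_B": {"turbid lakes (Summer)": False, "hill shade (Winter)": False, "urban (Summer)": True},
    }
    print(rank_indices(matrix))
```

Equal weights keep the tally transparent; any weighting would have to be justified separately, which is why severity is left to the later quantitative phases.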
Because the decision matrix did not produce distinct first and second places, and because certain details can be missed at 100 m resolution due to spectral mixing, a second phase was conducted. This was a process of elimination for the best performing indices, with additional visual assessments at high resolution of their classifications of annual median-based image composites, focusing on landscape features that were identified in the first phase as problematic for the discrimination of surface water. These included: (1) a coastal wetland area, (2) an arid wetland area, (3) an intensely irrigated agricultural area, and (4) a complex urban area.
Due to the characteristically ephemeral nature of surface water bodies in the arid western zone, a conventional quantitative accuracy assessment was only conducted for the eastern temperate zone in the third phase. Global scale products, such as the ‘Permanent water bodies’ class of the European Space Agency’s (ESA) 10 m WorldCover product [44] and the GSW Occurrence and Seasonality indicators, omit very large semi-permanent water bodies in the arid western zone (e.g., Lake Urana, 35°17′33.6″S 146°11′16.7″E, and Lake Cowal, 33°37′10.9″S 147°26′43.0″E), so deriving a suitable validation dataset for that zone was not possible. Illumination masks were developed to reduce misclassification due to hill shading and urban/built-up features for the best performing indices limited to the typically higher resolution bands. The accuracy assessment was conducted within the eastern temperate zone for three shortlisted indices with the selected illumination masks. To minimise issues arising from hill shading and snow cover, the accuracy assessment was conducted on a median-based image composite for the period between Spring and Summer. A total of 10,000 validation points derived from ESA’s 2020 WorldCover product were used, with random sampling stratified with a 50%/50% split between the ‘Permanent water bodies’ class and the remaining land cover classes. An additional accuracy assessment was conducted for the best performing index with automatic Otsu thresholding [45] for Sentinel-2 and Landsat 8 imagery, owing to the index’s distinctly bimodal histogram distribution, with the dual purpose of testing for automation and sensor compatibility.
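Otsu’s method selects the threshold that maximizes the between-class variance of a histogram, which is why an index with a distinctly bimodal distribution lends itself to it. The pure-Python sketch below is a generic implementation for a list of index values (for example CAWI); the 256-bin histogram and returning the upper edge of the optimal bin are assumptions, and it is not the GEE code used in the study.

```python
from typing import Sequence


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Otsu threshold for a list of index values.

    ASSUMPTIONS: equal-width bins; the threshold returned is the upper edge of
    the last bin assigned to the lower class.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant input: nothing to separate
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max value -> last bin
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * x for c, x in zip(counts, centres))

    best_between, best_t = -1.0, lo
    w0, sum0 = 0, 0.0  # pixel count and weighted sum of the lower class
    for i in range(bins - 1):
        w0 += counts[i]
        sum0 += counts[i] * centres[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, lo + (i + 1) * width
    return best_t
```

Because the threshold is derived from the image itself, the same procedure can be rerun on another sensor’s histogram without manual tuning, which is the property being tested for Landsat 8.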
The fourth phase applied a visual inspection to examine the consistency of performance of the shortlisted indices in a wide range of geographic settings in the study area and around the world, and also considered the effects of atmospheric haze in a tropical area.
2.3. Study Areas
To evaluate large-scale performance, scalability, and global transferability of alternative water indices, a study zone was purposively selected to encompass diverse hydrological environments (temperate coastal, mountainous, alpine, semi-arid, and arid environments) that display an extensive range of low and high albedo features and a range of features that indices might fail to discriminate from water. The first and third phases of this analysis were conducted across New South Wales (NSW), covering an area of 801,150 km² in south-eastern Australia. NSW has a wide variety of water expressions as well as other diverse land cover classes that may (at times) have similar spectral properties over varied environments and terrain. It includes coastal and montane rivers and lakes, dams and agricultural ponds, and ephemeral lakes. It also encompasses spectrally diverse non-water landscape features including rainforests, plantations, grasslands, pastures, a range of freshwater and saline wetlands on the coast and inland, shrublands and chenopods [46]; substantial areas of hill shading in mountainous terrain; an alpine range with Winter snow cover; dense urban areas with industrial zones; intensely irrigated agricultural areas; and a varied arid region exhibiting bright, exposed bare land surfaces. It is important to recognize the ephemeral nature of water in arid environments. Australia is a country where there is a great deal of inter-annual stochastic variability in rainfall. Understanding its phenology is challenging due to its diverse range of ecosystems and the high inter-annual variability they display, largely due to the combined circulation patterns of the ENSO (El Niño Southern Oscillation) and the IOD (Indian Ocean Dipole) [47,48].
The second phase of analysis examined the highest performing indices from the first phase at full resolution within and around the study zone in the neighbouring northern State of Queensland, for a coastal and an arid wetland area, as well as an intensely irrigated agricultural area. The city of Amsterdam in the Netherlands was used to examine the performance of indices in urban environments. Amsterdam displays a wide range of urban water features within a complex urban environment, including a large river, a multitude of narrow canals, and small park ponds. Further testing for water bodies within an intensely irrigated agricultural area was conducted in an area between the Condamine and Cecil Plains in Queensland, north-eastern Australia.
A wide range of locations from around the world were selected for the fourth phase to evaluate the best indices from the accuracy assessment. These included salt lakes in Bolivia and South Australia; inundated agricultural areas in Bangladesh, Taiwan, Thailand, and Vietnam; and arid areas in Chile, Iran, Central Sahara, the Arabian Peninsula, and Pakistan. The effects of atmospheric haze typical of tropical areas were examined on the southern coast of Papua New Guinea.
2.4. Imagery Preparation
Temporal aggregation into median-based image composites has been shown to significantly reduce data volume, anomalies, clouds, and shadows; single, purposively selected annual or monthly composites allow faster and simpler analyses with accuracy as high as that of time series data [49,50,51]. Inter-annual median-based image composites were prepared in Google Earth Engine (GEE) for the southern hemisphere Summer and Winter seasons between the years 2016 and 2018, to simulate the typical seasonal variability expected throughout an ENSO cycle and to avoid the flooding events in the subsequent years. The full Sentinel-2 Top-Of-Atmosphere reflectance Level 1C archive was considered, after removing scenes with a cloudy pixel percentage above 20%. Dense and cirrus clouds were also masked out per pixel using the cloud bitmasks provided with the Level 1C product. This process provided a seamless, well colour-balanced imagery mosaic indicative of the typical conditions expected during those seasons.
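The scene filter, per-pixel cloud masking, and median compositing can be illustrated for a single pixel’s time series. This minimal sketch assumes the Sentinel-2 Level 1C `QA60` band, in which bit 10 flags opaque clouds and bit 11 flags cirrus, and it treats scenes above 20% cloudy pixels as discarded; the function names are hypothetical, and the sketch works on plain Python values rather than GEE images.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

# Sentinel-2 Level 1C QA60 cloud bits.
OPAQUE_CLOUD_BIT = 1 << 10
CIRRUS_BIT = 1 << 11


def keep_scene(cloudy_pixel_percentage: float, max_cloud: float = 20.0) -> bool:
    """Scene-level filter. ASSUMPTION: scenes strictly above max_cloud are dropped."""
    return cloudy_pixel_percentage <= max_cloud


def is_clear(qa60: int) -> bool:
    """True when neither the opaque-cloud nor the cirrus bit is set."""
    return (qa60 & (OPAQUE_CLOUD_BIT | CIRRUS_BIT)) == 0


def pixel_median(observations: Sequence[Tuple[int, float]]) -> Optional[float]:
    """Seasonal median of one band at one pixel, ignoring cloudy dates.

    observations: (QA60 value, reflectance) pairs from scenes that passed keep_scene.
    Returns None when every observation is cloudy.
    """
    clear = [value for qa60, value in observations if is_clear(qa60)]
    return median(clear) if clear else None
```

The median, unlike the mean, is not pulled towards the occasional unmasked cloud edge or shadow, which is one reason median composites give the balanced mosaic described above.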
2.5. Satellite Imagery Interpretation Key with Colourimetric Benchmarks
Interpretation keys with colourimetric benchmarks were developed for each major water feature type and for potentially similar non-water feature types around the study zone as a basis for SII. The benchmarks for the Water SII key (Figure 3) were identified visually and selected to represent the variety of hues and saturations of each water feature. The interpretations were validated for agreement of extent with the GSW [6], the ESA WorldCover ‘Permanent water bodies’ class, and the WOFS (Water Observations from Space) [52] datasets, and are described in the key.
The general pattern that can be observed from the SII key of colourimetric benchmarks in the ‘Land/Water’ RGB for water features is that water bodies are generally clearer in the temperate zone (in the lime green boundary), appearing black to dark blue, and they become more turbid towards more arid areas (in the orange boundary) where ephemeral water bodies are more typical, appearing either blue, purple, or magenta/fuchsia in colour.
Figure 4 shows the SII key for non-water feature benchmarks for a range of high and low albedo features that are commonly misclassified as water. The interpretations were validated from a variety of sources, including the map of Keith Vegetation Formations for NSW for forest and inland wetland features [46], the Landuse Mapping for NSW 2017, v1.2 [53] for agricultural features, and the mapping from Seamap Australia for detailed coastal wetland features (particularly the Estuarine macrophytes dataset) [54]. It is worth noting, however, that several of these features may actually have surface water at certain times (for example, salt lakes, irrigated crops, and mangroves).
Most non-water features have clearly different colours from those in the water SII key; however, snow and ephemeral lakes both had magenta hues, and some non-water features are also characterized by different tones of blue, but these can be differentiated contextually, by their shape and brightness, with SII and validated with existing reference maps. It should be noted that most of the ambiguity occurs in the temperate zone (in the green boundary), where fewer ephemeral water bodies occur.
2.6. Creation of the New Water Indices Presented in This Study
The new indices presented in this study (Table 2) were based on optical properties [55] and refined by experimentation. This involved a large-scale holistic assessment of the omission or commission of the colourimetric benchmarks specified in the SII keys with a decision matrix. Preference was given to simple, non-parametric algebraic indices that would not need to be calibrated across sensors or different environmental landscapes.
Table 2.
Indices presented in this study.
| Group | Index | Equation |
|---|---|---|
| Indices requiring SWIR bands | CHI | (Green − SWIR2)/NIR |
| | CAWI | Log10 (Green/SWIR2/NIR) |
| | CWI | R,G,B: SWIR2, NIR, Green → H,S,V: Hue and Saturation |
| Indices limited to the visible and NIR bands | CATWIC (combination of HRCWI and SR) | HRCWI = (Green − Red)/NIR and SR = Red/NIR |
| | CHRWI | R,G,B: Red, (NIR + Blue)/2, Green → H,S,V: Hue |
| | BRCHRWI | R,G,B: Red/NIR, Blue/Green, Green/NIR → H,S,V: Hue |
| | NDCHRWI | R,G,B: ((Red − NIR)/(Red + NIR)) + 1, ((Blue − Green)/(Blue + Green)) + 1, ((Green − NIR)/(Green + NIR)) + 1 → H,S,V: Hue |
The CHI (Comprehensive Hydrologic Index) was created by testing different bands in the equation format (a − b)/c, where c = NIR, in order to capture a thematic gradient of moisture. The best performing index required the SWIR2 band:
$$\mathrm{CHI} = \frac{\mathrm{Green} - \mathrm{SWIR2}}{\mathrm{NIR}}$$
CHI was able to classify both clear and turbid water but was found to also classify snow and wetland elements, such as mangroves, as water; hence the name Comprehensive Hydrologic Index rather than Comprehensive Water Index. Dividing the bands in the same sequence as CHI, in the equation format a/b/c, appeared to yield an automatic threshold index, namely CAWI (Comprehensive Automatic Water Index). Applying a Log10 transformation provided a better histogram stretch:
$$\mathrm{CAWI} = \log_{10}\left(\frac{\mathrm{Green}/\mathrm{SWIR2}}{\mathrm{NIR}}\right)$$
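Per-pixel versions of the two SWIR-based indices are sketched below, assuming top-of-atmosphere reflectance scaled to the 0 to 1 range. One property is worth making explicit: CHI is a ratio of reflectance differences and is unaffected by a common scale factor, whereas CAWI divides one band by the product of two others, so it shifts by −log10(k) when all bands are multiplied by k. A fixed threshold such as CAWI ≥ 1.25 (Section 2.7) therefore only carries over if the reflectance scaling is the same; the 0 to 1 scaling here is an assumption.

```python
import math


def chi(green: float, swir2: float, nir: float) -> float:
    """Comprehensive Hydrologic Index: (Green - SWIR2) / NIR."""
    return (green - swir2) / nir


def cawi(green: float, swir2: float, nir: float) -> float:
    """Comprehensive Automatic Water Index: log10(Green / SWIR2 / NIR).

    Division is left-associative, i.e. Green / (SWIR2 * NIR).
    """
    return math.log10(green / swir2 / nir)


def cawi_is_water(green: float, swir2: float, nir: float, threshold: float = 1.25) -> bool:
    """Fixed-threshold classification. ASSUMPTION: reflectance scaled to 0-1."""
    return cawi(green, swir2, nir) >= threshold


if __name__ == "__main__":
    # Hypothetical reflectances (0-1 scale) for a clear-water pixel.
    g, s2, n = 0.06, 0.01, 0.02
    print(chi(g, s2, n), cawi(g, s2, n), cawi_is_water(g, s2, n))
    # The same pixel as digital numbers scaled by 10000:
    # CHI is unchanged, CAWI drops by log10(10000) = 4.
    print(chi(600, 100, 200), cawi(600, 100, 200))
```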
Taking a colourimetric approach [24] to decorrelate RGB composite hues from saturation and brightness, bands from CHI and CAWI were also tested to create a hue-based index from an HSV (Hue, Saturation, Value) colour space transformation that provides a linear gradient with water at one extreme; this index is referred to as CWI (Colourimetric Water Index). This HSV transformation also provided a Saturation that was able to mask out features such as dark agricultural features, urban/built-up areas, and coal and mining areas, which might be misclassified by the hue angle alone.
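The CWI hue and saturation can be reproduced with Python’s standard `colorsys` module, which returns hue and saturation in the 0 to 1 range. The sketch assumes the channel order R,G,B: SWIR2, NIR, Green from Table 2 and the hue threshold of 0.4 listed in Section 2.7; treating water as Saturation ≥ 0.44 is an assumption, since the direction of the saturation test is not stated.

```python
import colorsys
from typing import Tuple


def cwi(green: float, nir: float, swir2: float) -> Tuple[float, float]:
    """Hue and Saturation (both 0-1) of the R,G,B = SWIR2, NIR, Green composite."""
    hue, saturation, _value = colorsys.rgb_to_hsv(swir2, nir, green)
    return hue, saturation


def cwi_is_water(green: float, nir: float, swir2: float,
                 hue_min: float = 0.4, sat_min: float = 0.44) -> bool:
    """ASSUMPTION: water requires hue >= hue_min AND saturation >= sat_min."""
    hue, saturation = cwi(green, nir, swir2)
    return hue >= hue_min and saturation >= sat_min
```

Hue and Saturation are ratios of band differences, so, unlike CAWI, they do not change when reflectance is rescaled; only Value depends on the scale. A spectrally flat (grey) pixel has Saturation 0, which illustrates how the Saturation test can remove bright or dark featureless surfaces that the hue alone would not.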
Further experimentation was conducted to find a combination of indices that would only require the visible and NIR bands, for the highest possible spatial detail and compatibility with higher resolution sensors. The HRCWI (High Resolution Clear Water Index) was thus created by replacing the SWIR2 band in CHI with the Red band, which performed best for clearer waters:
$$\mathrm{HRCWI} = \frac{\mathrm{Green} - \mathrm{Red}}{\mathrm{NIR}}$$
HRCWI was observed to saturate before it could distinguish all turbid water bodies from other land cover features. Therefore, an accompanying ‘turbid water’ masking index was found to be necessary. The Simple Ratio (SR) of Red/NIR proved to be effective:
$$\mathrm{SR} = \frac{\mathrm{Red}}{\mathrm{NIR}}$$
Their combination will be referred to as CATWIC (Clear and Turbid Water Index Combination).
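A minimal sketch of CATWIC follows. The thresholds (HRCWI ≥ 0.2 and SR ≥ 0.985) are those listed in Section 2.7; combining the two with a logical OR, so that SR adds the turbid water that HRCWI misses, is an assumption about how the combination works. Both indices are ratios, so, unlike CAWI, they do not depend on how reflectance is scaled.

```python
def hrcwi(green: float, red: float, nir: float) -> float:
    """High Resolution Clear Water Index: (Green - Red) / NIR."""
    return (green - red) / nir


def sr(red: float, nir: float) -> float:
    """Simple Ratio used as the turbid-water index: Red / NIR."""
    return red / nir


def catwic_is_water(green: float, red: float, nir: float,
                    hrcwi_min: float = 0.2, sr_min: float = 0.985) -> bool:
    """ASSUMPTION: a pixel is water if either the clear-water (HRCWI)
    or the turbid-water (SR) test passes."""
    return hrcwi(green, red, nir) >= hrcwi_min or sr(red, nir) >= sr_min
```

With hypothetical reflectances for a turbid pixel where Red exceeds Green (Green 0.10, Red 0.12, NIR 0.08), HRCWI is negative and fails its threshold, while SR = 1.5 passes; this is the saturation problem of HRCWI that SR is meant to cover.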
Similarly to CWI, rearranging the bands from HRCWI to R,G,B: Red, NIR, Green (as opposed to the textbook ‘NIR composite’ of R,G,B: NIR, Red, Green) also provided an uninterrupted linear gradient for water, which appeared to indicate vegetation moisture. This rearrangement displayed the greatest thematic correlation and optimal separability; the ordering of RGB channels is therefore important. The process is similar to the Tasselled Cap transformation in that rearranging the bands in the 3D RGB colour space can optimize the separability of particular features with similar spectral characteristics. Some errors were observed on visual inspection, so the second channel was modified to ‘tone down’ the hill shading characteristically produced by the NIR band, replacing NIR with the average of the NIR and Blue bands (the full range of the typically high resolution bands), to form CHRWI:
$$\mathrm{CHRWI} = \mathrm{Hue}\left(\mathrm{HSV}\left[\mathrm{Red},\ \frac{\mathrm{NIR} + \mathrm{Blue}}{2},\ \mathrm{Green}\right]\right)$$
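The CHRWI hue can be computed in the same way as CWI. This sketch assumes the CHRWI threshold of 0.4 listed in Section 2.7 and uses `colorsys` in place of GEE’s HSV transformation; it only illustrates how the averaged (NIR + Blue)/2 channel enters the conversion.

```python
import colorsys


def chrwi(blue: float, green: float, red: float, nir: float) -> float:
    """Hue (0-1) of R,G,B = Red, (NIR + Blue)/2, Green."""
    # Averaging NIR with Blue is the modification described in the text
    # for toning down NIR hill shading in the second channel.
    second_channel = (nir + blue) / 2.0
    hue, _saturation, _value = colorsys.rgb_to_hsv(red, second_channel, green)
    return hue


def chrwi_is_water(blue: float, green: float, red: float, nir: float,
                   threshold: float = 0.4) -> bool:
    """Fixed hue threshold (value assumed from Section 2.7)."""
    return chrwi(blue, green, red, nir) >= threshold
```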
Different strategies were tested to find any improvement with RGB combinations composed of only visible and NIR bands, and to keep the solution to a single index. The values for the RGB channels were drawn from single band RGBs, band ratio RGBs, or normalized difference RGBs, which appear to maintain the highest definition of data, with smoother gradients and the least noise (pixel speckle) compared with band subtraction RGBs, for example.
Two channels were selected to emphasize the majority of the feature of interest with a common denominator, and a third channel was selected to distinguish spectrally similar or overlapping features. A visual inspection of band ratios suggested that Red/NIR displays a high contrast for turbid water with minimal hill shading but lacks definition for clear waters. Green/NIR provides better contrast for clear water with minimal hill shading but also includes bright features such as urban features, mining areas, and highly irrigated agricultural crops. Blue/Green displays water and vegetation in a similar range but was selected to balance out the brightness from the NIR band in the other two band ratios. Placing the Blue/Green ratio in the second channel produced a colour scheme with vegetation in green for a more intuitive interpretation, facilitating deductions about the presence of water. Combining these produced BRCHRWI:
$$\mathrm{BRCHRWI} = \mathrm{Hue}\left(\mathrm{HSV}\left[\frac{\mathrm{Red}}{\mathrm{NIR}},\ \frac{\mathrm{Blue}}{\mathrm{Green}},\ \frac{\mathrm{Green}}{\mathrm{NIR}}\right]\right)$$
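Lastly, a normalized difference RGB combination was created from the aforementioned band ratios to test for any improvements, forming NDCHRWI:
$$\mathrm{NDCHRWI} = \mathrm{Hue}\left(\mathrm{HSV}\left[\frac{\mathrm{Red} - \mathrm{NIR}}{\mathrm{Red} + \mathrm{NIR}} + 1,\ \frac{\mathrm{Blue} - \mathrm{Green}}{\mathrm{Blue} + \mathrm{Green}} + 1,\ \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}} + 1\right]\right)$$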
Adding 1 to each normalized difference ratio shifts its range from [−1, 1] to [0, 2], so that all channels are non-negative; this is necessary for the HSV colour space transformation to return Hue and Saturation within the range of 0 to 1, as it does in GEE. It was observed that the first channel in the RGB combination order is important, while the order of the second and third channels only affects the hue angle distribution relative to the range of the colour circle, and thus only the visual appearance of the RGB.
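Both visible/NIR hue indices can be sketched with `colorsys`. The thresholds (0.369 for BRCHRWI and 0.386 for NDCHRWI) are those listed in Section 2.7; the helper names are hypothetical. One way to see why only the first channel matters: the first (red) channel sits at hue 0, where the hue circle wraps around, whereas swapping the second and third channels keeps that point fixed and simply mirrors the hue (h becomes (1 − h) mod 1) with saturation and value unchanged. A single threshold on one side therefore becomes a single threshold on the other side, and the same pixels remain separable.

```python
import colorsys


def hue(r: float, g: float, b: float) -> float:
    """HSV hue in the 0-1 range."""
    return colorsys.rgb_to_hsv(r, g, b)[0]


def brchrwi(blue: float, green: float, red: float, nir: float) -> float:
    """Hue of R,G,B = Red/NIR, Blue/Green, Green/NIR."""
    return hue(red / nir, blue / green, green / nir)


def nd_plus_one(a: float, b: float) -> float:
    """(a - b) / (a + b) + 1, shifting [-1, 1] to the non-negative range [0, 2]."""
    return (a - b) / (a + b) + 1.0


def ndchrwi(blue: float, green: float, red: float, nir: float) -> float:
    """Hue of the normalized difference version of the BRCHRWI composite."""
    return hue(nd_plus_one(red, nir), nd_plus_one(blue, green), nd_plus_one(green, nir))


# Thresholds listed in Section 2.7 (pixels at or above are classified as water).
BRCHRWI_MIN = 0.369
NDCHRWI_MIN = 0.386
```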
2.7. Threshold Determination for Indices and Their Comparison in the Decision Matrix
The process for determining the thresholds for each index that was entered and compared in the decision matrix is illustrated in Figure 5.
Each index was analysed for its ability to separate water from non-water features using visual density slicing in pseudo colour, with reference to the colourimetric benchmarks in the interpretation keys in the ‘Land/Water’ RGB combination (R,G,B: NIR, SWIR1, Red). Preliminary classifications were performed holistically in a GIS at a resolution of 100 m with resampled Sentinel-2 median-based imagery for each seasonal extreme of Summer and Winter. Pseudo colour intervals were classified as water until they included all the recognizable water features from the SII key (including the ends of rivers and turbid lakes) or until they began to misclassify either hill shaded or urban areas excessively as water. High resolution background imagery/photography and the existing reference maps were consulted when water presence was not obvious. Each of the criteria in the decision matrix was scored based on visual interpretation of the colourimetric benchmarks.
Once all the general criteria had been assigned a score in the decision matrix, more precise thresholds, together with the scores for the criteria of whether each index overlapped urban buildings excessively or maintained narrow river detail in Summer, were determined at full image resolution in GEE within an AOI (Area of Interest).
Pixels in shallow and narrow water bodies are difficult to capture due to mixed reflectances caused by sediments and surrounding land [32]. When applying density slicing to define index-based thresholds, it was observed that the ends of narrow rivers were consistently at the same intervals of the indices that captured the most turbid water bodies around the study area, and before the intervals that began to include hill shade or other land cover classes. These intervals were therefore considered ideal for determining water index thresholds. These features can be considered PIFs (Pseudo Invariant Features), i.e., historically consistent pseudo ground-truth points [56,57,58]. The PIF used in this study was identified by API of a thin, tree-lined river on the very high resolution (VHR) imagery/photography available in GEE, which was then associated with the Sentinel-2 imagery. The PIF is shown by the point symbolized in cyan in Figure 5, where the indices were no longer sensitive enough to further classify the river. The AOI was used for logistical practicality and efficiency to perform a more precise threshold determination because it represented a wide range of water features, including different turbid water bodies. It was also used as the area for determining the Otsu thresholds for the CAWI index.
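The PIF-based choice of threshold can be sketched as picking, from a set of density slices, the lower edge of the slice that still contains the index value at the PIF, so that every pixel at least as water-like as the river end is classified as water. Equal-interval slicing, the slice count, and the assumption that index values increase towards water are illustrative only; in the study the slicing was done visually in pseudo colour, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def slice_edges(lo: float, hi: float, n_slices: int) -> List[float]:
    """Equal-interval density-slice edges from lo to hi (n_slices + 1 values)."""
    width = (hi - lo) / n_slices
    return [lo + i * width for i in range(n_slices + 1)]


def pif_threshold(pif_value: float, lo: float, hi: float, n_slices: int) -> float:
    """Lower edge of the slice containing the index value at the PIF.

    ASSUMPTION: higher index values are more water-like, so every pixel in
    that slice or above is classified as water.
    """
    width = (hi - lo) / n_slices
    k = math.floor((pif_value - lo) / width)
    k = max(0, min(k, n_slices - 1))  # clamp values at or beyond the range ends
    return lo + k * width


def classify(values: Sequence[float], threshold: float) -> List[bool]:
    """True where a pixel's index value reaches the threshold."""
    return [v >= threshold for v in values]
```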
This holistic assessment created a shortlist of best performing indices, which were then further assessed at the full 10 m resolution with annual median-based image composites in GEE for the set of environments that showed the most errors for the majority of indices in the decision matrix. The following fixed thresholds were applied to the coastal and arid wetlands and the highly irrigated agricultural area: CAWI ≥ 1.25, CWI hue angle ≥ 0.4 and Saturation = 0.44, HRCWI ≥ 0.2 and SR ≥ 0.985, CHRWI ≥ 0.4, BRCHRWI hue angle ≥ 0.369, and NDCHRWI hue angle ≥ 0.386. These thresholds were set using the same PIF shown in Figure 5, except for the fixed thresholds for SR and the Saturation for CWI. A colourimetric gradient of black to dark blue, to blue, to purple, to magenta was deduced from a holistic scan of the study zone, as shown in the interpretation key, to represent clear-to-turbid waters in the ‘Land/Water’ RGB in GEE. Since SR was only intended to mask out the generally turbid end of the water spectrum, its threshold was derived from the extent to which it covered purple- and magenta-coloured agricultural dams before it began to misclassify non-water features. Visual association with the ESA WorldCover ‘Permanent water bodies’ class and the GSW Occurrence and Seasonality indicators, together with interpretation in Google Earth Pro, was used to confirm the colourimetric deduction for the extreme expressions of turbid water. The imagery time slider in Google Earth Pro also allowed the confirmation of any temporal changes that might have made the interpretation ambiguous across the seasons.
CAWI, CWI, and HRCWI on its own (without SR) were selected for further assessment in the complex urban area of Amsterdam because of their high performance in Phase 1. Thresholds here were adjusted as required: CAWI was density sliced using 0.5 standard deviations and HRCWI with 10 natural breaks, while CWI was colour clustered with GEE’s ee.Clusterer.wekaKMeans function.
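Density slicing in 0.5 standard deviation steps can be sketched as class breaks placed at multiples of half a standard deviation around the mean. This minimal sketch assumes the breaks are centred on the mean of the sampled index values and uses the population standard deviation; the number of breaks on each side is a hypothetical parameter, and the natural breaks and k-means steps are not reproduced here.

```python
from bisect import bisect_right
from statistics import mean, pstdev
from typing import List, Sequence


def std_dev_breaks(values: Sequence[float], step: float = 0.5,
                   n_each_side: int = 3) -> List[float]:
    """Class breaks at mean + k * step * sigma for k = -n_each_side..n_each_side.

    ASSUMPTIONS: breaks are centred on the mean, and sigma is the population
    standard deviation of the sampled index values.
    """
    m = mean(values)
    s = pstdev(values)
    return [m + k * step * s for k in range(-n_each_side, n_each_side + 1)]


def slice_class(value: float, breaks: Sequence[float]) -> int:
    """0-based slice number: 0 below the first break, len(breaks) above the last."""
    return bisect_right(breaks, value)
```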
2.8. Development of Indices to Reduce Misclassification of Hill Shade, Urban Areas, Dry Salt Lakes, and Snow
The results of the decision matrix showed that all the indices were affected by hill shade, urban/built-up areas, and snow.
In order to reduce hill shade effects without a topographic illumination correction, a Shadow Index (SI) [59] was considered. However, it was observed that it masked out some lakes and rivers. The ee.Terrain.hillshade function available in GEE was also tested, but the results were considered too coarse at 30 m resolution for a 10 m resolution product. A simpler solution was to mask slopes steeper than 10 degrees, as a compromise between capturing shaded slopes and the errors inherited from the available 30 m DEM [60], accessed with ee.Image('USGS/SRTMGL1_003').
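The slope mask can be sketched as a slope calculation on a 3 × 3 elevation window followed by the >10° test. This uses central differences on the four edge-adjacent neighbours and a 30 m cell size (approximately the SRTM 1 arc-second spacing); it illustrates the idea and is not a reimplementation of GEE’s own terrain algorithm. The function names are hypothetical.

```python
import math
from typing import Sequence


def slope_degrees(window: Sequence[Sequence[float]], cell_size: float = 30.0) -> float:
    """Slope (degrees) at the centre of a 3x3 elevation window in metres.

    Uses central differences on the four edge-adjacent neighbours.
    """
    dz_dx = (window[1][2] - window[1][0]) / (2.0 * cell_size)
    dz_dy = (window[2][1] - window[0][1]) / (2.0 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def steep_slope_mask(window: Sequence[Sequence[float]], cell_size: float = 30.0,
                     max_slope: float = 10.0) -> bool:
    """True where the slope exceeds max_slope, i.e. the pixel is excluded
    from the water class as potential hill shade."""
    return slope_degrees(window, cell_size) > max_slope
```

Water surfaces are flat, so a slope mask can only remove false positives, not true water; the cost of the coarse DEM is that a steep river bank may also mask a few narrow-channel pixels.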
It was observed that the Saturation from
CWI (which included a SWIR band) contributed to masking out bright buildings in urban areas. However, this was not the case for the RGBs limited to the higher resolution bands (Blue to
NIR). An RGB’s HSV Saturation is defined by:
Modifications of the Saturation equation were tested with the four higher resolution bands (Blue, Green, Red, and NIR) and on the NDCHRWI, and were checked by visual interpretation to appraise the range of urban/built-up environment that they were able to mask out without masking water. The modifications included the following:
(1) Range: Max(R,G,B) − Min(R,G,B)
(2) Simple ratio: Min(R,G,B)/Max(R,G,B)
(3) Normalized difference ratio: (Max(R,G,B) − Min(R,G,B))/(Max(R,G,B) + Min(R,G,B))
(4) Saturation of four bands: (Max(R,G,B,NIR) − Min(R,G,B,NIR))/Max(R,G,B,NIR)
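The four modifications can be written per pixel as below. This is a minimal sketch, assuming surface reflectances scaled 0 to 1; the normalized difference ratio is written in the usual normalized-difference form (Max − Min)/(Max + Min). The helper name is hypothetical, and in GEE these would be band-math expressions on an image rather than a Python function.

```python
def saturation_variants(blue, green, red, nir):
    """Compute the four Saturation modifications for one pixel's reflectances.

    Hypothetical helper. Zero denominators return 0.0 so that empty or
    no-data pixels do not raise a division error.
    """
    rgb = (red, green, blue)
    hi, lo = max(rgb), min(rgb)
    hi4, lo4 = max(rgb + (nir,)), min(rgb + (nir,))

    def safe_div(a, b):
        return a / b if b else 0.0

    return {
        "range": hi - lo,
        "simple_ratio": safe_div(lo, hi),
        "normalized_difference": safe_div(hi - lo, hi + lo),
        "saturation_4band": safe_div(hi4 - lo4, hi4),
    }


# Illustrative reflectances only.
print(saturation_variants(blue=0.1, green=0.2, red=0.4, nir=0.3))
```

A grey, bright roof (similar values in all bands) yields a four-band Saturation near 0 and a Simple ratio near 1, which is why low Saturation or high Simple ratio values can serve as an urban mask.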
For CATWIC, the Saturation of the four bands was selected as an urban mask, to keep the solution as simple as possible without needing to create an RGB composite. For the NDCHRWI, the Simple ratio was selected. Thresholds were density sliced by visual interpretation and set at the limit where the index still classified bright urban buildings correctly, before it began to misclassify water in Lake Urana (146.1899, −35.2829), an ephemeral lake in the central south of the study zone.
These masks removed most urban/built-up features, except for a very small proportion of very bright buildings. A High Resolution Snow Index (HRSI), developed as part of concurrent research, can mitigate these problems. This index, which requires only the visible and NIR bands, is able to separate snow/ice mutually exclusively from water, hill shade, and other highly reflective surfaces. The HRSI was therefore applied as an additional mask to remove the very bright buildings, together with snow and dry, saline lakes in the arid interior.
The full classification chorology applied for CATWIC for the temperate eastern zone was therefore:
The full classification chorology applied for the NDCHRWI for the temperate eastern zone was:
3. Results
3.1. Phase 1—Comparison of Existing and New Indices with a Decision Matrix
Table 3 provides a look-up table of all the indices that were tested and compared with the decision matrix in Table 4, where the most positive score indicates the best overall performance.
None of the indices was able to completely discriminate water from shadows; however, SR was the least affected by hill shade. Six of the new indices presented in this study ranked higher than the existing indices: (1) CAWI, (2) CWI Hue and Saturation, (3) CATWIC (combination of HRCWI and SR), (4) CHRWI Hue, (5) BRCHRWI Hue, and (6) NDCHRWI Hue. Of all the relatively successful indices that scored greater than −4, none was mutually exclusive of snow, all except the colourimetric CWI misclassified coal as water, and all except CAWI and CWI excessively misclassified most urban buildings in one or both seasons. HRCWI on its own, however, appeared to display minimal urban misclassification.
A comparison of threshold stability for each of the best performing indices, based on the PIF, is shown in Table 5.
SR and the Saturation for CWI remained stable because they were assigned fixed thresholds across the seasons. The hue-based indices displayed the greatest threshold stability; however, CWI is better applied with a colour-clustering routine to automate the balance between its Hue and Saturation. CAWI displayed the greatest variability, while CATWIC’s combination of HRCWI with SR was the most stable algebraic index.
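Threshold stability can be expressed as the spread of an index’s PIF-derived thresholds across the four seasons. The sketch below assumes the population standard deviation as that spread measure (the exact statistic behind Table 5 is an assumption here), and the seasonal threshold values are invented for illustration only; they are not taken from Table 5.

```python
import statistics


def threshold_stability(seasonal_thresholds):
    """Spread of one index's thresholds across seasons (lower = more stable).

    Uses the population standard deviation; this choice is an assumption.
    """
    return statistics.pstdev(list(seasonal_thresholds.values()))


# Hypothetical thresholds for two unnamed indices (not values from this study).
index_a = {"Summer": 0.10, "Autumn": 0.18, "Winter": 0.11, "Spring": 0.20}
index_b = {"Summer": 0.42, "Autumn": 0.43, "Winter": 0.42, "Spring": 0.44}
ranked = sorted({"A": index_a, "B": index_b}.items(),
                key=lambda item: threshold_stability(item[1]))
print([name for name, _ in ranked])  # most stable first
```

Because indices are scaled differently, a relative measure (for example, dividing by the index range) may be fairer when comparing across indices.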
3.2. Phase 2—Assessment of Best Performing Indices in Wetland, Agricultural, and Urban Environments
All the reference RGB images in Figure 6, Figure 7, and Figure 8 are displayed in the ‘Land/Water’ RGB composite {R,G,B: NIR, SWIR1, Red} with a linear stretch of min: 0, max: 0.3 in GEE, to aid the visual perception of characteristic appearances and differences in land cover brightness and colour saturation.
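Because hue and saturation are read from the stretched composite, the stretch is effectively part of the colour definition. The sketch below, assuming per-pixel surface reflectances scaled 0 to 1, applies the min 0, max 0.3 linear stretch to the ‘Land/Water’ bands and converts the result to HSV with Python’s standard colorsys module. The function names are hypothetical, and this only approximates the colours shown in the GEE display.

```python
import colorsys


def linear_stretch(x, lo=0.0, hi=0.3):
    """Map reflectance x linearly from [lo, hi] to [0, 1], clipping outside values."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def land_water_hsv(nir, swir1, red, lo=0.0, hi=0.3):
    """HSV of one pixel in the 'Land/Water' composite {R,G,B: NIR, SWIR1, Red}.

    Returns hue in degrees (0-360) and saturation and value in 0-1.
    """
    r, g, b = (linear_stretch(v, lo, hi) for v in (nir, swir1, red))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0, s, v


# Illustrative pixels: equal NIR and Red with no SWIR1 reads as magenta,
# Red dominating NIR with no SWIR1 reads as purple.
print(land_water_hsv(nir=0.3, swir1=0.0, red=0.3))
print(land_water_hsv(nir=0.15, swir1=0.0, red=0.3))
```

Changing lo or hi changes the hue and saturation of the same pixel, which is why the stretch parameters need to be reported alongside any colour description.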
3.2.1. A Coastal Wetland Area
The indices requiring SWIR bands displayed the most misclassification of mangroves, excessively so for CWI (Figure 6). The indices limited to the visible and NIR bands performed better, and closer inspection of the imagery showed that CATWIC and the NDCHRWI performed the best.
3.2.2. An Arid Wetland Area
Taking the purple- to magenta-coloured features in Figure 7 to be turbid water, the results were the opposite in the arid wetland: CWI performed the best, and the other indices underestimated the water extent.
3.2.3. An Intensely Irrigated Agricultural Area
Water bodies in the intensely irrigated area (Figure 8) are typically turbid agricultural dams, which appeared dark purple in GEE with the ‘Land/Water’ RGB. The highly irrigated agricultural parcels displayed very low saturations, which appeared dark blue in the SII key based on a standard deviation stretch in the GIS. All the indices performed well for turbid water mapping within the extent of NSW. However, validation outside this extent, in the State of Queensland to the north, showed that CATWIC (combination of HRCWI and SR) classified most turbid water bodies except those with a magenta colour, while the NDCHRWI misclassified some intensely irrigated parcels as water. CAWI and CWI performed better: CAWI omitted a negligible number of very turbid ponds, while CWI classified the full range. Colour clustering for CWI produced varying results depending on the sampling extent and intensity; therefore, its Hue and Saturation thresholds were set manually for consistency and processing feasibility.
3.2.4. A Complex Urban Area
On close inspection and comparison with the Google Maps reference layer and the GSW water product (Figure 9), HRCWI included the most noise from buildings, followed by CAWI, mainly from shadows. The colour-clustered CWI performed best and classified the most water from the small ponds in urban parks, although colour clustering was much more processing intensive than thresholding. All the indices misclassified the dark, low-albedo steel of Amsterdam Central Station as water.
3.3. Phase 3—Accuracy Assessment
Based on the observations from Phase 2, CAWI, CATWIC, and the NDCHRWI Hue were considered for an accuracy assessment of the eastern zone (Figure 3). The results in Table 6, Table 7, and Table 8 indicate that CAWI scored the highest overall accuracy of the three indices.
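For readers reproducing the accuracy tables, overall, user’s, and producer’s accuracy follow directly from a confusion matrix. The sketch below assumes rows hold the classified (map) labels and columns the reference labels; that convention, the helper name, and the counts are all illustrative assumptions, not the study’s data.

```python
def accuracy_metrics(cm, labels):
    """Overall, user's and producer's accuracy from a square confusion matrix.

    Convention assumed here: rows = classified labels, columns = reference labels.
    User's accuracy is diagonal / row total; producer's is diagonal / column total.
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    users, producers = {}, {}
    for i, label in enumerate(labels):
        row_total = sum(cm[i])
        col_total = sum(cm[r][i] for r in range(n))
        users[label] = cm[i][i] / row_total if row_total else float("nan")
        producers[label] = cm[i][i] / col_total if col_total else float("nan")
    return correct / total, users, producers


# Hypothetical counts for a two-class water/non-water assessment.
overall, users, producers = accuracy_metrics([[90, 10], [5, 95]], ["water", "non-water"])
print(overall, users["water"], producers["water"])
```

User’s accuracy reflects errors of commission and producer’s accuracy errors of omission, which are the two error types discussed for the slope and urban masks.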
Working on the premise that the interpretation of water presented in the SII keys was correct, visual inspection of the errors suggested that they can generally be attributed to errors in the reference data, since the ESA land cover product evidently has errors due to the smoothing techniques that were applied to it. For example, isolated trees are not present, and many thin rivers have been classified as mangroves. Further visual analysis determined that CATWIC displayed slightly more misclassification of urban/built-up features than the NDCHRWI Hue.
The additional accuracy assessments for CAWI with thresholds set automatically by the Otsu technique for Sentinel-2 and Landsat 8 imagery (Table 9 and Table 10) suggest that automation can provide acceptable results, though with lower overall accuracy than a manual effort. CAWI maintained the same data distribution for both sensors, albeit tighter for Landsat 8, but performed better with Sentinel-2. This is expected given the lower resolution of the Landsat 8 sensor.
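Otsu’s method picks the threshold that maximises the between-class variance of a histogram, which suits an index whose values separate into two modes. The following is a minimal, library-free sketch on a list of index values, with a hypothetical function name; in GEE the same logic would be applied to a histogram computed from the CAWI image, which is not shown here.

```python
def otsu_threshold(values, bins=256):
    """Otsu's threshold over histogram bin edges (values > threshold form class 1)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    sum_all = sum(c * m for c, m in zip(counts, centers))
    best_t, best_between = lo, -1.0
    w0, sum0 = 0, 0.0
    for k in range(bins - 1):
        w0 += counts[k]
        sum0 += counts[k] * centers[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor that does not affect the argmax.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, lo + (k + 1) * width  # upper edge of bin k
    return best_t


# Illustrative bimodal sample only (not CAWI values from this study).
sample = [-0.6, -0.5, -0.4] * 10 + [0.4, 0.5, 0.6] * 10
print(otsu_threshold(sample))
```

When the two modes are unequal in size or overlap strongly, as can happen where water occupies a small share of the image, the Otsu threshold may drift, which is consistent with the lower accuracy reported for the automated runs.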
A visual appraisal of CAWI’s performance across all the seasons with the automated thresholding indicated that the slope mask of 10 degrees was not sufficient to resolve misclassifications due to hill shading for either Sentinel-2 or Landsat 8 during Winter. Sentinel-2 displayed more misclassified hill shading, bright urban/built-up features, mangroves, and dark, intensely irrigated soils than Landsat 8, but Landsat 8 missed more thin rivers and small dams/ponds than Sentinel-2, suggesting that there are trade-offs between higher spatial resolution and errors of omission and commission.
3.4. Phase 4—Validation of the Selected Indices’ Performances across the Seasons and around the World
A visual assessment of the best performing threshold-based indices (CAWI, CATWIC, and the NDCHRWI Hue) confirmed that they were effective across all four seasons for the initial study zone of the State of NSW. However, the thresholds for CAWI were less stable and required more natural breaks to distinguish water during Autumn and Spring than during Summer and Winter.
An extended global visual inspection identified a limitation for all the indices in dry salt lakes, including the Salar de Uyuni in Bolivia and Lake Eyre in South Australia, which were misclassified as water. The SR index displayed the most misclassification of snow and misclassified very arid areas such as the north of the Atacama Desert in Chile, the Central Persian desert basins, the Tibesti–Jebel Uweinat montane xeric woodlands in Saharan Africa, the Red Sea–Arabian desert shrublands, the South Iran Nubo–Sindian desert, the Indus Valley desert, and the Rann of Kutch seasonal salt marsh in India. It was observed that these errors can be avoided, or at least minimized, with the HRSI and the modified saturation masks that were used to mask out the bright urban features for the accuracy assessment.
The global inspection also showed that inundated agriculture, for example north of the Sundarban mangroves in Bangladesh, in the Mekong River delta in the south of Vietnam, in Yilan County in the north-east of Taiwan, and in Samut Sakhon, south of Bangkok in Thailand, was classified as water, in agreement with the GSW mapping.
Comparison of Atmospheric Effects
A comparison was made for the best performing indices in an area in the tropics (Figure 10) that is typically affected by cloud cover and atmospheric haze. CAWI was affected to some degree, and HRCWI (one of the indices that make up CATWIC) more so, which could have consequences for the classification of thin rivers under hazy atmospheric conditions. The NDCHRWI did not appear to be affected at all. It should be noted, however, that neither CATWIC nor the NDCHRWI had urban and snow masks applied in this comparison, and that applying them introduced errors due to the haze. Therefore, caution is advised regarding where and when to apply those masks.
4. Discussion
The results showed that the new CAWI index effectively classified surface water across the seasons in a wide, globally representative range of environments, and that it is also possible to achieve high classification performances from indices limited to the typically higher resolution (visible and NIR) bands. Two alternatives for the latter, CATWIC and the NDCHRWI Hue, require additional masking for other high reflectance land cover features including urban buildings and snow. The workflow presented here is highly feasible given that only one visually interpreted PIF was necessary to threshold the indices.
Reference to Google Maps and Wikipedia was sufficient to support the deductions that were made to interpret the false colour appearances of surface water bodies. These sources, together with existing reference maps such as the ESA land cover classification and the GSW indicators published in GEE, give analysts a degree of analytical feasibility that was not so easily accessible in the past. Their use as benchmarks for regional and local classification refinements and for questions relating to land cover mapping is therefore highly recommended.
The results suggest that the international datasets used in this study were sufficient as references, but national mapping inventories, which typically have higher precision, may be preferred for other land cover classification efforts. The quality of the available reference datasets determines the quality of the results of this analysis, because any errors in the reference data will be inherited by the selected model. Errors may be due to differing land cover class definitions, low spatial resolution and classification accuracy of the reference data, differing times of data collection and classification, and the number of spectrally similar classes [58]. Sample datasets that include mixed pixels can also decrease the accuracy of algorithms and provide erroneous validations [9]. The ESA WorldCover product scored an overall accuracy of only 74.4% on a global scale, with a user’s accuracy of 88.5% and a producer’s accuracy of 85% for ‘Permanent water bodies’, and a lower overall accuracy of 67.5% for Oceania [44]. Those accuracy estimates were for 2020, whereas the inter-annual median-based image composites used in this study were for the years 2016 and 2018, to simulate typical landscape seasonal variability and to avoid outliers from the anomalous flooding events in the subsequent years.
Alvarez-Vanhard et al. [61] identified the potential ecological insights that multiscale explanation could provide through data fusion and inter-operability between very high spatial resolution imagery from drones/UAVs and large-scale time series data from satellite-based sensors. While the visible and NIR bands from multispectral satellite sensors are not directly comparable with those of drones/UAVs [62], it is expected that the results from the equivalent Sentinel-2 bands in this study could in the future translate to drones/UAVs once the necessary radiometric inter-calibration, testing, and refinement based on solar radiation conditions have been conducted [63,64]. The indices developed here for sensors limited to the visible and NIR bands should be further tested, with the necessary calibration/simulation, on basic four-channel cameras mounted on drones/UAVs for local water mapping. These indices are also compatible with the new generation of high resolution satellites, such as PlanetScope, which can provide daily acquisitions for dense time series change detection studies at 3 m resolution. They would also allow retrospective time series analyses to be conducted with archival imagery from the SPOT, RapidEye, and WorldView satellites, and aerial imagery from the ADS40 sensor, for example. At the higher resampled resolution of 5 m, HRCWI produced less noise than with the 10 m imagery in the urban mapping assessment. Atmospheric effects on drone/UAV sensors are minimal in comparison with satellite sensors because the sensors are so close to the surface [64]; therefore, further testing is needed to determine whether HRCWI might produce even less noise at higher resolutions with a sensor-calibrated drone/UAV.
All the indices analysed in this study misclassified snow/ice to some extent and were affected by hill shade in highly rugged mountainous terrain to varying degrees between Autumn and Spring. The new High Resolution Snow Index (HRSI) should help to resolve these problems. Unlike the NDSI, HRSI can separate snow/ice mutually exclusively from water and hill shade and only requires visible and NIR bands.
One way to reduce errors of commission related to hill shading in surface water mapping could be to mask it out, for example, with the GEE hill shade algorithm ee.Terrain.hillshade. However, this requires a DEM, which is currently only publicly available in GEE at 30 m resolution and would therefore obscure fine features in 10 m resolution mapping; this is why a simple slope layer was used in this study as a globally applicable surrogate. Alternatively, imagery selection could be filtered to include only images with the azimuth set to less than the corresponding amount for the latitudinal range in question. This may be a useful approach if only the average annual presence of water is of interest. Further research could improve on an index-based solution for shadow masking with an index such as Huemmrich’s [59] SI (Shadow Index). The same applies to the development of an urban/built-up index that could separate all urban features mutually exclusively from water.
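To make the hill shade option concrete, the standard Lambertian hillshade relation gives the relative illumination of a terrain facet from its slope and aspect and the sun’s azimuth and elevation. The sketch below is a generic per-pixel version of that relation, with a hypothetical function name and angles in degrees measured clockwise from north; it is not GEE’s own implementation, and any shading threshold applied to its output would need to be chosen by the user.

```python
import math


def hillshade(slope_deg, aspect_deg, sun_azimuth_deg, sun_elevation_deg):
    """Relative illumination (0-1) of a terrain facet under a given sun position.

    Standard Lambertian hillshade; aspect and azimuth are measured clockwise
    from north. Negative values (self-shadowed facets) are clipped to 0.
    """
    zenith = math.radians(90.0 - sun_elevation_deg)
    slope = math.radians(slope_deg)
    relative_azimuth = math.radians(sun_azimuth_deg - aspect_deg)
    value = (math.cos(zenith) * math.cos(slope)
             + math.sin(zenith) * math.sin(slope) * math.cos(relative_azimuth))
    return max(value, 0.0)


# Flat ground, a slope facing the sun, and a steep slope facing away from it.
print(hillshade(0, 0, 270, 45), hillshade(45, 270, 270, 45), hillshade(60, 90, 270, 45))
```

The slope and aspect inputs come from the DEM, so the 30 m DEM resolution limits the detail of any hill shade mask, as noted above.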
Ning and Lee [65] suggested that the various water indices differ in their strengths and weaknesses, and that combining indices (and morphology), depending on the environment in question, may provide solutions to river mapping. A simple spatial stratification with minimal inputs would be more efficient than a multitude of ecoregionally specific masking rulesets for global monitoring. The GSW (Global Surface Water) layers appear to have omitted substantial areas of water bodies in urban areas around the world because they applied the GHSL (Global Human Settlement Layer) [66] at 38 m resolution (or coarser) as a mask on 30 m products. Rather than masking out non-water misclassifications, classifying a designated set of zones with the optimal indices would result in fewer errors of omission. A global hydrologic stratification is therefore proposed in Table 11 to apply the most appropriate indices to six zones for a consistent multispectral satellite-based global monitoring effort.
For high resolution sensors limited to the visible and NIR bands, a modified stratification is suggested in Table 12. Note that the urban masks proposed in this study should only be used in urban areas, and not in arid areas prone to more turbid water bodies or in areas prone to high levels of haze.
The HSV colour space was used in this study because it enables colours to be communicated ontologically, both quantitatively and verbally, if the histogram stretch and extent are defined. For the communication of false colour RGB composites with SII keys, it is important to include the stretch that was used and the spatial extent that the stretch was applied to. For linear stretches with a specified minimum and maximum, this is not really an issue; however, when statistically derived stretches such as histogram equalization or standard deviations are applied to enhance the contrast of features, colours and their contrasts are expected to vary. The CWI {R,G,B: SWIR2, NIR, Green} index is both quantitatively indicative and ontologically communicable and is most effectively used with a colour-clustering technique. However, varying degrees of water turbidity appear more distinguishable in the ‘Land/Water’ {R,G,B: NIR, SWIR1, Red} composite, which makes it easier to identify differences and extremes of water expressions when setting thresholds for density slicing.
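Colour clustering on Hue and Saturation has to respect that hue is circular (359° and 1° are almost the same colour). The sketch below is a stand-in for illustration only: it uses a plain Lloyd’s k-means in pure Python, not GEE’s ee.Clusterer.wekaKMeans, and places each pixel on the colour wheel as (S·cos H, S·sin H) so that distances respect the hue wrap-around. The helper names and pixel values are hypothetical.

```python
import math
import random


def hs_to_xy(hue_deg, sat):
    """Place a hue/saturation pair on the colour wheel so 359 and 1 degrees are neighbours."""
    angle = math.radians(hue_deg)
    return (sat * math.cos(angle), sat * math.sin(angle))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Illustrative pixels: saturated purple/magenta hues vs. weakly saturated green hues.
pixels = [(280, 0.9), (290, 0.85), (300, 0.9), (90, 0.2), (100, 0.25), (110, 0.2)]
centroids, labels = kmeans([hs_to_xy(h, s) for h, s in pixels], k=2)
print(labels)
```

As reported above for CWI, results from clustering depend on which pixels are sampled, so fixed Hue and Saturation thresholds can be the more reproducible option.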
Further multivariate colourimetric development based on sample-dependent machine learning approaches could optimize the automation and precision of classifications with more modern colour spaces such as CIE LUV or LAB and their LCH cylindrical transformations. Weighting each RGB channel with statistically derived coefficients might also improve performance.
Standard deviations or natural (Jenks) breaks help narrow down the spatial extent of a land cover class if it has a limited spatial representation in the image extent. This applies to all the indices presented in this study. However, if automation were desired, CAWI lends itself well to Otsu thresholding because of its bimodal distribution. The fact that CAWI displayed a lower threshold stability than the other candidate indices and required more natural breaks to distinguish water during Autumn and Spring suggests that its data distribution for water relative to other land cover classes is not proportional throughout the year. It may therefore require adaptive spatial–temporal thresholding for large-scale mapping efforts, such as with a moving window or by bioregions if processing capacities permit.
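Natural (Jenks) breaks choose class limits that minimise the squared deviations within classes. The sketch below is a pure-Python version of the classic Fisher-Jenks dynamic-programming algorithm, with a hypothetical function name; it is not necessarily identical to the implementation in the GIS used in this study. Its cost grows with the square of the number of values, so it is intended for a sample of pixel values rather than a full image; for the 10 natural breaks used for HRCWI it would be called with n_classes=10.

```python
def jenks_breaks(values, n_classes):
    """Fisher-Jenks natural breaks.

    Returns n_classes + 1 limits: the minimum, each class's upper limit, and
    the maximum. O(n_classes * n^2), so suitable for samples, not full images.
    """
    data = sorted(values)
    n = len(data)
    # lower[end][j]: 1-based start of the last of j classes over the first `end` values.
    lower = [[0] * (n_classes + 1) for _ in range(n + 1)]
    # cost[end][j]: minimum total within-class squared deviation for that case.
    cost = [[0.0] * (n_classes + 1) for _ in range(n + 1)]
    for j in range(1, n_classes + 1):
        lower[1][j] = 1
        for end in range(2, n + 1):
            cost[end][j] = float("inf")
    for end in range(2, n + 1):
        s1 = s2 = ssd = 0.0
        w = 0
        for m in range(1, end + 1):
            start = end - m + 1  # 1-based start of the candidate last class
            x = data[start - 1]
            s1 += x
            s2 += x * x
            w += 1
            ssd = s2 - s1 * s1 / w
            if start > 1:
                for j in range(2, n_classes + 1):
                    candidate = ssd + cost[start - 1][j - 1]
                    if cost[end][j] >= candidate:
                        lower[end][j] = start
                        cost[end][j] = candidate
        lower[end][1] = 1
        cost[end][1] = ssd
    limits = [0.0] * (n_classes + 1)
    limits[0], limits[n_classes] = data[0], data[-1]
    k = n
    for j in range(n_classes, 1, -1):
        start = lower[k][j]
        limits[j - 1] = data[start - 2]  # last value of the previous class
        k = start - 1
    return limits


# Illustrative values with three obvious groups.
print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
```

Applying this per season or per bioregion, rather than once over the whole extent, would be one simple way to approximate the adaptive thresholding suggested above.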
Given the high performance of the fixed thresholds for the indices designed for high resolution sensors limited to the visible and NIR bands (aside from HRCWI, which scored a very low standard deviation across the seasons in Table 5), users will not need to approximate thresholds now that they have been established and assessed for accuracy, and for novel areas they will have a very good starting point. They may, however, want to validate and refine them for their particular study zones. The end of a thin, tree-lined river was used as a PIF in this study to define the index thresholds. If this type of feature is not present in another ecoregion, then it is recommended to find an area at the last interval of an index gradient that still captures purple- or magenta-coloured water bodies in the ‘Land/Water’ RGB before any unrelated non-water features are misclassified.
An incomplete consideration of ecological factors, landscape diversity, and variation of phenologic processes within limited study zones can lead to inadequate results or ecological inference fallacies. Scalability and inter-seasonality require a search for absolutes. An index can therefore be considered comprehensive if it can separate the full range of features of a particular land cover class mutually exclusively from other land cover classes throughout the seasons and across the full range of the planet’s ecoregions. A holistic, multi-scalar, ecophysiographic approach recognizes the need for visual validation to identify land cover class variations that may not be present in a particular study area’s field samples. In order to formulate an index with minimal sampling effort that is relevant on a global scale for monitoring efforts, both holism and reductionism need to be taken into account. One of the major advantages of the rapid access to global datasets, such as those of Google Earth Engine, is that it readily enables the testing of general assumptions for consistency across different environments around the world with different ecoregional limits and expressions. SII is only really expected to work over large regions and diverse landscapes—this is the holistic element for a spectrally complex reduced class such as water. SII aims to rationalize and facilitate the communication of deductions for virtual sampling to formalize proposed classification methods with accuracy assessments. Quantitative field sample-based modelling and accuracy assessments for the high resolution mapping of highly spatio-temporally variable land cover classes such as turbid ephemeral water bodies and snow over large and diverse regions would be very expensive and difficult to design. 
We believe that such assessments can be worked towards with preliminary SII and expert agreement about the globally ubiquitous and distinguishable appearance of the land cover features of interest, and about the indices and thresholds that are necessary for them. The method proposed here offers a way to overcome this difficulty, with a trade-off between potential developmental subjectivity and broad, practical applicability.
In the context of remote sensing, holistic reduction and multi-scalar SII facilitate the effective discrimination of landscape features by identifying their appearance globally and across the seasons, and reducing or grouping them into land class ‘primitives’ [71] which can be classified distinctly with one-class classifications, rather than considering them together with every other class in and across landscapes [24,72,73]. Consequently, classifying sub-classes only from the extents of reduced super-classes (such as water or forest masks) can be more effective because their multispectral overlap with other classes will have been eliminated, reducing the analytical complexity of the data space to be considered.