Article

Water Extraction in PolSAR Image Based on Superpixel and Graph Convolutional Network

1 Research Center of Big Data Technology, Nanhu Laboratory, Jiaxing 314002, China
2 Advanced Institute of Big Data, Beijing 100093, China
3 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
4 Laboratory of Target Microwave Properties, Deqing Academy of Satellite Applications, Huzhou 313200, China
5 College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
6 School of Civil Engineering and Architecture, Jiaxing Nanhu University, Jiaxing 314001, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(4), 2610; https://doi.org/10.3390/app13042610
Submission received: 4 January 2023 / Revised: 3 February 2023 / Accepted: 16 February 2023 / Published: 17 February 2023

Abstract: The timely detection and mapping of surface water bodies from Polarimetric Synthetic Aperture Radar (PolSAR) images are of great significance for emergency management and post-disaster restoration. Although various methods have been proposed in previous years, they still have some inherent flaws. This paper therefore proposes a new surface water extraction method based on superpixels and Graph Convolutional Networks (GCN). First, the PolSAR image is segmented to generate superpixels as the basic classification units, and graph-structured data are established according to the connections between superpixels. Then, the features of each superpixel are extracted. Finally, a GCN classifies each superpixel unit using the node features and their relationships. Experiments were conducted on a sudden flood event caused by heavy rain and on an urban lake, with detailed verification. Compared with traditional methods, recall improved by 3% while maintaining almost 100% precision in complex flood areas. The results show that the proposed method of surface water extraction from PolSAR images has clear advantages, achieving higher accuracy and better boundary adherence with fewer samples. This paper also illustrates the advantage of using a GCN to mine the contextual information of classification objects.

1. Introduction

Surface water is essential for maintaining the biosphere, forms the basis of life [1], and strongly influences ecosystems and human activities. Floods, however, are among the most destructive natural disasters in the world, causing tremendous damage and threatening people’s safety and social stability [2]; about 10 to 20 million people are affected by floods every year [3]. As the global climate becomes more erratic and extreme precipitation more common, flood events are occurring more frequently and causing greater losses than before [3,4]. Therefore, the rapid mapping of surface water bodies benefits resource investigation, emergency relief, and post-disaster management, and is of great research significance [4,5].
Remote sensing (RS) techniques have the potential for highly detailed, high-temporal-resolution, high-efficiency, and large-scale surface water mapping. Compared to optical RS imagery, synthetic aperture radar (SAR) has advantages and is usually the prime candidate for the task [6,7]. First, SAR works in the active mode and is independent of sun illumination and weather conditions (capable of penetrating clouds and rain). Second, it can easily differentiate land from water by surface roughness: double or multiple scattering from rough land cover returns a large portion of the signal to the sensor, whereas specular reflection from a smooth water surface directs most of the signal away from the sensor, creating dark areas in SAR images.
In recent years, SAR technology has advanced by leaps and bounds, and various satellites such as Sentinel-1 and GF-3 have been launched. They provide many images at low cost, with large coverage and short revisit times, which greatly reduces the obstacles of using SAR data. Polarimetric SAR (PolSAR) data carry additional polarization information, where polarization refers to the orientation of the plane in which the radar wave oscillates. Different polarization modes carry information about the different scattering mechanisms and structures of the imaged surface [8,9]. Thus, multi-polarization data generally perform better in segmentation and classification tasks than single-polarization data, and likewise are preferred for the extraction of surface water bodies [2,10,11].
Currently, many extraction methods for surface water bodies are available. Since the scattering intensity of water bodies in PolSAR images is usually significantly lower than that of land covers [12], threshold segmentation (TS) can be applied directly to intensity or composite images. Commonly used algorithms include the Otsu algorithm, the adaptive global method, the entropy threshold method, the minimum error threshold method [13,14,15], and the active contour model (ACM) [16]. Because TS requires no additional information, has low computational complexity, and is easy to use, it is still the most frequently used method in disaster emergency scenarios [17]. However, TS methods have some inherent defects: (1) the threshold cannot be calculated quickly and often must be adjusted manually for better results, so the accuracy depends on the subjective judgment of the operator; and (2) the segmentation results contain many noise patches, which require a series of post-processing steps [18,19].
Supervised machine learning (ML) methods use hand-crafted features, such as texture, terrain, and color, at the pixel level, together with classifiers such as support vector machines (SVM), random forests (RF), and gradient boosting decision trees (GBDT), to extract surface water [20,21]. In addition, object-oriented analysis, which extracts features from superpixel objects, is also commonly used [22,23]. Superpixel techniques segment images into regions according to similarity measures defined on perceptual features. These methods, however, often treat classification units as independent individuals and ignore their spatial relationships, so their performance in complex flood areas still needs improvement. Some studies have introduced spatial relationships using conditional random field (CRF) processing at the pixel level [24] or association methods based on the adjacency frequency of spatial object categories at the object level [25]. However, these approaches are cumbersome and cannot capture enough contextual information, which limits further accuracy improvement.
Deep learning (DL) methods perform very well in various segmentation tasks, and convolutional neural networks (CNNs) have been widely used to identify regions of interest in SAR images [26]. With their multi-layer network structure and strong learning ability, DL methods are end-to-end classifiers that do not require manually designed features. However, DL is data-driven and requires many samples in advance, which is time-consuming and costly [17,24]. Moreover, a dataset is only a sample of the real environment, so a trained network performs well only on data with a similar distribution; in remote-sensing tasks in particular, model performance declines sharply on totally unfamiliar datasets. CNN results are also easily confused at boundaries, especially in complex flood-inundated areas. For these reasons, CNNs are difficult to apply to mapping sudden floods in new regions.
In general, TS methods cannot calculate the threshold accurately and need manual adjustment. ML methods treat each classified object as an isolated individual and ignore their intrinsic relationships, which limits their accuracy. DL methods can achieve the highest accuracy, but they require large sample sets and expensive computing resources, and their performance degrades when migrating across datasets.
In recent years, graph neural networks (GNNs) have received extensive attention and have shown considerable potential for graph data, one-shot learning, and optimizing CNN results. In the field of RS, GNNs have also been successfully applied to image classification [27,28,29,30]. A significant reason for these successes is the ability of GNNs to aggregate global contextual information. Meanwhile, the object-oriented analysis method can accurately capture boundaries while reducing the number of classified objects, which is important for analyzing high-resolution images and convenient for converting regular raster data into a graph structure.
Thus, a GCN based on the object-oriented analysis method is proposed to overcome the abovementioned problems: the object-oriented method captures object boundaries well, and the GCN establishes the spatial relationships between superpixels [31]. The proposed method is tested on a complex flood area and an urban lake and compared with traditional methods. This paper is organized as follows: the proposed method is described in Section 2, where Section 2.1, Section 2.2 and Section 2.3 cover superpixel segmentation and feature extraction, sample selection, and graph construction and training, respectively; experimental results and analysis are presented in Section 3; and the discussion and conclusions are given in Section 4 and Section 5, respectively.

2. Materials and Methods

The flowchart of the proposed method is shown in Figure 1. There are three main sub-processes: (a) superpixel segmentation and feature extraction, (b) selection of training samples, and (c) graph convolutional network construction, training, and prediction of unknown nodes.
First, the PolSAR image was over-segmented using SLIC to obtain superpixels as the classification units. The features of each superpixel, including scattering, texture, and statistical features, were calculated, and their importance to our task was analyzed.
Second, for an unbiased selection of training samples (particularly to avoid misclassifying land with high water content), t-SNE [32], a popular method for dimension reduction and visualization, was applied to roughly judge the number of initial cluster centers among the superpixels. An unsupervised clustering algorithm was then used to generate coarse classification results that assist manual sample selection. Note that the t-SNE and coarse-classification steps are not indispensable to our method; they are used only for extremely complex scenarios where manual annotation is difficult.
Third, an undirected graph was constructed from the adjacency relationships between superpixels and the features of each superpixel. After converting the raster image into a graph, the GCN was trained with the training samples, used to predict the whole image, and evaluated against the ground truth labels.

2.1. Superpixel Segmentation and Feature Extraction

2.1.1. Superpixel Segmentation

An image is a kind of regular data in Euclidean space that can also be regarded as a special kind of graph-structured data. However, using each pixel as a node when training a GNN requires huge computing resources. By aggregating similar pixels into larger, more representative ‘elements’, superpixel segmentation can greatly reduce the number of nodes and the effect of speckle. Additionally, superpixels preserve most object information and adhere well to boundaries [33]. Therefore, the proposed method takes the superpixel as the basic classification unit. Available algorithms include FNEA, simple linear iterative clustering (SLIC) [31], the improved SLIC method [34], and mixture-based methods [35].
The SLIC algorithm clusters pixels in a combined five-dimensional space of color (the L, a, b values of the CIELAB color space) and image coordinates (the x, y positions of the pixels). It includes three steps: initializing the cluster centers, local k-means clustering, and post-processing. Because it is easy to use and effective, it is well suited to segmenting scenes such as ponds and lakes. The improved SLIC method modifies the SLIC clustering function to suit the characteristics of polarimetric statistical measures and revises the initialization to produce robust cluster centers, making it a better choice for complex scenes such as floods. For the study areas in this paper, the improved version was adopted. Note that over-segmentation is required to capture object boundaries accurately.
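As a minimal illustration of the over-segmentation step, the sketch below applies the standard SLIC implementation from scikit-image to a single-channel intensity image; it stands in for the improved PolSAR-specific SLIC used in this paper, and the function name `oversegment` and the parameter defaults are illustrative assumptions.

```python
# Minimal sketch of over-segmentation with standard SLIC (scikit-image).
# The improved PolSAR-specific SLIC is not shown; this illustrates only
# the general workflow of producing the superpixel label map.
import numpy as np
from skimage.segmentation import slic

def oversegment(intensity: np.ndarray, desired_size: int = 225) -> np.ndarray:
    """Over-segment a 2-D intensity image into superpixels.

    The number of segments is derived from the desired superpixel size
    (e.g., 225 pixels, the value used for study area 1).
    """
    n_segments = max(1, intensity.size // desired_size)
    return slic(
        intensity,
        n_segments=n_segments,
        compactness=0.1,    # low compactness lets boundaries follow intensity
        channel_axis=None,  # single-channel (grayscale) input
        start_label=0,
    )
```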

2.1.2. Feature Extraction

After superpixel segmentation, the features of each node were extracted, including:
(1) Scattering matrix and statistical features
Usually, the electromagnetic scattering of a radar target in the far field is a linear process. Once a scattering coordinate system and the corresponding polarization basis are selected, the relationship between the radar illumination wave and the wave scattered by the target can be described by polarization components. The variable polarization effect of the target can therefore be expressed as a complex two-dimensional matrix, called the scattering matrix, which represents the full polarization information of the target at a specific attitude and observation frequency:
$$\mathbf{S} = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix}$$
where H stands for horizontal polarization and V for vertical polarization. $S_{HH}$ and $S_{VV}$ denote the co-polarization components of the scattering matrix; $S_{HV}$ and $S_{VH}$ are the cross-polarization components, which are equal for a reciprocal medium. All matrix elements are acquired directly from the complex values of multi-polarization single-look images. For dual-polarization data, only one co-polarization and one cross-polarization element are kept. These components tend to play different roles in extracting water. For instance, $S_{HH}$/$S_{VV}$ images usually have a better signal-to-noise ratio than $S_{HV}$/$S_{VH}$ over various targets, especially calm surface water bodies, due to the specular reflection mechanism; $S_{HH}$ scarcely differs from $S_{VV}$; and $S_{HV}$/$S_{VH}$ is more sensitive to rough water surfaces, such as flowing rivers or floods.
A scattering matrix is acquired for each pixel, so to extract features representing each superpixel, all pixel-level matrix elements need to be translated into superpixel-level statistical values. For example, the mean, median, standard deviation, maximum, and minimum of each scattering-matrix element can be calculated over all pixels belonging to the same superpixel. Among these statistics, the maximum and minimum are easily affected by speckle noise, and the median measures the central tendency more robustly than the mean.
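A minimal sketch of this pixel-to-superpixel translation is given below, assuming `channels` is a dict of 2-D arrays (e.g., channel magnitudes) and `labels` is the superpixel label map from the segmentation step; the helper name `superpixel_stats` is hypothetical.

```python
# Sketch: translating pixel-level values into superpixel-level statistics
# (mean, median, standard deviation, maximum, minimum), as described above.
import numpy as np

def superpixel_stats(channels: dict, labels: np.ndarray) -> dict:
    n_sp = labels.max() + 1
    flat = labels.ravel()
    feats = {}
    for name, img in channels.items():
        vals = img.ravel()
        per_sp = [vals[flat == i] for i in range(n_sp)]  # pixels per superpixel
        for stat, fn in [("mean", np.mean), ("median", np.median),
                         ("std", np.std), ("max", np.max), ("min", np.min)]:
            feats[f"{name}_{stat}"] = np.array([fn(v) for v in per_sp])
    return feats
```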
(2) Texture features and covariance matrix
Texture analysis is important for SAR segmentation and has already been investigated in the literature [36]. Textures can be described by first-order metrics, such as the mean, variance, or entropy, as well as second-order metrics, such as gamma distribution features [7]. They can be calculated in superpixel space and serve as a direct indicator of ‘disorder’ within each superpixel. All of the above are based on single-polarization images; although they can be calculated for each polarization channel, a more sensible approach is to obtain a mixed texture value from the multi-polarization data. Considering that heterogeneous regions are common in high-resolution SAR images, the spherically invariant random vector (SIRV) product model can be adopted [37]. In this model, the maximum likelihood estimator of the texture value is given by:
$$\tau = \frac{\mathbf{k}^{H}\mathbf{M}^{-1}\mathbf{k}}{p}$$
where $\mathbf{k}$ is the target vector in the linear basis, $[S_{HH}, \sqrt{2}S_{HV}, S_{VV}]^{T}$ for quad-polarization data; $\mathbf{M}$ is the normalized covariance matrix, the superpixel mean of $\mathbf{k}\mathbf{k}^{H}$; $H$ denotes the conjugate transpose operator; and $p$ equals 2 (dual-polarization) or 3 (quad-polarization). It is important to note that the covariance matrix is averaged over all pixels in the superpixel rather than estimated from a single pixel. It contains all the polarization information and is usually taken as the basis of PolSAR processing. Thus, the upper-triangular elements of $\mathbf{M}$ (a Hermitian matrix) can also be used as features. The elements of its main diagonal are highly related to the statistical mean values of the $\mathbf{S}$ matrix, as they refer to the mean intensity and magnitude values.
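The sketch below computes this texture estimate for one superpixel. The trace normalization of M is an assumption (the paper does not state which normalization convention it uses), and `sirv_texture` is an illustrative helper name.

```python
# Sketch of the SIRV ML texture estimate tau = k^H M^{-1} k / p for one
# superpixel; k_pixels holds the target vectors of all its pixels.
import numpy as np

def sirv_texture(k_pixels: np.ndarray) -> float:
    """k_pixels: complex array of shape (n_pixels, p), p = 2 or 3."""
    n, p = k_pixels.shape
    # Sample covariance over the superpixel, trace-normalized so tr(M) = p
    # (an assumed convention; other normalizations are possible).
    C = np.einsum("ni,nj->ij", k_pixels, k_pixels.conj()) / n
    M = C * p / np.trace(C).real
    M_inv = np.linalg.inv(M)
    # Per-pixel texture values, summarized here by their mean
    tau = np.einsum("ni,ij,nj->n", k_pixels.conj(), M_inv, k_pixels).real / p
    return float(tau.mean())
```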
(3) Land-to-water pixel ratio (LWR)
Despite their various defects, TS methods are widely used in water extraction because of their convenience. Their pre-segmentation results can therefore serve as an important input for the GNN; in complex flood areas especially, they can help distinguish water bodies from soil with high water content. However, almost all TS methods operate at the pixel level and cannot be used directly at the object level. Therefore, this study proposes an indicator called the LWR, based on superpixels and TS, computed as follows.
The standard Otsu method [38], a self-adaptive maximum inter-class variance method commonly used in gray-image segmentation, is first applied to all pixels to obtain a binary image. Pixels with intensity values lower than the threshold are marked as 0, and the others are marked as 1. Then, for each superpixel, the ratio of land pixels to water pixels (1s to 0s) is counted. The smaller the LWR, the more likely the superpixel is water.
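A minimal sketch of the LWR computation, assuming `intensity` is the pixel-level intensity image and `labels` the superpixel label map; `land_water_ratio` is an illustrative name.

```python
# Sketch of the proposed LWR indicator: a global Otsu threshold marks each
# pixel as water (0) or land (1); the land-to-water pixel ratio is then
# counted inside each superpixel. A small LWR suggests water.
import numpy as np
from skimage.filters import threshold_otsu

def land_water_ratio(intensity: np.ndarray, labels: np.ndarray) -> np.ndarray:
    binary = (intensity >= threshold_otsu(intensity)).astype(np.uint8)  # 1 = land
    n_sp = labels.max() + 1
    lwr = np.zeros(n_sp)
    for i in range(n_sp):
        mask = labels == i
        land = np.count_nonzero(binary[mask])
        water = mask.sum() - land
        lwr[i] = land / max(water, 1)  # guard against division by zero
    return lwr
```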

2.2. Sampling

Only a few samples need to be chosen for GNN model training. Sampling is the operation of manually assigning several superpixels their true ground labels (water or non-water). Note that the non-water label may cover several different ground features, such as bare soil, grass, and buildings. More problematically, some ground with high water content appears in gray tones and is easily confused with water in SAR images. The samples should therefore carefully include all features that might affect the extraction of water, as any sample errors or deficiencies reduce the model’s precision. An unsupervised method can optionally be used to assist sample selection, as follows.
First, the feature vector of each superpixel is reduced to two dimensions using the t-SNE method [32], and the resulting two-dimensional scatterplot is drawn. Distinct cluster centers are regarded as different categories, scattered outlying points are ignored, and the number of cluster centers is recorded. Next, the K-Means algorithm is used to cluster the features of all superpixels, with the number of cluster centers set to the count recorded in the t-SNE step. Using the K-Means result as a reference, some representative water and non-water objects are selected by hand as training samples. Note that the K-Means result cannot be regarded as a final segmentation, because it still contains many errors, which are removed later by the GNN.
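The optional sampling aid can be sketched as follows with scikit-learn; `n_clusters` is the number of centers judged from the t-SNE scatterplot, and the function name `coarse_clusters` is illustrative.

```python
# Sketch of the optional sampling aid: t-SNE projects superpixel features to
# 2-D for visual cluster counting, then K-Means with that cluster count
# yields a coarse map used only as a reference for manual sample selection.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

def coarse_clusters(features: np.ndarray, n_clusters: int):
    """features: (n_superpixels, n_features) array of node feature vectors."""
    embedding = TSNE(n_components=2).fit_transform(features)   # for inspection
    coarse = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return embedding, coarse  # scatterplot coordinates, coarse labels
```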

2.3. Graph Construction and Classification

2.3.1. Graph Construction

An image is in a regular grid structure where each pixel has four neighbors in its first-order neighborhood and eight in its second-order neighborhood. The regularity of the data in Euclidean space makes it simple to handle, but an image can also be expressed as graph-structured data composed of nodes and the edges encoding the adjacency relationships between them. Graph-structured data can be expressed as G = (V, E), where V is the set of graph nodes with their feature vectors and E is the set of edges connecting the nodes, including edge attributes. Based on the superpixel segmentation result, an undirected graph (all edge weights equal) is constructed by connecting each superpixel to every surrounding superpixel that shares a segmentation edge with it, and taking the features extracted for each superpixel as the features of the corresponding vertex. The undirected graph reflects the adjacency relationships between superpixels, and each node corresponds to a meaningful real-world entity. Building the graph on superpixels also reduces the amount of computation, which is particularly important for high-resolution images.
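A minimal sketch of this graph construction from the superpixel label map is shown below; two superpixels are linked whenever any of their pixels touch horizontally or vertically, which corresponds to sharing a segmentation edge. The helper name `build_edges` is illustrative.

```python
# Sketch: building the undirected edge list of the superpixel graph from the
# label map. Node features are the per-superpixel vectors extracted earlier.
import numpy as np

def build_edges(labels: np.ndarray) -> np.ndarray:
    # Label pairs of horizontally and vertically adjacent pixels
    right = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1)
    down = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1)
    pairs = np.concatenate([right, down])
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]          # drop intra-superpixel pairs
    pairs = np.unique(np.sort(pairs, axis=1), axis=0)  # undirected, deduplicated
    return pairs  # shape (n_edges, 2)
```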

2.3.2. Node Classification via a Graph Convolutional Network

This study adopts the GraphSAGE framework, an inductive learning model in the spatial domain [39]. The framework first aggregates the information of each node and its adjacent nodes through several graph convolution layers and then outputs a classification graph with the same structure as the input graph, in which the feature of each node is its predicted category, as shown in Figure 2.
In each hidden layer, the network first samples each node’s neighbors and then aggregates the sampled features to obtain new features for the node, as shown in Figure 3. There are thus two key steps in a hidden layer: sampling and information aggregation. Taking node V2 as an example, we randomly sample the neighbors of V2, with the number of samples set to N, and obtain the sampled neighbor set $\mathcal{N}(v)$. If a node has fewer than N neighbors, sampling with replacement is used until N nodes are drawn. The features of the sampled set are aggregated by their mean value, and the new features of the node are obtained after a linear transformation and an activation function, as in the following formula:
$$\mathbf{h}_{v}^{k} = \sigma\left(\mathbf{W}^{k}\cdot \mathrm{MEAN}\left(\{\mathbf{h}_{v}^{k-1}\}\cup\{\mathbf{h}_{u}^{k-1},\ \forall u\in\mathcal{N}(v)\}\right)\right)$$
where $\mathbf{h}_{v}^{k}$ is the feature vector of node $v$ at layer $k$, $\mathbf{W}^{k}$ is the learnable weight matrix of the layer, and $\sigma$ is the activation function.
In the output layer, the cross entropy of the predicted label and the real label of each node is used as the loss function.
$$L = -\sum_{i}\sum_{c=1}^{M} y_{ic}\log p_{ic}$$
where $L$ is the loss and $M$ is the number of categories. If the true category of sample $i$ is $c$, then $y_{ic}=1$; otherwise, $y_{ic}=0$. $p_{ic}$ is the predicted probability that sample $i$ belongs to category $c$.
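A compact sketch of the node classifier and loss is given below, assuming PyTorch Geometric’s SAGEConv as the GraphSAGE layer; the two-layer structure and hidden size are illustrative, not the paper’s exact configuration.

```python
# Sketch: a two-layer GraphSAGE node classifier with mean aggregation and
# cross-entropy computed only on the few labeled superpixels.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class WaterGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 32, n_classes: int = 2):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim, aggr="mean")
        self.conv2 = SAGEConv(hidden_dim, n_classes, aggr="mean")

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)  # per-node class logits

def train_step(model, optimizer, x, edge_index, y, train_mask):
    model.train()
    optimizer.zero_grad()
    logits = model(x, edge_index)
    loss = F.cross_entropy(logits[train_mask], y[train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
```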

3. Results

Our proposed method was tested on a Sentinel-1 image of a flood event and a GF-3 image for river extraction. The results were compared with other pixel-based and superpixel-based methods, such as RF and XGBoost [40].

3.1. Study Areas and Data

3.1.1. Study Areas

The main experimental area is the Xinfa reservoir (see Figure 4a), located in Inner Mongolia, China. It is used for flood control, water supply, and the comprehensive utilization of power generation and aquaculture irrigation. The total storage capacity of the reservoir is 38.08 million cubic meters, including 23.31 million cubic meters of flood-control storage and 5.26 million cubic meters of regulation storage. The design flood has a 30-year return period, with a corresponding peak flow of 303 m³/s. At 15:30 (UTC/GMT+8) on 18 July 2021, the Xinfa reservoir collapsed, washing away the downstream highway. Figure 4(a1,a2) show the Sentinel-1 SAR images acquired before and after the flood, respectively. The other study area is the Zhujiang River in Guangzhou, Guangdong Province (see Figure 4b), where dense river branches and fishponds are distributed. Both study areas are challenging and can demonstrate the advantages of the proposed method. This paper presents the comprehensive processing results for study area 1 and only the final classification results for study area 2.

3.1.2. Experimental Data

The experimental data include Sentinel-1 and GaoFen-3 images. The Sentinel-1 data are dual-polarization GRD products (VV and VH polarized images) acquired in IW mode in the C-band. The final data used in the experiment were obtained after orbit correction, thermal noise removal, radiometric calibration, speckle filtering, and terrain correction [41]; all preprocessing was performed with SNAP software. Python code was then used for feature extraction, superpixel segmentation, sample selection, GCN training, and accuracy evaluation. GaoFen-3 is a C-band SAR imaging satellite launched by the Chinese government; the data used in the experiment were acquired in full-polarization mode with a resolution of 1 m. The whole study area was labeled through a pre-classification operation followed by manual correction, so that each superpixel has a ground label (water or non-water). These labels were used to evaluate the accuracy of the various methods and to validate the efficacy of the proposed method.

3.2. Results of Study Area 1

3.2.1. Superpixel Segmentation

For study area 1, the desired superpixel size was set to 225 pixels, the most commonly used value in previous studies. If the value is too small, the superpixels lose statistical significance, introducing a large amount of noise and a heavy computational burden into the subsequent classification; likewise, a larger value degrades the accuracy of land–water boundaries to some extent. Note that the actual superpixel sizes vary adaptively with location: in this study, most sizes were between 200 and 300 pixels, with some smaller ones reaching the given minimum of 10. Figure 5 shows the superpixel-segmentation result; even in the most complex regions, the superpixels adhere well to the boundaries between land and water. It can thus be concluded that superpixel segmentation is suitable for pre- and over-segmentation in water extraction.

3.2.2. Analysis of Features

All the features discussed in Section 2.1.2 compose a vector representing each superpixel. The LWR was calculated for all superpixels, and the resulting histogram is shown in Figure 6. The histogram shows two obvious peaks, and the peak closer to 0 corresponds to water bodies.
For the redundancy analysis of the features, their correlations were calculated, as shown in Figure 7a. The texture value is largely independent of the other features. The LWR is correlated with the median in the HV mode and with the mean and standard deviation in the HH mode; furthermore, the HV median is correlated with the HH standard deviation and mean. In general, the polarization modes do not show the same trends, and the correlation between different polarization modes is weak, which indicates that different polarization modes contribute differently to water extraction. The LWR is independent of the texture value and the scattering matrix but shows a certain correlation with the statistical features, since it is calculated from them. On the other hand, given the strong relation between the LWR and water bodies, features strongly correlated with the LWR should also be conducive to water extraction; in other words, HH polarization is more conducive to water extraction than HV.
To analyze whether the designed features are useful, their importance was evaluated with a tree model, as shown in Figure 7b. According to the results, the most important feature is the texture, followed by the LWR and the scattering-matrix features. Among the statistical features, the most useful are the standard deviation, minimum, and mean in the HH mode, which further indicates that HH polarization is more suitable for water extraction.
In general, the above analysis shows that the features are effective, and the importance of texture and HH polarization is demonstrated. Considering that a neural network can resist the risk caused by feature redundancy (different models assign different weights to different features), all of the above were selected as node features.

3.2.3. Samples

A total of 14,878 superpixels were acquired and then sampled according to the method in Section 2.2. Figure 8a shows that the t-SNE method separated the superpixels into three categories: land, water, and unknown. Unknown superpixels correspond to shallow water or wetland areas with backscattering characteristics between those of land and water. Figure 8b shows the spatial distribution of these three categories. Note that this pre-classification cannot replace the final GNN classification, as it still contains many errors; it is used only to assist in selecting training samples and is not a prerequisite. In this experiment, 46 water superpixels and 67 non-water superpixels were used for model training, and all the others (already carrying ground truth labels from pre-classification and manual correction) were used as the validation dataset. Though the training set was small, satisfying results were still obtained through the subsequent model classification.

3.2.4. Classification Results

A small area is shown in Figure 9 to visualize the superpixels and the constructed graph data. To highlight the performance of the GNN, RF and XGBoost classifiers were also used to classify the superpixels based on the selected features. All of them achieved good visual performance (see Figure 10a–c) relative to the ground truth (see Figure 10g). To validate the effectiveness of superpixels, standard TS, RF, and XGBoost were also tested at the pixel level: all pixels in the 46 water superpixels and 67 non-water superpixels (the same as in the superpixel classification) were used for training, and all other labeled pixels in the image were used for validation. The results can be seen in Figure 10d–f. The pixel-based methods produced much more noise than the superpixel-based methods, and standard TS performed worst. Extra filtering operations could improve these results to some extent, but this study only considers standard processing.
Quantitative evaluations are given in Table 1. Both superpixel-based and pixel-based methods were tested, with scores calculated in their own dimensions. To compare our method’s results with the pixel-based methods, the scores were recalculated from the superpixel level to the pixel level according to their spatial inclusion relations (see the last line of Table 1 and the sketch below). Our method achieved the highest precision, recall, and F1 scores. Among the superpixel-based methods, the recall of our method was 3 to 4 percentage points higher than that of RF and XGBoost, with XGBoost obtaining the second-best scores. We also tested another graph convolutional network, GAT [42], whose performance was likewise worse. To further visualize the difference between XGBoost and the GNN, regions of TP (true-positive), FN (false-negative), and FP (false-positive) are presented in Figure 11. XGBoost missed many positive superpixels in small water areas and at land–water boundaries; the GNN handles these cases better owing to its powerful ability to capture contextual information. The second part of Table 1 clearly shows that our proposed method achieved much higher scores than pixel-based RF and XGBoost.
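A minimal sketch of the superpixel-to-pixel score conversion: each superpixel’s predicted label is broadcast to all of its pixels via the label map, and pixel-level metrics are then computed with scikit-learn. The helper name `pixel_level_scores` is illustrative.

```python
# Sketch: recalculating precision/recall/F1 from the superpixel level to the
# pixel level by broadcasting each superpixel's prediction to its pixels.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def pixel_level_scores(sp_pred: np.ndarray, labels: np.ndarray,
                       pixel_truth: np.ndarray):
    """sp_pred: (n_superpixels,) 0/1 predictions; labels: superpixel map;
    pixel_truth: 0/1 ground truth raster of the same shape as labels."""
    pred_map = sp_pred[labels]  # broadcast predictions to the pixel grid
    p, r, f1, _ = precision_recall_fscore_support(
        pixel_truth.ravel(), pred_map.ravel(), average="binary", pos_label=1
    )
    return p, r, f1
```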

3.3. Results of Study Area 2

In study area 2, a total of 377,579 superpixels were acquired, all of which were labeled through pre-classification followed by manual correction. The ground truth can be seen in Figure 12g. A total of 575 water superpixels and 2717 land superpixels were used for training, and the remaining superpixels were used for validation. Figure 12 shows the results of the various methods. Overall, the GNN achieved the best performance in detecting small targets and preserving boundaries; the red rectangles in Figure 12a indicate where ships were accurately removed. As in study area 1, the pixel-based methods (see Figure 12d–f) generated more noise than the superpixel-based methods.
Quantitative evaluations are given in Table 2. Among the superpixel-based methods, our method gained 5 to 6 percentage points in precision, 12 to 13 in recall, and 9 to 10 in F1 score over RF and XGBoost. The GAT network achieved the second-best performance, with an F1 score about 6 percentage points lower than ours, so the positive effect of the GNN was again validated in this experiment. The gains over the pixel-based methods were much larger still: up to 12 percentage points in precision, 22 in recall, and 17 in F1 score. This demonstrates the power of the proposed method in complex scenarios.

4. Discussion

Based on superpixel segmentation, this study transforms a regular image into irregular graph-structured data. A more direct alternative is to take each pixel as a vertex, the pixel values as node features, and the second-order neighborhood as the adjacency relationship. This causes two problems. The first is network size and computational burden: for a single-channel image of 1024 × 1024 pixels, 1,048,576 vertices and nearly 10 million edges are generated. Assuming a superpixel comprises 200 pixels on average, the number of vertices is reduced by a factor of 200, from about one million to about five thousand, which likewise reduces the computation by a factor of 200. The second problem is speckle noise: unlike optical images, any pixel-level algorithm on SAR data is strongly affected by speckle, and statistics over an adjacent area are more robust and meaningful than a single pixel value, so they are more often used in PolSAR classification tasks. Moreover, a prior knowledge graph can be constructed through superpixel segmentation. Thus, superpixels are preferred for a GNN.
However, the use of superpixels still raises some issues. First, several segmentation methods have recently been proposed for multi-polarization and single-polarization SAR data; the optimal method must be chosen according to the data, and its precision and efficiency directly affect the GNN results. On the one hand, small targets are missed when the nominal superpixel size is too large; on the other hand, compared with simple TS methods, over-segmentation is time-consuming, especially over large areas. Once superpixels are acquired, however, a GNN can classify them as quickly as other methods. Another issue is how to express the node features. In this paper, a series of features were designed and analyzed, including statistical values of the scattering matrix, texture, covariance-matrix elements, and the LWR. For different data, the feature values may have different expressions; for the texture value, for example, several calculation methods are available. These features tend to be redundant because some of them are linearly dependent, but the GNN can handle this redundancy, and the accuracy does not drop even when all available features are inputted. Moreover, the importance of the features differs greatly: through experiments, we determined that the texture and LWR are the most important, followed by the scattering values of HH polarization. For single-polarization data, texture and HH are the most important.
The parameters affecting the GNN results mainly include the number of sampled neighbors and the number of layers. A large sampling number introduces redundant information and computation, while a small one leads to a feature deficiency from adjacent nodes; the average number of neighbors per node is usually a good choice. The number of layers depends on how many hops of information should be aggregated. With too many layers, the features of nodes in the same area converge and become inseparable, because they all contain much of the same neighborhood information; generally, graph convolution should not exceed five layers. We also tried the GAT graph convolutional network and found its accuracy in this task inferior to GraphSAGE, possibly because GAT more easily smooths features across different nodes.
The proposed method is suitable for all kinds of water-related scenes, such as rivers, lakes, ponds, coastlines, floods, and probably paddies. It has advantages in uncommon and complicated scenes, where large sample sets are hard to collect and pre-trained DL models cannot be used directly. High precision can be expected, although for an extremely large area, the computation time is considerable compared with simple TS methods; the proposed method can be sped up by simplifying the superpixel segmentation and feature selection steps. On the other hand, the use of superpixels raises the problem of expressing the features of each superpixel reasonably. Manually designed SAR image features are shallow representations and can limit classification accuracy, and missing any key feature likely leads to poor results. A simple remedy is to design and include as many available features as possible, but this makes the method cumbersome and inefficient.

5. Conclusions

Owing to their all-weather imaging capability, PolSAR images play an increasingly important role in emergency response and disaster relief. This study proposed a method based on superpixels and graph convolution to extract water areas from PolSAR images, which has two inherent advantages: (1) superpixel segmentation makes each classified object correspond to a real-world object, reducing the influence of noise on the classification; and (2) graph convolution no longer considers each classification object in isolation but accounts for the spatial relationships between objects, which improves classification accuracy. This paper detailed three processing parts: (1) feature extraction, covering the statistical values of the scattering matrix, the texture value, the covariance-matrix elements, and the LWR, together with a discussion of their correlation and importance; (2) sampling, for which an auxiliary unsupervised method was proposed to deal with complex scenes; and (3) the GNN model, including graph construction and node classification, for which the GraphSAGE framework was chosen.
Comparison experiments were conducted in two study areas at both the superpixel level and the pixel level. The results show that the superpixel-based methods performed much better than the pixel-based methods, proving the advantage of using superpixels in SAR images. Though the performance of the latter can be improved with extra post-processing or more advanced algorithms, superpixels are still preferred. Among the superpixel-based methods, the GNN classifier obtained the highest scores, with recall about 3–4 percentage points higher than XGBoost and RF. The experiments in this paper prove that the proposed method is suitable for finely mapping water bodies in complex areas, such as floods. The GNN uses an inductive learning framework to mine deeper spatial-relationship information and can classify the nodes (superpixels) efficiently. Compared with CNN methods, the proposed method maintains high accuracy without requiring large high-quality datasets or much computing power. It can also serve as a reference for extracting other ground features from SAR images.

6. Patents

Haoming Wan, Panpan Tang, et al., A refined flood area extraction method based on Spaceborne SAR image: China, ZL202210043155.8[P]. 14 January 2022.

Author Contributions

H.W. (Haoming Wan) designed the method and drafted the manuscript. P.T. completed the experiment and revised the manuscript. B.T. and H.Y. implemented the superpixel segmentation algorithm and revised the manuscript. C.J. provided advice on the experiment and the manuscript. B.Z. and H.W. (Hui Wang) assisted with the experiment and data processing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the Open Research Fund of the Laboratory of Target Microwave Properties (No. 2022-KFJJ-002) and the nonprofit research project of Jiaxing City (2022AY30001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the Copernicus Open Access Hub (scihub.copernicus.eu) for providing the Sentinel-1 SAR data (accessed on 6 July 2020) and CHEOS (cheosgrid.org.cn) for providing the GF-3 data (accessed on 18 July 2020).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, Y.; Li, Z.; Zeng, C.; Xia, G.-S.; Shen, H. An Urban Water Extraction Method Combining Deep Learning and Google Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 769–782.
2. Ulloa, N.I.; Chiang, S.-H.; Yun, S.-H. Flood Proxy Mapping with Normalized Difference Sigma-Naught Index and Shannon’s Entropy. Remote Sens. 2020, 12, 1384.
3. Tellman, B.; Sullivan, J.A.; Kuhn, C.; Kettner, A.J.; Doyle, C.S.; Brakenridge, G.R.; Erickson, T.A.; Slayback, D.A. Satellite imaging reveals increased proportion of population exposed to floods. Nature 2021, 596, 80–86.
4. Jongman, B.; Winsemius, H.C.; Aerts, J.C.J.H.; de Perez, E.C.; van Aalst, M.K.; Kron, W.; Ward, P.J. Declining vulnerability to river floods and the global benefits of adaptation. Proc. Natl. Acad. Sci. USA 2015, 112, E2271–E2280.
5. Voigt, S.; Kemper, T.; Riedlinger, T.; Kiefl, R.; Scholte, K.; Mehl, H. Satellite Image Analysis for Disaster and Crisis-Management Support. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1520–1528.
6. Schumann, G.; Di Baldassarre, G.; Bates, P.D. The Utility of Spaceborne Radar to Render Flood Inundation Maps Based on Multialgorithm Ensembles. IEEE Trans. Geosci. Remote Sens. 2009, 47, 2801–2807.
7. Pappas, O.A.; Anantrasirichai, N.; Achim, A.M.; Adams, B.A. River Planform Extraction From High-Resolution SAR Images via Generalized Gamma Distribution Superpixel Classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 3942–3955.
8. Marghany, M. (Ed.) Nonlinear Ocean Dynamics: Synthetic Aperture Radar; Elsevier: Amsterdam, The Netherlands, 2021.
9. Marghany, M. (Ed.) Synthetic Aperture Radar Imaging Mechanism for Oil Spills; Gulf Professional Publishing: Oxford, UK, 2020.
10. Ndikumana, E.; Minh, D.H.T.; Baghdadi, N.; Courault, D.; Hossard, L. Deep Recurrent Neural Network for Agricultural Classification using multitemporal SAR Sentinel-1 for Camargue, France. Remote Sens. 2018, 10, 1217.
11. Hoekstra, M.; Jiang, M.; Clausi, D.A.; Duguay, C. Lake Ice-Water Classification of RADARSAT-2 Images by Integrating IRGS Segmentation with Pixel-Based Random Forest Labeling. Remote Sens. 2020, 12, 1425.
12. Zoka, M.; Psomiadis, E.; Dercas, N. The Complementary Use of Optical and SAR Data in Monitoring Flood Events and Their Effects. Proceedings (EWaS3 2018) 2018, 2, 644.
13. Henry, J.; Chastanet, P.; Fellah, K.; Desnos, Y. Envisat multi-polarized ASAR data for flood mapping. Int. J. Remote Sens. 2006, 27, 1921–1929.
14. Brivio, P.A.; Colombo, R.; Maggi, M.; Tomasoni, R. Integration of remote sensing data and GIS for accurate mapping of flooded areas. Int. J. Remote Sens. 2002, 23, 429–441.
15. Brown, K.M.; Hambidge, C.H.; Brownett, J.M. Progress in operational flood mapping using satellite synthetic aperture radar (SAR) and airborne light detection and ranging (LiDAR) data. Prog. Phys. Geogr. Earth Environ. 2016, 40, 196–214.
16. Ying, L.; Liu, Z.; Zhang, H.; Wang, Y.; Li, N. Improved ACM Algorithm for Poyang Lake Monitoring. J. Electron. Inf. Technol. 2017, 39, 1064–1070.
17. Bao, L.; Lv, X.; Yao, J. Water Extraction in SAR Images Using Features Analysis and Dual-Threshold Graph Cut Model. Remote Sens. 2021, 13, 3465.
18. Zhu, W.; Dai, Z.; Gu, H.; Zhu, X. Water Extraction Method Based on Multi-Texture Feature Fusion of Synthetic Aperture Radar Images. Sensors 2021, 21, 4945.
19. Li, N.; Wang, R.; Deng, Y.; Chen, J.; Liu, Y.; Du, K.; Lu, P.; Zhang, Z.; Zhao, F. Waterline Mapping and Change Detection of Tangjiashan Dammed Lake After Wenchuan Earthquake From Multitemporal High-Resolution Airborne SAR Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 3200–3209.
20. Xie, L.; Zhang, H.; Wang, C. Water-body types classification using Radarsat-2 fully polarimetric SAR data. In Proceedings of the 2015 IEEE International Conference on Aerospace Electronics and Remote Sensing Technology (ICARES), Bali, Indonesia, 3–5 December 2015; pp. 1–5.
21. Lv, W.; Yu, Q.; Yu, W. Water extraction in SAR images using GLCM and Support Vector Machine. In Proceedings of the IEEE 10th International Conference on Signal Processing, Beijing, China, 24–28 October 2010; pp. 740–743.
22. Hong, S.; Jang, H.; Kim, N.; Sohn, H.-G. Water Area Extraction Using RADARSAT SAR Imagery Combined with Landsat Imagery and Terrain Information. Sensors 2015, 15, 6652–6667.
23. Huang, W.; Li, H.; Lin, G. Classifying forest stands based on multi-scale structure features using Quickbird image. In Proceedings of the 2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM), Fuzhou, China, 8–10 July 2015.
24. Paul, S.; Ganju, S. Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning. arXiv 2021, arXiv:2107.08396.
25. Tang, Y.; Qiu, F.; Jing, L.; Shi, F.; Li, X. A recurrent curve matching classification method integrating within-object spectral variability and between-object spatial association. Int. J. Appl. Earth Obs. Geoinf. 2021, 101, 102367.
26. Zhu, X.X.; Montazeri, S.; Ali, M.; Hua, Y.; Wang, Y.; Mou, L.; Shi, Y.; Xu, F.; Bamler, R. Deep Learning Meets SAR: Concepts, Models, Pitfalls, and Perspectives. IEEE Geosci. Remote Sens. Mag. 2021, 9, 143–172.
27. Hu, H.; Ji, D.; Gan, W.; Bai, S.; Wu, W.; Yan, J. Class-Wise Dynamic Graph Convolution for Semantic Segmentation. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 1–17.
28. Liu, Q.; Kampffmeyer, M.; Jenssen, R.; Salberg, A.B. SCG-Net: Self-Constructing Graph Neural Networks for Semantic Segmentation. arXiv 2020, arXiv:2009.0159.
29. He, C.; He, B.; Tu, M.; Wang, Y.; Qu, T.; Wang, D.; Liao, M. Fully Convolutional Networks and a Manifold Graph Embedding-Based Algorithm for PolSAR Image Classification. Remote Sens. 2020, 12, 1467.
30. Zhang, B.; Xiao, J.; Jiao, J.; Wei, Y.; Zhao, Y. Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8082–8096.
31. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
32. Hinton, G.; Roweis, S. Stochastic Neighbor Embedding. Adv. Neural Inf. Process. Syst. 2003, 15, 833–840.
33. Xiang, D.; Tang, T.; Zhao, L.; Su, Y. Superpixel Generating Algorithm Based on Pixel Intensity and Location Similarity for SAR Image Classification. IEEE Geosci. Remote Sens. Lett. 2013, 10, 1414–1418.
34. Yin, J.; Wang, T.; Du, Y.; Liu, X.; Zhou, L.; Yang, J. SLIC Superpixel Segmentation for Polarimetric SAR Images. IEEE Trans. Geosci. Remote Sens. 2020, 60, 99.
35. Arisoy, S.; Kayabol, K. Mixture-Based Superpixel Segmentation and Classification of SAR Images. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1721–1725.
36. Kandaswamy, U.; Adjeroh, D.A.; Lee, M.C. Efficient Texture Analysis of SAR Imagery. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2075–2083.
37. Xiang, D.; Ban, Y.; Wang, W.; Su, Y. Adaptive Superpixel Generation for Polarimetric SAR Images With Local Iterative Clustering and SIRV Model. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3115–3131.
38. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
39. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. Adv. Neural Inf. Process. Syst. 2017, 30.
40. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
41. Marghany, M. (Ed.) Advanced Algorithms for Mineral and Hydrocarbon Exploration Using Synthetic Aperture Radar; Elsevier: Amsterdam, The Netherlands, 2022.
42. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
Figure 1. The flowchart of the proposed method.
Figure 2. Illustration of the structure of the graph convolutional neural network, where c1 and c2 are the output classes of nodes. (a) Input layer; (b) hidden layer; (c) output layer.
Figure 3. Illustration of information aggregation of the GCN.
Figure 4. False color composite SAR images of the two study areas. (a1) Study area 1, Sentinel-1 acquired on 6 July 2020; (a2) study area 1, Sentinel-1 acquired on 18 July 2020; (b) study area 2, Gaofen-3.
Figure 5. Over-segmentation results of study area 1. (a) General view; (b–d) are enlarged views of A, B, and C in (a), respectively.
Figure 6. Histogram of LWR results.
Figure 7. (a) Feature relevance (Tau denotes texture); (b) the top five most important features.
Figure 8. (a) Visualization of features after the t-SNE operation—three classes were acquired; (b) pre-classification result through K-Means.
Figure 9. Constructed graph data from the image. (a) Superpixels on the PolSAR image; (b) visualization of the graph data.
Figure 10. Study area 1 classification results of (a) superpixel-based GNN, (b) superpixel-based RF, (c) superpixel-based XGBoost, (d) pixel-based Otsu, (e) pixel-based RF, and (f) pixel-based XGBoost. (g) shows the ground truth.
Figure 11. Comparison of XGBoost and GNN results (TP: true-positive; FN: false-negative; FP: false-positive; rectangle A: details of XGBoost; rectangle B: details of GNN).
Figure 12. Study area 2 classification results of (a) superpixel-based GNN, (b) superpixel-based RF, (c) superpixel-based XGBoost, (d) pixel-based Otsu, (e) pixel-based RF, and (f) pixel-based XGBoost. (g) shows the ground truth. The red box areas show that the GNN copes with noise better.
Table 1. Performance evaluation in study area 1.

Dimension          Method                             Precision   Recall   F1-Score
Superpixel-based   Random forest                      0.9949      0.9155   0.9536
                   XGBoost                            0.9939      0.9299   0.9608
                   GAT                                0.9543      0.9492   0.9518
                   Ours                               0.9955      0.9601   0.9775
Pixel-based        Random forest                      0.6295      0.7716   0.6934
                   XGBoost                            0.6696      0.8120   0.7339
                   Ours (superpixel to pixel level)   0.9972      0.9623   0.9794
Table 2. Performance evaluation for study area 2.

Dimension          Method                             Precision   Recall   F1-Score
Superpixel-based   Random forest                      0.9343      0.8586   0.8948
                   XGBoost                            0.9250      0.8437   0.8825
                   GAT                                0.9690      0.8827   0.9238
                   Ours                               0.9882      0.9785   0.9833
Pixel-based        Random forest                      0.8758      0.7647   0.8165
                   XGBoost                            0.8818      0.7799   0.8277
                   Ours (superpixel to pixel level)   0.98925     0.9808   0.9850

