Article

ASA-DRNet: An Improved Deeplabv3+ Framework for SAR Image Segmentation

1 School of Electronics and Information, Jiangsu University of Science and Technology, Zhenjiang 212100, China
2 Zhenjiang Smart Ocean Information Perception and Transmission Laboratory, Zhenjiang 212003, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1300; https://doi.org/10.3390/electronics12061300
Submission received: 22 February 2023 / Revised: 5 March 2023 / Accepted: 5 March 2023 / Published: 8 March 2023
(This article belongs to the Special Issue Artificial Intelligence (AI) for Image Processing)

Abstract

Pollution caused by oil spills does irreversible harm to marine biosystems. Synthetic Aperture Radar (SAR) has emerged as a crucial means of detecting maritime oil spills, and accurately distinguishing oil spill areas from other types of areas is a critical step in oil spill detection. Owing to its capacity to extract multiscale features and its distinctive decoder, the Deeplabv3+ framework has become an excellent deep learning model in the field of image segmentation. However, in some SAR images, oil film edges are segmented indistinctly and small areas are segmented incorrectly. To solve these problems, an improved network, named ASA-DRNet, is proposed. Firstly, a new structure which combines an axial self-attention module with ResNet-18 is proposed as the backbone of the DeepLabv3+ encoder. Secondly, the atrous spatial pyramid pooling (ASPP) module is optimized to improve the network's capacity to extract multiscale features and to increase the speed of model calculation. Finally, low-level features of different resolutions are merged to enhance the network's ability to extract edge information. The experiments show that ASA-DRNet obtains better results than other neural network models.

1. Introduction

Oil is called the “blood of industry” and plays a significant part in the development of human society [1]. At the same time, however, the toxic and harmful substances contained in oil can also harm the natural environment and human health. With the rapid expansion of offshore oil exploration and the shipping industry, oil-spill accidents occur frequently, and oil-spill pollution at sea has become one of the major threats to the marine environment [2]. In the past few decades, there have been many major marine oil spills around the world, which have seriously damaged local marine ecological environments [3]. In early March 2019, a cargo ship ran aground on a reef in the Solomon Islands, causing a heavy oil spill. The local waters and coastline were polluted due to untimely disposal, and the spilled heavy oil gradually approached East Rennell Island, a World Natural Heritage Site (the world’s largest atoll-shaped coral island built up from coral) [4].
Synthetic Aperture Radar (SAR) detects oil spills at sea by emitting electromagnetic pulses [5,6] and obtains electromagnetic information about the sea surface from the echoes reflected by the target [7,8]. On a clean sea surface, strong Bragg scattering occurs, which appears as bright areas in SAR images. On a sea surface covered by an oil slick, Bragg scattering is suppressed, and the slick appears as dark areas in SAR images [8,9,10].
At present, there are two main approaches to marine oil spill image segmentation: traditional algorithms and deep learning models. Traditional algorithms mainly use elementary features of oil spill images to segment them, such as methods based on different thresholds, edge detection, and polarization features.
Traditional methods have inevitable limitations: (1) the selection of thresholds for threshold segmentation is heavily influenced by subjective factors or experience; (2) single-feature information cannot represent global features, which degrades segmentation performance; and (3) it is challenging to extract deep semantic information from the image, as traditional approaches mainly rely on low-level information to perform segmentation. Due to these restrictions, segmenting oil spill images with conventional techniques achieves only limited accuracy.
Several machine learning models have been applied in recent years to detect maritime oil spills, such as decision trees, artificial neural networks [11,12,13], and support vector machines [14,15,16]. With the great success of deep learning, image segmentation has been taken to a new level. Deep learning techniques have achieved great success in image segmentation tasks, especially in extracting high-level semantic information, which largely eliminates the lack of semantic information in typical image segmentation techniques. Long et al. [17] proposed the FCN method in 2014, the pioneer of deep learning in the field of image segmentation. To address the FCN’s disregard of global context information, Zhao et al. [18] proposed a multiscale network called the Pyramid Scene Parsing Network (PSPN) that learns the global contextual representation of a scene more effectively. Another popular family of deep learning models for image segmentation is based on convolutional encoder–decoder systems, such as UNet and SegNet, and variants of PSPN [19]. U-Net, proposed by Olaf Ronneberger et al. at the University of Freiburg in 2015, is richer in multiscale information than FCN and was originally used for medical image segmentation [20,21]. SegNet is very similar to FCN, except that the techniques used for pooling in the encoder and up-sampling in the decoder differ. In addition, the first 13 layers of the VGG16 convolutional network are used in the encoder of SegNet, with each encoder layer corresponding to a decoder layer. The key contributions of residual neural networks are the identification of the “degradation” problem and the design of the “shortcut connection” to counter it, which considerably alleviates the difficulty of training very deep networks [22]. Deeplabv3+ is an update of Deeplabv3, an encoder–decoder deep convolutional neural network (DCNN), that adds a simple yet efficient decoder module to refine segmentation results, particularly at object boundaries, greatly improving semantic image segmentation [23]. Kong et al. [24] proposed a Deeplabv3+ network for SAR imagery semantic segmentation based on the potential energy loss function of the Gibbs distribution in order to construct semantic interdependence among distinct categories through the connections between cliques in the neighborhood system. Even though the Deeplabv3+ model has produced positive results in oil spill detection tasks, there are still certain restrictions on further increasing detection accuracy.
In this research, a more effective deep learning model called ASA-DRNet, built on the DeepLabv3+ framework with axial self-attention, an optimized ASPP module, and ResNet-18, is presented to address the aforementioned issues and to increase the segmentation accuracy of oil spill detection. The contributions of this paper are as follows:
  • ResNet-18 and axial self-attention are combined as the new backbone of the DeepLabv3+ encoder to enhance the extraction of important features and avoid interference from erroneous and irrelevant features, yielding more adequate and comprehensive deep features.
  • The structure of Atrous Spatial Pyramid Pooling (ASPP) is optimized by reducing the dilation rates of the atrous convolutions in equal proportion and then applying a 2D decomposition, which enlarges the receptive field while reducing the number of model parameters. These changes increase detection speed and avoid the loss of target information, yielding more comprehensive features.
  • The network’s capacity to extract edge information is improved by integrating low-level features at different resolutions, which increases segmentation accuracy.
This paper is organized as follows: Section 2 reviews work related to our research. Section 3 describes the proposed ASA-DRNet model based on the DeepLabv3+ framework, axial self-attention, the improved ResNet-18, and the optimized ASPP module. Section 4 introduces the SAR image datasets and validates the oil spill detection capability of the deep learning model. Section 5 concludes the paper.

2. Related Work

Recent research in this area falls into two primary categories: oil spill image segmentation algorithms based on conventional techniques and those based on deep learning models.

2.1. Traditional Methods

Some researchers have attempted to use traditional threshold segmentation methods, edge detection algorithms, and polarization-feature-based segmentation algorithms for the task of segmenting oil spill images. Li et al. [25] proposed a double-threshold oil spill image segmentation algorithm based on a feature probability function: high and low thresholds are used to extract different levels of gray-scale information, and the feature probability function is used to morphologically segment the oil spill area. Wang et al. [26] first applied a 2D-Otsu threshold algorithm to the segmentation of SAR oil spill images and optimized the algorithm by building a new histogram region for coherent-speckle multiplicative noise. Li et al. [27] proposed an algorithm based on maximum entropy threshold segmentation.
Edge detection techniques for images are an important tool for SAR image processing. Yin et al. [28] proposed a fuzzy enhancement theory combined with a genetic algorithm for oil spill image edge segmentation by improving the Pal–King edge detection algorithm. Singha et al. [29] proposed a SAR image segmentation method combining a threshold detection algorithm and a Canny edge detection algorithm.
Polarization characteristics such as entropy (H), geometric intensity, and total power can provide coherence matrices, scattering matrices, and other polarization information for detecting oil spills. These features have been used for oil spill detection and have recently become a research hotspot. Ren et al. [30] proposed a new polarization feature G based on eigenvalue decomposition, which not only reflects the polarization state among different targets in the set but also describes, in a statistical sense, the impurity of different scattering types. Shu et al. [31] comprehensively analyzed the performance of 36 polarization features of compact polarimetric SAR for oil spill image segmentation and found that the best segmentation accuracy was achieved by the odd-scattering coefficients among the compact polarimetric features.

2.2. Deep Learning Methods

Deep learning models have been used for the segmentation of SAR oil spill images in recent years, thanks to the rapid progress of machine learning. Li et al. [32] developed a multiscale conditional adversarial network for oil spill image segmentation trained on limited data. Fan et al. [33] combined different threshold segmentation algorithms with U-Net to merge the global features of SAR oil spill images and achieved a segmentation accuracy of 98.40%. Using a deep convolutional neural network (DCNN), Shirvany et al. [34] obtained an oil spill convolutional network (OSCN) that achieved 94.01% accuracy and 83.51% recall by tuning the hyperparameters on a SAR dark-spot dataset. Wang et al. [35] suggested a Deeplabv3+ semantic segmentation method with multiple loss constraints. Wang et al. [36] suggested an enhanced deep learning model and optimized its hyperparameters using Bayesian optimization (BO), with an average accuracy of 74.69%. Liu et al. [37] suggested a densely connected network model based on the DenseNet convolutional neural network, which extracts multiscale features of images and improves both the ability to capture subtle features and segmentation accuracy. Chen et al. [38] used a stacked autoencoder (SAE) and a deep belief network (DBN) to improve the polarimetric feature sets and reduce the feature dimension via layer-wise unsupervised pre-training. Gallego et al. [39] used deep neural autoencoders to separate oil spills from Side-Looking Airborne Radar (SLAR) data and achieved a pixel-level score of 93.01%. Ma et al. [40] proposed a deep convolutional neural network (DCNN) based on amplitude and phase data from Sentinel-1 dual-polarimetric images to detect oil spills, using group normalization (GN) as the normalization layer; the experimental findings demonstrated improved performance over conventional techniques.

3. The Proposed ASA-DRNet Model

Here, we propose an optimized network structure that alleviates coarse segmentation in order to address unclear edge segmentation and mis-segmentation between oil spill areas and oil-spill-like regions. The whole network is an end-to-end encoder–decoder structure. The encoder is composed of an improved DCNN model and an optimized Atrous Spatial Pyramid Pooling (ASPP) module. In the decoder, the output of the encoder is directly upsampled, and low-level edge features of different resolutions from the improved DCNN are combined with the higher-level semantic features of the encoder output in a feature fusion operation. The overall structure of the proposed DeeplabV3+ network is shown in Figure 1.

3.1. DCNN Module (ResNet-18 + Axial Self Attention)

The DCNN network proposed in this paper is based on an improved ResNet-18. As shown in Figure 2 below, the network mainly consists of four residual blocks and an axial self-attention block. Each residual block has the same structure, and the blocks are used in series to extract deep features. The axial self-attention module is embedded behind the residual blocks in place of the original simple convolution operation, which, on the one hand, makes the extracted features more extensive and adequate and, on the other hand, enhances the extraction of important features and avoids interference from redundant features. A minimal sketch of this assembly is given below.
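For concreteness, the following is a minimal PyTorch sketch of the backbone (the experiments in this paper were run in MATLAB, so this is an illustrative re-implementation, not the authors' code). The choice of layer1 as the tap for the decoder's low-level features and the channel widths are assumptions of the sketch; `AxialSelfAttention` refers to the block sketched in Section 3.2.

```python
# Illustrative PyTorch sketch of the improved backbone: ResNet-18's four
# residual stages in series, followed by an axial self-attention block on the
# deepest feature map. Stage taps and channel widths are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class ASABackbone(nn.Module):
    def __init__(self, attention_block: nn.Module):
        super().__init__()
        base = resnet18(weights=None)
        self.stem = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        self.layer1, self.layer2 = base.layer1, base.layer2
        self.layer3, self.layer4 = base.layer3, base.layer4
        self.attention = attention_block  # e.g., AxialSelfAttention(512)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)                  # 1/4 resolution, 64 channels
        low = self.layer1(x)              # early features, kept for the decoder
        deep = self.layer4(self.layer3(self.layer2(low)))  # 512 channels
        deep = self.attention(deep)       # reweight the deep features
        # Note: plain torchvision ResNet-18 downsamples to 1/32; a DeepLabv3+
        # encoder would dilate the last stage to keep an output stride of 16.
        return low, deep
```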

3.2. The Proposed Axial Self Attention Block

Given an input U ∈ R^(H×W×C), it is compressed into feature matrices K ∈ R^(H×1×C) and Q ∈ R^(1×W×C) along the x-axis and y-axis directions. The channel attention feature matrix F ∈ R^(H×W×C) is obtained by multiplying K and Q and applying a softmax layer, as follows:
K_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)    (1)
Q_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)    (2)
F_{ij} = \mathrm{Softmax}\left( \frac{f(I_i, I_j)}{C} \right)    (3)
As shown in Figure 3 below, F_{ij} is the feature weight value of the i-th row and j-th column, and the function f computes their relationship.
In the spatial attention branch, a 1 × 1 convolution is first used to reduce the dimensionality of the feature matrix. Two 3 × 3 atrous convolutions are then cascaded to enlarge the receptive field. Finally, a 1 × 1 convolution followed by a sigmoid activation function produces the V matrix, as follows:
V_s(U) = \mathrm{BN}\left( f_3^{1 \times 1}\left( f_2^{3 \times 3}\left( f_1^{3 \times 3}\left( f_0^{1 \times 1}(U) \right) \right) \right) \right)    (4)
Finally, the original feature map is subjected to a weight rescaling operation, i.e., it is multiplied by the feature weights along the channel and spatial dimensions [40]. A minimal sketch of the block is given below.
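The PyTorch sketch below shows one way to realize the block from Equations (1)–(4); it is an interpretation of the figure and equations rather than the authors' implementation, and the channel reduction ratio and the dilation rate of the two 3 × 3 atrous convolutions are assumptions.

```python
# Illustrative axial self-attention block: axial averages over width/height
# (Eqs. 1-2), an outer-product attention map with softmax (Eq. 3), and a
# spatial branch of 1x1 -> two dilated 3x3 -> 1x1 -> BN -> sigmoid (Eq. 4).
import torch
import torch.nn as nn


class AxialSelfAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4, dilation: int = 2):
        super().__init__()
        mid = channels // reduction
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.Conv2d(mid, 1, 1),
            nn.BatchNorm2d(1),
            nn.Sigmoid(),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, h, w = u.shape
        k = u.mean(dim=3, keepdim=True)            # N x C x H x 1 (Eq. 1)
        q = u.mean(dim=2, keepdim=True)            # N x C x 1 x W (Eq. 2)
        # Outer product of the two axial profiles, scaled by C and normalised
        # with a softmax over spatial positions (Eq. 3).
        f = ((k * q) / c).view(n, c, -1).softmax(dim=-1).view(n, c, h, w)
        v = self.spatial(u)                        # N x 1 x H x W (Eq. 4)
        return u * f * v                           # weight rescaling
```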

3.3. Optimized ASPP Module

After the DCNN module, we use the ASPP module to capture multi-scale information. The original network suffers from unclear edge segmentation when performing SAR oil spill image segmentation, and in some images small oil spill areas are mis-segmented from the background. This study improves the ASPP module of the encoder in light of these observations. We reduce the dilation rates of the three atrous convolutions in equal proportion to improve the network’s ability to extract multi-scale information. Each 3 × 3 atrous convolution in the ASPP is also decomposed into 1 × 3 and 3 × 1 atrous convolutions by 2D decomposition, with the dilation rate maintained. The number of convolutional parameters in this improved ASPP module is smaller than in the original one, which effectively increases the computational speed of the model. In addition, connections are added between layers so that features can be shared to extract deeper semantic information. The overall structure of the ASPP module is shown in Figure 4 below.
The precise calculation formula of one-dimensional dilated convolution is:
Y(i) = \sum_{n=1}^{N} U[i + r \times n] \times W(n)    (5)
where U is the input feature map, W is the convolution kernel, N is the filter size, and r is the dilation rate of the dilated convolution. The concatenation computation of the feature extraction procedure is shown in Formula (6):
Y = \mathrm{Concat}\left( Z_{\mathrm{pool}}(X),\, C(X),\, Y_0,\, Y_1,\, Y_2 \right)    (6)
where Concat(·) represents the dense connection of the different layers, Zpool(·) represents the normal pooling operation, and C(·) represents a simple convolution operation with a kernel size of 1 × 1. The result Y_i of each layer is shown in Formula (7):
Y_0 = F_{3,\,1\times3\,\&\,3\times1}(X)
Y_1 = F_{6,\,1\times3\,\&\,3\times1}(\mathrm{Concat}(X, Y_0))
Y_2 = F_{9,\,1\times3\,\&\,3\times1}(\mathrm{Concat}(X, Y_0, Y_1))    (7)
where F_{r, n1&n2}(·) denotes the decomposed dilated convolution with dilation rate r and kernel sizes n1 and n2. A minimal sketch of this module is given below.
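The following PyTorch sketch illustrates Equations (5)–(7): each 3 × 3 atrous convolution is decomposed into a 1 × 3 and a 3 × 1 atrous convolution with dilation rates 3, 6, and 9, and the branches are densely connected. The channel widths, the image-level pooling used for Zpool, and the final 1 × 1 projection are assumptions of this sketch, not details from the paper.

```python
# Illustrative sketch of the optimized ASPP: decomposed atrous convolutions
# (1x3 followed by 3x1, same dilation), dense connections between branches,
# plus a pooling branch and a 1x1 convolution branch, all concatenated (Eq. 6).
import torch
import torch.nn as nn
import torch.nn.functional as F


def decomposed_atrous(in_ch: int, out_ch: int, rate: int) -> nn.Sequential:
    """1x3 then 3x1 atrous convolution with the same dilation rate (Eq. 5)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, (1, 3), padding=(0, rate), dilation=(1, rate)),
        nn.Conv2d(out_ch, out_ch, (3, 1), padding=(rate, 0), dilation=(rate, 1)),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class OptimizedASPP(nn.Module):
    def __init__(self, in_ch: int = 512, out_ch: int = 256):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, 1)                        # C(X)
        self.branch0 = decomposed_atrous(in_ch, out_ch, rate=3)           # Y0
        self.branch1 = decomposed_atrous(in_ch + out_ch, out_ch, rate=6)  # Y1
        self.branch2 = decomposed_atrous(in_ch + 2 * out_ch, out_ch, rate=9)  # Y2
        self.project = nn.Conv2d(in_ch + 4 * out_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Image-level pooling branch (Zpool), upsampled back to the map size.
        pooled = F.adaptive_avg_pool2d(x, 1)
        pooled = F.interpolate(pooled, size=x.shape[2:], mode="bilinear",
                               align_corners=False)
        c = self.conv1x1(x)
        y0 = self.branch0(x)
        y1 = self.branch1(torch.cat([x, y0], dim=1))        # dense input (Eq. 7)
        y2 = self.branch2(torch.cat([x, y0, y1], dim=1))    # dense input (Eq. 7)
        out = torch.cat([pooled, c, y0, y1, y2], dim=1)     # Eq. 6
        return self.project(out)
```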

3.4. Decoder Module

Because the encoder output scale is only 1/16th of the original image, the decoder in the original network directly upsamples the encoder output and performs a feature fusion operation that combines the high-level semantic features of the encoder output with a single low-level edge feature from the DCNN. As the number of network layers increases, the extracted features become more abstract, and this approach can blur the edges of the segmentation results. Using only low-level features of a single resolution as input to the decoder is therefore insufficient. We instead combine low-level features of various resolutions to improve the network’s capacity to extract edge information, which makes the optimized Deeplabv3+ network more accurate for edge segmentation. A minimal sketch of this fusion is given below.
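The PyTorch sketch below illustrates the idea of fusing low-level features from two backbone resolutions with the upsampled encoder output. The number of tapped resolutions, the 48-channel projections, and the refinement head are assumptions borrowed from common DeepLabv3+ practice rather than details taken from the paper.

```python
# Illustrative decoder: project low-level features from several resolutions,
# resample everything to a common size, concatenate with the ASPP output,
# refine, and upsample to the input resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionDecoder(nn.Module):
    def __init__(self, low_chs=(64, 128), aspp_ch=256, num_classes=4):
        super().__init__()
        # 1x1 projections keep the low-level branches from dominating the fusion.
        self.proj = nn.ModuleList([nn.Conv2d(c, 48, 1) for c in low_chs])
        fused_ch = aspp_ch + 48 * len(low_chs)
        self.refine = nn.Sequential(
            nn.Conv2d(fused_ch, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1),
        )

    def forward(self, aspp_out, low_feats, out_size):
        target = low_feats[0].shape[2:]   # size of the highest-resolution tap
        feats = [F.interpolate(aspp_out, target, mode="bilinear",
                               align_corners=False)]
        for proj, f in zip(self.proj, low_feats):
            feats.append(F.interpolate(proj(f), target, mode="bilinear",
                                       align_corners=False))
        x = self.refine(torch.cat(feats, dim=1))   # fuse edges + semantics
        return F.interpolate(x, out_size, mode="bilinear", align_corners=False)
```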

4. Experimental Results

4.1. Dataset

As there has long been no recognized or peer-accepted standard offshore oil spill database, this experiment involved creating our own dataset, i.e., labeling the images ourselves. Although software such as Labelme and ITK-SNAP can be used for labeling, this experiment was conducted in MATLAB 2022a, which provides a very useful image labeling tool, Image Labeler. Its pixel ROI annotation function was used for the semantic segmentation task. Some SAR images and labels from our dataset are shown in Figure 5.
In this work, considering the performance of the CPU, 600 images from our own oil spill detection dataset were chosen as the training set and 200 images were chosen as the testing set. Additionally, the images in the original dataset were resized from 1250 × 650 pixels to 256 × 256 pixels. The dataset used in this study, named Oil-spill-Dataset, covers four categories: land, sea, oil spill area, and oil-spill-like area. These four categories were created to reduce the impact of oil-spill-like regions and land on the classification.
The relevant environment for this experiment is shown in Table 1 and Table 2.
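The experiments themselves were run with MATLAB's Deep Network Designer (Tables 1 and 2). For readers working in PyTorch, the fixed hyperparameters of Table 2 translate roughly into the configuration sketched below; the use of SGD and the momentum value are assumptions, since the solver is not stated in the table.

```python
# Rough PyTorch equivalent of the Table 2 hyperparameters (a sketch, not the
# authors' MATLAB setup). `model` is any nn.Module; momentum 0.9 is assumed.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR


def make_training_setup(model: torch.nn.Module):
    # InitialLearnRate = 0.001, L2Regularization = 0.005
    optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-3)
    # LearnRateDropPeriod = 10 epochs, LearnRateDropFactor = 0.3
    scheduler = StepLR(optimizer, step_size=10, gamma=0.3)
    return optimizer, scheduler

# MaxEpochs = 10 and MiniBatchSize = 4 correspond to the number of epochs in
# the training loop and the DataLoader batch_size, respectively.
```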

4.2. Evaluation Metrics

To evaluate the accuracy of the model segmentation, we used the mean intersection over union (mIOU), mean pixel accuracy (mPA), precision (P), and recall (R). As shown in Equation (8), the mIOU is the mean ratio of the intersection to the union of the predicted results and the ground truth over all classes.
\mathrm{mIOU} = \frac{1}{j+1} \sum_{i=0}^{j} \frac{TP}{TP + FP + FN}    (8)
As shown in Equation (9), the mPA is determined by independently computing the percentage of pixels that are properly categorized for each class and then averaging over all classes.
\mathrm{mPA} = \frac{1}{j+1} \sum_{i=0}^{j} \frac{TP + TN}{TP + TN + FP + FN}    (9)
According to Equations (10) and (11), precision is the proportion of predicted positives that are correct, while recall is the proportion of true positives that are correctly predicted.
P = \frac{TP}{TP + FP}    (10)
R = \frac{TP}{TP + FN}    (11)
where TP stands for true positives, FP for false positives, TN for true negatives, FN for false negatives, and j for the number of classes.
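As a concrete illustration, the sketch below computes these four metrics from a per-class confusion matrix accumulated over the test set (illustrative code, not the authors' MATLAB evaluation script).

```python
# Compute the metrics of Eqs. (8)-(11) from a per-class confusion matrix
# (classes here: land, sea, oil spill, oil-spill-like).
import numpy as np


def per_class_metrics(conf: np.ndarray):
    """conf[i, j] = number of pixels of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    tn = conf.sum() - tp - fp - fn
    iou = tp / (tp + fp + fn)                 # Eq. (8), per class
    pa = (tp + tn) / (tp + tn + fp + fn)      # Eq. (9), per class
    precision = tp / (tp + fp)                # Eq. (10)
    recall = tp / (tp + fn)                   # Eq. (11)
    return iou.mean(), pa.mean(), precision.mean(), recall.mean()


# Example: confusion_matrix is a 4x4 array accumulated over the test set
# m_iou, m_pa, m_p, m_r = per_class_metrics(confusion_matrix)
```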

4.3. Results and Analysis

4.3.1. Experimental Results of ASA-DRNet

On our own datasets, ASA-DRNet performed well in segmentation. The IOU, PA, precision, and recall for each category in the training set are shown in Figure 6, and the corresponding values for each category in the testing set are shown in Figure 7.

4.3.2. Results of the Comparison Experiment

To further validate the performance of our method compared to different deep learning networks, we qualitatively compared the segmentation results of images from different scenes and categories in the dataset, and we found that both ASA-DRNet and the other networks had some degree of error, but, overall, ASA-DRNet had better segmentation results. The segmentation results are shown in Figure 8.
To further confirm the efficacy of our strategy, we trained ResNet, SegNet, UNet, DeepLabv3+ResNet-18 (DRNet), DeepLabv3+ResNet-18 with an SE block (SE-DRNet), and ASA-DRNet on our training datasets and tested them on the testing sets. Each model’s mIOU, mPA, mPrecision (P), and mRecall (R) were computed.
Table 3 displays the test outcomes for each model on the testing datasets. The problem of training neural networks with too much depth is greatly alleviated by ResNet’s clever use of shortcut connections; its mIOU and mPA are 55.12% and 58.96%, respectively. SegNet is a symmetric network consisting of an encoder and a decoder; its mIOU and mPA are higher than those of ResNet by 1.60% and 2.60%, respectively. UNet, a major network in the field of image segmentation, benefits from the feature fusion provided by skip connections and outperforms SegNet and ResNet in segmentation performance. Its mIOU, mPA, mPrecision, and mRecall are 59.25%, 63.86%, 70.64%, and 73.18%, respectively, and its mIOU is higher than those of ResNet and SegNet by 4.13% and 2.53%, respectively. DeepLabv3+ResNet-18 (DRNet) is a network proposed in recent years that has achieved even better results in image segmentation; its mIOU and mPA are 2.51% and 1.30% higher than those of UNet because it has a stronger ResNet-18 encoder and a multi-scale design based on dilated convolution. The improved SE-DRNet based on the DeepLabV3+ network has higher precision and recall, and its mIOU is also higher than that of the original DRNet, reaching 63.21%. ASA-DRNet outperforms the other models in segmentation performance; its precision and recall are 2.74% and 2.09% higher than those of DRNet, respectively, and its mIOU and mPA, which reach 64.47% and 68.72%, respectively, are likewise considerably greater than those of the other models.
Figure 8 shows the segmentation results of the six different networks. From the figure, we can clearly see that ResNet and SegNet have large errors in segmenting image edges and small regions; their mPrecision and mRecall only reach 64.36% and 69.92% (ResNet) and 68.74% and 71.35% (SegNet). Compared with the segmentation results of ResNet and SegNet, UNet performs better, with an mPrecision of 70.64%, but it still segments edge details unclearly. The combination of DRNet and the attention module is more effective in solving the problems of unclear edge details and small-region segmentation errors. The mPrecision and mRecall of the proposed method reach 74.98% and 76.75%, which are the best results.

5. Conclusions

Remote sensing image segmentation has recently attracted the interest of numerous researchers because of the advances made in deep learning and satellite imaging technologies, yet the segmentation of multicategory objects in remote sensing images is still a very challenging task. ASA-DRNet, based on axial self-attention, is the solution we provide in this study for the issues of low segmentation accuracy and the multiple scales of the different categories. Firstly, a new structure, which combines an axial self-attention module with ResNet-18, is proposed as the backbone of the DeepLabv3+ encoder. Secondly, an ASPP module is optimized to improve the network’s ability to extract multi-scale features and to increase the speed of model calculation. Finally, low-level features of different resolutions are merged to improve the network’s ability to extract edge information. The experiments show that ASA-DRNet achieves the best results compared to other neural network models. On the dataset made for this experiment, our method achieved the highest mIOU of 64.47%, which is much higher than UNet’s 59.25% and outperforms advanced methods such as DRNet and SE-DRNet. At the same time, the mPrecision and mRecall of our algorithm also outperformed the other algorithms, reaching 74.98% and 76.75%.
In summary, our approach has good generalizability and excellent segmentation accuracy, and it substantially resolves the issues raised in this work. Our approach does, however, have certain limitations. The datasets selected for our experiments have rather low image resolution. High-resolution images are also widely used in remote sensing, but the efficacy of our approach on such images has not yet been demonstrated. In addition, certain remote sensing images with high noise still yield poor segmentation results, and blurred and noisy images continue to pose significant obstacles to remote sensing image segmentation. Future work will study these issues further.

Author Contributions

Conceptualization: S.C.; methodology: S.C.; validation: S.C.; writing—original draft preparation: S.C.; writing—review and editing: S.C. and X.W.; funding acquisition: W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Nature Science Foundation of China (NSFC) under Grant 61901195 and supported by Zhenjiang smart ocean information perception and transmission laboratory project GX2017004.

Acknowledgments

This work was supported by the project of National Natural Science Foundation of China and Zhenjiang smart ocean information perception and transmission laboratory project. The above funding did not lead to any conflict of interests regarding the publication of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SAR     synthetic aperture radar
FCN     fully convolutional network
PSPN    pyramid scene parsing network
DCNN    deep convolutional neural network
DRNet   DeepLabv3+ ResNet-18
ASPP    atrous spatial pyramid pooling
SE      squeeze-and-excitation module
ASA     axial self-attention

References

  1. Kvenvolden, K.A.; Cooper, C.K. Natural seepage of crude oil into the marine environment. Geo-Mar. Lett. 2003, 23, 140–146.
  2. Jiao, Z.; Jia, G.; Cai, Y. A new approach to oil spill detection that combines deep learning with unmanned aerial vehicles. Comput. Ind. Eng. 2019, 135, 1300–1311.
  3. Liu, Y.; MacFadyen, A.; Ji, Z.G.; Weisberg, R.H. (Eds.) Monitoring and Modeling the Deepwater Horizon Oil Spill: A Recordbreaking Enterprise; American Geophysical Union, Geopress: Washington, DC, USA, 2011; Volume 195, p. 271.
  4. White, H.K.; Hsing, P.Y.; Cho, W.; Shank, T.M.; Cordes, E.E.; Quattrini, A.M.; Nelson, R.K.; Camilli, R.; Demopoulos, A.W.J.; German, C.R.; et al. Impact of the Deepwater Horizon oil spill on a deep-water coral community in the Gulf of Mexico. Proc. Natl. Acad. Sci. USA 2012, 109, 20303–20308.
  5. Keramitsoglou, I.; Cartalis, C.; Kiranoudis, C.T. Automatic identification of oil spills on satellite images. Environ. Model. Softw. 2006, 21, 640–652.
  6. Skrunes, S.; Brekke, C.; Eltoft, T. Characterization of marine surface slicks by Radarsat-2 multipolarization features. IEEE Trans. Geosci. Remote Sens. 2013, 52, 5302–5319.
  7. Zheng, H.; Zhang, Y.; Wang, Y.; Zhang, X.; Meng, J. The polarimetric features of oil spills in full polarimetric synthetic aperture radar images. Acta Oceanol. Sin. 2017, 36, 105–114.
  8. Zeng, K.; Wang, Y. A deep convolutional neural network for oil spill detection from spaceborne SAR images. Remote Sens. 2020, 12, 1015.
  9. Zhang, B.; Perrie, W.; Li, X.; Pichel, W.G. Mapping sea surface oil slicks using RADARSAT-2 quad-polarization SAR image. Geophys. Res. Lett. 2011, 38.
  10. Zheng, H.; Zhang, Y.; Wang, Y. Polarimetric features analysis of oil spills in C-band and L-band SAR images. In Proceedings of the International Geoscience and Remote Sensing Symposium, IGARSS 2016, Beijing, China, 11–15 July 2016; pp. 4683–4686.
  11. Topouzelis, K.; Karathanassi, V.; Pavlakis, P.; Rokos, D. Detection and discrimination between oil spills and look-alike phenomena through neural networks. ISPRS J. Photogramm. Remote Sens. 2007, 62, 264–270.
  12. Del Frate, F.; Petrocchi, A.; Lichtenegger, J.; Calabresi, G. Neural networks for oil spill detection using ERS-SAR data. IEEE Trans. Geosci. Remote Sens. 2000, 38, 2282–2287.
  13. Singha, S.; Bellerby, T.J.; Trieschmann, O. Satellite oil spill detection using artificial neural networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2355–2363.
  14. Brekke, C.; Solberg, A.H.S. Classifiers and confidence estimation for oil spill detection in ENVISAT ASAR images. IEEE Geosci. Remote Sens. Lett. 2008, 5, 65–69.
  15. Xu, L.; Li, J.; Brenning, A. A comparative study of different classification techniques for marine oil spill identification using RADARSAT-1 imagery. Remote Sens. Environ. 2014, 141, 14–23.
  16. Singha, S.; Ressel, R.; Velotto, D.; Lehner, S. A combination of traditional and polarimetric features for oil spill detection using TerraSAR-X. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 4979–4990.
  17. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
  18. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
  19. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
  20. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. arXiv 2019, arXiv:1904.00592.
  21. Jha, D.; Smedsrud, P.H.; Riegler, M.; Johansen, D.; de Lange, T.; Halvorsen, P.; Johansen, H.D. ResUNet++: An Advanced Architecture for Medical Image Segmentation. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 2225–2255.
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  23. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
  24. Kong, Y.Y.; Liu, Y.J. A Novel Deeplabv3+ Network for SAR Imagery Semantic Segmentation Based on the Potential Energy Loss Function of Gibbs Distribution. Remote Sens. 2021, 13, 454.
  25. Solberg, A.H.S.; Solberg, R. A large-scale evaluation of features for automatic detection of oil spills in ERS SAR image. In Proceedings of the International Geoscience and Remote Sensing Symposium, IGARSS 1996, Lincoln, NE, USA, 2–31 May 1996; pp. 1484–1486.
  26. Orfanidis, G.; Ioannidis, K.; Avgerinakis, K.; Vrochidis, S.; Kompatsiaris, I. A deep neural network for oil spill semantic segmentation in Sar images. In Proceedings of the International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 3773–3777.
  27. Topouzelis, K.; Psyllosm, A. Oil spill feature selection and classification using decision tree forest on SAR image data. ISPRS J. Photogramm. Remote Sens. 2012, 68, 135–143.
  28. Yin, J.; Moon, W.; Yang, J. Model-based pseudo-quad-pol reconstruction from compact polarimetry and its application to oil-spill observation. J. Sens. 2015, 2015.
  29. Singha, S.; Vespe, M.; Trieschmann, O. Automatic Synthetic Aperture Radar based oil spill detection and performance estimation via a semi-automatic operational service benchmark. Mar. Pollut. Bull. 2013, 73, 199–209.
  30. Skrunes, S.; Brekke, C.; Eltoft, T. Oil spill characterization with multi-polarization C- and X-band SAR. In Proceedings of the International Geoscience and Remote Sensing Symposium, IGARSS 2012, Munich, Germany, 22–27 July 2012; pp. 5117–5120.
  31. Bing, D.; Jinsong, C. An algorithm based on cross-polarization ratio of SAR image for discriminating between mineral oil and biogenic oil. Remote Sens. Technol. Appl. 2013, 28, 103–107.
  32. Li, Y.Q.; Lyu, X.R. Oil Spill Detection with Multiscale Conditional Adversarial Networks with Small-Data Training. Remote Sens. 2021, 13, 2378.
  33. Fan, Y.L.; Rui, X.P. Feature merged network for oil spill detection using SAR images. Remote Sens. 2021, 13, 3174.
  34. Shirvany, R.; Chabert, M.; Tourneret, J.Y. Ship and oil-spill detection using the degree of polarization in linear and hybrid/compact dual-pol SAR. IEEE J. Sel. Top. Appl. Earth Observ. 2012, 5, 885–892.
  35. Wang, Y.; Wang, C.; Wu, H.; Chen, P. An improved Deeplabv3+ semantic segmentation algorithm with multiple loss constraints. PLoS ONE 2022, 17, e0261582.
  36. Wang, D.W.; Wan, J.H.; Liu, S.W.; Chen, Y.L. BO-DRNet: An improved deep learning model for oil spill detection by polarimetric features from SAR images. Remote Sens. 2022, 14, 264.
  37. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030.
  38. Chen, G.; Li, Y.; Sun, G.; Zhang, Y. Application of deep networks to oil spill detection using polarimetric synthetic aperture radar images. Appl. Sci. 2017, 7, 968.
  39. Gallego, A.J.; Gil, P.; Pertusa, A.; Fisher, R. Segmentation of oil spills on side-looking airborne radar imagery with autoencoders. Sensors 2018, 18, 797.
  40. Ma, X.; Xu, J.; Wu, P.; Kong, P. Oil Spill Detection Based on Deep Convolutional Neural Networks using Polarimetric Scattering Information from Sentinel-1 SAR Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13.
Figure 1. The overall structure of the proposed DeepLabV3+ network.
Figure 2. The structure of the improved DCNN model.
Figure 3. The structure of the Axial self-attention block.
Figure 4. The structure of the optimized ASPP module.
Figure 5. Some SAR images and labels in our datasets. (a,c) are SAR images taken by RADARSAT-2. (b,d) are labels labeled by ourselves. Light blue is the oil spill area.
Figure 6. Training results obtained on ASA-DRNet. (a) mIOU; (b) mPA; (c) mPrecision; (d) mRecall.
Figure 7. Testing results obtained on ASA-DRNet. (a) mIOU; (b) mPA; (c) mPrecision; (d) mRecall.
Figure 8. Images of the segmentation results obtained on our datasets. (a1,a2,a3,a4,a5) SAR image, (b1,b2,b3,b4,b5) label, (c1,c2,c3,c4,c5) result of Resnet, (d1,d2,d3,d4,d5) result of SegNet, (e1,e2,e3,e4,e5) result of UNet, (f1,f2,f3,f4,f5) result of DRNet, (g1,g2,g3,g4,g5) result of SE-DRNet, and (h1,h2,h3,h4,h5) result of ASA-DRNet. Light blue represents the sea, red represents the oil spill-like area, green represents land, and black represents the oil spill area.
Table 1. Experimental environment configuration.
Environment Name | Environmental Parameters
Computer system | Mac OS X
CPU | Intel Core i5-5250U, 1.6 GHz
Graphics card | Intel HD Graphics 6000
Programming language | M (MATLAB)
Deep learning framework | Deep Network Designer
Visual library | MATLAB 2022a
Table 2. Fixed hyperparameters and values for ASA-DRNet.
Hyperparameters | Value
LearnRateDropPeriod | 10
LearnRateDropFactor | 0.3
InitialLearnRate | 0.001
MaxEpochs | 10
MiniBatchSize | 4
L2Regularization | 0.005
Table 3. Comparison of the test results of the different models on the testing set.
Method | mIOU | mPA | mPrecision | mRecall
ResNet | 55.12% | 58.96% | 64.36% | 69.92%
SegNet | 56.72% | 61.56% | 68.74% | 71.35%
UNet | 59.25% | 63.86% | 70.64% | 73.18%
DRNet | 61.76% | 65.16% | 72.24% | 74.66%
SE-DRNet | 63.21% | 67.42% | 73.78% | 75.43%
ASA-DRNet | 64.47% | 68.72% | 74.98% | 76.75%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
