Article

DAFNet: A Novel Change-Detection Model for High-Resolution Remote-Sensing Imagery Based on Feature Difference and Attention Mechanism

1 Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, B-DAT, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(15), 3896; https://doi.org/10.3390/rs15153896
Submission received: 9 June 2023 / Revised: 3 August 2023 / Accepted: 4 August 2023 / Published: 6 August 2023

Abstract

Change detection is an important task in the field of remote sensing. At present, deep-learning-based change-detection methods have achieved many breakthrough results. However, current algorithms still present issues such as target misdetection, false alarms, and blurry edges. To alleviate these problems, this work proposes a network based on feature differences and attention mechanisms. The network includes a Siamese encoding network that encodes images acquired at different times, a Difference Feature-Extraction Module (DFEM) that extracts difference features from bitemporal images, an Attention Refinement Module (ARM) that optimizes the extracted difference features through attention, and a Cross-Scale Feature-Fusion Module (CSFM) that merges features from different encoding stages. Experimental results demonstrate that this method effectively alleviates the issues of target misdetection, false alarms, and blurry edges.

1. Introduction

Human production activities have a profound impact on land cover on Earth’s surface [1,2]. As human activities of various types trend towards digitization and automation, it is necessary to continuously monitor changes in terrestrial information to protect the natural environment [3]. Therefore, research on change detection, which aims to identify pixels that have undergone semantic changes in bitemporal image pairs [4], has important practical significance. Change-detection technology can help humans efficiently complete some very labor-intensive tasks, such as environmental monitoring [5,6,7], climate change research [8,9,10,11,12], disaster assessment [13,14], agricultural management [15,16], urban planning [17,18,19], and water resource management [20,21,22,23]. Moreover, as this technology becomes increasingly sophisticated, it will be applied to even more areas [24,25].
Change-detection tasks require the simultaneous processing of a pair of bitemporal images [26]. Because the two images are acquired at different times, they can differ in weather, season, and illumination intensity [27]. Furthermore, current satellite technology still has limitations: it is difficult to reproduce the same satellite altitude, viewing angle, and other imaging conditions for both acquisitions, which can introduce viewing-angle deviations [28]. These problems pose certain challenges to change detection. Before the introduction of deep-learning methods, many scholars used traditional image-analysis methods for change detection [29,30]. In the past, traditional methods have provided a wealth of vital information for global environmental monitoring and other fields [31]. Commonly used traditional change-detection methods include the image-difference method [32], the post-classification method [33], principal component analysis (PCA) [34], change vector analysis (CVA) [35], etc. However, these methods have many limitations, such as being noise-sensitive, not fully utilizing the available information, requiring manual feature extraction, generalizing poorly, and struggling with high-dimensional data [36]. Consequently, traditional algorithms are gradually being replaced by deep-learning-based change-detection algorithms.
Recently, significant progress has been made in many computer-vision tasks [37,38], such as semantic segmentation [39,40] and object detection [41]. The change-detection task bears a strong resemblance to semantic segmentation, as both require pixel-level predictions. Daudt et al. [42] proposed an FCN-based method that was among the earliest to apply deep learning to change detection. This method adopted a shared-weight Siamese architecture and the concept of skip connections, which has had a profound impact on subsequent change-detection research. Liu et al. [43] suggested a method that assists change detection by first segmenting the bitemporal image pairs; their model obtains more expressive features by leveraging the correlations between channels and spatial locations. Chen et al. [44] introduced an approach that enhances detection precision by exploiting long-range relationships, capturing the intrinsic correlation between distant pixels in bitemporal image pairs through an attention mechanism.
Although existing methods have achieved satisfactory outcomes, they tend to handle feature maps from different periods roughly, leading to problems such as target misdetection, omission, and edge blurring in the prediction results. To address these problems, we introduce a method based on feature differences and an attention mechanism (DAFNet). The network employs a Siamese-structured encoder, with ResNet18 serving as the backbone, to encode the features of images from different time periods. Upon obtaining image features from the two time points, a Difference Feature-Extraction Module (DFEM) is introduced to mitigate the effects of coordinate displacement and draw out the differential features of the bitemporal image pairs. In addition, we have designed an Attention Refinement Module (ARM) to adjust the weights of the extracted difference features, and a Cross-Scale Feature-Fusion Module (CSFM) to amalgamate features stemming from different encoding stages. With these three auxiliary modules, our method achieves higher accuracy.
The main contribution of this work is the proposal of a new remote-sensing image change-detection network based on feature differences and an attention mechanism (DAFNet). This network effectively addresses existing problems in change detection, such as target misdetection, omissions, and edge blurring. Results from experiments conducted on the GECDD, LEVIR-CD, and CDD change-detection datasets reveal that DAFNet attains higher accuracy than other advanced methods and efficiently mitigates issues such as target misdetection, omission, and the blurring of detection-result edges.

2. Related Work

Remote-sensing image change detection is a technology that identifies changes in the surface or objects over time by analyzing remote-sensing image data in multiple periods. It is an important application field in remote-sensing technology. The main goal of change detection is to detect changes that occur in images from two or more time periods and provide quantitative and qualitative information about these changes. These changes may involve changes in the type of land cover, the addition or removal of buildings, the growth or decline of vegetation, etc. Through the detection and analysis of these changes, key information about the dynamic changes to the land surface can be obtained to provide support for decision-making in related fields.

2.1. Traditional Change-Detection Methods

In the past, traditional change-detection methods have mainly relied on pixel-level change detection and thresholding. The image-difference method is a common and intuitive change-detection method that determines surface changes by comparing the pixel values of bitemporal image pairs [32]. However, this method is very sensitive to noise and is easily affected by seasonal changes, variations in illumination conditions, and so on. Liu et al. [33] proposed a method utilizing FCN for change detection after classification. The post-classification method first classifies each image individually and subsequently compares the classification results to identify surface changes. This method can provide information on the type and process of changes, but its accuracy depends on the accuracy of the classification. Celik [34] used Principal Component Analysis (PCA) to transform highly correlated multiband data into principal components with lower correlation; comparing the principal components at different times identifies surface changes. This method effectively reduces the data volume and mitigates the impact of noise, but it may lose some information. Lyu and Hui [45] put forward a method that leverages time-series analysis for detecting changes: by analyzing time-series data derived from bitemporal image pairs, for example with trend analysis or seasonal decomposition, the patterns of surface changes are discovered. This method considers the correlation and cyclicity of the time series, but it requires a long span of remote-sensing image data and cannot effectively use the positional relationships between pixels. Qiang and Yunhao [35] evaluate the alteration in texture and spectral characteristics using Change Vector Analysis (CVA) and subsequently apply an adaptive weighting method to blend the change intensity. This method considers the modifications in pixel values as well as the direction of these modifications, providing more information about the change; however, CVA can be affected by noise and changes in atmospheric conditions.
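To make the simplest of these pipelines concrete, the following sketch applies image differencing with a global threshold to a co-registered bitemporal pair. It is a minimal illustration, not the method of any cited work; the use of Otsu's method to pick the threshold is our own assumption.

```python
import numpy as np
from skimage.filters import threshold_otsu  # data-driven global threshold (illustrative choice)

def difference_change_map(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Pixel-wise image differencing for a co-registered bitemporal pair.

    img_t1, img_t2: arrays of shape (H, W) or (H, W, C) with identical shapes.
    Returns a boolean change mask of shape (H, W).
    """
    diff = np.abs(img_t1.astype(np.float64) - img_t2.astype(np.float64))
    if diff.ndim == 3:
        diff = np.linalg.norm(diff, axis=-1)  # collapse channels into one change magnitude
    return diff > threshold_otsu(diff)        # True where the pixel is declared "changed"
```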

2.2. Change-Detection Method Based on Deep Learning

In recent years, with the development of deep-learning technology, deep-learning-based methods have been widely used in change detection and have achieved good results. Chen and Shi [46] proposed a novel Siamese spatio-temporal attention neural network. The network uses a self-attention mechanism to model the spatio-temporal relationships of bitemporal images. Specifically, it calculates the attention weights between any two pixels at different times and locations through self-attention and uses them to generate more discriminative features. Then, by introducing self-attention in each sub-region, it can capture spatio-temporal dependencies at different scales, resulting in better representations for objects of different sizes. Hou et al. [47] proposed a hyperspectral change-detection method based on multiple morphological profiles (MMPs). The method first extracts the area attribute and height attribute of the multitemporal image pair. Second, a local absolute-distance method based on spectral-angle weighting is used to reconstruct the discriminative spectral domain. Then, the absolute distance is employed to extract changes in the constructed feature domain. Finally, a change map is obtained by guided filtering. Since the feature-extraction process uses a relatively time-consuming max-tree/min-tree strategy, the computational cost of this method is relatively high. Feng et al. [48] proposed a multispectral image change-detection method based on a multiscale adaptive kernel network and a multimodal conditional random field. This method uses a weight-sharing bilateral encoding path to simultaneously extract the independent features of bitemporal multispectral images without introducing additional parameters and uses a convolution-kernel block that adaptively assigns weights to extract the features in the image. By embedding an attention-based upsampling module in the decoding path, the feature maps can better express variation information in both the channel and spatial dimensions. Finally, the detection results are smoothed using multimodal conditional random fields. Wang et al. [49] proposed a method for change detection in hyperspectral images (SST-Former). The method first encodes the position of each pixel of the multitemporal image. Subsequently, spectral sequence information and spatial texture information are extracted using a spectral transformer encoder and a spatial transformer encoder, respectively. Finally, the features of the multitemporal images are used as the input of a temporal transformer, which extracts the useful change features between the current image pair, and the detection results are obtained through a multilayer perceptron.

3. Methodology

The DAFNet introduced in this study adopts an encoder–decoder structure, comprising an encoder–decoder backbone and three auxiliary modules. To prevent gradient vanishing and explosion and to reduce the number of model parameters, the encoding network uses ResNet18 to encode the input remote-sensing image pair. The auxiliary modules are the Difference Feature-Extraction Module (DFEM), the Attention Refinement Module (ARM), and the Cross-Scale Feature-Fusion Module (CSFM). We use the DFEM to extract difference features from the outputs of each stage of ResNet, which enhances the change characteristics within the bitemporal image pairs. Following the DFEM, we optimize the output features of each DFEM with the ARM, leading the network to focus more on the feature regions and channels that significantly influence the detection results and thereby improving the efficiency of feature extraction. Considering that naively merging features of different scales can degrade the accuracy of the final results, after the ARM we use the CSFM to merge features of different sizes in stages. This reduces feature redundancy and coordinate offset during feature fusion, thereby enhancing the accuracy of the outcomes. The structure of DAFNet is depicted in Figure 1.

3.1. Network Architecture

Due to the complexity of the change characteristics in bitemporal image pairs, we use ResNet [50], which has strong feature-extraction capabilities, to extract features in the encoding stage [51]. After the first encoding stage, the bitemporal images are reduced to a quarter of their original size and the channel count is increased to 64. Each subsequent stage further halves the spatial size and doubles the number of channels. The feature maps output by the final encoding stage are reduced to 1/32 of the original size, with 512 channels.
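The following sketch illustrates this shared-weight encoder using torchvision's ResNet18 as an assumed stand-in for the backbone; the stage grouping and the shapes noted in the comments follow the description above, while the class and variable names are our own.

```python
import torch.nn as nn
from torchvision.models import resnet18

class SiameseResNet18Encoder(nn.Module):
    """Shared-weight encoder: both images pass through the same ResNet18 stages."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)  # no pre-trained weights, as in the experiments
        # Stage 1 reduces the input to 1/4 resolution with 64 channels; each later
        # stage halves the resolution and doubles the channels (128, 256, 512).
        self.stage1 = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool, net.layer1)
        self.stage2, self.stage3, self.stage4 = net.layer2, net.layer3, net.layer4

    def encode(self, x):
        f1 = self.stage1(x)   # (B,  64, H/4,  W/4)
        f2 = self.stage2(f1)  # (B, 128, H/8,  W/8)
        f3 = self.stage3(f2)  # (B, 256, H/16, W/16)
        f4 = self.stage4(f3)  # (B, 512, H/32, W/32)
        return [f1, f2, f3, f4]

    def forward(self, img_t1, img_t2):
        # The same weights encode both acquisitions (Siamese architecture).
        return self.encode(img_t1), self.encode(img_t2)
```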
Following the encoding stage, we obtain four groups of feature maps with different sizes and channel numbers. We input the feature maps from each stage into the DFEM for difference-feature extraction. After the DFEM, we obtain four groups of feature maps with 64 channels each, which contain the image difference features. Subsequently, we input these feature maps into the ARM for attention optimization, adjusting the weights between different channels and pixels; finally, we obtain four groups of feature maps with weights applied to feature channels and pixels.
In the decoding stage, we gradually restore the details of the feature maps. Beginning with the final feature map, we upsample the feature map of each stage and blend it with the feature map from the preceding stage. To reduce the feature redundancy that often occurs during fusion, we use the CSFM to fuse the feature maps at each step. Finally, we obtain a group of feature maps with 64 channels, whose size is 1/4 of the original image size. We then adjust the channel number of this set of feature maps through a 1 × 1 convolution to obtain the prediction of the changed area of the bitemporal image pair.

3.2. Differential Feature-Extraction Module

During the encoding phase of the network, we first use ResNet with a Siamese architecture to extract the features of the bitemporal image pairs. Upon completion of feature extraction, we can perform difference discrimination to obtain discrepancy maps. However, the discrepancy maps obtained by a simple subtraction operation often contain large errors, affecting the precision of the final results. Hence, we propose the DFEM to obtain more accurate difference features from the two-time-period images. The inputs to this module are the features obtained from the Siamese encoding network, and the outputs are the discrepancy maps of the image pairs. After the discrepancy maps are processed by the ARM, the CSFM, and the remaining layers, the final detection-result map is obtained.
Figure 2 depicts the architecture of the DFEM. Let the input features from the two different moments be f_1 and f_2. First, we subtract and add the feature maps f_1 and f_2 separately, then concatenate the results and send them through a 3 × 3 convolution to acquire the difference attention map f_a of the image pair. Subsequently, we send f_1 and f_2 through separate 3 × 3 convolutions and concatenate the outputs to acquire the feature map f_c of the bitemporal image pair. Next, we multiply f_a by f_c and add the result to f_a. Finally, a 1 × 1 convolution aggregates the overall information of the bitemporal images and generates the difference feature map f_d of the bitemporal image pair. Compared with directly subtracting the feature maps, this alleviates the impact of interference factors such as light intensity and seasonal changes. The following formulas represent this process:
f_a = f_3(C((f_1 + f_2), (f_1 − f_2)))
f_c = C(f_3(f_1), f_3(f_2))
f_d = f_1(f_a × (1 + f_c))
In the formulas, f_n denotes a convolution operation with a kernel size of n × n; each convolution is followed by Batch Normalization (BN) and a Rectified Linear Unit (ReLU). C denotes the concatenation of feature maps along the channel dimension.
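A minimal PyTorch sketch of the DFEM following the three formulas above. The paper does not list the intermediate channel widths, so the choices below, as well as the reading of the product between f_a and f_c as element-wise, are our assumptions.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k):
    """Convolution followed by BN and ReLU, as used throughout the paper's modules."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DFEM(nn.Module):
    """Difference Feature-Extraction Module (sketch)."""
    def __init__(self, in_ch, out_ch=64, mid_ch=64):
        super().__init__()
        self.conv_a = conv_bn_relu(2 * in_ch, mid_ch, 3)       # produces f_a from the concatenated sum/difference
        self.conv_t1 = conv_bn_relu(in_ch, mid_ch // 2, 3)     # per-date branches whose outputs
        self.conv_t2 = conv_bn_relu(in_ch, mid_ch // 2, 3)     # are concatenated into f_c
        self.conv_out = conv_bn_relu(mid_ch, out_ch, 1)        # final 1x1 conv producing f_d

    def forward(self, f1, f2):
        f_a = self.conv_a(torch.cat([f1 + f2, f1 - f2], dim=1))
        f_c = torch.cat([self.conv_t1(f1), self.conv_t2(f2)], dim=1)
        # f_a * (1 + f_c) = f_a + f_a * f_c: multiply the two maps, then add back f_a.
        return self.conv_out(f_a * (1.0 + f_c))
```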

3.3. Attention Refinement Module

Extensive data indicate that the land-cover conditions in most areas do not undergo significant changes over decades. Consequently, existing public remote-sensing change-detection datasets are typically skewed, with far more samples representing unchanged areas than changed ones. Our model therefore aims to assign higher weights to regions that have potentially changed in the bitemporal image pairs, allowing the network to focus more on these areas, while assigning lower weights to areas that have remained largely unchanged, thereby ignoring irrelevant background regions. For this purpose, we propose a novel ARM that enables the network to focus more on areas likely to have changed while suppressing areas that do not require excessive attention. In this way, areas with small or subtle changes are less likely to be missed.
Figure 3 shows the schematic diagram of the ARM architecture. f_in denotes the input to the module. Initially, we apply both global max pooling and global average pooling to f_in to acquire two sets of feature weights, f_m and f_a, respectively. We then reweight f_in with the sum of f_m and f_a to obtain f_o, which is passed through a sigmoid activation function. Finally, we send the result through a residual convolution module to acquire the output f_out. This procedure can be expressed by the equations below:
f_o = f_in × (g_a(f_in) + g_m(f_in))
f_out = σ(f_o) × (1 + f_3(·))
In the equations, g_a denotes a global average-pooling operation, which averages all pixel values within each channel to obtain a set of feature weights in the channel dimension. g_m denotes a global max-pooling operation, which obtains a set of channel-wise feature weights by taking the maximum of all pixel values within each channel. σ denotes the sigmoid activation function, and f_3(·) denotes a convolution module with a kernel size of 3 × 3, followed by a BN layer and a ReLU function.
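A minimal PyTorch sketch of the ARM. The first equation above is implemented directly; the second leaves the argument of f_3(·) implicit, so feeding it the sigmoid-activated map in a residual fashion is our interpretation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ARM(nn.Module):
    """Attention Refinement Module (sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(                 # the residual 3x3 convolution module f_3
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_in):
        g_a = F.adaptive_avg_pool2d(f_in, 1)       # channel weights from average pooling, (B, C, 1, 1)
        g_m = F.adaptive_max_pool2d(f_in, 1)       # channel weights from max pooling, (B, C, 1, 1)
        f_o = f_in * (g_a + g_m)                   # first ARM equation: reweight the input
        f_o = torch.sigmoid(f_o)
        return f_o * (1.0 + self.conv(f_o))        # second ARM equation, with our residual reading of f_3(.)
```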
Figure 4 illustrates the effectiveness of the Attention Refinement Module (ARM). Images a and b are a pair of images taken at different times. Image c is a heatmap without ARM-based attention refinement, and Image d is a heatmap of features that have undergone attention optimization with the ARM. Image e shows the actual area of change. In Images c and d, warm-colored areas are parts that the network focuses on, and cool-colored areas are parts that the network suppresses. As can be seen from Image c, without the ARM the areas that the network focuses on are not accurate: it attends to some unchanged areas in addition to the changed areas, which may lead to omissions or false detections. As shown in Image d, after applying the ARM the network accurately identifies the areas that have changed. Moreover, it suppresses the boundaries of the changed area, thereby improving the precision of the boundaries and mitigating the blurry edges in the final result.

3.4. Cross-Scale Feature-Fusion Module

During the decoding phase of the network, the feature maps of varying sizes obtained in the encoding stage need to be progressively upsampled and fused. Naively fusing feature maps of different sizes, for instance by directly adding them or concatenating them along the channel dimension, may cause coordinate offset and information redundancy, negatively influencing the predictions. To this end, we propose the CSFM to reduce the coordinate offset and feature redundancy that commonly occur during feature fusion. This allows the network to integrate feature maps from different stages more efficiently, thereby improving the precision of the detection results.
Figure 5 depicts the schematic diagram of the CSFM. This module takes feature maps from different encoding stages as input. Suppose the feature maps from two encoding stages are f_1 and f_2. First, we fuse f_1 and f_2 by direct addition. Then, we subtract f_1 and f_2 to generate the discrepancy map between them. We perform a maximum-pooling operation on the discrepancy map to eliminate the shift in the estimated mean caused by convolution-layer parameter errors, obtaining the difference attention map of f_1 and f_2. We then use this difference attention map to weight the preliminary fusion result of f_1 and f_2, reducing the coordinate offset and feature redundancy of the preliminary fusion. Finally, we obtain f_out after passing the fused feature map through a 3 × 3 convolution module. This procedure can be expressed as:
f_out = f_3(P_m(f_1 − f_2) × (f_1 + f_2))
In the equation, f_3(·) denotes a convolution module with a kernel size of 3 × 3, accompanied by a BN layer and a ReLU function. P_m denotes max pooling along the channel dimension, which obtains a group of weights by selecting the maximum feature value at every spatial position across all channels.
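A minimal PyTorch sketch of the CSFM following the formula above. The module assumes the caller has already upsampled the coarser feature map so that f_1 and f_2 have matching shapes; the channel width of the 3 × 3 convolution is our assumption.

```python
import torch
import torch.nn as nn

class CSFM(nn.Module):
    """Cross-Scale Feature-Fusion Module (sketch)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f1, f2):
        # P_m: max over the channel dimension -> one attention weight per spatial position.
        diff_attention = (f1 - f2).amax(dim=1, keepdim=True)   # (B, 1, H, W)
        # Weight the additive fusion with the difference attention, then refine with a 3x3 conv.
        return self.conv(diff_attention * (f1 + f2))
```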
The combination of the above three modules constructs our DAFNet. This network first uses an encoder network with a Siamese architecture to perform feature encoding on the bitemporal images, p1 and p2, obtaining four groups of feature maps with different sizes. Then, each of them is fed into the DFEM to draw out the discrepancy features of the images. Subsequently, the ARM is employed to adjust the weights of the extracted difference features. Finally, the CSFM is utilized to merge the feature maps from the four distinct encoding phases. Ultimately, the change-detection result map of the bitemporal image pair is generated by adjusting the channel number of the feature map using a convolution classifier.
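The sketch below shows one way the modules above could be chained into the overall forward pass described in this section; the per-stage module lists, the bilinear upsampling, and the final ×4 upsampling of the logits back to the input resolution are our assumptions.

```python
import torch.nn.functional as F

def dafnet_forward(encoder, dfems, arms, csfms, classifier, img_t1, img_t2):
    """Chain the Siamese encoder, DFEM, ARM, and CSFM modules (sketch).

    dfems/arms: one module per encoder stage (fine to coarse); csfms: three fusion
    modules for the decoder; classifier: a 1x1 convolution producing change logits.
    """
    feats_t1, feats_t2 = encoder(img_t1, img_t2)
    # Per-stage difference extraction and attention refinement (64 channels each).
    diffs = [arm(dfem(a, b)) for dfem, arm, a, b in zip(dfems, arms, feats_t1, feats_t2)]
    # Decode from the coarsest stage upward, fusing with the CSFM at every step.
    x = diffs[-1]
    for csfm, skip in zip(csfms, reversed(diffs[:-1])):
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        x = csfm(x, skip)
    logits = classifier(x)                                   # (B, n_classes, H/4, W/4)
    return F.interpolate(logits, scale_factor=4, mode="bilinear", align_corners=False)
```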

4. Experiment

4.1. Dataset

The accuracy and scale of the dataset have a significant impact on the training and evaluation of the network. Using a large, accurately annotated dataset makes the model more precise and generalizable. In this work, we utilize the large-scale public change-detection datasets LEVIR-CD and CDD. Furthermore, we established a well-annotated dataset (GECDD) to train and evaluate our model.

4.1.1. Google Earth Remote-Sensing Imagery Change-Detection Dataset (GECDD)

GECDD is a self-constructed change-detection dataset for remote-sensing images, with images sourced from Google Earth. We collected 3500 pairs of high-resolution satellite images through Google Earth, each of size 512 × 512 pixels and covering an area of approximately 3000 square meters. The images were taken from different regions of China over a period of 10 years. To enhance the diversity of the dataset and simulate real-world application scenarios as closely as possible, we specifically selected pairs of remote-sensing images taken in different seasons and containing various object types. To ensure the accuracy of the dataset, the labels for each pair of images were carefully verified. To better evaluate the model, the proportion of changed pixels varies substantially across image pairs, mainly between 5% and 50%. The dataset is divided into a training set, a validation set, and a test set, accounting for 70%, 10%, and 20% of the dataset, respectively. Figure 6 presents a selection of examples and collection regions of GECDD.

4.1.2. LEVIR-CD Dataset

LEVIR-CD is a large-scale building change-detection dataset consisting of 637 pairs of high-resolution remote-sensing images, each of size 1024 × 1024 pixels. It encompasses a variety of building types, including villas, tall apartment buildings, private homes, industrial factories, and warehouses. The time interval between the images in each pair ranges from 5 to 14 years. To ensure accuracy, the labeling of each sample in the dataset was double-checked. We randomly divided the dataset into a training set, a validation set, and a test set, with ratios of 70%, 10%, and 20%, respectively. Figure 7 presents some samples from LEVIR-CD.

4.1.3. CDD Dataset

The CDD dataset is a publicly available dataset for change detection in remote-sensing images [52]. It contains 16,000 season-varying remote-sensing image pairs; each image is 256 × 256 pixels, and the spatial resolution ranges from 3 to 100 cm/pixel. We divide the dataset into 10,000 training pairs and 3000 pairs each for validation and testing. Figure 8 presents some samples from the CDD.

4.2. Experiment Details

Our model employs ResNet18 as the encoding network, without loading any pre-trained weights. The initial learning rate is set to 0.001, and a learning-rate decay approach is then used to adjust it dynamically. During training, the learning rate for each epoch is calculated as follows:
lr = base_lr × γ^(n/T)
In the formula, base_lr denotes the learning rate at the start of training, γ is a constant that controls the rate of learning-rate decay, n is the current training epoch, and T is the decay period. Here, we set γ to 0.9, the maximum number of training epochs to 200, the decay period T to 4, and the batch size to 8. Binary cross-entropy is used as the loss function, and Adam is used as the optimizer. We chose Precision (PR), Recall (RC), Intersection over Union (IoU), and the F1 score as evaluation metrics. Their formulas are as follows:
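A small sketch of this schedule with the stated settings; whether n/T is meant as a continuous ratio or an integer step count is not spelled out, so the continuous form is used here.

```python
def scheduled_lr(epoch: int, base_lr: float = 0.001, gamma: float = 0.9, period: int = 4) -> float:
    """Learning-rate schedule from the formula above: lr = base_lr * gamma ** (n / T)."""
    return base_lr * gamma ** (epoch / period)

# With the paper's settings: epoch 0 -> 1.0e-3, epoch 4 -> 9.0e-4, epoch 200 -> ~5.2e-6.
```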
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
IoU = TP / (TP + FP + FN)
F1 = (2 × PR × RC) / (PR + RC)
In these formulas, TP is the number of correctly predicted changed samples, FN is the number of changed samples predicted as unchanged, FP is the number of unchanged samples predicted as changed, and TN is the number of correctly predicted unchanged samples. In addition, we also report the time each model takes to infer one pair of images.
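For completeness, a short sketch that computes these metrics from binary prediction and label masks; the small epsilon guarding against division by zero is our addition.

```python
import numpy as np

def change_detection_metrics(pred: np.ndarray, label: np.ndarray) -> dict:
    """Precision, Recall, IoU, and F1 from binary change masks, per the formulas above."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    eps = 1e-10                                # avoids division by zero on empty masks
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"PR": precision, "RC": recall, "IoU": iou, "F1": f1}
```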

4.3. Ablation Experiment on GECDD

To validate the DFEM, ARM, and CSFM, we designed ablation experiments on the GECDD dataset. Specifically, we remove one module at a time from the full network and compare the accuracy with and without that module. The experimental results are shown in Table 1.
(1) Ablation experiment of DFEM: This module avoids acquiring the difference map through direct feature subtraction. Instead, it adopts a smoother approach to capture the differences between two groups of features, therefore generating differential features of the bitemporal image pairs and reducing the loss brought by feature subtraction. Outcomes of the experiment reveal that the introduction of DFEM improves the network’s IoU and F1 scores by 1.45 and 0.99, respectively, confirming the effectiveness of DFEM.
(2) Ablation experiment of ARM: To verify whether ARM can guide the network to focus more on regions where changes may occur and ignore irrelevant background areas, we designed an ablation experiment for ARM. Experimental results demonstrate that the introduction of ARM allows the network to selectively assign higher weights to the changing areas. The network’s IoU and F1 scores improved by 1.17 and 0.80, respectively, verifying the effectiveness of ARM.
(3) Ablation experiment of CSFM: To confirm whether CSFM can enable the network to fuse feature maps more efficiently from different stages, we designed an ablation experiment for CSFM. Experimental results show that CSFM can reduce the coordinate offset and feature redundancy during feature fusion, leading to the network’s IoU and F1 scores increasing by 1.05 and 0.71, respectively, confirming the effectiveness of CSFM.
The results of the ablation experiments of the three modules demonstrate that the proposed DFEM, ARM, and CSFM can improve various metrics of the network, leading to better detection results. Adding these three modules can improve the IoU and F1 score of the network. This suggests that for change-detection tasks, the addition of these three modules can help the network extract the change features of bitemporal images more efficiently, pay more attention to areas where changes may occur, and fuse features of different sizes more efficiently.

4.4. Comparative Experiments with Different Methods on GECDD

To validate the advantages of the proposed approach, we designed comparative experiments on the GECDD. In these experiments, we contrasted DAFNet with several other advanced change-detection methods. In the interest of fairness, all methods employed the same training strategy and were trained on the same device. Additionally, to better assess the models, none of the methods loaded pre-trained weights. The results of the comparative experiments are presented in Table 2. As can be observed from the table, among the other change-detection methods, Swin-Transformer achieved the best results, with the composite evaluation metrics IoU and F1 reaching 70.29% and 82.55%, respectively. Our DAFNet achieved the highest accuracy in both IoU and F1, exceeding Swin-Transformer by 1.65% and 1.13%, respectively. In addition, DAFNet has a faster inference speed than Swin-Transformer and Segformer. This shows that, compared with other advanced change-detection algorithms, DAFNet has a certain superiority.
Figure 9 presents the prediction results of these methods on the GECDD. To better compare the detection results of each model, we mark missed-detection areas in green and false-detection areas in red. Img1 and img2 are a pair of images to be detected; ’a’ represents the actual area of change in this pair of images, while ’b–j’ show the prediction results of the corresponding methods. As seen in the figure, the other change-detection methods all present issues to varying degrees, including object misdetection, omission, and blurry prediction edges. In contrast, our DAFNet, designed around the dual-image input of change-detection tasks, extracts the difference features of the two images through the DFEM, improving the accuracy of the detection results. At the same time, the ARM allows the network to focus more on the changed regions and their edges, effectively avoiding object misdetection and omission and ensuring more accurate boundaries in the detection results. As can be seen from the figure, DAFNet has fewer missed detections (green) and false detections (red) than Swin-Transformer and Segformer.

4.5. Generalization Experiments

To verify the generalization ability of DAFNet, we conducted generalization experiments on LEVIR-CD and CDD.

4.5.1. Generalization Experiments on LEVIR-CD Dataset

The results of the generalization experiment on LEVIR-CD are displayed in Table 3. They show that, among the other change-detection methods, TFI-GR achieved the best detection results, with the composite evaluation metrics IoU and F1 reaching 79.32% and 88.47%, respectively. Our DAFNet still achieved the highest accuracy on the LEVIR-CD dataset, with IoU and F1 scores of 80.75% and 89.35%, respectively, exceeding the best-performing comparison model, TFI-GR, by 1.43% and 0.88%.
Figure 10 presents the prediction results of these change-detection methods on LEVIR-CD. Img1 and img2 are a pair of bitemporal remote-sensing images to be detected; ‘a’ represents the actual area of change in this pair of images, and ‘b–j’ are the prediction results of the different change-detection algorithms. As can be observed from the figure, algorithms such as FC-Siam-Conc and MFGAN show severe omissions. Moreover, the edges of the detection results of most networks are blurry, failing to separate adjacent change areas. In contrast, our DAFNet, aided by the DFEM and ARM, effectively solves the problems of object omission and misdetection. By using the ARM to suppress unchanged areas within adjacent change regions, it successfully segments adjacent change areas and provides clear boundaries in the detection results. This showcases the superiority of the proposed DAFNet.

4.5.2. Generalization Experiments on CDD Dataset

The results of the generalization experiment on CDD are displayed in Table 4. They show that, among the other change-detection methods, Segformer achieved the best detection results on CDD, with the composite evaluation metrics IoU and F1 reaching 79.52% and 88.59%, respectively. Our DAFNet still achieved the highest accuracy on CDD, with IoU and F1 scores of 81.47% and 89.79%, respectively, exceeding the best-performing comparison model, Segformer, by 1.95% and 1.20%.
Figure 11 presents the prediction results of these change-detection methods on CDD. Img1 and img2 are a pair of bitemporal remote-sensing images to be detected; ‘a’ represents the actual area of change in this pair of images, and ‘b–j’ are the prediction results of the different change-detection algorithms. It can be seen from the figure that the detection results of the other models contain large areas of missed detection, and some models even show serious false detections. In contrast, only a small number of missed detections appear in the results of our proposed DAFNet, and almost no false detections occur. This again showcases the superiority of the proposed DAFNet. The generalization experiments on LEVIR-CD and CDD show that DAFNet can be applied to most change-detection scenarios, exhibiting excellent generalization and robustness.

5. Discussion

5.1. Advantages of the Proposed Method

The method proposed in this paper outperforms other methods in both the comparison and generalization experiments and can effectively detect the changed regions in bitemporal image pairs. Experimental results on three datasets demonstrate the effectiveness and superiority of the method, which achieves higher detection accuracy than the other methods. In the encoding stage of the network, we input the feature map of each stage into the DFEM for difference-feature extraction; compared with directly subtracting the feature maps, this alleviates the impact of interference factors such as light intensity and seasonal changes. Subsequently, we input the extracted features into the ARM for attention optimization and adjust the weights between channels and pixels so that the network pays more attention to the areas of the bitemporal image pair that may have changed. In the decoding stage, we use the CSFM to fuse the feature maps of each stage; compared with directly adding or concatenating feature maps, this reduces the feature redundancy that readily occurs during feature fusion. In practical applications, the proposed method performs well on images of different resolutions and in complex scenes, effectively alleviates false and missed detections, and exhibits strong generalization and robustness.

5.2. Limitations and Future Research Directions

Although our method achieves the highest accuracy on all three datasets, its inference time is not the fastest. We will therefore optimize the model for faster inference. In addition, the method has high requirements for the accuracy of the dataset: although there is no need to manually design features as in traditional change-detection methods, labeling a large change-detection dataset still requires substantial human effort. To this end, we will investigate semi-supervised learning methods to further reduce the dependence on labeled data.

6. Summary

In this paper, we proposed a remote-sensing change-detection network based on feature differences and an attention mechanism, namely DAFNet. Targeting the dual-input characteristic of change detection, DAFNet employs a DFEM to draw out the difference features of bitemporal image pairs, reducing the loss and coordinate-offset problems associated with deriving difference maps through feature subtraction. Subsequently, an ARM performs attention optimization on the extracted difference features, enabling the network to focus more on the changing areas in the bitemporal image pairs; it also suppresses the boundaries of change areas and large unchanged regions, thereby reducing interference from irrelevant areas and yielding more precise boundaries in the detection results. Finally, a CSFM merges the feature maps from the different encoding stages, avoiding the loss brought about by coarse fusion, reducing redundancy and displacement during feature fusion, and further enhancing detection accuracy. The experimental results affirm the superiority and generalization ability of this network, with its performance on the GECDD, LEVIR-CD, and CDD datasets surpassing other change-detection methods.

Author Contributions

Conceptualization, C.M. and L.W.; methodology, M.X. and C.M.; software, C.M. and H.Y.; validation, L.W. and H.L.; formal analysis, H.L.; investigation, C.M. and H.Y.; resources, M.X.; data curation, C.M.; writing—original draft preparation, C.M. and H.Y.; writing—review and editing, M.X.; visualization, L.W.; supervision, L.W.; project administration, M.X.; funding acquisition, M.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of PR China (42075130).

Data Availability Statement

The data and the code of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hu, K.; Zhang, E.; Xia, M.; Weng, L.; Lin, H. Mcanet: A multi-branch network for cloud/snow segmentation in high-resolution remote sensing images. Remote Sens. 2023, 15, 1055. [Google Scholar] [CrossRef]
  2. Dai, X.; Xia, M.; Weng, L.; Hu, K.; Lin, H.; Qian, M. Multi-Scale Location Attention Network for Building and Water Segmentation of Remote Sensing Image. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5609519. [Google Scholar] [CrossRef]
  3. Yin, H.; Weng, L.; Li, Y.; Xia, M.; Hu, K.; Lin, H.; Qian, M. Attention-guided siamese networks for change detection in high resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2023, 117, 103206. [Google Scholar] [CrossRef]
  4. Singh, A. Review Article Digital change detection techniques using remotely-sensed data. Int. J. Remote Sens. 1989, 10, 989–1003. [Google Scholar] [CrossRef] [Green Version]
  5. Desclée, B.; Bogaert, P.; Defourny, P. Forest change detection by statistical object-based method. Remote Sens. Environ. 2006, 102, 1–11. [Google Scholar] [CrossRef]
  6. Ji, H.; Xia, M.; Zhang, D.; Lin, H. Multi-Supervised Feature Fusion Attention Network for Clouds and Shadows Detection. ISPRS Int. J. Geo-Inf. 2023, 12, 247. [Google Scholar] [CrossRef]
  7. Chen, B.; Xia, M.; Qian, M.; Huang, J. MANet: A multi-level aggregation network for semantic segmentation of high-resolution remote sensing images. Int. J. Remote Sens. 2022, 43, 5874–5894. [Google Scholar] [CrossRef]
  8. Rokni, K.; Ahmad, A.; Selamat, A.; Hazini, S. Water Feature Extraction and Change Detection Using Multitemporal Landsat Imagery. Remote Sens. 2014, 6, 4173–4189. [Google Scholar] [CrossRef] [Green Version]
  9. Qu, Y.; Xia, M.; Zhang, Y. Strip pooling channel spatial attention network for the segmentation of cloud and cloud shadow. Comput. Geosci. 2021, 157, 104940. [Google Scholar] [CrossRef]
  10. Lu, C.; Xia, M.; Lin, H. Multi-scale strip pooling feature aggregation network for cloud and cloud shadow segmentation. Neural Comput. Appl. 2022, 34, 6149–6162. [Google Scholar] [CrossRef]
  11. Miao, S.; Xia, M.; Qian, M.; Zhang, Y.; Liu, J.; Lin, H. Cloud/shadow segmentation based on multi-level feature enhanced network for remote sensing imagery. Int. J. Remote Sens. 2022, 43, 5940–5960. [Google Scholar] [CrossRef]
  12. Lu, C.; Xia, M.; Qian, M.; Chen, B. Dual-branch network for cloud and cloud shadow segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5410012. [Google Scholar] [CrossRef]
  13. Lei, T.; Zhang, Y.; Lv, Z.; Li, S.; Liu, S.; Nandi, A.K. Landslide Inventory Mapping From Bitemporal Images Using Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 982–986. [Google Scholar] [CrossRef]
  14. Gao, J.; Weng, L.; Xia, M.; Lin, H. MLNet: Multichannel feature fusion lozenge network for land segmentation. J. Appl. Remote Sens. 2022, 16, 016513. [Google Scholar] [CrossRef]
  15. Sommer, S.; Hill, J.; Mégier, J. The potential of remote sensing for monitoring rural land use changes and their effects on soil conditions. Agric. Ecosyst. Environ. 1998, 67, 197–209. [Google Scholar] [CrossRef]
  16. Ma, Z.; Xia, M.; Lin, H.; Qian, M.; Zhang, Y. FENet: Feature enhancement network for land cover classification. Int. J. Remote Sens. 2023, 44, 1702–1725. [Google Scholar] [CrossRef]
  17. Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586. [Google Scholar] [CrossRef]
  18. Chen, J.; Xia, M.; Wang, D.; Lin, H. Double Branch Parallel Network for Segmentation of Buildings and Waters in Remote Sensing Images. Remote Sens. 2023, 15, 1536. [Google Scholar] [CrossRef]
  19. Chen, B.; Xia, M.; Huang, J. MFANet: A Multi-Level Feature Aggregation Network for Semantic Segmentation of Land Cover. Remote Sens. 2021, 13, 731. [Google Scholar] [CrossRef]
  20. Hussain, M.; Chen, D.; Cheng, A.; Wei, H.; Stanley, D. Change detection from remotely sensed images: From pixel-based to object-based approaches—ScienceDirect. ISPRS J. Photogramm. Remote Sens. 2013, 80, 91–106. [Google Scholar] [CrossRef]
  21. Ma, Z.; Xia, M.; Weng, L.; Lin, H. Local Feature Search Network for Building and Water Segmentation of Remote Sensing Image. Sustainability 2023, 15, 3034. [Google Scholar] [CrossRef]
  22. Hu, K.; Li, M.; Xia, M.; Lin, H. Multi-scale feature aggregation network for water area segmentation. Remote Sens. 2022, 14, 206. [Google Scholar] [CrossRef]
  23. Hu, K.; Wang, T.; Shen, C.; Weng, C.; Zhou, F.; Xia, M.; Weng, L. Overview of Underwater 3D Reconstruction Technology Based on Optical Images. J. Mar. Sci. Eng. 2023, 11, 949. [Google Scholar] [CrossRef]
  24. Chen, H.; Qi, Z.; Shi, Z. Remote Sensing Image Change Detection With Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5607514. [Google Scholar] [CrossRef]
  25. Wang, D.; Weng, L.; Xia, M.; Lin, H. MBCNet: Multi-Branch Collaborative Change-Detection Network Based on Siamese Structure. Remote Sens. 2023, 15, 2237. [Google Scholar] [CrossRef]
  26. Ma, C.; Weng, L.; Xia, M.; Lin, H.; Qian, M.; Zhang, Y. Dual-branch network for change detection of remote sensing image. Eng. Appl. Artif. Intell. 2023, 123, 106324. [Google Scholar] [CrossRef]
  27. Huang, R.; Wang, R.; Guo, Q.; Zhang, Y.; Fan, W. IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection. arXiv 2022, arXiv:2207.09240. [Google Scholar]
  28. Wang, M.; Tan, K.; Jia, X.; Wang, X.; Chen, Y. A Deep Siamese Network with Hybrid Convolutional Feature Extraction Module for Change Detection Based on Multi-sensor Remote Sensing Images. Remote Sens. 2020, 12, 205. [Google Scholar] [CrossRef] [Green Version]
  29. Tewkesbury, A.P.; Comber, A.J.; Tate, N.J.; Lamb, A.; Fisher, P.F. A critical synthesis of remotely sensed optical image change detection techniques. Remote Sens. Environ. 2015, 160, 1–14. [Google Scholar] [CrossRef] [Green Version]
  30. Song, L.; Xia, M.; Weng, L.; Lin, H.; Qian, M.; Chen, B. Axial cross attention meets CNN: Bibranch fusion network for change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 32–43. [Google Scholar] [CrossRef]
  31. Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8007805. [Google Scholar] [CrossRef]
  32. Weismiller, R.A.; Kristof, S.J.; Scholz, D.K.; Anuta, P.E.; Momin, S.A. Change detection in coastal zone environments. Photogramm. Eng. Remote Sens. 1978, 43, 1533–1539. [Google Scholar]
  33. Liu, J.; Gong, M.; Qin, K.; Zhang, P. A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 545–559. [Google Scholar] [CrossRef] [PubMed]
  34. Celik, T. Unsupervised Change Detection in Satellite Images Using Principal Component Analysis and k-Means Clustering. IEEE Geosci. Remote Sens. Lett. 2009, 6, 772–776. [Google Scholar] [CrossRef]
  35. Qiang, C.; Yunhao, C. Multi-Feature Object-Based Change Detection Using Self-Adaptive Weight Change Vector Analysis. Remote Sens. 2016, 8, 549. [Google Scholar]
  36. Chu, S.; Li, P.; Xia, M. MFGAN: Multi feature guided aggregation network for remote sensing image. Neural Comput. Appl. 2022, 34, 10157–10173. [Google Scholar] [CrossRef]
  37. Hu, K.; Zhang, D.; Xia, M.; Qian, M.; Chen, B. LCDNet: Light-weighted cloud detection network for high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4809–4823. [Google Scholar] [CrossRef]
  38. Chen, K.; Xia, M.; Lin, H.; Qian, M. Multi-scale Attention Feature Aggregation Network for Cloud and Cloud Shadow Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5612216. [Google Scholar] [CrossRef]
  39. Kampffmeyer, M.; Salberg, A.B.; Jenssen, R. Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 680–688. [Google Scholar] [CrossRef]
  40. Weng, L.; Pang, K.; Xia, M.; Lin, H.; Qian, M.; Zhu, C. Sgformer: A Local and Global Features Coupling Network for Semantic Segmentation of Land Cover. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6812–6824. [Google Scholar] [CrossRef]
  41. Deng, Z.; Zhou, S.; Zhao, J.; Zou, H. Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm Remote Sens. 2018, 145, 3–22. [Google Scholar] [CrossRef]
  42. Daudt, R.C.; Saux, B.L.; Boulch, A. Fully Convolutional Siamese Networks for Change Detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018. [Google Scholar]
  43. Liu, Y.; Pang, C.; Zhan, Z.; Zhang, X.; Yang, X. Building Change Detection for Remote Sensing Images Using a Dual-Task Constrained Deep Siamese Convolutional Network Model. IEEE Geosci. Remote Sens. Lett. 2020, 18, 811–815. [Google Scholar] [CrossRef]
  44. Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual Attentive Fully Convolutional Siamese Networks for Change Detection in High-Resolution Satellite Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1194–1206. [Google Scholar] [CrossRef]
  45. Lyu, H.; Hui, L. Learning a transferable change detection method by Recurrent Neural Network. In Proceedings of the IGARSS 2016—2016 IEEE International Geoscience and Remote Sensing Symposium, Beijing, China, 10–15 July 2016. [Google Scholar]
  46. Chen, H.; Shi, Z. A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection. Remote Sens. 2020, 12, 1662. [Google Scholar] [CrossRef]
  47. Hou, Z.; Li, W.; Li, L.; Tao, R.; Du, Q. Hyperspectral change detection based on multiple morphological profiles. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5507312. [Google Scholar] [CrossRef]
  48. Feng, S.; Fan, Y.; Tang, Y.; Cheng, H.; Zhao, C.; Zhu, Y.; Cheng, C. A Change Detection Method Based on Multi-Scale Adaptive Convolution Kernel Network and Multimodal Conditional Random Field for Multi-Temporal Multispectral Images. Remote Sens. 2022, 14, 5368. [Google Scholar] [CrossRef]
  49. Wang, Y.; Hong, D.; Sha, J.; Gao, L.; Liu, L.; Zhang, Y.; Rong, X. Spectral–spatial–temporal transformers for hyperspectral image change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5536814. [Google Scholar] [CrossRef]
  50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  51. Zhang, S.; Weng, L. STPGTN—A Multi-Branch Parameters Identification Method Considering Spatial Constraints and Transient Measurement Data. CMES Comput. Model. Eng. Sci. 2023, 136, 2635–2654. [Google Scholar] [CrossRef]
  52. Lebedev, M.A.; Vizilter, Y.V.; Vygolov, O.V.; Knyaz, V.A.; Rubis, A.Y. Change detection in remote sensing images using conditional adversarial networks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 565–571. [Google Scholar] [CrossRef] [Green Version]
  53. Qian, J.; Xia, M.; Zhang, Y.; Liu, J.; Xu, Y. TCDNet: Trilateral Change Detection Network for Google Earth Image. Remote Sens. 2020, 12, 2669. [Google Scholar] [CrossRef]
  54. Song, L.; Xia, M.; Jin, J.; Qian, M.; Zhang, Y. SUACDNet: Attentional change detection network based on siamese U-shaped structure. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102597. [Google Scholar] [CrossRef]
  55. Li, Z.; Tang, C.; Wang, L.; Zomaya, A.Y. Remote sensing change detection via temporal feature interaction and guided refinement. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5628711. [Google Scholar] [CrossRef]
  56. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
  57. Varghese, A.; Gubbi, J.; Ramaswamy, A.; Balamuralidhar, P. ChangeNet: A deep learning architecture for visual change detection. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
  58. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
Figure 1. The architecture schematic of DAFNet.
Figure 2. Differential Feature-Extraction Module (DFEM).
Figure 3. Attention refinement module.
Figure 4. A comparison of heatmaps generated with and without the use of ARM. Among them, Image (a,b) represent a pair of images taken at varying time instances, Image (c) is the heatmap without using ARM, Image (d) is the heatmap after using ARM, and Image (e) is the label.
Figure 5. Cross-scale feature-fusion module.
Figure 6. Partial samples from the GECDD dataset.
Figure 7. Partial samples from the LEVIR-CD dataset. In the figure, (a–e) are some representative bitemporal image pairs and their corresponding change regions in the LEVIR-CD dataset.
Figure 8. Partial samples from the CDD dataset. In the figure, (a–e) are some representative bitemporal image pairs and their corresponding change regions in the CDD dataset.
Figure 9. The prediction results of different change-detection methods on the GECDD dataset. In the figure, numbers 1, 2, and 3 denote three representative bitemporal image pairs, and Img1 and Img2 denote the bitemporal Google Earth images. The letters (a–j) represent, respectively, the label and the prediction maps of FC-Siam-Conc, SNUNet, STANet, SUACDNet, TFI-GR, Segformer, ChangeNet, Swin-Transformer, and our proposed network DAFNet.
Figure 10. The prediction results of various change-detection methods on LEVIR-CD. In the figure, numbers 1, 2, and 3 denote three representative bitemporal image pairs, and Img1 and Img2 denote the bitemporal Google Earth images. The letters (a–j) correspond, respectively, to the label and the prediction maps of FC-Siam-Conc, STANet, MFGAN, ChangeNet, SUACDNet, Segformer, Swin-Transformer, TFI-GR, and our proposed network DAFNet.
Figure 11. The prediction results of various change-detection methods on CDD. In the figure, numbers 1, 2, and 3 denote three representative bitemporal image pairs, and Img1 and Img2 denote the bitemporal Google Earth images. The letters (a–j) correspond, respectively, to the label and the prediction maps of FC-Siam-Conc, DASNet, MFGAN, STANet, Swin-Transformer, TFI-GR, SUACDNet, Segformer, and our proposed network DAFNet.
Table 1. Ablation experiment of DAFNet.

Method                        | PR (%) | RC (%) | IoU (%) | F1 (%)
Backbone + ARM + CSFM         | 88.33  | 77.72  | 70.49   | 82.69
Backbone + DFEM + CSFM        | 88.23  | 78.15  | 70.77   | 82.88
Backbone + DFEM + ARM         | 88.65  | 77.97  | 70.89   | 82.97
Backbone + DFEM + ARM + CSFM  | 88.89  | 79.04  | 71.94   | 83.68
Table 2. Comparative experimental results on GECDD.

Method                 | PR (%) | RC (%) | IoU (%) | F1 (%) | Time (s)
FC-EF [42]             | 73.36  | 43.34  | 37.45   | 54.49  | 0.34
FC-Siam-Conc [42]      | 78.08  | 47.46  | 41.88   | 59.03  | 0.34
FC-Siam-Diff [42]      | 77.63  | 46.26  | 40.82   | 57.97  | 0.31
TCDNet [53]            | 88.27  | 74.03  | 67.40   | 80.53  | 0.43
SNUNet [31]            | 88.65  | 75.11  | 68.52   | 81.32  | 0.48
MFGAN [36]             | 87.99  | 76.11  | 68.95   | 81.62  | 0.52
STANet [46]            | 89.53  | 75.27  | 69.18   | 81.78  | 0.55
DASNet [44]            | 89.74  | 75.46  | 69.47   | 81.98  | 1.04
SUACDNet [54]          | 89.50  | 75.69  | 69.52   | 82.02  | 1.11
TFI-GR [55]            | 89.71  | 75.83  | 69.76   | 82.19  | 0.54
Segformer [56]         | 87.63  | 77.76  | 70.07   | 82.40  | 1.39
ChangeNet [57]         | 88.57  | 77.11  | 70.14   | 82.45  | 0.64
Swin-Transformer [58]  | 89.82  | 76.38  | 70.29   | 82.55  | 1.29
DAFNet (Ours)          | 88.89  | 79.04  | 71.94   | 83.68  | 0.55
Table 3. Generalization experiments on LEVIR-CD.

Method            | PR (%) | RC (%) | IoU (%) | F1 (%) | Time (s)
FC-EF             | 87.65  | 81.73  | 73.29   | 84.59  | 0.13
FC-Siam-Conc      | 87.40  | 84.87  | 75.62   | 86.12  | 0.13
FC-Siam-Diff      | 89.35  | 82.71  | 75.29   | 85.90  | 0.12
SNUNet            | 87.80  | 85.73  | 76.60   | 86.75  | 0.18
TCDNet            | 88.16  | 85.40  | 76.62   | 86.76  | 0.15
DASNet            | 90.84  | 83.09  | 76.67   | 86.79  | 0.37
STANet            | 89.26  | 85.10  | 77.20   | 87.13  | 0.22
MFGAN             | 88.79  | 86.21  | 77.74   | 87.48  | 0.19
ChangeNet         | 90.20  | 85.27  | 78.04   | 87.67  | 0.23
SUACDNet          | 89.58  | 85.89  | 78.09   | 87.70  | 0.37
Segformer         | 91.37  | 84.91  | 78.61   | 88.02  | 0.48
Swin-Transformer  | 92.22  | 84.84  | 79.18   | 88.38  | 0.44
TFI-GR            | 91.84  | 85.33  | 79.32   | 88.47  | 0.19
DAFNet (Ours)     | 92.64  | 86.29  | 80.75   | 89.35  | 0.20
Table 4. Generalization experiments on CDD.

Method            | PR (%) | RC (%) | IoU (%) | F1 (%) | Time (s)
FC-EF             | 86.41  | 60.05  | 54.87   | 70.86  | 0.12
FC-Siam-Conc      | 83.46  | 64.44  | 57.14   | 72.73  | 0.12
FC-Siam-Diff      | 84.63  | 63.15  | 56.65   | 72.33  | 0.11
ChangeNet         | 82.06  | 90.30  | 75.42   | 85.99  | 0.22
SNUNet            | 84.85  | 89.82  | 77.40   | 87.26  | 0.16
TCDNet            | 83.50  | 91.12  | 77.21   | 87.14  | 0.15
DASNet            | 84.94  | 90.32  | 77.85   | 87.55  | 0.36
MFGAN             | 83.54  | 92.99  | 78.59   | 88.01  | 0.19
STANet            | 83.22  | 93.62  | 78.76   | 88.12  | 0.21
Swin-Transformer  | 84.09  | 93.15  | 79.19   | 88.39  | 0.46
TFI-GR            | 84.40  | 92.50  | 78.99   | 88.27  | 0.19
SUACDNet          | 83.52  | 93.95  | 79.26   | 88.43  | 0.38
Segformer         | 83.67  | 94.13  | 79.52   | 88.59  | 0.49
DAFNet (Ours)     | 85.86  | 94.09  | 81.47   | 89.79  | 0.19
