Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

A Full-Scale Feature Fusion Siamese Network for Remote Sensing Change Detection

Electronics 2023, 12(1), 35; https://doi.org/10.3390/electronics12010035

by Huaping Zhou, Minglong Song

and Kelei Sun^*

Reviewer 1:

Chunlei Huo

Reviewer 2:

Ali Abboud

Reviewer 3:

Nathir A. Rawashdeh

Reviewer 4: Anonymous

Reviewer 5:

Xuan Truong Nguyen

Electronics 2023, 12(1), 35; https://doi.org/10.3390/electronics12010035

Submission received: 3 October 2022 / Revised: 13 December 2022 / Accepted: 18 December 2022 / Published: 22 December 2022

Round 1

Reviewer 1 Report

1、The idea is novel, however, full-scale classifier fusion is naive and needs being improved.

2、English should be improved.

3、considering the similarity between SNUNet and the proposed approach, novelty should be further clarified.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

1. Add table for efficiency analysis in terms of consumed time and memory and number of parameters for proposed and compared approaches.

2. Improve the quality of figures.

3. Add related works section especially dedicated to the most recent similar works (2022).

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Discuss how the image registration or region-of-interest alignment is performed in x,y, and rotation coordinates. How does it affect algorithm performance?

Author Response

Dear reviewer：

Thank you very much for your comments and suggestions.

The image pairs in the dataset are already aligned. Our study focuses on "full scale" and there is no discussion around this.

Reviewer 4 Report

The paper introduces a Siamese network for remote sensing change detection. The network includes several components: feature extractor, decoder and classifier. Extensive experiments have been conducted and the results are good. I have several minor concerns:

The paper states that "the full-scale classifier is novel". However, it seems to me that the classifier is just a combination of classifiers in different scales. The novelty is not significant.

Dense connections are considered in the feature extractor. However, its effect is not investigated.

The literature review is not comprehensive. Multi-modality fusion has been extensively studied in computer vision research. For example, Matnet: Motion-attentive transition network for zero-shot video object segmentation works on the fusion of motion and appearance cues, which should be included.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 5 Report

This study presents a full-scale feature fusion Siamese network (F3SNet) for change detection in remote sensing. Although the paper reports some performance enhancements over the previous works, there is no clear technical contribution and not a sufficient theoretical explanation and experimental results to support the authors’ conclusion. Therefore, I strongly believe that the paper cannot be published in its current form. Here are some detailed comments:

1. Multi-scale or hierarchical feature generation is widely explored in many previous works [20]-[30], [40], [42] or some general object detection networks such as YOLO-v3,4, 5, 6, and 7. However, what is the motivation to use a full-scale approach instead of a multi-scale one? There is no clear theoretical explanation or a visual example for misclassification caused by the existing multi-scale approaches. Furthermore, it is intuitively expected that having too many classifiers will cause a large computation overhead and “noisy” classification decisions because some shallow classifiers have more impact on the final output. The authors are encouraged to clarify the limitations of the previous multi-scaled approaches.

2. Multi-scale (adaptive) feature fusion is also intensively studied in many works [A1]-[A3] in remote sensing. In particular, due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories, naïve fusing of the extracted multi-scale features may fail to provide satisfactory results. The authors are encouraged to handle such problems.

3. Why does the proposed method equally sum up the classification results of multi-scale classifiers in Fig. 8 or Eq. (5)? How do classifiers compensate each other in detecting change?

4. Similarly, densely connected networks are widely used in previous works for multi-scale feature generation. The authors are encouraged to clarify the difference between this work and others.

5. How do multi-scale feature extractors and multi-scale classifiers affect the rates of true-positive, true-negative, false-positive, and false-negative?

6. Why and how does the baseline (without full-scale feature extractors and full-scale classifiers) outperform all previous works except SNUNet as shown in Tables 1 and 2? In particular, the baseline achieves F1-core 0.9308 which is much better than other SOTAs in Table 1 such as IFN, STANet, DASNet. Meanwhile, the numbers of parameters and FLOPs of the baseline are smaller than that of IFN, and DASNet.

7. It is expected that full-scaled feature extractors and full-scale classifiers require more computation and memory access. Why does the proposed method consume less computation? In addition, it is better to compare execution time or both computation and memory access because some operations such as concatenation and shortcut or skip connections do not cost many ALU operations but consume a lot of time due to their memory access.

[A1]. R. Shang et. al, “Multi-scale Adaptive Feature Fusion Network for Semantic Segmentation in Remote Sensing Images,” Remote Sens. 2020, 12(5), 872; https://doi.org/10.3390/rs12050872

[A2]. W. Zhang et. al, “Multi-Scale and Occlusion Aware Network for Vehicle Detection and Segmentation on UAV Aerial Images,”Remote Sens. 2020, 12(11), 1760; https://doi.org/10.3390/rs12111760

[A3]. J. Zhao, et. al, “Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation,” ACM Transactions on Intelligent Systems and Technology, Volume 12, Issue 6, December 2021 Article No.: 82pp 1–20https://doi.org/10.1145/3484440

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

1.Ablation experiments on FFE and FC have explained the necessity of both. However, too many dense connections are used in the design of FFE. Can you further explain the necessity of these dense connections?

2.The idea of SNUNet comes from Unet++, which want to design a deep adaptive Unet. Can you explain whether the dense connections in the paper will cause the model to be too complex and the effectiveness of deep supervision?

Author Response

Dear reviewer

Thank you very much for your comments and suggestions.

Those comments are all valuable and very helpful for revising and improving our paper, as well as the important guiding significance to our researches. We have studied comments carefully and have made correction which we hope meet with approval. Revised portion are marked in red in the paper. The main corrections in the paper and the responds to the reviewer’s comments are as flowing:

Point 1: Ablation experiments on FFE and FC have explained the necessity of both. However, too many dense connections are used in the design of FFE. Can you further explain the necessity of these dense connections?

Response 1: Top-down connections enhance the spatial localization of deep features and bottom-up branching complements the changing semantics of shallow features, and this fusion of features at different scales facilitates the supplementation of information that is lacking in each other. We argue that although this may be mixed with a large amount of redundant information, feature fusion usually does not bring negative effects to the network, so we adopt the feature fargueusion that is most beneficial to information transfer.

Point 2: The idea of SNUNet comes from Unet++, which want to design a deep adaptive Unet. Can you explain whether the dense connections in the paper will cause the model to be too complex and the effectiveness of deep supervision?

Response 2: Although the model appears to be complex, it is designed around a very simple idea, which is to use as much information as possible between features at different scales to complement each other. Furthermore, Figure 9(1) demonstrates the effectiveness of deep supervision, which we believe is effective in large part because it motivates the network to be more fully trained.

Reviewer 4 Report

The revision has addressed most of my concerns. However, the references need a carefully re-checked in the final version, since some of the references are problematic, for example, [36] is accepted to AAAI rather than CVPR.

Author Response

Dear reviewer

Thank you very much for your comments and suggestions.

References have been carefully re-checked.[36] was replaced with “Zhou, T.; Shunzhou Wang, Y.Z.; Yao, Y. Motion-Attentive Transition for Zero-Shot Video Object Segmentation. Proceedings of the AAAI Conference on Artifificial Intelligence 2020, 34, 13066–13073.”

Reviewer 5 Report

This is the revised and resubmitted version of the paper Electronics-1978509.

The readability of the paper is improved. Although the process of fusing multi-scale classifiers is relatively simple, it outperforms other previous works in terms of accuracy. For future work, it is better to provide more visual examples and an in-depth analysis of why and how multi-scale classifiers could compensate each other to enhance overall classification.

I suggest "Accept" recommendation for this paper.

Author Response

Thank you for your criticism, I have further revised the paper carefully to improve the quality.

Round 3

Reviewer 1 Report

The paper could be accepted.

Article Menu

A Full-Scale Feature Fusion Siamese Network for Remote Sensing Change Detection

[A1]. R. Shang et. al, “Multi-scale Adaptive Feature Fusion Network for Semantic Segmentation in Remote Sensing Images,” Remote Sens. 2020, 12(5), 872; https://doi.org/10.3390/rs12050872

[A2]. W. Zhang et. al, “Multi-Scale and Occlusion Aware Network for Vehicle Detection and Segmentation on UAV Aerial Images,”Remote Sens. 2020, 12(11), 1760; https://doi.org/10.3390/rs12111760

[A3]. J. Zhao, et. al, “Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation,” ACM Transactions on Intelligent Systems and Technology, Volume 12, Issue 6, December 2021 Article No.: 82pp 1–20https://doi.org/10.1145/3484440

Further Information

Guidelines

MDPI Initiatives

Follow MDPI