Article
Peer-Review Record

A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection

Remote Sens. 2022, 14(9), 2228; https://doi.org/10.3390/rs14092228
by Guanghui Wang 1,2, Bin Li 1,*, Tao Zhang 2 and Shubi Zhang 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Reviewer 6: Anonymous
Submission received: 24 March 2022 / Revised: 23 April 2022 / Accepted: 28 April 2022 / Published: 6 May 2022
(This article belongs to the Special Issue Deep Reinforcement Learning in Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

The paper is a rewritten form of an existing paper, and I also detected plagiarism, which is a serious ethical concern.

Major English editing is also required.

In this study, the feature map of the spatiotemporal transformer module is considered, and the results show that the UVACD method can help to extract more effective change feature maps by suppressing irrelevant regions in pre-time images and enhancing the regions of interest in post-time images.

The textual comparison with IFNet, SNUNet, and similar methods should be presented in tables rather than in text (show this in Section 4).

Supplementary materials should be provided to verify the claims the authors have made.

In the second-to-last paragraph of the introduction, please state the aim and objectives of this study.

For a study like this, it is important to add a separate section on recent literature, describing how you conducted the keyword search and naming the relevant sources (Google Scholar, WoS, Scopus), so that readers get a better idea of the relevant literature in this domain.

The conclusion section can be improved by highlighting the key findings, the limitations of the research, and recommendations for future studies.

The methods used in the study need to be compared properly, with the efficiency of one method over another assessed statistically; this is clearly missing.

Explain why the current method was selected for the study and why it is important, and compare it with traditional methods.

Also add the risk factors associated with this domain to your study, for example as a risk matrix. False positives and true negatives should also be reported.
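To illustrate the kind of counts this comment asks for, here is a minimal sketch that tallies true/false positives and negatives for a predicted change mask against a reference mask. It assumes binary masks stored as nested lists (1 = changed, 0 = unchanged); the function name `confusion_counts` and the toy masks are hypothetical and not taken from the manuscript.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for two equally sized binary change masks."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # change predicted, change present
            elif p and not t:
                fp += 1      # change predicted, no change present
            elif not p and t:
                fn += 1      # change missed
            else:
                tn += 1      # no change predicted, none present
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Toy 2x3 masks, purely illustrative.
pred = [[1, 0, 0],
        [1, 1, 0]]
truth = [[1, 0, 1],
         [0, 1, 0]]

counts = confusion_counts(pred, truth)
print(counts)
```

In change detection the unchanged class usually dominates, so the true-negative count tends to be very large; reporting all four counts makes that imbalance visible.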

Please pay special attention to how the methodology and the methods used are presented. As it is currently presented, the reader does not get a clear picture of what analysis was done in the research.

Another issue is the lack of a clear message about the novelty of the study. There is a need for a clear message of what the authors have elaborated.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

In this manuscript, “Learning a Spatiotemporal Transformer and Conv3d for Remote Sensing Image Change Detection”, the authors present a change detection method for remote sensing images based on convolutional neural networks (CNNs). They use two building datasets.

1: The Abstract contains grammatical mistakes and is written in poor English.

2: A literature review is missing. (You should discuss past and recent papers.)

3: The architecture of the Siamese network is not explained.

4: You used 256×256 patches of large images. You should cite earlier papers whose authors used this patch size.

5: The Results section is very complex and not described properly. It appears that you are segmenting the images rather than showing the detected changes. You should clearly describe the changes between the two images.

The overall organization of the paper is good. However, a literature review is missing. The sentence structure of this paper can be improved. Also, check for typos. I suggest the authors address the above issues.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Thank you for giving me the opportunity to read “Learning a Spatiotemporal Transformer and Conv3d for Remote Sensing Image Change Detection”. I have the following comments:

  • Please get the paper proofread for language and grammar issues. Some sentences are very long and hard to follow. For example, the first four lines of the abstract are a single sentence. Please break such sentences down into simpler ones so that readers can follow them more easily.
  • There are many abbreviations in the paper without full forms. Please add a table of abbreviations to the paper before the introduction and spell
  • Please check the number of contributions; there is a typo here. Are these three or four?
  • Please add a method flow chart at the start of the method section to explain the different steps followed in this study.
  • Justifications must be provided to show how and why LEVIR-CD and WHU are reliable datasets for such studies. Have any other studies used these? Also, a link should be provided to the public datasets along with their access dates.
  • Figure 6 needs further explanation. What are the key takeaways from this figure? There are multiple labels in the figure; please discuss and compare them in a detailed paragraph.
  • The same comment applies to figure 7 as well. It must be properly discussed.
  • The paper lacks critical discussions. Merging the discussion and conclusion sections has undermined the discussion portion of the paper and made the conclusions very weak.
  • Please separate the conclusions and discussions sections, focus on critical discussion, and improve this section. These are intermingled and confusing, leaving readers unclear about what the key takeaways of the study are.
  • The conclusion section should be rewritten to focus on the key takeaways and add the limitations and future research directions for this study.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

  • The authors should spell out all abbreviations on the first appearance – UVACD? LEVIR-CD? WHU-CD?
  • The title looks confusing, together with many other language issues. The paper must be proofread to remove language issues.
  • The term Conv3D should be explained for a non-technical audience, and its relevance to spatiotemporal analyses should be discussed at length.
  • The authors should justify why yet another CNN model is needed. This should link the model precisely to the problem addressed.
  • What are the novelties of the proposed transformer in comparison to ViViT? The authors stated that the model is inspired by ViViT but never really compared the two or stated the comparative innovations.
  • Please add more details about Figure 2 to clarify the various terms and variables in the figure. For example, why is dilation, etc., used?
  • The authors should explain why ResNet50 is used in this study. What is the basis for this selection?
  • There are two figures named Figure 6, one on page 10 and one on page 12. Please make the corrections in the paper and in the text.
  • The authors should compare the different results in Figure 6 (on page 12) and Figures 7 and 8. There are many results in these figures, but the authors haven’t discussed them properly.
  • The discussions of the paper are very weak and must be rewritten, focusing on key messages and findings. Some discussion in the results section can be moved to discussion, and the section should be improved.
  • The conclusion should be a separate section. In addition, the authors should add limitations of the study here.
  • Also, the suggestion for future works is missing from the study.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 5 Report

This paper presents a CNN network to detect changes in two benchmark datasets. Although the topic is interesting, it is unclear what the research premise of this paper is. The literature is overwhelmed with papers that change the order of DL components, such as convolutional layers, to achieve minor improvements only on benchmark datasets, not even on real-world examples. I would appreciate it if the authors made it clear why we should use their method for change detection and not someone else’s work. I believe the methodology part of the paper is also unclear because of missing parameter definitions and inconsistencies, some of which are listed below. The authors need to do a better job of clarifying what they have done and why it is novel. Here are more detailed comments:

Define abbreviations where they first appear in the text, e.g., UVACD, ViT, etc.

Abstract:

F1 and IoU of the two datasets are reported. How should we know these are good numbers? How do they compare with the state-of-the-art methods?
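For readers unfamiliar with the two metrics mentioned here, the following is a minimal sketch of how F1 and IoU for the changed class are commonly computed from pixel counts. The function name `change_metrics` and the example counts are hypothetical; they are not values from the manuscript.

```python
def change_metrics(tp, fp, fn):
    """Precision, recall, F1, and IoU for the 'changed' class from pixel counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # IoU (Jaccard index): overlap of predicted and true change over their union.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Made-up counts, purely illustrative.
metrics = change_metrics(tp=80, fp=20, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Because F1 = 2·IoU / (1 + IoU) for a single class, the two numbers always move together; whether they are "good" still depends on a comparison with other methods on the same data, which is the point of this comment.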

Introduction:

What does “long-range global interactions” mean? Please define this clearly.

“large-scale variations” such as?

“Enjoyed” is an awkward word choice; please fix it.

Define what a “transformer” is and explain what it does, as it is an important part of your work.

Grammar issues:

“Have been rapidly developed”

“Transformers can be seen as special self-attention mechanisms [22], has and they”??

“Transformers has not”

“which receives ab original image in two”

Formula:

The authors have made their formulas look complex by using unnecessary indices such as Hz*Wz. What is wrong with saying images are of size W*H?

All the parameters in the formulas and in the figures need to be introduced. What are s, sigma, g, C, etc.?

Figures:

The naming needs to be consistent between images and text. ASPP (which is never defined, by the way) is used in the text, but in the image captions it is called Aspp3D.

Section 2.1.2

“where the transformer enhancement component, which is an optional part of the network for enhancing” why optional? When is it used? When do you not use this “option”?

Methodology:

Because the formulas, wording, etc. of this paper are unclear, I cannot comment on the methodology section.

Results:

“When Uva is used, the recall increases significantly”. By “significantly”, do you mean around 0.5%?

From Table 2, some numbers on the benchmark datasets do not seem to match those reported in the literature. For example, for DTCDSN, an F1 of 71.95 is reported, while paper [19] reports 89.01; please explain where the differences come from.

Last, but perhaps the more important comment: the improvements seem to be negligible. Please discuss why we should use your method and not a simpler one. How does your method compare with even very basic change detection methods such as image differencing, PCA, and MAD, or machine learning methods such as RF? It should be clear whether methods that take so much time to implement are worth the effort, or whether we would be better off with simpler solutions, again not for benchmark datasets but for real-world applications.
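As a point of reference for the simplest baseline named above, here is a minimal sketch of image differencing: a pixel is marked as changed when the absolute intensity difference between the two dates exceeds a threshold. It assumes co-registered single-band images stored as nested lists; the function name `difference_change_map`, the toy pixel values, and the threshold of 30 are all hypothetical.

```python
def difference_change_map(before, after, threshold):
    """Mark a pixel as changed (1) when |after - before| exceeds the threshold."""
    return [
        [1 if abs(a - b) > threshold else 0 for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


# Toy 2x3 grayscale images, purely illustrative.
before = [[10, 12, 200],
          [11, 13, 14]]
after = [[11, 90, 198],
         [10, 13, 120]]

change_map = difference_change_map(before, after, threshold=30)
print(change_map)
```

A baseline like this has no learned parameters, so scoring it with the same metrics on the same test split would give a concrete lower bound against which the reported gains could be judged.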

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 6 Report

Page 3: The experimental results listed as contribution 4 are not a contribution.

In general, this paper is well organized in all sections, but the authors should show readers clearly which methods they propose and which modifications they made to existing methodologies.

Include experimental results for the standard methods on which the proposed method builds, such as ViT, to demonstrate that the proposed method outperforms these base methods.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The paper is significantly improved and should warrant publication.

Reviewer 2 Report

All of my concerns have been addressed in the manuscript. I recommend the paper to be accepted in its current form.

Reviewer 3 Report

Thank you for addressing my comments.

Reviewer 4 Report

My comments have been addressed. Thank you.

Reviewer 6 Report

I agree with the revised paper.
