Article
Peer-Review Record

Text Semantic Fusion Relation Graph Reasoning for Few-Shot Object Detection on Remote Sensing Images

Remote Sens. 2023, 15(5), 1187; https://doi.org/10.3390/rs15051187
by Sanxing Zhang 1,2,3, Fei Song 1,4, Xianyuan Liu 1,2,3, Xuying Hao 1,2,3, Yujia Liu 1,2,3, Tao Lei 1,* and Ping Jiang 1
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 9 January 2023 / Revised: 13 February 2023 / Accepted: 16 February 2023 / Published: 21 February 2023
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

Paper Summary

This paper proposes a few-shot object detection method for remote sensing images based on text semantic fusion relation graph reasoning. The authors build a corpus of text descriptions of object attributes and relations, and use graph reasoning to learn key spatial and semantic relationships and to enhance the robustness of few-shot object feature representations. Experimental results on two benchmark datasets demonstrate the method's effectiveness.

 

Strengths

1. This paper builds a text corpus describing category information for the NWPU VHR-10 and DIOR datasets. Gated Graph Neural Networks are used to learn semantic and spatial relationships between region proposals.

Weaknesses

1. The three main components in Fig. 2 are confusing. It is not easy to locate the TSE, RGL, and JRR modules in the overview architecture.

2. In Sec. 4.2, how do you align the semantic information of the proposals with the semantic information of the labels?

3. In Sec. 4.4, although the semantic relation from the knowledge graph contains additional information, only using quick fine-tuning seems inadequate for FSOD.

4. More details about the text corpus of category descriptions are needed, since it is an important contribution.

5. More related works should be cited:

[1] Zhu C, Chen F, Ahmed U, et al. Semantic relation reasoning for shot-stable few-shot object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 8782-8791.

[2] Chen W, Xiong W, Yan X, et al. Variational knowledge graph reasoning[J]. arXiv preprint arXiv:1803.06581, 2018.

Author Response

Thank you for your comments on our manuscript. We have made careful and detailed changes to improve the manuscript, and all changes have been marked using the "track changes" function. Attached is a point-by-point response to the comments.

Author Response File: Author Response.docx

Reviewer 2 Report

- The idea is novel. Gathering a text corpus for the semantic reasoning to tackle data scarcity is a very good idea.

- The paper is well written and well structured.

- The claim that the model is lightweight is not supported. The number of parameters, FLOPs, and inference time are required to support it.

- Related object detection approaches should be cited, e.g., "A Semantic Relation Graph Reasoning Network for Object Detection" (2021).

- Experiments reporting base and novel performance (bAP and nAP, respectively) are missing. These are important for showing the base and novel performance as well as the extent of catastrophic forgetting.

- The related FSOD work cited is relatively outdated. There is more recent fine-tuning-based two-stage object detection work, such as Decoupled Faster R-CNN (DeFRCN).

- Are there any data augmentation techniques used? If not, would it further improve the overall detection performance?
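The base/novel breakdown requested in the comments above can be illustrated with a small sketch. It assumes that per-class AP values are already available from the evaluation protocol; the class split, the AP numbers, and the helper names `split_map` and `base_forgetting` are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def split_map(per_class_ap, novel_classes):
    """Return (bAP, nAP): mean AP over base classes and over novel classes."""
    novel = set(novel_classes)
    missing = novel - set(per_class_ap)
    if missing:
        raise KeyError(f"no AP value for novel classes: {sorted(missing)}")
    # Every class that is not listed as novel is treated as a base class.
    base_aps = [ap for name, ap in per_class_ap.items() if name not in novel]
    novel_aps = [per_class_ap[name] for name in novel]
    if not base_aps or not novel_aps:
        raise ValueError("need at least one base class and one novel class")
    return mean(base_aps), mean(novel_aps)


def base_forgetting(bap_before, bap_after):
    """Drop in base-class AP after few-shot fine-tuning (positive = forgetting)."""
    return bap_before - bap_after


# Hypothetical per-class AP values after fine-tuning, for illustration only.
ap_after_finetune = {"airplane": 0.75, "ship": 0.5, "harbor": 0.25, "bridge": 0.5}
bap, nap = split_map(ap_after_finetune, novel_classes=["harbor", "bridge"])
print(f"bAP={bap:.3f}, nAP={nap:.3f}")
# 0.75 stands in for a hypothetical bAP measured before fine-tuning.
print(f"forgetting={base_forgetting(0.75, bap):.3f}")
```

Reporting bAP and nAP side by side, together with the base-class drop, makes it visible whether gains on novel classes come at the cost of base-class accuracy.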

 

Supplementary comments:

1. Due to the extensive time and labor required to collect and label remote sensing images, only scarce data samples are acquired. The main goal of this paper is to leverage prior knowledge to train a detector from limited remote sensing image samples via few-shot object detection (FSOD). The main question addressed by the research is how to exploit text semantic fusion relation graph reasoning, which learns various types of relationships from common sense knowledge, to empower detector training in a low-data regime.

2. The topic of few-shot object detection is highly relevant, yet it has been tackled by only a few works.

3. This work adds a semantic text reasoning approach to complement FSOD training. This empowers detector training by incorporating additional knowledge.

4. The authors could consider two things: first, data augmentation techniques to account for the data scarcity; second, tackling catastrophic forgetting when learning new objects from limited data.

5. The conclusions are consistent with the evidence and arguments presented and also address the main question posed. However, one argument has not been backed up or supported: the claim that the model is lightweight. A quantitative comparison of the model's complexity with that of other models is required. This includes the number of parameters, the number of FLOPs, the memory required, and the inference time.

6. The references are relatively appropriate, and I have mentioned a couple of references in the review.

7. Figures and tables are well presented.
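Two of the complexity measures requested above, parameter count and inference time, can be sketched in a framework-agnostic way. This is only an illustration under stated assumptions: the layer shapes and the `dummy_forward` stand-in are hypothetical and unrelated to the actual TSF-RGR model, and FLOPs and memory use are not covered because they would need a framework-specific profiler.

```python
import math
import time
from statistics import median


def count_parameters(shapes):
    """Total number of scalar parameters, given the shape of each weight tensor."""
    return sum(math.prod(shape) for shape in shapes)


def median_latency(fn, *args, warmup=3, runs=10):
    """Median wall-clock time of fn(*args) in seconds, after warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return median(timings)


def dummy_forward(n):
    """Hypothetical stand-in for one forward pass of a detector."""
    return sum(i * i for i in range(n))


# Hypothetical layer: a 3x3 convolution from 3 to 64 channels, plus its bias.
layer_shapes = [(64, 3, 3, 3), (64,)]
print("parameters:", count_parameters(layer_shapes))
print(f"median latency: {median_latency(dummy_forward, 10_000):.6f} s")
```

Using the median over several timed runs, after a few untimed warm-up calls, reduces the influence of one-off start-up costs on the reported inference time.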

Author Response

Thank you for your comments on our manuscript. We have made careful and detailed changes to improve the manuscript, and all changes have been marked using the "track changes" function. Attached is a point-by-point response to the comments.

Author Response File: Author Response.docx

Reviewer 3 Report

Most object detection methods based on remote sensing images generally are dependent on a large amount of high-quality labeled training data. However, due to the slow acquisition cycle of remote sensing images and the difficulty in labeling, many types of data samples are scarce. This makes few-shot object detection an urgent and necessary research problem. In this paper, we introduce a remote sensing few-shot object detection method based on text semantic fusion relation graph reasoning (TSF-RGR), which learns various types of relationships from common sense knowledge in an end-to-end manner, thereby empowering the detector to reason over all classes. Specifically, based on the region proposals provided by the basic detection network, we first build a corpus containing a large number of text language descriptions, such as object attributes and relations, which are used to encode the corresponding common sense embeddings for each region. Then, graph structures are constructed between regions to propagate and learn key spatial and semantic relationships. Finally, a joint relation reasoning module is proposed to actively enhance the reliability and robustness of few-shot object feature representation by focusing on the degree of influence of different relations. Our TSF-RGR is lightweight and easy to expand, and it can incorporate any form of common sense information. Sufficient experiments show that the proposed method exhibits improved performance under different shots, obtaining highly competitive results on two benchmark datasets (NWPU VHR-10 and DIOR).

This is an interesting research paper. There are some suggestions for revision.

1) At the end of the abstract, it would be more intuitive and convincing to present the qualitative results of the extensive experiments that verify the method's superiority and effectiveness.

2) The motivation is not clear. Please specify the importance of the proposed solution.

3) The listed contributions are a little weak. Please highlight the innovations of the proposed solution.

4) The related work is weak. The authors ignore some relevant papers, for example, "Small object detection method based on adaptive spatial parallel convolution and fast multi-scale fusion", Remote Sensing 14(2), 420, 2022, and "Object Detection in Remote Sensing Images by Combining Feature Enhancement and Hybrid Attention", Applied Sciences 12(12), 6237, 2022. The authors should discuss them in this paper.

5) In Section 3.2 (Gated Graph Neural Networks), adding clear and intuitive diagrams would better explain the logical process and theoretical basis.

6) In Section 4.1 (Text Semantic Encoding), lines 225 to 227, what are the operation and processing procedure of the scoring weight mentioned here?

7) More technical details of the proposed solution should be given.

8) The explanation of equation (2) is not clear. The symbol on the right side of the equation causes confusion.

9) Please discuss how the suitable parameter values used in the proposed solution were obtained.

10) Make sure your conclusions appropriately reflect on the strengths and weaknesses of your work and on how others in the field can benefit from it, and that they thoroughly discuss future work.

11) In the reference section, it would be better to cite more recent research, which can better reflect the novelty of this work.

Author Response

Thank you for your comments on our manuscript. We have made careful and detailed changes to improve the manuscript, and all changes have been marked using the "track changes" function. Attached is a point-by-point response to the comments.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have addressed all my concerns. I recommend accepting this paper.

Reviewer 3 Report

All my concerns have been addressed. I recommend this paper for publication. 
