Article
Peer-Review Record

A Lightweight Object Detector Based on Spatial-Coordinate Self-Attention for UAV Aerial Images

Remote Sens. 2023, 15(1), 83; https://doi.org/10.3390/rs15010083
by Chen Liu 1, Degang Yang 1,*, Liu Tang 1, Xun Zhou 2 and Yi Deng 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 14 October 2022 / Revised: 1 December 2022 / Accepted: 21 December 2022 / Published: 23 December 2022

Round 1

Reviewer 1 Report

The YOLO-UAVlite proposed in this manuscript is very interesting, and the authors have done meticulous and convincing work. 

Some comments are as follows:

1. P3, Lines 81-99 are not necessary, or can be mentioned in the Conclusion;

2. Explain alpha in Equation (20);

3. There are some syntax errors, such as in P1 L25 and P12 L305, etc.

Author Response

Dear Reviewer:


On behalf of my co-authors, I would like to thank you very much for your time and effort in providing feedback on our manuscript and for your insightful comments and valuable improvements to our paper.


We have incorporated most of the suggestions you have made. These changes have been highlighted in the manuscript. Please see the attached text in red for a point-by-point response to these comments. All page numbers refer to the revised manuscript document with these changes tracked.

We sincerely thank you for your enthusiastic work; your comments have been very helpful in revising the manuscript.

Author Response File: Author Response.docx

Reviewer 2 Report

Summary:

The paper addresses UAV-based object detection in aerial imagery. The authors focus on two major challenges: real-time image processing and object detection on limited hardware resources in UAV applications, and the detection of small objects in aerial images. The paper presents a YOLOv5-N-based network named YOLO-UAVlite that has 20% fewer parameters while improving the mean average precision by 10% relative to the original YOLOv5-N network on the VisDrone-DET2021 dataset.

The introduction chapter presents related work, covers the two-stage and one-stage approaches, and briefly discusses the different YOLO versions. It highlights the challenges and problems of using these approaches for UAV applications and object detection tasks in aerial images. Finally, the three major contributions of the paper, which address these challenges, are summarized.

The second chapter presents related work on lightweight networks, the attention mechanism, and ghost convolution.

The third chapter presents the proposed network in detail.

Chapter four presents the performance results of the proposed approach and compares the results with several other single-stage detectors using the VisDrone-DET2021 dataset.

 

Evaluation:

First of all, the paper suffers from many spelling and grammatical errors, which make the content hard to understand and follow.

The second weak point is the presentation style: the whole paper uses a lot of abbreviations and proprietary terms, which are only partially introduced or referenced.

Overall, it is quite hard to understand the "core idea", the actual innovation, and the new approach of this paper. In chapter three, for example, the whole detector network is presented in detail, like a highly detailed assembly instruction, but without explaining why the network is designed this way, how it works, and how its object detection processing differs from that of the original detector. Why does your network perform 10% better with 20% fewer parameters than the original detector? Are there only performance benefits, or is there any weakness or compromise compared to the original detector? The differences and the key points of this contribution should also be explained from a higher-level point of view, not only in terms of modules, convolutions, or operators.

Finally, the paper is comprehensive and presents several different approaches at once (SCSANet, Slim-BiFPN, a new bounding box regression model, etc.). Are all of these approaches connected and dependent on each other, or is each an innovation of its own that could also be used separately in other contexts or applications?

Author Response

Dear Reviewer:


On behalf of my co-authors, I would like to thank you very much for your time and effort in providing feedback on our manuscript and for your insightful comments and valuable improvements to our paper.


We have incorporated most of the suggestions you have made. These changes have been highlighted in the manuscript. Please see the attached text in red for a point-by-point response to these comments. All page numbers refer to the revised manuscript document with these changes tracked.

We sincerely thank you for your enthusiastic work; your comments have been very helpful in revising the manuscript.

Author Response File: Author Response.docx

Reviewer 3 Report

In this work, the authors propose a backbone network, SCSASNet, based on the Enhanced ShuffleNet network and the Spatial-Coordinate Self-attention (SCSA) module to improve feature extraction and reduce model size. It is custom-designed to address the heterogeneous shooting angles and flying heights of UAVs. The work is timely and interesting.

1. Please elaborate on the technical challenges that are solved by this proposal.

2. The missing part is how to deploy the proposed model for real-time operation on a UAV and what the tradeoffs are, because UAVs have limited energy and computational capacity.

Author Response

Dear Reviewer:


On behalf of my co-authors, I would like to thank you very much for your time and effort in providing feedback on our manuscript and for your insightful comments and valuable improvements to our paper.


We have incorporated most of the suggestions you have made. These changes have been highlighted in the manuscript. Please see the attached text in red for a point-by-point response to these comments. All page numbers refer to the revised manuscript document with these changes tracked.

We sincerely thank you for your enthusiastic work; your comments have been very helpful in revising the manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Dear authors,

Thanks for your modifications and additional paragraphs; this helps.

Some minor spell checking is still required, and in some places you should check the commas again.

For example, here are some findings:

Line 34:  but also a core problem (not the!)

Line 39   a support vector machines

Line 112: This enhances the object representation, and enables the model

Figure 3: Silm-BiFPN

Line 336: Silm-BIFPN

Line 443: Silm-FPN

Author Response

Dear Reviewer:

Thank you for giving us the opportunity to submit a revised version of ‘A Lightweight Object Detector Based on Spatial-Coordinate Self-Attention for UAV Aerial Images’. We appreciate your insightful comments and valuable improvements to our paper.

We have incorporated most of the suggestions you have made. These changes have been highlighted in the manuscript. Please see below, in red, a point-by-point response to the comments. All page numbers refer to the revised manuscript document with the changes tracked.

Author Response File: Author Response.pdf
